AI Assurance: What happened when we audited a deepfake detection tool called FakeFinder

Jan. 04, 2022

IQT Labs recently audited an open-source deep learning tool called FakeFinder that predicts whether or not a video is a deepfake.

This post provides a high-level overview of our audit approach and findings. It is the first in a series; in future posts, we will dig into the details of our AI Assurance audit, discussing our cybersecurity “red teaming,” ethics assessment, and bias testing of FakeFinder.

——

There is no such thing as perfect security, only varying levels of insecurity.

Salman Rushdie

If only Rushdie were wrong…

When it comes to software, one thing we know for sure is that some of the time, some of our tools are going to fail. And — as if cybersecurity wasn’t enough of a challenge already — introducing Artificial Intelligence (AI) and Machine Learning (ML) into software tools creates additional vulnerabilities.

Auditing can help organizations identify risks before AI tools are deployed. We’ll never be able to prevent all future failures and incidents, but the right tools and techniques can help us anticipate, mitigate, and prepare for certain types of risk.

Between July and September 2021, we conducted an internal audit of FakeFinder, a deepfake detection tool that some of our IQT Labs colleagues developed earlier in 2021. In this post, we explain our approach and summarize three primary findings: (1) FakeFinder is actually a “face swap” detector; (2) FakeFinder’s results may be biased with respect to protected classes; and (3) FakeFinder is a software prototype, not a production-ready tool. Over the next few weeks, we will share additional findings and recommendations in a series of more detailed posts.

Managing risk is tricky business. This was our first AI audit and we definitely do not have all the answers. But we do have a series of tactics that others can borrow and — we hope! — improve. One thing we do know is that, to do this well, we need input from multiple stakeholders with diverse perspectives to help us see past our own blind spots. So, if you have suggestions or are interested in collaborating on future projects, contact us at labsinfo@iqt.org.

AI failures can be intentional or unintentional

Earlier this year, IQT Labs worked with BNH.ai, a Washington, D.C.-based law firm specializing in AI liability and risk assessment, to collect and analyze 169 failures of AI tools that occurred between 1988 and 2021 and were covered in the public news media. (We recognize that AI failures covered by the media represent only a fraction of the failures that actually occurred. Nonetheless, this exercise helped us better understand common failure modes that occur when AI is deployed in real-world contexts.)

Adversarial attacks, ways of tricking AI/ML models into making erroneous predictions, are a popular area of academic research. These intentional modes of failure are a growing threat, but unfortunately, they are not the only cause for concern. 95% of the AI failures we analyzed were unintentional. Instead of malicious attacks by nefarious actors, these unintentional failures were the result of oversights and accidents, a lack of testing, poor design, unintended consequences, and good old-fashioned human error — someone using a tool incorrectly or thinking that the results meant something different from what they actually meant.

We saw AI/ML models fail because they weren’t properly validated, because their training data was biased in a way no one realized, or because a model discovered a relationship between irrelevant features in the data…which led people to act on erroneous predictions. We also saw situations where (it appeared) important privacy implications weren’t fully considered before a tool was deployed, or where a tool didn’t provide enough transparency into how a particular decision was made.

Again and again, we saw unintentional failures occur when someone overestimated the potential of a tool, ignored (or didn’t understand) its limitations, or didn’t fully think through the consequences of deploying the tool in its current state.

A very brief overview of FakeFinder

When you upload a video snippet, FakeFinder uses deep learning to predict whether that video is a deepfake or not, that is, whether the video has been modified or manipulated by a deep learning model. If you want to learn more about deepfakes, we recommend checking out this post. If you’d like to try FakeFinder, the code is available on GitHub for any use consistent with the terms of the Apache 2.0 license.

We chose FakeFinder as the target of our audit both because deepfake detection is a novel (and timely) use of AI/ML and because the tool was developed by our colleagues; we want to make internal auditing (or “red teaming”) an integral part of IQT Labs’ tool development efforts.

We refer to FakeFinder as a tool, but it’s actually composed of 6 different “deepfake detector” models. These models were developed outside of IQT Labs and are also available as open source. One model, Boken, was the top performer at the DeeperForensics Challenge. The others — Selimsef, \WM/, NTechLab, Eighteen years old, and The Medics — were the top 5 performers at the Facebook/Kaggle Deepfake Detection Challenge, which was launched in December 2019. These 5 models were trained on a dataset containing videos of paid actors (who consented to the use of their likeness for this purpose), which was curated and released by Meta (formerly known as Facebook) as part of the Deepfake Detection Challenge.

In addition to these underlying models, the FakeFinder tool includes several other components that were built by the IQT Labs team:

  • A front-end application — created using Plotly’s Dash framework — that aggregates predictions from the 6 detector models and displays them in a visual interface;
  • An API that enables programmatic access to the models’ output (see the sketch below); and
  • A containerized back-end that helps users spin up compute resources on AWS.
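
To make those pieces concrete, here is a minimal Python sketch of how a client might submit a video to a FakeFinder-style API and average the per-model scores, much as the front end does. The endpoint path, request fields, and response format below are illustrative assumptions for this post, not FakeFinder’s actual interface; consult the GitHub repository for the real API.

    import statistics

    import requests

    # Hypothetical endpoint and payload; see the FakeFinder repo for the real API.
    API_URL = "http://localhost:5000/api/predict"  # assumed address of the API container
    VIDEO_PATH = "example_clip.mp4"

    with open(VIDEO_PATH, "rb") as f:
        # Assumes the API accepts a multipart file upload and returns one
        # probability-of-fake score per detector model, e.g.
        # {"selimsef": 0.91, "ntechlab": 0.87, ...}
        response = requests.post(API_URL, files={"video": f}, timeout=600)
    response.raise_for_status()
    per_model_scores = response.json()

    # Aggregate the six detectors' scores the way a front end might:
    # report the mean and flag the clip if the average crosses a threshold.
    mean_score = statistics.mean(per_model_scores.values())
    print(f"Per-model scores: {per_model_scores}")
    print(f"Mean fake probability: {mean_score:.2f}")
    print("Flagged as likely fake" if mean_score > 0.5 else "No manipulation detected")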

Our audit approach, in a nutshell

We used the AI Ethics Framework for the Intelligence Community, issued by the Office of the Director of National Intelligence, to guide our audit. Published in June 2020, this document developed by the United States Intelligence Community poses a series of important questions intended to foster “an enhanced understanding of goals between AI practitioners and managers while promoting the ethical use of AI.”

Given how extensive this document is, we knew that three months was not enough time for us to craft a rigorous response to each and every question. Instead, we decided to examine FakeFinder from four perspectives — Ethics, the User Experience (UX), Bias, and Cybersecurity — and focus on four sections of the AI Ethics Framework related to those perspectives: Purpose: Understanding Goals and Risks; Human Judgment & Accountability; Mitigating Undesired Bias & Ensuring Objectivity; and Testing your AI.

Each of these perspectives encouraged us to examine a different aspect of FakeFinder — we looked at the infrastructure and software implementation (Cybersecurity), the deepfake detection models and their training data (Bias), how the tool presents results through the user interface (UX), and how FakeFinder might be used as part of an analytical workflow (Ethics). This multi-dimensional approach helped us think broadly about potential risks and encouraged us to seek advice from stakeholders with diverse types of expertise: software engineering, data science, legal counsel, UX design, mis/disinformation, AI policy, and media ethics. In each case, we asked ourselves and our collaborators: Is what we’re getting out of this tool what we think we’re getting?

Below, we summarize three key findings.

FakeFinder is actually a “face swap” detector

Many of the video snippets in the Deepfake Detection Challenge dataset were manipulated using a powerful deepfake technique called “face swap.” This technique can be used to transpose the face of one person onto the motions or actions of another, as in this fake video of former President Obama, which Jordan Peele created as a public service announcement about fake news.

In fact, we believe that all the videos in the training data labeled “fake” were manipulated with face swap, but this was not disclosed to competitors. To evaluate the submitted models, Meta used a test dataset that included other types of manipulated videos, including “cheapfakes” (fake videos that were created using techniques like altering the frame rate, which do not require algorithmic intervention). As a result, we suspect that the competition organizers wanted to test whether (or to what extent) the submitted models would generalize to detect other types of fake videos that weren’t present in the training data.

Since we don’t know the precise criteria by which Meta decided what constituted a “fake” in the test dataset, however, we can’t characterize what types of fake videos are likely (or unlikely) to be detected by FakeFinder’s models. All we know is that FakeFinder’s detector models were trained on a dataset where “fake” essentially meant “subjected to face swap.”

This, on its own, isn’t necessarily an issue. The problem is that nothing in FakeFinder’s documentation or User Interface makes clear to users that FakeFinder’s models were trained exclusively on examples of face swap. Without this critical piece of information, there is substantial risk that users will misunderstand the output.

FakeFinder’s results may be biased with respect to protected classes (e.g., race or gender)

FakeFinder’s models were trained on videos of human subjects’ faces. This means that protected group information, such as skin color or features associated with biological sex, is directly encoded into the training data. Unless explicitly addressed by the model developers, this could make model performance highly dependent on facial features that relate to protected class categories. If FakeFinder’s detector models were biased with respect to protected classes, this could lead to biased predictions, which could lead to discriminatory outcomes.

In April 2021, several researchers at Meta released a paper called Towards measuring fairness in AI: the Casual Conversations dataset, in which they wrote that an evaluation of the top five winners of the DeepFake Detection Challenge revealed “that the winning models are less performant on some specific groups of people, such as subjects with darker skin tones and thus may not generalize to all people”.

During our audit we conducted our own bias testing (which we will describe in detail in a future post) and some of the results were concerning. We saw little indication of bias when the models were correct, but when they were wrong, they failed unevenly across the race and gender categories we tested. For example, with one detector model, we found that East Asian faces experienced 644% of the false positive rate that White faces experienced.
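
To show what this kind of comparison involves, here is a minimal sketch of a disaggregated error analysis: computing each group’s false positive rate (the share of real videos the model labels fake) and taking the ratio between groups. The rows below are placeholders, not our test data; we will describe the actual methodology and results in the bias-testing post.

    import pandas as pd

    # Placeholder records: one row per video with the model's prediction,
    # the ground-truth label, and an annotated demographic group.
    results = pd.DataFrame({
        "group":      ["East Asian", "East Asian", "East Asian", "White", "White", "White", "White"],
        "label":      ["real",       "real",       "fake",       "real",  "real",  "real",  "real"],
        "prediction": ["fake",       "real",       "fake",       "fake",  "real",  "real",  "real"],
    })

    # False positive rate per group: fraction of *real* videos the model called fake.
    real_videos = results[results["label"] == "real"]
    fpr_by_group = (
        real_videos.assign(false_positive=real_videos["prediction"] == "fake")
        .groupby("group")["false_positive"]
        .mean()
    )
    print(fpr_by_group)

    # A disparity like the 644% figure above is simply one group's FPR
    # divided by the reference group's FPR.
    print("FPR ratio (East Asian / White):", fpr_by_group["East Asian"] / fpr_by_group["White"])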

FakeFinder’s multiple-model design might help to mitigate the biases exhibited by any individual model. However, we strongly recommend that the biases we detected be remediated before FakeFinder is used to inform decision-making in a production context.

FakeFinder is a software prototype, not a production-ready tool

FakeFinder is not an enterprise-ready software product. It is a prototype designed to demonstrate the art of the possible in deepfake detection. In and of itself, this is not a problem. In fact, open-source prototyping is a common (and essential!) way to develop and test the limits of emerging capabilities. However, if users were to overestimate the maturity of FakeFinder, this could create significant risks.

When we examined FakeFinder from a cybersecurity perspective, we found several vulnerabilities that could be remedied through established software development practices, but that may not be standard practice for data science prototyping efforts. We’ll detail our approach and findings in our next blog post.

We have also summarized a few recommendations here:

  • FakeFinder requires 8 EC2 instances, an EFS file system, an S3 bucket, and an ECR repo to run. This complexity not only represents a large attack surface, but it also makes the tool difficult to set up and maintain. We recommend automating tool setup to prevent misconfiguration issues.
  • We discovered a significant exploit through the Werkzeug debug console, which enabled us to gain access to FakeFinder’s underlying detector models and their weights. To protect against this attack — and others — we recommend (1) ensuring that code is committed with debug flags set to “false” and (2) mounting file systems and volumes as read-only whenever possible (see the configuration sketch after this list).
  • We recommend using HTTPS/TLS certificates for internal communications to prevent man-in-the-middle (MITM) attacks.
  • We also discovered a critical bug in the API component that allows for full remote code execution (RCE) by an unauthenticated user. This exploit can be expanded to a full system takeover and exposure of potentially sensitive data. We recommend disclosing this vulnerability to impacted users and refactoring the affected functions to remediate the vulnerability.
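
On the debug-console point above, the fix comes down to how the underlying Flask/Werkzeug server is launched and how its volumes are mounted. The snippet below is a minimal sketch of that hardening under the assumption of a Flask entry point; the app, route, and port are illustrative, not FakeFinder’s actual code.

    from flask import Flask

    app = Flask(__name__)  # illustrative app; FakeFinder's actual entry point may differ

    @app.route("/healthz")
    def healthz():
        return {"status": "ok"}

    if __name__ == "__main__":
        # Never ship with debug=True: Werkzeug's interactive debugger exposes a
        # console that can execute arbitrary code on the host.
        # Relatedly, when the service runs in a container, model weights can be
        # mounted read-only, e.g.:  docker run -v /path/to/weights:/weights:ro ...
        app.run(host="0.0.0.0", port=5000, debug=False)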

Stay tuned for our next blog in this series!
