
In-Depth Interview – Sam Gregory


Sam Gregory is Program Director of WITNESS, an organisation that works with people who use video to document human rights issues. WITNESS focuses on how people create trustworthy information that can expose abuses and address injustices. How is that connected to deepfakes?

Sam Gregory is Program Director of WITNESS. We talked to Sam about the development and challenges of new ways to create mis- and disinformation, specifically those making use of artificial intelligence. We discussed the impact of shallow- and deepfakes, and what the essential questions are with development of tools for detection of such synthetic media.

The following has been edited and condensed. 

Sam, what is your definition of a deepfake?

I use a broad definition of a deepfake. I use the phrase synthetic media to describe the whole range of ways in which you can manipulate audio or video with artificial intelligence. 

We look at threats, and in our search for solutions we look at how you can change audio, how you can change faces and how you can change scenes, for example by removing or adding objects more seamlessly.

How does that differ from shallowfakes?

We use the phrase shallowfake in contrast to deepfake to describe what we have seen at scale for the past decade, which is primarily people miscontextualizing videos: claiming a video is from one place when it is actually from another, or claiming it is from one date when it is actually from another. When people make deceptive edits of videos or do things you can do in a standard editing process, like slowing down a video, we also call it a shallowfake.

The impact can be exactly the same but I think it’s helpful to understand that deepfakes can create these incredibly realistic versions of things that you haven’t been able to do with shallowfakes. For example, the ability to make someone look like they’re saying something or to make someone’s face appear to do or say something that they didn’t do. Or the really seamless and much easier ability to edit within a scene. All are characteristics of what we can do with synthetic media. 

We did a series of threat modeling and solution prioritization workshops globally. In Europe, the US, Brazil, Sub-Saharan Africa, and South and Southeast Asia, people keep saying that we have to view both types of fakes as a continuum and look at solutions across it. We also need to think carefully about the wording we use, because to an ordinary person receiving a WhatsApp message it may not make much difference whether it is a shallowfake or a deepfake. What matters is whether it is true or false.

Where do you encounter synthetic media the most at the moment?

Indisputably the greatest range of malicious synthetic media is targeting women. We know that from the research that has been done by organizations like Sensity. We also have to remember that there is a category of synthetic media usages that are non-malicious but potentially malicious. There is an explosion of apps that enable very simple creation of deepfakes. We are seeing deepfakes starting to emerge along those parody lines, a kind of appropriation of images. And the question is at what point software becomes readily available to lots of people for making moderately good deepfakes, which could be used in satire, a positive usage, but also in gender-based violence.

Where is the highest impact of deepfakes at the moment?

It is on the individual level: the impact on individual women and their ability to participate in the public sphere, related to the increasing patterns of online and offline harassment that journalists and public figures face.

In our meetings with journalists, civic activists, movement leaders and fact-checkers, four threat areas were identified that they were really concerned about in each region.

  1. The liar’s dividend, which is the idea that you can claim something is false when it is actually true, which forces people to prove that it is true. This happens particularly in places where there is no strong established media. The ability to just call out everything as false benefits the powerful, not the weak.
  2. There is no media forensics capacity amongst most journalists and certainly no advanced media forensics capacity. 
  3. Targeting of journalists and civic leaders using gender-based violence, as well as other types of accusations of corruption or drunkenness.
  4. Emphasis on threats from domestic actors. In South Africa we learned that the government is using facial recognition, harassing movement leaders or activists. 

These threats have to be kept in mind with the development of tools for detection. Are they going to be available to a community media outlet in the favelas in Rio facing a whole range of misinformation? Are they going to be available to human rights groups in Cambodia who know the government is against them? We have to understand that they cannot trust a platform like Facebook to be their ally.

Can synthetic media be used as an opportunity as well?

I come from a creative background. At WITNESS the center of our work is the democratization of video, the ability to film and edit. Clearly these are potential areas that are being explored commercially to create video without requiring so much investment.

I think if we do not have conversations about how we are going to find structured ways to respond to malicious usages, I see positive usage of these technologies being outweighed by the malicious usage. And I think there is a little bit too much of an “it will all work itself out” approach being described by many of the people in this space.

We need to look closely at what we expect of the people who develop these technologies: Are they making sure that they include a watermark? That they have a provenance tree that can show the original? Are they thinking about consent from the start?

Although I enjoy playing with apps that use these types of tools, I don’t want to deny that I think 99% of the usage of these is malicious.

We have to recognize that the malicious part of this can be highly damaging to individuals and highly disruptive to the information ecosystem.

Should we use synthetic media in satire for media literacy? 

We have been running a series of webtalks called deepfakery. One of the main questions is, what are the boundaries around satire? Satire is an incredibly powerful weapon of the weak against the powerful. So for example, in the US we see the circulation of shallowfakes and memes made on sites that say very clearly at the top that this is satire. But of course no one ever sees that original site. They just see the content retweeted by President Trump, in which case it looks like it is a real claim.

So satire is playing both ways. I do think the value of satire is to help people understand the existence of this and to push them to sort of responsibly question their reaction to video.

I think the key question in the media literacy discussion is: how do we get people to pause? Not to dismiss everything but to give them the tools to question things. Give them the tools to be able to pause emotionally before they share.

From a technology point of view, what are we still missing to detect synthetic media?

Synthesis of really good synthetic media is still hard. So synthesizing a really good faceswap or a convincing scene is still hard. What is getting easier is the ability to use apps to create something that is impactful but perhaps not believable. I think sometimes people overestimate how easy it is to create a deepfake.

We’re not actually surrounded by convincing deepfakes at this point. 

A lot of our work has been thinking about detection and authentication. How do you spot evidence of media manipulation, which could be detection of a deepfake or detection of a shallowfake? How do you spot that a video has been miscontextualized and that there is an original or an earlier version with different edits? Then authentication: how do we trace a video over time to see its manipulations?

At the moment the detection of synthetic media is, and this is the nature of the technology, an arms race between the people who will develop the detection tool and those who will use it to test and enhance their new synthesis tool. The results of detection tools are getting better but they are not at the level that you could do it at scale.

The meta question for us on detection is actually who to make this accessible to. If it is only the BBC, Deutsche Welle, France 24 and New York Times, that leaves out 90% of the world as well as ordinary people who may be targeted by this in an incredibly damaging way.

Do all journalists need to be trained in using advanced forensic technology?

One of the things we have learned as we have been working on deepfakes is that we shouldn’t exclusively focus on media forensics. I think it is important to build the media forensic skills of journalists and it is a capacity gap for almost every journalist to do any kind of media forensics with existing content. I do not think we can expect that every journalist will have that skill set. We also need to consider how we invest in e.g. regional hubs of expertise.

The bigger backdrop is that we need to build a stronger set of OSINT skills in journalism. We need to be careful not to turn this purely into a technical question around media forensics at a deep level because it is a complicated and specialist skill set.

We identified a range of areas that need to be addressed to develop tools that plug into journalistic workflows. For example, journalists are not going to rely on tools easily. They do not need just a confidence number; they need software to explain why it is coming up with this result. So I think we need a constant interchange between journalists, researchers, tool developers and the platforms to say what tools we really need as this gets more pervasive. And we need tools that potentially provide information to consumers and community-leader-level activists to help them do the kind of rapid debunking and rapid challenging of the digital wildfire of rumors that journalists frankly often do not get to. Often community leaders are talking about things that circulate very rapidly in a favela or a township, and journalists never get to them in a timely way. So we need to focus on journalists, but also on community leaders.

What are your three tips for consumers to deal with synthetic media? 

  1. Pause before you share the content.
  2. Consider the intention of why people are trying to encourage you to share it.
  3. Take an emotional pause when consuming media and try to understand its context. This is supported by a range of tools like the SIFT methodology or the SHEEP acronym.

I don’t think it is a good idea to encourage people to think that they can spot deepfakes.

The clearest and most consistent demand we heard primarily from journalists and fact checkers is to show them if this is a mis-contextualized video so that they can then just clearly say, no this video is from 2010 and not from 2020.

Therefore reverse video search or finding similar videos is pretty important because that shallowfake problem remains the most predominant.
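To make the idea of "finding similar videos" a bit more concrete, here is a minimal sketch of one well-known building block, the average hash, applied to single frames. It is only an illustration under simplifying assumptions: the frames are tiny hand-written grayscale grids, the function names are our own, and we do not claim that any particular reverse video search service works this way.

```python
def average_hash(pixels):
    """Turn a 2D grid of grayscale values (0-255) into a tuple of bits.

    Each bit says whether a pixel is brighter than the frame's mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions where two hashes differ (0 means identical)."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Hypothetical 4x4 frames: an original, a slightly brightened re-upload,
# and an unrelated checkerboard image.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]
reupload = [[value + 5 for value in row] for row in original]
unrelated = [
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
]

same_source = hamming_distance(average_hash(original), average_hash(reupload))
different = hamming_distance(average_hash(original), average_hash(unrelated))
print(same_source, different)
```

A small distance suggests two frames come from the same footage even after re-encoding or brightness changes, which is the kind of signal that helps to locate an earlier upload of a miscontextualized video.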

Many thanks Sam! Here’s the ‘Ticks or it didn’t happen’ report that Sam mentioned. If you are interested in learning more or have questions, please get in contact with us, either by commenting on this article or via our Twitter channel.

We hope you liked it! Happy Digging and keep an eye on our website for future updates!

 Don’t forget: be active and responsible in your community – and stay healthy!

Related Content

In-Depth Interview – Jane Lytvynenko


We talked to Jane Lytvynenko, senior reporter with Buzzfeed News, focusing on online mis- and disinformation about how big the synthetic media problem actually is. Jane has three practical tips for us on how to detect deepfakes and how to handle disinformation.

From Rocket-Science to Journalism


In the Digger project we aim to implement scientific audio forensic functionalities in journalistic tools to detect both shallow- and deepfakes. At the Truth and Trust Online Conference 2020 we explained how we are doing this.

Audio Synthesis, what’s next? – Parallel WaveGAN


The Parallel WaveGAN is a neural vocoder producing high quality audio faster than real-time. Are personalized vocoders possible in the near future with this speed of progress?

In our previous post of the “Audio Synthesis: What’s next?” series, we started talking about the latest advancement of audio synthesis. In this post we will introduce you to the Parallel WaveGAN network.

Not easy

You probably already guessed that text-to-speech is actually a very HARD task. Let us tell you something: it is so hard that it has been split into two separate problems. Some researchers focused on translating the input text into a time-frequency representation. The spectrograms below, for example, are generated by Tacotron-like networks. Other researchers focused on translating those pictures into proper audio files that sound as natural as possible. That is the general goal of “neural vocoders” such as the Parallel WaveGAN.

Time-frequency spectrograms generated by Tacotron-like networks.
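If you wonder what such a time-frequency representation actually is, the following small sketch may help. It does not involve Tacotron or any neural network: it simply computes a magnitude spectrogram of a synthetic test tone with a naive short-time Fourier transform. The sample rate, frame size and function name are assumptions chosen for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed for this example)
FRAME_SIZE = 64      # samples per analysis frame
HOP = 32             # step between consecutive frames


def magnitude_spectrogram(signal, frame_size=FRAME_SIZE, hop=HOP):
    """Naive short-time Fourier transform: one magnitude spectrum per frame."""
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        # A Hann window reduces leakage caused by the abrupt frame edges.
        windowed = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size))
            for n, x in enumerate(frame)
        ]
        spectrum = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# A 1000 Hz sine tone stands in for real speech.
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(512)]
spectrogram = magnitude_spectrogram(tone)

# The strongest frequency bin of the first frame should sit at 1000 Hz.
first_frame = spectrogram[0]
peak_bin = max(range(len(first_frame)), key=lambda k: first_frame[k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
print(len(spectrogram), "frames, peak at", peak_hz, "Hz")
```

Each row of the result describes which frequencies are present during a short slice of time. A neural vocoder has to solve the reverse problem: turning such a "picture" back into a natural-sounding waveform.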

Fast

The Parallel WaveGAN was proposed by three researchers from the LINE (Japan) and NAVER (South Korea) corporations, with the goal of improving on pre-existing neural vocoders. The researchers focused on one of the most demanding requirements of neural vocoders: producing high-quality audio in a “reasonable” amount of time. Their efforts were rewarded when they achieved a system that works faster than real-time and can be trained four times as fast as the competition.

Before & After

Before Parallel WaveGAN, achieving this level of audio quality faster than real-time required at least 2 weeks of training a neural vocoder on a very high-end GPU. With Parallel WaveGAN it is possible to create remarkably high-quality audio files with only 3 days of training on the same high-end GPU. On top of that, it can produce content 28 times faster than real-time! With such speed of progress, sooner or later consumer GPUs will also be usable to train such models, which would mean that a new era of personalized vocoders is quickly getting closer.
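To put these numbers into perspective, here is a quick back-of-the-envelope calculation. It only uses the figures quoted above (2 weeks versus 3 days of training, and synthesis 28 times faster than real-time); the function names and the one-minute clip are our own illustrative choices.

```python
def training_speedup(days_before, days_after):
    """How many times faster training became."""
    return days_before / days_after


def synthesis_seconds(audio_seconds, realtime_factor):
    """Compute time needed to generate audio at a given real-time factor."""
    return audio_seconds / realtime_factor


# 2 weeks of training before versus 3 days after.
speedup = training_speedup(14, 3)

# Generating one minute of audio at 28 times real-time speed.
one_minute = synthesis_seconds(60, 28)

print(f"Training is about {speedup:.2f} times faster")
print(f"One minute of audio takes about {one_minute:.2f} seconds to generate")
```

Since the 2 weeks are a lower bound ("at least"), the resulting factor of roughly 4.7 is consistent with the "four times as fast" claim above.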

If you are interested in what this sounds like, do not miss out on the audio examples produced by the Parallel WaveGAN, and have a look at the original paper:

    Happy Digging and keep an eye on our future “Audio Synthesis: What’s next?” posts!


    In-Depth Interview – Jane Lytvynenko


    Jane Lytvynenko is a senior reporter with Buzzfeed News, based in Canada. She primarily focuses on online mis- and disinformation. You can check out her work here. We talked to Jane about how big the synthetic media problem actually is, and Jane provides us with practical tips on how to detect deepfakes and how to handle disinformation.

    Jane, what is your definition of a deepfake?

    Oh, that is a tricky one. I mean, it is very different from a video that was just slowed down or manipulated in a very basic way, e.g. by cutting and pasting scenes.

    For me a deepfake is using computer technology to make it look like a person said something or took an action they did not say or do.

    Where do you mostly encounter deepfakes at the moment?

    We do not encounter deepfakes a lot on the day-to-day.  Because the technology is not widely accessible at the moment, we are much more worried about cheapfakes than we are about deepfakes. They spread much faster. We do find deepfakes mostly in satire. We have seen a lot of that come up in the last little while. We also see GAN (Generative Adversarial Networks) generated images being used for fake personas. Essentially faces generated by a computer that are being used to present a persona on social media that doesn’t actually exist in real life.

    Do you expect an increase in synthetic video ahead of or during the US elections?

    No. The reason why I say no is because deepfakes are still fairly difficult to create for people who do not have a lot of tech knowledge. But cheapfakes you can make in iMovie. You can make things using very basic tools that are also convincing.

    There is of course always a fear in the back of my head of the ‘Big One’. What is going to be the big deepfake that fools everybody?

    Another fear that I have, is a deepfake inserted among legitimate videos. Using one small portion rather than the whole video being a deepfake.

    Where do deepfakes have their biggest impact?  

    Right now, we see deepfakes mostly used for harassment of women, in pornography in particular. They are primarily targeted at women, but sometimes also at men. Deepfake technology is also being used in movies in Hollywood. And like I mentioned previously for sort of high level production satire.

    But when it comes to the field of politics, we’ve seen a couple here and there but in North America we haven’t seen deepfakes that are so convincing that they uproot the political conversation.

    Should journalists be concerned?

    Do all journalists need to be trained in verification? Or is that a task for experts?

    I think it needs to be both. In order to send something to a researcher you need to first understand what it is you are looking for and why you are sending it to a researcher.

    So, we need to have basic training. We need reporters to understand what a deepfake looks like. What a GAN-generated image looks like. Get them in on the basics of verification of all types of content. If something extremely technical comes up, they can just send it to the researchers.

    For journalists it is very important to have that source. It is very important to have an expert opinion for second verification. That is part of the practice of journalism. But if you do not know to ask the right question, you are not going to be able to get that second opinion.

    What are the tools you still miss?

    There are a few things. The biggest problem is social media discovery, especially when it comes to video. If there is a video going viral it is fairly difficult to trace back where it came from. There are some tools that break video down into thumbnails. You can then reverse image search them and try to find your way back. But for me right now, there is no way to tell where the video originated. Part of that is a lack of cross-platform searches. For example, Instagram stories; it is one of the most popular Facebook products right now, but they disappear within 24 hours. If somebody downloads that Instagram story and cuts off the banner that says who posted it, they can upload it to Twitter or Facebook, and I will not know where the video came from. It doesn’t allow reporters to see the bigger picture.
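The thumbnail approach mentioned above can be sketched in a few lines. The example below picks evenly spaced timestamps and builds one ffmpeg command per still frame; each extracted image could then be fed into a reverse image search by hand. The file names and helper functions are hypothetical, the commands are only constructed and not executed, and ffmpeg would have to be installed to actually run them.

```python
def thumbnail_timestamps(duration_seconds, count):
    """Evenly spaced timestamps that skip the very start and end of the video."""
    step = duration_seconds / (count + 1)
    return [round(step * (i + 1), 2) for i in range(count)]


def ffmpeg_commands(video_path, timestamps):
    """Build one ffmpeg call per timestamp, each extracting a single frame."""
    return [
        [
            "ffmpeg",
            "-ss", str(t),        # seek to the timestamp (in seconds)
            "-i", video_path,     # input video
            "-frames:v", "1",     # write exactly one video frame
            f"thumb_{i:03d}.jpg",
        ]
        for i, t in enumerate(timestamps)
    ]


# Hypothetical one-minute clip, five thumbnails.
stamps = thumbnail_timestamps(60, 5)
commands = ffmpeg_commands("viral_clip.mp4", stamps)
for command in commands:
    print(" ".join(command))
```

This only covers the first half of the workflow. Finding out where a frame first appeared still depends on what the search engines and platforms have indexed, which is exactly the gap described above.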

    Right now, we do not necessarily have the tools to both look at video cross-platform and to look at these videos in terms of when they were shot, what time, by whom, from what angle, at what location. It is a challenge that requires a lot of time that reporters just do not have.

    The other thing is, we do not really have a strong way of mapping video spread. So, when reporters do content analysis, they generally focus on text. The reason for that is because text is machine-readable. We have the tools to sort of map out the biggest account that posted this, the smaller accounts that came from it and the sort of audience that looked at this. We do not have similar tools for video even though analyzing the spreading of information is one of the most useful things we do as disinformation reporters. It allows us to see the key points where disinformation traveled. It allows us to understand where to look next time. It allows us most importantly to understand which communities were most impacted.

    Is collaboration with researchers and platforms essential in fighting disinformation?

    Definitely collaboration with researchers. Platforms are a bit on and off again in terms of what kind of information they are willing to provide us with. Sometimes they are willing to confirm fact findings, but they are rarely helping us to do research independently.

    This is where researchers, analytics, sort of third parties that specialize in this are really key for reporters.

    How should we report about disinformation and deepfakes?

    At Buzzfeed News we always try to put correct information first. We repeat the accurate information before you get to the inaccurate information. There are two different approaches you can take. One is reporting on the content of the video and the other is reporting on the existence of the video as well as any information you have in terms of where it came from, who posted it and why.

    We generally focus on the second approach as the primary presentation of facts.

    That is how we frame a lot of these things. Because the key aim of a manipulated video is to get the message across and if we put the message at the top then they still get the message across. What you want to do is describe the techniques, describe how they are attempting to manipulate the audience. And then explain the other part of manipulation which is the message.

    Do you have a specific workflow in verifying digital content?

    We do have best practices. Putting accurate information first is definitely the top priority. We also make sure to never put up an image or a screenshot without stamping it or crossing it out in some way. That gives a visual clue to anybody who comes across it that it is false. But more importantly, if a search engine scrapes that image and somebody comes across it on Google or Bing they are immediately able to see that it is false.

    Buzzfeed false stamp

    In terms of workflow for verification, the key part is documentation. We archive everything that we come across and take a screenshot. We are essentially making sure that we are able to retrace our steps. It is kind of a scientific process: we want to make sure that if anybody repeated our steps, they would come to the same conclusion. A lot of the time when we do pure debunks, that is what we focus on. Not only does it increase trust and show how we got to the conclusion, it also teaches our audience some of the techniques that we are using so that they can use them in the future as well.
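One possible way to keep such a trail retraceable is a simple evidence log. The sketch below is our own illustration, not Buzzfeed's actual tooling: it stores where something was found, when it was archived, a note, and a SHA-256 fingerprint of the saved file, so that anyone repeating the steps can confirm they are looking at the same material. The URL, the note and the field names are made up.

```python
import hashlib
import json
from datetime import datetime, timezone


def log_evidence(url, content, note, archived_at=None):
    """Create one log entry for a piece of archived content.

    `content` is the raw bytes of the saved file (screenshot, video, page).
    """
    timestamp = archived_at or datetime.now(timezone.utc).isoformat()
    return {
        "url": url,
        "sha256": hashlib.sha256(content).hexdigest(),
        "archived_at": timestamp,
        "note": note,
    }


# Hypothetical example entry with a fixed timestamp for reproducibility.
entry = log_evidence(
    url="https://example.org/post/123",
    content=b"raw bytes of the saved screenshot",
    note="Screenshot taken before the post was deleted",
    archived_at="2020-10-01T12:00:00+00:00",
)
print(json.dumps(entry, indent=2))
```

If the same file is hashed again later, an identical fingerprint shows that the archived copy has not changed since it was logged.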

    Can you still trust what you are seeing, or do you always have this critical view?

    The short answer is yes, especially with a lot of videos. I really fear missing something because sometimes the manipulation is so subtle you can’t quite tell that it is a manipulation from the first or the second look. If somebody is just scrolling on their feed, they are not looking very closely at those details. They might not even listen to the audio and hear that it sounds off. They might read the subtitles instead. They might not notice that the mouth in a deepfake is a little bit imperfect because those are little details. We are bombarded with information in our news feed so we might just not notice it.

    I’m always extremely suspicious, and sometimes I’m more suspicious than I should be. Sometimes I look at a video and I’m like, was that slowed down by 0.7 of a second or am I losing my mind?

    What are the three main tips for a news consumer to detect synthetic media?

    Tip #1
    My first tip: if you see a video that is extremely viral, or a video that sparks a lot of emotion, or a video that just kind of feels a little bit off, just pause.

    From there the number one thing you can do is search a couple of key terms in a search engine with the words fact check to see if somebody has already picked up on what you are seeing. You can also read the comments, very often in the comments people will explain what is going on in the video.

    Tip #2
    If you are unsure about the video, really do play the audio and look at the key features that make people people. Look at the eyes, look at the mouth. What are the mannerisms of the person that you are seeing in the video, and do they match what you understand about that person? Ask yourself if the voice of the person sounds like their real voice.

    A lot of people when they see a GAN-generated photo for example, they have a gut feeling that something is wrong. They feel that they are not looking at a real person, but they can’t quite explain why. So, just really trust that feeling and start looking for those little signs that something is wrong. If it is a photo usually the best thing to look at are the earlobes. If a person has glasses look at the glasses. Eyebrows are not generally perfect if a photo is computer-generated and teeth are always a little off. Those are the things that I would look for.

    Tip #3

    The final tip is: do not share anything you are not sure of. Do not pass it on to your network.

    We all have created a small online community around us, whether it is friends, family, acquaintances or sort of strangers that we met on the internet. And most of that community really trusts us. Even if you are not a public figure your friends are going to trust what you post. So, take that responsibility seriously and try to not pass on anything that you are unsure of to that online community.

    If not deepfakes, what else is our challenge in disinformation?

    My biggest worry when it comes to disinformation is not necessarily synthetic media. It is humans trying to convince other humans.

    Look at the most insidious falsehoods that we see right now in the US: the QAnon mass delusion, as we call it at Buzzfeed. People who believe in this are very often brought on board by other people they know. So, what I really worry about is the continuing creation of online communities where people bring one another along for the ride, except the ride is extremely false.

    I think that manipulated images and manipulated videos and fake news articles are all just tools. They are all parts of the problem. But the problem itself I think is a community problem and I definitely foresee that community problem growing beyond synthetic media.

    Many thanks Jane! If you are interested in learning more or have questions, please get in contact with us, either by commenting on this article or via our Twitter channel.

