Audio Synthesis, what’s next? – Mellotron

Expressive voice synthesis with rhythm and pitch transfer: Mellotron manages to make a person sing without ever recording their voice performing any song. Interested? Here is more...

Some time ago we suggested paying a visit to the virtual ICASSP 2020 conference. For those of you who couldn’t make it, here’s a short recap of one of the most exciting research papers we stumbled upon. Please give a warm welcome to Mr. Mellotron!

Tacotron

If you are interested in deepfakes, you have probably heard of the impressive audio deepfakes created by the Tacotron model and by its rightful successor, Tacotron 2. The goal of both models is to turn input text into a time-frequency representation (a mel spectrogram), which a vocoder then converts into an audio waveform. The birth of “modern” text-to-speech applications, which led to audio deepfakes, is largely due to these two papers.
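
To make the two-stage pipeline concrete, here is a minimal sketch following the torch.hub recipe NVIDIA publishes for Tacotron 2 together with the WaveGlow vocoder. It assumes a CUDA-capable GPU and downloads pretrained weights on first run:

```python
import torch

# Stage 1: text -> mel spectrogram (Tacotron 2); Stage 2: mel -> waveform (WaveGlow).
hub = 'NVIDIA/DeepLearningExamples:torchhub'
tacotron2 = torch.hub.load(hub, 'nvidia_tacotron2', model_math='fp16').to('cuda').eval()
waveglow = torch.hub.load(hub, 'nvidia_waveglow', model_math='fp16')
waveglow = waveglow.remove_weightnorm(waveglow).to('cuda').eval()
utils = torch.hub.load(hub, 'nvidia_tts_utils')

# Turn the text into a padded symbol sequence, then run both stages.
sequences, lengths = utils.prepare_input_sequence(["The Digger project says hello!"])
with torch.no_grad():
    mel, _, _ = tacotron2.infer(sequences, lengths)  # the time-frequency matrix
    audio = waveglow.infer(mel)                      # 22,050 Hz waveform tensor
```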

Singing

Both Tacotron and Tacotron 2 were able to learn how a voice sounds and could reproduce speech from text in that very voice. This ability, by itself, was already remarkable. At this year’s ICASSP 2020, three researchers from NVIDIA went a step further: they managed to make a person sing without ever recording their voice performing any song. The Mellotron neural network can vary the pace and intonation of any (singing or speaking) voice according to user input, leaving infinite possibilities for variation and expressiveness.
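
Under the hood, the model is conditioned not only on text and speaker identity but also on a pitch (F0) and rhythm contour taken from a reference performance. As a hedged illustration of that conditioning signal (the original Mellotron code uses a Yin-based pitch extractor; here librosa’s pYIN serves as a stand-in, and the input file is hypothetical):

```python
import librosa
import numpy as np

# Load a reference performance whose melody and timing we want to transfer.
y, sr = librosa.load('reference_performance.wav', sr=22050)  # hypothetical file

# Track the fundamental frequency (F0) frame by frame; a Mellotron-style model
# is conditioned on this contour so the target voice follows the reference melody.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C6'), sr=sr)
f0 = np.where(voiced_flag, f0, 0.0)  # zero out unvoiced frames
print(f0.shape)
```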

Before & After

Before Mellotron, reproducing a lively and expressive voice required gathering plenty of audio material from a speaker and exploring all possible variations of the voice. After Mellotron, much less material is needed: “only” enough to learn a person’s vocal timbre. Whether that person sounds happy, angry or sad is up to the network to decide. A couple of years ago this would have been impossible. Thanks to this research, it has just become reality.

If you are interested in what this sounds like, do not miss the audio examples produced by Mellotron, and have a look at the original paper.

Happy Digging and keep an eye on our future “Audio Synthesis: What’s next?” posts!

Don’t forget: be active and responsible in your community – and stay healthy!

Video verification step by step

What should you do if you encounter a suspicious video online? Although there is no golden rule for video verification and each case may present its own particularities, the following steps are a good way to start.

Pay attention and ask yourself these basic questions

Start by asking some basic questions: “Could what I am seeing here be true?”, “Who is the source of the video and why am I seeing/receiving this?”, “Am I familiar with this account?”, “Has the account’s content and reporting been reliable in the past?” and “Where is the uploader based, judging by the account’s history?”. Thinking through the answers to such questions may raise red flags about why you should be skeptical of what you see. Also, watch the video at least twice and pay close attention to the details; this remains your best shot at identifying fake videos, especially deepfakes. Careful viewers may be able to detect certain inconsistencies in the video (e.g. non-synchronized lips or irregular background noises) or signs of editing/manipulation (e.g. areas of a face that are blurry, or strange cuts in the video). Most video manipulation is still visible to the naked eye. If you want to read more on how to deal with dubious claims in general, you can read our previous blog post.

Capture and reverse search video frames

When encountering a suspicious image, reverse searching it on Google or Yandex is one of the first steps to take in order to find out whether it was used before in another context. For videos, although reverse video search tools are not commercially available yet, there are ways to work around that in order to examine the provenance of a video and see whether similar or identical videos have circulated online in the past. Tools like Frame-By-Frame enable users to view a video frame by frame, capture any frame and save it – if you have the VLC player installed, that works as well.

Cropping certain parts of a frame or flipping the frame before doing a reverse search may sometimes yield unexpected results (flipping images is one method disinformation actors use to make it more difficult to find the original source through reverse image search). Also, searching in several reverse search engines (Google, Yandex, Baidu, TinEye, Karma Decay for Reddit, etc.) increases the chance of finding the original video. The InVID-WeVerify plugin can help you verify images and videos using a set of tools like contextual clues, image forensics, reverse image search, keyframe extraction and more.
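
As a minimal sketch of the capture-and-flip trick (requires the opencv-python package; the filenames are hypothetical):

```python
import cv2

# Capture a frame from a suspicious video and also save a mirrored copy:
# flipped re-uploads are a common trick to evade reverse image search.
cap = cv2.VideoCapture('suspicious_video.mp4')
cap.set(cv2.CAP_PROP_POS_MSEC, 5000)  # jump to the 5-second mark
ok, frame = cap.read()
cap.release()
if ok:
    cv2.imwrite('frame.jpg', frame)
    cv2.imwrite('frame_flipped.jpg', cv2.flip(frame, 1))  # 1 = horizontal flip
```

Both saved images can then be fed to the reverse search engines listed above.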

Examine the location where the video was allegedly filmed

Although in some instances it is very difficult or nearly impossible to verify the location where a video was shot, other times the existence of landmarks, reference points or other distinct signs in the video may reveal its filming location. For example, road signs, shop signs, landmarks like mountains, distinct buildings or other building structures can help you corroborate the video’s filming location.

Tools like Google Maps, Google Street View, Wikimapia, and Mapillary can be used to cross-check whether the actual filming location matches the alleged one. Checking historical weather conditions for that particular place, date and time is another way to verify a video. Shadows visible in the video should also be cross-checked to determine whether they are consistent with the sun’s trajectory and position on that particular day and time. SunCalc is a tool that helps users check whether shadows are plausible by showing the sun’s movement and sunlight phases for a given day, time and location. And sometimes it helps to stitch together several keyframes to narrow down the location – check this great tutorial by Amnesty.
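
The same sun-position check that SunCalc performs can also be scripted. A small sketch using the astral Python package, with hypothetical coordinates and timestamp:

```python
from datetime import datetime, timezone
from astral import Observer
from astral.sun import azimuth, elevation

# Alleged filming location (here: Berlin) and time, both hypothetical.
observer = Observer(latitude=52.52, longitude=13.405)
when = datetime(2020, 5, 4, 15, 0, tzinfo=timezone.utc)

# Compare the computed sun direction and height with the shadows in the video.
print(f"azimuth:   {azimuth(observer, when):.1f} deg")
print(f"elevation: {elevation(observer, when):.1f} deg")
```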

Video metadata and image forensics 

Even though most social media platforms remove content metadata once someone uploads a video or an image, if you have the source video you can use your computer’s native file browser or tools like ExifTool to examine the video’s metadata. Also, with tools like Amnesty International’s YouTube DataViewer you can find out the exact day and time a video was uploaded to YouTube. If the above steps don’t yield confident results and you are still unsure about the video, you can try some more elaborate ways to assess its authenticity. With tools like the InVID-WeVerify plugin or FotoForensics you can examine an image or a video frame for manipulations with forensics algorithms like Error Level Analysis (ELA) and Double Quantization (DQ). These algorithms may reveal signs of manipulation such as editing, cropping, splicing or drawing. Nevertheless, to understand the results and draw safe conclusions while avoiding false positives, a level of familiarity with image forensics is required.
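
If you do have the source file, a quick way to dump its metadata is ffprobe, which ships with FFmpeg; a minimal sketch with a hypothetical filename:

```python
import json
import subprocess

# Dump container and stream metadata as JSON. Social platforms strip most
# metadata on upload, so this is mainly useful for the original file.
result = subprocess.run(
    ['ffprobe', '-v', 'quiet', '-print_format', 'json',
     '-show_format', '-show_streams', 'source_video.mp4'],
    capture_output=True, text=True, check=True)
meta = json.loads(result.stdout)
print(meta['format'].get('tags', {}))  # e.g. creation_time, encoder
```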

A critical mind and an eye for detail

As mentioned above, there is no golden rule for verifying videos. The steps above are by no means exhaustive, but they can be a good start. And as new detection methods are developed, so are new manipulation methods – a game that doesn’t seem to end. The commercialization of the technology behind deepfakes through openly accessible applications like Zao or Doublicat is making matters worse, driving the “democratization of propaganda”. What remains most important, independent of the tools used for detecting manipulated media, is to approach any kind of online information (especially user-generated content) with a critical mind and an eye for detail. Traditional steps in the verification process, such as checking the source and triangulating all available information, remain central.

In the effort to tackle mis- and disinformation, collaboration is key. In Digger we work with Truly Media to provide journalists with a working environment where they can collaboratively verify online content. Truly Media is a collaborative platform developed by Athens Technology Center and Deutsche Welle that helps teams of users collect and organise content relevant to an investigation and decide together how trustworthy the information they have found is. To make the verification process as easy as possible for journalists, Truly Media integrates many of the tools and processes mentioned above, while offering a set of image and video tools that aid users in the verification of multimedia content. Truly Media is a commercial platform – for a demo, go here.

How to get started?

If you are a beginner in verification or would like to learn more about the whole verification process, we suggest reading the first edition of the Verification Handbook and the Verification Handbook for Investigative Reporting, as well as the latest edition, published in April 2020.

Stay tuned and get involved

We will publish regular updates about our technology and external developments, and interview experts to learn about ethical, legal and hands-on expertise.

The Digger project is developing a community to share knowledge and initiate collaboration in the field of synthetic media detection. Interested? Follow us on Twitter @Digger_project and send us a DM or leave a comment below.

The dog that never barked

Deepfakes have the potential to seriously harm people’s lives and to undermine people’s trust in democratic institutions. They also continue to make the headlines. How dangerous are they really?

Deepfakes, although characterized by some as “the dog that never barked”, do in fact have the potential to seriously harm people’s lives and to undermine people’s trust in democratic institutions.

Deepfakes continue to make the headlines – at the time of writing, the latest news concerned a Donald Trump Independence Day deepfake video, which also raised important legal and ethical issues, almost three years after the term “deepfake” was first coined in the news. Behind the headlines, synthetically generated media content (also known as deepfakes) has even more serious consequences for individual lives – and especially the lives of women. Deepfakes are also expected to be increasingly weaponized; combined with other trends and technologies, they are expected to heighten security and democracy challenges in areas like cyber-enabled crime, propaganda and disinformation, military deception, and international crises.

“Technical approaches are useful until synthetic media techniques inevitably adapt to them. A perfect deepfake detection system will never exist.” – Sam Gregory, Program Director of WITNESS

It’s a race

Researchers, academics, and industry are all working on deepfake detection algorithms, but developments in the field cut both ways: as detection algorithms get better, so do the available tools for creating deepfakes. As Sam Gregory, Program Director of WITNESS, puts it: “Technical approaches are useful until synthetic media techniques inevitably adapt to them. A perfect deepfake detection system will never exist.”

Verification of synthetically generated media content is still part of traditional verification and fact-checking techniques and should be approached in the context of these existing methods. Even though technology cannot provide a yes-or-no answer to the question “Is this video fake?”, it can greatly aid journalists in the process of assessing authenticity. That’s why we at the Digger team are working hard to provide journalists with tools that can help them determine whether a certain video is real or synthetic. Stay tuned for our how-to article coming up soon!

Don’t forget: be active and responsible in your community – and stay healthy!

ICASSP 2020 International Conference on Acoustics, Speech, and Signal Processing

Here is what we think are the most relevant upcoming audio-related conferences – and which sessions you should attend at ICASSP 2020.

To keep up to date with the latest audio technology for our software development, we follow other researchers’ studies and usually visit many conferences. Sadly, this time we cannot attend them in person. Nevertheless, we can visit them virtually, together with you. Among the upcoming audio-related conferences, let’s take a more detailed look at the one we consider most relevant:

ICASSP 2020 International Conference on Acoustics, Speech, and Signal Processing

Date: 4th – 8th of May 2020
Location: https://2020.ieeeicassp.org/program/schedule/live-schedule/

This is a list of panels we recommend at ICASSP 2020:

Date: Tuesday, 5th of May 2020

  • Opening Ceremony (9:30 – 10:00h)
  • Plenary by Yoshua Bengio on “Deep Representation Learning” (15:00 – 16:00h)
    • Note: may be pretty technical – for deep learning enthusiasts
    • Note: he’s one of the fathers of deep learning

Date: Wednesday, 6th of May 2020

Date: Thursday, 7th of May 2020

We’re looking forward to seeing you there!

The Digger project aims:

  • to develop a video and audio verification toolkit that helps journalists and other investigators analyse audiovisual content and detect video manipulations using a variety of tools and techniques.
  • to develop a community of people from different backgrounds interested in the use of video and audio forensics for the detection of deepfake content.

Digger – Detecting Video Manipulation & Synthetic Media

What happens when we cannot trust what we see or hear anymore? First of all: don’t panic! Question the content: Could that be true? And when you are not 100 percent sure, do not share, but search for other media reports about it to double-check.

How do professional journalists and human rights organisations do this? Every video out there could be manipulated: with video editing software, anyone can edit a video.

It is challenging to verify content which has been edited, mislabeled or staged. What is even more complex is to verify content that has been modified. We roughly see two kinds of manipulation:

  1. Shallow fakes: manipulated audiovisual content (image, audio, video) created with ‘low-tech’ methods like cut & paste or speed adjustments.
  2. Deepfakes: artificial (synthetic) audiovisual content (image, audio, video) generated with technologies like machine learning.

Deepfakes and synthetic media are among the most feared things in journalism today. “Deepfake” is a term describing audio and video files that have been created using artificial intelligence; synthetic media is non-realistic media, often referred to as deepfakes at the moment. Using algorithms, it is possible to create or swap faces, places and digital synthetic voices that realistically mimic human speech and facial expressions but do not actually exist. That means machine-learning technology can fabricate a video with audio to make people do and say things they never did or said. Such synthetic media can be extremely realistic and convincing, but are in fact artificial.

Detection of synthetic media

Face or body swapping, voice cloning and modifying the speed of a video are new forms of manipulating content, and the technology is becoming widely accessible.

At the moment the real challenge is the so-called shallow fakes. Remember the video in which Nancy Pelosi appeared to be drunk during a speech? It turned out the video had simply been slowed down, with the pitch turned up to cover up the manipulation. Video manipulation and the creation of synthetic media are not the end of the truth, but they make us more cautious before using such content in our reporting.
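
To see how cheap such a shallow fake is to produce: a phase-vocoder time stretch slows speech down while leaving the pitch untouched, reproducing the net effect of the Pelosi edit in a couple of lines (a sketch with a hypothetical input file):

```python
import librosa
import soundfile as sf

# Slow speech to 75% of its original tempo without the tell-tale drop in pitch.
y, sr = librosa.load('speech.wav', sr=None)
slowed = librosa.effects.time_stretch(y, rate=0.75)  # rate < 1 slows down
sf.write('speech_slowed.wav', slowed, sr)
```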

On the technology side it is an arms race. Forensic analysis can help detect altered media. DW’s Research & Cooperation team works together with ATC, a technology company from Greece, and the Fraunhofer Institute for Digital Media Technology (IDMT) to detect manipulation in videos.

Digger – Audio forensics

In the Digger project we focus on using audio forensics technologies to detect manipulation. Audio is an essential part of video: with the synthetic voice of a politician or the tampered sound of a gunshot, a story can change completely. Digger aims to provide functionalities to detect audio tampering and manipulation in videos.

Our approach makes use of:

  1. Microphone analysis: identifying the device used to record the audio.
  2. Electrical Network Frequency (ENF) analysis: detecting edits (cut & paste) by extracting ENF traces (see the sketch below).
  3. Codec analysis: following the digital footprint of the audio through its encoding history.
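
To give a flavour of the ENF idea (a simplified sketch, not Digger’s actual pipeline): band-pass the recording around the mains hum and track its frequency over time; jumps or gaps in the trace can indicate cut & paste edits. We assume a European 50 Hz grid and a hypothetical input file:

```python
import numpy as np
import soundfile as sf
from scipy import signal

y, sr = sf.read('recording.wav')
if y.ndim > 1:
    y = y.mean(axis=1)  # mix down to mono

# Isolate the 50 Hz mains hum (use 60 Hz in the Americas).
sos = signal.butter(4, [49.5, 50.5], btype='bandpass', fs=sr, output='sos')
hum = signal.sosfiltfilt(sos, y)

# Short-time spectra with 4-second windows give ~0.25 Hz frequency resolution;
# pick the strongest bin in the band as the ENF estimate for each frame.
f, t, Zxx = signal.stft(hum, fs=sr, nperseg=int(sr * 4))
band = (f >= 49.5) & (f <= 50.5)
enf_trace = f[band][np.argmax(np.abs(Zxx[band, :]), axis=0)]
print(np.round(enf_trace, 2))  # Hz per frame; discontinuities are suspicious
```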

Synthetic media in reality

Synthetic media technologies can have a positive as well as a negative impact on society.

It is exciting and scary at the same time to think about the ability to create audiovisual content the way we want it, not the way it exists in reality. Voice synthesis will allow us to speak hundreds of languages in our own voice (see the David Beckham malaria campaign video).

Or we could bring the master of surrealism back to life, as the Dalí Museum did with its “Dalí Lives” exhibit.

With the same technology you can also make politicians say things they never said, or place people in scenes they have never been in. These technologies are heavily used in pornography, but their unsettling impact is also showcased in short clips in which actors are placed in films they never acted in. Possibly one of the most harmful effects is that perpetrators can easily claim “that’s a deepfake” in order to dismiss any contested information.

How can the authenticity of information be proven reliably? This is exactly what we aim to address with our project Digger.

Stay tuned and get involved

We will publish regular updates about our technology and external developments, and interview experts to learn about ethical, legal and hands-on expertise.

The Digger project is developing a community to share knowledge and initiate collaboration in the field of synthetic media detection. Interested? Follow us on Twitter @Digger_project and send us a DM or leave a comment below. 
