M+E Connections

CWMF: Ukraine-Based Respeecher Touts Voice Cloning Tech

As the amount of new content continues to grow, we are seeing greater innovation in technologies to make the localisation process faster, more streamlined and more cost-efficient.

Although many of these new technologies are oriented more towards subtitling, there has been significant innovation in the dubbing sector as well. Technologies more widely used in the gaming sector are being evaluated, including synthetic voices and new lip-sync technologies that synchronise the lips of the actors in the original video with their dubbed voices.

One example of that new technology is the voice cloning software for content creators developed by Kyiv, Ukraine-based Respeecher, whose CEO, Alex Serdiuk, was one of the speakers on March 22, during the closing keynote “Innovation in Dubbing Technology – Friend or Foe?” at the sixth annual Content Workflow Management Forum.

The event was held in conjunction with the eighth annual Content Protection Summit Europe at the Cavendish Conference Centre in London and as a virtual event via the MESAverse, allowing for remote attendance worldwide.

With deepfakes becoming ever more realistic, the session was a timely opportunity to talk about the ethics around some of the innovation we’re seeing, in addition to reviewing the positive impact that many of these technologies are having, and stand to have, in the future.

Francesca Panetta, curator, Sheffield DocFest Alternate Realities Programme, kicked off the session by discussing the short 2019 film In Event of Moon Disaster, which she directed with Halsey Burgund. The movie was based on a contingency speech written by William Safire in case the 1969 moon landing failed and the astronauts couldn’t make it back to Earth, she noted.

The film used “synthetic media to bring this to life but also to highlight the problems of deepfake technologies and of how easy it is to manipulate media as well, and to warn people to be very careful about checking the media that passes through their social media feeds,” she explained.

There were four key objectives of the project, she said: to illustrate to audiences what deepfakes can look like, show how realistic they can be, equip audiences to be more discerning when they encounter media in the future, and show how synthetic media can be used in creative and positive ways.

She played a short clip from the nine-minute film, which uses synthetic audio and video, showing President Richard Nixon “delivering” the Safire speech on TV.

She then showed a bit of how the effect was achieved, using an actor, Nixon’s resignation speech, and the technologies of Respeecher and Canny AI, the latter of which handled the project’s video dialogue replacement.

The project was done a while ago and used an early version of Respeecher’s voice cloning technology, Serdiuk told viewers via video. “Our technology was not that sophisticated and that easy to use” at the time, he conceded.

As a result, the film’s production team had to spend days going through a “complicated process” that included “gathering the dataset we required” and then “recording what we call a parallel dataset,” he explained. Multiple takes were also required and the system made many mistakes back then, he said.

The process has become much easier since then, typically requiring only one take, he pointed out, explaining: “If you would do this project right now, we would need some samples of Richard Nixon’s speech and training our model for a week or two and that’s it.”

One use case for the Respeecher tech is making Tom Hanks sound like Tom Hanks, including the timbre of his voice, in multiple languages, even though local performers in each region, rather than the star, actually record the audio, Serdiuk said.

The tech can also be used to add flexibility to the work of localisation studios in each region, expanding their options to include additional voice actors, he said, noting that’s been of particular interest to those studios.

A website built around In Event of Moon Disaster includes resources and information about synthetic media and deepfakes, Panetta noted, adding that there is an exhibition featuring the film at the Museum of the Moving Image in Queens, New York. It runs through May 15, according to the museum’s website.

The sixth annual Content Workflow Management Forum was produced by MESA in association with CDSA, the Hollywood IT Society (HITS), the Smart Content Council, the Content Localisation Council, and presented by Convergent Risks, with sponsorship by archTIS, NAGRA, Signiant, Whip Media, AppTek, BuyDRM, LinQ Media Group, OOONA, ZOO Digital, EIDR and Titles-On.

The eighth annual Content Protection Summit Europe was produced by MESA in association with the Content Delivery & Security Association (CDSA), and presented by Convergent Risks, with sponsorship by archTIS, NAGRA, Signiant, and BuyDRM.