AI-powered voice dubbing is transforming how media reaches global audiences. With companies like Papercup enabling significant growth for major players such as Sky News, Fremantle and Insider, AI localisation is quickly becoming a game-changer for engaging new viewers worldwide.
The London-based company, co-founded by Jesse Shemen and Jiameng Gao in 2017, set out to build a model that could translate voices into other languages at a fraction of the time and cost of traditional methods.
Major media companies, global enterprises, content creators and celebrities have since used its services to reach new audiences worldwide. Papercup says its dubbing technology enabled Sky News to grow its viewership by 26 million in 12 months, Fremantle to amass 18 million views in seven months, and Insider to gain an additional 100 million views in just a few weeks.
Localisation is key to reaching new, previously inaccessible audiences. According to Papercup, 99 percent of video is locked to a single language, while 65 percent of people prefer to consume content in their own language. The company uses artificial intelligence to localise videos at scale so that they reach viewers in their own language.
Amir Jirbandey, Head of Growth & Marketing at Papercup, says its solution is up to “10 times cheaper than traditional dubbing methods, and up to four times faster.”
“Over the years, we’ve developed a platform where you input videos in English, for instance, and it automates transcription and translation to create a synthetic voiceover in the target language,” Jirbandey said.
Reaching new audiences
Wanting to expand into the Spanish market, Insider employed Papercup to repurpose its original content into the new target language. Drawing on Insider’s vast catalogue of English video content, Papercup repurposed and distributed it across YouTube and Facebook, with localised metadata, thumbnails and media descriptions.
According to Tony Manfred, Head of Video at Insider, the partnership saw the news provider achieve “percentage increases in the 1000s in as little as three months.”
Sky News approached Papercup to “accurately translate content into different languages at scale.” By localising and distributing videos in Spanish, Sky News achieved 26 million views and 96,000 subscribers in the first 12 months, according to Papercup.
Working across time zones
Jirbandey says Papercup employs a “human in the loop” approach. After generating synthetic voiceovers through text-to-speech technology, professional translators verify the translation quality and fine-tune the voices to ensure that the content matches “the same level of emotion, intention, and any nuances in the language to the original speaker.”
When collaborating with major media corporations, the team establishes mutual agreements on lead times to meet deadlines.
For instance, working with companies like Sky News and Bloomberg requires turning around a set number of videos within 24 hours. To meet tight deadlines, Papercup has built a workflow to “follow the sun.”
“We have people working across different time zones to make sure that there’s always someone working on it, and we can meet those tough deadlines,” Jirbandey said.
Beyond media giants
Papercup’s capabilities extend beyond publishers. Enterprises, content creators and streaming services are also among its clients.
Cineverse, formerly Cinedigm, an independent entertainment company, is one of Papercup’s biggest customers. Cineverse launched a dedicated Bob Ross channel and sought to enter the Spanish market. Partnering with Papercup enabled them to localise substantial volumes of Bob Ross content – equivalent to two seasons per week, totalling 30 seasons in two months, which Jirbandey notes is “the kind of scale that’s not possible through any other means.”
Papercup currently offers dubbing in six core languages, focusing on Latin American Spanish, Brazilian Portuguese, and a limited number of European languages. The company is also experimenting with additional languages, such as Hindi and other Asian languages.
Jirbandey says the company is usually “within two to three months at any given point to roll out a new language.” The challenge lies in ensuring that voices in these languages capture the full spectrum of human emotion and are “as expressive as possible.”
“If you look at the range of emotion depicted in a movie, you go from whispering to screaming and so many other things in between – that’s the main problem that we’re looking to solve – to make sure that the languages that we do have can match like for like, what you would hear from a human voice,” Jirbandey said.
Investment and funding
Jirbandey cautions against AI companies claiming to do mass language dubbing, as they often use “off-the-shelf, generic AI voices, provided by the big guys like Microsoft or Google,” which make those voices available through an API.
He claims that true AI dubbing requires “tremendous amounts of R&D and machine learning resources,” an investment Papercup has been making for the past six years.
In 2022, the company announced US$20 million in Series A financing led by Octopus Ventures, bringing its total funding to US$30.5 million. The round made Papercup one of the best-funded AI dubbing companies globally.
Papercup’s investors include LocalGlobe, Sands Capital, Sky, and Guardian Media Ventures, along with prominent angel investors like John Collison, co-founder of Stripe, and the former Chief Scientist at Uber.
The road ahead
Papercup has yet to tackle a number of languages, such as Japanese and Mandarin, primarily because the original source content doesn’t align with the preferences of those audiences.
“Just localising the language often is not good enough in those countries, just because they don’t consume the same type of content.
“That’s why those large organisations don’t necessarily need to think with a global lens with the whole world. They just need to think of secondary, tertiary, and fourth markets for them to enter through scalable localisation solutions like ours,” Jirbandey said.
Looking to the future, Jirbandey says the Papercup team are aiming to delve deeper into their existing markets.
“At the moment, our sweet spot tends to be unscripted content, documentaries, commentary, sports commentary, reality shows, all that kind of good stuff.
“The very challenging type of content is blockbusters and scripted content, so that’s the area where we’re focusing on, to tackle even more premium content – content that people place more value on.”
Impacts on jobs
AI dubbing still has some way to go before it reaches human-level quality. Jirbandey says the main challenge right now is refining synthetic voices so they exhibit more human qualities across various types of content.
“Ultimately, we’re trying to capture as many types of emotion as possible,” he says. “We are looking to improve some of the nuances with human speech, such as the utterances like the oohs and ahhs and the pauses – those are actually really hard to replicate.”
Adding further layers of emotion poses another hurdle. Aspects such as irony, comedy and flirtation, all integral facets of the human experience, remain challenging for AI to mimic accurately. Jirbandey says it’s “just a matter of time” before this is possible.
“We don’t anticipate any system is going to be good enough to be wholly automated to tackle the content type that we tackle – you’re going to need that human intervention,” he added.
The impact of AI on entertainment jobs remains uncertain, and the current writers’ and actors’ strikes have brought much of the industry to a standstill.
Jirbandey says Papercup is “on the side of the artists and actors,” as it employs professional voice actors to train its machine learning algorithms.
“Apart from currently paying them above average at very good price points for their work, we will also look at mechanisms to continue paying them based on how their voices are used on media containers with us – we’re creating jobs and creating new opportunities,” he added.
Additionally, Papercup claims that the technology is not replacing existing jobs, as organisations have “shied away from traditional dubbing or haven’t even considered localising before,” due to the cost and time.
“These are use cases that are not necessarily being taken away from another market. We’re generating a new market and a new type of business that wouldn’t have existed without this type of technology.”