On television, language barriers are overcome by dubbed translations or subtitles. Thanks to artificial intelligence, audio tracks can now be converted into foreign languages largely automatically and with much less effort. New features co-developed by IRT make it possible to offer viewers individual language versions via the TV speakers or via smartphones and headphones. A showcase attracted great interest at IFA and IBC 2019.
The term “accessibility” first brings to mind people with disabilities. But language itself can be a barrier, for example when viewers do not understand the language of a programme. In media productions, original sound in interviews is overlaid with a translation or, in Video-on-Demand (VoD) offerings, is presented in the original language with subtitles. This is where a new idea comes in, made possible by artificial intelligence.
Here, the translation of an audio track is largely automated by combining a speech-to-text recognition system, a machine translation engine and a downstream text-to-speech application, so that the track can be converted into a foreign language with much less manual effort.
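The three-stage chain described above can be sketched as follows. All three service functions are hypothetical stand-ins (the article names no concrete APIs); a real deployment such as the EBU Eurovox service would wire up cloud services at each stage.

```javascript
// Sketch of the automated translation chain: speech-to-text,
// machine translation, text-to-speech. The three stage functions
// below are illustrative stubs, not real service calls.

// 1) Speech-to-text: transcribe the original audio track.
async function speechToText(audioTrack) {
  return { language: "de", text: `transcript of ${audioTrack}` }; // stub
}

// 2) Machine translation: translate the transcript into the target language.
async function translateText(transcript, targetLanguage) {
  return { language: targetLanguage, text: `[${targetLanguage}] ${transcript.text}` }; // stub
}

// 3) Text-to-speech: synthesize a new audio track from the translation.
async function textToSpeech(translation) {
  return `audio(${translation.language}): ${translation.text}`; // stub
}

// Chain the three stages into one largely automatic track conversion.
async function translateAudioTrack(audioTrack, targetLanguage) {
  const transcript = await speechToText(audioTrack);
  const translation = await translateText(transcript, targetLanguage);
  return textToSpeech(translation);
}
```

Manual effort then shrinks to reviewing and correcting the intermediate transcript and translation, rather than producing a dubbed track from scratch.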
Additional language versions via the Internet
The Eurovox initiative of the European Broadcasting Union (EBU) is currently implementing such a translation engine based on cloud services: a video clip is uploaded to the cloud and returned with a newly generated audio file. Translation errors can be corrected manually. At the Technical Assembly of the EBU, the idea was born to combine this approach with the new features of HbbTV 2, which were developed with the support of IRT. With media synchronisation, additional language versions of the television programme can be offered via the Internet. A showcase developed at short notice attracted many visitors at IFA and IBC 2019.
A variety of television content with different speaking situations was selected from the media libraries: interviews, presenters in the studio and off-screen commentary. The Eurovox service’s user interface was used to request translations into the eight currently available languages. For the demo, only the translated audio files were made available on a web server. The original content with the German audio track was broadcast via a local playout, matching a real broadcast environment. A narrow-band timeline (“MPEG-TEMI timeline”) was added to this broadcast signal, as specified by HbbTV 2 for synchronizing IP streams with broadcast streams.
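The TEMI timeline carries tick values with a timescale, from which the receiver derives a position on the content timeline and, via a correlation point, the position the IP audio player must seek to. A minimal sketch, with illustrative tick and timescale values (in practice these come from the TEMI descriptors in the transport stream):

```javascript
// Convert a TEMI tick count to seconds on the content timeline.
// Timescale is how many ticks make up one second (e.g. 90000 for
// a 90 kHz clock); both values here are illustrative.
function temiTicksToSeconds(ticks, timescale) {
  return ticks / timescale;
}

// Given the current broadcast timeline position and a correlation
// point (the broadcast time, in seconds, at which the IP audio file
// starts), compute where the IP audio player should be to stay in sync.
function ipAudioPosition(broadcastTicks, timescale, correlationSeconds) {
  return temiTicksToSeconds(broadcastTicks, timescale) - correlationSeconds;
}
```

Because the timeline is narrow-band, it adds only a small overhead to the broadcast signal while still giving every receiver the same reference clock.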
The local broadcast content was also supplemented by an HbbTV application offering the eight additional language versions. Once the user selected a version, the HbbTV application fetched it in the background from the web server and, using the HbbTV 2 synchronization feature, played it back lip-synced with the broadcast TV picture over the TV speakers. It was also demonstrated that HbbTV 2 allows IP audio versions to be played back on mobile devices in sync with the TV picture: the TV distributes the timeline over the home network and synchronizes the audio player on the mobile device. This makes it possible to offer individual viewers their own language version via smartphone or headphones, while the TV speakers play the German broadcast audio.
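In an HbbTV 2 application, this setup uses the MediaSynchroniser API (initialise with the broadcast service as master, add the IP audio as a synchronised media object, optionally expose the timeline to companion devices). The sketch below records the call sequence against a stub object so it runs anywhere; on a real receiver the MediaSynchroniser is an embedded object provided by the terminal, and the timeline selectors and correlation timestamp shown are illustrative.

```javascript
// Stub standing in for the terminal's MediaSynchroniser embedded
// object; it records calls instead of driving real playback.
const mediaSync = {
  calls: [],
  initMediaSynchroniser(mediaObject, timelineSelector) {
    this.calls.push(["init", mediaObject, timelineSelector]);
  },
  addMediaObject(mediaObject, timelineSelector, correlationTimestamp) {
    this.calls.push(["add", mediaObject, timelineSelector]);
  },
  enableInterDeviceSync() {
    this.calls.push(["interDevice"]);
  },
};

const broadcastVideo = "broadcast video object"; // stands in for a video/broadcast object
const ipAudio = "IP audio element";              // stands in for an <audio> element with the translated track

// 1) Master: the broadcast service with its MPEG-TEMI timeline.
mediaSync.initMediaSynchroniser(broadcastVideo, "urn:dvb:css:timeline:temi:1:1");

// 2) Slave: the translated IP audio, correlated to the broadcast timeline
//    (correlation timestamp values here are placeholders).
mediaSync.addMediaObject(ipAudio, "urn:dvb:css:timeline:ct", { tlvMaster: 0, tlvOther: 0 });

// 3) Optionally expose the timeline on the home network so a companion
//    app on a smartphone can synchronize its own audio player.
mediaSync.enableInterDeviceSync();
```

Step 3 is what enables the smartphone-and-headphones scenario: the mobile audio player follows the same timeline the TV exposes over the home network.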
With the increasing number of HbbTV 2-capable devices, the demonstrated technologies can be used in today’s broadcast services right away. Thanks to artificial intelligence, additional language versions can be generated with much less effort, especially for non-linear offerings and genres with good audio quality. These versions can also be used for OTT (over-the-top) content, for example in the ARD media player, and can become an important service for future European media platforms.
Watch the demo video