Object-based audio is a revolutionary approach for creating and deploying interactive, personalised, scalable and immersive
content by representing it as a set of individual assets together with metadata describing their relationships and
associations. This allows media objects to be assembled in ground-breaking ways to create new user experiences. With the
introduction of HTML5 and the Web Audio API, an important prerequisite was established for
native rendering of object-based audio, even in browsers.
The first public trials were a major milestone,
and development has continued since then. The BBC is one of the key institutions for research and development of object-based media
and is a great resource for information about object-based technology.
Moreover, the ORPHEUS project realised an end-to-end object-based audio workflow based
on open standards and published extensive information on its web page. Further resources from the IRT are collected here.
Some basic information is provided in the presentation document embedded below:
Although digital and file-based workflows have been established in media production for some time now, they still try, to some extent, to map the old analogue and tape-based method of production and delivery to the digital world. In an audio production, for example, various sound sources are mixed in a digital audio workstation to create a final channel-based mix for a specific target loudspeaker layout. Each audio channel in the final product has to be reproduced by a loudspeaker at a well-defined position. This fixed audio mix is transmitted to the end-user with basically no means of adapting it to their needs, such as a specific playback device or personal preferences.
An object-based production approach, however, is able to overcome the above-mentioned obstacles. The term
'object-based media' has become commonly used to describe the representation of media content by a set of individual
assets, together with metadata describing their relationships and associations. At the point of consumption these
objects can be assembled to create an overall user experience. The precise combination of objects can be flexible
and responsive to user, environmental and platform-specific factors.
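The idea of assembling objects at the point of consumption can be illustrated with a minimal Python sketch. It is not taken from any particular standard or product: the `MediaObject` class, its field names and the `assemble` function are hypothetical and chosen only for illustration. The sketch shows how a language choice (user factor), a dialogue boost (user or environmental factor) and an object limit (platform factor) might shape the final set of objects.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MediaObject:
    """One asset plus descriptive metadata (all field names are hypothetical)."""
    name: str
    kind: str            # e.g. "dialogue", "music", "ambience"
    language: str = ""   # empty string means language-independent
    gain_db: float = 0.0


def assemble(objects, language, dialogue_boost_db=0.0, max_objects=None):
    """Select and adjust objects for one listener at the point of consumption."""
    selected = []
    for obj in objects:
        # User factor: skip language-specific assets in other languages.
        if obj.language and obj.language != language:
            continue
        # User or environmental factor: raise the dialogue level,
        # for example in a noisy listening environment.
        if obj.kind == "dialogue":
            obj = replace(obj, gain_db=obj.gain_db + dialogue_boost_db)
        selected.append(obj)
    # Platform factor: a constrained device may only handle a few objects.
    if max_objects is not None:
        selected = selected[:max_objects]
    return selected


# Example usage with made-up content.
programme = [
    MediaObject("narration_en", "dialogue", "en"),
    MediaObject("narration_de", "dialogue", "de"),
    MediaObject("score", "music"),
]
german_mix = assemble(programme, language="de", dialogue_boost_db=6.0)
```

The same set of transmitted objects can thus yield different experiences for different listeners, which a fixed channel-based mix cannot offer.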
Essentially, the goal is to capture the creative intent of the producer and carry as much information as possible, required or desired, from the production side to the end-user, to ensure the best recreation possible on the consumer side. To achieve this, the final product of a production process will be an audio scene that is in turn composed of several objects. The metadata associated with each object includes, but is not limited to, the target position of the audio signal, its target loudness and a description of its actual content. For playback, the object-based content needs to be 'rendered' to the reproduction layout, such as a multi-channel loudspeaker set-up. The term 'rendering' describes the process of generating actual loudspeaker signals from the object-based audio scene. This processing takes into account the target positions of the audio objects, as well as the positions of the speakers in the reproduction room. It may further take into account user interaction such as a change of position or level.
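As a rough illustration of rendering, the following Python sketch turns a small object-based scene into two loudspeaker signals. It makes several assumptions that are not part of the text above: the reproduction layout is a stereo pair at +30 and -30 degrees, positive azimuth means "to the left", and the gains follow a simple constant-power (sine/cosine) panning law. Real renderers use more elaborate methods and layouts; the names `AudioObject`, `pan_gains` and `render_stereo` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """A mono signal plus metadata (field names are hypothetical)."""
    samples: list          # mono audio samples
    azimuth_deg: float     # target position; positive = left (assumed convention)
    gain_db: float = 0.0   # target level relative to the recorded signal
    description: str = ""  # description of the actual content


def pan_gains(azimuth_deg, speaker_angle_deg=30.0):
    """Constant-power gains (left, right) for speakers at +/- speaker_angle_deg."""
    # Positions outside the speaker pair are clamped to the nearest speaker.
    az = max(-speaker_angle_deg, min(speaker_angle_deg, azimuth_deg))
    # Map the azimuth onto 0 (fully left) .. pi/2 (fully right).
    theta = (speaker_angle_deg - az) / (2.0 * speaker_angle_deg) * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)


def render_stereo(objects, speaker_angle_deg=30.0):
    """Generate left and right loudspeaker signals from the audio scene."""
    length = max((len(o.samples) for o in objects), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for obj in objects:
        linear_gain = 10.0 ** (obj.gain_db / 20.0)  # dB to linear amplitude
        gain_l, gain_r = pan_gains(obj.azimuth_deg, speaker_angle_deg)
        for i, sample in enumerate(obj.samples):
            left[i] += sample * linear_gain * gain_l
            right[i] += sample * linear_gain * gain_r
    return left, right


# Example usage with made-up signals.
scene = [
    AudioObject([1.0, 0.5], azimuth_deg=0.0, description="dialogue"),
    AudioObject([0.2, 0.2], azimuth_deg=30.0, gain_db=-6.0, description="music"),
]
left, right = render_stereo(scene)
```

User interaction such as a change of position or level would, in this sketch, amount to modifying `azimuth_deg` or `gain_db` of an object and rendering again. Supporting a different loudspeaker set-up would mean replacing `pan_gains` with a function that knows the positions of all speakers in the reproduction room.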