Introduction

Object-based audio is a revolutionary approach for creating and deploying interactive, personalised, scalable and immersive content. It represents content as a set of individual assets together with metadata describing their relationships and associations, which allows media objects to be assembled in ground-breaking ways to create new user experiences. With the introduction of HTML5 and the Web Audio API, an important prerequisite for native rendering of object-based audio in browsers was put in place. The first public trials were a major milestone, and development has continued since then. The BBC is one of the key institutions for research and development of object-based media and a great resource for information about object-based technology. Moreover, the ORPHEUS project realised an end-to-end object-based audio workflow based on open standards and has published extensive information on its web page. Further resources from the IRT are collected here.
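
As a quick illustration of what native rendering in browsers can mean, the sketch below uses the Web Audio API's PannerNode to place a single audio object at a position relative to the listener. It is only a minimal example, not a full object-based player: the file name is a placeholder, and a real renderer would drive many such nodes from scene metadata.

```typescript
// Minimal sketch: play one audio object at a position using the Web Audio API.
// Note: browsers typically require a user gesture before audio playback starts.
const ctx = new AudioContext();

async function playObjectAt(url: string, x: number, y: number, z: number): Promise<void> {
  const response = await fetch(url);
  const buffer = await ctx.decodeAudioData(await response.arrayBuffer());

  const source = ctx.createBufferSource();
  source.buffer = buffer;

  // The PannerNode spatialises the object relative to the listener.
  const panner = ctx.createPanner();
  panner.panningModel = "HRTF";
  panner.positionX.value = x;
  panner.positionY.value = y;
  panner.positionZ.value = z;

  source.connect(panner).connect(ctx.destination);
  source.start();
}

// Placeholder asset and position: slightly to the right, in front of the listener.
playObjectAt("commentary.wav", 1, 0, -2);
```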

Don't miss our object-based demos (e.g. this one), accessible via the top menu on this page!

A presentation with some basic information is embedded below:

Channel-based audio

Although digital, file-based workflows have been part of media production for some time now, they still, to some extent, map the old analogue, tape-based method of production and delivery onto the digital world. In an audio production, for example, various sound sources are mixed in a digital audio workstation to create a final channel-based mix for a specific target loudspeaker layout.

Conceptual overview of channel-based audio production and consumption
Each audio channel in the final product has to be reproduced by a loudspeaker at a well-defined position. This fixed audio mix is transmitted to the end-user with essentially no means to adapt it to their needs, be it a specific playback device or personal preferences.

Object-based audio

An object-based production approach, however, is able to overcome the above-mentioned obstacles. The term 'object-based media' has become commonly used to describe the representation of media content by a set of individual assets, together with metadata describing their relationships and associations. At the point of consumption, these objects can be assembled to create an overall user experience. The precise combination of objects can be flexible and responsive to user, environmental and platform-specific factors.

Essentially, the goal is to capture the creative intent of the producer and carry as much information as is required or desired from the production side to the end-user, to ensure the best possible recreation on the consumer side. To achieve this, the final product of a production process is an audio scene that is in turn composed of several objects. The metadata associated with each object includes, but is not limited to, the target position of the audio signal, its target loudness and a description of its actual content.
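
To make the idea of a scene of objects plus metadata more concrete, here is a minimal sketch of how such a scene could be modelled. The field names (position, loudness, content description) are illustrative assumptions for this example, not a standardized metadata format.

```typescript
// Illustrative model of an object-based audio scene: assets plus metadata.
interface AudioObject {
  id: string;
  audioUri: string;                                              // reference to the audio asset
  position: { azimuth: number; elevation: number; distance: number };
  loudness: number;                                              // target level, e.g. relative gain in dB
  contentDescription: string;                                    // e.g. "dialogue", "ambience"
}

interface AudioScene {
  title: string;
  objects: AudioObject[];
}

// Hypothetical example scene: commentary plus crowd ambience.
const scene: AudioScene = {
  title: "Football match",
  objects: [
    { id: "commentary", audioUri: "commentary.wav",
      position: { azimuth: 0, elevation: 0, distance: 1 },
      loudness: 0, contentDescription: "dialogue" },
    { id: "crowd", audioUri: "crowd.wav",
      position: { azimuth: 110, elevation: 0, distance: 2 },
      loudness: -6, contentDescription: "ambience" },
  ],
};
```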

Conceptual overview of object-based audio production and consumption
For playback, the object-based content needs to be 'rendered' to the reproduction layout, such as a multi-channel loudspeaker set-up. The term 'rendering' describes the process of generating the actual loudspeaker signals from the object-based audio scene. This processing takes into account the target positions of the audio objects as well as the positions of the speakers in the reproduction room. It may further take into account user interaction, such as a change of position or level.
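
The following sketch illustrates the rendering step in a deliberately simplified form: it derives loudspeaker gains from the azimuth of one object and the azimuths of the loudspeakers in the reproduction layout. The inverse-angular-distance gain law is an assumption made for readability only; real renderers use established panning methods such as VBAP.

```typescript
// Simplified rendering sketch: per-speaker gains from angular distance.
interface Loudspeaker { name: string; azimuth: number; }          // azimuth in degrees

function renderGains(objectAzimuth: number, speakers: Loudspeaker[]): number[] {
  // Angular distance between the object and each speaker, wrapped to 0..180 degrees.
  const distances = speakers.map(s =>
    Math.abs(((objectAzimuth - s.azimuth + 540) % 360) - 180)
  );
  // Closer speakers receive higher weights; the epsilon avoids division by zero.
  const weights = distances.map(d => 1 / (d + 1e-3));
  // Normalise so the summed power of all speaker gains stays constant.
  const norm = Math.sqrt(weights.reduce((sum, w) => sum + w * w, 0));
  return weights.map(w => w / norm);
}

// Example: a 5.0 layout; an object panned halfway between centre and right.
const layout: Loudspeaker[] = [
  { name: "L", azimuth: 30 }, { name: "R", azimuth: -30 },
  { name: "C", azimuth: 0 },  { name: "Ls", azimuth: 110 },
  { name: "Rs", azimuth: -110 },
];
console.log(renderGains(-15, layout)); // most energy on C and R
```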

Benefit for producers and consumers

An object-based approach, as mentioned above, can serve end-users more effectively by optimizing the experience to best suit their access requirements, the characteristics of their playback platform, the playback environment and the personal preferences of the listener. Moreover, it is highly beneficial for content producers, as workflows can be streamlined: only a single production needs to be created, archived and transmitted in order to serve a multitude of potential target devices and environments. This is enabled by the simple fact that the metadata of individual objects can be modified and adjusted, either by the end-user or along the production and transmission chain, without the need to change the audio material itself. In this way, the four key features of object-based media – interactivity and personalization, accessibility, immersive experiences and compatibility – can be achieved in a non-destructive, controlled and scalable way.
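
The core mechanism described here, adjusting metadata rather than the audio itself, can be sketched in a few lines. The structure and the dialogue-boost use case below are hypothetical examples of such a non-destructive, personalised adjustment.

```typescript
// Personalisation by metadata: the audio assets themselves stay untouched.
interface ObjectMetadata { id: string; contentDescription: string; gainDb: number; }

// e.g. a hard-of-hearing listener raises dialogue relative to everything else.
function boostDialogue(metadata: ObjectMetadata[], extraDb: number): ObjectMetadata[] {
  return metadata.map(m =>
    m.contentDescription === "dialogue" ? { ...m, gainDb: m.gainDb + extraDb } : m
  );
}

const original: ObjectMetadata[] = [
  { id: "commentary", contentDescription: "dialogue", gainDb: 0 },
  { id: "crowd",      contentDescription: "ambience", gainDb: -6 },
];
const personalised = boostDialogue(original, 6); // only metadata changes
```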