Subtitles for 360° media – alternative presentation for immersive experiences


When subtitles are provided in a 360° environment, they are typically fixed at the bottom of the screen and centered horizontally. This works well for common 2D TV, but it may not be the best strategy for 360° media. By making use of the additional “dimension”, subtitles could convey more than just textual information. Whereas audio can guide a viewer through a 360° scene – helping them track the action and keep their orientation – such spatial cues are not available to people with hearing impairments. Can similar cues be realized with an enhanced subtitle service? And how do viewers respond to them?

Thus, the main questions to be answered are:

  • What purpose do subtitles in 360° media serve?
  • And how can this purpose be fulfilled best?

To address these questions, alternative options for the presentation of subtitles are developed and evaluated in the ImAc project (Immersive Accessibility). An example is shown below (Image 1), where the subtitle sticks to the corresponding speaker (or, in this case, a guitar player) as long as he is visible. When the viewer turns away and the speaker moves out of view, the subtitle sticks to the side that is closest to the speaker:

Image 1

Image 1 – In this example (starting at the top picture), the viewer turns his head from right to left. The subtitle “follows” the guitar player, who is moving to the right edge of the image. When the guitar player finally moves out of view, the subtitle stays at the right edge.

This is one possible way of conveying information about the scene or speakers in a scene by means of subtitle positioning. This and other possibilities to use the subtitles as spatial cues will be investigated in user trials and in our pilots.

In the course of 2018, a first version of the ImAc subtitle service will be shown to end users. We are eager to find out how they will respond to it.

Mapping subtitles to 360° media

Let’s have a look at the 360° video first: various projections and mappings exist to store and display a 360° video; probably the most common one is the equirectangular projection. Image 2 shows how the equirectangular image looks when you render the stored 360° video 1:1 on a 2-dimensional plane:

Image 2

Image 2 – Video as stored for equirectangular projection.
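
The mapping behind this projection can be sketched in a few lines. The function below maps a normalized pixel position in the equirectangular image to a point on the unit sphere; the names and coordinate conventions are our own illustrative choices, not taken from any particular player:

```python
import math

def equirect_to_sphere(u, v):
    """Map normalized equirectangular coordinates (u, v in [0, 1])
    to a point on the unit sphere.

    u = 0..1 spans longitude -180°..180°, v = 0..1 spans
    latitude 90°..-90° (top to bottom of the image).
    """
    lon = (u - 0.5) * 2.0 * math.pi      # -pi .. pi
    lat = (0.5 - v) * math.pi            # pi/2 .. -pi/2
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)

# The centre of the stored image maps to the point straight ahead:
print(equirect_to_sphere(0.5, 0.5))  # (0.0, 0.0, 1.0)
```

A renderer effectively evaluates this mapping for every pixel when it wraps the image around the sphere.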

But this is probably NOT how one would like to watch 360° videos :-). To let a viewer look around in the scene, the image above needs to be projected onto a sphere. The picture that you typically see through VR glasses or on your tablet is a part of that sphere. Image 3 gives an approximate example showing one half of such a sphere. The red marked area indicates the part that is currently visible to the viewer, the so-called “field of view”.

Image 3

Image 3 – Video from Image 2 wrapped as a texture around a sphere. The viewer looks from the inside of the sphere onto its surface. The red-outlined area marks the current field of view that is shown to the viewer.
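
Whether a given scene direction currently falls inside the field of view comes down to a simple angle comparison. A minimal sketch, assuming a horizontal field of view of 90°; the function names and the default value are illustrative assumptions:

```python
def wrap_angle(deg):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0

def in_field_of_view(target_yaw, viewer_yaw, h_fov=90.0):
    """Check whether a direction (yaw, in degrees) lies inside the
    viewer's horizontal field of view (h_fov is an assumed default)."""
    return abs(wrap_angle(target_yaw - viewer_yaw)) <= h_fov / 2.0

print(in_field_of_view(30, 0))      # True  – within a 90° window
print(in_field_of_view(170, -170))  # True  – wraps around the ±180° seam
```

The wrap-around at ±180° is the one detail that makes this different from a plain 2D visibility check.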

This is a very basic and simplified description of 360° video presentation. There are a lot more aspects to it, but this introduction is sufficient for going through the following considerations.

2D and 360° subtitles from an authoring perspective

Compared to “traditional” or 2D video, the presentation style described above happens in a different space and thus changes how subtitles are created. (We will mostly use the terms 2D and 360° video from here on.)

To show the differences between 2D and 360° we will first take a closer look at “traditional” subtitles and their authoring process. Some of the key points are:

  1. Subtitles are placed on a static field of view and on a fixed video. With “fixed video”, we mean that the video is shown completely as it is stored in a media file and there is no interaction with the user (or subtitle editor) that influences the way the video is shown.
  2. For the authoring process, the video can be seen as a flat plane that will always be rendered/shown the same way.
  3. As a result of 1) and 2), the subtitles are placed at a position that is absolute to both video and field of view. The subtitle author can see and decide on which area of the video the subtitle will be shown (e.g. top, bottom).

Note: There are web players that offer personalization options, allowing viewers, for example, to change the font size or to move subtitles to a different position on the screen (e.g. “always top” instead of “always bottom”). For now, we will put these options aside.

What is positioning used for? In 2D video, subtitles are usually placed at the bottom of the screen. A different vertical position is used, for example, to avoid obstructing graphics and inserts. The horizontal position is sometimes used to support speaker identification: the subtitle for a person who appears at the left side of a video may be placed left-aligned (typically, however, speaker identification is done by choosing different font colors).

What is different in 360°? Obviously, the media presentation sphere does not coincide with the field of view. Only a part of the media is shown at any time. Typically, the viewer has the freedom to choose which part to look at, and sometimes he can zoom in or out, too. When showing 360° media on a PC or tablet, the term “magic window” is often used.

Subtitles for 360° can be rendered in various ways. To outline some impacts of 360°, we will look at two possibilities:

  1. Use the field of view as reference for subtitles
  2. Use the video projection sphere as reference for subtitles

#1 Field of view (magic window) as reference

When subtitles correlate with the field of view (or magic window), they can be handled similarly to 2D subtitles, because the field of view has attributes similar to those of the 2D presentation plane:

  • The field of view can be seen as a simple 2D rendering plane.
  • Positioning of subtitles can be done as for a “traditional” screen.
  • The relation of font size to field of view is the same as in 2D.

The main difference to a 2D presentation is that the subtitle does not correlate with the video. That leads to various effects, e.g.:

  • The font size does not change when the viewer zooms in or out.
  • The author cannot ensure that subtitles do not obscure important areas of the video.

Additionally, when watching the video via a PC/tablet, the subtitle will not be part of the 360° scene, and thus, will not be distorted as the video is (see “lens distortion” below). That may compromise the immersive experience.

Modifications: One piece of information that is missing in this scenario is an indication of the speaker’s location in the scene. This information could be added, for instance, by showing an arrow next to the subtitle line that points towards the speaker. Another possibility could be to manipulate the subtitle position within the field of view, depending on the position of the correlated speaker (as shown in Image 1). First user tests conducted by the ImAc partners RBB and CCMA revealed that the latter option (altering the subtitle position) is likely too distracting to the viewer. The former option (using arrows/icons to guide the viewer) was preferred. Adding this kind of information may help the viewer to navigate within the scene and will be investigated further.
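
The arrow variant can be sketched with a simple angle calculation: the sign of the wrapped difference between speaker yaw and viewer yaw decides whether a left or a right arrow is shown. Function names and the 90° default field of view are illustrative assumptions, not ImAc code:

```python
def wrap_angle(deg):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0

def guidance_arrow(speaker_yaw, viewer_yaw, h_fov=90.0):
    """Return which arrow to show next to the subtitle, or None if
    the speaker is already visible (angles in degrees)."""
    delta = wrap_angle(speaker_yaw - viewer_yaw)
    if abs(delta) <= h_fov / 2.0:
        return None                      # speaker in view, no cue needed
    return "right" if delta > 0 else "left"

print(guidance_arrow(120, 0))   # right – speaker is to the viewer's right
print(guidance_arrow(-120, 0))  # left  – speaker is to the viewer's left
```

Because the difference is wrapped to [-180, 180), the arrow always points along the shorter way around the sphere.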

#2 Video sphere as reference:

In this case, we assume that the surface that subtitles are rendered on coincides with the video projection sphere (in dimension and geometry). So basically, the subtitle is part of the scene and reacts to user interactions the same way the video does, e.g.:

  • A subtitle may move out of view when the viewer turns his head.
  • A subtitle always covers a specific area of the video. It can be positioned such that it doesn’t obscure important areas of the video.
  • Subtitles zoom when the video is zoomed.
  • Subtitles are affected by lens distortion the same way that the video is, when played on a PC or Tablet.

Modifications: This approach might provide the more immersive experience, but it has some drawbacks that need to be investigated. In the following, some possible modifications of the basic approach are discussed.

Project subtitles on a plane instead of a sphere: First implementations suggest that rendering subtitles onto a plane instead of a sphere is more convenient to read. A plane with the size of the subtitle can be added to the scene. Its position and orientation can be chosen such that the subtitle appears (approximately) at the position where it would be if it were rendered onto the sphere. This leads to slight deviations between subtitle and video. The difference can be seen in the following picture:

Image 4

Image 4 – Two different rendering options for subtitles. Left: subtitle is rendered onto a sphere. Right: Subtitle is rendered onto a plane.
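
The plane variant can be sketched as follows: the subtitle quad is centred at the point of the sphere given by the subtitle’s yaw/pitch and oriented so that it faces the viewer at the sphere’s centre. This is a simplified model with assumed names and an assumed sphere radius:

```python
import math

def subtitle_plane(yaw_deg, pitch_deg, radius=10.0):
    """Place a flat subtitle quad at the point of the (assumed) video
    sphere given by yaw/pitch, facing back towards the sphere centre.

    Returns the quad centre and its normal (pointing at the viewer)."""
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    centre = (radius * math.cos(pitch) * math.sin(yaw),
              radius * math.sin(pitch),
              radius * math.cos(pitch) * math.cos(yaw))
    # The viewer sits at the sphere centre, so the quad's normal is
    # simply the direction from the quad back to the origin:
    normal = tuple(-c / radius for c in centre)
    return centre, normal

centre, normal = subtitle_plane(0.0, 0.0)
print(centre)  # (0.0, 0.0, 10.0) – straight ahead at radius 10
```

Only the quad’s centre point lies exactly on the sphere; the flat corners deviate slightly from the spherical surface, which is the aberration visible in Image 4.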

Manipulate subtitle position: When subtitles are rendered on the sphere, they can move out of view and the viewer will miss some information. To avoid that, one must break with one of the basic characteristics of this approach: the subtitle needs to be “released” from its actual position in the scene/video the moment it would move out of view. That means it “sticks” to the edges of the field of view. How users will react to this approach is not yet known and may be subject to further investigation.
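
This “sticking” behaviour amounts to clamping the subtitle’s yaw to the visible range around the viewer’s yaw. A minimal sketch, where all names and default values are illustrative assumptions:

```python
def wrap_angle(deg):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0

def clamp_subtitle_yaw(subtitle_yaw, viewer_yaw, h_fov=90.0, margin=5.0):
    """Keep the subtitle inside the field of view: if its scene position
    would leave the FOV, pin it to the nearer edge. The margin keeps the
    text fully readable instead of half-clipped at the border."""
    delta = wrap_angle(subtitle_yaw - viewer_yaw)
    limit = h_fov / 2.0 - margin
    delta = max(-limit, min(limit, delta))   # clamp to the visible range
    return wrap_angle(viewer_yaw + delta)

print(clamp_subtitle_yaw(100.0, 0.0))  # 40.0 – pinned to the right edge
```

While the subtitle is inside the field of view, the clamp has no effect, so the function can simply be applied every frame.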

Fix subtitle size: The font size that is most convenient for a viewer is related to the screen dimensions and the viewer’s preferences rather than to the zoom level. Thus, it might make sense to modify the font size during the subtitle rendering process to achieve a suitable font size independent of the video zoom factor.
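
In a simple model, this compensation just divides the subtitle’s in-scene size by the zoom factor, so that the apparent size on screen stays constant. This is an assumed model, not the ImAc implementation:

```python
def compensated_font_size(base_size, zoom):
    """Scale the subtitle's in-scene font size so its apparent size on
    screen stays constant: zooming in by a factor enlarges everything
    in the scene, so the subtitle's own size is divided by that factor."""
    return base_size / zoom

print(compensated_font_size(32.0, 2.0))  # 16.0 – half the size at 2x zoom
```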
