The DASH format

A technical overview of the DASH format

The MPEG-DASH standard is very comprehensive (some will even say complex), and its specifications cover hundreds of pages. Luckily, one doesn’t have to know it in full detail to be able to create an effective streaming solution.

Still, it pays to have a general understanding of the format, if only to understand how a solution like broadpeak.io handles this type of stream, and to troubleshoot particular scenarios.

Quick Summary

Manifest

In DASH, there is a single manifest file, in XML format, containing all the necessary information about available renditions and media segments in a hierarchical structure.

Codecs

The most widely used codecs for DASH are:

  • H.264/AVC, H.265/HEVC for video. Sometimes you may also find VP9.
  • AAC and Dolby Digital for audio. DASH also offers support for the Opus audio codec.

Segment Containers

DASH is commonly used with fMP4 fragments for H.264/AVC and H.265/HEVC video and AAC and Dolby Digital formats, and WebM fragments for VP9 and Opus.

DASH also supports CMAF containers (another variant of ISOBMFF) for streams encoded with H.264 and H.265. This makes it possible to have a single set of fragments for both HLS and DASH, a major advantage in terms of storage (and therefore storage costs), if your target devices support them. For most applications, fMP4 segments that are not strictly CMAF can be used to achieve the same result.

Content protection

The DRM system in use is not typically constrained by the ABR format itself, but in practice DASH content will most often use Widevine and/or PlayReady.
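In the MPD, this signaling appears as ContentProtection descriptors, typically on the Adaptation Set. The following is a minimal sketch: the UUIDs are the well-known Widevine and PlayReady system IDs, but the default_KID value and the elided child elements are hypothetical.

<AdaptationSet contentType="video" mimeType="video/mp4">
    <!-- Common Encryption (CENC) signaling, with a hypothetical default key ID
         (the cenc namespace declaration is elided here) -->
    <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc" cenc:default_KID="10000000-1000-1000-1000-100000000001"/>
    <!-- Widevine, identified by its system ID -->
    <ContentProtection schemeIdUri="urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"/>
    <!-- PlayReady, identified by its system ID -->
    <ContentProtection schemeIdUri="urn:uuid:9a04f079-9840-4286-ab92-e65be0885f95"/>
    ...
</AdaptationSet>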

Subtitles

DASH is, as with all things, fairly flexible when it comes to the subtitle formats it accepts; suitability is more often determined by whether the player (or other system consuming the manifest) supports a given format.

There are 2 types of subtitles usually used in DASH:

  • A single “sidecar” file that contains subtitles for the whole duration of the content, usually in WebVTT or TTML format (or one of its derivatives, such as IMSC)
  • A segmented format in which the subtitles are chunked into segments, in the same way as video and audio streams. This is usually done by wrapping WebVTT or TTML/IMSC subtitles in an fMP4 container, but sometimes the raw file itself (WebVTT or TTML/IMSC) is segmented into smaller ones.
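For the segmented case, a TTML/IMSC track wrapped in fMP4 segments might be declared as sketched below (all paths and values are hypothetical; stpp is the ISOBMFF sample entry for TTML):

<AdaptationSet contentType="text" mimeType="application/mp4" codecs="stpp" lang="en" segmentAlignment="true">
    <!-- TTML/IMSC samples carried in fMP4 segments, addressed like audio and video -->
    <Representation id="textEn" bandwidth="2000">
        <SegmentTemplate timescale="1000" duration="6000" initialization="text/en/init.mp4" media="text/en/$Number$.m4s" startNumber="1"/>
    </Representation>
</AdaptationSet>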

Thumbnails

For playback purposes, one or multiple image tracks will commonly be added to a DASH manifest, usually to provide preview thumbnails when the user hovers over a scrub bar, or to provide trick play mode on some devices.

When thumbnails are specified in the DASH manifest, you will typically find them declared as a set of sprite image files, treated as segments in the same way as video and audio tracks.
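As a sketch of what this can look like (paths and dimensions are hypothetical; the tile layout descriptor follows the DASH-IF thumbnail convention):

<AdaptationSet contentType="image" mimeType="image/jpeg">
    <SegmentTemplate media="thumbnails/tile-$Number$.jpg" duration="100" startNumber="1" timescale="1"/>
    <!-- each 100-second sprite holds 10x1 tiles, i.e. one thumbnail every 10 seconds -->
    <Representation id="thumbnails" bandwidth="10000" width="2560" height="144">
        <EssentialProperty schemeIdUri="http://dashif.org/guidelines/thumbnail_tile" value="10x1"/>
    </Representation>
</AdaptationSet>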

📘

There are other formats that may typically be used by players, such as image sprites declared in a WebVTT file, but they are usually not provided to the player through the manifest.

Manifest Format

Media Presentation Document

An MPEG-DASH manifest is contained in a Media Presentation Document (MPD) file, which is essentially an XML file with a specific XML schema, usually with an .mpd file extension.

The root of the file is an MPD element, decorated with a number of XML attributes:

<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" profiles="urn:mpeg:dash:profile:isoff-live:2011" type="static" mediaPresentationDuration="P0Y0M0DT0H10M0.000S" minBufferTime="P0Y0M0DT0H0M2.000S" xmlns:ns2="http://www.w3.org/1999/xlink">
     ...
</MPD>

This example is for a VOD stream. By contrast, the following one is for a Live stream:

<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" xmlns:ns2="http://www.w3.org/1999/xlink" profiles="urn:hbbtv:dash:profile:isoff-live:2012,urn:mpeg:dash:profile:isoff-live:2011" type="dynamic" availabilityStartTime="2023-07-03T00:00:00Z" publishTime="2023-07-03T20:10:12.428Z" minimumUpdatePeriod="P0Y0M24DT19H0M0.000S" minBufferTime="P0Y0M0DT0H0M2.000S" timeShiftBufferDepth="P0Y0M0DT0H5M0.000S">
	   ...
</MPD>

The main attributes that you’ll find are described below:

Profile

Although MPEG-DASH is agnostic to the media streaming formats used, in practice the same types of streams are used most of the time, and DASH therefore defines a number of profiles that capture common practice. The profiles attribute is used to indicate which profile or profiles the manifest adheres to. The most commonly found ones are:

  • A Live profile, marked as urn:mpeg:dash:profile:isoff-live:2011, which applies when the streams (audio and video principally) are pre-segmented into separate small fragmented MP4 files (“hard-parted”), which can be individually requested by the player directly.
  • An On-Demand profile, marked as urn:mpeg:dash:profile:isoff-on-demand:2011, which applies when the streams are contained in a single internally-fragmented MP4 file (“soft-parted”), and the player will use byte-range access to retrieve specific segments from it.

The naming of these profiles is slightly confusing: the Live profile, even though it primarily addresses the needs of live streams, can be, and commonly is, used for VOD as well.

Timeline

The MPEG-DASH standard also offers a very advanced timing model, which defines how a consumer of the manifest is meant to make its scheduling decisions (such as for rendering the content of the manifest in a player). The primary definition of the whole presentation is done on the MPD element with the following attributes:

  • type, which is either static for content that is used for VOD, or dynamic for a live stream
  • mediaPresentationDuration, mostly used for VOD, states the duration of the whole presentation
  • availabilityStartTime, mostly used for Live, indicates the wall-clock (i.e. real) time that corresponds to the start of the presentation.
  • minimumUpdatePeriod, again for Live streams, tells the player how often to refresh the manifest.
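As a hedged sketch (all values hypothetical), here is how a client roughly combines these attributes for a live stream:

<!-- For a dynamic MPD, the client roughly reasons as follows:
       position of the live edge ~ NOW - availabilityStartTime
       oldest seekable position  ~ live edge - timeShiftBufferDepth
       next manifest refresh     <= last fetch + minimumUpdatePeriod -->
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="dynamic"
     availabilityStartTime="2023-07-03T00:00:00Z"
     minimumUpdatePeriod="PT2S"
     timeShiftBufferDepth="PT5M"
     minBufferTime="PT2S">
   ...
</MPD>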

Periods

A DASH manifest divides the playback stream into one or multiple Periods.

Most of the time you will only have one Period, but multiple Periods can be used to segment the timeline of the video, typically for the purpose of creating chapters, or for the insertion of ads in an SSAI solution. A Period may provide an indication of its duration with the duration attribute, and/or a start attribute that indicates at what point in the presentation the Period starts.

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" profiles="urn:mpeg:dash:profile:isoff-live:2011,urn:com:dashif:dash264" type="static" mediaPresentationDuration="P0Y0M0DT0H10M0.000S" minBufferTime="P0Y0M0DT0H0M2.000S" xmlns:ns2="http://www.w3.org/1999/xlink">
    <Period duration="PT10M">
       ...
    </Period>
</MPD>

In this VOD example, we use the simplest structure: a single Period containing 10 minutes of content.

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" profiles="urn:mpeg:dash:profile:isoff-live:2011" type="dynamic" availabilityStartTime="1970-01-01T00:00:00Z" publishTime="2023-07-06T12:43:39.351201Z" minimumUpdatePeriod="PT2S" timeShiftBufferDepth="PT33.101S" maxSegmentDuration="PT3S" minBufferTime="PT2S">
  <Period id="1" start="PT1688647379S" duration="PT30S">
      ...
  </Period>
  <Period id="2" start="PT1688647409S">
      ...
  </Period>
</MPD>

In this Live example, we have 2 Periods with different start times. The first Period ends at the exact moment that the second one starts: its start of PT1688647379S plus its 30-second duration equals the second Period's start of PT1688647409S.

Adaptation Sets

Each Period contains one or multiple Adaptation Sets. There is usually one Adaptation Set for each type of media: video, audio, subtitles, image tracks, etc.

Multiple Adaptation Sets are however possible for the same media type, for example:

  • when different codecs are made available, for example different video codecs.
  • when different audio channel layouts are provided, for example stereo and surround sound.
  • when multiple languages are provided for audio or subtitles.

<MPD ...>
    <Period>
        <!-- Video Adaptation Set -->
        <AdaptationSet contentType="video" mimeType="video/mp4">
           ...
        </AdaptationSet>

        <!-- Audio Adaptation Sets -->
        <AdaptationSet contentType="audio" mimeType="audio/mp4" lang="en">
           ...
        </AdaptationSet>

        <AdaptationSet contentType="audio" mimeType="audio/mp4" lang="fr">
           ...
        </AdaptationSet>

        <!-- Subtitle Adaptation Set -->
        <AdaptationSet contentType="text" mimeType="application/ttml+xml" lang="en">
           ...
        </AdaptationSet>

        <!-- Thumbnail Adaptation Set -->
        <AdaptationSet contentType="image" mimeType="image/jpeg" lang="en">
           ...
        </AdaptationSet>
    </Period>
</MPD>

In this example, we separate our video, audio (in two languages), subtitle, and image tracks into their own adaptation sets.

Representations

Each Adaptation Set contains one or multiple Representations.

Each Representation is a semantically equivalent version of the same content, at a different quality level: for video, typically a different bitrate or resolution; for audio, a different bitrate.

<MPD ...>
    <Period>
        <!-- Video Adaptation Set -->
        <AdaptationSet contentType="video" mimeType="video/mp4">
            <!-- 1080p rendition at 4 Mbps -->
            <Representation id="1" bandwidth="4000000" width="1920" height="1080" codecs="avc1.4D4028">
                ...
            </Representation>
            <!-- 360p rendition at 500 Kbps -->
            <Representation id="2" bandwidth="500000" width="640" height="360" codecs="avc1.4D401E">
                ...
            </Representation>
        </AdaptationSet>

        <!-- Audio Adaptation Set -->
        <AdaptationSet  contentType="audio" mimeType="audio/mp4" lang="en">
            <!-- single rendition at 192 Kbps -->
            <Representation id="3" bandwidth="192000" codecs="mp4a.40.2">
                ...
            </Representation>
        </AdaptationSet>
    </Period>
</MPD>

For simplicity, this example only has 1 audio and 2 video representations.

Media Segments

A Representation is composed of multiple Segments. They contain the actual media being played back by the video player.

A critical role of an adaptive streaming manifest is to provide the client (such as a player) with a way to determine where the segments are located, how they are named, and how they can be downloaded (i.e. how to calculate the URI of each segment).

There are many ways in DASH to express this information, called "addressing modes". The most common (and recommended) ones are:

Explicit Addressing: Segment Template with Segment Timeline

Explicit addressing tells the DASH client how to calculate the URL of individual media segments when each segment filename is numbered with an identifier. The representations (or adaptation sets) in the MPD provide a SegmentTemplate in which the media attribute contains a placeholder for that identifier.

Inside it, a SegmentTimeline provides a list of sequences of segments with associated time and duration information. From that (usually quite succinct) information, the client can calculate the identifier and through it determine the filename of the segment.

There are 2 main types of identifiers used:

Timestamps ($Time$ addressing)

With this mode, the segment filenames are numbered with a timestamp. The $Time$ placeholder is used in the SegmentTemplate@media attribute.

<Representation>
  <SegmentTemplate timescale="48000" initialization="audio/en/init.mp4a" media="audio/en/$Time$.mp4a">
    <SegmentTimeline>
      <S t="0" d="96000" r="432"/>
    </SegmentTimeline>
  </SegmentTemplate>
</Representation>
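To make the arithmetic concrete, this is how a client resolves the template above:

<!-- At timescale 48000, d=96000 ticks = 2 s; the single <S> entry declares
     1 + 432 = 433 two-second segments, and $Time$ takes each segment's start time:
       t=0        ->  audio/en/0.mp4a
       t=96000    ->  audio/en/96000.mp4a
       t=192000   ->  audio/en/192000.mp4a
       ...
       t=41472000 ->  audio/en/41472000.mp4a -->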

Numbers ($Number$ addressing)

The segment filenames can also use numbers that do not express a timestamp. In this mode, the SegmentTemplate@media attribute contains a $Number$ placeholder.

<Representation>
  <SegmentTemplate timescale="1000" initialization="init.mp4" media="chunk-$Number$.m4s" startNumber="10">
    <SegmentTimeline>
      <S t="0" d="5000" />
      <S d="1500" />
      <S d="2000" />
      <S d="5000" r="20" />
    </SegmentTimeline>
  </SegmentTemplate>
</Representation>
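Here, the client pairs each timeline entry with a consecutive number starting at startNumber:

<!-- At timescale 1000, 1000 ticks = 1 s; numbering starts at 10:
       number 10     ->  chunk-10.m4s  (t=0, 5.0 s)
       number 11     ->  chunk-11.m4s  (t=5000, 1.5 s)
       number 12     ->  chunk-12.m4s  (t=6500, 2.0 s)
       numbers 13-33 ->  chunk-13.m4s ... chunk-33.m4s  (21 segments of 5.0 s each) -->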

Simple Addressing: Segment Template without Segment Timeline

A simpler addressing mode can be used when all segments (except possibly the last) have the same duration, which is then declared directly on the SegmentTemplate. Both $Time$ and $Number$ placeholders can be used, but it's most often seen with the latter.

<Representation>
  <SegmentTemplate timescale="12288" duration="24576" media="video_$Number$.mp4" startNumber="1" initialization="video_init.mp4"/>
</Representation>
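In this example, every segment is 24576 / 12288 = 2 seconds long, and the client simply counts up from startNumber:

<!-- duration="24576" at timescale="12288" means 2 s per segment:
       number 1 ->  video_1.mp4  (0-2 s)
       number 2 ->  video_2.mp4  (2-4 s)
       number 3 ->  video_3.mp4  (4-6 s)
       ... -->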

📘

Note that sometimes the SegmentTemplate information can be contained in the AdaptationSet itself, when it’s shared by all representations.

Indexed Addressing: Segment Base

This mode is used when the segments are not contained in individual files, but instead are internal fragments in a single MP4 or CMAF file, and a method called byte-range access is used to retrieve them: the indexRange attribute points at a segment index (a sidx box), which the client downloads first to learn the byte offsets of every media segment, before requesting each segment with an HTTP range request.

<Representation>
  <BaseURL>audio.mp4</BaseURL>
  <SegmentBase timescale="48000" indexRange="848-999">
    <Initialization range="0-847"/>
  </SegmentBase>
</Representation>
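The resulting request sequence might look like this sketch (the media byte ranges are hypothetical, since they are only known after reading the index):

<!-- GET audio.mp4, Range: bytes=0-847      ->  initialization data
     GET audio.mp4, Range: bytes=848-999    ->  segment index (sidx box)
     GET audio.mp4, Range: bytes=1000-...   ->  media segments, at the byte
                                                offsets read from the index -->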

Base URL

BaseURL elements throughout the MPD will often be used to indicate where those files are located relative to the manifest file itself.
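As a sketch (all URLs hypothetical), a BaseURL declared at the top of the MPD combines with the relative paths in the segment templates:

<MPD ...>
  <BaseURL>https://cdn.example.com/content/</BaseURL>
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="1" bandwidth="4000000">
        <!-- media resolves to https://cdn.example.com/content/video/segment-1.m4s, etc. -->
        <SegmentTemplate timescale="1" duration="2" media="video/segment-$Number$.m4s" startNumber="1" initialization="video/init.mp4"/>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>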

Full example

<MPD ...>
    <Period>
        <!-- Video Adaptation Set -->
        <AdaptationSet mimeType="video/mp4" segmentAlignment="true">
            <!-- 1080p rendition at 4 Mbps -->
            <Representation id="1" bandwidth="4000000" width="1920" height="1080" codecs="avc1.4D4028">
                <BaseURL>video/1080p.mp4</BaseURL>
                <SegmentBase timescale="90000" indexRange="1000-1345">
                    <Initialization range="0-999"/>
                </SegmentBase>
            </Representation>
        </AdaptationSet>

        <!-- Audio Adaptation Set for English-->
        <AdaptationSet mimeType="audio/mp4" segmentAlignment="true" lang="en">
            <!-- single rendition at 192 Kbps -->
            <Representation id="3" bandwidth="192000" codecs="mp4a.40.2">
                <SegmentTemplate media="audio/1/segment-$Number$.m4s" startNumber="0" timescale="44100" duration="176400" initialization="audio/1/init.mp4"/>
            </Representation>
        </AdaptationSet>

        <!-- Audio Adaptation Set for French-->
        <AdaptationSet mimeType="audio/mp4" segmentAlignment="true" lang="fr">
            <!-- single rendition at 192 Kbps -->
            <Representation id="3" bandwidth="192000" codecs="mp4a.40.2">
								<SegmentTemplate timescale="12800" initialization="audio/2/init.mp4" media="audio/2/segment-$Time$.dash" presentationTimeOffset="0">
				           <SegmentTimeline>
				              <S t="102400" d="25600" r="10" />
				              <S d="12300" />
				         </SegmentTimeline>
				      </SegmentTemplate>
           </Representation>
        </AdaptationSet>
    </Period>
</MPD>

In our example, we use a SegmentBase for the video representation, and different SegmentTemplates for the audio representations. In reality you will rarely see different mechanisms used for different Adaptation Sets in the same Period, but the format allows it.

Events

The DASH standard offers different mechanisms to enrich the presentation with timed metadata. One of the most common is the use of EventStream elements inside the Period, which can carry a payload of information in a variety of formats, such as SCTE-35 signaling, which may indicate where ad breaks are located in the original live stream.

<MPD ...>
  <Period>
    <EventStream timescale="90000" schemeIdUri="urn:scte:scte35:2013:xml">
      <Event duration="1081008">
        <scte35:SpliceInfoSection protocolVersion="0" ptsAdjustment="167725080" tier="4095">
          <scte35:TimeSignal>
            <scte35:SpliceTime ptsTime="7678071728"/>
          </scte35:TimeSignal>
          <scte35:SegmentationDescriptor segmentationEventId="417292" segmentationEventCancelIndicator="false" segmentationDuration="1081008" segmentationTypeId="48" segmentNum="11" segmentsExpected="16">
            <scte35:SegmentationUpid segmentationUpidType="0" segmentationUpidLength="0" segmentationTypeId="48" segmentNum="11" segmentsExpected="16"/>
          </scte35:SegmentationDescriptor>
        </scte35:SpliceInfoSection>
      </Event>
    </EventStream>

    <AdaptationSet>
      ...
    </AdaptationSet>
  </Period>
</MPD>

References

The information above only scratches the surface of what the MPEG-DASH specification(s) offer, but for common use cases, understanding this should be sufficient.

For more details on the structure of MPEG-DASH, please refer to the specification (ISO/IEC 23009-1:2019).

The DASH Industry Forum (DASH-IF) guidelines on interoperability are particularly useful to ensure compatibility with various devices and applied standards. In particular, the DASH timing model is explained in more detail in the DASH-IF implementation guidelines on the restricted timing model, a critical source of information for understanding how a DASH client works.