<!DOCTYPE html>
<html>
<head>
<title>Sourcing In-band Media Resource Tracks from Media Containers into HTML</title>
<meta charset='utf-8'>
<script src='http://www.w3.org/Tools/respec/respec-w3c-common' async class='remove'></script>
<script class='remove'>
var respecConfig = {
// specification status (e.g. WD, LCWD, WG-NOTE, etc.). If in doubt use ED.
specStatus: "unofficial",
// the specification's short name, as in http://www.w3.org/TR/short-name/
shortName: "in-band-tracks",
// if your specification has a subtitle that goes below the main
// formal title, define it here
// subtitle : "an excellent document",
// if you wish the publication date to be other than the last modification, set this
// publishDate: "2009-08-06",
// if the specification's copyright date is a range of years, specify
// the start date here:
copyrightStart: "2014",
// if there is a previously published draft, uncomment this and set its YYYY-MM-DD date
// and its maturity status
// previousPublishDate: "1977-03-15",
// previousMaturity: "WD",
// if there a publicly available Editor's Draft, this is the link
edDraftURI: "http://dev.w3.org/html5/html-sourcing-inband-tracks/",
// if this is a LCWD, uncomment and set the end of its review period
// lcEnd: "2009-08-05",
// editors, add as many as you like
// only "name" is required
editors: [
{
name: "Silvia Pfeiffer"
, url: "mailto:[email protected]"
, mailto: ""
, company: "NICTA"
, companyURL: "http://nicta.com.au/"
},
{
name: "Bob Lund"
, url: ""
, mailto: ""
, company: "CableLabs Inc"
, companyURL: "http://www.cablelabs.com/"
}
],
// name of the WG
wg: "Inband Text Tracks Community Group",
// URI of the public WG page
wgURI: "http://www.w3.org/community/inbandtracks/",
// name (without the @w3c.org) of the public mailing to which comments are due
wgPublicList: "public-inbandtracks",
// URI of the patent status for this WG, for Rec-track documents
// !!!! IMPORTANT !!!!
// This is important for Rec-track documents, do not copy a patent URI from a random
// document unless you know what you're doing. If in doubt ask your friendly neighbourhood
// Team Contact.
wgPatentURI: ""
// !!!! IMPORTANT !!!! MAKE THE ABOVE BLINK IN YOUR HEAD
};
</script>
<!-- script to register bugs -->
<script src="https://dvcs.w3.org/hg/webcomponents/raw-file/tip/assets/scripts/bug-assist.js"></script>
<meta name="bug.short_desc" content="[InbandTracks] ">
<meta name="bug.product" content="HTML WG">
<meta name="bug.component" content="Sourcing In-band Media Resource Tracks">
<style type="text/css">
table {
border-collapse: collapse;
border-style: hidden hidden none hidden;
}
table thead, table tbody {
border-bottom: solid;
}
table td, table th {
border-left: solid;
border-right: solid;
border-bottom: solid thin;
vertical-align: top;
padding: 0.2em;
}
/* fix bug entry form styling */
#bug-assist-form {
padding: 4px;
border: 1px solid red;
background-color: rgba(255, 255, 255, 0.6);
position: fixed;
top: 1em;
right: 1em;
width: 115px;
opacity: 0.8;
text-align: right;
}
/* move respec button out of the way of the bug button */
#respec-ui{
top: 100px !important;
}
</style>
</head>
<body>
<section id='abstract'>
<p>
This specification is provided to promote interoperability among implementations and users of in-band text tracks sourced for [[HTML5]]/[[HTML]] from media resource containers. The specification provides guidelines for the creation of video, audio and text tracks and their attribute values as mapped from in-band tracks from media resource types typically supported by User Agents. It also explains how the UA should map in-band text track content into text track cues.
</p>
<p>
Mappings are defined for [[MPEGDASH]], [[ISOBMFF]], [[MPEG2TS]], [[OGGSKELETON]] and [[WebM]].
</p>
</section>
<section id='sotd'>
<p>
This is the first draft. Please send feedback to: <a href="mailto:[email protected]">[email protected]</a>.
</p>
</section>
<section>
<h2>Introduction</h2>
<p>
The specification maintains mappings from in-band audio, video and other data tracks of media resources to HTML <code>VideoTrack</code>, <code>AudioTrack</code>, and <code>TextTrack</code> objects and their attribute values.
</p>
<p>This specification defines the mapping of tracks from media resources depending on the MIME type of that resource. If an implementation claims to support that MIME type and exposes a track from a resource of that type, the exposed track must conform to this specification.</p>
<p>Which actual tracks are exposed by a user agent from a supported media resource is implementation dependent. A user agent may expose tracks, for which it supports parsing, decoding and rendering, for playback selection by the web application or user. A user agent may also decide to expose tracks coded in formats it is not able to decode, but which it can identify, and describe through metadata such as the HTML <code>kind</code> attribute and others as defined in this specification. For text tracks, the track content may be exposed to the Web application via TextTrackCue or DataCue objects.</p>
<p>
A generic rule to follow is that a track as exposed in HTML only ever represents a single semantic concept. When mapping from a media resource, sometimes an in-band track does not relate 1-to-1 to an HTML text, audio or video track.
</p>
<p class="note">For example, a HTML <code>TextTrack</code> object is either a subtitle track or a caption track, never both. However, in-band text tracks may encapsulate caption and subtitle cues of the same language as a single in-band track. Since a caption track is essentially a subtitle track with additional cues of transcripts of audio-only information, such an encapsulation in a single in-band track can save space. In HTML, these tracks should be exposed as two <code>TextTrack</code> objects, since they represent different semantic concepts. The cues appear in their relevant tracks - subtitle cues would be present in both. This allows users to choose between the two tracks and activate the desired one in the same manner that they do when the two tracks are provided through two track elements.
</p>
<p class="note">
A similar logic applies to in-band text tracks that have subtitle cues of different languages mixed together in one track. They, too, should be exposed as one track per language.
</p>
<p class="note">
A further example is when a UA decides to implement rendering for a caption track but without exposing the caption track through the <code>TextTrack</code> API. To the Web developer and the Web page user, such a video appears as though it has burnt-in captions. Therefore, the UA could expose two video tracks on the HTMLMediaElement - one with captions and a <code>kind</code> attribute set to <code>captions</code> and one without captions with a <code>kind</code> attribute set to <code>main</code>. In this way, the user and the Web developer still get the choice of whether to see the video with or without captions.
</p>
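<p class="note">
The following informative sketch shows how a Web application could switch to the captioned video track described in the note above. The <code>video</code> element reference is an assumption made for illustration; the <code>videoTracks</code>, <code>kind</code> and <code>selected</code> APIs are those defined in [[HTML5]]/[[HTML]].
</p>
<pre class="example">
// Informative sketch: prefer the video track with rendered captions, if the
// user agent exposed one (kind "captions") alongside the plain track (kind "main").
var video = document.querySelector('video'); // assumed media element
var tracks = video.videoTracks;
for (var i = 0; i &lt; tracks.length; i++) {
  if (tracks[i].kind === 'captions') {
    tracks[i].selected = true; // selecting one VideoTrack deselects the others
    break;
  }
}
</pre>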
<p>
Another generic rule to follow for in-band data tracks is that in order to map them to <code>TextTrack</code> objects, the contents of the track need to be mapped to media-time aligned cues that relate to a non-zero interval of time.
</p>
<p>
For every MIME-type/subtype of an existing media container format, this specification defines the following information:
</p>
<ol>
<li>Track order.
<p>Tracks sourced according to this specification are referenced by HTML <code>TrackList</code> objects (<code>audioTracks</code>, <code>videoTracks</code> or <code>textTracks</code>). The [[HTML5]]/[[HTML]] specification mandates that the tracks in those objects be consistently ordered. This requirement ensures that the order of tracks is not changed when a track is added or removed, e.g. that <code>videoTracks[3]</code> points to the same object if the tracks with indices 0, 1, 2 and 3 were not removed. This also ensures a deterministic result when calls to <code>getTrackById</code> are made with media resources, possibly invalid, that declare two tracks with the same id. This specification defines a consistent ordering of tracks between the media resource and <code>TrackList</code> objects when the media resource is consumed by the user agent.</p>
<p>Note that in some media workflows, the order of tracks in a media resource may be subject to changes (e.g. tracks may be added or removed) between authoring and publication. Applications associated with a media resource should not rely on an order of tracks being the same between when the media resource was authored and when it is consumed by the user agent.</p>
<p>All media resource formats used in this specification support identifying tracks using a unique identifier. This specification defines how those unique identifiers are mapped onto the <code>id</code> attribute of HTML Track objects. Application authors are encouraged to use the <code>id</code> attribute to identify tracks, rather than the index in a <code>TrackList</code> object, as illustrated in the informative example after this list.</p>
</li>
<li>How to identify the type of tracks - one of audio, video or text.</li>
<li>Setting the attributes <code>id</code>, <code>kind</code>, <code>language</code> and <code>label</code> for sourced <code>TextTrack</code> objects.</li>
<li>Setting the attributes <code>id</code>, <code>kind</code>, <code>language</code> and <code>label</code> for sourced <code>AudioTrack</code> and <code>VideoTrack</code> objects.</li>
<li>Mapping Text Track content into text track cues.</li>
</ol>
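<p class="note">
The following informative sketch illustrates the track ordering and identification guidance in the list above: a track is looked up by its <code>id</code> rather than by its index, and track additions are observed through <code>addtrack</code> events. The element selector and the track id value are assumptions made for illustration only.
</p>
<pre class="example">
var media = document.querySelector('video'); // any HTMLMediaElement (assumed)

// Look tracks up by their stable id (sourced as defined in this specification)
// rather than by position, since positions may differ between authoring and
// publication of the media resource.
var audio = media.audioTracks.getTrackById('2'); // '2' is a hypothetical in-band id
if (audio) {
  audio.enabled = true;
}

// Tracks exposed while the resource is being consumed fire addtrack events.
media.textTracks.addEventListener('addtrack', function (event) {
  console.log('in-band text track:', event.track.id, event.track.kind);
});
</pre>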
</section>
<section id='mpegdash'>
<h2>MPEG-DASH</h2>
<b>MIME type/subtype: <code>application/dash+xml</code></b>
<p>
[[MPEGDASH]] defines formats for a media manifest, called MPD (Media Presentation Description), which references media containers, called media segments. [[MPEGDASH]] also defines some media segment formats based on [[MPEG2TS]] or [[ISOBMFF]]. Processing of media manifests and segments to expose tracks to Web applications can be done by the user agent. Alternatively, a web application can process the manifests and segments to expose tracks. When the user agent processes MPD and media segments directly, it exposes tracks for <code>AdaptationSet</code> and <code>ContentComponent</code> elements, as defined in this document. When the Web application processes the MPD and media segments, it passes media segments to the user agent according to the Media Source Extensions [[MSE]] specification. In this case, the tracks are exposed by the user agent according to [[MSE]]. The Web application may set default track attributes from MPD data, using the <code>trackDefaults</code> object, which the user agent will use to set attributes not set from initialization segment data.
</p>
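<p class="note">
The following informative sketch outlines the Web-application-driven case described above: the application fetches the MPD and media segments itself and appends segments through [[MSE]]. The URLs, the mime/codecs string and the <code>TrackDefault</code> usage are assumptions made for illustration; in particular, the <code>TrackDefault</code>, <code>TrackDefaultList</code> and <code>trackDefaults</code> interfaces are taken from the [[MSE]] draft and are feature-detected because they may not be available in a given user agent.
</p>
<pre class="example">
var video = document.querySelector('video');
var mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', function () {
  // Hypothetical mime/codecs string for an AdaptationSet selected from the MPD.
  var sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');

  // If the user agent implements the TrackDefault objects from the MSE draft,
  // the application may seed attributes (e.g. language) taken from the MPD for
  // tracks whose attributes cannot be determined from the media container.
  if ('TrackDefault' in window) {
    sourceBuffer.trackDefaults = new TrackDefaultList(
        [new TrackDefault('audio', 'en', 'English', ['main'])]);
  }

  fetch('init.mp4') // hypothetical initialization segment URL
      .then(function (response) { return response.arrayBuffer(); })
      .then(function (buffer) { sourceBuffer.appendBuffer(buffer); });
});

video.audioTracks.addEventListener('addtrack', function (event) {
  // Attributes are sourced from the initialization segment, falling back to MPD
  // data (or TrackDefault values) as described in this section.
  console.log(event.track.id, event.track.kind, event.track.language);
});
</pre>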
<ol>
<li><p>Track Order</p>
<p>
If an <code>AdaptationSet</code> contains <code>ContentComponents</code>, a track is created for each <code>ContentComponent</code>. Otherwise, a track is created for the <code>AdaptationSet</code> itself. The order of tracks specified in the MPD (Media Presentation Description) format [[MPEGDASH]] is maintained when sourcing multiple MPEG DASH tracks into HTML.
</p>
</li>
<li><p>Determining the type of track</p>
<p>
A user agent recognises and supports data from an MPEG DASH media resource as being equivalent to an HTML track using the content type given by the MPD. The content type of the track is the first value present out of: the <code>ContentComponent</code>'s "contentType" attribute, the <code>AdaptationSet</code>'s "contentType" attribute, or the main type in the <code>AdaptationSet</code>'s "mimeType" attribute (i.e. for "video/mp2t", the main type is "video").
</p>
<ul>
<li>text track:
<ul>
<li>the content type is "<code>application</code>" or "<code>text</code>"</li>
<li>the content type is "<code>video</code>" and the <code>AdaptationSet</code> contains one or more <a href="#mp4avcceacaption">ISOBMFF CEA 608 or 708 caption services</a>.</li>
</ul>
<li>video track: the content type is "<code>video</code>"</li>
<li>audio track: the content type is "<code>audio</code>"</li>
</ul>
</li>
<li><p>Track Attributes for sourced Text Tracks</p>
<p>
Data for sourcing text track attributes may exist in the media content or in the MPD. Text track attribute values are first sourced from track data in the media container, as described for <a href='#mpeg2tstta'>text track attributes in MPEG-2 Transport Streams</a> and <a href='#mpeg4tta'>text track attributes in MPEG-4 ISOBMFF</a>. If a track attribute's value cannot be determined from the media container, then the track attribute value is sourced from data in the track's <code>ContentComponent</code>. If the needed attribute or element does not exist on the <code>ContentComponent</code> (or if the <code>AdaptationSet</code> doesn't contain any <code>ContentComponents</code>), then that attribute or element is sourced from the <code>AdaptationSet</code>:
</p>
<table>
<thead>
<th>Attribute</th>
<th>How to source its value</th>
</thead>
<tr>
<th><code>id</code></th>
<td>
The track is:
<ul>
<li>An <a href="#mp4avcceacaption">ISOBMFF CEA 608 caption service</a>: the string "cc" concatenated with the value of the '<code>channel-number</code>' field in the <code>Accessibility</code> descriptor in the <code>ContentComponent</code> or <code>AdaptationSet</code>.</li>
<li>An <a href="#mp4avcceacaption">ISOBMFF CEA 708 caption service</a>: the string "sn" concatenated with the value of the '<code>service-number</code>' field in the <code>Accessibility</code> descriptor in the <code>ContentComponent</code> or <code>AdaptationSet</code>.</li>
<li>Otherwise, the content of the '<code>id</code>' attribute in the <code>ContentComponent</code>, or <code>AdaptationSet</code>.</li>
</ul>
</td>
</tr>
<tr>
<th><code>kind</code></th>
<td>The track:
<ul>
<li>Represents a <code>ContentComponent</code> or <code>AdaptationSet</code> containing a <code>Role</code> descriptor with <code>schemeIdURI</code> attribute = "<code>urn:mpeg:dash:role:2011</code>":
<ul>
<li>"<code>captions</code>": if the <code>Role</code> descriptor's value is "<code>caption</code>"</li>
<li>"<code>subtitles</code>": if the <code>Role</code> descriptor's value is "<code>subtitle</code>"</li>
<li>"<code>metadata</code>": otherwise</li>
</ul></li>
<li>Is an <a href="#mp4avcceacaption">ISOBMFF CEA 608 or 708 caption service</a>: "<code>captions</code>".</li>
</ul>
</td>
</tr>
<tr>
<th><code>label</code></th>
<td>
The empty string.
</td>
</tr>
<tr>
<th><code>language</code></th>
<td>The track is:
<ul>
<li>An <a href="#mp4avcceacaption">ISOBMFF CEA 608 or 708 caption service</a>: the value of the '<code>language</code>' field in the <code>Accessibility</code> descriptor, in the <code>ContentComponent</code> or <code>AdaptationSet</code>, where the corresponding '<code>channel-number</code>' or '<code>service-number</code>' is the same as this track's '<code>id</code>' attribute. The empty string if there is no such corresponding '<code>channel-number</code>' or '<code>service-number</code>'.</li>
<li>Otherwise: the content of the '<code>lang</code>' attribute in the <code>ContentComponent</code> or <code>AdaptationSet</code> element.</li>
</ul>
</td>
</tr>
<tr>
<th><code>inBandMetadataTrackDispatchType</code></th>
<td>
If <code>kind</code> is "<code>metadata</code>", an XML document containing the <code>AdaptationSet</code> element and all child <code>Role</code> descriptors and <code>ContentComponents</code>, and their child <code>Role</code> descriptors. The empty string otherwise.
</td>
</tr>
<tr>
<th><code>mode</code></th>
<td>
"<code>disabled</code>"
</td>
</tr>
</table>
</li>
<li><p>Track Attributes for sourced Audio and Video Tracks</p>
<p>
Data for sourcing audio and video track attributes may exist in the media content or in the MPD. Audio and video track attribute values are first sourced from track data in the media container, as described for <a href='#mpeg2tsavta'>audio and video track attributes in MPEG-2 Transport Streams</a> and <a href='#mpeg4avta'>audio and video track attributes in MPEG-4 ISOBMFF</a>. If a track attribute's value cannot be determined from the media container, then the track attribute value is sourced from data in the track's <code>ContentComponent</code>. If the needed attribute or element does not exist on the <code>ContentComponent</code> (or if the <code>AdaptationSet</code> doesn't contain any <code>ContentComponents</code>), then that attribute or element is sourced from the <code>AdaptationSet</code>:
</p>
<table>
<thead>
<th>Attribute</th>
<th>How to source its value</th>
</thead>
<tr>
<th><code>id</code></th>
<td>
Content of the <code>id</code> attribute in the <code>ContentComponent</code> or <code>AdaptationSet</code> element. Empty string if the <code>id</code> attribute is not present on either element.
</td>
</tr>
<tr>
<th><code>kind</code></th>
<td>
<p>Given a <code>Role</code> scheme of "<code>urn:mpeg:dash:role:2011</code>", determine the <code>kind</code> attribute from the value of the <code>Role</code> descriptors in the <code>ContentComponent</code> <b>and</b> <code>AdaptationSet</code> elements.</p>
<ul>
<li>"<code>alternative</code>": if the role is "<code>alternate</code>" but not also "<code>main</code>" or "<code>commentary</code>", or "<code>dub</code>"</li>
<li>"<code>captions</code>": if the role is "<code>caption</code>" and also "<code>main</code>"</li>
<li>"<code>descriptions</code>": if the role is "<code>description</code>" and also "<code>supplementary</code>"</li>
<li>"<code>main</code>": if the role is "<code>main</code>" but not also "<code>caption</code>", "<code>subtitle</code>", or "<code>dub</code>"</li>
<li>"<code>main-desc</code>": if the role is "<code>main</code>" and also "<code>description</code>"</li>
<li>"<code>sign</code>": not used</li>
<li>"<code>subtitles</code>": if the role is "<code>subtitle</code>" and also "<code>main</code>"</li>
<li>"<code>translation</code>": if the role is "<code>dub</code>" and also "<code>main</code>"</li>
<li>"<code>commentary</code>": if the role is "<code>commentary</code>" but not also "<code>main</code>"</li>
<li>"": otherwise</li>
</ul>
</td>
</tr>
<tr>
<th><code>label</code></th>
<td>
The empty string.
</td>
</tr>
<tr>
<th><code>language</code></th>
<td>
Content of the <code>lang</code> attribute in the <code>ContentComponent</code> or <code>AdaptationSet</code> element.
</td>
</tr>
</table>
</li>
<li><p>Mapping Text Track content into text track cues</p>
<p>
<code>TextTrackCue</code> objects may be sourced from DASH media content in the WebVTT, TTML, MPEG-2 TS or ISOBMFF format.
</p>
<p>
Media content with the MIME type "<code>text/vtt</code>" is in the WebVTT format and should be exposed as a <code>VTTCue</code> object as defined in [[WEBVTT]].
</p>
<p>
Media content with the MIME type "<code>application/ttml+xml</code>" is in the TTML format and should be exposed as an as yet to be defined <code>TTMLCue</code> object. Alternatively, browsers can also map the TTML features to <code>VTTCue</code> objects [[WEBVTT]]. Finally, browsers that cannot render TTML [[ttaf1-dfxp]] format data should expose them as <code>DataCue</code> objects [[HTML51]]. In this case, the TTML file must be parsed in its entirety and then converted into a sequence of TTML Intermediate Synchronic Documents (ISDs). Each ISD creates a <code>DataCue</code> object with attributes sourced as follows (see also the informative example at the end of this section):
</p>
<table>
<thead>
<th>Attribute</th>
<th>How to source its value</th>
</thead>
<tr>
<th><code>id</code></th>
<td>Decimal representation of the <code>id</code> attribute of the <code>head</code> element in the XML document. Null if there is no <code>id</code> attribute.</td>
</tr>
<tr>
<th><code>startTime</code></th>
<td>
Value of the beginning media time of the active temporal interval of the ISD.
</td>
</tr>
<tr>
<th><code>endTime</code></th>
<td>
Value of the ending media time of the active temporal interval of the ISD.
</td>
</tr>
<tr>
<th><code>pauseOnExit</code></th>
<td>"<code>false</code>"</td>
</tr>
<tr>
<th><code>data</code></th>
<td>The (UTF-16 encoded) <code>ArrayBuffer</code> composing the ISD resource.</td>
</tr>
</table>
<p>
Media content with the MIME type "<code>application/mp4</code>" or "<code>video/mp4</code>" is in the [[ISOBMFF]] format and should be exposed following the same rules as for <a href='#ISOBMFF-TT'>ISOBMFF text track</a>.
</p>
<p>
Media content with the MIME type "<code>video/mp2t</code>" is in the MPEG-2 TS format and should be exposed following the same rules as for <a href='#MPEG2TS-TT'>MPEG-2 TS text track</a>.
</p>
</li>
</ol>
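<p class="note">
The following informative sketch shows how a Web application could consume the <code>DataCue</code> objects described in the last item of the list above when the browser cannot render TTML itself. The <code>TextDecoder</code> usage and the <code>renderTTML</code> function are assumptions made for illustration; <code>renderTTML</code> stands for whatever TTML processing the application provides.
</p>
<pre class="example">
var video = document.querySelector('video');
var tracks = video.textTracks;
for (var i = 0; i &lt; tracks.length; i++) {
  var track = tracks[i];
  // kind and language were sourced from the MPD as described in the tables above.
  if (track.kind === 'subtitles' || track.kind === 'captions') {
    track.mode = 'hidden'; // receive cues without user agent rendering
    track.addEventListener('cuechange', function (event) {
      var active = event.target.activeCues;
      for (var j = 0; j &lt; active.length; j++) {
        if ('data' in active[j]) { // a DataCue carrying one TTML ISD
          var isd = new TextDecoder('utf-16').decode(active[j].data);
          renderTTML(isd); // hypothetical application-level TTML renderer
        }
      }
    });
  }
}
</pre>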
</section>
<section id='mpeg2ts'>
<h2>MPEG-2 Transport Streams</h2>
<b>MIME type/subtype: <code>audio/mp2t</code>, <code>video/mp2t</code></b>
<ol>
<li><p>Track Order</p>
<p>
Tracks are called "elementary streams" in an MPEG-2 Transport Stream (TS) [[MPEG2TS]]. The order in which elementary streams are listed in the "Program Map Table" (PMT) of an MPEG-2 TS is maintained when sourcing multiple MPEG-2 tracks into HTML. Additions or deletions of elementary streams in the PMT should cause the user agent to fire <code>addtrack</code> or <code>removetrack</code> events.
</p>
<p class='note'>The order of elementary streams in the PMT may change between when the media resource was created and when it is received by the user agent. Scripts should not infer any information from the ordering, or rely on any particular ordering being present.</p>
</li>
<li><p>Determining the type of track</p>
<p>
A user agent recognizes and supports data in an MPEG-2 TS elementary stream identified by the <code>elementary_PID</code> field in the Program Map Table as being equivalent to an HTML track based on the value of the <code>stream_type</code> field associated with that <code>elementary_PID</code>:
</p>
<ul>
<li>text track:
<ul>
<li>The elementary stream has PID 0x02, or the <code>stream_type</code> value is "0x02", "0x05" or between "0x80" and "0xFF".</li>
<li><dfn id="captionservice">The CEA 708 caption service</dfn> [[CEA708]], as identified by:
<ul>
<li>A <code>caption_service_descriptor</code> [[ATSC65]] in the 'Elementary Stream Descriptors' in the PMT entry for a video stream with stream type 0x02 or 0x1B.</li>
<li>For <code>stream_type</code> 0x02, the presence of caption data in the <code>user_data()</code> field [[ATSC52]].</li>
<li>For <code>stream_type</code> 0x1B, the presence of caption data in the <code>ATSC1_data()</code> field [[SCTE128-1]].</li>
</ul>
</li>
<li>a DVB subtitle component [[DVB-SUB]] as identified by a <code>subtitling_descriptor</code> [[DVB-SI]] in the 'Elementary Stream Descriptors' in the PMT entry for a stream with a <code>stream_type</code> of "0x06"</li>
<li>an ITU-R System B Teletext component [[DVB-TXT]] as identified by a <code>teletext_descriptor</code> [[DVB-SI]] in the 'Elementary Stream Descriptors' in the PMT entry for a stream with a <code>stream_type</code> of "0x06"</li>
<li>a VBI data component [[DVB-VBI]] as identified by a <code>VBI_data_descriptor</code> [[DVB-SI]] or a <code>VBI_teletext_descriptor</code> [[DVB-SI]] in the 'Elementary Stream Descriptors' in the PMT entry for a stream with a <code>stream_type</code> of "0x06"</li>
</ul>
<li>video track: the <code>stream_type</code> value is "0x01", "0x02", "0x10", "0x1B", between "0x1E" and "0x24" or "0xEA".</li>
<li>audio track:
<ul>
<li>the <code>stream_type</code> value is "0x03", "0x04", "0x0F", "0x11", "0x1C", "0x81" or "0x87".</li>
<li>an AC-3 audio component as identified by an <code>AC-3_descriptor</code> [[DVB-SI]] in the 'Elementary Stream Descriptors' in the PMT entry for a stream with a <code>stream_type</code> of "0x06"</li>
<li>an Enhanced AC-3 audio component as identified by an <code>enhanced_ac-3_descriptor</code> [[DVB-SI]] in the 'Elementary Stream Descriptors' in the PMT entry for a stream with a <code>stream_type</code> of "0x06"</li>
<li>a DTS® audio component as identified by a <code>DTS_audio_stream_descriptor</code> [[DVB-SI]] in the 'Elementary Stream Descriptors' in the PMT entry for a stream with a <code>stream_type</code> of "0x06"</li>
<li>a DTS-HD® audio component as identified by a <code>DTS-HD_audio_stream_descriptor</code> [[DVB-SI]] in the 'Elementary Stream Descriptors' in the PMT entry for a stream with a <code>stream_type</code> of "0x06"</li>
</ul>
</ul>
</li>
<li id="mpeg2tstta"><p>Track Attributes for sourced Text Tracks</p>
<table>
<thead>
<th>Attribute</th>
<th>How to source its value</th>
</thead>
<tr>
<th><code>id</code></th>
<td>
Decimal representation of the elementary stream's identifier (<code>elementary_PID</code> field) in the PMT.
<p>
For CEA 608 closed captions, the string "cc" concatenated with the decimal representation of the channel number.
</p>
<p>
For CEA 708 closed captions, the string "sn" concatenated with the decimal representation of the <code>service_number</code> field in the 'Caption Channel Service Block'.
</p>
<p>
If program 0 (zero) is present in the transport stream, a string of the format "OOOO.TTTT.SSSS.CC" consisting of the following, lower-case hexadecimal encoded fields:
<ul>
<li>OOOO is the four character representation of the 16-bit <code>original_network_id</code> [[DVB-SI]].</li>
<li>TTTT is the four character representation of the 16-bit <code>transport_stream_id</code> [[DVB-SI]].</li>
<li>SSSS is the four character representation of the 16-bit <code>service_id</code> [[DVB-SI]].</li>
<li>CC is:
<ul>
<li>If a <code>stream_identifier_descriptor</code> [[DVB-SI]] is present in the PMT, a two character representation of the 8-bit <code>component_tag</code> value.</li>
<li>Otherwise, a four character representation of the elementary stream's identifier (13-bit <code>elementary_PID</code> field) in the PMT.</li>
</ul>
</li>
</ul>
</p>
</td>
</tr>
<tr>
<th><code>kind</code></th>
<td>
<ul>
<li>"<code>captions</code>":
<ul>
<li>For a <a href="#captionservice">CEA708 caption service.</a></li>
<li>for a DVB subtitle component [[DVB-SUB]] as identified by a <code>subtitling_descriptor</code> [[DVB-SI]] in the PMT with a <code>subtitling_type</code> in the range "0x20" to "0x25".</li>
<li>an ITU-R System B Teletext component [[DVB-TXT]] as identified by a <code>teletext_descriptor</code> [[DVB-SI]] with a <code>teletext_type</code> value of "0x05" in the PMT</li>
<li>a VBI data component [[DVB-VBI]] as identified by a <code>VBI_teletext_descriptor</code> [[DVB-SI]] with a <code>teletext_type</code> value of "0x05" in the PMT.</li>
</ul>
<li>"<code>subtitles</code>":
<ul>
<li>If the stream type value is "0x82".</li>
<li>for a DVB subtitle component [[DVB-SUB]] as identified by a <code>subtitling_descriptor</code> [[DVB-SI]] in the PMT with a <code>subtitling_type</code> in the range "0x10" to "0x15".</li>
<li>an ITU-R System B Teletext component [[DVB-TXT]] as identified by a <code>teletext_descriptor</code> [[DVB-SI]] with a <code>teletext_type</code> value of "0x02" in the PMT</li>
<li>a VBI data component [[DVB-VBI]] as identified by a <code>VBI_teletext_descriptor</code> [[DVB-SI]] with a <code>teletext_type</code> value of "0x02" in the PMT.</li>
</ul>
<li>"<code>metadata</code>": otherwise</li>
</ul>
</td>
</tr>
<tr>
<th><code>label</code></th>
<td>
<ul>
<li>If a <code>component_name_descriptor</code> [[ATSC65]] is found immediately after the <code>ES_info_length</code> field in the Program Map Table [[MPEG2TS]], the <code>DOMString</code> representation of the <code>component_name_string</code> in that <code>component_name_descriptor</code>.</li>
<li>If a <code>component_descriptor</code> [[DVB-SI]] for the component is present in the SDT or EIT, the <code>DOMString</code> representation of the content of the text field in that <code>component_descriptor</code></li>
<li>The empty string otherwise.</li>
</ul>
</td>
</tr>
<tr>
<th><code>language</code></th>
<td><code>kind</code> is
<ul>
<li>"<code>captions</code>":
<ul>
<li>For a <a href="#captionservice">CEA708 caption service.</a>
<ul>
<li>Content of the <code>language</code> field for the caption service in the <code>caption_service_descriptor</code>, if present.</li>
<li>Otherwise, for the first caption service, as identified by the <code>service_number</code> field in the <code>service_block</code> [[CEA708]] with a value of 1, the value of <code>language</code> of the audio track where <code>kind</code> has the value "<code>main</code>".</li>
<li>The empty string for all other caption services, as identified by values greater than 1 in the <code>service_number</code> field.</li>
</ul>
</li>
<li>For a DVB subtitle component [[DVB-SUB]], the value of the <code>ISO_639_language_code</code> field in the <code>subtitling_descriptor</code> [[DVB-SI]] in the PMT</li>
<li>For an ITU-R System B Teletext component [[DVB-TXT]], the value of the <code>ISO_639_language_code</code> field in the <code>teletext_descriptor</code> [[DVB-SI]] in the PMT</li>
<li>For a VBI data component [[DVB-VBI]], the value of the <code>ISO_639_language_code</code> field in the <code>VBI_teletext_descriptor</code> [[DVB-SI]] in the PMT</li>
</ul>
</li>
<li>"<code>subtitles</code>":
<ul>
<li> If <code>stream_type</code> value is "0x82", the content of the <code>ISO_639_language_code</code> field in the <code>ISO_639_language_descriptor</code> in the elementary stream descriptor array in the PMT.</li>
<li>for a DVB subtitle component [[DVB-SUB]], the value of the <code>ISO_639_language_code</code> field in the <code>subtitling_descriptor</code> [[DVB-SI]] in the PMT</li>
<li>for an ITU-R System B Teletext component [[DVB-TXT]], the value of the <code>ISO_639_language_code</code> field in the <code>teletext_descriptor</code> [[DVB-SI]] in the PMT</li>
<li>for a VBI data component [[DVB-VBI]], the value of the <code>ISO_639_language_code</code> field in the <code>VBI_teletext_descriptor</code> [[DVB-SI]] in the PMT</li>
</ul>
</li>
<li>"<code>metadata</code>": The empty string.</li>
</ul>
</td>
</tr>
<tr>
<th><code>inBandMetadataTrackDispatchType</code></th>
<td>
If <code>kind</code> is "<code>metadata</code>", then the concatenation of the <code>stream_type</code> byte field in the program map table and <code>ES_info_length</code> bytes following the <code>ES_info_length</code> field expressed in hexadecimal using <a href="https://html.spec.whatwg.org/multipage/infrastructure.html#uppercase-ascii-hex-digits">uppercase ASCII hex digits</a>. The empty string otherwise.
</td>
</tr>
<tr>
<th><code>mode</code></th>
<td>
"<code>disabled</code>"
</td>
</tr>
</table>
</li>
<li id="mpeg2tsavta"><p>Track Attributes for sourced Audio and Video Tracks</p>
<table>
<thead>
<th>Attribute</th>
<th>How to source its value</th>
</thead>
<tr>
<th><code>id</code></th>
<td>
<ul>
<li>Decimal representation of the elementary stream's identifier (<code>elementary_PID</code> field) in the PMT.</li>
<li>If a program 0 (zero) is present in the transport stream, a string of the format "OOOO.TTTT.SSSS.CC" or "OOOO.TTTT.SSSS.CC&CC", consisting of the following, lower-case hexadecimal encoded fields:
<ul>
<li>OOOO is the four character representation of the 16-bit <code>original_network_id</code> [[DVB-SI]].</li>
<li>TTTT is the four character representation of the 16-bit <code>transport_stream_id</code> [[DVB-SI]].</li>
<li>SSSS is the four character representation of the 16-bit <code>service_id</code> [[DVB-SI]].</li>
<li>CC is:
<ul>
<li>If a <code>stream_identifier_descriptor</code> [[DVB-SI]] is present in the PMT, a two character representation of the 8-bit <code>component_tag</code> value.</li>
<li>Otherwise, a four character representation of the elementary stream's identifier (13-bit <code>elementary_PID</code> field) in the PMT.</li>
</ul>
</li>
</ul>
<p>
Where a track is derived from two components, the second form ("CC&CC") identifies the independent and dependent streams, where the first 'CC' identifies the independent stream, and the second 'CC' identifies the dependent stream. Otherwise the first form is used.
</p>
</li>
</ul>
</td>
</tr>
<tr>
<th><code>kind</code></th>
<td>
<ul>
<li>If a <code>supplementary_audio_descriptor</code> [[DVB-SI]] is present in the PMT for an audio component, the value is derived according to the audio purpose defined in table J.3 of [[DVB-SI]] using the following rules:
<ul>
<li>"<code>main</code>" if PSI signalling of audio purpose indicates "Main audio" for the audio track that the user agent would select by default, otherwise to "<code>translation</code>"
<p class='note'>Need to define how UA would select track by default.</p>
</li>
<li>components with an audio purpose of "Audio description (broadcast-mix)" map to "<code>main-desc</code>"</li>
<li>components with an audio purpose of "Audio description (receiver-mix)":
<ul>
<li>The user agent exposes an audio track of <code>kind</code> "<code>main-desc</code>" for each permitted combination of this track with another audio track as defined in annex J.2 of [[DVB-SI]]. Enabling this track results in the combination being presented.</li>
<li>If the user agent can present the stream in isolation, it also exposes an audio track of <code>kind</code> "<code>descriptions</code>" for this audio component.</li>
</ul>
</li>
<li>components with an audio purpose of "Clean audio (broadcast-mix)", "Parametric data dependent stream", or "Unspecific audio for the general audience" map to "<code>alternative</code>"</li>
<li>components with other audio purposes map to the empty string</li>
</ul>
</li>
<li>Otherwise:
<ul>
<li>"<code>descriptions</code>":
<ul>
<li>For AC-3 audio [[ATSC52]] if the <code>bsmod</code> field is 2 and the <code>full_svc</code> field is 0 in the <code>AC-3_audio_stream_descriptor()</code> in the PMT</li>
<li>For E-AC-3 audio [[ATSC52]] if the <code>audio_service_type</code> field is 2 and the <code>full_service_flag</code> is 0 in the <code>E-AC-3_audio_descriptor()</code> in the PMT</li>
<li>For AAC audio [[SCTE193-2]] if the <code>AAC_service_type</code> field is 2 and the <code>receiver_mix_rqd</code> is 1 in the <code>MPEG_AAC_descriptor()</code> in the PMT</li>
</ul>
</li><!-- see http://www.atsc.org/cms/pdf/bootcamp/PSIP_Captions_rev2.pdf -->
<li>"<code>main</code>" if the first audio (video) elementary stream in the PMT and the <code>audio_type</code> field in the <code>ISO_639_language_descriptor</code>, if present, is "0x00" or "0x01"</li>
<li>"<code>main-desc</code>":
<ul>
<li>For AC-3 audio [[ATSC52]] if the <code>bsmod</code> field is 2 and the <code>full_svc</code> field is 1 in the <code>AC-3_audio_stream_descriptor()</code></li>
<li>For E-AC-3 audio [[ATSC52]] if the <code>audio_service_type</code> field is 2 and the <code>full_service_flag</code> is 1 in the <code>E-AC-3_audio_descriptor()</code></li>
<li>For AAC audio [[SCTE193-2]] if the <code>AAC_service_type</code> field is 2 and the <code>receiver_mix_rqd</code> is 0 in the <code>MPEG_AAC_descriptor()</code></li>
</ul>
</li>
<li>"<code>sign</code>" video components with a <code>component_descriptor</code> [[DVB-SI]] in the SDT or EIT, where the <code>stream_content</code> is "0x3" and the <code>component_type</code> is "0x30" or "0x31"</li>
<li>"<code>translation</code>": not first audio elementary stream in the PMT and the <code>audio_type</code> field in the <code>ISO_639_language_descriptor</code> is "0x00" or "0x01" <font color='red'>and bsmod=0</font></li>
<li>"": otherwise</li>
</ul>
</li>
</ul>
</td>
</tr>
<tr>
<th><code>label</code></th>
<td>
<ul>
<li>If a <code>component_descriptor</code> [[DVB-SI]] is present in the SDT or EIT, the <code>DOMString</code> representation of the content of the text field in that <code>component_descriptor</code></li>
<li>If a <code>component_name_descriptor</code> [[ATSC65]] is present for this elementary stream in the Program Map Table [[MPEG2TS]], the <code>DOMString</code> representation of the <code>component_name_string</code> field in that descriptor.</li>
<li> The empty string otherwise.</li>
</ul>
</td>
</tr>
<tr>
<th><code>language</code></th>
<td><code>kind</code> is:
<ul>
<li>"<code>descriptions</code>" or "<code>main-desc</code>": Content of the <code>language</code> field in the <code>AC-3_audio_stream_descriptor</code> or <code>AC-3_audio_stream_descriptor</code> [[ATSC52]] if present.</li>
<li> otherwise: Content of the <code>ISO_639_language_code</code> field in the <code>ISO_639_language_descriptor</code>.</li>
</ul>
</td>
</tr>
</table>
</li>
<li><p><dfn id='MPEG2TS-TT'>Mapping Text Track content into text track cues for MPEG-2 TS</dfn></p>
<p>
MPEG-2 transport streams may contain data that should be exposed as cues on "<code>captions</code>", "<code>subtitles</code>" or "<code>metadata</code>" text tracks. No data is defined that equates to "<code>descriptions</code>" or "<code>chapters</code>" text track cues.
</p>
<ol type=a>
<li><p>Metadata cues</p>
<p>
Cues on an MPEG-2 metadata text track are created as <code>DataCue</code> objects [[HTML51]]. Each <code>section</code> in an elementary stream identified as a text track creates a <code>DataCue</code> object with its <code>TextTrackCue</code> attributes sourced as follows (see also the informative example at the end of this section):
</p>
<table>
<thead>
<th>Attribute</th>
<th>How to source its value</th>
</thead>
<tr>
<th><code>id</code></th>
<td>
The empty string.
</td>
</tr>
<tr>
<th><code>startTime</code></th>
<td>0</td>
</tr>
<tr>
<th><code>endTime</code></th>
<td>
The time, in the media resource timeline, that corresponds to the presentation time of the video frame received immediately prior to the <code>section</code> in the media resource.
</td>
</tr>
<tr>
<th><code>pauseOnExit</code></th>
<td>"<code>false</code>"</td>
</tr>
<tr>
<th><code>data</code></th>
<td>
The entire MPEG-TS section, starting with <code>table_id</code> and ending <code>section_length</code> bytes after the <code>section_length</code> field.
</td>
</tr>
</table>
</li>
<li><p><dfn id='CEA708Cue'>Captions cues</dfn><p>
<ul>
<li>CEA 708
<p>
MPEG-2 TS captions in the CEA 708 format [[CEA708]] are carried in the video stream in Picture User Data [[ATSC53-4]] for <code>stream_type</code> 0x02 and in Supplemental Enhancement Information [[ATSC72-1]] for <code>stream_type</code> 0x1B. Browsers that can render the CEA 708 format should expose the caption data to the web application by mapping the CEA 708 features to <code>VTTCue</code> objects [[VTT708]].
</p>
</li>
<li><p>DVB</p>
<p>
MPEG-2 TS captions in the DVB subtitle format [[DVB-SUB]], ITU-R System B Teletext [[DVB-TXT]] and VBI [[DVB-VBI]] formats are not exposed in a <code>TextTrackCue</code>.
</p>
</li>
</ul>
</li>
<li><p>Subtitles cues</p>
<ul>
<li>SCTE 27
<p>
MPEG-2 TS subtitles in the SCTE 27 format [[SCTE27]] should be exposed as as yet to be specified <code>SCTE27Cue</code> objects. Alternatively, browsers can also map the SCTE 27 features to <code>VTTCue</code> objects via an as yet to be specified mapping process. Finally, browsers that cannot render SCTE 27 subtitles should expose them as <code>DataCue</code> objects [[HTML51]]. In this case, each <code>section</code> in an elementary stream identified as a subtitles text track creates a <code>DataCue</code> object with <code>TextTrackCue</code> attributes sourced as follows:
</p>
<table>
<thead>
<th>Attribute</th>
<th>How to source its value</th>
</thead>
<tr>
<th><code>id</code></th>
<td>
The empty string.
</td>
</tr>
<tr>
<th><code>startTime</code></th>
<td>
The time, in the HTML media resource timeline, that corresponds to the <code>display_in_PTS</code> field in the <code>section</code> data.
</td>
</tr>
<tr>
<th><code>endTime</code></th>
<td>
The sum of the <code>startTime</code> and the <code>display_duration</code> field in the <code>section</code> data expressed in seconds.
</td>
</tr>
<tr>
<th><code>pauseOnExit</code></th>
<td>"<code>false</code>"</td>
</tr>
<tr>
<th><code>data</code></th>
<td>
The entire MPEG-TS section, starting with <code>table_id</code> and ending <code>section_length</code> bytes after the <code>section_length</code> field.
</td>
</tr>
</table>
</li>
<li><p>DVB</p>
<p>
MPEG-2 TS subtitles in the DVB subtitle format [[DVB-SUB]], ITU-R System B Teletext [[DVB-TXT]] and VBI [[DVB-VBI]] formats are not exposed in a <code>TextTrackCue</code>.
</p>
</li>
</ul>
</li>
</ol>
</li>
</ol>
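<p class="note">
The following informative sketch shows how a Web application could use the <code>inBandMetadataTrackDispatchType</code> and <code>DataCue</code> mappings defined above to consume metadata sections from an MPEG-2 TS resource. The dispatch type check and the <code>processSection</code> function are assumptions made for illustration.
</p>
<pre class="example">
var video = document.querySelector('video');
video.textTracks.addEventListener('addtrack', function (event) {
  var track = event.track;
  if (track.kind !== 'metadata') { return; }
  // The dispatch type starts with the stream_type byte in uppercase hex,
  // letting the application decide whether it understands this stream.
  if (track.inBandMetadataTrackDispatchType.indexOf('05') !== 0) { return; } // hypothetical stream_type
  track.mode = 'hidden';
  track.addEventListener('cuechange', function () {
    for (var i = 0; i &lt; track.activeCues.length; i++) {
      var cue = track.activeCues[i]; // one DataCue per MPEG-TS section
      processSection(new Uint8Array(cue.data)); // hypothetical section parser
    }
  });
});
</pre>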
</section>
<section id='mpeg4'>
<h2>MPEG-4 ISOBMFF</h2>
<b>MIME type/subtype: <code>audio/mp4</code>, <code>video/mp4</code>, <code>application/mp4</code></b>
<ol>
<li><p>Track Order</p>
<p>
The order of tracks specified by <code>TrackBox</code> (<code>trak</code>) boxes in the <code>MovieBox</code> (<code>moov</code>) container [[ISOBMFF]] is maintained when sourcing multiple MPEG-4 tracks into HTML.
</p>
</li>
<li><p>Determining the type of track</p>
<p>
A user agent recognises and supports data from a <code>TrackBox</code> as being equivalent to an HTML track based on the value of the <code>handler_type</code> field in the <code>HandlerBox</code> (<code>hdlr</code>) of the <code>MediaBox</code> (<code>mdia</code>) of the <code>TrackBox</code>:
</p>
<ul>
<li>text track:
<ul>
<li>the <code>handler_type</code> value is "<code>meta</code>", "<code>subt</code>" or "<code>text</code>"</li>
<li>the <code>handler_type</code> value is "<code>vide</code>" and an <dfn id="mp4avcceacaption"> ISOBMFF CEA 608 or 708 caption service </dfn> is encapsulated in the video track as an SEI message as defined in [[DASHIFIOP]].</li>
</ul>
<li>video track: the <code>handler_type</code> value is "<code>vide</code>"</li>
<li>audio track: the <code>handler_type</code> value is "<code>soun</code>"</li>
</ul>
</li>
<li id="mpeg4tta"><p>Track Attributes for sourced Text Tracks</p>
<table>
<thead>
<th>Attribute</th>
<th>How to source its value</th>
</thead>
<tr>
<th><code>id</code></th>
<td>
<p>
For <a href="#mp4avcceacaption">ISOBMFF CEA 608 closed captions</a>, the string "cc" concatenated with the decimal representation of the <code>channel_number</code>.
</p>
<p>
For <a href="#mp4avcceacaption">ISOBMFF CEA 708 closed captions</a>, the string "sn" concatenated with the decimal representation of the <code>service_number</code> field in the 'Caption Channel Service Block'.
</p>
<p>
Otherwise, the decimal representation of the <code>track_ID</code> of a <code>TrackHeaderBox</code> (<code>tkhd</code>) in a <code>TrackBox</code> (<code>trak</code>).
</p>
</td>
</tr>
<tr>
<th><code>kind</code></th>
<td><!-- see http://www.mp4ra.org/codecs.html -->
<ul>
<li>"<code>captions</code>":
<ul>
<li><dfn id="WebVTTcaption">WebVTT caption</dfn>: <code>handler_type</code> is "<code>text</code>" and <code>SampleEntry</code> format is <code>WVTTSampleEntry</code> [[ISO14496-30]] and the VTT metadata header <code>Kind</code> is "<code>captions</code>"</li>
<li><dfn id="SMPTETTcaption">SMPTE-TT caption</dfn>: <code>handler_type</code> is "<code>subt</code>" and <code>SampleEntry</code> format is <code>XMLSubtitleSampleEntry</code> [[ISO14496-30]] and the <code>namespace</code> is set to "<code>http://www.smpte-ra.org/schemas/2052-1/2013/smpte-tt#cea708</code>" [[SMPTE2052-11]].</li>
<li>An <a href="#mp4avcceacaption">ISOBMFF CEA 608 or 708 caption service</a>.</li>
<li><dfn id="3GPPcaption">3GPP caption</dfn>: <code>handler_type</code> is "<code>text</code>" and the <code>SampleEntry</code> code (<code>format</code> field) is "<code>tx3g</code>". <p class='note'>Are all sample entries of this type "<code>captions</code>"?</p></li>
</ul>
</li>
<li>"<code>subtitles<code>":
<ul>
<li><dfn id="WebVTTsubtitle">WebVTT subtitle</dfn>: <code>handler_type</code> is "<code>text</code>" and <code>SampleEntry</code> format is <code>WVTTSampleEntry</code> [[ISO14496-30]] and the VTT metadata header <code>Kind</code> is "<code>subtitles</code>"</li>
<li><dfn id="SMPTE-TT subtitle">SMPTE-TT subtitle</dfn>: <code>handler_type</code> is "<code>subt</code>" and <code>SampleEntry</code> format is <code>XMLSubtitleSampleEntry</code> [[ISO14496-30]] and the <code>namespace</code> is set to a TTML namespace that does not indicate a <a href="#SMPTETTcaption">SMPTE-TT caption</a>.</li>
</ul>
</li>
<li>"<code>metadata</code>": otherwise</li>
</ul>
</td>
</tr>
<tr>
<th><code>label</code></th>
<td>
Content of the <code>name</code> field in the <code>HandlerBox</code>.
</td>
</tr>
<tr>
<th><code>language</code></th>
<td>
If the track is an <a href="#mp4avcceacaption">ISOBMFF CEA 608 or 708 caption service</a> then the empty string ("").
<p>
Otherwise, the content of the <code>language</code> field in the <code>MediaHeaderBox</code>.
</p>
<p class='note'>
No signaling is currently defined for specifying the language of CEA 608 or 708 captions in ISOBMFF. MPEG DASH MPDs may specify caption track metadata, including language [[DASHIFIOP]]. The user agent should set the <code>language</code> attribute of CEA 608 or 708 caption text tracks to the empty string so that script may use the media source extensions [[MSE]] <code>TrackDefault</code> object to provide a default for the <code>language</code> attribute.
</p>
</td>
</tr>
<tr>
<th><code>inBandMetadataTrackDispatchType</code></th>
<td>
<ul>
<li><code>kind</code> is "<code>metadata</code>":
<ul>
<li>if a <code>XMLMetaDataSampleEntry</code> box is present the concatenation of the string "<code>metx</code>", a U+0020 SPACE character, and the value of the <code>namespace</code> field</li>
<li>if a <code>TextMetaDataSampleEntry</code> box is present the concatenation of the string "<code>mett</code>", a U+0020 SPACE character, and the value of the <code>mime_format</code> field</li>
<li>otherwise the empty string</li>
</ul>
</li>
<li>otherwise the empty string</li>
</ul>
</td>
</tr>
<tr>
<th><code>mode</code></th>
<td>
"<code>disabled</code>"
</td>
</tr>
</table>
</li>
<li id="mpeg4avta"><p>Track Attributes for sourced Audio and Video Tracks</p>
<table>
<thead>
<th>Attribute</th>
<th>How to source its value</th>
</thead>
<tr>
<th><code>id</code></th>
<td>
Decimal representation of the <code>track_ID</code> of a <code>TrackHeaderBox</code> (<code>tkhd</code>) in a <code>TrackBox</code> (<code>trak</code>).
</td>
</tr>
<tr>
<th><code>kind</code></th>
<td>
<ul>
<li>"<code>alternative</code>": not used</li>
<li>"<code>captions</code>": not used</li>
<li>"<code>descriptions</code>"
<ul>
<li>For E-AC-3 audio [[ETSI102366]] if the <code>bsmod</code> field is 2 and the <code>asvc</code> is 1 in the <code>EC3SpecificBox</code></li>
</ul>
</li>
<li>"<code>main</code>": first audio (video) track</li>
<li>"<code>main-desc</code>
<ul>
<li>For AC-3 audio [[ETSI102366]] if the <code>bsmod</code> field is 2 in the <code>AC3SpecificBox</code></li>
<li>For E-AC-3 audio [[ETSI102366]] if the <code>bsmod</code> field is 2 and the <code>asvc</code> is 0 in the <code>EC3SpecificBox</code></li>
</ul>
</li>
<li>"<code>sign</code>": not used</li>
<li>"<code>subtitles</code>": not used</li>
<li>"<code>translation</code>": not first audio (video) track</li>
<li>"<code>commentary</code>": not used</li>
<li>"": otherwise</li>
</ul>
</td>
</tr>
<tr>
<th><code>label</code></th>
<td>
Content of the <code>name</code> field in the <code>HandlerBox</code>.
</td>
</tr>
<tr>
<th><code>language</code></th>
<td>
Content of the <code>language</code> field in the <code>MediaHeaderBox</code>.
</td>
</tr>
</table>
</li>
<li><p><dfn id='ISOBMFF-TT'>Mapping Text Track content into text track cues for MPEG-4 ISOBMFF</dfn></p>
<p>
[[ISOBMFF]] text tracks may be in the WebVTT or TTML format [[ISO14496-30]], 3GPP Timed Text format [[3GPP-TT]], or other format.
</p>
<p>
[[ISOBMFF]] text tracks carry WebVTT data if the media handler type is "<code>text</code>" and a <code>WVTTSampleEntry</code> format is used, as described in [[ISO14496-30]]. Browsers that can render text tracks in the WebVTT format should expose a <code>VTTCue</code> [[WEBVTT]] as follows (see also the informative example at the end of this section):
</p>
<p>
<table>
<thead>
<th>Attribute</th>
<th>How to source its value</th>
</thead>
<tr>
<th><code>id</code></th>
<td>
The <code>cue_id</code> field in the <code>CueIDBox</code>.
</td>
</tr>
<tr>
<th><code>startTime</code></th>
<td>
The sample presentation time.
</td>
</tr>
<tr>
<th><code>endTime</code></th>
<td>
The sum of the <code>startTime</code> and the sample duration.
</td>
</tr>
<tr>
<th><code>pauseOnExit</code></th>
<td>"<code>false</code>"</td>
</tr>
<tr>
<th>cue setting attributes</th>
<td>
The <code>settings</code> field in the <code>CueSettingsBox</code>.
</td>
</tr>
<tr>
<th><code>text</code></th>
<td>
The <code>cue_text</code> field in the <code>CuePayloadBox</code>.
</td>
</tr>
</table>
</p>
<p>
[[ISOBMFF]] captions in the CEA 708 format [[CEA708]] are carried in the video stream in SEI messages [[DASHIFIOP]]. Browsers that can render the CEA 708 format should expose the caption data to the web application by mapping the CEA 708 features to VTTCue objects [[VTT708]].
</p>
<p>
ISOBMFF text tracks carry TTML data if the media handler type is "<code>subt</code>" and an <code>XMLSubtitleSampleEntry</code> format is used with a TTML-based <code>name_space</code> field, as described in [[ISO14496-30]]. Browsers that can render text tracks in the TTML format should expose an as yet to be defined <code>TTMLCue</code>. Alternatively, browsers can also map the TTML features to <code>VTTCue</code> objects. Finally, browsers that cannot render TTML [[ttaf1-dfxp]] format data should expose them as <code>DataCue</code> objects [[HTML51]]. Each TTML subtitle sample consists of an XML document and creates a <code>DataCue</code> object with attributes sourced as follows:
</p>
<table>
<thead>
<th>Attribute</th>
<th>How to source its value</th>
</thead>
<tr>
<th><code>id</code></th>
<td>Decimal representation of the <code>id</code> attribute of the <code>head</code> element in the XML document. Null if there is no <code>id</code> attribute.</td>
</tr>
<tr>
<th><code>startTime</code></th>
<td>
Value of the beginning media time of the top-level temporal interval of the XML document.
</td>
</tr>
<tr>
<th><code>endTime</code></th>
<td>
Value of the ending media time of the top-level temporal interval of the XML document.
</td>
</tr>
<tr>
<th><code>pauseOnExit</code></th>
<td>"<code>false</code>"</td>
</tr>
<tr>
<th><code>data</code></th>
<td>The (UTF-16 encoded) <code>ArrayBuffer</code> composing the XML document.</td>
</tr>
</table>
<p>
TTML data may contain tunneled CEA708 captions [[SMPTE2052-11]]. Browsers that can render CEA708 data should expose it as defined for <a href='#CEA708Cue'>MPEG-2 TS CEA708 cues</a>.
</p>
<p>
3GPP timed text data is carried in [[ISOBMFF]] as described in [[3GPP-TT]]. Browsers that can render text tracks in the 3GPP Timed Text format should expose an as yet to be defined <code>3GPPCue</code>. Alternatively, browsers can also map the 3GPP features to <code>VTTCue</code> objects.
</p>
</li>
</ol>
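<p class="note">
The following informative sketch shows a Web application enabling an in-band WebVTT subtitle track sourced from an ISOBMFF resource, using the attribute mappings defined above. The element selector and the language preference are assumptions made for illustration.
</p>
<pre class="example">
var video = document.querySelector('video');
var tracks = video.textTracks;
for (var i = 0; i &lt; tracks.length; i++) {
  // kind comes from the WVTTSampleEntry Kind header, id from the track_ID,
  // label from the HandlerBox name and language from the MediaHeaderBox.
  if (tracks[i].kind === 'subtitles') {
    if (tracks[i].language === 'en') { // hypothetical language preference
      tracks[i].mode = 'showing'; // the user agent renders the VTTCue objects
    }
  }
}
</pre>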
</section>
<section id='webm'>
<h2>WebM</h2>
<b>MIME type/subtype: <code>audio/webm</code>, <code>video/webm</code></b>
<ol>
<li><p>Track Order</p>
<p>
The order of tracks specified in the EBML initialisation segment [[WebM]] is maintained when sourcing multiple WebM tracks into HTML.
</p>
</li>
<li><p>Determining the type of track</p>
<p>
A user agent recognises and supports data from a WebM resource as being equivalent to an HTML track based on the value of the <code>TrackType</code> field of the track in the Segment info:
</p>
<ul>
<li>text track: <code>TrackType</code> field is "0x11" or "0x21"</li>
<li>video track: <code>TrackType</code> field is "0x01"</li>
<li>audio track: <code>TrackType</code> field is "0x02"</li>
</ul>
</li>