<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Web Media Application Developer Guidelines</title>
<script
src="https://www.w3.org/Tools/respec/respec-w3c-common"
class='remove'></script>
<script class="remove">
var respecConfig = {
specStatus: "CG-DRAFT",
editors: [{
name: "Jeff Burtoft",
url: "mailto:[email protected]",
company: "Microsoft",
companyURL: "http://www.microsoft.com"
},{
name: "Thasso Greibel",
url: "mailto:[email protected]",
company: "Cast Labs",
companyURL: "http://www.castlabs.com"
},{
name: "Joel Korpi",
url: "mailto:[email protected]",
company: "JW Player",
companyURL: "http://www.jwplayer.com"
}],
processVersion: 2015,
edDraftURI: "http://w3c.github.io/webmediaguidelines",
shortName: "dahut",
wg: "Web Media API Community Group",
wgURI: "https://www.w3.org/community/webmediaapi/",
};
</script>
</head>
<body>
<section id="abstract">
<p>
This specification is a companion guide to the Web Media API spec. While the Web Media API spec is targeted at device implementations that support media web apps in 2017, this specification outlines best practices and developer guidance for implementing web media apps. This specification should be updated at least annually to keep pace with the evolving Web platform. The target devices will include any device that runs a modern HTML user agent, including televisions, game machines, set-top boxes, mobile devices and personal computers.
</p>
<p>
The goal of this Web Media API Community Group specification is to transition to the W3C Recommendation Track for standards development.
</p>
</section>
<section id="sotd"></section>
<section>
<h2>Introduction</h2>
<ol class="ednote" title="Notes on v1 draft specification:">
<li>This document is directed towards application developers. It contains best practices for building media applications across devices; it does not provide direction to device manufacturers or User Agent implementers. </li>
<li>This is a companion spec put forth by the Web Media API Community Group.</li>
</ol>
<h3>Scope</h3>
<p>The scope of this document includes general guidelines, best practices, and examples for building media applications across web browsers and devices.
</p>
<p>The target audience for these guidelines is software developers and engineers focused on building cross-platform, cross-device, HTML5-based applications that contain media-specific use cases.</p>
<p>The focus of this document is on HTML5-based applications; however, the use cases and principles described in the guidelines can also be applied to native applications (applications that have been developed for use on a particular platform or device). The examples in this document provide a starting point for building your media application and include example implementations from various providers and vendors. This document also includes sample content and manifests, as well as encoding guidelines that provide hints on achieving the best quality and efficiency for your media applications.</p>
<h3>Definitions</h3>
<p>TBD</p>
</section>
<section>
<h2>Media Playback Use Cases</h2>
<p>
</p>
<section>
<h3>Streaming overview</h3>
<p class="ednote">
Adding a brief introductory section removes the need for duplication of material across the VOD and linear sections. Here we can outline the broad mechanics that are shared by the two use cases, allowing us to focus on the distinctions in the respective sections.
</p>
<h4>General Description</h4>
<p>Material (typically video or audio content) is made available by a content provider via a web-enabled application and delivered by a content distribution network (CDN). There are three distinct, interlocking processes: generation, delivery and consumption/playback.
</p>
<h4>Content Generation</h4>
<p>
Content is normally delivered as a source file with near-lossless compression (e.g. 25 Mbps to 100 Mbps).
</p>
<p>
Content is generated by taking this source and encoding it with reference to an encoding profile. First the content is encoded into multiple versions at different quality levels. After this, each version is split temporally into segments.
</p>
<p>
The first process is defined by an encoding profile. Profiles describe a set of constraints to be used when video is being prepared for consumption by a range of video applications. The description includes the different bitrates to be generated during the encoding process that will allow for the same content to be consumed on a wide variety of devices in different networked scenarios from cellular to LAN.
</p>
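<p>
For illustration only, an encoding profile might define a ladder such as the following; the exact bitrates, resolutions and codec profiles are assumptions and will vary by service and device reach:
</p>
<pre class="example" title="Illustrative ABR encoding ladder">
1080p @ 5800 kbps (H.264 High)
 720p @ 3500 kbps (H.264 High)
 720p @ 2200 kbps (H.264 Main)
 480p @ 1200 kbps (H.264 Main)
 360p @  700 kbps (H.264 Baseline)
 240p @  300 kbps (H.264 Baseline)
</pre>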
<p>
The second process is performed by a packager, which segments the encoded files of differing qualities. The segments are packaged into a transport format such as MPEG-2 transport streams (.ts) or fragmented MP4 (.m4s); after this they are encrypted with a DRM that is suitable for the environment where the content is going to be played out. The packager is also responsible for generating a manifest file, typically DASH (.mpd) or HLS (.m3u8), or possibly Smooth Streaming (.ism) or HDS (.f4m), which describes the location of the media and its format.
</p>
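<p>
As a sketch of the packager output, the following is a minimal HLS master playlist; the renditions, URIs and codec strings are illustrative only, and a DASH packager would instead emit an equivalent .mpd:
</p>
<pre class="example" title="Illustrative HLS master playlist (.m3u8)">
#EXTM3U
#EXT-X-VERSION:4
#EXT-X-STREAM-INF:BANDWIDTH=5800000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2200000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2"
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=700000,RESOLUTION=640x360,CODECS="avc1.42c01e,mp4a.40.2"
360p/index.m3u8
</pre>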
<h4>Content Delivery</h4>
<p>
After the content has been generated the resulting segments of video and corresponding manifest files are pushed to an origin server. However, the assets are rarely delivered directly from the origin.
</p>
<p>
At this point control in the chain switches to the client web video application. The content provider supplies the client with the URL of a manifest file located on a CDN rather than the origin.
The manifest URL is typically passed to a player. The player makes a GET request for the manifest to an edge server in the CDN; which edge server is used is determined by DNS. The CDN does one of two things: if it has the asset it returns it to the player; if it does not, it requests it from the origin and then returns it to the client.
</p>
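<p>
A minimal sketch of the client side of this exchange, assuming a hypothetical CDN hostname and manifest path:
</p>
<pre class="example" title="Fetching a manifest from a CDN edge (illustrative)">
// The manifest URL supplied by the content provider; hostname and path are hypothetical.
const manifestUrl = 'https://cdn.example.com/vod/title-1234/master.m3u8';

// The player (or the application on its behalf) issues a GET request.
// DNS resolution of cdn.example.com selects the edge server that answers.
fetch(manifestUrl)
  .then(response => {
    if (!response.ok) {
      throw new Error('Manifest request failed: ' + response.status);
    }
    return response.text();
  })
  .then(manifestText => {
    // Hand the manifest to the player for parsing (the player API is application-specific).
    console.log('Manifest received, length:', manifestText.length);
  })
  .catch(err => console.error(err));
</pre>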
<h4>Content Playback</h4>
<p>
Once the player has the manifest it parses it. At this point the behaviour differs between players found on different devices, depending on the transport formats. However, broadly, it behaves in the following way:
</p>
<ol>
<li>The client uses the DRM license URL to request a secure key to enable decoding of the media.</li>
<li>The player's ABR algorithm determines the bandwidth available to the client by examining the response times associated with the request for the first chunk of video: how many bytes were received in what time period. This provides enough information to determine the playback quality that the player can sustain over a proportion of the length of the asset, or in the case of live streaming a specific timeframe.</li>
<li>Once the player has this information it can compare it with the metadata from the manifest that describes the different qualities that the content provider is supplying. It picks the quality level with an average bitrate that is as close as possible to the available bitrate while remaining within its bounds. This avoids a situation where a consistent experience is interrupted because the player requires more data than the current network bandwidth can supply, i.e. where the player's buffer is emptying faster than it is being filled.</li>
<li>It then requests a segment from a location on the edge server that is typically relative to the location of the manifest.</li>
<li>Once it has received the segment, the segment is typically decrypted in accordance with the specific DRM used.</li>
<li>It then adds it to the player's video buffer (a sketch using Media Source Extensions follows this list).</li>
<li>The media engine pulls the video data from the buffer and passes it to the video surface where it is played out.</li>
<li>In a situation where the bandwidth availability remains constant the player will continue to request chunks from the same quality stream. In the event of a change in network performance the player will decide whether to drop to a lower quality stream or request a higher quality one.</li></ol>
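<p>
The buffer-feeding steps above can be sketched with Media Source Extensions. The segment URLs, MIME type and codec string below are assumptions for illustration; a production player also handles ABR switching, timestamp offsets and error recovery:
</p>
<pre class="example" title="Appending fetched segments with Media Source Extensions (illustrative)">
const video = document.querySelector('video');
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  // The codec string must match the packaged content; this one is an assumption.
  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.4d401f, mp4a.40.2"');

  // Initialization segment followed by the first media segment (hypothetical URLs).
  for (const url of ['init.mp4', 'segment-0001.m4s']) {
    const response = await fetch(url);
    const data = await response.arrayBuffer();
    sourceBuffer.appendBuffer(data);
    // Wait for the buffer to finish processing before appending the next segment.
    await new Promise(resolve =>
      sourceBuffer.addEventListener('updateend', resolve, { once: true }));
  }
});
</pre>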
</section>
<section>
<h3>On-Demand Streaming</h3>
<p>
Despite the almost identical mechanics used for the two types of content in their generation, delivery and playout, a large organisation will typically maintain two distinct workflows for VOD and linear content, as there are subtle but important ways in which they differ.
</p>
<h4>Content Generation</h4>
<p>
For VOD the source is typically tape rather than a live feed. The encoding profiles are also subtly different: a greater priority can be placed on high quality because latency (time to live) is not a requirement. To this end the encoder can be configured to prioritise density over encoding speed, allowing VOD encoders to spend more time on each frame. There are important differences in the manifests created. In HLS there is a tag that tells the player whether the playlist is describing on-demand material: #EXT-X-PLAYLIST-TYPE:VOD. As we will see shortly, this is used by the player. There are also client-side restrictions where certain profiles are blocked due to rights restrictions, with bitrates capped to limit network consumption on movies and entertainment whilst being allowed on sports content. Fragment size will also affect playout, as a player's ABR can be more responsive if the chunks are smaller, e.g. 2 seconds rather than 10 seconds.
</p>
<h4>Content Delivery</h4>
<p>
The CDN configuration and topology for delivering VOD content is also different to linear. There are different levels of caching: popular VOD content is kept closer to the edge of the CDN network, so it can be delivered to customers faster than from a server that isn't tuned for high-volume delivery. Older and less popular content is retained in mid-tier caching, whilst long-tail content is relegated to a lower tier.
</p>
<h4>Content Playback</h4>
<p>
As mentioned in the content generation section, the player uses a tag within the manifest to determine the playout type. In HLS, if the playlist type is VOD then the player will not reload the manifest. This has important consequences if there are changes in availability after the session has commenced. In DASH the differences between a live and a VOD manifest are more subtle; principally, the MPD type is "static" for on-demand content and "dynamic" for live. </p>
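<p>
For illustration, the signals a player reads to distinguish on-demand from live content look like this; the surrounding attributes are omitted or assumed:
</p>
<pre class="example" title="On-demand signalling in HLS and DASH (illustrative)">
HLS media playlist (on-demand):
  #EXTM3U
  #EXT-X-PLAYLIST-TYPE:VOD
  #EXT-X-TARGETDURATION:6
  ...
  #EXT-X-ENDLIST

DASH MPD (on-demand):
  &lt;MPD type="static" mediaPresentationDuration="PT1H32M" ...&gt;

DASH MPD (live):
  &lt;MPD type="dynamic" minimumUpdatePeriod="PT6S" ...&gt;
</pre>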
<p>
There are other differences in playout as well. Unlike linear, a VOD asset has a predefined duration; information about duration and current time can be used to update the UI to provide feedback to the user on the amount and proportion of the asset watched.
</p>
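<p>
A minimal sketch of driving such UI feedback from the media element; the element IDs are hypothetical:
</p>
<pre class="example" title="Updating a progress indicator from duration and currentTime (illustrative)">
const video = document.querySelector('video');
const progressBar = document.getElementById('progress');   // hypothetical progress element (max = 1)
const timeLabel = document.getElementById('time-label');    // hypothetical text element

video.addEventListener('timeupdate', () => {
  if (!Number.isFinite(video.duration)) return;  // duration may be Infinity for live streams
  progressBar.value = video.currentTime / video.duration;
  timeLabel.textContent =
    Math.floor(video.currentTime) + 's of ' + Math.floor(video.duration) + 's';
});
</pre>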
<p>
At a broader level the UX requirements will be different with respect to representing static rather than linear content, where a tile view rather than an EPG is required. There is also a requirement for trick play.
</p>
<h3>VOD use cases</h3>
<p>
In the previous section we outlined the use cases associated with video streaming. In this section we give some examples of use cases that are specific to on-demand streaming and are mainly related to strategies employed on the clients to improve performance in some way.
</p>
<h4> Pre-caching </h4>
<p>
The key performance indicator for most streaming services will be the percentage of sessions that experience buffering as a ratio to the length of the session. Buffering (the state of the video application when the player has insufficient content within its buffer to continuously play content) within a session has a direct relationship to engagement and, as a consequence, retention. For every second of buffering within a session, 10% of users abandon a video stream. Pre-caching is a strategy used in on-demand streaming. Web video application developers will use points within an application's UX to pre-cache content; for example, when the user enters a mezzanine/synopsis page the application might connect to a stream and begin to pull content, and either add it directly to the player's video buffer or store the chunks locally. The consequence of this is that when/if the user chooses to play the content after reading the synopsis, the video will commence playing without buffering and hence provide the user with a preferable experience to a buffering indicator. This technique is used by Netflix, Sky and the BBC in the case of on-demand content being watched in an in-home context. It is not used for cellular sessions, where users' mobile data would potentially be consumed on content that they do not watch.
</p>
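<p>
A sketch of pre-caching the first few segments when a synopsis page is shown, using the Cache Storage API; the cache name, segment URLs and trigger point are assumptions, and a real application would also respect the user's connection type and data-saver preferences:
</p>
<pre class="example" title="Pre-caching initial segments on a synopsis page (illustrative)">
async function precacheInitialSegments(segmentUrls) {
  // Skip pre-caching on cellular connections, where that information is available.
  const connection = navigator.connection;
  if (connection && connection.type === 'cellular') return;

  const cache = await caches.open('precache-v1');            // hypothetical cache name
  await Promise.all(segmentUrls.map(url => cache.add(url))); // fetch and store each segment
}

// Hypothetical trigger: the user opens the synopsis page for a title.
precacheInitialSegments([
  'https://cdn.example.com/vod/title-1234/init.mp4',
  'https://cdn.example.com/vod/title-1234/segment-0001.m4s',
  'https://cdn.example.com/vod/title-1234/segment-0002.m4s'
]);
</pre>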
<h4>
Caching </h4>
<p>
Like pre-caching, this technique is used to reduce buffering. In this scenario the user has consumed a piece of content and the chunks persist in a local HTTP cache. As a consequence they do not need to be downloaded again if the user watches the same content subsequently. The amount of time the material can remain cached is defined in relation to the rights the service provider holds to the content. These rights will be different for different types of content and in different territories.
</p>
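<p>
How long segments may persist in the local HTTP cache is typically controlled with standard response headers set at the origin or CDN; the retention period below is an arbitrary example:
</p>
<pre class="example" title="Example cache lifetime on a segment response (illustrative)">
HTTP/1.1 200 OK
Content-Type: video/mp4
Cache-Control: public, max-age=86400
ETag: "seg-0001-v2"
</pre>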
<h4> Bitrate capping </h4>
<p> Despite a device's capability to play out a stream of a higher bitrate, there are circumstances where the application developer may wish to cap the bitrate. An example of this relates to rights. The service provider is often required to limit the availability of content to below an agreed bitrate/resolution. This is done to comply with business logic in respect of product differentiation. As an example, a content owner supplies a new title, e.g. Wolverine, to a service provider for distribution on the proviso that this high-value asset cannot be made available in a home or cellular context above 3.5 Mbps. This allows them to make a UHD or 4K stream available within a separate product that is differentiated as ‘premium’.
The web video application developer has to restrict the playout on the client. The level of control they have is dependent on the video player: on iOS, AVFoundation (the native player framework) provides the preferredPeakBitRate property to restrict consumption; on Android, commercial players such as NexPlayer offer similar controls; in JavaScript the developer will typically parse the manifest, DASH or HLS, and use a regex or another filter to remove available streams prior to passing the array of streams over to the ABR.
</p>
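<p>
A sketch of capping bitrate on the web by filtering variant streams from a parsed HLS master playlist before handing them to the player's ABR; the parsing and player hand-off functions are hypothetical, and some player libraries expose an equivalent configuration option directly:
</p>
<pre class="example" title="Filtering variants above an agreed bitrate cap (illustrative)">
const BITRATE_CAP = 3500000; // 3.5 Mbps, per the rights agreement in the example above

function capVariants(masterPlaylistText) {
  // parseMasterPlaylist() is a hypothetical helper returning
  // [{ bandwidth: Number, uri: String }, ...] from the playlist text.
  const variants = parseMasterPlaylist(masterPlaylistText);

  // Keep only renditions at or below the agreed cap.
  const allowed = variants.filter(v => v.bandwidth <= BITRATE_CAP);

  // Fall back to the lowest rendition if everything exceeds the cap.
  return allowed.length > 0
    ? allowed
    : [variants.reduce((a, b) => (a.bandwidth < b.bandwidth ? a : b))];
}

// initialisePlayerWithVariants() is a hypothetical hand-off to the player's ABR.
// initialisePlayerWithVariants(capVariants(manifestText));
</pre>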
<h4> Client-side ad insertion </h4>
<p> Client-side ad insertion (CSAI) typically involves using the language available in the client runtime, JavaScript in the case of the web. As an example, an ad-serving vendor provides a client library. Close to the player's initiation the client library makes its API available. The web application listens to events associated with playback, for example the video element's media event ‘playing’. The web application then calls the DOM pause method on the video element and then calls the play method provided by the client-side ad library, passing it the ‘id’ of the video asset. This is then returned to the vendor, possibly along with other identifiers that can be used to target the audience. At this stage an auction is performed, with business logic at the ad vendor determining which provider supplies the ad (this is a complex topic and outside the scope of this document). The vendor responds with a VAST (Video Ad Serving Template) payload that includes the URI of the ad content appropriate for the playback environment. In some cases there is no ad; if this is the case the user is presented with the content they originally requested and control is passed back to the web video application. If there is an ad targeted against the content then the library performs DOM manipulation and injects a new video element into the document; this is typically accompanied by a further script that provides the vendor with insights based on the current session. The ad plays. The ad object will conform to VPAID (Video Player-Ad Interface Definition) and present a standardised interface to the player for possible interaction; it will issue a standard set of events which the web application can listen to. In response to an ‘adEnded’ event the local library will tear down the injected DOM elements and in turn issue an event that the web application can use to trigger a return to playing the original content.
</p>
<p>
In the case of live linear CSAI, WebSocket technology is often used. If the content has a dynamic nature, for example a sports event or a fashion show, a WebSocket connection is established and ads can then be ‘pushed’ when the editorial team thinks it is appropriate. The web video application listens to the library that establishes and manages the connection. An event is pushed to the listening clients simultaneously; this event is then used to trigger the same set of events discussed above.
</p>
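<p>
A sketch of the hand-off between content and an ad break, assuming a hypothetical vendor library exposed as window.adLib with a play() method and an 'adEnded' event; real vendor SDKs differ in naming and flow:
</p>
<pre class="example" title="Client-side ad insertion hand-off (illustrative)">
const contentVideo = document.querySelector('video');

contentVideo.addEventListener('playing', () => {
  // Pause the content and hand control to the ad library with the asset id.
  contentVideo.pause();
  window.adLib.play({ assetId: 'title-1234' });   // hypothetical vendor API
}, { once: true });

// The library injects its own video element, plays the ad and emits events.
window.adLib.addEventListener('adEnded', () => {
  // The library tears down its injected DOM elements; resume the original content.
  contentVideo.play();
});
</pre>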
<p>Other client-side considerations include:</p>
<ul>
<li>Buffer manipulation (device / storage dependent)</li>
<li>VOD assets remain at the edge longer</li>
<li>Persistent unique ID exposed via JS (signed-code issue)</li>
<li>HTTP byte-range requests</li>
<li>HTTPS support</li>
<li>Player event exposure</li>
<li>HDCP security requirement for HDMI</li>
<li>GSMA flag at DRM rights</li>
<li>Watermarking</li>
<li>Hardware acceleration support</li>
</ul>
</section>
<section>
<h3>Live Streaming</h3>
<p>TBD
</p>
</section>
<section>
<h3>Live Streaming with Server Side Ad Insertion</h3>
<p>TBD
</p>
</section>
<section>
<h3>Live Streaming with Client Side Ad Insertion</h3>
<p>TBD
</p>
</section>
<section>
<h3>Live Linear Streaming</h3>
<p>TBD
</p>
</section>
<section>
<h3>Live Linear Streaming with Client Side Ad Insertion</h3>
<p>TBD
</p>
</section>
<section>
<h3>On-Demand Streaming with Trick Mode</h3>
<p>TBD
</p>
</section>
<section>
<h3>Live Streaming with Trick Mode</h3>
<p>TBD
</p>
</section>
<section>
<h3>On-Demand Streaming with Thumbnail Navigation</h3>
<p>TBD
</p>
</section>
<section>
<h3>Live Streaming with Thumbnail Navigation</h3>
<p>TBD
</p>
</section>
</section>
<section>
<h2>Media Playback Methods</h2>
<p>TBD
</p>
<section>
<h3>Device Identification</h3>
<p>TBD
</p>
</section>
<section>
<h3>Device Media Profile Support</h3>
<p>TBD
</p>
</section>
<section>
<h3>Device Key System Support</h3>
<p>TBD
</p>
</section>
<section>
<h3>Device Content Protection Capabilities</h3>
<p>How to determine the content protection capabilities of a device.
</p>
</section>
<section>
<h3>Using Encrypted Media Extensions</h3>
<p>TBD
</p>
</section>
<section>
<h3>Using Media Source Extensions</h3>
<p>TBD
</p>
</section>
</section>
<section>
<h2>Content Encoding Guidelines</h2>
<p>
</p>
<section>
<h3>For On-Demand Streaming</h3>
<p>TBD
</p>
<section>
<h4>Media Encoding</h4>
<p>TBD</p>
</section>
<section>
<h4>Manifest Preparation</h4>
<p>TBD</p>
</section>
</section>
<section>
<h3>For Live Streaming</h3>
<p>TBD
</p>
<section>
<h4>Media Encoding</h4>
<p>TBD</p>
</section>
<section>
<h4>Manifest Preparation</h4>
<p>TBD</p>
</section>
</section>
<section>
<h3>Ad Encoding for On-demand Content</h3>
<p>TBD
</p>
<section>
<h4>Media Encoding</h4>
<p>TBD</p>
</section>
<section>
<h4>Manifest Preparation</h4>
<p>TBD</p>
</section>
</section>
<section>
<h3>Ad Encoding for Live Streaming with Server Side Ad-insertion</h3>
<p>TBD
</p>
<section>
<h4>Media Encoding</h4>
<p>TBD</p>
</section>
<section>
<h4>Manifest Preparation</h4>
<p>TBD</p>
</section>
</section>
<section>
<h3>Trick Mode Track</h3>
<p>TBD
</p>
<section>
<h4>Media Encoding</h4>
<p>TBD</p>
</section>
<section>
<h4>Manifest Preparation</h4>
<p>TBD</p>
</section>
</section>
<section>
<h3>Thumbnails Track </h3>
<p>TBD
</p>
<section>
<h4>Media Encoding</h4>
<p>TBD</p>
</section>
<section>
<h4>Manifest Preparation</h4>
<p>TBD</p>
</section>
</section>
</section>
</body>
</html>