Authors: Sam Dallstream, Greg Whitworth, Rahul Singh
This document is intended as a starting point for engaging the community and standards bodies in developing collaborative solutions fit for standardization. As the solutions to problems described in this document progress along the standards-track, we will retain this document as an archive and use this section to keep the community up-to-date with the most current standards venue and content location of future work and discussions.
- This document status: Active
- Expected venue: W3C mst-content-hint
- Current version: this document
The Audio Category is a proposed addition to the mst-content-hint spec that will allow websites to set a contentHint on a MediaStreamTrack to specify that the track is meant for speech recognition by a machine. The contentHint value we are proposing is speechRecognition.
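Because speechRecognition is a proposed value, a page may want to check whether the user agent accepts it before relying on it. The sketch below assumes, as we understand the mst-content-hint draft, that the contentHint setter ignores values it does not recognize for the track's kind, so reading the attribute back reveals whether the assignment took effect. The helper name supportsContentHint is hypothetical.

```javascript
// Hypothetical helper: probe whether the user agent accepts a contentHint value.
// Assumes the setter leaves the attribute unchanged for unrecognized values.
function supportsContentHint(track, hint) {
  const previous = track.contentHint;
  track.contentHint = hint;
  const accepted = track.contentHint === hint;
  // Restore the original hint so the probe has no lasting side effect.
  track.contentHint = previous;
  return accepted;
}
```

For example, `supportsContentHint(mediaStream.getAudioTracks()[0], 'speechRecognition')` returns false in a user agent that predates this proposal.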
We believe there is a general need to differentiate between streams intended for human consumption and streams intended for transcription by a machine, because the optimizations applied in each scenario differ in many ways. Specifically, the requirements for communication between humans can be found in the ETSI TS 126 131 specification; they include noise suppression optimizations, such as the addition of pink noise to increase user satisfaction, that are in direct opposition to the needs of a speech recognition system. There is also a draft of testing methods for speech recognition systems, STQ63-260v0210, that outlines some of the different requirements for those systems.
The proposed solution below was inspired by the categories that Windows offers for audio streams. These categories allow you to specify what kind of audio stream you want (for example, "speech" when someone is dictating into a mic), which gives the operating system a chance to optimize the stream for that type of input. After some research, we found that similar categories exist across Android, iOS, and, of course, Windows.
We plan to follow the lead of native applications across Android, iOS, and Windows and extend the list of contentHint values the developer can choose from when working with a stream. We will adapt this to the web by modifying the mst-content-hint API. On operating systems, such as macOS, that do not have one-to-one mappings for these categories, we will take a best-effort approach to applying them.
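To illustrate the best-effort approach, the sketch below shows one way a user agent might translate the proposed hint into a native capture category. The table, the function name nativeCategoryFor, and the platform keys are hypothetical; only the native identifiers (the Windows AUDIO_STREAM_CATEGORY value AudioCategory_Speech and Android's MediaRecorder.AudioSource.VOICE_RECOGNITION) are real platform names, and no browser is claimed to work this way.

```javascript
// Hypothetical user-agent-side mapping from the proposed hint to a native
// capture category. Illustrative only; not part of any spec or browser.
const NATIVE_SPEECH_RECOGNITION_CATEGORY = new Map([
  ["windows", "AudioCategory_Speech"],                        // AUDIO_STREAM_CATEGORY
  ["android", "MediaRecorder.AudioSource.VOICE_RECOGNITION"], // capture audio source
]);

function nativeCategoryFor(hint, platform) {
  if (hint !== "speechRecognition") {
    return null; // In this sketch, other hints keep the platform default.
  }
  // Platforms without a one-to-one category (e.g. macOS) yield null, leaving
  // the user agent to adjust its own processing on a best-effort basis.
  return NATIVE_SPEECH_RECOGNITION_CATEGORY.get(platform) ?? null;
}
```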
Add the speechRecognition option to contentHint for audio tracks.
partial interface MediaStreamTrack {
attribute DOMString contentHint;
};
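// Request a microphone stream.
const constraints = {volume: 1};
navigator.mediaDevices.getUserMedia({ audio: constraints })
  .then(handleMediaStreamAcquired, handleMediaStreamAcquiredError);

// Mark the captured audio track as intended for machine speech recognition.
function handleMediaStreamAcquired(mediaStream) {
  const [audioTrack] = mediaStream.getAudioTracks();
  audioTrack.contentHint = 'speechRecognition';
}

// Report failures such as a denied permission or a missing microphone.
function handleMediaStreamAcquiredError(mediaStreamError) {
  console.error(mediaStreamError);
}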
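The same flow can also be written with async/await, which additionally reports whether the user agent kept the hint. This is a sketch under two assumptions: the mediaDevices parameter (normally navigator.mediaDevices) is injected so the function can be tried with a stand-in object, and, as above, a user agent that does not recognize speechRecognition leaves the attribute unchanged. The function name openSpeechRecognitionStream is hypothetical.

```javascript
// Hypothetical async variant of the example above.
async function openSpeechRecognitionStream(mediaDevices) {
  // Failures such as NotAllowedError surface as a rejected promise.
  const stream = await mediaDevices.getUserMedia({ audio: true });
  const accepted = stream.getAudioTracks().map((track) => {
    track.contentHint = "speechRecognition";
    // User agents that do not recognize the value leave the attribute unchanged.
    return track.contentHint === "speechRecognition";
  });
  return { stream, hintApplied: accepted.length > 0 && accepted.every(Boolean) };
}
```

In a page this would be called as `openSpeechRecognitionStream(navigator.mediaDevices).then(({ hintApplied }) => ...).catch(console.error)`; hinting every track before checking the results ensures no track is skipped when an earlier one rejects the value.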