From 9cd12f58fd4745b4712f6edcdcd9dab2c397e1e5 Mon Sep 17 00:00:00 2001 From: Domenic Denicola Date: Fri, 26 Jul 2024 13:39:56 +0900 Subject: [PATCH] Revamp the API surface There are two major motivations for this change: * Splitting translation and language detection into separate APIs, to reflect what we've learned from prototyping. * Aligning better with other built-in API proposals, including future ones, by using shared patterns. This notably removes translation from an unknown source language, closing #1. It also adds AbortSignals and destroy() methods. This also removes the tentative proposal for language tag handling, instead pointing to discussions in #11. --- README.md | 365 +++++++++++++++++------------- security-privacy-questionnaire.md | 20 +- 2 files changed, 225 insertions(+), 160 deletions(-) diff --git a/README.md b/README.md index e50b00c..c52387f 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# Explainer for the Web Translation API +# Explainer for the Web Translation and Language Detection APIs _This proposal is an early design sketch by the Chrome built-in AI team to describe the problem below and solicit feedback on the proposed solution. It has not been approved to ship in Chrome._ @@ -11,6 +11,8 @@ Browsers are increasingly offering language translation to their users. Such tra To perform translation in such cases, web sites currently have to either call out to cloud APIs, or bring their own translation models and run them using technologies like WebAssembly and WebGPU. This proposal introduces a new JavaScript API for exposing a browser's existing language translation abilities to web pages, so that if present, they can serve as a simpler and less resource-intensive alternative. +An important supplement to translation is language detection. This can be combined with translation, e.g. taking user input in an unknown language and translating it to a specific target language. 
In a similar way, browsers today often already have language detection capabilities, and we want to offer them to web developers through a JavaScript API. + ## Goals Our goals are to: @@ -20,16 +22,17 @@ Our goals are to: * Guide web developers to gracefully handle failure cases, e.g. translation not being available or possible. * Harmonize well with existing browser and OS translation technology ([Brave](https://support.brave.com/hc/en-us/articles/8963107404813-How-do-I-use-Brave-Translate), [Chrome](https://support.google.com/chrome/answer/173424?hl=en&co=GENIE.Platform%3DDesktop#zippy=%2Ctranslate-selected-text), [Edge](https://support.microsoft.com/en-us/topic/use-microsoft-translator-in-microsoft-edge-browser-4ad1c6cb-01a4-4227-be9d-a81e127fcb0b), [Firefox](https://support.mozilla.org/en-US/kb/website-translation), [Safari](https://9to5mac.com/2020/12/04/how-to-translate-websites-with-safari-mac/)), e.g. by allowing on-the-fly downloading of different languages instead of assuming all are present from the start. * Allow a variety of implementation strategies, including on-device vs. cloud-based translation, while keeping these details abstracted from developers. +* Allow implementations to expose different capabilities for translation vs. language detection. For example, an implementation might be able to detect 30+ languages, but only be able to translate between 6. The following are explicit non-goals: -* We do not intend to force every browser to ship language packs for every language combination, or even to support translation at all. It would be conforming to implement this API by always returning `"no"` from `canTranslate()`, or to implement this API entirely by using cloud services instead of on-device translation. -* We do not intend to provide guarantees of translation quality, stability, or interoperability between browsers. 
These are left as quality-of-implementation issues, similar to the [shape detection API](https://wicg.github.io/shape-detection-api/). (See also a [discussion of interop](https://www.w3.org/reports/ai-web-impact/#interop) in the W3C "AI & the Web" document.) +* We do not intend to force every browser to ship language packs for every language combination, or even to support translation at all. It would be conforming to implement this API by always saying translation and language detection are unavailable, or to implement this API entirely by using cloud services instead of on-device translation. +* We do not intend to provide guarantees of translation and language detection quality, stability, or interoperability between browsers. These are left as quality-of-implementation issues, similar to the [shape detection API](https://wicg.github.io/shape-detection-api/). (See also a [discussion of interop](https://www.w3.org/reports/ai-web-impact/#interop) in the W3C "AI & the Web" document.) The following are potential goals we are not yet certain of: * Allow web developers to know whether translations are done on-device or using cloud services. This would allow them to guarantee that any user data they feed into this API does not leave the device, which can be important for privacy purposes. (Similarly, we might want to allow developers to request on-device-only translation, in case a browser offers both varieties.) -* Allow web developers to know some identifier for the translation model in use, separate from the browser version. This would allow them to allowlist or blocklist specific models to maintain a desired level of quality. +* Allow web developers to know some identifier for the translation and language detection models in use, separate from the browser version. This would allow them to allowlist or blocklist specific models to maintain a desired level of quality. 
Both of these potential goals are potentially detrimental to interoperability, so we want to investigate more how important such functionality is to developers to find the right tradeoff. @@ -39,227 +42,287 @@ Note that in this API, languages are represented as [BCP 47](https://www.rfc-edi See [below](#language-tag-handling) for more on the details of how language tags are handled in this API, and the [appendix](#appendix-converting-between-language-tags-and-human-readable-strings) for some helper code that converts between language tags and human-readable strings. -### For a known source language +### Translation -If the source language is known, using the API looks like so: +Here is the basic usage of the translation API, with no error handling: ```js -const canTranslate = await translation.canTranslate({ +const translator = await ai.translator.create({ sourceLanguage: "en", targetLanguage: "ja" }); -if (canTranslate !== "no") { - const translator = await translation.createTranslator({ - sourceLanguage: "en", - targetLanguage: "ja" - }); - - console.assert(translator.sourceLanguage === "en"); - console.assert(translator.targetLanguage === "ja"); - - const text = await translator.translate("Hello, world!"); - const readableStreamOfText = await translator.translateStreaming(` - Four score and seven years ago our fathers brought forth, upon this...`); -} else { - // Use alternate methods -} +const text = await translator.translate("Hello, world!"); +const readableStreamOfText = await translator.translateStreaming(` + Four score and seven years ago our fathers brought forth, upon this... +`); ``` -### For an unknown source language +Note that the `create()` method call here might cause the download of a translation model or language pack. Later examples show how to get more insight into this process. + +### Language detection -If the source language is unknown, the same APIs can be called without the `sourceLanguage` option. 
The return type of the resulting translator object's `translate()` and `translateStreaming()` methods will change to include the best-guess at the detected language, and a confidence level between 0 and 1: +A similar simplified example of the language detection API: ```js -const canTranslate = await translation.canTranslate({ targetLanguage: "ja" }); - -if (canTranslate !== "no") { - const translator = await translation.createTranslator({ targetLanguage: "ja" }); - - console.assert(translator.sourceLanguage === null); - console.assert(translator.targetLanguage === "ja"); - - const { - detectedLanguage, - confidence, - result - } = await translator.translate(someUserText); - - // result is a ReadableStream - const { - detectedLanguage, - confidence, - result - } = await translator.translateStreaming(longerUserText); +const detector = await ai.languageDetector.create(); + +const results = await detector.detect(someUserText); +for (const result of results) { + console.log(result.detectedLanguage, result.confidence); } ``` -If the language cannot be detected, then the return value will be `{ detectedLanguage: null, confidence: 0, result: null }`. +Here `results` will be an array of `{ detectedLanguage, confidence }` objects, with the `detectedLanguage` field being a BCP 47 language tag and `confidence` being a number between 0 and 1. The array will be sorted by descending confidence, and the confidences will be normalized so that all confidences that the underlying model produces sum to 1, but confidences below `0.1` will be omitted. (Thus, the `confidence` values seen by the developer will sometimes sum to less than 1.) -### Downloading new languages +The language being unknown is represented by `detectedLanguage` being null. The array will always contain at least 1 entry, although it could be for the unknown (`null`) language. -In the above examples, we're always testing if the `canTranslate()` method returns something other than `"no"`. 
Why isn't it a simple boolean? The answer is because the return value can be one of three possibilities: +### Capabilities, and a more realistic combined example -* `"no"`: it is not possible for this browser to translate as requested -* `"readily"`: the browser can readily translate as requested -* `"after-download"`: the browser can perform the requested translation, but only after it downloads appropriate material. +Both APIs provide a promise-returning `capabilities()` method which lets you know, before calling `create()`, what is possible with the implementation. The capabilities object that the promise fulfills with has an `available` property which is one of `"no"`, `"after-download"`, or `"readily"`: -To see how to use this, consider an expansion of the above example: +* `"no"` means that the implementation does not support translation or language detection. +* `"after-download"` means that the implementation supports translation or language detection, but it will have to download something (e.g. a machine learning model) before it can do anything. +* `"readily"` means that the implementation supports translation or language detection, and at least the base model is available without any downloads. -```js -const canTranslate = await translation.canTranslate({ targetLanguage: "is" }); - -if (canTranslate === "readily") { - const translator = await translation.createTranslator({ targetLanguage: "is" }); - doTheTranslation(translator); -} else if (canTranslate === "after-download") { - // Since we're in the "after-download" case, creating a translator will start - // downloading the necessary language pack. 
- const translator = await translation.createTranslator({ targetLanguage: "is" }); - - translator.ondownloadprogress = progressEvent => { - updateDownloadProgressBar(progressEvent.loaded, progressEvent.total); - }; - await translator.ready; - removeDownloadProgressBar(); - - doTheTranslation(translator); -} else { - // Use alternate methods -} -``` +Each of these capabilities objects has further methods which give the state of specific translation or language detection capabilities: -Note that `await translator.ready` is not necessary; if it's omitted, calls to `translator.translate()` or `translator.translateStreaming()` will just take longer to fulfill (or reject). But it can be convenient. +* `canTranslate(sourceLanguageTag, targetLanguageTag)` +* `canDetect(languageTag)` -If the download fails, then `downloadprogress` events will stop being emitted, and the `ready` promise will be rejected with a "`NetworkError`" `DOMException`. Additionally, any calls to `translator`'s methods will reject with the same error. +Both of these methods return `"no"`, `"after-download"`, or `"readily"`, which have the same meanings as above, except specialized to the specific arguments in question. -### Language detection - -Apart from translating between languages, the API can offer the ability to detect the language of text, with confidence levels. +Here is an example that adds capability checking to log more information and fall back to cloud services, as part of a language detection plus translation task: ```js -if (await translation.canDetect() !== "no") { - const detector = await translation.createDetector(); +async function translateUnknownCustomerInput(textToTranslate, targetLanguage) { + const languageDetectorCapabilities = await ai.languageDetector.capabilities(); + const translatorCapabilities = await ai.translator.capabilities(); + + // If `languageDetectorCapabilities.available === "no"`, then assume the source language is the + // same as the document language. 
+ let sourceLanguage = document.documentElement.lang; + + // Otherwise, let's detect the source language. + if (languageDetectorCapabilities.available !== "no") { + if (languageDetectorCapabilities.available === "after-download") { + console.log("Language detection is available, but something will have to be downloaded. Hold tight!"); + } + + // Special-case check for Japanese since for our site it's particularly important. + if (languageDetectorCapabilities.canDetect("ja") === "no") { + console.warn("Japanese language detection is not available. Falling back to cloud API."); + sourceLanguage = await useSomeCloudAPIToDetectLanguage(textToTranslate); + } else { + const detector = await ai.languageDetector.create(); + const [bestResult] = await detector.detect(textToTranslate); + + if (bestResult.detectedLanguage === null || bestResult.confidence < 0.4) { + // We'll just return the input text without translating. It's probably mostly punctuation + // or something. + return textToTranslate; + } + sourceLanguage = bestResult.detectedLanguage; + } + } + + // Now we've figured out the source language. Let's translate it! + // Note how we can just check `translatorCapabilities.canTranslate()` instead of also checking + // `translatorCapabilities.available`. + const canTranslate = translatorCapabilities.canTranslate(sourceLanguage, targetLanguage); + if (canTranslate === "no") { + console.warn("Translation is not available. Falling back to cloud API."); + return await useSomeCloudAPIToTranslate(textToTranslate, { sourceLanguage, targetLanguage }); + } - const results = await detector.detect("Hello, world!"); - for (const result of results) { - console.log(result.detectedLanguage, result.confidence); + if (canTranslate === "after-download") { + console.log("Translation is available, but something will have to be downloaded. 
Hold tight!"); } + + const translator = await ai.translator.create({ sourceLanguage, targetLanguage }); + return await translator.translate(textToTranslate); } ``` -If no language can be detected with reasonable confidence, this API returns an empty array. +### Download progress -### Listing supported languages +In cases where translation or language detection is only possible after a download, you can monitor the download progress (e.g. in order to show your users a progress bar) using code such as the following: -To get a list of languages which the current browser can translate, we can use the following code: +```js +const translator = await ai.translator.create({ + sourceLanguage, + targetLanguage, + monitor(m) { + m.addEventListener("downloadprogress", e => { + console.log(`Downloaded ${e.loaded} of ${e.total} bytes.`); + }); + } +}); +``` + +If the download fails, then `downloadprogress` events will stop being emitted, and the promise returned by `create()` will be rejected with a "`NetworkError`" `DOMException`. + +
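The failure behavior described above can be handled in a small wrapper. The following is a minimal sketch, not part of the proposal: `createOrNull` and `onProgress` are hypothetical names, and the factory is passed in as a parameter (e.g. `ai.translator` or `ai.languageDetector`) so that one helper could serve both APIs. It assumes only what the text above states, namely that `create()` rejects with a "`NetworkError`" `DOMException` when the download fails.

```js
// Hypothetical helper, not part of the proposed API.
// `factory` is expected to be ai.translator or ai.languageDetector.
async function createOrNull(factory, options, onProgress) {
  try {
    return await factory.create({
      ...options,
      monitor(m) {
        // Forward download progress to the caller, e.g. to drive a progress bar.
        m.addEventListener("downloadprogress", e => onProgress(e.loaded, e.total));
      }
    });
  } catch (err) {
    // A failed download surfaces as a rejected create() promise.
    if (err && err.name === "NetworkError") {
      return null;
    }
    // Anything else (e.g. an "AbortError") is rethrown for the caller to handle.
    throw err;
  }
}
```

A caller could treat a `null` result the same way the combined example treats `"no"`, for instance by falling back to a cloud API.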
+What's up with this pattern? + +This pattern is a little involved. Several alternatives have been considered. However, asking around the web standards community it seemed like this one was best, as it allows using standard event handlers and `ProgressEvent`s, and also ensures that once the promise is settled, the translator or language detector object is completely ready to use. + +It is also nicely future-extensible by adding more events and properties to the `m` object. + +Finally, note that there is a sort of precedent in the (never-shipped) [`FetchObserver` design](https://github.com/whatwg/fetch/issues/447#issuecomment-281731850). +
+ +### Destruction and aborting + +The API comes equipped with a couple of `signal` options that accept `AbortSignal`s, to allow aborting the creation of the translator/language detector, or the translation/language detection operations themselves: ```js -for (const language of await translation.supportedLanguages()) { - let text = languageTagToHumanReadable(lang, "en"); // see appendix - languageDropdown.append(new Option(text, language)); -} +const controller = new AbortController(); +stopButton.onclick = () => controller.abort(); + +const languageDetector = await ai.languageDetector.create({ signal: controller.signal }); +await languageDetector.detect(document.body.textContent, { signal: controller.signal }); ``` -This method does not distinguish between languages which are available `"readily"` vs. `"after-download"`, because giving that information for all languages at once is too much of a [privacy issue](#privacy-considerations). Instead, the developer must make individual calls to `canTranslate()`, which gives the browser more opportunities to apply privacy mitigations. +Additionally, the language detector and translator objects themselves have a `destroy()` method. Calling this method will: + +* Abort any ongoing downloads or loading process for the language detector or translator model. +* Reject any ongoing calls to `detect()` or `translate()` with an `"AbortError"` `DOMException`. +* Error any `ReadableStream`s returned by `translateStreaming()` with an `"AbortError"` `DOMException`. +* And, most importantly, allow the user agent to unload the machine learning models from memory. (If no other APIs are using them.) + +This method is mainly used as a mechanism to free up the memory used by the model without waiting for garbage collection, since machine learning models can be quite large. 
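One way to make sure `destroy()` is always called is to scope the object's lifetime to a single function. This is a minimal sketch under stated assumptions: `withModel` is a hypothetical helper name, not part of the proposal, and it relies only on the `create()` and `destroy()` methods described above. The factory is passed in (e.g. `ai.translator`), so the same helper could wrap either API.

```js
// Hypothetical helper, not part of the proposed API.
// Creates a translator or language detector, runs `work` with it, and
// always calls destroy() afterwards, even if `work` throws, so the user
// agent can unload the model without waiting for garbage collection.
async function withModel(factory, createOptions, work) {
  const model = await factory.create(createOptions);
  try {
    return await work(model);
  } finally {
    model.destroy();
  }
}

// Possible usage in a page, assuming `ai.translator` is available:
//
//   const text = await withModel(
//     ai.translator,
//     { sourceLanguage: "en", targetLanguage: "ja" },
//     translator => translator.translate("Hello, world!")
//   );
```

The trade-off is that each call pays the creation cost again, so this shape suits one-off operations; a page that translates repeatedly would probably keep one object alive and destroy it when done.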
## Detailed design ### Full API surface in Web IDL ```webidl -[Exposed=(Window,Worker)] -interface Translation { - Promise<TranslationAvailability> canTranslate(TranslationLanguageOptions options); - Promise<LanguageTranslator> createTranslator(TranslationLanguageOptions options); +// Shared self.ai APIs - Promise<TranslationAvailability> canDetect(); - Promise<LanguageDetector> createDetector(); +partial interface WindowOrWorkerGlobalScope { + [Replaceable] readonly attribute AI ai; +}; - Promise<sequence<DOMString>> supportedLanguages(); +[Exposed=(Window,Worker)] +interface AI { + readonly attribute AITranslatorFactory translator; + readonly attribute AILanguageDetectorFactory languageDetector; }; [Exposed=(Window,Worker)] -interface LanguageTranslator : EventTarget { - readonly attribute Promise<undefined> ready; +interface AICreateMonitor : EventTarget { attribute EventHandler ondownloadprogress; - readonly attribute DOMString? sourceLanguage; - readonly attribute DOMString targetLanguage; + // Might get more stuff in the future, e.g. for + // https://github.com/explainers-by-googlers/prompt-api/issues/4 +}; - Promise<(DOMString or ResultWithLanguageDetection)> translate(DOMString input); - Promise<(ReadableStream or StreamingResultWithLanguageDetection)> translateStreaming(DOMString input); +callback AICreateMonitorCallback = undefined (AICreateMonitor monitor); + +enum AICapabilityAvailability { "readily", "after-download", "no" }; +``` + +```webidl +// Translator + +[Exposed=(Window,Worker)] +interface AITranslatorFactory { + Promise<AITranslator> create(AITranslatorCreateOptions options); + Promise<AITranslatorCapabilities> capabilities(); }; [Exposed=(Window,Worker)] -interface LanguageDetector : EventTarget { - readonly attribute Promise<undefined> ready; - attribute EventHandler ondownloadprogress; +interface AITranslator { + Promise<DOMString> translate(DOMString input, optional AITranslatorTranslateOptions options = {}); + ReadableStream translateStreaming(DOMString input, optional AITranslatorTranslateOptions options = {}); - Promise<sequence<LanguageDetectionResult>> detect(DOMString input); + readonly attribute DOMString sourceLanguage; + readonly attribute 
DOMString targetLanguage; + + undefined destroy(); +}; -partial interface WindowOrWorkerGlobalScope { - readonly attribute Translation translation; +[Exposed=(Window,Worker)] +interface AITranslatorCapabilities { + readonly attribute AICapabilityAvailability available; + + AICapabilityAvailability canTranslate(DOMString sourceLanguage, DOMString targetLanguage); }; -enum TranslationAvailability { "readily", "after-download", "no" }; +dictionary AITranslatorCreateOptions { + AbortSignal signal; + AICreateMonitorCallback monitor; -dictionary TranslationLanguageOptions { + required DOMString sourceLanguage; required DOMString targetLanguage; - DOMString sourceLanguage; }; -dictionary LanguageDetectionResult { - DOMString? detectedLanguage; - double confidence; +dictionary AITranslatorTranslateOptions { + AbortSignal signal; }; +``` -dictionary ResultWithLanguageDetection : LanguageDetectionResult { - DOMString? result; -}; +```webidl +// Language detector -dictionary StreamingResultWithLanguageDetection : LanguageDetectionResult { - ReadableStream? result; +[Exposed=(Window,Worker)] +interface AILanguageDetectorFactory { + Promise<AILanguageDetector> create(optional AILanguageDetectorCreateOptions options = {}); + Promise<AILanguageDetectorCapabilities> capabilities(); }; -``` -### Language tag handling +[Exposed=(Window,Worker)] +interface AILanguageDetector { + Promise<sequence<LanguageDetectionResult>> detect(DOMString input, + optional AILanguageDetectorDetectOptions options = {}); -If a browser supports translating from `ja` to `en`, does it also support translating from `ja` to `en-US`? What about `en-GB`? What about the (discouraged, but valid) `en-Latn`, i.e. English written in the usual Latin script? But translation to `en-Brai`, English written in the Braille script, is different entirely. + undefined destroy(); +}; -Tentatively, pending consultation with internationalization and translation API experts, we propose the following model. 
Each user agent has a list of (language tag, availability) pairs, which is the same one returned by `translation.supportedLanguages()`. Only exact matches for entries in that list will be used for the API. +[Exposed=(Window,Worker)] +interface AILanguageDetectorCapabilities { + readonly attribute AICapabilityAvailability available; -So for example, consider a browser which supports `en`, `zh-Hans`, and `zh-Hant`. Then we would have the following results: + AICapabilityAvailability canDetect(DOMString languageTag); +}; -```js -await translator.canTranslate({ targetLanguage: "en" }); // true -await translator.canTranslate({ targetLanguage: "en-US" }); // false +dictionary AILanguageDetectorCreateOptions { + AbortSignal signal; + AICreateMonitorCallback monitor; +}; + +dictionary AILanguageDetectorDetectOptions { + AbortSignal signal; +}; -await translator.canTranslate({ targetLanguage: "zh-Hans" }); // true -await translator.canTranslate({ targetLanguage: "zh" }); // false +dictionary LanguageDetectionResult { + DOMString? detectedLanguage; // null represents unknown language + double confidence; +}; ``` -To improve interoperability and best meet developer expectations, we can mandate in the specification that browsers follow the best practices outlined in BCP 47, especially around [extended language subtags](https://www.rfc-editor.org/rfc/rfc5646.html#section-4.1.2), such as: +### Language tag handling -* always returning canonical forms instead of aliases; -* correctly distinguishing between script support (e.g. `zh-Hant`) from country support (e.g. `zh-TW`); and -* avoiding including redundant script information (e.g. `en-Latn`). +If a browser supports translating from `ja` to `en`, does it also support translating from `ja` to `en-US`? What about `en-GB`? What about the (discouraged, but valid) `en-Latn`, i.e. English written in the usual Latin script? But translation to `en-Brai`, English written in the Braille script, is different entirely. 
+ +We're not clear on what the right model is here, and are discussing it in [issue #11](https://github.com/WICG/translation-api/issues/11). ### Downloading The current design envisions that the following operations will _not_ cause downloads of language packs or other material like a language detection model: -* `translation.canTranslate()` -* `translation.canDetect()` -* `translation.supportedLanguages()` +* `ai.translator.capabilities()` and the properties/methods of the returned object +* `ai.languageDetector.capabilities()` and the properties/methods of the returned object -The following _can_ cause downloads. In all cases, whether or not a call will initiate a download can be detected beforehand by checking the return value of the corresponding `canXYZ()` call. +The following _can_ cause downloads. In all cases, whether or not a call will initiate a download can be detected beforehand by checking the corresponding capabilities object. -* `translation.createTranslator()` -* `translation.createDetector()` +* `ai.translator.create()` +* `ai.languageDetector.create()` -After a developer has a `LanguageTranslator` or `LanguageDetector` object created by these methods, further calls are not expected to cause any downloads. (Although they might require internet access, if the implementation is not entirely on-device.) +After a developer has a `AITranslator` or `AILanguageDetector` object created by these methods, further calls are not expected to cause any downloads. (Although they might require internet access, if the implementation is not entirely on-device.) + +This design means that the implementation must have all information about the capabilities of its translation and language detection models available beforehand, i.e. "shipped with the browser". (Either as part of the browser binary, or through some out-of-band update mechanism that eagerly pushes updates.) ## Privacy considerations @@ -273,11 +336,11 @@ Some sort of mitigation may be necessary here. 
We believe this is adjacent to ot * Partitioning download status by top-level site, introducing a fake download (which takes time but does not actually download anything) for the second-onward site to download a language pack. * Only exposing a fixed set of languages to this API, e.g. based on the user's locale or the document's main language. -As a first step, we require that detecting the availability of translation for a given language pair be done via individual calls to `canTranslate()`. This allows browsers to implement possible mitigation techniques, such as detecting excessive calls to `canTranslate()` and starting to return `"no"`. +As a first step, we require that detecting the availability of translation for a given language pair be done via individual calls to `canTranslate()` and `canDetect()`. This allows browsers to implement possible mitigation techniques, such as detecting excessive calls to these methods and starting to return `"no"`. Another way in which this API might enhance the web's fingerprinting surface is if translation and language detection models are updated separately from browser versions. In that case, differing results from different versions of the model provide additional fingerprinting bits beyond those already provided by the browser's major version number. Mandating that older browser versions not receive updates or be able to download models from too far into the future might be a possible remediation for this. -Finally, we intend to prohibit (in the specification) any use of user-specific information in producing the translations. For example, it would not be permissible to fine-tune the translation model based on information the user has entered into the browser in the past. +Finally, we intend to prohibit (in the specification) any use of user-specific information in producing the results. 
For example, it would not be permissible to fine-tune the translation model based on information the user has entered into the browser in the past. ## Alternatives considered and under consideration @@ -296,10 +359,10 @@ That said, we are aware of [research](https://arxiv.org/abs/2005.08595) on trans The current design requires multiple async steps to do useful things: ```js -const translator = await translation.createTranslator(options); +const translator = await ai.translator.create(options); const text = await translator.translate(sourceText); -const detector = await translation.createDetector(); +const detector = await ai.languageDetector.create(); const results = await detector.detect(sourceText); ``` @@ -307,15 +370,13 @@ Should we simplify these down with convenience APIs that do both steps at once? We're open to this idea, but we think the existing complexity is necessary to support the design wherein translation and language detection models might not be already downloaded. By separating the two stages, we allow web developers to perform the initial creation-and-possibly-downloading steps early in their page's lifecycle, in preparation for later, hopefully-quick calls to APIs like `translate()`. -Another possible simplification is to make some of the more informational APIs, namely `canTranslate()`, `canDetect()`, and `supportedLanguages()`, synchronous instead of asynchronous. This would be implementable by having the browser proactively load the information about supported languages into the main thread's process, upon creation of the global object. We think this is not worthwhile, though, as it imposes a non-negligible cost on all global object creation. - -### Separating language detection and translation +Another possible simplification is to make the `capabilities()` APIs synchronous instead of asynchronous. 
This would be implementable by having the browser proactively load the capabilities information into the main thread's process, upon creation of the global object. We think this is not worthwhile, as it imposes a non-negligible cost on all global object creation, even when the APIs are not used. -### Separating language detection and translation +### Allowing unknown source languages for translation -As discussed in [For an unknown source language](#for-an-unknown-source-language), we support performing both language detection and translation to the best-guess language at the same time, in one API. This slightly complicates the `translate()` and `translateStreaming()` APIs, by giving them polymorphic return types. +An earlier revision of this API included support for combining the language detection and translation steps into a single translation call, which did a best-guess on the source language. The idea was that this would possibly be more efficient than requiring the web developer to do two separate calls, and it could possibly even be done using a single model. -We could instead require that developers always supply a `sourceLanguage`, and if they want to detect it ahead of time, they could use the `detect()` API. +We abandoned this design when it became clear that existing browsers have very decoupled implementations of translation vs. language detection, using separate models for each. This includes supporting different languages for language detection vs. for translation. 
So even if the translation model supported an unknown-source-language mode, it might not support the same inputs as the language detection model, which would create a confusing developer experience and be hard to signal in the capabilities API. ## Stakeholder feedback diff --git a/security-privacy-questionnaire.md b/security-privacy-questionnaire.md index 026dedd..def2de3 100644 --- a/security-privacy-questionnaire.md +++ b/security-privacy-questionnaire.md @@ -5,9 +5,11 @@ This feature exposes two main pieces of information: -- The availability information for each `{ sourceLanguage, targetLanguage }` pair, so that web developers know what translations are possible and whether such translations will require the user to download a potentially-large language pack. +- The availability information for each `(sourceLanguage, targetLanguage)` translation pair, or possible language detection result, so that web developers know what translations and detections are possible and whether they will require the user to download a potentially-large language pack. -- The actual results of translations, which can be dependent on the language packs or translation models. + (This information has to be probed for each individual pair or possible language, and the browser can say that a language is unavailable even if it is, for privacy reasons.) + +- The actual results of translations and language detections, which can be dependent on the AI models in use. > 02. Do features in your specification expose the minimum amount of information > necessary to implement the intended functionality? @@ -18,7 +20,7 @@ We believe so. It's possible that we could remove the exposure of the availabili > personally-identifiable information (PII), or information derived from > either? -No. Although it's imaginable that the translation models could be fine-tuned on PII to give more accurate-to-this-user translations, we intend to disallow this in the specification. +No. 
Although it's imaginable that the translation or language detection models could be fine-tuned on PII to give more accurate-to-this-user results, we intend to disallow this in the specification. > 04. How do the features in your specification deal with sensitive information? @@ -27,12 +29,12 @@ We do not deal with sensitive information. > 05. Do the features in your specification introduce state > that persists across browsing sessions? -Yes. The downloading of language packs and translation models persists across browsing sessions. +Yes. The downloading of language packs and translation or language detection models persists across browsing sessions. > 06. Do the features in your specification expose information about the > underlying platform to origins? -Possibly. If a browser does not bundle its own translation models and language packs, but instead uses the operating system's functionality, it is possible for a web developer to infer information about such operating system functionality. +Possibly. If a browser does not bundle its own models, but instead uses the operating system's functionality, it is possible for a web developer to infer information about such operating system functionality. > 07. Does this specification allow an origin to send data to the underlying > platform? @@ -92,13 +94,15 @@ No. > (instead of getting destroyed) after navigation, and potentially gets reused > on future navigations back to the document? -Ideally, nothing special should happen. In particular, `LanguageTranslator` and `LanguageDetector` objects should still be usable without interruption after navigating back. We'll need to add web platform tests to confirm this, as it's easy to imagine implementation architectures in which keeping these objects alive while the `Document` is in the back/forward cache is difficult. +Ideally, nothing special should happen.
In particular, `AITranslator` and `AILanguageDetector` objects should still be usable without interruption after navigating back. We'll need to add web platform tests to confirm this, as it's easy to imagine implementation architectures in which keeping these objects alive while the `Document` is in the back/forward cache is difficult. -(For such implementations, failing to bfcache `Document`s with active `LanguageTranslator` or `LanguageDetector` objects would a simple way of being spec-compliant.) +(For such implementations, failing to bfcache `Document`s with active `AITranslator` or `AILanguageDetector` objects would be a simple way of being spec-compliant.) > 18. What happens when a document that uses your feature gets disconnected? -As with the previous question, nothing special should happen. As with the previous question, it's easy to imagine implementations where this is difficult to implement. We may need to add a check in the specification to prevent such usage, if prototyping shows that the difficulty is significant. +As with the previous question, nothing special should happen: the objects should still stay alive and be usable. + +As with the previous question, it's easy to imagine implementations where this is difficult to implement. We may need to add a check in the specification to prevent such usage, if prototyping shows that the difficulty is significant. > 19. What should this questionnaire have asked?