From 6981c9c5336c885efd6e1cfdd202e887a7f024de Mon Sep 17 00:00:00 2001
From: Domenic Denicola
Date: Wed, 11 Dec 2024 14:41:46 +0900
Subject: [PATCH] Remove the capabilities objects

Instead, we replace them with createOptionsAvailable() for translator, and canDetect() for language detector.

Closes #24.
---
 README.md | 68 ++++++++++++++++++++-----------------------------------
 index.bs  |  7 +++---
 2 files changed, 28 insertions(+), 47 deletions(-)

diff --git a/README.md b/README.md
index dbbdc59..43cb414 100644
--- a/README.md
+++ b/README.md
@@ -77,40 +77,34 @@ Here `results` will be an array of `{ detectedLanguage, confidence }` objects, w

 The language being unknown is represented by `detectedLanguage` being null. The array will always contain at least 1 entry, although it could be for the unknown (`null`) language.

-### Capabilities, and a more realistic combined example
+### Checking before creation, and a more realistic combined example

-Both APIs provide a promise-returning `capabilities()` methods which let you know, before calling `create()`, what is possible with the implementation. The capabilities object that the promise fulfills with has an `available` property which is one of `"no"`, `"after-download"`, or `"readily"`:
+Both APIs provide the ability to know, before calling `create()`, what is possible with the implementation. For translator, this is done via `ai.translator.createOptionsAvailable({ sourceLanguage, targetLanguage })`, whereas for language detector, this is done via `ai.languageDetector.canDetect(languageTag)`, or `ai.languageDetector.canDetect()` if you just want to test for the existence of language detection capabilities without any guarantee on which languages are detectable.

-* `"no"` means that the implementation does not support translation or language detection.
-* `"after-download"` means that the implementation supports translation or language detection, but it will have to download something (e.g. a machine learning model) before it can do anything.
-* `"readily"` means that the implementation supports translation or language detection, and at least the base model is available without any downloads.
+Both methods return promises, which fulfill with one of the following values:

-Each of these capabilities objects has further methods which give the state of specific translation or language detection capabilities:
-
-* `languagePairAvailable(sourceLanguageTag, targetLanguageTag)`, for the `ai.translation.capabilities()` object
-* `languageAvailable(languageTag)`, for the `ai.languageDetection.capabilities()` object
-
-Both of these methods return `"no"`, `"after-download"`, or `"readily"`, which have the same meanings as above, except specialized to the specific arguments in question.
+* `"no"` means that the implementation does not support translation or language detection of the given language(s).
+* `"after-download"` means that the implementation supports translation or language detection of the given language(s), but it will have to download something (e.g., a machine learning model) as part of creating the associated object.
+* `"readily"` means that the implementation supports translation or language detection of the given language(s), without performing any downloads.

 Here is an example that adds capability checking to log more information and fall back to cloud services, as part of a language detection plus translation task:

 ```js
 async function translateUnknownCustomerInput(textToTranslate, targetLanguage) {
-  const languageDetectorCapabilities = await ai.languageDetector.capabilities();
-  const translatorCapabilities = await ai.translator.capabilities();
+  const canDetect = await ai.languageDetector.canDetect();

-  // If `languageDetectorCapabilities.available === "no"`, then assume the source language is the
+  // If there is no language detector, then assume the source language is the
   // same as the document language.
   let sourceLanguage = document.documentElement.lang;

   // Otherwise, let's detect the source language.
-  if (languageDetectorCapabilities.available !== "no") {
-    if (languageDetectorCapabilities.available === "after-download") {
+  if (canDetect !== "no") {
+    if (canDetect === "after-download") {
       console.log("Language detection is available, but something will have to be downloaded. Hold tight!");
     }

     // Special-case check for Japanese since for our site it's particularly important.
-    if (languageDetectorCapabilities.languageAvailable("ja") === "no") {
+    if (await ai.languageDetector.canDetect("ja") === "no") {
       console.warn("Japanese Language detection is not available. Falling back to cloud API.");
       sourceLanguage = await useSomeCloudAPIToDetectLanguage(textToTranslate);
     } else {
@@ -127,9 +121,7 @@ async function translateUnknownCustomerInput(textToTranslate, targetLanguage) {
   }

   // Now we've figured out the source language. Let's translate it!
-  // Note how we can just check `translatorCapabilities.languagePairAvailable()` instead of also checking
-  // `translatorCapabilities.available`.
-  const availability = translatorCapabilities.languagePairAvailable(sourceLanguage, targetLanguage);
+  const availability = await ai.translator.createOptionsAvailable({ sourceLanguage, targetLanguage });
   if (availability === "no") {
     console.warn("Translation is not available. Falling back to cloud API.");
     return await useSomeCloudAPIToTranslate(textToTranslate, { sourceLanguage, targetLanguage });
@@ -232,7 +224,7 @@ enum AICapabilityAvailability { "readily", "after-download", "no" };
 [Exposed=(Window,Worker), SecureContext]
 interface AITranslatorFactory {
   Promise create(AITranslatorCreateOptions options);
-  Promise capabilities();
+  Promise createOptionsAvailable(AITranslatorCreateCoreOptions options);
 };

 [Exposed=(Window,Worker), SecureContext]
@@ -246,19 +238,14 @@ interface AITranslator {
   undefined destroy();
 };

-[Exposed=(Window,Worker), SecureContext]
-interface AITranslatorCapabilities {
-  readonly attribute AICapabilityAvailability available;
-
-  AICapabilityAvailability languagePairAvailable(DOMString sourceLanguage, DOMString targetLanguage);
+dictionary AITranslatorCreateCoreOptions {
+  required DOMString sourceLanguage;
+  required DOMString targetLanguage;
 };

-dictionary AITranslatorCreateOptions {
+dictionary AITranslatorCreateOptions : AITranslatorCreateCoreOptions {
   AbortSignal signal;
   AICreateMonitorCallback monitor;
-
-  required DOMString sourceLanguage;
-  required DOMString targetLanguage;
 };

 dictionary AITranslatorTranslateOptions {
@@ -272,7 +259,7 @@ dictionary AITranslatorTranslateOptions {
 [Exposed=(Window,Worker), SecureContext]
 interface AILanguageDetectorFactory {
   Promise create(optional AILanguageDetectorCreateOptions options = {});
-  Promise capabilities();
+  Promise canDetect(optional DOMString languageTag);
 };

 [Exposed=(Window,Worker), SecureContext]
@@ -283,13 +270,6 @@ interface AILanguageDetector {
   undefined destroy();
 };

-[Exposed=(Window,Worker), SecureContext]
-interface AILanguageDetectorCapabilities {
-  readonly attribute AICapabilityAvailability available;
-
-  AICapabilityAvailability languageAvailable(DOMString languageTag);
-};
-
 dictionary AILanguageDetectorCreateOptions {
   AbortSignal signal;
   AICreateMonitorCallback monitor;
@@ -315,10 +295,10 @@ We're not clear on what the right model is here, and are discussing it in [issue

 The current design envisions that the following operations will _not_ cause downloads of language packs or other material like a language detection model:

-* `ai.translator.capabilities()` and the properties/methods of the returned object
-* `ai.languageDetector.capabilities()` and the properties/methods of the returned object
+* `ai.translator.createOptionsAvailable()`
+* `ai.languageDetector.canDetect()`

-The following _can_ cause downloads. In all cases, whether or not a call will initiate a download can be detected beforehand by checking the corresponding capabilities object.
+The following _can_ cause downloads. In all cases, whether or not a call will initiate a download can be detected beforehand by the previously-listed methods.

 * `ai.translator.create()`
 * `ai.languageDetector.create()`
@@ -339,7 +319,7 @@ Some sort of mitigation may be necessary here. We believe this is adjacent to ot
 * Partitioning download status by top-level site, introducing a fake download (which takes time but does not actually download anything) for the second-onward site to download a language pack.
 * Only exposing a fixed set of languages to this API, e.g. based on the user's locale or the document's main language.

-As a first step, we require that detecting the availability of translation/detection be done via individual calls to `translationCapabilities.languagePairAvailable()` and `detectionCapabilities.languageAvailable()`. This allows browsers to implement possible mitigation techniques, such as detecting excessive calls to these methods and starting to return `"no"`.
+As a first step, we require that detecting the availability of translation/detection be done via individual calls to `ai.translator.createOptionsAvailable()` and `ai.languageDetector.canDetect()`. This allows browsers to implement possible mitigation techniques, such as detecting excessive calls to these methods and starting to return `"no"`.

 Another way in which this API might enhance the web's fingerprinting surface is if translation and language detection models are updated separately from browser versions. In that case, differing results from different versions of the model provide additional fingerprinting bits beyond those already provided by the browser's major version number. Mandating that older browser versions not receive updates or be able to download models from too far into the future might be a possible remediation for this.

@@ -373,13 +353,13 @@ Should we simplify these down with convenience APIs that do both steps at once?

 We're open to this idea, but we think the existing complexity is necessary to support the design wherein translation and language detection models might not be already downloaded. By separating the two stages, we allow web developers to perform the initial creation-and-possibly-downloading steps early in their page's lifecycle, in preparation for later, hopefully-quick calls to APIs like `translate()`.

-Another possible simplification is to make the `capabilities()` APIs synchronous instead of asynchronous. This would be implementable by having the browser proactively load the capabilities information into the main thread's process, upon creation of the global object. We think this is not worthwhile, as it imposes a non-negligible cost on all global object creation, even when the APIs are not used.
+Another possible simplification is to make the `createOptionsAvailable()` / `canDetect()` APIs synchronous instead of asynchronous. This would be implementable by having the browser proactively load the capabilities information into the main thread's process, upon creation of the global object. We think this is not worthwhile, as it imposes a non-negligible cost on all global object creation, even when the APIs are not used.

 ### Allowing unknown source languages for translation

 An earlier revision of this API including support for combining the language detection and translation steps into a single translation call, which did a best-guess on the source language. The idea was that this would possibly be more efficient than requiring the web developer to do two separate calls, and it could possibly even be done using a single model.

-We abandoned this design when it became clear that existing browsers have very decoupled implementations of translation vs. language detection, using separate models for each. This includes supporting different languages for language detection vs. for translation. So even if the translation model supported an unknown-source-language mode, it might not support the same inputs as the language detection model, which would create a confusing developer experience and be hard to signal in the capabilities API.
+We abandoned this design when it became clear that existing browsers have very decoupled implementations of translation vs. language detection, using separate models for each. This includes supporting different languages for language detection vs. for translation. So even if the translation model supported an unknown-source-language mode, it might not support the same inputs as the language detection model, which would create a confusing developer experience and be hard to signal in the API.

 ## Stakeholder feedback

diff --git a/index.bs b/index.bs
index 28d9e3b..b19254b 100644
--- a/index.bs
+++ b/index.bs
@@ -2,7 +2,7 @@
 Title: Translator and Language Detector APIs
 Shortname: translation
 Level: None
-Status: w3c/UD
+Status: CG-DRAFT
 Group: webml
 Repository: webmachinelearning/translation-api
 URL: https://webmachinelearning.github.io/translation-api
@@ -11,9 +11,10 @@ Abstract: The translator and langauge detector APIs gives web pages the ability
 Markup Shorthands: markdown yes, css no
 Complain About: accidental-2119 yes, missing-example-ids yes
 Assume Explicit For: yes
-Die On: warning
-Boilerplate: omit conformance
 Default Biblio Status: current
+Boilerplate: omit conformance
+Indent: 2
+Die On: warning

 Introduction {#intro}