Skip to content

Commit

Permalink
Fix spelling issues and remove URL localization
Browse files Browse the repository at this point in the history
Closes #17.
  • Loading branch information
domenic committed Oct 9, 2024
1 parent 12e810e commit f83312b
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ Browsers are increasingly offering language translation to their users. Such tra

To perform translation in such cases, web sites currently have to either call out to cloud APIs, or bring their own translation models and run them using technologies like WebAssembly and WebGPU. This proposal introduces a new JavaScript API for exposing a browser's existing language translation abilities to web pages, so that if present, they can serve as a simpler and less resource-intensive alternative.

An important supplement to translation is language detection. This can be combined with translation, e.g. taking user input in an unknown language and translating it to a specific target language. In a similar way, browsers today often already have langauge detection capabilities, and we want to offer them to web developers through a JavaScript API.
An important supplement to translation is language detection. This can be combined with translation, e.g. taking user input in an unknown language and translating it to a specific target language. In a similar way, browsers today often already have language detection capabilities, and we want to offer them to web developers through a JavaScript API.

## Goals

Expand All @@ -20,7 +20,7 @@ Our goals are to:
* Help web developers perform real-time translations (e.g. of user input).
* Help web developers perform real-time language detection.
* Guide web developers to gracefully handle failure cases, e.g. translation not being available or possible.
* Harmonize well with existing browser and OS translation technology ([Brave](https://support.brave.com/hc/en-us/articles/8963107404813-How-do-I-use-Brave-Translate), [Chrome](https://support.google.com/chrome/answer/173424?hl=en&co=GENIE.Platform%3DDesktop#zippy=%2Ctranslate-selected-text), [Edge](https://support.microsoft.com/en-us/topic/use-microsoft-translator-in-microsoft-edge-browser-4ad1c6cb-01a4-4227-be9d-a81e127fcb0b), [Firefox](https://support.mozilla.org/en-US/kb/website-translation), [Safari](https://9to5mac.com/2020/12/04/how-to-translate-websites-with-safari-mac/)), e.g. by allowing on-the-fly downloading of different languages instead of assuming all are present from the start.
* Harmonize well with existing browser and OS translation technology ([Brave](https://support.brave.com/hc/articles/8963107404813-How-do-I-use-Brave-Translate), [Chrome](https://support.google.com/chrome/answer/173424&co=GENIE.Platform%3DDesktop#zippy=%2Ctranslate-selected-text), [Edge](https://support.microsoft.com/topic/use-microsoft-translator-in-microsoft-edge-browser-4ad1c6cb-01a4-4227-be9d-a81e127fcb0b), [Firefox](https://support.mozilla.org/kb/website-translation), [Safari](https://9to5mac.com/2020/12/04/how-to-translate-websites-with-safari-mac/)), e.g. by allowing on-the-fly downloading of different languages instead of assuming all are present from the start.
* Allow a variety of implementation strategies, including on-device vs. cloud-based translation, while keeping these details abstracted from developers.
* Allow implementations to expose different capabilities for translation vs. language detection. For example, an implementation might be able to detect 30+ languages, but only be able to translate between 6.

Expand Down Expand Up @@ -331,7 +331,7 @@ This design means that the implementation must have all information about the ca

This proposal as-is has privacy issues, which we are actively thinking about how to address. They are all centered around how sites that use this API might be able to uniquely fingerprint the user.

The most obvious identifier in the current API design is the list of supported languages, and especially their availability status (`"no"`, `"readily"`, or `"after-download"`). For example, as of the time of this writing [Firefox supports 9 languages](https://www.mozilla.org/en-US/firefox/features/translate/), which can each be [independently downloaded](https://support.mozilla.org/en-US/kb/website-translation#w_configure-installed-languages). With a naive implementation, this gives 9 bits of identifying information, which various sites can all correlate.
The most obvious identifier in the current API design is the list of supported languages, and especially their availability status (`"no"`, `"readily"`, or `"after-download"`). For example, as of the time of this writing [Firefox supports 9 languages](https://www.mozilla.org/firefox/features/translate/), which can each be [independently downloaded](https://support.mozilla.org/kb/website-translation#w_configure-installed-languages). With a naive implementation, this gives 9 bits of identifying information, which various sites can all correlate.

Some sort of mitigation may be necessary here. We believe this is adjacent to other areas that have seen similar mitigation, such as the [Local Font Access API](https://github.com/WICG/local-font-access/blob/main/README.md). Possible techniques are:

Expand Down Expand Up @@ -377,7 +377,7 @@ Another possible simplification is to make the `capabilities()` APIs synchronous

### Allowing unknown source languages for translation

An earlier revision of this API including support for combining the langauge detection and translation steps into a single translation call, which did a best-guess on the source language. The idea was that this would possibly be more efficient than requiring the web developer to do two separate calls, and it could possibly even be done using a single model.
An earlier revision of this API including support for combining the language detection and translation steps into a single translation call, which did a best-guess on the source language. The idea was that this would possibly be more efficient than requiring the web developer to do two separate calls, and it could possibly even be done using a single model.

We abandoned this design when it became clear that existing browsers have very decoupled implementations of translation vs. language detection, using separate models for each. This includes supporting different languages for language detection vs. for translation. So even if the translation model supported an unknown-source-language mode, it might not support the same inputs as the language detection model, which would create a confusing developer experience and be hard to signal in the capabilities API.

Expand Down

0 comments on commit f83312b

Please sign in to comment.