Transforming this repo from a documentation to a code repo #9

HadrienGardeur (Maintainer) · 2024-04-11T08:24:59Z

Proposal

As the number of languages supported by this repo increases, it becomes more and more relevant to explore going beyond a documentation repo.
Handling localization through code also makes more sense than documenting hundreds of different alt names (see #38 and #29).

In terms of scope, I believe that providing a utility class for voices could be the right approach to kickstart things. While there are already projects focusing on the entire Web Speech API such as Easy Speech, there's nothing yet with voice selection as a focus.

API

- return a list of voices:
  - filtered by:
    - quality (by default, we should filter out novelty and very low quality voices)
    - recommended voices
    - languages (with optional subtags for regions)
    - gender
    - offline availability
  - grouped by:
    - language
    - region
    - gender
    - recommended voices vs everything else
  - ordered by:
    - quality (default option)
    - name
    - language
    - region
    - gender
- return a list of languages:
  - that can be filtered by:
    - quality (by default, we should filter out novelty and very low quality voices)
    - recommended voices
    - offline availability
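To make the filter and ordering options above a bit more concrete, here is a minimal TypeScript sketch. The names (`Quality`, `VoiceFilter`, `filterVoices`, `orderVoices`), the six-level quality scale and the choice of "low" as the default minimum are all assumptions for illustration, not an agreed-upon API or the schema of the JSON files.

```typescript
// Hypothetical quality scale, from lowest to highest (an assumption, not the repo's schema).
const QUALITY_LEVELS = ["novelty", "veryLow", "low", "normal", "high", "veryHigh"] as const;
type Quality = (typeof QUALITY_LEVELS)[number];
type Gender = "female" | "male" | "nonbinary";

// Minimal voice shape used by this sketch.
interface Voice {
  label: string;
  language: string; // BCP-47 tag, e.g. "en-US"
  gender?: Gender;
  quality?: Quality;
  offlineAvailability: boolean;
  recommended: boolean;
}

interface VoiceFilter {
  minQuality?: Quality; // defaults to "low", which drops novelty and very low quality voices
  recommendedOnly?: boolean;
  languages?: string[]; // "en" matches "en", "en-US", "en-GB", ...
  gender?: Gender;
  offlineOnly?: boolean;
}

type VoiceOrder = "quality" | "name" | "language" | "region" | "gender";

// Voices with an undocumented quality are treated as "normal" (an assumption).
function rank(quality: Quality | undefined): number {
  return QUALITY_LEVELS.indexOf(quality ?? "normal");
}

function matchesLanguage(tag: string, wanted: string): boolean {
  const t = tag.toLowerCase();
  const w = wanted.toLowerCase();
  return t === w || t.startsWith(w + "-");
}

// Assumes simple language-REGION tags such as "en-US".
function region(tag: string): string {
  return tag.split("-")[1] ?? "";
}

function filterVoices(voices: Voice[], filter: VoiceFilter = {}): Voice[] {
  const min = rank(filter.minQuality ?? "low");
  return voices.filter(
    (v) =>
      rank(v.quality) >= min &&
      (!filter.recommendedOnly || v.recommended) &&
      (!filter.languages || filter.languages.some((l) => matchesLanguage(v.language, l))) &&
      (!filter.gender || v.gender === filter.gender) &&
      (!filter.offlineOnly || v.offlineAvailability)
  );
}

const byText = (key: (v: Voice) => string) => (a: Voice, b: Voice) =>
  key(a).localeCompare(key(b));

const COMPARATORS: Record<VoiceOrder, (a: Voice, b: Voice) => number> = {
  quality: (a, b) => rank(b.quality) - rank(a.quality), // highest quality first
  name: byText((v) => v.label),
  language: byText((v) => v.language),
  region: byText((v) => region(v.language)),
  gender: byText((v) => v.gender ?? ""),
};

// Returns a sorted copy; quality is the default order.
function orderVoices(voices: Voice[], order: VoiceOrder = "quality"): Voice[] {
  return [...voices].sort(COMPARATORS[order]);
}
```

Calling `orderVoices(filterVoices(voices))` would give the default behaviour described above: novelty and very low quality voices removed, best voices first.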

Output

Voices

When calling voices(), the API will return an array of voice objects where:

- label: the label documented in recommended voices, with a fallback on the name returned by getVoices()
- voiceURI: the same property returned by getVoices()
- language: the lang value returned by getVoices()
- gender: as documented in recommended voices; omitted if this information is missing
- age: as documented in recommended voices; omitted if this information is missing
- offlineAvailability: a boolean based on localService, as returned by getVoices()
- quality: the best available quality that can be detected; omitted if this information is missing
- pitchControl: a boolean, defaulting to true if undocumented
- recommendedPitch: the pitch value documented in recommended voices; omitted if this information is missing
- recommendedRate: the rate value documented in recommended voices; omitted if this information is missing
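The property list above can be sketched as a single mapping function. `BrowserVoice` mirrors the `SpeechSynthesisVoice` fields returned by `getVoices()` (`name`, `voiceURI`, `lang`, `localService`) so the code also runs outside a browser; `RecommendedEntry`, its field names and `toVoiceObject` are hypothetical and not the actual schema of the JSON files. For simplicity, quality is only taken from the JSON entry here, whereas the proposal mentions detecting the best available quality.

```typescript
// Mirrors the SpeechSynthesisVoice fields used here, so the sketch also runs outside a browser.
interface BrowserVoice {
  name: string;
  voiceURI: string;
  lang: string;
  localService: boolean;
}

// Hypothetical shape of an entry in the recommended voices JSON files.
interface RecommendedEntry {
  name: string;
  label?: string;
  gender?: string;
  age?: string;
  quality?: string;
  pitchControl?: boolean;
  pitch?: number;
  rate?: number;
}

interface VoiceObject {
  label: string;
  voiceURI: string;
  language: string;
  gender?: string;
  age?: string;
  offlineAvailability: boolean;
  quality?: string;
  pitchControl: boolean;
  recommendedPitch?: number;
  recommendedRate?: number;
}

function toVoiceObject(voice: BrowserVoice, entry?: RecommendedEntry): VoiceObject {
  const result: VoiceObject = {
    label: entry?.label ?? voice.name, // fall back on the name returned by getVoices()
    voiceURI: voice.voiceURI,
    language: voice.lang,
    offlineAvailability: voice.localService,
    pitchControl: entry?.pitchControl ?? true, // true when undocumented
  };
  // Documented-only properties are left out entirely when the information is missing.
  if (entry) {
    if (entry.gender !== undefined) result.gender = entry.gender;
    if (entry.age !== undefined) result.age = entry.age;
    if (entry.quality !== undefined) result.quality = entry.quality;
    if (entry.pitch !== undefined) result.recommendedPitch = entry.pitch;
    if (entry.rate !== undefined) result.recommendedRate = entry.rate;
  }
  return result;
}
```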

When grouped by another criterion, voices() will instead return a list of languages, regions, genders or recommended/other groups, each of which contains an array of voice objects.
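A sketch of that grouped shape, using a `Map` from group name to voices so that both the order of the groups and the order of voices inside each group are preserved. `groupVoices`, `keyOf` and the "unknown"/"others" group names are assumptions for illustration, and region extraction assumes simple language-REGION tags.

```typescript
interface Voice {
  label: string;
  language: string; // BCP-47 tag such as "en-US"
  gender?: string;
  recommended: boolean;
}

type GroupKey = "language" | "region" | "gender" | "recommended";

// Extracts the grouping key; assumes simple language-REGION tags.
function keyOf(voice: Voice, by: GroupKey): string {
  switch (by) {
    case "language":
      return voice.language.split("-")[0];
    case "region":
      return voice.language.split("-")[1] ?? "unknown";
    case "gender":
      return voice.gender ?? "unknown";
    default:
      return voice.recommended ? "recommended" : "others";
  }
}

// Groups voices while preserving their existing order inside each group.
function groupVoices(voices: Voice[], by: GroupKey): Map<string, Voice[]> {
  const groups = new Map<string, Voice[]>();
  for (const voice of voices) {
    const key = keyOf(voice, by);
    const bucket = groups.get(key);
    if (bucket) bucket.push(voice);
    else groups.set(key, [voice]);
  }
  return groups;
}
```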

Languages

When calling languages(), the API will return an array of language objects where:

- language: a BCP-47 language tag
- count: the number of available voices for that language, based on the filters that were applied
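A minimal sketch of how languages() could compute these counts from a list of voices that has already been filtered. The `base` flag (collapsing "en-US" and "en-GB" into "en") and the sort order (most voices first, then alphabetical) are assumptions, not part of the proposal.

```typescript
interface Voice {
  language: string; // BCP-47 tag, e.g. "en-US"
}

interface LanguageEntry {
  language: string;
  count: number;
}

// Counts the (already filtered) voices per language tag.
// `base: true` collapses regional tags such as "en-US" and "en-GB" into "en".
function languages(voices: Voice[], base = false): LanguageEntry[] {
  const counts = new Map<string, number>();
  for (const { language } of voices) {
    const tag = base ? language.split("-")[0] : language;
    counts.set(tag, (counts.get(tag) ?? 0) + 1);
  }
  return Array.from(counts, ([language, count]) => ({ language, count })).sort(
    (a, b) => b.count - a.count || a.language.localeCompare(b.language)
  );
}
```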

Examples

Example 1: Return a list of recommended voices for English

As a developer, I'm building a news app where all the content is in English. I want to make sure that users will have a great listening experience and only want to provide a list of high quality voices in a selector.

This would use VoiceSelector.voices.filterByLanguage("en").recommended, which would return a list of all recommended voices available on the current device, ranked by their position in the list of recommended voices.
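Example 1 could be expressed roughly as below. `recommendedNames` stands in for the voice names listed in the repo's JSON file for English, in their recommended order; `recommendedForLanguage` and `AvailableVoice` are hypothetical names used only for this sketch.

```typescript
interface AvailableVoice {
  name: string;
  lang: string;
}

// Keeps recommended voices that are installed for `language` (e.g. "en" matches "en-US"),
// ranked by their position in the recommended list.
function recommendedForLanguage(
  available: AvailableVoice[],
  recommendedNames: string[],
  language: string
): AvailableVoice[] {
  const position = new Map<string, number>();
  recommendedNames.forEach((name, i) => position.set(name, i));
  const wanted = language.toLowerCase();
  return available
    .filter((v) => {
      const tag = v.lang.toLowerCase();
      return position.has(v.name) && (tag === wanted || tag.startsWith(wanted + "-"));
    })
    .sort((a, b) => (position.get(a.name) ?? 0) - (position.get(b.name) ?? 0));
}
```

Voices that are recommended but not installed on the device simply never show up, since the list starts from the installed voices.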

Example 2: Return a list of languages and list all voices per language

As a developer, I'm building a reading app.
I can detect the language of each utterance, but I'd like to provide an override that the user can rely on if the language is incorrectly documented. I want to make sure that the default voice is as good as possible, but provide the user with the ability to select any other voice, including ones that they've installed and that are not currently in the list of recommended voices.

A dropdown selector where the user can override the language used to read aloud their publication would use:

VoiceSelector.languages(), which would return a list of all available languages.

A second dropdown selector would list voices available to the user based on a given language. Recommended voices would be displayed on top, followed by every other voice available, grouped by region:

VoiceSelector.voices.filterByLanguage("en").sortByRecommended().groupByRegion(), which would return all voices available in English, grouped by region and sorted according to the list of recommended voices, with the remaining voices listed below in each region.
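The chained call in this example could be backed by a small fluent wrapper such as the hypothetical `VoiceQuery` class below. It is only an illustration of how each step can return a new query until `groupByRegion()` produces the final groups, not necessarily the API of any actual implementation; ordering the non-recommended voices by name and the "unknown" region are assumptions.

```typescript
interface Voice {
  name: string;
  lang: string;
}

// Hypothetical fluent wrapper: each step returns a new query instead of mutating this one.
class VoiceQuery {
  private readonly list: Voice[];
  private readonly recommended: string[]; // recommended voice names, best first

  constructor(list: Voice[], recommended: string[]) {
    this.list = list;
    this.recommended = recommended;
  }

  filterByLanguage(language: string): VoiceQuery {
    const wanted = language.toLowerCase();
    const matching = this.list.filter((v) => {
      const tag = v.lang.toLowerCase();
      return tag === wanted || tag.startsWith(wanted + "-");
    });
    return new VoiceQuery(matching, this.recommended);
  }

  // Recommended voices first (by position in the list), then all other voices by name.
  sortByRecommended(): VoiceQuery {
    const position = (v: Voice): number => {
      const i = this.recommended.indexOf(v.name);
      return i === -1 ? Number.MAX_SAFE_INTEGER : i;
    };
    const sorted = [...this.list].sort(
      (a, b) => position(a) - position(b) || a.name.localeCompare(b.name)
    );
    return new VoiceQuery(sorted, this.recommended);
  }

  // Groups by region subtag ("US", "GB", ...), keeping the current order in each group.
  groupByRegion(): Map<string, Voice[]> {
    const groups = new Map<string, Voice[]>();
    for (const v of this.list) {
      const region = v.lang.split("-")[1] ?? "unknown";
      const bucket = groups.get(region);
      if (bucket) bucket.push(v);
      else groups.set(region, [v]);
    }
    return groups;
  }
}
```

Usage would look like `new VoiceQuery(installedVoices, recommendedNames).filterByLanguage("en").sortByRecommended().groupByRegion()`, where both arguments are placeholders for data coming from getVoices() and the JSON files.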

HadrienGardeur (Maintainer, Author) · 2024-04-11T08:27:07Z

Any thoughts on that @danielweck, @chocolatkey, @oscar-rivera-demarque and @mickael-menu?

5 replies

mickael-menu Apr 11, 2024

That would definitely be helpful. Would this API check for the actual availability of each recommended voice? I guess whether they are available depends on the OS version; maybe that's something to add to the JSON data too?

HadrienGardeur Apr 12, 2024
Maintainer Author

Yes, it would, and that should be the default behaviour.

There would be two sources of data for the utility class:

- window.speechSynthesis.getVoices(), as described on MDN
- the various JSON files available in this repo

There's no need to check the OS/browser (this info is already documented in the JSON files, BTW): comparing the voice names returned by getVoices() with the ones listed in the JSON files should be enough.
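A rough sketch of that comparison: index every documented name (including alternative names) and look up each voice returned by `getVoices()`. The `JsonVoice` shape with `name` and `altNames` is an assumption loosely based on this repo's JSON files, and `matchVoices` is a hypothetical helper. Voices paired with `undefined` are the installed voices that aren't documented.

```typescript
// Assumed shape of a documented voice in the JSON files.
interface JsonVoice {
  name: string;
  altNames?: string[];
}

// Only the fields of SpeechSynthesisVoice needed here.
interface InstalledVoice {
  name: string;
  lang: string;
}

// Pairs each installed voice with its documented entry, or `undefined` if it is unknown.
function matchVoices(
  installed: InstalledVoice[],
  documented: JsonVoice[]
): Array<[InstalledVoice, JsonVoice | undefined]> {
  const byName = new Map<string, JsonVoice>();
  for (const entry of documented) {
    for (const name of [entry.name, ...(entry.altNames ?? [])]) {
      byName.set(name, entry);
    }
  }
  return installed.map((voice): [InstalledVoice, JsonVoice | undefined] => [
    voice,
    byName.get(voice.name),
  ]);
}
```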

chocolatkey Apr 12, 2024

I think a way to proceed would be to start hashing out these APIs as TypeScript definitions.

HadrienGardeur Apr 13, 2024
Maintainer Author

Well that's another potential discussion point: TypeScript or plain/vanilla JS?

While most of the documentation work that I intend to do for the rest of this project is focused on the read aloud feature of reading apps, that's not necessarily the scope of this repo and/or proposal.
Would it be more useful for the wider Web community if this was plain JS?

chocolatkey Apr 13, 2024

I think it's definitely more useful in TypeScript, and it will even help people who have no interest in TypeScript. I'll briefly explain why, using a popular project as an example:
If you check this source code directory: https://github.com/GoogleChrome/workbox/tree/v7/packages/workbox-core/src you'll see that it's pure TypeScript (which is what I am suggesting we have in this repo). However, if you look at the compiled files in the NPM package that I'm actually using, it has been transformed into a set of three files (.js, .mjs and .d.ts).

When people use the dependency in their project, the .js files (ignore .mjs for now, but they're useful too) contain the actual code, and the .d.ts files provide the definitions, which help your IDE and the TypeScript compiler in the client project. Even if you aren't using TypeScript at all in your project, as long as you use a compatible IDE, the definitions will still help you integrate the project into your code.

Therefore, I think writing our wrapper code in TypeScript will help us and anyone who uses the project.

Falcosc · 2024-06-08T10:03:48Z

That was the first thing I looked for here: since the demo did not order the voices, I felt like I was missing something. While implementing my own sorting, I started to think about the next issue: how would I get notified if the JSON structure changes?

For that reason, code would be much better.

I would only need:

- grouping by language (including unknown voices not found in your list)
- ordering by recommendation (with a custom solution for Apple's quality variants)

Then I could get the following voice list:

- recommended voices in the user's locale
- unknown voices in the user's locale
- recommended voices in English
- unknown voices in English
- all other voices
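A possible comparator for the order listed above, sketched under the assumption that each voice already carries a `recommended` flag (for instance from matching getVoices() against the JSON files, where unknown voices are simply not recommended). In a browser, `userLocale` could come from `navigator.language`; `bucket` and `orderForUser` are hypothetical names.

```typescript
interface Voice {
  name: string;
  lang: string;
  recommended: boolean;
}

// 0: recommended in user language, 1: unknown in user language,
// 2: recommended in English, 3: unknown in English, 4: everything else.
function bucket(voice: Voice, userLanguage: string): number {
  const language = voice.lang.toLowerCase().split("-")[0];
  if (language === userLanguage) return voice.recommended ? 0 : 1;
  if (language === "en") return voice.recommended ? 2 : 3;
  return 4;
}

// Returns a sorted copy; the stable sort keeps the original order inside each bucket.
function orderForUser(voices: Voice[], userLocale: string): Voice[] {
  const userLanguage = userLocale.toLowerCase().split("-")[0];
  return [...voices].sort((a, b) => bucket(a, userLanguage) - bucket(b, userLanguage));
}
```

Because the sort is stable, voices that are already ranked by recommendation keep that ranking within each bucket.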

I currently don't have a solution for Apple's high quality voices. "Lee (Premium)" could end up at the end of your suggestions, since Lee is the last Apple voice in your list. I didn't look for a solution, because Apple's premium voices are basically dead: the market share of non-Safari browsers on macOS is nearly nothing.
Only iOS Safari would be relevant, but it isn't working there, so no solution is needed :D

1 reply

HadrienGardeur Jun 8, 2024
Maintainer Author

> I currently don't have a solution for Apple's high quality voices. "Lee (Premium)" could end up at the end of your suggestions, since Lee is the last Apple voice in your list. I didn't look for a solution, because Apple's premium voices are basically dead: the market share of non-Safari browsers on macOS is nearly nothing.

Chrome on macOS should have a decent market share IMO, probably better than Safari on macOS.

For Apple voices, the most common scenarios are:

voices with two variants:
- default and high
- or default and very high
voices with three variants: default, high and very high
and in some very rare cases, voices with a single variant

Right now, I'm documenting this info through altNames, but the more I think about it, the more I believe that documenting this information with a specialized property (quality?) and locales would work better.

With code, we could also handle Firefox better, where you can truly identify the quality of each variant. In this case, I think it's best to only list the highest quality voice available.

In other browsers, that's already what I do with my demo.
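One possible way to keep only the highest quality variant in code, sketched under the assumption that variants can be recognized from a suffix in the voice name, as in the "Lee (Premium)" example above. The " (Enhanced)" suffix, the suffix-to-rank mapping, and the `parseVariant`/`keepBestVariant` helpers are all assumptions, not documented behaviour of any browser.

```typescript
interface Voice {
  name: string;
  lang: string;
}

// Assumed name suffixes for quality variants, mapped to a rank (higher is better).
const SUFFIXES: Array<[string, number]> = [
  [" (Enhanced)", 1],
  [" (Premium)", 2],
];

// Splits "Lee (Premium)" into its base name "Lee" and a quality rank.
function parseVariant(name: string): { base: string; rank: number } {
  for (const [suffix, rank] of SUFFIXES) {
    if (name.endsWith(suffix)) {
      return { base: name.slice(0, name.length - suffix.length), rank };
    }
  }
  return { base: name, rank: 0 }; // default variant
}

// Keeps only the highest quality variant for each base name and language.
function keepBestVariant(voices: Voice[]): Voice[] {
  const best = new Map<string, { voice: Voice; rank: number }>();
  for (const voice of voices) {
    const { base, rank } = parseVariant(voice.name);
    const key = `${base}|${voice.lang}`;
    const current = best.get(key);
    if (!current || rank > current.rank) best.set(key, { voice, rank });
  }
  return Array.from(best.values(), (entry) => entry.voice);
}
```

Since parseVariant() also returns the base name, "Lee (Premium)" could be ranked using the position of "Lee" in the recommended list, which would be one way around the ordering issue raised above.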

Falcosc · 2024-06-17T15:38:43Z

I did start implementing more features from your JSON, but I didn't like having even more code depend on an external JSON structure. But my biggest issue was managing more lines of code, which aren't the focus of my project. As soon as I started to think about putting the voice selection in a second JS file, I stopped and reverted everything.

Instead of implementing it in my proof-of-concept project, I will patiently wait for either your next step or for somebody to complain about my not-so-user-friendly voice selection, rather than getting distracted hunting for the perfect centrally managed voice attributes.

So I can't give you feedback on the implementation of the new ideas, which is a bit rude since I was the one raising new topics. I'll keep my one-liner as long as I'm only at the proof-of-concept stage.

If you go for a JS lib, I would recommend sticking to the parts used by the demo, without expanding the demo: just the simplest thing, documented voices filtered by language.
I would be happy to create pull requests for things that are not part of the demo but that I need:

- get unknown or unrecommended voices
- a getter for pitchControl

1 reply

HadrienGardeur Jun 17, 2024
Maintainer Author

> get unknown or unrecommended voices

I think that would be in scope, but:

- with novelty and very low quality voices filtered out by default
- and with the ability to group them with recommended voices, but list them lower in the list
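A minimal sketch of that behaviour: recommended voices first, then the remaining installed voices, with novelty and very low quality voices removed. `lowQualityNames` is a hypothetical deny-list (in practice that information would presumably come from the JSON files), and `withUnknownVoices` is an illustrative name.

```typescript
interface Voice {
  name: string;
  lang: string;
}

// Recommended voices first (in list order), followed by installed voices that
// aren't in the recommended list, minus a deny-list of novelty / very low quality voices.
function withUnknownVoices(
  installed: Voice[],
  recommendedNames: string[],
  lowQualityNames: Set<string>
): Voice[] {
  const recommended: Voice[] = [];
  for (const name of recommendedNames) {
    const match = installed.find((v) => v.name === name);
    if (match) recommended.push(match);
  }
  const others = installed.filter(
    (v) => !recommendedNames.includes(v.name) && !lowQualityNames.has(v.name)
  );
  return [...recommended, ...others];
}
```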

HadrienGardeur (Maintainer, Author) · 2024-07-12T17:25:30Z

Here's a quick update on this matter:

- I've revised my initial message, refining the API and documenting what the output could look like
- we're discussing integrating this project into Thorium, where it would power the voice selector component
- we're also exploring the idea of a new Readium project (Readium Speech maybe?) that would implement the API documented in this discussion but also cover additional features such as:
  - breaking down a document into structural elements
  - generating utterances from these structural elements
  - pre-processing utterances
  - TTS playback
  - highlighting

3 replies

HadrienGardeur Aug 1, 2024
Maintainer Author

I've set up the repo for this at https://github.com/readium/speech

Hopefully we can get this done in August.

HadrienGardeur Aug 9, 2024
Maintainer Author

Yesterday, I spent the afternoon with the developer working on this project, and he's making great strides, so I'm fairly confident that it should be mostly ready for a PR review by mid-August.

@Falcosc are you comfortable with TypeScript for a review of that PR? I'll also loop in other stakeholders involved with Readium.

Falcosc Aug 9, 2024

I can't help with finding specific JS issues in your PR, but I could use my general knowledge to find issues/risks, if that's good enough for you.

hobodrifterdavid · 2024-08-27T00:10:36Z

This repo is very useful. I was wondering if you could create a more detailed code example of how it's meant to be used to select good voices, taking into account different quality levels and different localised names.

1 reply

HadrienGardeur Aug 27, 2024
Maintainer Author

There's on-going work on a first implementation: readium/speech#7

HadrienGardeur (Maintainer, Author) · 2024-09-14T16:29:17Z

With the on-going work on Readium Speech and a working demo, I can safely close that discussion for now.
