
Use piper_phonemize as text tokenizer in ljspeech recipe #1511

Merged
merged 10 commits into k2-fsa:master on Feb 29, 2024

Conversation

yaozengwei (Collaborator)

As suggested by @csukuangfj, we replace g2p with piper_phonemize (https://github.com/rhasspy/piper-phonemize) as the text tokenizer in the ljspeech/vits recipe, for deployment purposes.

Experiments are pending.

@JinZr might help to merge the changes to the vctk/vits recipe later.

@@ -80,6 +80,10 @@ fi

if [ $stage -le 3 ] && [ $stop_stage -ge 3 ]; then
log "Stage 3: Prepare phoneme tokens for LJSpeech"
# We assume you have installed piper_phonemize and espnet_tts_frontend.
# If not, please install them with:
# - piper_phonemize: refer to https://github.com/rhasspy/piper-phonemize
Collaborator

Please also add the link to the pre-built wheels:
https://github.com/csukuangfj/piper-phonemize/releases/tag/2023.12.5
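
For readers following along, a minimal import check (a sketch; `tacotron_cleaner` is the module that espnet_tts_frontend provides, as used later in this PR):

```python
# Quick sanity check that both dependencies are installed.
import piper_phonemize
import tacotron_cleaner.cleaners  # installed via espnet_tts_frontend

print(piper_phonemize.phonemize_espeak("Hello world", "en-us"))
```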

@@ -41,18 +41,28 @@ def __init__(self, tokens: str):
self.token2id[token] = id

self.blank_id = self.token2id["<blk>"]
self.sos_id = self.token2id["<sos>"]
Collaborator

At line 41, we need to check that

assert token not in self.token2id, token
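
A minimal sketch of the loading loop with that check, assuming each line of the token file is a `<token> <id>` pair as written elsewhere in this PR (`tokens` is the filename argument from the diff above):

```python
# Sketch only; rsplit keeps a whitespace token (" ") intact.
token2id = {}
with open(tokens, "r", encoding="utf-8") as f:
    for line in f:
        token, idx = line.rstrip("\n").rsplit(" ", 1)
        assert token not in token2id, f"duplicate token: {token!r}"
        token2id[token] = int(idx)
```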

extra_tokens = [
"<blk>", # 0 for blank
"<sos/eos>", # 1 for sos and eos symbols.
"<unk>", # 2 for OOV
"<sos>", # 1 for sos
Collaborator

From
https://github.com/rhasspy/piper/blob/master/TRAINING.md

phoneme_id_map (required)
Map from a phoneme (UTF-8 codepoint) to a list of ids
Id 0 ("_") is padding (pad)
Id 1 ("^") is the beginning of an utterance (bos)
Id 2 ("$") is the end of an utterance (eos)
Id 3 (" ") is a word separator (whitespace)

Can we reuse the tokens from piper-phonemize for sos, eos, and blank?
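
If so, a minimal sketch of reusing them, assuming piper_phonemize exposes `get_espeak_map()` returning the `{token: [id]}` map:

```python
from piper_phonemize import get_espeak_map

# Each token maps to a list of ids; take the first.
token2id = {token: ids[0] for token, ids in get_espeak_map().items()}

pad_id = token2id["_"]  # id 0: padding, can double as the blank
bos_id = token2id["^"]  # id 1: beginning of utterance (sos)
eos_id = token2id["$"]  # id 2: end of utterance (eos)
```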

Collaborator Author

Ok. Thanks!

for i, token in enumerate(all_tokens):
f.write(f"{token} {i}\n")
for token, token_id in all_tokens.items():
f.write(f"{token} {token_id[0]}\n")
Collaborator

Could you sort the token file by token_id?

That is, sort the second column in ascending order, from 0 to vocab_size - 1?
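
For example (a sketch, reusing the `all_tokens` map from the diff above; `token_file` is a hypothetical name):

```python
with open(token_file, "w", encoding="utf-8") as f:
    # Sort rows by id so the second column runs 0 .. vocab_size - 1.
    for token, ids in sorted(all_tokens.items(), key=lambda kv: kv[1][0]):
        f.write(f"{token} {ids[0]}\n")
```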

Collaborator Author

Ok.

@csukuangfj (Collaborator) left a comment

Thanks!

Looks good to me. Left some minor comments.

token_ids.append(self.token2id[t])
else:
token_ids.append(self.oov_id)
assert t in self.token2id, t
Collaborator

Suggested change:

    -    assert t in self.token2id, t
    +    if t not in self.token2id:
    +        logging.warning(f'Skip oov {t}')
    +        continue

We just skip OOVs instead of throwing an assertion error, which would kill the process.
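
A self-contained sketch of the resulting behavior (`tokens_to_ids` is a hypothetical helper; the real method also handles blank/sos/eos):

```python
import logging
from typing import Dict, List

def tokens_to_ids(tokens: List[str], token2id: Dict[str, int]) -> List[int]:
    token_ids = []
    for t in tokens:
        if t not in token2id:
            logging.warning(f"Skip oov {t}")  # skip instead of asserting
            continue
        token_ids.append(token2id[t])
    return token_ids
```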

@@ -17,88 +17,42 @@


"""
This file reads the texts in given manifest and generates the file that maps tokens to IDs.
This file generates the file that maps tokens to IDs.
Collaborator

The copyright year should be 2023-2024.

@@ -45,7 +44,11 @@ def prepare_tokens_ljspeech():
# Text normalization
text = tacotron_cleaner.cleaners.custom_english_cleaners(text)
# Convert to phonemes
cut.tokens = g2p(text)
tokens_list = phonemize_espeak(text, "en-us")
Collaborator

At line 42

assert len(cut.supervisions) == 1, len(cut.supervisions)

Please use

assert len(cut.supervisions) == 1, (len(cut.supervisions), cut)

It is helpful to print the problematic cut on error.

intersperse_blank: bool = True,
add_sos: bool = False,
add_eos: bool = False,
):
Collaborator

Please give the return value a type hint.
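
Assuming the method returns one id list per input text, the hinted signature could look like this (a sketch; `List[List[int]]` is an assumption):

```python
from typing import List

def texts_to_token_ids(
    self,
    texts: List[str],
    intersperse_blank: bool = True,
    add_sos: bool = False,
    add_eos: bool = False,
) -> List[List[int]]:
    ...
```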

@@ -63,30 +76,44 @@ def texts_to_token_ids(self, texts: List[str], intersperse_blank: bool = True):
# Text normalization
text = tacotron_cleaner.cleaners.custom_english_cleaners(text)
# Convert to phonemes
tokens = self.g2p(text)
tokens_list = phonemize_espeak(text, "en-us")
Collaborator

Please pass en-us as an argument to this function. You can use

    lang: str = "en-us"

as the last argument of this function.
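
A sketch of the suggested parameterization (`text_to_phonemes` is a hypothetical standalone helper; `phonemize_espeak` returns one phoneme list per sentence):

```python
from typing import List

from piper_phonemize import phonemize_espeak

def text_to_phonemes(text: str, lang: str = "en-us") -> List[str]:
    # Flatten the per-sentence phoneme lists into one token sequence.
    tokens_list = phonemize_espeak(text, lang)
    return [t for tokens in tokens_list for t in tokens]
```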

Collaborator Author

Thanks a lot!

@csukuangfj (Collaborator) left a comment

Thanks!

@yaozengwei yaozengwei merged commit d89f4ea into k2-fsa:master Feb 29, 2024
75 of 118 checks passed