Use piper_phonemize as text tokenizer in ljspeech recipe #1511
Conversation
egs/ljspeech/TTS/prepare.sh
Outdated
@@ -80,6 +80,10 @@ fi

if [ $stage -le 3 ] && [ $stop_stage -ge 3 ]; then
  log "Stage 3: Prepare phoneme tokens for LJSpeech"
  # We assume you have installed piper_phonemize and espnet_tts_frontend.
  # If not, please install them with:
  # - piper_phonemize: refer to https://github.com/rhasspy/piper-phonemize
Please also add the link to the pre-built wheels:
https://github.com/csukuangfj/piper-phonemize/releases/tag/2023.12.5
egs/ljspeech/TTS/vits/tokenizer.py
Outdated
@@ -41,18 +41,28 @@ def __init__(self, tokens: str):
            self.token2id[token] = id

        self.blank_id = self.token2id["<blk>"]
        self.sos_id = self.token2id["<sos>"]
At line 41, we need to check that
assert token not in self.token2id, token
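As a minimal sketch, the suggested duplicate-token check would sit in the loop that builds the token map; `read_token2id` below is a hypothetical helper for illustration, not the recipe's actual code:

```python
def read_token2id(lines):
    # Build the token -> id map from "token id" lines, asserting each
    # token appears only once so a malformed tokens file fails loudly
    # instead of silently overwriting an earlier entry.
    token2id = {}
    for line in lines:
        token, idx = line.split()
        assert token not in token2id, token
        token2id[token] = int(idx)
    return token2id
```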
extra_tokens = [
    "<blk>",  # 0 for blank
    "<sos/eos>",  # 1 for sos and eos symbols.
    "<unk>",  # 2 for OOV
    "<sos>",  # 1 for sos
From
https://github.com/rhasspy/piper/blob/master/TRAINING.md
phoneme_id_map (required)
Map from a phoneme (UTF-8 codepoint) to a list of ids
Id 0 ("_") is padding (pad)
Id 1 ("^") is the beginning of an utterance (bos)
Id 2 ("$") is the end of an utterance (eos)
Id 3 (" ") is a word separator (whitespace)
Can we reuse the tokens from piper-phonemize for sos, eos, and blank?
Ok. Thanks!
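Reusing piper-phonemize's conventions would mean taking blank/pad from id 0, sos from id 1, and eos from id 2 rather than defining separate `<blk>`/`<sos>`/`<eos>` symbols. A sketch, using a made-up fragment in the style of piper's phoneme_id_map (each token maps to a list of ids):

```python
# Made-up fragment of a piper-style phoneme_id_map for illustration.
phoneme_id_map = {"_": [0], "^": [1], "$": [2], " ": [3]}

# Reuse piper's special tokens instead of introducing new ones.
blank_id = phoneme_id_map["_"][0]  # id 0: padding / blank
sos_id = phoneme_id_map["^"][0]    # id 1: beginning of utterance
eos_id = phoneme_id_map["$"][0]    # id 2: end of utterance
```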
for i, token in enumerate(all_tokens):
    f.write(f"{token} {i}\n")
for token, token_id in all_tokens.items():
    f.write(f"{token} {token_id[0]}\n")
Could you sort by token_id in filename?
That is, sort the second column from 0 to vocab_size-1 in ascending order?
Ok.
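A sketch of the sorted write, assuming all_tokens is a token -> [id] map as in the diff above (the map contents here are illustrative):

```python
# Illustrative token -> [id] map; insertion order is deliberately unsorted.
all_tokens = {"a": [5], "_": [0], "^": [1], "$": [2], " ": [3]}

# Emit "token id" lines with the second column ascending from 0.
lines = [
    f"{token} {ids[0]}"
    for token, ids in sorted(all_tokens.items(), key=lambda kv: kv[1][0])
]
```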
Thanks!
Looks good to me. Left some minor comments.
egs/ljspeech/TTS/vits/tokenizer.py
Outdated
    token_ids.append(self.token2id[t])
else:
    token_ids.append(self.oov_id)
assert t in self.token2id, t
assert t in self.token2id, t
if t not in self.token2id:
    logging.warning(f"Skip oov {t}")
    continue
We just skip OOVs instead of throwing an assertion error, which
may kill the process.
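A self-contained sketch of that skip-OOV behavior (the function name and toy token2id map here are illustrative):

```python
import logging

def tokens_to_ids(tokens, token2id):
    # Skip out-of-vocabulary tokens with a warning instead of asserting,
    # so a single unexpected token does not kill the whole process.
    ids = []
    for t in tokens:
        if t not in token2id:
            logging.warning(f"Skip oov {t}")
            continue
        ids.append(token2id[t])
    return ids
```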
@@ -17,88 +17,42 @@

"""
This file reads the texts in given manifest and generates the file that maps tokens to IDs.
This file generates the file that maps tokens to IDs.
The copyright should be 2023-2024.
@@ -45,7 +44,11 @@ def prepare_tokens_ljspeech():
    # Text normalization
    text = tacotron_cleaner.cleaners.custom_english_cleaners(text)
    # Convert to phonemes
    cut.tokens = g2p(text)
    tokens_list = phonemize_espeak(text, "en-us")
At line 42
assert len(cut.supervisions) == 1, len(cut.supervisions)
Please use
assert len(cut.supervisions) == 1, (len(cut.supervisions), cut)
It is helpful to print the problematic cut on error.
egs/ljspeech/TTS/vits/tokenizer.py
Outdated
    intersperse_blank: bool = True,
    add_sos: bool = False,
    add_eos: bool = False,
):
Please give the return value a type hint.
egs/ljspeech/TTS/vits/tokenizer.py
Outdated
@@ -63,30 +76,44 @@ def texts_to_token_ids(self, texts: List[str], intersperse_blank: bool = True):
    # Text normalization
    text = tacotron_cleaner.cleaners.custom_english_cleaners(text)
    # Convert to phonemes
    tokens = self.g2p(text)
    tokens_list = phonemize_espeak(text, "en-us")
Please pass en-us as an argument to this function. You can use
lang: str = "en-us"
as the last argument of this function.
thanks a lot!
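A sketch of the suggested shape, mirroring texts_to_token_ids with the language as a defaulted last argument and a return type hint; phonemize below is a stand-in for piper_phonemize.phonemize_espeak(text, lang), which returns one list of phoneme tokens per sentence:

```python
from typing import List

def phonemize(text: str, lang: str) -> List[List[str]]:
    # Stand-in for piper_phonemize.phonemize_espeak(text, lang);
    # here it just splits the text into characters as one "sentence".
    return [list(text)]

def texts_to_tokens(
    texts: List[str],
    lang: str = "en-us",
) -> List[List[str]]:
    # Phonemize each text, flattening the per-sentence token lists.
    token_lists = []
    for text in texts:
        tokens: List[str] = []
        for sentence_tokens in phonemize(text, lang):
            tokens.extend(sentence_tokens)
        token_lists.append(tokens)
    return token_lists
```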
Thanks!
As suggested by @csukuangfj, for the ljspeech/vits recipe we replace g2p with piper_phonemize (https://github.com/rhasspy/piper-phonemize) as the text tokenizer, for deployment purposes.
Experiments are pending.
@JinZr might help to merge the changes to the vctk/vits recipe later.