Use piper_phonemize as text tokenizer in ljspeech recipe #1511

Merged: 10 commits, Feb 29, 2024
19 changes: 3 additions & 16 deletions egs/ljspeech/TTS/local/prepare_token_file.py
@@ -43,23 +43,10 @@ def get_args():

 def get_token2id(filename: Path) -> Dict[str, int]:
     """Get a dict that maps token to IDs, and save it to the given filename."""
-    extra_tokens = [
-        "<blk>",  # 0 for blank
-        "<sos>",  # 1 for sos
-        "<eos>",  # 2 for eos
-        "<unk>",  # 3 for OOV
-    ]
-
-    all_tokens = list(get_espeak_map().keys())
-
-    for t in extra_tokens:
-        assert t not in all_tokens, t
-
-    all_tokens = extra_tokens + all_tokens
-
+    all_tokens = get_espeak_map()
     with open(filename, "w", encoding="utf-8") as f:
-        for i, token in enumerate(all_tokens):
-            f.write(f"{token} {i}\n")
+        for token, token_id in all_tokens.items():
+            f.write(f"{token} {token_id[0]}\n")
Collaborator:

Could you sort by token_id in filename? That is, sort the second column from 0 to vocab_size-1 in ascending order?

Collaborator (Author):

Ok.
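The fix the reviewer asks for can be sketched as follows (assuming, as in the diff above, a dict mapping each token to its integer ID; the helper name is illustrative, not from the recipe):

```python
from pathlib import Path
from typing import Dict


def write_tokens_sorted(token2id: Dict[str, int], filename: Path) -> None:
    """Write one "token id" line per token, with the ID column ascending from 0."""
    with open(filename, "w", encoding="utf-8") as f:
        # Sort by ID (the dict value) so the second column runs 0..vocab_size-1.
        for token, token_id in sorted(token2id.items(), key=lambda kv: kv[1]):
            f.write(f"{token} {token_id}\n")
```

Sorting by the dict value rather than the token keeps the file stable regardless of the order in which the phoneme map enumerates its keys.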



if __name__ == "__main__":
6 changes: 4 additions & 2 deletions egs/ljspeech/TTS/prepare.sh
@@ -82,7 +82,8 @@ if [ $stage -le 3 ] && [ $stop_stage -ge 3 ]; then
   log "Stage 3: Prepare phoneme tokens for LJSpeech"
   # We assume you have installed piper_phonemize and espnet_tts_frontend.
   # If not, please install them with:
-  # - piper_phonemize: refer to https://github.com/rhasspy/piper-phonemize
+  # - piper_phonemize: refer to https://github.com/rhasspy/piper-phonemize,
+  #   or install the pre-built wheels from https://github.com/csukuangfj/piper-phonemize/releases/tag/2023.12.5
   # - espnet_tts_frontend, `pip install espnet_tts_frontend`, refer to https://github.com/espnet/espnet_tts_frontend/
   if [ ! -e data/spectrogram/.ljspeech_with_token.done ]; then
     ./local/prepare_tokens_ljspeech.py
@@ -119,7 +120,8 @@ if [ $stage -le 5 ] && [ $stop_stage -ge 5 ]; then
   log "Stage 5: Generate token file"
   # We assume you have installed piper_phonemize and espnet_tts_frontend.
   # If not, please install them with:
-  # - piper_phonemize: refer to https://github.com/rhasspy/piper-phonemize
+  # - piper_phonemize: refer to https://github.com/rhasspy/piper-phonemize,
+  #   or install the pre-built wheels from https://github.com/csukuangfj/piper-phonemize/releases/tag/2023.12.5
   # - espnet_tts_frontend, `pip install espnet_tts_frontend`, refer to https://github.com/espnet/espnet_tts_frontend/
   if [ ! -e data/tokens.txt ]; then
     ./local/prepare_token_file.py --tokens data/tokens.txt
27 changes: 13 additions & 14 deletions egs/ljspeech/TTS/vits/tokenizer.py
@@ -38,12 +38,15 @@ def __init__(self, tokens: str):
             id = int(info[0])
         else:
             token, id = info[0], int(info[1])
+        assert token not in self.token2id, token
         self.token2id[token] = id

-        self.blank_id = self.token2id["<blk>"]
-        self.sos_id = self.token2id["<sos>"]
-        self.eos_id = self.token2id["<eos>"]
-        self.oov_id = self.token2id["<unk>"]
+        # Refer to https://github.com/rhasspy/piper/blob/master/TRAINING.md
+        self.pad_id = self.token2id["_"]    # padding
+        self.sos_id = self.token2id["^"]    # beginning of an utterance (bos)
+        self.eos_id = self.token2id["$"]    # end of an utterance (eos)
+        self.space_id = self.token2id[" "]  # word separator (whitespace)

         self.vocab_size = len(self.token2id)
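For illustration, the token-file parsing above can be sketched as a standalone helper (a simplified loader, not the recipe's actual class; it mirrors the convention that a line with a single field encodes the whitespace token, whose ID is that field):

```python
from typing import Dict


def load_token2id(filename: str) -> Dict[str, int]:
    """Parse a "token id" file; a line with only an ID encodes the space token."""
    token2id: Dict[str, int] = {}
    with open(filename, encoding="utf-8") as f:
        for line in f:
            # split() with no separator collapses whitespace, so a line whose
            # token is " " yields a single field: the ID.
            info = line.rstrip("\n").split()
            if len(info) == 1:
                token, idx = " ", int(info[0])
            else:
                token, idx = info[0], int(info[1])
            assert token not in token2id, f"duplicate token: {token}"
            token2id[token] = idx
    return token2id
```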

@@ -80,13 +83,11 @@ def texts_to_token_ids(

         token_ids = []
         for t in tokens:
-            if t in self.token2id:
-                token_ids.append(self.token2id[t])
-            else:
-                token_ids.append(self.oov_id)
+            assert t in self.token2id, t
Collaborator:

Suggested change:

    -            assert t in self.token2id, t
    +            if t not in self.token2id:
    +                logging.warning(f"Skip oov {t}")
    +                continue

We just skip OOVs instead of throwing an assertion error, which may kill the process.
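The suggestion boils down to the following mapping helper (an illustrative standalone sketch; the function name is not from the recipe):

```python
import logging
from typing import Dict, List


def map_tokens_skip_oov(tokens: List[str], token2id: Dict[str, int]) -> List[int]:
    """Map tokens to IDs, skipping out-of-vocabulary tokens with a warning."""
    token_ids = []
    for t in tokens:
        if t not in token2id:
            # Log and skip rather than assert, so one bad token
            # does not kill the whole process.
            logging.warning(f"Skip oov {t}")
            continue
        token_ids.append(token2id[t])
    return token_ids
```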

token_ids.append(self.token2id[t])

         if intersperse_blank:
-            token_ids = intersperse(token_ids, self.blank_id)
+            token_ids = intersperse(token_ids, self.pad_id)
if add_sos:
token_ids = [self.sos_id] + token_ids
if add_eos:
@@ -122,13 +123,11 @@ def tokens_to_token_ids(
         for tokens in tokens_list:
             token_ids = []
             for t in tokens:
-                if t in self.token2id:
-                    token_ids.append(self.token2id[t])
-                else:
-                    token_ids.append(self.oov_id)
+                assert t in self.token2id, t
+                token_ids.append(self.token2id[t])

             if intersperse_blank:
-                token_ids = intersperse(token_ids, self.blank_id)
+                token_ids = intersperse(token_ids, self.pad_id)
if add_sos:
token_ids = [self.sos_id] + token_ids
if add_eos:
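Both methods above call `intersperse` to place the pad token between (and around) the phoneme IDs; a typical implementation looks like this (a sketch of the common VITS-style helper, given under the assumption that it matches the recipe's behavior):

```python
from typing import List


def intersperse(sequence: List[int], item: int) -> List[int]:
    """Insert `item` between and around the elements of `sequence`."""
    # A sequence of n elements becomes 2n + 1 slots; the odd slots
    # receive the original elements, the even slots stay as `item`.
    result = [item] * (len(sequence) * 2 + 1)
    result[1::2] = sequence
    return result
```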