Use piper_phonemize as text tokenizer in ljspeech recipe #1511
Conversation
egs/ljspeech/TTS/prepare.sh
Outdated
@@ -80,6 +80,10 @@ fi

if [ $stage -le 3 ] && [ $stop_stage -ge 3 ]; then
  log "Stage 3: Prepare phoneme tokens for LJSpeech"
  # We assume you have installed piper_phonemize and espnet_tts_frontend.
  # If not, please install them with:
  # - piper_phonemize: refer to https://github.com/rhasspy/piper-phonemize
Please also add the link to the pre-built wheels:
https://github.com/csukuangfj/piper-phonemize/releases/tag/2023.12.5
egs/ljspeech/TTS/vits/tokenizer.py
Outdated
@@ -41,18 +41,28 @@ def __init__(self, tokens: str):
            self.token2id[token] = id

        self.blank_id = self.token2id["<blk>"]
        self.sos_id = self.token2id["<sos>"]
At line 41, we need to check that
assert token not in self.token2id, token
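As a minimal sketch, the suggested duplicate-token check would sit in the loop that builds the token map; `read_token2id` below is a hypothetical helper for illustration, not the recipe's actual code:

```python
def read_token2id(lines):
    # Build the token -> id map from "token id" lines, asserting each
    # token appears only once so a malformed tokens file fails loudly
    # instead of silently overwriting an earlier entry.
    token2id = {}
    for line in lines:
        token, idx = line.split()
        assert token not in token2id, token
        token2id[token] = int(idx)
    return token2id
```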
extra_tokens = [
    "<blk>",  # 0 for blank
    "<sos/eos>",  # 1 for sos and eos symbols.
    "<unk>",  # 2 for OOV
    "<sos>",  # 1 for sos
From
https://github.com/rhasspy/piper/blob/master/TRAINING.md
phoneme_id_map (required)
Map from a phoneme (UTF-8 codepoint) to a list of ids
Id 0 ("_") is padding (pad)
Id 1 ("^") is the beginning of an utterance (bos)
Id 2 ("$") is the end of an utterance (eos)
Id 3 (" ") is a word separator (whitespace)
Can we reuse the tokens from piper-phonemize for sos, eos, and blank?
Ok. Thanks!
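Reusing piper-phonemize's conventions would mean taking blank/pad from id 0, sos from id 1, and eos from id 2 rather than defining separate `<blk>`/`<sos>`/`<eos>` symbols. A sketch, using a made-up fragment in the style of piper's phoneme_id_map (each token maps to a list of ids):

```python
# Made-up fragment of a piper-style phoneme_id_map for illustration.
phoneme_id_map = {"_": [0], "^": [1], "$": [2], " ": [3]}

# Reuse piper's special tokens instead of introducing new ones.
blank_id = phoneme_id_map["_"][0]  # id 0: padding / blank
sos_id = phoneme_id_map["^"][0]    # id 1: beginning of utterance
eos_id = phoneme_id_map["$"][0]    # id 2: end of utterance
```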
for i, token in enumerate(all_tokens):
    f.write(f"{token} {i}\n")
for token, token_id in all_tokens.items():
    f.write(f"{token} {token_id[0]}\n")
Could you sort by token_id in filename?
That is, sort the second column from 0 to vocab_size-1 in ascending order?
Ok.
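A sketch of the sorted write, assuming all_tokens is a token -> [id] map as in the diff above (the map contents here are illustrative):

```python
# Illustrative token -> [id] map; insertion order is deliberately unsorted.
all_tokens = {"a": [5], "_": [0], "^": [1], "$": [2], " ": [3]}

# Emit "token id" lines with the second column ascending from 0.
lines = [
    f"{token} {ids[0]}"
    for token, ids in sorted(all_tokens.items(), key=lambda kv: kv[1][0])
]
```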
Thanks!
Looks good to me. Left some minor comments.
egs/ljspeech/TTS/vits/tokenizer.py
Outdated
    token_ids.append(self.token2id[t])
else:
    token_ids.append(self.oov_id)
assert t in self.token2id, t
assert t in self.token2id, t
if t not in self.token2id:
    logging.warning(f"Skip oov {t}")
    continue
We just skip OOVs instead of throwing an assertion error, which
may kill the process.
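A self-contained sketch of that skip-OOV behavior (the function name and toy token2id map here are illustrative):

```python
import logging

def tokens_to_ids(tokens, token2id):
    # Skip out-of-vocabulary tokens with a warning instead of asserting,
    # so a single unexpected token does not kill the whole process.
    ids = []
    for t in tokens:
        if t not in token2id:
            logging.warning(f"Skip oov {t}")
            continue
        ids.append(token2id[t])
    return ids
```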
@@ -17,88 +17,42 @@

"""
This file reads the texts in given manifest and generates the file that maps tokens to IDs.
This file generates the file that maps tokens to IDs.
The copyright should be 2023-2024.
@@ -45,7 +44,11 @@ def prepare_tokens_ljspeech():
    # Text normalization
    text = tacotron_cleaner.cleaners.custom_english_cleaners(text)
    # Convert to phonemes
    cut.tokens = g2p(text)
    tokens_list = phonemize_espeak(text, "en-us")
At line 42
assert len(cut.supervisions) == 1, len(cut.supervisions)
Please use
assert len(cut.supervisions) == 1, (len(cut.supervisions), cut)
It is helpful to print the problematic cut on error.
egs/ljspeech/TTS/vits/tokenizer.py
Outdated
    intersperse_blank: bool = True,
    add_sos: bool = False,
    add_eos: bool = False,
):
Please give the return value a type hint.
egs/ljspeech/TTS/vits/tokenizer.py
Outdated
@@ -63,30 +76,44 @@ def texts_to_token_ids(self, texts: List[str], intersperse_blank: bool = True):
    # Text normalization
    text = tacotron_cleaner.cleaners.custom_english_cleaners(text)
    # Convert to phonemes
    tokens = self.g2p(text)
    tokens_list = phonemize_espeak(text, "en-us")
Please pass en-us as an argument to this function. You can use
lang: str = "en-us"
as the last argument of this function.
thanks a lot!
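A sketch of the suggested shape, mirroring texts_to_token_ids with the language as a defaulted last argument and a return type hint; phonemize below is a stand-in for piper_phonemize.phonemize_espeak(text, lang), which returns one list of phoneme tokens per sentence:

```python
from typing import List

def phonemize(text: str, lang: str) -> List[List[str]]:
    # Stand-in for piper_phonemize.phonemize_espeak(text, lang);
    # here it just splits the text into characters as one "sentence".
    return [list(text)]

def texts_to_tokens(
    texts: List[str],
    lang: str = "en-us",
) -> List[List[str]]:
    # Phonemize each text, flattening the per-sentence token lists.
    token_lists = []
    for text in texts:
        tokens: List[str] = []
        for sentence_tokens in phonemize(text, lang):
            tokens.extend(sentence_tokens)
        token_lists.append(tokens)
    return token_lists
```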
Thanks!
As suggested by @csukuangfj, for the ljspeech/vits recipe we replace g2p with piper_phonemize (https://github.com/rhasspy/piper-phonemize) as the text tokenizer, for deployment purposes.
Experiments are pending.
@JinZr might help to merge the changes to the vctk/vits recipe later.