Guided synthesis - API Improvement #376
Conversation
Resolved merge conflicts in: .gitignore, voicevox_engine/dev/synthesis_engine/mock.py
Resolved merge conflicts in: run.py, voicevox_engine/dev/synthesis_engine/mock.py, voicevox_engine/synthesis_engine/synthesis_engine.py, voicevox_engine/synthesis_engine/synthesis_engine_base.py
Reading the README you wrote, I noticed that guided synthesis operates at the frame level!
This has one advantage and two disadvantages.
The advantage is, of course, the higher resolution of the input. It is undefined behavior for the model, but users will be happy with it.
The first disadvantage is that the VOICEVOX UI (which works at the mora level) cannot be used to fine-tune the pitch or length. Users will have to re-create the guide audio instead.
The second disadvantage is that it may become unavailable in the future. In fact, I am currently developing a higher-quality decoder model, and it does not accept frame-level F0 input.
Since we are still at the experimental stage, I think either keeping the frame level or switching to the mora level would be fine, but we have to choose one of them...
Sorry for noticing this so late...
If we switch to the mora level, the code would become very straightforward, because we would add an API that extracts the AccentPhrases from the guide audio.
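To make the frame-level vs. mora-level trade-off above concrete, here is a minimal sketch of how a frame-level F0 contour could be collapsed to mora-level values, losing the extra resolution in the process. The function name, the per-mora frame counts, and the unvoiced-frame convention are all illustrative assumptions, not part of the engine:

```python
# Hypothetical sketch: reducing a frame-level F0 contour (as used by the
# current guided synthesis) to mora-level values that a mora-level UI
# could edit. Names here are illustrative, not the real API.

def mora_level_f0(frame_f0, mora_frame_counts):
    """Average the frame-level F0 over the frames belonging to each mora.

    frame_f0:          per-frame F0 values in Hz, with 0.0 for unvoiced frames
    mora_frame_counts: number of frames assigned to each mora by the aligner
    """
    result = []
    index = 0
    for count in mora_frame_counts:
        frames = frame_f0[index:index + count]
        voiced = [f for f in frames if f > 0.0]
        # A mora with no voiced frames (e.g. a devoiced vowel) keeps F0 = 0
        result.append(sum(voiced) / len(voiced) if voiced else 0.0)
        index += count
    return result
```

Whatever detail the guide audio carried within a single mora is averaged away here, which is exactly why the frame level feels higher-resolution to users.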
Interesting; I wonder how you'd handle the alignment. Since I've already completed the GUI part, I don't feel like giving up on a working feature. I suggest we keep this feature until the new architecture is introduced into this repository. By the way, however it turns out, there will be a breaking change at which we can remove this API.
Ok, I understand!
Sorry for the wait!
Should be okay now.
Another week has passed; how is it going now?
LGTM!!
Sorry to keep you waiting!
Great. You may also want to check out the PR on the GUI side so we can finish this feature for the next release.
Description
Following #252, as I dug into the GUI part I found that some parts of the API design were clearly not clever enough, and improved them.
Now the `AudioQuery` has a new optional section `guidedInfo` at its root, containing all the information needed for guided synthesis; it is passed directly to the engine. As shown above, I replaced the uploaded file with the full path to the file as a string, so I could get rid of the form data and use a simpler design. As a result, `guided_synthesis` and `guided_accent_phrases` now have exactly the same interface as the `synthesis` API, which I think will ease GUI development a lot. More details can be found in the changes to the README.
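As a sketch of the design described here: only the optional `guidedInfo` section and the path-as-string replacement for the form upload come from this PR; the concrete field names below are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the AudioQuery extension described in this PR.
# Only `guided_info` being an optional root-level section and the guide
# audio being referenced by a plain file-path string are taken from the
# PR text; the remaining field names are illustrative assumptions.

@dataclass
class GuidedInfo:
    audio_path: str  # full path to the guide audio, replacing the form upload


@dataclass
class AudioQuery:
    # ... existing accent-phrase / mora fields omitted ...
    guided_info: Optional[GuidedInfo] = None  # absent for normal synthesis


query = AudioQuery()
assert query.guided_info is None  # normal synthesis sends no guide info
```

Because the section is optional, a plain `synthesis`-style query stays valid unchanged, which is what makes the guided endpoints interchangeable with `synthesis` on the GUI side.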
PS: Considering its usage in the GUI, `guided_accent_phrases` doesn't actually work like `accent_phrases`. That is why I removed most of the parameters, as well as `text` and `is_kana`.
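A minimal sketch of the resulting difference between the two endpoints' request parameters. Only the removal of `text` and `is_kana` from the guided variant comes from this PR; the field names themselves are hypothetical:

```python
# Hypothetical sketch contrasting the two endpoints' inputs after this PR.
# Parameter names are illustrative assumptions, not the actual schema.

def accent_phrases_params(text: str, speaker: int, is_kana: bool = False):
    """Plain accent_phrases: phrases are derived from the given text."""
    return {"text": text, "speaker": speaker, "is_kana": is_kana}


def guided_accent_phrases_params(audio_path: str, speaker: int):
    """Guided variant: no text / is_kana; phrases come from the guide audio."""
    return {"audio_path": audio_path, "speaker": speaker}
```

The guided variant needs no text input at all, which is the asymmetry the PS above is pointing out.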