Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Guided synthesis - API Improvement (#376)
* forced alignment, f0 extraction and entry point
* kind of finished
* change julius4seg, doesn't seem to help
* run pysen format
* add speaker id to api
* run pysen format
* add accent_phrase api, finish
* add request parameter
* improve error handling
* run pysen format
* add parameters
* run pysen format
* a little boundary check
* add normalization for different WAV format
* run format
* run format
* move synthesis and accent phrase to synthesis engine
* add test for mock
* change url for apis
* simplify
* error type
* do something
* do something
* run format
* resolve conflict
* add usage to README
* add comments and experimental flag for guided api
* add guided info to AudioQuery model
* improve api definition
* run format, update README
* add error handling for wrong audio formats, edit README
* reserve unvoiced mora, add response type
* remove 422 error, move boundary check
* Update voicevox_engine/synthesis_engine/synthesis_engine.py

  Co-authored-by: Hiroshiba <[email protected]>
* move guided info to the outside of query
* run fmt
* update README
* fix README

Co-authored-by: Hiroshiba <[email protected]>
1 parent 0751917, commit 9af4963
Showing 6 changed files with 109 additions and 116 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -245,34 +245,49 @@ curl -s \ | |
``` | ||
|
||
### Guided Synthesis | |
Currently, we have two apis which accept an uploaded audio file and return corresponding synthesis information. | ||
Both of them recommend setting `is_kana` to be `true` and use `kana` section from `AudioQuery` for the best performance. | ||
You can also get the kana text in AquesTalk section. | ||
Currently, there are two APIs that reference an external audio source: `guided_synthesis` generates audio, and `guided_accent_phrases` returns a list of AccentPhrase. | |
Note that, unlike `guided_accent_phrases`, `guided_synthesis` works at frame resolution, so the two are not compatible with each other. | |
**The external audio should be in wav format.** | ||
```bash | ||
# Returns an audio file which is synthesised referencing uploaded audio | ||
# this example needs a recording whose content is | ||
# "また 東寺のように 五大明王と 呼ばれる 主要な 明王の 中央に 配されることも多い" | ||
|
||
curl -L -X POST 'localhost:50021/guided_synthesis' \ | ||
-F 'kana="マ'\''タ、ト'\''オジノヨオニ、ゴダイミョオオ'\''オト、ヨ'\''/バレ'\''ル、シュ'\''ヨオナ、ミョオ'\''オオ/ノ'\''、チュ'\''ウオオニ、ハイサレルコ'\''/トモ'\''オオイ"' \ | ||
-F 'speaker_id="5"' \ | ||
-F 'audio_file=@"/full_path_to_your_recording"' \ | ||
-F 'normalize="true"' \ | ||
-F 'stereo="true"' \ | ||
-F 'sample_rate="24000"' \ | ||
-F 'volume_scale="1"' \ | ||
-F 'pitch_scale="0"' \ | ||
-F 'speed_scale="1"' | ||
|
||
# Returns a list of AccentPhrases | ||
|
||
curl -L -X POST 'localhost:50021/guided_accent_phrase' \ | ||
-F 'text="マ'\''タ、ト'\''オジノヨオニ、ゴダイミョオオ'\''オト、ヨ'\''/バレ'\''ル、シュ'\''ヨオナ、ミョオ'\''オオ/ノ'\''、チュ'\''ウオオニ、ハイサレルコ'\''/トモ'\''オオイ"' \ | ||
-F 'speaker="5"' \ | ||
-F 'audio_file=@"/full_path_to_your_recording"' \ | ||
-F 'normalize="true"' \ | ||
-F 'is_kana="true"' \ | ||
-F 'enable_interrogative="false"' | ||
# guided_synthesis returns an audio file which is synthesised referencing the external audio source | |
|
||
echo -n "また 東寺のように 五大明王と 呼ばれる 主要な 明王の 中央に 配されることも多い" > text.txt | ||
|
||
curl -s \ | ||
-X POST \ | ||
"localhost:50021/audio_query?speaker=1" \ | ||
--get --data-urlencode text@text.txt \ | |
> query.json | ||
|
||
# if true, the average of f0 will be normalized to the predicted average | ||
normalize="true" | ||
# full path to your audio recording | |
audio_path="/home/.../sample.wav" | ||
|
||
curl -s \ | ||
-H "Content-Type: application/json" \ | ||
-X POST \ | ||
-d @query.json \ | ||
"localhost:50021/guided_synthesis?speaker=1&normalize=$normalize&audio_path=$audio_path" \ | ||
> audio.wav | ||
|
||
# guided_accent_phrases returns a list of AccentPhrases | ||
curl -s \ | ||
-H "Content-Type: application/json" \ | ||
-X POST \ | ||
-d @query.json \ | ||
"http://localhost:50021/guided_accent_phrases?speaker=0&normalize=$normalize&audio_path=$audio_path" \ | ||
> newphrases.json | ||
|
||
# replace the accent_phrases section in query | ||
cat query.json | sed -e "s/\[{.*}\]/$(cat newphrases.json)/g" > newquery.json | ||
|
||
curl -s \ | ||
-H "Content-Type: application/json" \ | ||
-X POST \ | ||
-d @newquery.json \ | ||
"localhost:50021/synthesis?speaker=1" \ | ||
> audio.wav | ||
``` | ||
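The `sed` step in the example above splices raw JSON into a regular-expression replacement, which may break if the JSON contains characters such as `/` or `&`. As a hedged alternative, the following minimal Python sketch replaces the `accent_phrases` section by parsing the JSON instead. It assumes that `query.json` holds an object with a top-level `accent_phrases` key and that `newphrases.json` holds a JSON list; the function names `replace_accent_phrases` and `rewrite_query_file` are hypothetical and not part of the engine.

```python
import json
from pathlib import Path


def replace_accent_phrases(query: dict, new_phrases: list) -> dict:
    """Return a shallow copy of `query` with `accent_phrases` replaced.

    Assumption: the query stores its phrases under the top-level
    key "accent_phrases", as the sed step above implies.
    """
    if not isinstance(new_phrases, list):
        raise TypeError("new_phrases must be a list of AccentPhrase objects")
    updated = dict(query)  # shallow copy; the input dict is left untouched
    updated["accent_phrases"] = new_phrases
    return updated


def rewrite_query_file(
    query_path: str = "query.json",
    phrases_path: str = "newphrases.json",
    out_path: str = "newquery.json",
) -> None:
    """Read both JSON files, swap in the new phrases, and write the result."""
    query = json.loads(Path(query_path).read_text(encoding="utf-8"))
    new_phrases = json.loads(Path(phrases_path).read_text(encoding="utf-8"))
    updated = replace_accent_phrases(query, new_phrases)
    # ensure_ascii=False keeps kana readable in the output file
    Path(out_path).write_text(
        json.dumps(updated, ensure_ascii=False), encoding="utf-8"
    )
```

Calling `rewrite_query_file()` in the directory that holds `query.json` and `newphrases.json` would produce a `newquery.json` that can be posted to `synthesis` in the same way as in the shell example.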
|
||
### Sample code for retrieving additional speaker information | |
|