Transcription error: wav file is empty #11

Closed
GregoryBetsey opened this issue Mar 26, 2021 · 58 comments

@GregoryBetsey

Hello

I am running the Voice-Cloning-App.exe on Windows 10. I have a GeForce RTX 2060 Graphics Card with the GeForce Game Ready Driver Version 461.92.

When I attempt to build the dataset, the Windows console stops after the following:

[12644] WARNING: file already exists but should not: C:\Users\GREGOR1\AppData\Local\Temp_MEI126442\torch_C.cp38-win_amd64.pyd
Server initialized for threading.
Server initialized for threading.
pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
torchaudio\backend\utils.py:63: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to pytorch/audio#903 for the detail.
INFO:matplotlib.font_manager:Generating new fontManager, this may take some time...
[nltk_data] Downloading package wordnet to C:\Users\GREGOR~1\AppData\L
[nltk_data] ocal\Temp_MEI126442\nltk_data...
[nltk_data] Package wordnet is already up-to-date!
WARNING:werkzeug:WebSocket transport not available. Install eventlet or gevent and gevent-websocket for improved performance.

  • Serving Flask app "main" (lazy loading)
  • Environment: production
    WARNING: This is a development server. Do not use it in a production deployment.
    Use a production WSGI server instead.
  • Debug mode: off
    INFO:werkzeug: * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
    INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:56:25] "GET / HTTP/1.1" 200 -
    INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:56:57] "POST / HTTP/1.1" 200 -
    INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:56:57] "GET /static/error.css HTTP/1.1" 200 -
    INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:56:57] "GET /favicon.ico HTTP/1.1" 200 -
    INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:11] "GET / HTTP/1.1" 200 -
    Starting Thread
    INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:42] "POST / HTTP/1.1" 200 -
    qINJoZN0iSsAW66FAAAA: Sending packet OPEN data {'sid': 'qINJoZN0iSsAW66FAAAA', 'upgrades': [], 'pingTimeout': 5000, 'pingInterval': 25000}
    INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet OPEN data {'sid': 'qINJoZN0iSsAW66FAAAA', 'upgrades': [], 'pingTimeout': 5000, 'pingInterval': 25000}
    INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:42] "GET /socket.io/?EIO=4&transport=polling&t=NXjKmkr HTTP/1.1" 200 -
    qINJoZN0iSsAW66FAAAA: Received packet MESSAGE data 0/voice,
    INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet MESSAGE data 0/voice,
    qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 0/voice,{"sid":"hvDlhnRAa1GAVtomAAAB"}
    INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 0/voice,{"sid":"hvDlhnRAa1GAVtomAAAB"}
    INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:42] "POST /socket.io/?EIO=4&transport=polling&t=NXjKmlA&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
    INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:42] "GET /socket.io/?EIO=4&transport=polling&t=NXjKmlB&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading audio from data\datasets\JamesEarlJones\audio.mp3..."}]
    INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:47] "GET /socket.io/?EIO=4&transport=polling&t=NXjKmlb&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
    INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading audio from data\datasets\JamesEarlJones\audio.mp3..."}]
    INFO:voice:Loading audio from data\datasets\JamesEarlJones\audio.mp3...
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading script from data\datasets\JamesEarlJones\text.txt..."}]
    INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:02] "GET /socket.io/?EIO=4&transport=polling&t=NXjKnxd&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
    INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading script from data\datasets\JamesEarlJones\text.txt..."}]
    INFO:voice:Loading script from data\datasets\JamesEarlJones\text.txt...
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Fetching segments..."}]
    INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Fetching segments..."}]
    INFO:voice:Fetching segments...
    INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:02] "GET /socket.io/?EIO=4&transport=polling&t=NXjKrgH&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
    qINJoZN0iSsAW66FAAAA: Sending packet PING data None
    INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:07] "GET /socket.io/?EIO=4&transport=polling&t=NXjKrgS&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
    INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
    qINJoZN0iSsAW66FAAAA: Received packet PONG data
    INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
    INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:07] "POST /socket.io/?EIO=4&transport=polling&t=NXjKsst&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Transcribing segments..."}]
    INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:20] "GET /socket.io/?EIO=4&transport=polling&t=NXjKssu&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
    INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Transcribing segments..."}]
    INFO:voice:Transcribing segments...
    Using cache found in C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master
    torchaudio\backend\utils.py:63: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to [Announcement] Improving I/O for correct and consistent experience pytorch/audio#903 for the detail.
    Exception in thread Thread-13:
    Traceback (most recent call last):
    File "application\utils.py", line 47, in background_task
    max_seqlength = max(max([len(_) for _ in batch]), 12800)
    File "application\utils.py", line 32, in create_dataset
    if wav.size(0) > 1:
    File "dataset\forced_alignment\align.py", line 123, in align
    File "dataset\transcribe.py", line 34, in stt
    File "dataset\transcribe.py", line 16, in transcribe
    File "torch\hub.py", line 370, in load
    File "torch\hub.py", line 399, in _load_local
    File "C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\hubconf.py", line 24, in silero_stt
    model, decoder = init_jit_model(model_url=models.stt_models.get(language).latest.jit,
    File "C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\utils.py", line 135, in init_jit_model
    model = torch.jit.load(model_path, map_location=device)
    File "torch\jit_serialization.py", line 161, in load
    RuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "threading.py", line 932, in bootstrap_inner
File "threading.py", line 870, in run
File "application\utils.py", line 50, in background_task
inputs[i, :len(wav)].copy_(wav)
NameError: name 'traceback' is not defined
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:32] "GET /socket.io/?EIO=4&transport=polling&t=NXjKvy4&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:32] "POST /socket.io/?EIO=4&transport=polling&t=NXjKyzw&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:57] "GET /socket.io/?EIO=4&transport=polling&t=NXjKyzw.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:57] "POST /socket.io/?EIO=4&transport=polling&t=NXjL358&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:22] "GET /socket.io/?EIO=4&transport=polling&t=NXjL358.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:22] "POST /socket.io/?EIO=4&transport=polling&t=NXjL9CA&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:47] "GET /socket.io/?EIO=4&transport=polling&t=NXjL9CB&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:47] "POST /socket.io/?EIO=4&transport=polling&t=NXjLFJ8&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:12] "GET /socket.io/?EIO=4&transport=polling&t=NXjLFJ8.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:12] "POST /socket.io/?EIO=4&transport=polling&t=NXjLLQ8&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:37] "GET /socket.io/?EIO=4&transport=polling&t=NXjLLQ9&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:37] "POST /socket.io/?EIO=4&transport=polling&t=NXjLRWz&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:02] "GET /socket.io/?EIO=4&transport=polling&t=NXjLRW-&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:02] "POST /socket.io/?EIO=4&transport=polling&t=NXjLXe3&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:27] "GET /socket.io/?EIO=4&transport=polling&t=NXjLXe4&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:27] "POST /socket.io/?EIO=4&transport=polling&t=NXjLdkv&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:52] "GET /socket.io/?EIO=4&transport=polling&t=NXjLdkv.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:52] "POST /socket.io/?EIO=4&transport=polling&t=NXjLjrp&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:17] "GET /socket.io/?EIO=4&transport=polling&t=NXjLjrq&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:17] "POST /socket.io/?EIO=4&transport=polling&t=NXjLpyh&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:42] "GET /socket.io/?EIO=4&transport=polling&t=NXjLpyh.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:42] "POST /socket.io/?EIO=4&transport=polling&t=NXjLw3g&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:03:07] "qINJoZN0iSsAW66FAAAA: Received packet CLOSE data
GET /socket.io/?EIO=4&transport=polling&t=NXjLw3g.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1qINJoZN0iSsAW66FAAAA: Client is gone, closing socket
Error.txt

@BenAAndrew BenAAndrew added the bug Something isn't working label Mar 28, 2021
@BenAAndrew
Collaborator

@GregoryBetsey It looks like something went wrong when trying to transcribe your audio to build the dataset. Could you first check that you used the latest executable, version 0.3? The second error should have been fixed in that release.

If you did use that version, or the error still occurs, could you upload your audio/text to Google Drive or email it to me at [email protected] so I can run some analysis?

@GregoryBetsey
Author

@BenAAndrew Thanks for responding. I will send a download link to your email address. I did not use the "automatic" audiobook method shown in your YouTube video; rather, I transcribed the text manually.

@BenAAndrew BenAAndrew self-assigned this Mar 29, 2021
@GregoryBetsey
Author

Update: I tried the latest release and got this error: [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory.

Server initialized for threading.
Server initialized for threading.
pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
['C:\Users\GREGOR1\AppData\Local\Temp\_MEI104602\base_library.zip', 'C:\Users\GREGOR1\AppData\Local\Temp\_MEI104602', 'synthesis/waveglow/', 'C:\Users\Gregory Betsey']
torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
torchaudio\backend\utils.py:63: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to pytorch/audio#903 for the detail.
INFO:matplotlib.font_manager:Generating new fontManager, this may take some time...
[nltk_data] Downloading package wordnet to C:\Users\GREGOR~1\AppData\L
[nltk_data] ocal\Temp_MEI104602\nltk_data...
[nltk_data] Package wordnet is already up-to-date!
INSTALLING FFMPEG
VERIFYING FFMPEG INSTALL
WARNING:werkzeug:WebSocket transport not available. Install eventlet or gevent and gevent-websocket for improved performance.

  • Serving Flask app "main" (lazy loading)
  • Environment: production
    WARNING: This is a development server. Do not use it in a production deployment.
    Use a production WSGI server instead.
  • Debug mode: off
    INFO:werkzeug: * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:26:40] "GET / HTTP/1.1" 200 -
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:26:42] "GET /static/main.css HTTP/1.1" 200 -
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:26:42] "GET /static/pane.js HTTP/1.1" 200 -
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:26:42] "GET /static/favicon/favicon-16x16.png HTTP/1.1" 200 -
    Starting Thread
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:28:09] "POST / HTTP/1.1" 200 -
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:28:09] "GET /static/application.js HTTP/1.1" 200 -
    CxB55ktHT5jOvFCmAAAA: Sending packet OPEN data {'sid': 'CxB55ktHT5jOvFCmAAAA', 'upgrades': [], 'pingTimeout': 5000, 'pingInterval': 25000}
    INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Sending packet OPEN data {'sid': 'CxB55ktHT5jOvFCmAAAA', 'upgrades': [], 'pingTimeout': 5000, 'pingInterval': 25000}
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:28:09] "GET /socket.io/?EIO=4&transport=polling&t=NYGQCZ1 HTTP/1.1" 200 -
    CxB55ktHT5jOvFCmAAAA: Received packet MESSAGE data 0/voice,
    INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Received packet MESSAGE data 0/voice,
    CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 0/voice,{"sid":"aOmhwVQaFaYr50KBAAAB"}
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:28:09] "GET /socket.io/?EIO=4&transport=polling&t=NYGQCZM&sid=CxB55ktHT5jOvFCmAAAA HTTP/1.1" 200 -
    INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 0/voice,{"sid":"aOmhwVQaFaYr50KBAAAB"}
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:28:09] "POST /socket.io/?EIO=4&transport=polling&t=NYGQCZL&sid=CxB55ktHT5jOvFCmAAAA HTTP/1.1" 200 -
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Coverting data\datasets\JamesEarlJones\audio.mp3..."}]
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:28:14] "GET /socket.io/?EIO=4&transport=polling&t=NYGQCZn&sid=CxB55ktHT5jOvFCmAAAA HTTP/1.1" 200 -
    INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Coverting data\datasets\JamesEarlJones\audio.mp3..."}]
    INFO:voice:Coverting data\datasets\JamesEarlJones\audio.mp3...
    ffmpeg version 4.3.2-2021-02-27-essentials_build-www.gyan.dev Copyright (c) 2000-2021 the FFmpeg developers
    built with gcc 10.2.0 (Rev6, Built by MSYS2 project)
    configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
    libavutil 56. 51.100 / 56. 51.100
    libavcodec 58. 91.100 / 58. 91.100
    libavformat 58. 45.100 / 58. 45.100
    libavdevice 58. 10.100 / 58. 10.100
    libavfilter 7. 85.100 / 7. 85.100
    libswscale 5. 7.100 / 5. 7.100
    libswresample 3. 7.100 / 3. 7.100
    libpostproc 55. 7.100 / 55. 7.100
    Input #0, mp3, from 'data\datasets\JamesEarlJones\audio.mp3':
    Duration: 02:12:54.56, start: 0.025057, bitrate: 96 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, mono, fltp, 96 kb/s
    Metadata:
    encoder : LAME3.100
    Stream mapping:
    Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
    Press [q] to stop, [?] for help
    Output #0, wav, to 'data\datasets\JamesEarlJones\audio-converted.wav':
    Metadata:
    ISFT : Lavf58.45.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
    Metadata:
    encoder : Lavc58.91.100 pcm_s16le
    size= 343434kB time=02:12:54.52 bitrate= 352.8kbits/s speed=1.3e+03x
    video:0kB audio:343434kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000022%
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading script from data\datasets\JamesEarlJones\text.txt..."}]
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:28:20] "GET /socket.io/?EIO=4&transport=polling&t=NYGQDj5&sid=CxB55ktHT5jOvFCmAAAA HTTP/1.1" 200 -
    INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading script from data\datasets\JamesEarlJones\text.txt..."}]
    INFO:voice:Loading script from data\datasets\JamesEarlJones\text.txt...
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Searching text for matching fragments..."}]
    INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Searching text for matching fragments..."}]
    INFO:voice:Searching text for matching fragments...
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:28:20] "emitting event "logs" to all [/voice]
    GET /socket.io/?EIO=4&transport=polling&t=NYGQFEN&sid=CxB55ktHT5jOvFCmAAAA HTTP/1.1" 200 -
    INFO:socketio.server:emitting event "logs" to all [/voice]
    CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Changing sample rate..."}]
    INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Changing sample rate..."}]
    INFO:voice:Changing sample rate...
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:28:20] "GET /socket.io/?EIO=4&transport=polling&t=NYGQFF7&sid=CxB55ktHT5jOvFCmAAAA HTTP/1.1" 200 -
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Fetching segments..."}]
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:28:21] "GET /socket.io/?EIO=4&transport=polling&t=NYGQFFF&sid=CxB55ktHT5jOvFCmAAAA HTTP/1.1" 200 -
    INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Fetching segments..."}]
    INFO:voice:Fetching segments...
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Matching segments..."}]
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:28:23] "GET /socket.io/?EIO=4&transport=polling&t=NYGQFaD&sid=CxB55ktHT5jOvFCmAAAA HTTP/1.1" 200 -
    INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Matching segments..."}]
    INFO:voice:Matching segments...
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Generating segments..."}]
    INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Generating segments..."}]
    INFO:voice:Generating segments...
    emitting event "progress" to all [/voice]
    INFO:socketio.server:emitting event "progress" to all [/voice]
    CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 1","total":"2725"}]
    INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 1","total":"2725"}]
    INFO:voice:Progress - 1/2725
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:28:23] "GET /socket.io/?EIO=4&transport=polling&t=NYGQG4i&sid=CxB55ktHT5jOvFCmAAAA HTTP/1.1" 200 -
    Using cache found in C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master
    torchaudio\backend\utils.py:63: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to [Announcement] Improving I/O for correct and consistent experience pytorch/audio#903 for the detail.
    error logging recieved invalid response
    emitting event "error" to all [/voice]
    INFO:socketio.server:emitting event "error" to all [/voice]
    CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["error",{"type":"RuntimeError","text":"[enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory","stacktrace":"Traceback (most recent call last):\n File "application\utils.py", line 63, in background_task\n File "application\utils.py", line 39, in create_dataset\n File "dataset\clip_generator.py", line 60, in clip_generator\n File "dataset\forced_alignment\align.py", line 69, in process_segments\n File "dataset\transcribe.py", line 16, in transcribe\n File "torch\hub.py", line 370, in load\n File "torch\hub.py", line 399, in _load_local\n File "C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\hubconf.py", line 24, in silero_stt\n model, decoder = init_jit_model(model_url=models.stt_models.get(language).latest.jit,\n File "C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\utils.py", line 135, in init_jit_model\n model = torch.jit.load(model_path, map_location=device)\n File "torch\jit\_serialization.py", line 161, in load\nRuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory\n"}]
    INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:28:34] "GET /socket.io/?EIO=4&transport=polling&t=NYGQG4q&sid=CxB55ktHT5jOvFCmAAAA HTTP/1.1" 200 -
    INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["error",{"type":"RuntimeError","text":"[enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory","stacktrace":"Traceback (most recent call last):\n File "application\utils.py", line 63, in background_task\n File "application\utils.py", line 39, in create_dataset\n File "dataset\clip_generator.py", line 60, in clip_generator\n File "dataset\forced_alignment\align.py", line 69, in process_segments\n File "dataset\transcribe.py", line 16, in transcribe\n File "torch\hub.py", line 370, in load\n File "torch\hub.py", line 399, in _load_local\n File "C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\hubconf.py", line 24, in silero_stt\n model, decoder = init_jit_model(model_url=models.stt_models.get(language).latest.jit,\n File "C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\utils.py", line 135, in init_jit_model\n model = torch.jit.load(model_path, map_location=device)\n File "torch\jit\_serialization.py", line 161, in load\nRuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory\n"}]
    [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory
    CxB55ktHT5jOvFCmAAAA: Sending packet PING data None
    INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Sending packet PING data None
    CxB55ktHT5jOvFCmAAAA: Client is gone, closing socket
    INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Client is gone, closing socket
    CxB55ktHT5jOvFCmAAAA: Client is gone, closing socket
    INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Client is gone, closing socket
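
The "failed finding central directory" message comes from PyTorch's zip reader: TorchScript model files are zip archives, so one plausible cause (not confirmed anywhere in this thread) is a truncated or corrupted model download in the cache. Below is a minimal sketch for spotting such files, assuming the cached model sits somewhere under the torch hub cache directory shown in the log. The `find_broken_archives` helper and the suffix list are hypothetical and not part of the app.

```python
import zipfile
from pathlib import Path

# Hypothetical suffix list: adjust to whatever the cached model file is called.
MODEL_SUFFIXES = {".model", ".jit", ".pt"}


def find_broken_archives(cache_dir):
    """Return model-like files under cache_dir that are not valid zip archives."""
    root = Path(cache_dir)
    if not root.is_dir():
        return []
    broken = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in MODEL_SUFFIXES:
            # is_zipfile looks for the end-of-central-directory record,
            # which is typically missing when a download was cut short.
            if not zipfile.is_zipfile(path):
                broken.append(path)
    return broken
```

Calling `find_broken_archives(Path.home() / ".cache" / "torch" / "hub")` would list candidates to delete so that they are downloaded again on the next run.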

@BenAAndrew
Collaborator

@GregoryBetsey If you look at the folder which contains your .exe, is there a file called latest_silero_models.yml?

@GregoryBetsey
Author

Yes, there is. I ran it through Edge this time and got farther than before, but hit a new error: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous.

Error.txt

@BenAAndrew
Collaborator

I'll investigate this and get back to you.

@BenAAndrew
Collaborator

@GregoryBetsey In the latest build, 0.4.1, I've added some extra validation to the transcription process, which may fix the bug. Could you give it a go?

@GregoryBetsey
Author

GregoryBetsey commented Apr 4, 2021

@GregoryBetsey In the latest build, 0.4.1, I've added some extra validation to the transcription process, which may fix the bug. Could you give it a go?

Thanks. I tried the latest build today and got stuck at the "Generating segments..." step. I will attach the log file.
Error 4.4.2021.txt

P.S. I am using the same files I sent to you via Google Drive.

@BenAAndrew
Collaborator

@GregoryBetsey Thank you for the error log. The issue seems to be that the torchaudio library is unable to change the audio sample rate. I will investigate now.
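
For readers unfamiliar with what "changing the sample rate" involves, here is a minimal pure-Python sketch of resampling by linear interpolation, for example from the 44100 Hz source to the 22050 Hz target shown in the FFmpeg log above. It is not what the app or torchaudio does; `resample_linear` is a hypothetical helper, and a real resampler would also low-pass filter before downsampling to avoid aliasing.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono sequence of samples by linear interpolation."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []
    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)   # nearest source sample at or before pos
        right = min(left + 1, last)  # next source sample, clamped at the end
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

An empty input yields an empty output here, which is the kind of case that later stages need to detect rather than pass along.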

@BenAAndrew
Collaborator

@GregoryBetsey I've removed the code that was throwing the error and replaced it with a different library. If you get a minute, could you try release 0.5.1?

@GregoryBetsey
Author

I gave it a go and got a different error this time: The expanded size of the tensor (12800) must match the existing size (0) at non-singleton dimension 0. Target sizes: [12800]. Tensor sizes: [0]
Error.txt

@BenAAndrew BenAAndrew changed the title Voice-Cloning-App.exe not working on Windows Dataset Generation audio processing Apr 8, 2021
@BenAAndrew BenAAndrew changed the title Dataset Generation audio processing Error: The expanded size of the tensor must match the existing size at non-singleton dimension 0. Apr 8, 2021
@BenAAndrew
Collaborator

@GregoryBetsey did this error occur with the data source you sent to me?

@BenAAndrew
Collaborator

@GregoryBetsey I haven't been able to replicate the issue, but I have identified what may have caused it and tried to fix it in 0.5.3.

@GregoryBetsey
Author

@GregoryBetsey did this error occur with the data source you sent to me?

Yes, I am using the same files I sent you earlier. I will try your latest release and test the results.

@GregoryBetsey
Author

Update: I tried the latest release. I got a different error: data\datasets\JamesEarlJones\wavs\1470_2520.wav wav file is empty
Image
Text.txt

@BenAAndrew
Collaborator

BenAAndrew commented Apr 9, 2021

@GregoryBetsey Very interesting, it seems like it can't open that file. Could you find that file and make sure it is playable? If it is, could you email it to me?

@GregoryBetsey
Author

@GregoryBetsey Very interesting, it seems like it can't open that file. Could you find that file and make sure it is playable? If it is, could you email it to me?

I am using the same audio and text transcript that I sent to you using google drive. The audio file is fine. If you need the link again, I can send it to you.

@BenAAndrew
Collaborator

@GregoryBetsey I've produced the dataset and that clip (1470_2520.wav) is playable and can be transcribed. Just to double-check, did you try playing the original audio or the 1470_2520.wav clip?

@BenAAndrew
Collaborator

@GregoryBetsey, I've been able to reproduce this error once. It seems that FFmpeg (very rarely) corrupts the audio when trimming. Handling of this will be added in an upcoming release.

@GregoryBetsey
Author

@GregoryBetsey, I've been able to reproduce this error once. It seems that FFmpeg (very rarely) corrupts the audio when trimming. Handling of this will be added in an upcoming release.

Thanks for the update. I haven't got past the error.

@BenAAndrew BenAAndrew changed the title Error: The expanded size of the tensor must match the existing size at non-singleton dimension 0. Transcription error: wav file is empty Apr 14, 2021
@BenAAndrew
Collaborator

Hi @GregoryBetsey, thank you for your patience. This should be handled in 0.6. Please let me know how you get on

@GregoryBetsey
Author

Thanks for working on this. I don't know if this is progress, but it actually started generating segments this time, except I got a message saying the audio can't be transcribed. [Again, I am using the files from my Google Drive.]

Voice Cloning - Profile 1 - Microsoft​ Edge 4_15_2021 1_15_39 PM
Log.txt

@BenAAndrew
Collaborator

BenAAndrew commented Apr 15, 2021

@GregoryBetsey That's interesting. It looks like there's an issue with FFmpeg cutting the clips. Could you do the following:

  1. Check that the audio files listed in the logs exist
  2. Check if there is a folder called 'ffmpeg' in the same directory as the application. If there is, delete it.
  3. Try installing FFmpeg manually, e.g. by following https://www.youtube.com/watch?v=hD9bQE4R6eA

The issue must be to do with FFmpeg, so if those files exist then it is not working correctly
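
The checks above can be partly automated. The following is only a minimal sketch, not code from the app: the helper names `missing_files` and `describe_ffmpeg_setup` are hypothetical, and it assumes the bundled copy lives in a folder literally named `ffmpeg` next to the application, as described in step 2.

```python
import shutil
from pathlib import Path


def missing_files(paths):
    """Return the subset of paths that do not exist as regular files (step 1)."""
    return [str(p) for p in paths if not Path(p).is_file()]


def describe_ffmpeg_setup(app_dir="."):
    """Report which ffmpeg is found on PATH and whether a local 'ffmpeg' folder exists."""
    return {
        # Full path of the ffmpeg executable found on PATH, or None if there is none.
        "ffmpeg_on_path": shutil.which("ffmpeg"),
        # True if a folder named 'ffmpeg' sits in the given directory (step 2).
        "local_ffmpeg_folder": (Path(app_dir) / "ffmpeg").is_dir(),
    }


if __name__ == "__main__":
    print(describe_ffmpeg_setup())
```

If `ffmpeg_on_path` is `None` after a manual install, the install location is probably not on the `PATH` seen by the process that launched the app.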

@GregoryBetsey
Author

GregoryBetsey commented Apr 17, 2021

Okay, the app is generating the audio files, and I installed ffmpeg to C:\ and it is working. I deleted the FFmpeg folder in the app folder but I still get errors.

01
02
Error.txt

@GregoryBetsey
Author

Yes, it works.
Untitled

@BenAAndrew
Collaborator

Hmm, this is interesting. You see the app just runs the conversion command and then the trim command which is exactly what you've done here. Have you tried running the app again since reinstalling ffmpeg?

@GregoryBetsey
Author

GregoryBetsey commented Apr 20, 2021

Okay, the app is generating the audio files, and I installed ffmpeg to C:\ and it is working. I deleted the FFmpeg folder in the app folder but I still get errors.

01
02

Yes, I did that here. The app generates clips. It says it cannot transcribe at the end and then it deletes all the generated wavs.
Error Log.txt

@BenAAndrew
Collaborator

BenAAndrew commented Apr 20, 2021

@GregoryBetsey Whilst it is running, could you copy one of the generated wav files? It should be saved to data\datasets\dataset_name\wavs, where dataset_name is the name of the dataset. Then could you check if that is playable?

@GregoryBetsey
Author

@GregoryBetsey Whilst it is running, could you copy one of the generated wav files? It should be saved to data\datasets\dataset_name\wavs, where dataset_name is the name of the dataset. Then could you check if that is playable?

The wav files can be opened, but since the generated length is 00:00:00 there isn't any audio. [see attachment]
Example.zip

@BenAAndrew
Collaborator

OK, so FFmpeg isn't working when cutting the audio, as all of these clips should be at least 1 second long. I don't understand why the command would work outside of the app but not in it, as both should be using the same FFmpeg and command. I will try to resolve this this week.

@RayDAnt3D

Also experiencing this issue exactly as described in #27 (nothing but "Could not transcribe data\datasets..." messages and zero-length wave files despite having a tested, working ffmpeg install) when attempting to build either my own or the provided demo datasets.

Something I noticed that does seem off is that regardless of whether my source audio file is an mp3 or a wave, the application logfile always says that it is converting from an mp3. eg:

Coverting data\datasets\TestVoice\audio.mp3...
Loading script from data\datasets\TestVoice\text.txt...
Searching text for matching fragments...
Changing sample rate...
Fetching segments...
Matching segments...
Generating segments...
Could not transcribe data\datasets\TestVoice\wavs\1650_2730.wav
Could not transcribe data\datasets\TestVoice\wavs\5850_7680.wav
Could not transcribe data\datasets\TestVoice\wavs\7680_9330.wav
Could not transcribe data\datasets\TestVoice\wavs\9450_10530.wav
Could not transcribe data\datasets\TestVoice\wavs\10560_12720.wav

The audio file being converted above was a wave file named "this_is_a_wave_file.wav". Having said that, the "audio-converted.wav" and "audio-converted-16000.wav" files generated in the dataset's working directory are playable and seemingly in the right format according to VLC Player:

Stream 0 ("audio-converted.wav")
Codec: PCM S16 LE (s16l)
Type: Audio
Channels: Mono
Sample rate: 22050 Hz
Bits per sample: 16

Stream 0 ("audio-converted-16000.wav")
Codec: PCM S16 LE (s16l)
Type: Audio
Channels: Mono
Sample rate: 16000 Hz
Bits per sample: 16

It's just the separated out segments that are inoperable (nothing but 78 bytes of metadata in each one.)
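
Header-only clips like these can be spotted without opening each one in a player. Below is a minimal sketch using Python's standard `wave` module. It assumes the clip names encode start and end offsets in milliseconds (as names such as 1650_2730.wav suggest); the helper names and the 50 ms tolerance are hypothetical and not taken from the app.

```python
import wave
from pathlib import Path


def wav_duration_ms(path):
    """Return the duration of a PCM WAV file in milliseconds (0.0 if unreadable or empty)."""
    try:
        with wave.open(str(path), "rb") as wav:
            return 1000.0 * wav.getnframes() / wav.getframerate()
    except (wave.Error, EOFError):
        return 0.0


def expected_duration_ms(path):
    """Assumption: clip names follow '<start_ms>_<end_ms>.wav'."""
    start, end = Path(path).stem.split("_")
    return int(end) - int(start)


def find_bad_clips(wavs_dir, tolerance_ms=50):
    """List (name, actual_ms) for clips noticeably shorter than their name implies."""
    bad = []
    for clip in sorted(Path(wavs_dir).glob("*.wav")):
        actual = wav_duration_ms(clip)
        if actual + tolerance_ms < expected_duration_ms(clip):
            bad.append((clip.name, actual))
    return bad
```

Running `find_bad_clips` on a dataset's `wavs` folder while the app is still generating segments would list every clip that contains a header but no audio.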

@BenAAndrew
Collaborator

@RayDAnt3D thank you for this info. This seems to be an issue for several people so it is my number one priority. I'm hoping to have it fixed by Sunday 🤞

@BenAAndrew
Collaborator

@GregoryBetsey @RayDAnt3D I'm struggling to figure out what's causing this issue & I can't get it to replicate locally. The issue must be to do with either the FFmpeg install or one of the commands.

To test this I've produced the following: https://drive.google.com/drive/folders/17zT6fg7V_gu_kMVZs2ERPmfGyFRuDhWg?usp=sharing

In there you'll find a test audio file and a script. Could you try downloading both & running the script. Then check that it produces an audio file called test-final.wav that is playable & 3 seconds long.

Thank you for your patience

@arthur465

@GregoryBetsey @RayDAnt3D I'm struggling to figure out what's causing this issue & I can't get it to replicate locally. The issue must be to do with either the FFmpeg install or one of the commands.

To test this I've produced the following: https://drive.google.com/drive/folders/17zT6fg7V_gu_kMVZs2ERPmfGyFRuDhWg?usp=sharing

In there you'll find a test audio file and a script. Could you try downloading both & running the script. Then check that it produces an audio file called test-final.wav that is playable & 3 seconds long.

Thank you for your patience

Hey I downloaded it and ran the script. I can confirm it produced a 3 second playable clip called "test-clip.wav"

@RayDAnt3D

Also downloaded/ran the test script and audio clip and got the following tested working audio files generated:

test-clean.wav
test-clean-16000.wav
test-clip.wav

No "test-final.wav" though.

@BenAAndrew
Collaborator

@RayDAnt3D @arthur465 Sorry I meant test-clip.wav. So it sounds like the FFmpeg commands are working for all of you. I'm going to try and create a release today which has improved error logging on the clip building process so we can find out where it is failing in the app

@BenAAndrew
Collaborator

@GregoryBetsey @arthur465 @RayDAnt3D I've created a new release here: https://github.com/BenAAndrew/Voice-Cloning-App/releases/tag/v0.6.2. It won't fix the issue but it might help tell us what the error is. It will now check the output of the FFmpeg commands and will also show it running in the console. Could you give it a go and let me know what happens
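
For readers wondering what checking a command's output can look like, here is a rough sketch. It is not the code in the release: `run_and_verify` is a hypothetical helper, and the 78-byte threshold is taken from the empty clips reported earlier in this thread. It checks the output file size as well as the exit code, on the assumption that a zero exit code alone may not prove that any audio was written.

```python
import subprocess
from pathlib import Path

# Size of the header-only clips reported in this thread (an observed value, not a constant of the format).
WAV_HEADER_ONLY_BYTES = 78


def run_and_verify(command, output_path, min_bytes=WAV_HEADER_ONLY_BYTES + 1):
    """Run a command, capture its output, and confirm it produced a non-trivial file."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"Command failed ({result.returncode}): {result.stderr.strip()}")
    output = Path(output_path)
    size = output.stat().st_size if output.is_file() else 0
    if size < min_bytes:
        # Include the captured stderr so the tool's own diagnostics end up in the app log.
        raise RuntimeError(f"{output} is missing or empty ({size} bytes): {result.stderr.strip()}")
    return result
```

The `command` argument is a list such as `["ffmpeg", ...]`; the exact arguments the app passes are not shown in this thread, so none are assumed here.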

@arthur465

@GregoryBetsey @arthur465 @RayDAnt3D I've created a new release here: https://github.com/BenAAndrew/Voice-Cloning-App/releases/tag/v0.6.2. It won't fix the issue but it might help tell us what the error is. It will now check the output of the FFmpeg commands and will also show it running in the console. Could you give it a go and let me know what happens

Ok here's the error I get

INFO:voice:Progress - 391/416
INFO:werkzeug:127.0.0.1 - - [23/Apr/2021 11:44:13] "GET /socket.io/?EIO=4&transport=polling&t=Na02_Cm&sid=7KcT1PoIXIKbnCvUAAAA HTTP/1.1" 200 -
ffmpeg version 2021-04-18-git-d43b26b30d-full_build-www.gyan.dev Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 10.2.0 (Rev6, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libglslang --enable-vulkan --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
libavutil 56. 73.100 / 56. 73.100
libavcodec 58.136.101 / 58.136.101
libavformat 58. 78.100 / 58. 78.100
libavdevice 58. 14.100 / 58. 14.100
libavfilter 7.111.100 / 7.111.100
libswscale 5. 10.100 / 5. 10.100
libswresample 3. 10.100 / 3. 10.100
libpostproc 55. 10.100 / 55. 10.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'data\datasets\Arthur 2\audio-converted.wav':
Metadata:
encoder : Lavf58.78.100
Duration: 00:17:31.99, bitrate: 352 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'data\datasets\Arthur 2\wavs\994140_995250.wav':
Metadata:
ISFT : Lavf58.78.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
Metadata:
encoder : Lavc58.136.101 pcm_s16le
size= 0kB time=00:00:00.00 bitrate=N/A speed= 0x
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used)
emitting event "logs" to all [/voice]
INFO:socketio.server:emitting event "logs" to all [/voice]
7KcT1PoIXIKbnCvUAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Arthur 2\wavs\994140_995250.wav"}]
INFO:werkzeug:127.0.0.1 - - [23/Apr/2021 11:44:13] "GET /socket.io/?EIO=4&transport=polling&t=Na02_Cs&sid=7KcT1PoIXIKbnCvUAAAA HTTP/1.1" 200 -
INFO:engineio.server:7KcT1PoIXIKbnCvUAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Arthur 2\wavs\994140_995250.wav"}]
INFO:voice:Could not transcribe data\datasets\Arthur 2\wavs\994140_995250.wav
emitting event "progress" to all [/voice]
INFO:socketio.server:emitting event "progress" to all [/voice]
7KcT1PoIXIKbnCvUAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 392","total":"416"}]
INFO:engineio.server:7KcT1PoIXIKbnCvUAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 392","total":"416"}]

@RayDAnt3D

Here's what I get for the first sample cutting attempt (and every other thereafter) using the Ayoade dataset assets:

INFO:voice:Generating segments...
ffmpeg version 2021-04-18-git-d43b26b30d-full_build-www.gyan.dev Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 10.2.0 (Rev6, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libglslang --enable-vulkan --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
libavutil 56. 73.100 / 56. 73.100
libavcodec 58.136.101 / 58.136.101
libavformat 58. 78.100 / 58. 78.100
libavdevice 58. 14.100 / 58. 14.100
libavfilter 7.111.100 / 7.111.100
libswscale 5. 10.100 / 5. 10.100
libswresample 3. 10.100 / 3. 10.100
libpostproc 55. 10.100 / 55. 10.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'data\datasets\Ayoade\audio-converted.wav':
Metadata:
artist : Richard Ayoade
comment : At last, the definitive audiobook about perhaps the best cabin crew dramedy ever filmed: View from the Top starring Gwyneth Paltrow. In Ayoade on Top, Richard Ayoade, perhaps one of the most 'insubstantial' people of our age, takes us on a journey from Pe
copyright : ©2019 Richard Ayoade (P)2019 Audible, Ltd
date : 2019
genre : Audiobook
title : 1 - Ayoade on Top
album : Ayoade on Top
track : 1/1
encoder : Lavf58.78.100
Duration: 04:39:25.09, bitrate: 352 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'data\datasets\Ayoade\wavs\60_1680.wav':
Metadata:
IART : Richard Ayoade
ICMT : At last, the definitive audiobook about perhaps the best cabin crew dramedy ever filmed: View from the Top starring Gwyneth Paltrow. In Ayoade on Top, Richard Ayoade, perhaps one of the most 'insubstantial' people of our age, takes us on a journey from Pe
ICOP : ©2019 Richard Ayoade (P)2019 Audible, Ltd
ICRD : 2019
IGNR : Audiobook
INAM : 1 - Ayoade on Top
IPRD : Ayoade on Top
IPRT : 1/1
ISFT : Lavf58.78.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
Metadata:
encoder : Lavc58.136.101 pcm_s16le
size= 1kB time=00:00:00.00 bitrate=N/A speed= 0x
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used)
Using cache found in C:\Users\gbase/.cache\torch\hub\snakers4_silero-models_master
NpmL9qWiQt7YXzLnAAAA: Sending packet PING data None
INFO:engineio.server:NpmL9qWiQt7YXzLnAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [23/Apr/2021 15:19:47] "GET /socket.io/?EIO=4&transport=polling&t=Na0B7hC&sid=NpmL9qWiQt7YXzLnAAAA HTTP/1.1" 200 -
NpmL9qWiQt7YXzLnAAAA: Received packet PONG data
INFO:engineio.server:NpmL9qWiQt7YXzLnAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [23/Apr/2021 15:19:47] "POST /socket.io/?EIO=4&transport=polling&t=Na0B84R&sid=NpmL9qWiQt7YXzLnAAAA HTTP/1.1" 200 -
emitting event "logs" to all [/voice]
INFO:socketio.server:emitting event "logs" to all [/voice]
NpmL9qWiQt7YXzLnAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Ayoade\wavs\60_1680.wav"}]
INFO:engineio.server:NpmL9qWiQt7YXzLnAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Ayoade\wavs\60_1680.wav"}]
INFO:voice:Could not transcribe data\datasets\Ayoade\wavs\60_1680.wav
emitting event "progress" to all [/voice]
INFO:socketio.server:emitting event "progress" to all [/voice]
NpmL9qWiQt7YXzLnAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 1","total":"5021"}]
INFO:engineio.server:NpmL9qWiQt7YXzLnAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 1","total":"5021"}]
INFO:voice:Progress - 1/5021

For what it's worth, here also is my app log at first startup:

[12568] WARNING: file already exists but should not: C:\Users\gbase\AppData\Local\Temp_MEI125682\torch_C.cp38-win_amd64.pyd
Server initialized for threading.
Server initialized for threading.
torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
torchaudio\backend\utils.py:63: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to pytorch/audio#903 for the detail.
INFO:matplotlib.font_manager:Generating new fontManager, this may take some time...
[nltk_data] Downloading package wordnet to
[nltk_data] C:\Users\gbase\AppData\Local\Temp_MEI125682\nltk_data
[nltk_data] ...
[nltk_data] Package wordnet is already up-to-date!
WARNING:werkzeug:WebSocket transport not available. Install eventlet or gevent and gevent-websocket for improved performance.

 * Serving Flask app "main" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
INFO:werkzeug: * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
INFO:werkzeug:127.0.0.1 - - [23/Apr/2021 15:17:35] "GET / HTTP/1.1" 200 -

@RayDAnt3D

Did some source code snooping and noticed that running:

start_timestamp = datetime.fromtimestamp(start / 1000).strftime("%H:%M:%S.%f")

As appears in dataset\audio_processing.py inside the cut_audio() routine with start=60 (as the first Ayoade clip would be) on the Python commandline like so:

from subprocess import call
from pathlib import Path
from datetime import datetime
from pydub import AudioSegment
import os
datetime.fromtimestamp(60 / 1000).strftime("%H:%M:%S.%f")

results in the following output:

'19:00:00.060000'

Pretty sure that additional '19:00:00.000000' shouldn't be there. The root of the problem may just be a date/time localization mismatch.
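
To illustrate the suspected mismatch: `datetime.fromtimestamp()` without a `tz` argument interprets its input as seconds since the Unix epoch and converts the result to local time, so a 60 ms offset becomes a wall-clock time shifted by the machine's UTC offset. The sketch below reproduces the value above with a fixed UTC-5 offset (an assumption chosen because it yields 19:00) and shows one timezone-independent way to format an offset. `offset_timestamp` is a hypothetical helper, not the app's function.

```python
from datetime import datetime, timedelta, timezone


def local_style_timestamp(ms, utc_offset_hours):
    """Mimic datetime.fromtimestamp() on a machine with the given fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(ms / 1000, tz=tz).strftime("%H:%M:%S.%f")


def offset_timestamp(ms):
    """Format a millisecond offset as HH:MM:SS.ffffff with no timezone involved."""
    hours, remainder = divmod(int(ms), 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis * 1000:06d}"


print(local_style_timestamp(60, -5))  # 19:00:00.060000 (the value seen above)
print(local_style_timestamp(60, 0))   # 00:00:00.060000 (only correct when the offset is zero)
print(offset_timestamp(60))           # 00:00:00.060000 (independent of the machine's timezone)
```

If the formatted string is used as FFmpeg's seek position, a start of 19:00:00.06 lies far past the end of even a multi-hour recording, which would be consistent with the "Output file is empty, nothing was encoded" message in the logs.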

@BenAAndrew
Collaborator

@RayDAnt3D great find. What time localization do you use?

@ironpanther

I tried the latest version (0.63) just to see if anything was different-----the initial files it creates from my sample mp3------audio.mp3, audio-converted.wav, and audio-converted-16000.wav are all fine, same as before. The many individual clip-wavs inside the folder, are all "empty" files, with length 00:00:00, size 78 bytes. I believe that's the same as before (I stopped it before it auto-deleted them this time, so I could check them)

Errors when trying to process are similar to the post above:

Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'data\datasets\Kate\audio-converted.wav':
Metadata:
encoder : Lavf58.76.100
Duration: 04:18:04.84, bitrate: 352 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'data\datasets\Kate\wavs\1436520_1438290.wav':
Metadata:
ISFT : Lavf58.76.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
Metadata:
encoder : Lavc58.134.100 pcm_s16le
size= 0kB time=00:00:00.00 bitrate=N/A speed= 0x
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used)
emitting event "logs" to all [/voice]
INFO:socketio.server:emitting event "logs" to all [/voice]
MRg0ipT8vkRYLkMJAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Kate\wavs\1436520_1438290.wav"}]
INFO:werkzeug:127.0.0.1 - - [24/Apr/2021 12:19:54] "GET /socket.io/?EIO=4&transport=polling&t=Na4vHg3&sid=MRg0ipT8vkRYLkMJAAAA HTTP/1.1" 200 -
INFO:engineio.server:MRg0ipT8vkRYLkMJAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Kate\wavs\1436520_1438290.wav"}]
INFO:voice:Could not transcribe data\datasets\Kate\wavs\1436520_1438290.wav
emitting event "progress" to all [/voice]
INFO:socketio.server:emitting event "progress" to all [/voice]
MRg0ipT8vkRYLkMJAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 356","total":"5134"}]
INFO:engineio.server:MRg0ipT8vkRYLkMJAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 356","total":"5134"}]
INFO:voice:Progress - 356/5134

@RayDAnt3D

@BenAAndrew US EST (technically currently EDT.)

@BenAAndrew
Collaborator

BenAAndrew commented Apr 24, 2021

@RayDAnt3D @ironpanther @arthur465 @GregoryBetsey I've rewritten the timestamp function to fix this. Added in https://github.com/BenAAndrew/Voice-Cloning-App/releases/tag/v0.7. Please test if you get a chance

@arthur465

@RayDAnt3D @ironpanther @arthur465 @GregoryBetsey I've rewritten the timestamp function to fix this. Will be added in release 0.7. Please test if you get a chance

It looks like it's working!

Capture
checkpoint

@KoolenDasheppi

Release 0.7 fixed it for me (I've been keeping an eye on this repo and this issue so I can know when it got fixed). I'm also excited about the HiFi-GAN addition. Thanks for developing this by the way, you're doing an awesome job!

@RayDAnt3D

It's fixed for me! Currently doing my first training run now.

@ironpanther

ironpanther commented Apr 24, 2021

The .wav generation/clips seems to work now, and it gets much further, but it's been "stuck in a loop" for a while now----I have:

Coverting data\datasets\Kate\audio.mp3...
Loading script from data\datasets\Kate\text.txt...
Searching text for matching fragments...
Changing sample rate...
Fetching segments...
Matching segments...
Generating segments...

And the cmd window just keeps repeating:
INFO:engineio.server:Fo5fjmXqSKaqAfxdAAAC: Sending packet PING data None
Fo5fjmXqSKaqAfxdAAAC: Received packet PONG data
INFO:engineio.server:Fo5fjmXqSKaqAfxdAAAC: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [24/Apr/2021 15:54:33] "POST /socket.io/?EIO=4&transport=polling&t=Na5gP-I&sid=Fo5fjmXqSKaqAfxdAAAC HTTP/1.1" 200 -
Fo5fjmXqSKaqAfxdAAAC: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [24/Apr/2021 15:54:58] "GET /socket.io/?EIO=4&transport=polling&t=Na5gP-K&sid=Fo5fjmXqSKaqAfxdAAAC HTTP/1.1" 200 -
INFO:engineio.server:Fo5fjmXqSKaqAfxdAAAC: Sending packet PING data None
Fo5fjmXqSKaqAfxdAAAC: Received packet PONG data
INFO:engineio.server:Fo5fjmXqSKaqAfxdAAAC: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [24/Apr/2021 15:54:58] "POST /socket.io/?EIO=4&transport=polling&t=Na5gWB0&sid=Fo5fjmXqSKaqAfxdAAAC HTTP/1.1" 200 -
Fo5fjmXqSKaqAfxdAAAC: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [24/Apr/2021 15:55:23] "GET /socket.io/?EIO=4&transport=polling&t=Na5gWB1&sid=Fo5fjmXqSKaqAfxdAAAC HTTP/1.1" 200 -
INFO:engineio.server:Fo5fjmXqSKaqAfxdAAAC: Sending packet PING data None
Fo5fjmXqSKaqAfxdAAAC: Received packet PONG data
INFO:engineio.server:Fo5fjmXqSKaqAfxdAAAC: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [24/Apr/2021 15:55:23] "POST /socket.io/?EIO=4&transport=polling&t=Na5gcM6&sid=Fo5fjmXqSKaqAfxdAAAC HTTP/1.1" 200 -
Fo5fjmXqSKaqAfxdAAAC: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [24/Apr/2021 15:55:48] "GET /socket.io/?

::edit:: saw something new while typing---
INFO:engineio.server:M-50GuX-ODIZG_s_AAAE: Received packet CLOSE data
INFO:engineio.server:M-50GuX-ODIZG_s_AAAE: Client is gone, closing socket

Could a future version have an option for which browser to open with? I think that being able to choose Chrome etc. may work better, as my PC has 4 different browsers, and all behave differently when running scripts.

@ironpanther

ironpanther commented Apr 24, 2021

Update----I "let the browser window it opened automatically" just sit there, and opened a new browser window but in chrome, and that worked. So I think either "let user choose browser, or default to chrome browser instead of OS browser" is a needed option.

::edit:: Would also suggest that in train.py, "ITERS_PER_CHECKPOINT = 1000" be lowered----currently, that results in only saving approximately once per hour, on my GTX 1080. I could easily lose internet connection etc before it saves again, and lose many iterations. Or if I wish to stop for a while, and do something else with my GPU, after 45 mins of training--that would all be lost, as it wouldn't have saved since then. "Manual save" and/or more frequent checkpoints would also allow more experimenting with determining optimum batch size etc.
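
As a rough illustration of the trade-off (not the app's train.py): at about one save per hour with `ITERS_PER_CHECKPOINT = 1000`, an iteration takes roughly 3.6 seconds, so the work at risk scales linearly with the interval. The function names below are hypothetical.

```python
def checkpoint_iterations(total_iters, iters_per_checkpoint):
    """Return the iteration numbers at which a checkpoint would be saved."""
    return [i for i in range(1, total_iters + 1) if i % iters_per_checkpoint == 0]


def max_lost_minutes(iters_per_checkpoint, seconds_per_iter):
    """Worst-case training time lost if the run stops just before the next save."""
    return iters_per_checkpoint * seconds_per_iter / 60


# Roughly 3.6 s per iteration is inferred from "once per hour" at 1000 iterations.
print(max_lost_minutes(1000, 3.6))       # about an hour at risk
print(max_lost_minutes(250, 3.6))        # about a quarter of that
print(checkpoint_iterations(1000, 250))  # [250, 500, 750, 1000]
```

More frequent saves cost extra disk writes, so the interval is a balance between that overhead and the amount of training that can be lost.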

@BenAAndrew
Collaborator

BenAAndrew commented Apr 25, 2021

@ironpanther thanks for the feedback. Trying to handle the non-default browser is a bit complex and also the browser shouldn't affect performance. Additionally, the app does not need an internet connection to run (despite running in the browser).

As for the "stuck in a loop" I don't think it is, those messages are just logging for the app and not the process itself. It may take a while to finish processing even after the progress bar is done.

Changing the checkpoint frequency is a good idea and I will add it in the future.

@BenAAndrew
Collaborator

Closing as everyone seems happy this particular issue is now fixed. If it has not been fixed please reopen it.
