Hugging Face model does not support mp4 format #1018

NightMoonIsland · 2024-12-07T08:19:22Z

The model used is zongxiao/whisper-small-zh-CN.
The following error occurs:
```
Traceback (most recent call last):
  File "multiprocessing\process.py", line 314, in _bootstrap
  File "multiprocessing\process.py", line 108, in run
  File "buzz\transcriber\whisper_file_transcriber.py", line 98, in transcribe_whisper
  File "buzz\transcriber\whisper_file_transcriber.py", line 123, in transcribe_hugging_face
  File "buzz\transformers_whisper.py", line 194, in transcribe
  File "transformers\pipelines\automatic_speech_recognition.py", line 283, in __call__
    return super().__call__(inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "transformers\pipelines\base.py", line 1294, in __call__
    return next(
           ^^^^^
  File "transformers\pipelines\pt_utils.py", line 124, in __next__
    item = next(self.iterator)
           ^^^^^^^^^^^^^^^^^^^
  File "transformers\pipelines\pt_utils.py", line 269, in __next__
    processed = self.infer(next(self.iterator), **self.params)
                           ^^^^^^^^^^^^^^^^^^^
  File "torch\utils\data\dataloader.py", line 631, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "torch\utils\data\dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch\utils\data\_utils\fetch.py", line 32, in fetch
    data.append(next(self.dataset_iter))
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "transformers\pipelines\pt_utils.py", line 186, in __next__
    processed = next(self.subiterator)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "buzz\transformers_whisper.py", line 53, in preprocess
  File "transformers\pipelines\audio_utils.py", line 41, in ffmpeg_read
    raise ValueError(
ValueError: Soundfile is either not in the correct format or is malformed. Ensure that the soundfile has a valid audio file extension (e.g. wav, flac or mp3) and is not corrupted. If reading from a remote URL, ensure that the URL is the full address to download the audio file.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 4, in <module>
  File "buzz\buzz.py", line 38, in main
  File "PyInstaller\hooks\rthooks\pyi_rth_multiprocessing.py", line 50, in _freeze_support
  File "multiprocessing\spawn.py", line 122, in spawn_main
  File "multiprocessing\spawn.py", line 135, in _main
  File "multiprocessing\process.py", line 329, in _bootstrap
AttributeError: 'NoneType' object has no attribute 'write'
```

The mp3 format works fine, however.

raivisdejus · 2024-12-10T17:46:23Z

@NightMoonIsland In general, .mp4 files work fine with Hugging Face models. I have tested this with several files downloaded from YouTube. I also tested your model with a Chinese video downloaded from YouTube, and it worked as well.

Any chance you can share this file or some other file that does not work?
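Before sharing a file, it may help to check whether it contains an audio stream that ffmpeg can read at all. The following is only a rough diagnostic sketch, not part of Buzz: it assumes `ffprobe` is installed and on the PATH, and the helper names (`ffprobe_command`, `parse_audio_streams`, `audio_streams`) are made up for illustration.

```python
import json
import subprocess


def ffprobe_command(path: str) -> list[str]:
    """Build an ffprobe call that lists only the audio streams as JSON."""
    return [
        "ffprobe",
        "-v", "error",                 # print only real errors
        "-select_streams", "a",        # audio streams only
        "-show_entries", "stream=codec_name,sample_rate,channels",
        "-of", "json",
        path,
    ]


def parse_audio_streams(ffprobe_json: str) -> list[dict]:
    """Return the audio stream entries from ffprobe's JSON output."""
    return json.loads(ffprobe_json).get("streams", [])


def audio_streams(path: str) -> list[dict]:
    """Run ffprobe (assumed to be on PATH) and return the audio streams."""
    result = subprocess.run(
        ffprobe_command(path), capture_output=True, text=True, check=True
    )
    return parse_audio_streams(result.stdout)
```

An empty list from `audio_streams("your_file.mp4")`, or an error raised by ffprobe, would suggest the problem is in the file itself rather than in the model.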

NightMoonIsland · 2024-12-15T18:21:24Z

34.Day03-04-._.mp4

Buzz version: v1.2.0

raivisdejus · 2025-01-02T11:47:44Z

@NightMoonIsland The latest development version adds the ability to extract speech. When this setting is enabled, it improves recognition accuracy by removing background noise. Part of this pre-processing also tries to correct any issues with video file formats. Your test file was transcribed fine with this setting enabled.

To get the latest development version, go to https://github.com/chidiwilliams/buzz/actions/workflows/ci.yml?query=branch%3Amain
Log in to GitHub, click on the latest build, then scroll down to the bottom and download the installation file.
More information on development versions is available at https://chidiwilliams.github.io/buzz/docs/faq#9-where-can-i-get-latest-development-version
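For anyone who cannot install a development build, a possible manual workaround is to extract the audio track to a WAV file first and transcribe that instead. This is a minimal sketch under stated assumptions, not how Buzz does its pre-processing: it assumes `ffmpeg` is on the PATH, and the helper names (`ffmpeg_extract_command`, `wav_path_for`, `extract_audio`) are hypothetical. The 16 kHz mono output matches the input format Whisper models expect.

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg call that writes the audio of `src` to `dst`."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", src,       # input video file
        "-vn",           # drop the video stream
        "-ac", "1",      # downmix to mono
        "-ar", "16000",  # resample to 16 kHz
        dst,
    ]


def wav_path_for(src: str) -> str:
    """Return the source path with its extension replaced by .wav."""
    return str(Path(src).with_suffix(".wav"))


def extract_audio(src: str) -> str:
    """Run ffmpeg (assumed to be on PATH) and return the new WAV path."""
    dst = wav_path_for(src)
    subprocess.run(ffmpeg_extract_command(src, dst), check=True)
    return dst
```

Calling `extract_audio("video.mp4")` would write `video.wav` next to the original, which can then be opened in Buzz like any other audio file.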

raivisdejus added the "needs info" (Additional information is needed) label Dec 10, 2024

raivisdejus closed this as completed Jan 2, 2025
