use huggingface model not support mp4 format #1018

Closed
NightMoonIsland opened this issue Dec 7, 2024 · 3 comments
Labels
needs info Additional information is needed

Comments

@NightMoonIsland

NightMoonIsland commented Dec 7, 2024

image
When I use the model zongxiao/whisper-small-zh-CN to transcribe an .mp4 file, the following error occurs:
```
Traceback (most recent call last):
  File "multiprocessing\process.py", line 314, in _bootstrap
  File "multiprocessing\process.py", line 108, in run
  File "buzz\transcriber\whisper_file_transcriber.py", line 98, in transcribe_whisper
  File "buzz\transcriber\whisper_file_transcriber.py", line 123, in transcribe_hugging_face
  File "buzz\transformers_whisper.py", line 194, in transcribe
  File "transformers\pipelines\automatic_speech_recognition.py", line 283, in __call__
    return super().__call__(inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "transformers\pipelines\base.py", line 1294, in __call__
    return next(
           ^^^^^
  File "transformers\pipelines\pt_utils.py", line 124, in __next__
    item = next(self.iterator)
           ^^^^^^^^^^^^^^^^^^^
  File "transformers\pipelines\pt_utils.py", line 269, in __next__
    processed = self.infer(next(self.iterator), **self.params)
                           ^^^^^^^^^^^^^^^^^^^
  File "torch\utils\data\dataloader.py", line 631, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "torch\utils\data\dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch\utils\data\_utils\fetch.py", line 32, in fetch
    data.append(next(self.dataset_iter))
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "transformers\pipelines\pt_utils.py", line 186, in __next__
    processed = next(self.subiterator)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "buzz\transformers_whisper.py", line 53, in preprocess
  File "transformers\pipelines\audio_utils.py", line 41, in ffmpeg_read
    raise ValueError(
ValueError: Soundfile is either not in the correct format or is malformed. Ensure that the soundfile has a valid audio file extension (e.g. wav, flac or mp3) and is not corrupted. If reading from a remote URL, ensure that the URL is the full address to download the audio file.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 4, in <module>
  File "buzz\buzz.py", line 38, in main
  File "PyInstaller\hooks\rthooks\pyi_rth_multiprocessing.py", line 50, in _freeze_support
  File "multiprocessing\spawn.py", line 122, in spawn_main
  File "multiprocessing\spawn.py", line 135, in _main
  File "multiprocessing\process.py", line 329, in _bootstrap
AttributeError: 'NoneType' object has no attribute 'write'
```

Files in .mp3 format work fine, however.

@raivisdejus
Collaborator

@NightMoonIsland In general, .mp4 files work fine with Hugging Face models. I have tested this with several files downloaded from YouTube. I also tested your model with a Chinese video downloaded from YouTube, and it worked as well.

Any chance you can share this file or some other file that does not work?

@raivisdejus raivisdejus added the needs info Additional information is needed label Dec 10, 2024
@NightMoonIsland
Author

NightMoonIsland commented Dec 15, 2024

34.Day03-04-._.mp4

Buzz version: v1.2.0

@raivisdejus
Collaborator

@NightMoonIsland The latest development version adds a setting to extract speech. When enabled, it improves recognition accuracy by removing background noise. As part of this pre-processing, Buzz also tries to correct issues with video file formats. Your test file was transcribed without problems with this setting enabled.

To get the latest development version, go to https://github.com/chidiwilliams/buzz/actions/workflows/ci.yml?query=branch%3Amain
Log in to GitHub, click on the latest build, then scroll to the bottom and download the installation file.
More information on development versions is available at https://chidiwilliams.github.io/buzz/docs/faq#9-where-can-i-get-latest-development-version
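
For anyone who hits the same `ValueError` on a release without the speech extraction setting, one possible workaround is to extract the audio track to a WAV file first and transcribe that file instead. The sketch below is not Buzz's own pre-processing; it is an illustration that assumes the `ffmpeg` command-line tool is installed and on `PATH`. The helper names `build_ffmpeg_command` and `extract_audio` are hypothetical, and the 16 kHz mono output is an assumption based on the sample rate Whisper models are trained on.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_ffmpeg_command(video_path: str, wav_path: str) -> list[str]:
    """Build an ffmpeg command that extracts the audio track as 16 kHz mono WAV.

    Hypothetical helper for illustration; not part of Buzz or transformers.
    """
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it already exists
        "-i", video_path,     # input container, e.g. an .mp4 file
        "-vn",                # drop the video stream
        "-ac", "1",           # downmix to a single (mono) channel
        "-ar", "16000",       # resample to 16 kHz (assumed target rate)
        "-c:a", "pcm_s16le",  # encode as plain 16-bit PCM
        wav_path,
    ]


def extract_audio(video_path: str) -> Path:
    """Run ffmpeg and return the path of the extracted WAV file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    wav_path = Path(video_path).with_suffix(".wav")
    # check=True raises CalledProcessError if ffmpeg cannot decode the input
    subprocess.run(build_ffmpeg_command(video_path, str(wav_path)), check=True)
    return wav_path


if __name__ == "__main__" and len(sys.argv) > 1:
    print(extract_audio(sys.argv[1]))
```

If `ffmpeg` itself fails on the file, the container or audio stream is likely damaged, which would point to the file rather than the model as the cause.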
