When running `full_example.py`, the speech recognition itself works fine, but the VAD iterator completely fails to detect voice activity, distinguishing only between "sound" and "silence".
My understanding is that `audio_iterator` should yield a block of audio data if the input contains voice, and `None` otherwise. If so, this doesn't work on my system. As long as there is any sound being recorded by the microphone at all, the iterator yields audio blocks. I have tested this with snapping my fingers, scratching on the desk, even the background noise of a ceiling fan running – they all cause the iterator to produce blocks. Only virtually total silence produces `None`.
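For reference, here is a minimal sketch of how I understand the iterator is meant to be consumed; only `audio_iterator` comes from `full_example.py`, the loop body is mine, so treat this as an illustration rather than the exact script:

```python
# Sketch of the expected consumption pattern (my reading of full_example.py):
# `audio_iterator` is assumed to yield audio blocks while voice is detected
# and None once the VAD decides the current input is not speech.
for block in audio_iterator:
    if block is None:
        # Expected for non-voice input: finger snaps, fan noise, silence.
        print("no voice")
    else:
        # Expected only for actual speech; in practice ANY audible
        # input ends up here, voice or not.
        print(f"voice block of {len(block)} samples")
```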
As a result, the end of a phrase isn't detected unless the room is very, very quiet. I have made multiple test recordings with the same microphone setup and found them to be clean, with no additional noise. Yet as soon as any input rises above a certain threshold, even if it is obviously non-human in origin, it is classified as voice. A modern VAD should be able to do much better.
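A quick way to check whether the decision is really just an amplitude gate would be to log the energy of every block the iterator classifies as voice. This is a rough diagnostic of my own, not part of the example, and it assumes the blocks are NumPy float arrays:

```python
import numpy as np

# Diagnostic sketch (my addition, not in full_example.py):
# if the voice/None boundary tracks a fixed RMS level rather than
# speech content, the "VAD" is effectively an energy threshold.
for block in audio_iterator:
    if block is not None:
        rms = float(np.sqrt(np.mean(np.square(block, dtype=np.float64))))
        print(f"classified as voice, RMS = {rms:.5f}")
```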
Is this actually working for you? What could be the reason for the VAD to fail so completely?