Vad plugin #161
Merged: AustinCasteel merged 9 commits into NaomiProject:naomi-dev from aaronchantrill:vad_plugin on Feb 24, 2019
Conversation
Replaced the imp module with the importlib module due to a deprecation warning. Simplified the import of configparser so it is no longer trying to handle Python 2 imports. Rewrote the parse_plugin_class function to use importlib rather than imp. Changed the plugin_classes initialization to fix a PEP 8 complaint: W504, line break after binary operator.
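The imp-to-importlib migration described above can be sketched as follows. This is a minimal illustration of the general technique, not Naomi's actual parse_plugin_class implementation; the function name here is hypothetical.

```python
import importlib.util


def load_module_from_path(name, path):
    """Load a Python module from a file path using importlib
    (the modern replacement for the deprecated imp.load_module).
    Hypothetical helper, not the actual Naomi code."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```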
naomi/application.py
Attached the input device parameters (input_samplerate, input_samplewidth, input_channels, input_chunksize) to the input_device object so they are all available to the mic and vad objects as the input_device object is passed around. For consistency, also moved the output_chunksize and output_padding parameters to the output_device object. Added initialization of the Voice Activity Detection object and passed it to the initialization of the mic object.
naomi/mic.py
Removed the main logic around listen and active listen so I could move it into the VADPlugin class. This should make it easier to implement the "Passive Listen for Commands" project, because once the passive listener identifies a keyword in the audio returned, we can just pass the same block of audio to the active listener for transcription. Simplified a lot of stuff. The original authors were running two threads constantly scanning the audio input for keywords, and it appears that the only reason was to speed up keyword detection. The new VAD method works quite differently, but I'm interested in hearing whether anyone notices a difference.
naomi/plugin.py
Added skeleton for the VADPlugin class.
naomi/pluginstore.py
Added the VADPlugin as a new plugin class.
plugins/vad
Added two new plugins, snr_vad and webrtc_vad.
plugins/vad/snr_vad
This is based on the way voice activity currently works with Naomi, which is basically just waiting for audio levels to go above a certain threshold and below a certain threshold. I have always had trouble with this method, as different sound cards and different microphones register sound quite differently, so choosing a proper threshold level is often problematic. I am now saying that anything over the mean plus one and a half times the standard deviation should be considered audio to pay attention to. I also reset every 100 samples by cutting all the counts in half, thus ensuring that we aren't counting to ridiculous numbers over time and allowing noise levels to adjust fairly quickly to changes in the environment.
plugins/webrtc_vad
Uses the webrtcvad module, which can be installed via pip. This module requires that all chunks be 10, 20, or 30 ms. The default chunk size for Naomi is 64 ms, so you have to adjust the value of audio: input_chunksize: to either 160 (10 ms), 320 (20 ms), or 480 (30 ms) in profile.yml.
Modified the two VAD plugins so they can receive configuration values passed through from profile.yml. For snr_vad, the following options can be set: timeout: 1, minimum_capture: 0.25, threshold: 20. For webrtc_vad, the following options can be set: timeout: 1, minimum_capture: 0.25, aggressiveness: 1.
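Expressed as a profile.yml fragment, the options above would look something like this (the key names and values come from this PR; the exact top-level nesting is an assumption):

```yaml
snr_vad:
  timeout: 1
  minimum_capture: 0.25
  threshold: 20
webrtc_vad:
  timeout: 1
  minimum_capture: 0.25
  aggressiveness: 1
```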
Oops, I changed the name of the SNR VAD plugin from snr to snr_vad but left the default at snr, so if you have no vad selected in your profile, you get an error that plugin "vad" does not exist. Fixed by setting the default to "snr_vad", matching documentation and reality.
Added a couple of unit tests for the VAD plugins. Also changed the structure of the __init__ methods to accommodate testing. The tests work by sending an "empty" sound, which should result in _voice_detected() returning False, and then a small clip from the naomi/data/audio/naomi.wav file, which should result in _voice_detected() returning True.
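The test pattern described above can be sketched like this. FakeVADPlugin and its fixed RMS threshold are hypothetical stand-ins for illustration; the real tests exercise the actual plugins against naomi/data/audio/naomi.wav rather than synthetic chunks.

```python
import struct
import unittest


def rms(chunk):
    # Root-mean-square level of a 16-bit little-endian PCM chunk
    samples = struct.unpack("<%dh" % (len(chunk) // 2), chunk)
    return (sum(s * s for s in samples) / len(samples)) ** 0.5


class FakeVADPlugin:
    # Hypothetical stand-in for a VAD plugin: voice is "detected"
    # when the chunk's RMS level exceeds a fixed threshold.
    def _voice_detected(self, chunk):
        return rms(chunk) > 500


class TestVADPattern(unittest.TestCase):
    def test_silence_then_voice(self):
        plugin = FakeVADPlugin()
        silence = b"\x00\x00" * 480   # an "empty" chunk: no voice
        self.assertFalse(plugin._voice_detected(silence))
        loud = b"\x00\x40" * 480      # constant amplitude 16384: voice
        self.assertTrue(plugin._voice_detected(loud))


if __name__ == "__main__":
    unittest.main()
```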
Removed some unused modules and variables and cleaned up the formatting some.
Added the ability to change the logging level while running VAD tests. If logging is set to INFO or DEBUG, this will print a timeline of where voice data is detected by the plugin. Also removed extra assert statements that were bugging Codacy.
After completing the VAD Plugin Testing classes for both plugins, realized that almost all the code was duplicated between the two plugins. Combined almost all the code into the Test_VADPlugin class located in naomi/testutils.py, which should make it easier to maintain and simplify the development of new plugins. I had also added some code which caused the test routine to output a map of where audio was and was not detected if the test is run at the info or debug logging levels (the default logging level for unittests is warn). This had to be enabled by uncommenting a line on each test if I wanted to compare the results. With this change, uncommenting or commenting one line in testutils.py affects the behavior of both tests.
aaronchantrill added the Status: Review Needed and Status: In Progress labels and removed the Status: In Progress and Status: Review Needed labels on Feb 7, 2019
Codacy complained that the overridden setUp methods used a different set of parameters. I have set all of them to just def setUp(self): and also added a callback from the testutils.Test_VADPlugin class.
Description
VAD Plugin
naomi/application.py
Attached input device parameters (input_samplerate, input_samplewidth, input_channels, input_chunksize) to the input_device object so they are all available to the mic and vad objects as the input_device object is passed around.
For consistency, also moved the output_chunksize and output_padding parameters to the output_device object.
Added initialization of Voice Activity Detection object and passed it to the initialization of the mic object.
naomi/mic.py
Removed the main logic around listen and active listen so I could move it into the VADPlugin class. This should make it easier to implement the "Passive Listen for Commands" project, because once
the passive listener identifies a keyword in the audio returned, we can just pass the same block of audio to the active listener for transcription. Simplified a lot of stuff. The original authors
were running two threads constantly scanning the audio input for keywords, and it appears that the only reason was to speed up keyword detection. The new VAD method works much differently, but
I'm interested in hearing whether anyone notices a difference.
naomi/plugin.py
Added skeleton for VADPlugin class
naomi/pluginstore.py
Added the VADPlugin as a new plugin class
naomi/testutils.py
Added a test audio_device class for my VAD tests
plugins/vad
Added two new plugins, snr_vad and webrtc_vad.
plugins/vad/snr_vad
This is based on the way voice activity currently works with naomi, which is basically just waiting for audio levels to go above a certain threshold and below a certain threshold.
I have always had trouble with this method, as different sound cards and different microphones register sound quite differently, so choosing a proper threshold level is often problematic.
I am now saying that anything over the mean plus one and a half times the standard deviation should be considered audio to pay attention to. I also reset every 100 samples by cutting all the
counts in half, thus ensuring that we aren't counting to ridiculous numbers over time and allowing noise levels to adjust fairly quickly to changes in the environment.
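The threshold rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual snr_vad plugin code: the class name and the simple list-based history are made up, and where the real plugin halves its running counts every 100 samples, this sketch halves a list of recent levels to the same effect.

```python
import statistics


class SNRDetector:
    """Illustrative sketch of the SNR idea: a chunk counts as voice
    when its level exceeds the mean plus 1.5 standard deviations of
    recently observed levels. Hypothetical, not the Naomi plugin."""

    def __init__(self, reset_interval=100):
        self.levels = []
        self.count = 0
        self.reset_interval = reset_interval

    def voice_detected(self, level):
        self.levels.append(level)
        self.count += 1
        # Every reset_interval samples, halve the history so the noise
        # floor can adapt fairly quickly to a changing environment
        if self.count >= self.reset_interval:
            self.levels = self.levels[len(self.levels) // 2:]
            self.count = 0
        if len(self.levels) < 2:
            return False
        mean = statistics.mean(self.levels)
        stdev = statistics.pstdev(self.levels)
        return level > mean + 1.5 * stdev
```

Because the threshold tracks the observed mean and spread rather than a fixed level, the same code behaves sensibly across sound cards and microphones that register sound at very different absolute levels.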
plugins/webrtc_vad
Uses the webrtcvad module, which can be installed via pip. This module requires that all chunks be 10, 20, or 30 ms. The default chunk size for Naomi is 64 ms, so you have to adjust the value of audio: input_chunksize: to either 160 (10 ms), 320 (20 ms), or 480 (30 ms) in profile.yml, assuming a rate of 16000 samples/sec.
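As a profile.yml fragment, that setting would look something like this (the nesting is inferred from the audio: input_chunksize: key path quoted above; 160, 320, and 480 samples correspond to 10, 20, and 30 ms at 16000 samples/sec):

```yaml
audio:
  input_chunksize: 480  # 30 ms at 16000 samples/sec (160 = 10 ms, 320 = 20 ms)
```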
Related Issue
VAD plugin #144
[Feature-Request] - Passive Listening for commands #48
Automate STT training #103
Motivation and Context
This allows us to quickly and easily write and test Voice Activity Detection plugins without having to modify the main structure of Naomi. It also simplifies some handling of audio which should make some other projects simpler. And all of this should allow us to improve overall speech capture for building catalogs of data samples for training the STT engines.
How Has This Been Tested?
I have tested both plugins on both my x86 Raspbian Stretch VirtualBox machine and my Raspberry Pi 3B+.