Vad plugin #161

Merged
merged 9 commits on Feb 24, 2019

Conversation

@aaronchantrill (Contributor) commented Jan 29, 2019

Description

VAD Plugin

naomi/application.py

Attached input device parameters (input_samplerate, input_samplewidth, input_channels, input_chunksize) to the input_device object so they are all available to the mic and vad objects as the input_device object is passed around.

For consistency, also moved the output_chunksize and output_padding parameters to the output_device object.

Added initialization of Voice Activity Detection object and passed it to the initialization of the mic object.
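In spirit, the change hangs the audio parameters off the device object itself; a minimal sketch, assuming attribute names that simply mirror the profile keys above (the real code may store them differently):

```python
# Sketch only: device objects now carry their own audio parameters, so the
# mic and vad code can read them from whichever device they are handed.
class InputDevice(object):
    """Stand-in for Naomi's real input device class."""
    pass


input_device = InputDevice()
input_device.input_samplerate = 16000   # samples/sec
input_device.input_samplewidth = 2      # bytes per sample (16-bit)
input_device.input_channels = 1         # mono
input_device.input_chunksize = 1024     # samples per chunk (64 ms at 16 kHz)
```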

naomi/mic.py

Removed the main logic around listen and active listen so I could move it into the VADPlugin class. This should make it easier to implement the "Passive Listen for Commands" project: once the passive listener identifies a keyword in the audio returned, we can pass the same block of audio straight to the active listener for transcription (see the sketch below). This also simplifies a lot of the surrounding code. The original authors ran two threads constantly scanning the audio input for keywords, apparently only to speed up keyword detection. The new VAD method works quite differently, but I'm interested in hearing whether anyone notices a difference.
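A hypothetical sketch of the handoff this enables (vad_plugin, passive_stt, and active_stt are invented stand-ins, not Naomi's API; the point is only that the same buffer is reused):

```python
# Hypothetical sketch of the passive -> active handoff described above.
def wait_for_command(vad_plugin, passive_stt, active_stt, keyword="NAOMI"):
    while True:
        # One VAD-delimited block of raw audio.
        audio = b"".join(vad_plugin.get_audio())
        if keyword in passive_stt.transcribe(audio).upper():
            # Reuse the very same block instead of recording again.
            return active_stt.transcribe(audio)
```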

naomi/plugin.py

Added skeleton for VADPlugin class
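Roughly, the skeleton looks like this (a minimal sketch: _voice_detected is named in the tests below, while get_audio and the constructor signature are my guesses based on the options discussed in this PR):

```python
import logging


class VADPlugin(object):
    """Base class for Voice Activity Detection plugins (sketch)."""

    def __init__(self, input_device, timeout=1, minimum_capture=0.25):
        self._logger = logging.getLogger(__name__)
        self._input_device = input_device
        self._timeout = timeout                  # seconds of silence that end capture
        self._minimum_capture = minimum_capture  # minimum audio to keep, in seconds

    def get_audio(self):
        """Yield chunks of audio from the first detected voice until timeout."""
        raise NotImplementedError()

    def _voice_detected(self, frame):
        """Return True if this frame of raw audio contains voice."""
        raise NotImplementedError()
```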

naomi/pluginstore.py

Added the VADPlugin as a new plugin class

naomi/testutils.py

Added a test audio_device class for my VAD tests

plugins/vad

Added two new plugins, snr_vad and webrtc_vad.

plugins/vad/snr_vad

This is based on the way voice activity detection currently works in Naomi, which is basically just waiting for the audio level to rise above one threshold and later fall back below another.

I have always had trouble with this method, as different sound cards and different microphones register sound quite differently, so choosing a proper threshold level is often problematic.

I am now treating anything over the mean plus one and a half times the standard deviation as audio worth paying attention to. I also reset every 100 samples by cutting all the running counts in half, which keeps the totals from growing to ridiculous numbers over time and lets the noise floor adjust fairly quickly to changes in the environment.
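As a self-contained sketch of that idea (the RMS level measure via the stdlib audioop module, and all names here, are my assumptions, not necessarily what snr_vad does internally):

```python
import audioop


class RunningNoiseFloor(object):
    """Track the mean and standard deviation of frame levels, with decay."""

    def __init__(self, reset_every=100, threshold_stddevs=1.5):
        self._count = 0
        self._sum = 0.0
        self._sum_sq = 0.0
        self._reset_every = reset_every
        self._threshold_stddevs = threshold_stddevs

    def is_voice(self, frame, sample_width=2):
        level = audioop.rms(frame, sample_width)  # level of this chunk
        self._count += 1
        self._sum += level
        self._sum_sq += level * level
        mean = self._sum / self._count
        variance = max(self._sum_sq / self._count - mean * mean, 0.0)
        # Halve the running counts periodically so the totals never grow
        # without bound and the noise floor adapts to the environment.
        if self._count >= self._reset_every:
            self._count //= 2
            self._sum /= 2.0
            self._sum_sq /= 2.0
        # "Voice" is anything over mean + 1.5 standard deviations.
        return level > mean + self._threshold_stddevs * variance ** 0.5
```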

plugins/vad/webrtc_vad

Uses the webrtcvad module, which can be installed via pip. This module requires that every chunk be exactly 10, 20, or 30 ms of audio. The default chunk size for Naomi is 64 ms, so you have to adjust the value of

```yaml
audio:
  input_chunksize:
```

to either 160 (10 ms), 320 (20 ms), or 480 (30 ms) in profile.yml, assuming a rate of 16000 samples/sec.
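For reference, the webrtcvad API itself is tiny; a minimal check of a single chunk, sized to match an input_chunksize of 480, looks like this:

```python
import webrtcvad

vad = webrtcvad.Vad(1)  # aggressiveness 0 (least) to 3 (most)

sample_rate = 16000
frame = b'\x00\x00' * 480  # 480 samples of 16-bit silence = 30 ms

# Expect False for pure silence; real chunks come from the mic.
print(vad.is_speech(frame, sample_rate))
```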

Related Issue

VAD plugin #144
[Feature-Request] - Passive Listening for commands #48
Automate STT training #103

Motivation and Context

This allows us to quickly and easily write and test Voice Activity Detection plugins without having to modify the main structure of Naomi. It also simplifies some of the audio handling, which should make other projects simpler, and all of this should help improve overall speech capture for building catalogs of data samples for training the STT engines.

How Has This Been Tested?

I have tested both plugins on both my x86 Raspbian Stretch VirtualBox machine and my Raspberry Pi 3B+.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project. (In fact, I fixed a bunch of flake8 complaints)
  • My change requires a change to the documentation. (Need to explain the new plugin type)
  • I have updated the documentation accordingly. (Added VAD Plugin documentation #4)
  • I have added tests to cover my changes.
  • All new and existing tests passed. (or at least didn't get worse)

Replaced imp module with importlib module due to deprecation warning.

Simplified the import of configparser so it no longer tries to handle
Python 2 imports.

Re-wrote the parse_plugin_class function to use importlib rather than
imp.

Changed plugin_classes initialization to fix a pep8 complaint:
W504 line break after binary operator
Modified the two VAD plugins so they can receive configuration values passed through from profile.yml.

For snr_vad, the following options can be set:

```yaml
snr_vad:
  timeout: 1
  minimum_capture: 0.25
  threshold: 20
```

For webrtc_vad, the following options can be set:

```yaml
webrtc_vad:
  timeout: 1
  minimum_capture: 0.25
  aggressiveness: 1
```
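Consumed in the plugin, those values would be read along these lines (treating the profile as a plain dict here; Naomi's actual config plumbing may differ):

```python
profile = {
    'webrtc_vad': {
        'timeout': 1,
        'minimum_capture': 0.25,
        'aggressiveness': 1,
    }
}

config = profile.get('webrtc_vad', {})
timeout = config.get('timeout', 1)                     # seconds of silence to stop
minimum_capture = config.get('minimum_capture', 0.25)  # seconds to always keep
aggressiveness = config.get('aggressiveness', 1)       # 0-3, passed to webrtcvad
```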
Oops, I changed the name of the SNR VAD plugin from snr to snr_vad but left the default at snr, so if no vad is selected in the profile, you get an error that plugin "vad" does not exist.

Fixed: set the default to "snr_vad", matching documentation and reality.
@AustinCasteel modified the milestones: 3.0.m1, 3.0.m2 (Feb 5, 2019)
Added a couple of unit tests for the VAD plugins. Also changed
the structure of the __init__ methods to accommodate testing.

The tests work by sending an "empty" sound which should result
in _voice_detected() returning False, and then a small clip
from the naomi/data/audio/naomi.wav file which should result
in _voice_detected() returning True.
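In outline, each plugin's test does something like this (build_vad_plugin_under_test is a hypothetical stand-in for the setup done via naomi/testutils.py):

```python
import unittest
import wave


class TestVADPlugin(unittest.TestCase):
    def setUp(self):
        # Hypothetical helper; the real tests build the plugin with the
        # test audio_device class from naomi/testutils.py.
        self.plugin = build_vad_plugin_under_test()

    def test_silence(self):
        # An "empty" (all-zero) chunk should not register as voice.
        silence = b'\x00\x00' * 480
        self.assertFalse(self.plugin._voice_detected(silence))

    def test_voice(self):
        # A short clip from naomi/data/audio/naomi.wav should register.
        f = wave.open('naomi/data/audio/naomi.wav', 'rb')
        clip = f.readframes(480)
        f.close()
        self.assertTrue(self.plugin._voice_detected(clip))
```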
Removed some unused modules and variables and cleaned up the
formatting a bit.
Added the ability to change the logging level while running
VAD tests. If logging is set to INFO or DEBUG, then this will
print a timeline of where voice data is detected by the plugin.

Also removed extra assert statements that were bugging Codacy.
After completing the VAD plugin testing classes for both plugins,
I realized that almost all the code was duplicated between the
two plugins.

Combined almost all of that code into the Test_VADPlugin class
located in naomi/testutils.py, which should make it easier to maintain
and simplify the development of new plugins.

I had also added some code that makes the test routine output a map
of where audio was and was not detected when the test is run at the
info or debug logging levels (the default logging level for unittests
is warn). Previously this had to be enabled by uncommenting a line in
each test if I wanted to compare the results; with this change,
commenting or uncommenting one line in testutils.py affects the
behavior of both tests.
Codacy complained that the overridden setUp methods used a
different set of parameters.

I have set them all to just def setUp(self): and also added a
callback from the testutils.Test_VADPlugin class.