Enhancements to ASR API #5
Merged
58 commits
4904cd6 feat: Add sink arg and make stream optional (ar13pit)
7a4fd8c feat: Add base class for sinks (ar13pit)
ab9c6a5 feat: Make WaveFileSink inherit from AudioSinkBase (ar13pit)
b1abddb feat: Add method to open output stream (ar13pit)
2e47aba feat: Add a common base class definition (ar13pit)
526c5c5 feat: Add source and sink args (ar13pit)
d7fb652 feat: Add method to link elements (ar13pit)
d6f4259 feat: Add definition of open method (ar13pit)
9cd803b feat: Change base class to AsrPipelineElementBase (ar13pit)
4e967fa feat: Add args timeout and chunk (ar13pit)
9f23331 feat: Update subclassing around AsrPipelineElementBase (ar13pit)
c88c9d5 feat: Make stream instance attribute and start in start (ar13pit)
e03e2ea fix: Remove AudioSourceBase (ar13pit)
a432281 feat: Rename module to asr (ar13pit)
7f82b59 feat: Add timeout as an arg (ar13pit)
a930738 feat: Initialize timeout arg of super class (ar13pit)
af32118 feat: Add abstract method register_callback (ar13pit)
124f6d1 feat: Move to asr module and subclass AsrPipelineElementBase (ar13pit)
04e726e feat: Import Asr class from asr (ar13pit)
9c87654 feat: Futurize module (ar13pit)
f396957 feat: Make source and sink private (ar13pit)
f1d1105 feat: Make open, next_chunk and close abstract methods (ar13pit)
3fa685f fix: Do not override method definition (ar13pit)
681a885 refactor: Use default implementation of stop (ar13pit)
a9fd722 feat: Add return statement back (ar13pit)
f17ac8f feat: Add preliminary design of the pipeline class (ar13pit)
b54083f test: Comment out calls to sink (ar13pit)
ab3f994 feat: Add AsrPipeline to module exports (ar13pit)
7edf517 fix: Initialize source and sink to None and set using link (ar13pit)
5bf8bbc feat: Only set non empty source or sink and not replace them (ar13pit)
acc2154 feat: Set properties of source and sink args to point to current object (ar13pit)
6c80542 feat: Add optional source arg (ar13pit)
87073ea feat: Add optional sink arg (ar13pit)
e3cc3e4 feat: Add back optional source and sink args (ar13pit)
5d29e3b feat: Add support to add multiple elements (ar13pit)
74e56dd test: Add structure of io test on AsrPipeline (ar13pit)
989c63c feat: Replace start_state with stop_state and complete stop API (ar13pit)
8648311 feat: Add register callback and callback execution (ar13pit)
598d1e1 feat: Add a iteration counter (ar13pit)
9d73079 test: Complete ASR pipeline io test script (ar13pit)
bef5f94 fix: Fix broken pipeline element check logic (ar13pit)
1453a44 feat: Make register_callback optional in elements (ar13pit)
8eec552 refactor: Remove commented sink code (ar13pit)
5f44874 feat: Add a finalize Event (ar13pit)
a4f7281 feat: Add internal finalize state (ar13pit)
9b179cb fix: Fix setting of finalize state (ar13pit)
ef59ca3 fix: Remove default value of arg in abstractmethod (ar13pit)
a27bd2f feat: Use internal function to catch StopIteration exception (ar13pit)
ffea3e5 feat: Add chunksize and rate to init, add open, close defs (ar13pit)
ab73e8b feat: Convert while loop into single iteration function (ar13pit)
104d554 test: Restructure test using AsrPipeline (ar13pit)
e0c31d7 fix: Store model and make model, decoder private (ar13pit)
9c6bd39 feat: Set stop state on StopIteration (ar13pit)
00761dc feat: Log info about finalizing decoding (ar13pit)
17a0594 test: Remove commented code (ar13pit)
12c8374 feat: Minor version bump (ar13pit)
ddca995 fix: Use ABC instead of ABCMeta for raising exceptions in python2/3 (ar13pit)
e42e8a1 feat: Raise NotImplementedError as default (ar13pit)
@@ -0,0 +1,22 @@
"""
Yapykaldi ASR: Classes and functions for ASR pipeline
"""

__all__ = [
    # From .asr
    "Asr",

    # From .pipeline
    "AsrPipeline",

    # From .sources
    "PyAudioMicrophoneSource", "WaveFileSource",

    # From .sinks
    "WaveFileSink"
]

from .asr import Asr
from .pipeline import AsrPipeline
from .sources import PyAudioMicrophoneSource, WaveFileSource
from .sinks import WaveFileSink
@@ -0,0 +1,84 @@
"""Base classes for the ASR pipeline"""
from __future__ import print_function, division, absolute_import, unicode_literals
from builtins import *
from abc import ABC, abstractmethod
from threading import Event
import pyaudio


class AsrPipelineElementBase(ABC):
    """Class AsrPipelineElementBase is the base class for all Asr Pipeline elements.

    It requires three abstract methods to be implemented:
        1. open
        2. close
        3. next_chunk

    The right order of setting up an element is:
        1. element = AsrPipelineElementBase()
        2. element.open()  # To open the file, connect the mic etc.
        3. element.start()  # Start streaming audio data
        4. element.next_chunk()  # Use the audio data
        5. element.stop()  # stop getting audio data
        6. element.close()  # close the file

    Elements need to support open and close at least once but must support
    start, next_chunk, stop several times
    """
    # pylint: disable=too-many-instance-attributes

    def __init__(self, source=None, sink=None, rate=16000, chunksize=1024, fmt=pyaudio.paInt16, channels=1, timeout=1):
        self._source = None
        self._sink = None
        self.rate = rate
        self.chunksize = chunksize
        self.format = fmt
        self.channels = channels
        self.timeout = timeout
        self._finalize = Event()

        self.link(source=source, sink=sink)

    @abstractmethod
    def open(self):
        """Abstract method to open the stream of the element. Opening may or may not start the stream."""

    @abstractmethod
    def next_chunk(self, chunk):
        """Abstract method to process a chunk generated in the source element or received from the source element"""

    @abstractmethod
    def close(self):
        """Abstract method to close the stream of the element. In this method all resources of the stream should be
        freed."""

    def start(self):
        """Optional method to start the stream of the element"""

    def stop(self):
        """Optional method to stop the stream of the element"""

    def register_callback(self, callback):
        """Register a callback to the element outside the pipeline"""
        raise NotImplementedError()

    def link(self, source=None, sink=None):
        """Link a source or a sink to the element

        This method does not override preset source or sink of the element.

        :param source: (default None) A source object
        :param sink: (default None) A sink object
        """
        if (not self._source) and source:
            self._source = source
            source.link(sink=self)

        if (not self._sink) and sink:
            self._sink = sink
            sink.link(source=self)

    def finalize(self):
        """Set the finalize flag of the element"""
        self._finalize.set()
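The lifecycle listed in the class docstring (open, next_chunk, close) and the two-way behaviour of `link` can be illustrated with a small self-contained sketch. It deliberately does not import yapykaldi or pyaudio: `ElementSketch`, `ListSource` and `CollectSink` are hypothetical stand-ins that only mirror the `link` logic shown in the diff, and the driver loop at the end is an assumption about how chunks might be pulled through, not the PR's `AsrPipeline`.

```python
from abc import ABC, abstractmethod


class ElementSketch(ABC):
    """Hypothetical stand-in mirroring the link() logic of AsrPipelineElementBase."""

    def __init__(self, source=None, sink=None):
        self._source = None
        self._sink = None
        self.link(source=source, sink=sink)

    @abstractmethod
    def open(self):
        """Acquire resources."""

    @abstractmethod
    def next_chunk(self, chunk):
        """Produce or process one chunk."""

    @abstractmethod
    def close(self):
        """Release resources."""

    def link(self, source=None, sink=None):
        # Only fill an empty slot; a preset source or sink is never replaced.
        if (not self._source) and source:
            self._source = source
            source.link(sink=self)
        if (not self._sink) and sink:
            self._sink = sink
            sink.link(source=self)


class ListSource(ElementSketch):
    """Hypothetical source that replays chunks from a list."""

    def __init__(self, chunks, **kwargs):
        super().__init__(**kwargs)
        self._chunks = list(chunks)
        self._iter = None

    def open(self):
        self._iter = iter(self._chunks)

    def next_chunk(self, chunk=None):
        return next(self._iter)  # raises StopIteration when exhausted

    def close(self):
        self._iter = None


class CollectSink(ElementSketch):
    """Hypothetical sink that stores every chunk it receives."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.received = []

    def open(self):
        self.received = []

    def next_chunk(self, chunk):
        self.received.append(chunk)
        return chunk

    def close(self):
        pass


source = ListSource([b"a", b"b", b"c"])
sink = CollectSink(source=source)  # linking one side also links the other

for element in (source, sink):
    element.open()
try:
    while True:
        sink.next_chunk(source.next_chunk())
except StopIteration:
    pass  # the source ran dry; stop pulling
finally:
    for element in (sink, source):
        element.close()
```

Because the sketch derives from `ABC`, calling `ElementSketch()` directly raises `TypeError`; only subclasses that define all three abstract methods can be instantiated.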
@@ -0,0 +1,122 @@
"""
Yapykaldi ASR: Class definition for ASR component. It connects to a source and an optional sink
"""
from __future__ import (print_function, division, absolute_import, unicode_literals)
from builtins import *
import struct
import numpy as np
from ._base import AsrPipelineElementBase
from ..logger import logger
from ..nnet3 import KaldiNNet3OnlineDecoder, KaldiNNet3OnlineModel
from ..gmm import KaldiGmmOnlineDecoder, KaldiGmmOnlineModel
from ..utils import volume_indicator


ONLINE_MODELS = {'nnet3': KaldiNNet3OnlineModel, 'gmm': KaldiGmmOnlineModel}
ONLINE_DECODERS = {'nnet3': KaldiNNet3OnlineDecoder, 'gmm': KaldiGmmOnlineDecoder}


class Asr(AsrPipelineElementBase):
    """API for ASR"""
    # pylint: disable=too-many-instance-attributes, useless-object-inheritance

    def __init__(self, model_dir, model_type, rate=16000, chunksize=1024, debug=False, source=None, sink=None):
        """
        :param model_dir: Path to model directory
        :param model_type: Type of ASR model, 'nnet3' or 'gmm'
        :param rate: (default 16000) sampling frequency of audio data. This must be the same as the audio source
        :param chunksize: (default 1024) size of audio data buffer. This must be the same as the audio source
        :param debug: (default False) Flag to set logger to log audio chunk volume and partially decoded string and
            likelihood
        :param source: (default None) Element to be connected as source when constructing an AsrPipeline
        :type source: AsrPipelineElementBase
        :param sink: (default None) Element to be connected as sink when constructing an AsrPipeline
        :type sink: AsrPipelineElementBase
        """
        super().__init__(chunksize=chunksize, rate=rate, source=source, sink=sink)
        self.model_dir = model_dir
        self.model_type = model_type

        self._model = None
        self._decoder = None
        self._decoded_string = None
        self._likelihood = None

        self._string_partially_recognized_callbacks = []
        self._string_fully_recognized_callbacks = []

        self._debug = debug

    def open(self):
        # No definition for this method while inheriting abstract class AsrPipelineElementBase
        pass

    def close(self):
        # No definition for this method while inheriting abstract class AsrPipelineElementBase
        pass

    def next_chunk(self, chunk):
        """Decode one audio chunk and pass the partial result to the partial-recognition callbacks"""
        try:
            data = np.array(struct.unpack_from('<%dh' % self.chunksize, chunk), dtype=np.float32)
        except Exception as e:  # pylint: disable=invalid-name, broad-except
            logger.error("Other exception happened: %s", e)
            raise
        else:
            if self._decoder.decode(self.rate, data, self._finalize.is_set()):
                if self._finalize.is_set():
                    logger.info("Finalized decoding with latest data chunk")

                self._decoded_string, self._likelihood = self._decoder.get_decoded_string()
                if self._debug:
                    chunk_volume_level = volume_indicator(data)
                    logger.info("Chunk volume level: %s", chunk_volume_level)
                    logger.info("Partially decoded (%s): %s", self._likelihood, self._decoded_string)

                for callback in self._string_partially_recognized_callbacks:
                    callback(self._decoded_string)

                return chunk

            raise RuntimeError("Decoding failed")

    def stop(self):
        """Stop ASR process"""
        logger.info("Stop ASR")

        logger.info("Decoding of input stream is complete")
        logger.info("Final result (%s): %s", self._likelihood, self._decoded_string)

        for callback in self._string_fully_recognized_callbacks:
            callback(self._decoded_string)

    def start(self):
        """Begin ASR process"""
        logger.info("Starting speech recognition")
        # Reset internal states at the start of a new call

        self._finalize.clear()

        logger.info("Trying to initialize %s model from %s", self.model_type, self.model_dir)
        self._model = ONLINE_MODELS[self.model_type](self.model_dir)
        logger.info("Successfully initialized %s model from %s", self.model_type, self.model_dir)

        logger.info("Trying to initialize %s model decoder", self.model_type)
        self._decoder = ONLINE_DECODERS[self.model_type](self._model)
        logger.info("Successfully initialized %s model decoder", self.model_type)

        self._decoded_string = ""
        self._likelihood = None

    def register_callback(self, callback, partial=False):
        """
        Register a callback to receive the decoded string, either partial or complete.

        :param callback: a function taking a single string as its parameter
        :param partial: (default False) flag to set callback for partial recognitions
        :return: None
        """
        if partial:
            self._string_partially_recognized_callbacks += [callback]
        else:
            self._string_fully_recognized_callbacks += [callback]
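`next_chunk` expects every chunk to hold `chunksize` little-endian signed 16-bit samples, which is what the format string `'<%dh' % self.chunksize` encodes. The sketch below shows that unpacking step using only the standard library. The helper name `decode_chunk` and the sample values are illustrative assumptions; the method in the diff additionally wraps the resulting tuple in a `float32` NumPy array before handing it to the decoder.

```python
import struct


def decode_chunk(chunk, chunksize):
    """Unpack `chunksize` little-endian int16 samples from a bytes buffer.

    Hypothetical helper mirroring the unpack step in Asr.next_chunk.
    """
    # '<' = little-endian, 'h' = signed 16-bit integer, repeated chunksize times.
    return struct.unpack_from('<%dh' % chunksize, chunk)


# Build a 4-sample chunk the way a 16-bit mono source would deliver it.
samples = (0, 1000, -1000, 32767)
chunk = struct.pack('<4h', *samples)

decoded = decode_chunk(chunk, chunksize=4)

# A chunk shorter than chunksize * 2 bytes cannot be unpacked.
try:
    decode_chunk(chunk[:-2], chunksize=4)
    short_chunk_error = None
except struct.error as exc:
    short_chunk_error = exc
```

This suggests why the docstring asks for `chunksize` to match the audio source: a buffer holding fewer samples than `chunksize` fails in the unpack step rather than being decoded partially.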
Shouldn't this be either implemented or raise `NotImplementedError`?

If we don't implement this, then by default it is `pass`. Are you suggesting we raise a `NotImplementedError` instead?

Is `pass` a valid behavior? Could there be a class that works by using `pass`? If not, then use `NotImplementedError`. Otherwise, `pass` is a valid default implementation.
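
The trade-off discussed in this thread can be sketched as follows. With a silent `pass` default, registering a callback on an element that never invokes it appears to succeed and the callback is simply lost; with `NotImplementedError`, the caller finds out immediately and can choose to skip that element. The class names and the `try_register` helper are hypothetical and are not taken from the PR.

```python
class SilentBase(object):
    """Hypothetical base whose default register_callback does nothing."""

    def register_callback(self, callback):
        pass


class StrictBase(object):
    """Hypothetical base whose default register_callback refuses the call."""

    def register_callback(self, callback):
        raise NotImplementedError()


class Recognizer(StrictBase):
    """Hypothetical element that actually supports callbacks."""

    def __init__(self):
        self.callbacks = []

    def register_callback(self, callback):
        self.callbacks.append(callback)


def try_register(element, callback):
    """Register the callback if the element supports it; report whether it did."""
    try:
        element.register_callback(callback)
    except NotImplementedError:
        return False
    return True


results = {
    "silent": try_register(SilentBase(), print),      # accepted, but never called
    "strict": try_register(StrictBase(), print),      # rejected explicitly
    "recognizer": try_register(Recognizer(), print),  # accepted and stored
}
```

A caller that treats callback support as optional can use the explicit failure to tell supporting elements apart from the rest, which the silent default cannot offer.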