Creation of ASR api #1

Merged
merged 84 commits into from
Apr 24, 2020
42864af
feat: Convert test_live into Asr class
ar13pit Mar 7, 2020
546be1f
fix: Fix variable name queue
ar13pit Mar 9, 2020
6110bb1
docs: Add docstring to Asr class
ar13pit Mar 9, 2020
a0c3058
feat: Use a cleaner interface for makedirs
ar13pit Mar 9, 2020
63fabff
feat: Add templating for wav_out_fmt
ar13pit Mar 9, 2020
03a4462
test: Add test for ASR api
ar13pit Mar 9, 2020
e4a027d
feat: Add python2 support for os.makedirs(..., exist_ok=True)
ar13pit Mar 9, 2020
a35e5c4
Use an Event rather than a plain bool to communicate between differen…
LoyVanBeek Apr 7, 2020
a51f6b2
Report decoded strings back via a callback
LoyVanBeek Apr 7, 2020
223d8b8
refactor: Remove IDE files
ar13pit Apr 7, 2020
c4580dc
feat: Ignore pycharm files
ar13pit Apr 7, 2020
9a0532b
Move signal handling to test script instead of reusable class
LoyVanBeek Apr 13, 2020
a3aaf42
style: Remove space in arg
ar13pit Apr 13, 2020
40d608a
feat: Use Event from multiprocessing instead of threading
ar13pit Apr 13, 2020
e3a0f02
Start a class-based input API for ASR to untie the Asr class from a p…
LoyVanBeek Apr 13, 2020
6ce592a
Create WaveFileStreamer class to read audio from a file
LoyVanBeek Apr 13, 2020
6905011
Optionally switch between live and file audio
LoyVanBeek Apr 13, 2020
5de31d4
Also stop stream when stopping ASR
LoyVanBeek Apr 13, 2020
ee3039b
Set up stream only in start method
LoyVanBeek Apr 13, 2020
97e65f5
Remove seemingly unnecessary multiprocessing code, synchronous code se…
LoyVanBeek Apr 13, 2020
2eee124
Compose saving of recorded audio into microphone interface
LoyVanBeek Apr 13, 2020
b0a6a6c
Cleanup code a bit
LoyVanBeek Apr 13, 2020
dbfead1
Move initialisation of recognizer to constructor so it's ready when w…
LoyVanBeek Apr 13, 2020
6718bce
Test Asr with correct usage of WaveFileStreamer
LoyVanBeek Apr 13, 2020
4ca988a
Extend callbacks to partial and full result strings
LoyVanBeek Apr 13, 2020
0d3c0f7
Optionally write audio to a file
LoyVanBeek Apr 13, 2020
d07a4e5
Move classes for audio input and output to different submodule
LoyVanBeek Apr 14, 2020
e055d05
Extend interface of AudioSourceBase to better support re-opening micr…
LoyVanBeek Apr 16, 2020
73649d0
fixup! Move classes for audio input and output to different submodule
LoyVanBeek Apr 16, 2020
9a44790
Trying to make the PyAudioMicrophoneSource async again
LoyVanBeek Apr 16, 2020
90c9a84
Add some debug sound level bars to see what the audio is doing roughly
LoyVanBeek Apr 18, 2020
6c0a1e6
Add more logging and exception handling
LoyVanBeek Apr 18, 2020
3788d7e
Use specifically named logger rather than generic one
LoyVanBeek Apr 18, 2020
2b5ba3f
Move audio stream opening/closing to the listen-thread
LoyVanBeek Apr 19, 2020
c531bc7
Only set stop Event when not already set
LoyVanBeek Apr 19, 2020
607fa76
fix: Remove unnecessary module init files
ar13pit Apr 21, 2020
19ea6f1
test: Futurize script
ar13pit Apr 21, 2020
2c6856d
test: Fix import orders
ar13pit Apr 21, 2020
3bf6e00
test: Fix linting errors
ar13pit Apr 21, 2020
f05cbc4
docs: Add docstrings to the functions
ar13pit Apr 21, 2020
cf50c12
style: Formatting fixes
ar13pit Apr 21, 2020
d4ffe30
feat: Move util function to independent module
ar13pit Apr 21, 2020
9a4b271
fix: Use Event class from threading module
ar13pit Apr 21, 2020
392f9ef
feat: Add support for GMM models
ar13pit Apr 21, 2020
15dce1b
fix: Fix linter issues with log strings
ar13pit Apr 21, 2020
b8a366b
docs: Add docstrings to interface
ar13pit Apr 21, 2020
0d898ff
feat: Merge callback register functions
ar13pit Apr 21, 2020
9bc3a9b
refactor: Remove unused import
ar13pit Apr 21, 2020
9e4c179
fix: Cleanup logging strings
ar13pit Apr 21, 2020
39e85a0
fix: Use timeout from method arg
ar13pit Apr 21, 2020
2ec921b
fix: Move attribute declaration to init
ar13pit Apr 21, 2020
00d7e54
fix: Read file only if object is None
ar13pit Apr 21, 2020
0c767fc
refactor: Remove unnecessary else
ar13pit Apr 21, 2020
834d8eb
feat: Reset internal variables upon clean file close
ar13pit Apr 21, 2020
478cc21
refactor: Remove unnecessary log
ar13pit Apr 21, 2020
20e0b53
test: Update callback function names and remove prints
ar13pit Apr 21, 2020
cd689c2
refactor: Combine import statements
ar13pit Apr 21, 2020
0a9cd3c
docs: Add module docstrings and fix typo in comment
ar13pit Apr 21, 2020
09e1f84
fix: Check framerate of opened audio file
ar13pit Apr 21, 2020
0dcda71
feat: Version bump
ar13pit Apr 21, 2020
1946461
fix: Add back the stop function in class WaveFileSource
ar13pit Apr 22, 2020
7420e8c
feat: Add logging to WaveFileSource
ar13pit Apr 22, 2020
b5b1808
test: Remove unnecessary print statements
ar13pit Apr 22, 2020
b1bc213
test: Remove repeated asr and streamer stop commands
ar13pit Apr 22, 2020
6ba4067
test: Remove unnecessary event handler
ar13pit Apr 22, 2020
a65b3cd
feat: Rename module to io for larger inclusion
ar13pit Apr 22, 2020
ba3405c
docs: Add module docstring
ar13pit Apr 22, 2020
7ed0a23
refactor: Update io module imports
ar13pit Apr 22, 2020
80f9e3c
fix: Fix broken sink api in python3
ar13pit Apr 22, 2020
4424a40
test: Add test script for io module
ar13pit Apr 22, 2020
dc73d35
test: Fix reading of audio file
ar13pit Apr 23, 2020
b5087a8
feat: Use python3 primitives and futurize module
ar13pit Apr 23, 2020
caf15b2
docs: Add class constructor docstrings
ar13pit Apr 23, 2020
93ef4d2
feat: Move logging config to test script
ar13pit Apr 23, 2020
ff20822
feat: Move logging to separate module
ar13pit Apr 23, 2020
aa81189
style: Fix pylint object inheritance warning
ar13pit Apr 23, 2020
b891489
feat: Add args to print decoding results to log
ar13pit Apr 23, 2020
b6aa81b
style: Fix linting warnings
ar13pit Apr 23, 2020
7f47b9a
test: Set log args in Asr constructor
ar13pit Apr 23, 2020
ff519e1
feat: Add volume indicator function
ar13pit Apr 23, 2020
c13a254
refactor: Merge chunk unpacking and numpy array creation
ar13pit Apr 23, 2020
f5585e0
feat: Replace log flags with 1 debug flag and add volume indicator
ar13pit Apr 23, 2020
7c5c1c6
test: Update debug args
ar13pit Apr 23, 2020
ee25a70
docs: Add missing docstring about arg stream
ar13pit Apr 23, 2020
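Several commits above (a35e5c4, 2b5ba3f, c531bc7, 9a4b271) settle on a threading design for the microphone source: a stop Event from the `threading` module shared with a listener thread, the audio stream opened and closed inside that thread, and a stop that only acts when the Event is not already set. The sketch below is a minimal, hypothetical stand-in for that pattern; the `ThreadedChunkSource` class and its `read_chunk` callable are illustrative and not yapykaldi API, and the device read is faked so the code runs without PyAudio.

```python
import threading
from queue import Empty, Queue


class ThreadedChunkSource(object):
    """Hypothetical source: a worker thread fills a queue until a stop Event is set."""

    def __init__(self, read_chunk):
        # Stand-in for a blocking device read such as stream.read(chunksize)
        self._read_chunk = read_chunk
        self._queue = Queue()
        self._stop = threading.Event()
        self._worker = None

    def start(self):
        self._stop.clear()
        self._worker = threading.Thread(target=self._listen, args=(self._stop,))
        self._worker.start()

    def _listen(self, stop_event):
        # A real device would be opened here and closed after the loop,
        # so the worker thread owns the stream for its whole lifetime
        while not stop_event.is_set():
            chunk = self._read_chunk()
            if chunk is None:  # end of the demo data
                break
            self._queue.put(chunk)

    def get_next_chunk(self, timeout=1):
        # An empty queue after `timeout` seconds is reported as the end of the stream
        try:
            return self._queue.get(block=True, timeout=timeout)
        except Empty:
            raise StopIteration()

    def stop(self):
        # Only set the Event (and join the worker) when it is not already set
        if not self._stop.is_set() and self._worker is not None:
            self._stop.set()
            self._worker.join()


data = iter([b'\x00\x01'] * 5)
source = ThreadedChunkSource(lambda: next(data, None))
source.start()

received = []
while True:
    try:
        received.append(source.get_next_chunk(timeout=0.5))
    except StopIteration:
        break
source.stop()
source.stop()  # second call is a no-op
```

Translating `queue.Empty` into `StopIteration` lets a consumer treat "no audio for `timeout` seconds" and "end of file" the same way.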
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
version.py
*.egg-info/
*.so

.idea
2 changes: 1 addition & 1 deletion setup.py
@@ -9,7 +9,7 @@
import pkgconfig


-VERSION = "0.1.0"
+VERSION = "0.2.0"
PACKAGE = "yapykaldi"
PACKAGE_DIR = os.path.join('src', 'python')

120 changes: 120 additions & 0 deletions src/python/yapykaldi/asr.py
@@ -0,0 +1,120 @@
"""
Yapykaldi ASR: Class definition for ASR component. It connects to a source and an optional sink
"""
from __future__ import (print_function, division, absolute_import, unicode_literals)
from builtins import *
import struct
from threading import Event
import numpy as np
from .logger import logger
from .nnet3 import KaldiNNet3OnlineDecoder, KaldiNNet3OnlineModel
from .gmm import KaldiGmmOnlineDecoder, KaldiGmmOnlineModel
from .io import AudioSourceBase
from .utils import volume_indicator


ONLINE_MODELS = {'nnet3': KaldiNNet3OnlineModel, 'gmm': KaldiGmmOnlineModel}
ONLINE_DECODERS = {'nnet3': KaldiNNet3OnlineDecoder, 'gmm': KaldiGmmOnlineDecoder}


class Asr(object):
    """API for ASR"""
    # pylint: disable=too-many-instance-attributes, useless-object-inheritance

    def __init__(self, model_dir, model_type, stream, timeout=2, debug=False):
        """
        :param model_dir: Path to model directory
        :param model_type: Type of ASR model, 'nnet3' or 'gmm'
        :param stream: Audio source object
        :param timeout: (default 2) Time to wait for a new data buffer before stopping recognition due to unavailability
            of data
        :param debug: (default False) Flag to set logger to log audio chunk volume and partially decoded string and
            likelihood
        """
        self.model_dir = model_dir
        self.model_type = model_type

        self.stream = stream  # type: AudioSourceBase

        logger.info("Trying to initialize %s model from %s", self.model_type, self.model_dir)
        self.model = ONLINE_MODELS[self.model_type](self.model_dir)
        logger.info("Successfully initialized %s model from %s", self.model_type, self.model_dir)

        self.timeout = timeout

        self._finalize = Event()

        self._string_partially_recognized_callbacks = []
        self._string_fully_recognized_callbacks = []

        self._debug = debug

    def recognize(self):
        """Run recognition on the audio stream until it ends, an error occurs or stop() is called"""

        if self._finalize.is_set():
            raise Exception("Asr object not initialized for recognition")

        logger.info("Trying to initialize %s model decoder", self.model_type)
        decoder = ONLINE_DECODERS[self.model_type](self.model)
        logger.info("Successfully initialized %s model decoder", self.model_type)

        decoded_string = ""
        likelihood = None  # stays None if no chunk was decoded
        while not self._finalize.is_set():
            try:
                chunk = self.stream.get_next_chunk(self.timeout)
                data = np.array(struct.unpack_from('<%dh' % self.stream.chunksize, chunk), dtype=np.float32)
            except StopIteration as e:  # pylint: disable=invalid-name
                logger.info("Stream reached its end")
                logger.error(e)
                self.stop()
            except Exception as e:  # pylint: disable=invalid-name, broad-except
                logger.error("Other exception happened: %s", e)
                break
            else:
                if decoder.decode(self.stream.rate, data, self._finalize.is_set()):
                    decoded_string, likelihood = decoder.get_decoded_string()

                    if self._debug:
                        chunk_volume_level = volume_indicator(data)
                        logger.info("Chunk volume level: %s", chunk_volume_level)
                        logger.info("Partially decoded (%s): %s", likelihood, decoded_string)

                    for callback in self._string_partially_recognized_callbacks:
                        callback(decoded_string)
                else:
                    raise RuntimeError("Decoding failed")

        logger.info("Decoding of input stream is complete")
        logger.info("Final result (%s): %s", likelihood, decoded_string)

        for callback in self._string_fully_recognized_callbacks:
            callback(decoded_string)

    def stop(self):
        """Stop ASR process"""
        logger.info("Stop ASR")
        self._finalize.set()
        self.stream.stop()

    def start(self):
        """Begin ASR process"""
        logger.info("Starting speech recognition")
        # Reset internal states at the start of a new call

        self._finalize.clear()

        self.stream.start()

    def register_callback(self, callback, partial=False):
        """
        Register a callback to receive the decoded string, either partial or complete.

        :param callback: a function taking a single string as its parameter
        :param partial: (default False) flag to set callback for partial recognitions
        :return: None
        """
        if partial:
            self._string_partially_recognized_callbacks += [callback]
        else:
            self._string_fully_recognized_callbacks += [callback]
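`Asr.recognize()` turns each raw chunk into a float32 NumPy array with `struct.unpack_from('<%dh' % chunksize, chunk)`, i.e. it reads exactly `chunksize` little-endian signed 16-bit samples. A small sketch of that conversion follows; the `np.frombuffer` form is shown only for comparison and is not what this PR uses, and `CHUNKSIZE = 4` is an illustrative value (the sources in this PR default to 1024).

```python
import struct

import numpy as np

CHUNKSIZE = 4  # illustrative; the sources in this PR default to 1024 samples per chunk

# Four little-endian signed 16-bit samples, as a 16-bit mono source delivers them
chunk = struct.pack('<4h', 0, 1000, -1000, 32767)

# Conversion as written in Asr.recognize()
via_struct = np.array(struct.unpack_from('<%dh' % CHUNKSIZE, chunk), dtype=np.float32)

# Equivalent vectorised form, for comparison only
via_numpy = np.frombuffer(chunk, dtype='<i2').astype(np.float32)

# A chunk shorter than 2 * CHUNKSIZE bytes cannot be unpacked at the full size
try:
    struct.unpack_from('<%dh' % CHUNKSIZE, chunk[:6])
    short_chunk_ok = True
except struct.error:
    short_chunk_ok = False
```

Because the format string always asks for exactly `chunksize` samples, a short chunk raises `struct.error`; in `recognize()` that lands in the broad `except Exception` branch, which logs it and leaves the loop.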
14 changes: 14 additions & 0 deletions src/python/yapykaldi/io/__init__.py
@@ -0,0 +1,14 @@
"""
Yapykaldi I/O: Classes and functions for I/O operations with all the wrappers
"""

__all__ = [
# From .sources
"AudioSourceBase", "PyAudioMicrophoneSource", "WaveFileSource",

# From .sinks
"WaveFileSink"
]

from .sources import AudioSourceBase, PyAudioMicrophoneSource, WaveFileSource
from .sinks import WaveFileSink
46 changes: 46 additions & 0 deletions src/python/yapykaldi/io/sinks.py
@@ -0,0 +1,46 @@
"""Audio sinks supported by Yapykaldi"""
import wave
import pyaudio


class WaveFileSink(object):
    """WaveFileSink class"""

    def __init__(self, wavpath, fmt=pyaudio.paInt16, channels=1, rate=16000, chunk=1024):
        """

        :param wavpath: location where to save audio to
        :param fmt: (default pyaudio.paInt16) Data type of the audio stream
        :param channels: (default 1) Number of channels of the audio stream
        :param rate: (default 16000) Sampling frequency of the audio stream
        :param chunk: (default 1024) Size of the audio stream buffer
        """
        self._pyaudio = pyaudio.PyAudio()
        self.wavpath = wavpath
        self.format = fmt
        self.channels = channels
        self.rate = rate
        self.chunk = chunk

        self.frames = []

    def add_chunk(self, frames):
        """Add frame chunk to the WaveFileSink object

        :param frames: audio frames to be added to the sink object
        """
        # Only append method works for both python 2 and 3
        # List concatenation does not work as it converts byte strings to int
        self.frames.append(frames)

    def write_frames(self, frames=None):
        """Write audio frames into a file

        :param frames: (default None) Frames to write to a file. This bypasses the frames stored in the sink object.
        """
        wav_out = wave.open(self.wavpath, 'wb')
        wav_out.setnchannels(self.channels)
        wav_out.setsampwidth(self._pyaudio.get_sample_size(self.format))
        wav_out.setframerate(self.rate)
        wav_out.writeframes(b''.join(frames if frames else self.frames))
        wav_out.close()
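`WaveFileSink` buffers chunks in a list and only builds the WAV file in `write_frames()`. The sketch below reproduces that sequence with the standard `wave` module, writing into an `io.BytesIO` instead of a path so it runs without PyAudio or a file on disk; `SAMPLE_WIDTH = 2` is an assumption standing in for the value the sink obtains from PyAudio for `paInt16`.

```python
import io
import struct
import wave

RATE = 16000
SAMPLE_WIDTH = 2  # bytes per sample; assumed equal to PyAudio's size for paInt16

# Chunks collected one by one, as WaveFileSink.add_chunk() stores them (list.append)
frames = []
for start in range(0, 3000, 1000):
    samples = list(range(start, start + 1000))
    frames.append(struct.pack('<%dh' % len(samples), *samples))

# Same header fields and single join as WaveFileSink.write_frames(), but into memory
buffer = io.BytesIO()
wav_out = wave.open(buffer, 'wb')
wav_out.setnchannels(1)
wav_out.setsampwidth(SAMPLE_WIDTH)
wav_out.setframerate(RATE)
wav_out.writeframes(b''.join(frames))
wav_out.close()

# Read the result back to check what was written
buffer.seek(0)
wav_in = wave.open(buffer, 'rb')
n_frames = wav_in.getnframes()
width = wav_in.getsampwidth()
rate = wav_in.getframerate()
wav_in.close()
```

Appending whole `bytes` objects and joining them once at the end is what the comment in `add_chunk()` relies on: on Python 3, `some_list += some_bytes` would add one integer per byte instead of one chunk.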
187 changes: 187 additions & 0 deletions src/python/yapykaldi/io/sources.py
@@ -0,0 +1,187 @@
"""Audio sources supported by Yapykaldi"""
from __future__ import print_function, division, absolute_import, unicode_literals
from builtins import *
import math
import wave
from threading import Event, Thread
from queue import Empty, Queue
import pyaudio

from .sinks import WaveFileSink
from ..logger import logger

try:
    from typing import Optional
except ImportError:
    pass


class AudioSourceBase(object):
    """Base class for audio sources.

    A source needs some setup before audio bytes can be read from it and
    some teardown afterwards.

    The right order is:
    1. source = AudioSourceBase()
    2. source.open()  # to open the file, connect the mic etc.
    3. source.start()  # actually start getting audio data
    4. source.get_next_chunk()  # use the audio data
    5. source.stop()  # stop getting audio data
    6. source.close()  # close the file

    Some sources only support being opened once, but all of them should
    support going through start(), get_next_chunk() and stop() several times.
    """
    # pylint: disable=useless-object-inheritance

    def __init__(self, rate=16000, chunksize=1024):
        self.rate = rate
        self.chunksize = chunksize

    def open(self):
        raise NotImplementedError()

    def start(self):
        raise NotImplementedError()

    def stop(self):
        raise NotImplementedError()

    def close(self):
        raise NotImplementedError()

    def get_next_chunk(self, timeout):
        raise NotImplementedError()


class PyAudioMicrophoneSource(AudioSourceBase):
    def __init__(self, fmt=pyaudio.paInt16, channels=1, rate=16000, chunksize=1024, saver=None):
        """
        :param fmt: (default pyaudio.paInt16) format of the audio data
        :param channels: (default 1) number of channels in audio data
        :param rate: (default 16000) sampling frequency of audio data
        :param chunksize: (default 1024) size of audio data buffer
        :param saver: (default None) audio sink object
        """
        super().__init__(rate=rate, chunksize=chunksize)

        self._pyaudio = pyaudio.PyAudio()
        self.format = fmt
        self.channels = channels

        self.stream = None  # type: Optional[pyaudio.Stream]

        self.saver = saver  # type: Optional[WaveFileSink]

        self._queue = Queue()
        self._worker = None  # type: Optional[Thread]

        self._stop = Event()

    def open(self):
        # Nothing to open: kept so that all sources share the same API
        pass

    def start(self):
        # Start a worker thread that puts audio chunks in a queue
        self._stop.clear()
        self._worker = Thread(target=self._listen, args=(self._stop,))
        logger.info("Starting audio stream in a separate thread")
        self._worker.start()

    def _listen(self, stop_event):
        stream = self._pyaudio.open(format=self.format,
                                    channels=self.channels,
                                    rate=self.rate,
                                    input=True,
                                    frames_per_buffer=self.chunksize)

        while not stop_event.wait(0):
            chunk = stream.read(self.chunksize)
            # logger.debug("{}\t+1 chunks in the queue".format(self._queue.qsize()))
            self._queue.put(chunk)

        stream.stop_stream()
        stream.close()
        logger.info("Stopped streaming audio")

    def get_next_chunk(self, timeout=1):
        try:
            # logger.debug("{}\t-1 chunks in the queue".format(self._queue.qsize()))
            chunk = self._queue.get(block=True, timeout=timeout)
            if self.saver:
                self.saver.add_chunk(chunk)
            return chunk
        except Empty:
            raise StopIteration()

    def stop(self):
        # Guard on _worker so that stop() before start() does not fail
        if not self._stop.is_set() and self._worker is not None:
            self._stop.set()

            logger.info("Waiting for audio stream to stop")
            self._worker.join()
            logger.info("Exited audio stream thread")
        else:
            logger.info("No running audio stream to stop")

    def close(self):
        self._pyaudio.terminate()

        if self.saver:
            self.saver.write_frames()


class WaveFileSource(AudioSourceBase):
    def __init__(self, filename, rate=16000, chunksize=1024):
        """
        :param filename: path to the wave file
        :type filename: str
        :param rate: (default 16000) sampling frequency of audio data
        :param chunksize: (default 1024) size of audio data buffer
        """
        super().__init__(rate=rate, chunksize=chunksize)
        self.filename = filename
        self.wavf = None
        self.total_num_frames = None
        self.total_chunks = None
        self.read_chunks = None

    def open(self):
        if not self.wavf:
            self.wavf = wave.open(self.filename, 'rb')
            assert self.wavf.getnchannels() == 1
            assert self.wavf.getsampwidth() == 2
            assert self.wavf.getnframes() > 0
            assert self.wavf.getframerate() == self.rate
            logger.info("Stream opened from %s", self.filename)
        else:
            logger.error("Stream already open from %s. Call the close() method first", self.filename)

    def start(self):
        # Rewind so that start/get_next_chunk/stop can be repeated on an open file
        self.wavf.rewind()
        self.total_num_frames = self.wavf.getnframes()
        # Only whole chunks are served; a trailing partial chunk is dropped
        self.total_chunks = math.floor(self.total_num_frames / self.chunksize)
        self.read_chunks = 0

    def get_next_chunk(self, timeout):
        # timeout is unused: reading from a file does not block
        if self.read_chunks < self.total_chunks:
            frames = self.wavf.readframes(self.chunksize)
            self.read_chunks += 1
            return frames

        raise StopIteration()

    def stop(self):
        # Nothing to stop: kept so that all sources share the same API
        pass

    def close(self):
        self.wavf.close()
        logger.info("Stream closed from %s", self.filename)

        self.wavf = None
        self.total_num_frames = None
        self.total_chunks = None
        self.read_chunks = None
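The `AudioSourceBase` docstring prescribes open, start, get_next_chunk, stop, close, with the start/get/stop part repeatable. A minimal sketch of a hypothetical in-memory source (`InMemoryWaveSource` and `make_wav_bytes` are illustrative, not part of yapykaldi) that follows that order and the same whole-chunk arithmetic as `WaveFileSource`; it calls `wave.Wave_read.rewind()` in `start()` so that a second start/get/stop cycle reads the audio again from the beginning.

```python
import io
import math
import wave


class InMemoryWaveSource(object):
    """Hypothetical source following the AudioSourceBase lifecycle on a WAV held in memory."""

    def __init__(self, wav_bytes, rate=16000, chunksize=1024):
        self.wav_bytes = wav_bytes
        self.rate = rate
        self.chunksize = chunksize
        self.wavf = None
        self.total_chunks = 0
        self.read_chunks = 0

    def open(self):
        self.wavf = wave.open(io.BytesIO(self.wav_bytes), 'rb')
        assert self.wavf.getframerate() == self.rate

    def start(self):
        self.wavf.rewind()  # allow repeated start/get/stop cycles
        # Only whole chunks are served; a trailing partial chunk is dropped
        self.total_chunks = math.floor(self.wavf.getnframes() / self.chunksize)
        self.read_chunks = 0

    def get_next_chunk(self, timeout=None):
        if self.read_chunks < self.total_chunks:
            self.read_chunks += 1
            return self.wavf.readframes(self.chunksize)
        raise StopIteration()

    def stop(self):
        pass

    def close(self):
        self.wavf.close()
        self.wavf = None


def make_wav_bytes(n_frames, rate=16000):
    """Build a silent mono 16-bit WAV file in memory."""
    buffer = io.BytesIO()
    writer = wave.open(buffer, 'wb')
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(rate)
    writer.writeframes(b'\x00\x00' * n_frames)
    writer.close()
    return buffer.getvalue()


def consume(source):
    """Pull chunks until StopIteration, the way Asr.recognize() ends on a finished stream."""
    chunks = []
    source.start()
    while True:
        try:
            chunks.append(source.get_next_chunk(timeout=2))
        except StopIteration:
            break
    source.stop()
    return chunks


source = InMemoryWaveSource(make_wav_bytes(2500))
source.open()
first_pass = consume(source)
second_pass = consume(source)
source.close()
```

With 2500 frames and a chunk size of 1024, `math.floor(2500 / 1024)` is 2, so each pass yields two 2048-byte chunks and the last 452 frames are never served.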
4 changes: 4 additions & 0 deletions src/python/yapykaldi/logger.py
@@ -0,0 +1,4 @@
import logging

LOGGER_NAME = "yapykaldi"
logger = logging.getLogger(LOGGER_NAME)
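The package logs through one named logger, `yapykaldi`, and (per commit 93ef4d2, "Move logging config to test script") leaves handlers and levels to the calling script. A sketch of what such a script might do, using only the standard `logging` module; the format string and the in-memory stream are illustrative choices, not taken from the PR.

```python
import io
import logging

# Same named logger the package's logger module creates
logger = logging.getLogger("yapykaldi")

# Application side: attach a handler and choose a level
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Starting speech recognition")
logger.debug("not shown at INFO level")
handler.flush()
output = stream.getvalue()
```

Because `logging.getLogger` returns the same object for the same name, configuring `"yapykaldi"` in the script affects every module that imports `logger` from the package.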