4.3.0 #187
Merged 16 commits on Dec 18, 2023
CHANGELOG.md: 44 changes (22 additions & 22 deletions)
@@ -1,60 +1,60 @@
# Changelog

-## [4.2.0](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.0) (2023-10-27)
+## [4.2.1a7](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a7) (2023-12-13)

-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a6...4.2.0)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a6...4.2.1a7)

-**Fixed bugs:**
+**Merged pull requests:**

-- \[BUG\] Docker `start_listening` resource missing [\#170](https://github.com/NeonGeckoCom/neon_speech/issues/170)
+- Update neon-utils dependency to stable release [\#186](https://github.com/NeonGeckoCom/neon_speech/pull/186) ([NeonDaniel](https://github.com/NeonDaniel))

-## [4.1.1a6](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a6) (2023-10-26)
+## [4.2.1a6](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a6) (2023-11-29)

-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a5...4.1.1a6)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a5...4.2.1a6)

**Merged pull requests:**

-- OVOS Dinkum Listener Backwards Compat [\#178](https://github.com/NeonGeckoCom/neon_speech/pull/178) ([NeonDaniel](https://github.com/NeonDaniel))
+- Override ovos.language.stt handler for server/API usage [\#185](https://github.com/NeonGeckoCom/neon_speech/pull/185) ([NeonDaniel](https://github.com/NeonDaniel))

-## [4.1.1a5](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a5) (2023-10-26)
+## [4.2.1a5](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a5) (2023-11-22)

-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a4...4.1.1a5)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a4...4.2.1a5)

**Merged pull requests:**

-- Stable dependencies for release [\#177](https://github.com/NeonGeckoCom/neon_speech/pull/177) ([NeonDaniel](https://github.com/NeonDaniel))
+- Update global config on local user STT language change [\#184](https://github.com/NeonGeckoCom/neon_speech/pull/184) ([NeonDaniel](https://github.com/NeonDaniel))

-## [4.1.1a4](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a4) (2023-10-13)
+## [4.2.1a4](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a4) (2023-11-22)

-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a3...4.1.1a4)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a3...4.2.1a4)

**Merged pull requests:**

-- Update Dinkum Listener dependency [\#176](https://github.com/NeonGeckoCom/neon_speech/pull/176) ([NeonDaniel](https://github.com/NeonDaniel))
+- Add timing metrics [\#183](https://github.com/NeonGeckoCom/neon_speech/pull/183) ([NeonDaniel](https://github.com/NeonDaniel))

-## [4.1.1a3](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a3) (2023-10-03)
+## [4.2.1a3](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a3) (2023-11-14)

-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a2...4.1.1a3)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a2...4.2.1a3)

**Merged pull requests:**

-- Add timing metrics for minerva testing [\#175](https://github.com/NeonGeckoCom/neon_speech/pull/175) ([NeonDaniel](https://github.com/NeonDaniel))
+- Improved timing context handling with unit tests [\#182](https://github.com/NeonGeckoCom/neon_speech/pull/182) ([NeonDaniel](https://github.com/NeonDaniel))

-## [4.1.1a2](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a2) (2023-07-28)
+## [4.2.1a2](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a2) (2023-11-10)

-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a1...4.1.1a2)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a1...4.2.1a2)

**Merged pull requests:**

-- Kubernetes/No-audio server compat. [\#174](https://github.com/NeonGeckoCom/neon_speech/pull/174) ([NeonDaniel](https://github.com/NeonDaniel))
+- Add timing metrics for audio input to handler in speech service [\#181](https://github.com/NeonGeckoCom/neon_speech/pull/181) ([NeonDaniel](https://github.com/NeonDaniel))

-## [4.1.1a1](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a1) (2023-07-27)
+## [4.2.1a1](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a1) (2023-11-09)

-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.0...4.1.1a1)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.0...4.2.1a1)

**Merged pull requests:**

-- Update container config handling and resolve logged warnings [\#173](https://github.com/NeonGeckoCom/neon_speech/pull/173) ([NeonDaniel](https://github.com/NeonDaniel))
+- Resample API input wav audio to ensure format matches listener config [\#180](https://github.com/NeonGeckoCom/neon_speech/pull/180) ([NeonDaniel](https://github.com/NeonDaniel))



neon_speech/__init__.py: 3 changes (3 additions & 0 deletions)
@@ -25,3 +25,6 @@
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

# Import to ensure patched class is applied
from neon_speech.transformers import NeonAudioTransformerService
neon_speech/service.py: 163 changes (127 additions & 36 deletions)
@@ -27,6 +27,8 @@
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import os
from typing import Dict

import ovos_dinkum_listener.plugins

from tempfile import mkstemp
@@ -80,8 +82,6 @@ def on_started():


class NeonSpeechClient(OVOSDinkumVoiceService):
_stopwatch = Stopwatch("get_stt")

def __init__(self, ready_hook=on_ready, error_hook=on_error,
stopping_hook=on_stopping, alive_hook=on_alive,
started_hook=on_started, watchdog=lambda: None,
@@ -112,6 +112,8 @@ def __init__(self, ready_hook=on_ready, error_hook=on_error,
watchdog=watchdog)
self.daemon = daemonic
self.config.bus = self.bus
self._stt_stopwatch = Stopwatch("get_stt", allow_reporting=True,
bus=self.bus)
from neon_utils.signal_utils import init_signal_handlers, \
init_signal_bus
init_signal_bus(self.bus)
@@ -133,6 +135,37 @@ def __init__(self, ready_hook=on_ready, error_hook=on_error,
LOG.info("Skipping api_stt init")
self.api_stt = None

def _record_begin(self):
self._stt_stopwatch.start()
OVOSDinkumVoiceService._record_begin(self)

def _stt_text(self, text: str, stt_context: dict):
self._stt_stopwatch.stop()
stt_context.setdefault("timing", dict())
stt_context["timing"]["get_stt"] = self._stt_stopwatch.time

# This is where the first Message of the interaction is created
OVOSDinkumVoiceService._stt_text(self, text, stt_context)
self._stt_stopwatch.report()

def _save_stt(self, audio_bytes, stt_meta, save_path=None):
stopwatch = Stopwatch("save_audio", True, self.bus)
with stopwatch:
path = OVOSDinkumVoiceService._save_stt(self, audio_bytes, stt_meta,
save_path)
stt_meta.setdefault('timing', dict())
stt_meta['timing']['save_audio'] = stopwatch.time
return path

def _save_ww(self, audio_bytes, ww_meta, save_path=None):
stopwatch = Stopwatch("save_ww", True, self.bus)
with stopwatch:
path = OVOSDinkumVoiceService._save_ww(self, audio_bytes, ww_meta,
save_path)
ww_meta.setdefault('timing', dict())
ww_meta['timing']['save_ww'] = stopwatch.time
return path
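The `_save_stt`/`_save_ww` overrides above share one pattern: run the parent call inside a stopwatch used as a context manager, then stash the elapsed time under a `timing` key in the metadata dict. A minimal, self-contained sketch of that pattern follows; the `Stopwatch` here is a stand-in (the real `ovos_utils` class also takes bus/reporting arguments not shown), and `timed_save` is our illustrative helper, not part of the PR:

```python
import time


class Stopwatch:
    """Minimal stand-in for ovos_utils' Stopwatch (context-manager use only)."""

    def __init__(self, name: str):
        self.name = name
        self.time = None   # elapsed seconds, set when the `with` block exits
        self._start = None

    def __enter__(self):
        self._start = time.monotonic()
        return self

    def __exit__(self, *exc):
        self.time = time.monotonic() - self._start
        return False  # never swallow exceptions raised in the timed block


def timed_save(save_fn, meta: dict, key: str):
    """Run save_fn() under a stopwatch and record meta['timing'][key]."""
    stopwatch = Stopwatch(key)
    with stopwatch:
        path = save_fn()
    meta.setdefault("timing", dict())
    meta["timing"][key] = stopwatch.time
    return path
```

Because `__exit__` returns `False`, an exception in the wrapped save still propagates, which matches how a failed save should surface in the service.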

def _validate_message_context(self, message: Message, native_sources=None):
if message.context.get('destination') and \
"audio" not in message.context['destination']:
@@ -188,6 +221,16 @@ def register_event_handlers(self):
self.bus.on("neon.enable_wake_word", self.handle_enable_wake_word)
self.bus.on("neon.disable_wake_word", self.handle_disable_wake_word)

def _handle_get_languages_stt(self, message):
if self.config.get('listener', {}).get('enable_voice_loop', True):
return OVOSDinkumVoiceService._handle_get_languages_stt(self,
message)
# For server use, get the API STT langs
stt_langs = self.api_stt.available_languages or \
[self.config.get('lang') or 'en-us']
LOG.debug(f"Got stt_langs: {stt_langs}")
self.bus.emit(message.response({'langs': list(stt_langs)}))

def handle_disable_wake_word(self, message: Message):
"""
Disable a wake word. If the requested wake word is the only one enabled,
@@ -295,10 +338,18 @@ def handle_profile_update(self, message):
:param message: Message associated with profile update
"""
updated_profile = message.data.get("profile")
if updated_profile["user"]["username"] == \
if updated_profile["user"]["username"] != \
self._default_user["user"]["username"]:
apply_local_user_profile_updates(updated_profile,
self._default_user)
LOG.info(f"Ignoring profile update for "
f"{updated_profile['user']['username']}")
return
apply_local_user_profile_updates(updated_profile,
self._default_user)
if updated_profile.get("speech", {}).get("stt_language"):
new_stt_lang = updated_profile["speech"]["stt_language"]
if new_stt_lang != self.config['lang']:
from neon_speech.utils import patch_config
patch_config({"lang": new_stt_lang})
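The rewritten `handle_profile_update` inverts the original check: updates for any user other than the default are now ignored, and a changed STT language additionally patches the global config. That decision logic can be sketched standalone; `decide_stt_lang` is a hypothetical helper we introduce for illustration, not a function in the PR:

```python
def decide_stt_lang(profile: dict, default_user: dict, current_lang: str):
    """Return a new STT language to apply, or None if the update is a no-op.

    Mirrors the control flow of handle_profile_update: updates for
    non-default users are skipped, and a language change only counts
    when it differs from the currently configured language.
    """
    if profile["user"]["username"] != default_user["user"]["username"]:
        return None  # update is for another user; ignore it
    new_lang = profile.get("speech", {}).get("stt_language")
    if new_lang and new_lang != current_lang:
        return new_lang
    return None
```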

def handle_wake_words_state(self, message):
"""
@@ -327,31 +378,46 @@ def handle_get_stt(self, message: Message):
Emits a response to the sender with stt data or error data
:param message: Message associated with request
"""
received_time = time()
if message.data.get("audio_data"):
wav_file_path = self._write_encoded_file(
message.data.pop("audio_data"))
else:
wav_file_path = message.data.get("audio_file")
lang = message.data.get("lang")
ident = message.context.get("ident") or "neon.get_stt.response"

message.context.setdefault("timing", dict())
LOG.info(f"Handling STT request: {ident}")
if not wav_file_path:
message.context['timing']['response_sent'] = time()
self.bus.emit(message.reply(
ident, data={"error": f"audio_file not specified!"}))
return

if not os.path.isfile(wav_file_path):
message.context['timing']['response_sent'] = time()
self.bus.emit(message.reply(
ident, data={"error": f"{wav_file_path} Not found!"}))
return

try:

_, parser_data, transcriptions = \
self._get_stt_from_file(wav_file_path, lang)
timing = parser_data.pop('timing')
message.context["timing"] = {**message.context["timing"], **timing}
sent_time = message.context["timing"].get("client_sent",
received_time)
if received_time != sent_time:
message.context['timing']['client_to_core'] = \
received_time - sent_time
message.context['timing']['response_sent'] = time()
self.bus.emit(message.reply(ident,
data={"parser_data": parser_data,
"transcripts": transcriptions}))
except Exception as e:
LOG.error(e)
message.context['timing']['response_sent'] = time()
self.bus.emit(message.reply(ident, data={"error": repr(e)}))
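Both `handle_get_stt` and `handle_audio_input` now derive a `client_to_core` latency from an optional client-supplied `client_sent` timestamp, falling back to the receive time (so the key is omitted when the client did not stamp its send time). Extracted as a sketch, with a helper name of our own choosing:

```python
from time import time


def stamp_client_to_core(context: dict) -> dict:
    """Record one-way transport latency if the client stamped 'client_sent'."""
    received_time = time()
    timing = context.setdefault("timing", {})
    # Default to received_time so an unstamped message adds no timing entry
    sent_time = timing.get("client_sent", received_time)
    if received_time != sent_time:
        timing["client_to_core"] = received_time - sent_time
    return context
```

Note the latency is only as accurate as the clock skew between client and core allows, which is presumably acceptable for the coarse metrics this PR collects.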

def handle_audio_input(self, message):
@@ -370,11 +436,18 @@ def build_context(msg: Message):
'username': self._default_user["user"]["username"] or
"local",
'user_profiles': [self._default_user.content]}
ctx = {**defaults, **ctx, 'destination': ['skills'],
'timing': {'start': msg.data.get('time'),
'transcribed': time()}}
ctx = {**defaults, **ctx, 'destination': ['skills']}
ctx['timing'] = {**ctx.get('timing', {}),
**{'start': msg.data.get('time'),
'transcribed': time()}}
return ctx

received_time = time()
sent_time = message.context.get("timing", {}).get("client_sent",
received_time)
if received_time != sent_time:
message.context['timing']['client_to_core'] = \
received_time - sent_time
ident = message.context.get("ident") or "neon.audio_input.response"
LOG.info(f"Handling audio input: {ident}")
if message.data.get("audio_data"):
@@ -384,18 +457,23 @@ def build_context(msg: Message):
wav_file_path = message.data.get("audio_file")
lang = message.data.get("lang")
try:
with self._stopwatch:
_, parser_data, transcriptions = \
self._get_stt_from_file(wav_file_path, lang)
# _=transformed audio_data
_, parser_data, transcriptions = \
self._get_stt_from_file(wav_file_path, lang)
timing = parser_data.pop('timing')
message.context["audio_parser_data"] = parser_data
message.context.setdefault('timing', dict())
message.context['timing'] = {**timing, **message.context['timing']}
context = build_context(message)
context['timing']['get_stt'] = self._stopwatch.time
data = {
"utterances": transcriptions,
"lang": message.data.get("lang", "en-us")
}
# Send a new message to the skills module with proper routing ctx
handled = self._emit_utterance_to_skills(Message(
'recognizer_loop:utterance', data, context))

# Reply to original message with transcription/audio parser data
self.bus.emit(message.reply(ident,
data={"parser_data": parser_data,
"transcripts": transcriptions,
@@ -423,7 +501,7 @@ def handle_offline(self, _):
Handle notification to operate in offline mode
"""
LOG.info("Offline mode selected, Reloading STT Plugin")
config = dict(self.config)
config: Dict[str, dict] = dict(self.config)
if config['stt'].get('offline_module'):
config['stt']['module'] = config['stt'].get('offline_module')
self.voice_loop.stt = STTFactory.create(config)
@@ -456,35 +534,48 @@ def _get_stt_from_file(self, wav_file: str,
:return: (AudioData of object, extracted context, transcriptions)
"""
from neon_utils.file_utils import get_audio_file_stream
lang = lang or 'en-us' # TODO: read default from config
segment = AudioSegment.from_file(wav_file)
_stopwatch = Stopwatch()
lang = lang or self.config.get('lang')
desired_sample_rate = self.config['listener'].get('sample_rate', 16000)
desired_sample_width = self.config['listener'].get('sample_width', 2)
segment = (AudioSegment.from_file(wav_file).set_channels(1)
.set_frame_rate(desired_sample_rate)
.set_sample_width(desired_sample_width))
LOG.debug(f"Audio fr={segment.frame_rate},sw={segment.sample_width},"
f"fw={segment.frame_width},ch={segment.channels}")
audio_data = AudioData(segment.raw_data, segment.frame_rate,
segment.sample_width)
audio_stream = get_audio_file_stream(wav_file)
if not self.api_stt:
raise RuntimeError("api_stt not initialized."
" is `listener['enable_stt_api'] set to False?")
if hasattr(self.api_stt, 'stream_start'):
if self.lock.acquire(True, 30):
LOG.info(f"Starting STT processing (lang={lang}): {wav_file}")
self.api_stt.stream_start(lang)
while True:
try:
data = audio_stream.read(1024)
self.api_stt.stream_data(data)
except EOFError:
break
transcriptions = self.api_stt.stream_stop()
self.lock.release()
with _stopwatch:
if hasattr(self.api_stt, 'stream_start'):
audio_stream = get_audio_file_stream(wav_file, desired_sample_rate)
if self.lock.acquire(True, 30):
LOG.info(f"Starting STT processing (lang={lang}): {wav_file}")
self.api_stt.stream_start(lang)
while True:
try:
data = audio_stream.read(1024)
self.api_stt.stream_data(data)
except EOFError:
break
transcriptions = self.api_stt.stream_stop()
self.lock.release()
else:
LOG.error(f"Timed out acquiring lock, not processing: {wav_file}")
transcriptions = []
else:
LOG.error(f"Timed out acquiring lock, not processing: {wav_file}")
transcriptions = []
else:
transcriptions = self.api_stt.execute(audio_data, lang)
if isinstance(transcriptions, str):
LOG.warning("Transcriptions is a str, no alternatives provided")
transcriptions = [transcriptions]
audio, audio_context = self.transformers.transform(audio_data)
transcriptions = self.api_stt.execute(audio_data, lang)
if isinstance(transcriptions, str):
LOG.warning("Transcriptions is a str, no alternatives provided")
transcriptions = [transcriptions]

get_stt = float(_stopwatch.time)
with _stopwatch:
audio, audio_context = self.transformers.transform(audio_data)
audio_context["timing"] = {"get_stt": get_stt,
"transform_audio": _stopwatch.time}
LOG.info(f"Transcribed: {transcriptions}")
return audio, audio_context, transcriptions
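`_get_stt_from_file` now normalizes input audio to the listener's configured sample rate and width before transcription, using pydub's `AudioSegment.set_channels` / `set_frame_rate` / `set_sample_width` chain. For illustration only, here is a dependency-free nearest-neighbour resampler for mono 16-bit PCM; this is a crude stand-in for what pydub does internally, not the PR's implementation:

```python
import struct


def resample_pcm16(raw: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Nearest-neighbour resample of mono 16-bit little-endian PCM bytes."""
    n_in = len(raw) // 2
    samples = struct.unpack("<%dh" % n_in, raw[:n_in * 2])
    n_out = int(n_in * dst_rate / src_rate)
    # Map each output sample back to the nearest (floor) input sample
    out = [samples[min(i * src_rate // dst_rate, n_in - 1)]
           for i in range(n_out)]
    return struct.pack("<%dh" % n_out, *out)
```

Nearest-neighbour resampling aliases badly on real speech audio, which is presumably why the PR leans on pydub (backed by ffmpeg) rather than hand-rolling the conversion.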
