fix/confirmation_state #125

JarbasAl · 2024-06-19T00:25:01Z

handle confirmation-state audio chunks in a dedicated handler, and drop those chunks from the STT buffer if instant_listen is False

closes #107
closes OpenVoiceOS/ovos-core#488

dynamically determines sound duration

2024-06-19 18:25:10.372 - voice - ovos_dinkum_listener.voice_loop.hotwords:load_hotword_engines:186 - DEBUG - snd/start_listening.wav duration: 0.3484583333333333 seconds
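
For illustration, a minimal sketch of how a WAV file's duration can be read from its header using only the standard library. The helper name `wav_duration_seconds` is hypothetical; this is not the `get_sound_duration` utility added in ovos-utils, whose implementation is not shown in this thread.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the playback length of a WAV file in seconds.

    Duration is frames divided by frame rate, both read from the header,
    so no audio data needs to be decoded.
    """
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())
```

A value such as the 0.348 seconds logged above for snd/start_listening.wav is the kind of result this calculation produces.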

without instant_listen

2024-06-19 18:01:20.327 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:201 - INFO - Starting loop in mode: ListeningMode.WAKEWORD
2024-06-19 18:01:23.016 - voice - ovos_dinkum_listener.voice_loop.hotwords:found:268 - DEBUG - Detected wake_word: hey_mycroft
2024-06-19 18:01:23.016 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_detect_ww:488 - DEBUG - Wake word detected=hey_mycroft
2024-06-19 18:01:23.017 - voice - ovos_dinkum_listener.service:_hotword_audio:614 - DEBUG - Handling listen sound: snd/start_listening.wav
2024-06-19 18:01:23.017 - voice - ovos_dinkum_listener.service:_hotword_audio:633 - DEBUG - Emitting hotword event: recognizer_loop:wakeword
2024-06-19 18:01:23.018 - voice - ovos_dinkum_listener.service:_record_begin:501 - DEBUG - Record begin
2024-06-19 18:01:23.019 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_detect_ww:524 - DEBUG - STATE: ListeningState.CONFIRMATION
2024-06-19 18:01:23.020 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:232 - INFO - Wakeword detected
2024-06-19 18:01:23.022 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:256 - DEBUG - playing listen sound
2024-06-19 18:01:23.143 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:256 - DEBUG - playing listen sound
2024-06-19 18:01:23.271 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:256 - DEBUG - playing listen sound
2024-06-19 18:01:23.399 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:256 - DEBUG - playing listen sound
2024-06-19 18:01:23.399 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_confirmation_sound:582 - DEBUG - STATE: ListeningState.BEFORE_COMMAND
2024-06-19 18:01:23.527 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:01:23.655 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:01:23.783 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:01:23.911 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:01:24.039 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:01:24.167 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:01:24.169 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_before_cmd:625 - DEBUG - STATE: ListeningState.IN_COMMAND
2024-06-19 18:01:24.295 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:24.423 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:24.551 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:24.679 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:24.807 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:24.935 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:25.063 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:25.191 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:25.319 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:25.321 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_in_cmd:664 - DEBUG - STATE: ListeningState.AFTER_COMMAND
2024-06-19 18:01:25.447 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:266 - INFO - speech finished
2024-06-19 18:01:25.448 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_vad_remove_silence:733 - DEBUG - recorded 1.92 seconds of audio
2024-06-19 18:01:25.487 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_vad_remove_silence:741 - DEBUG - removed 0.5399999999999998 seconds of silence, trimmed audio has 1.3800000000000001 seconds
2024-06-19 18:01:26.364 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:768 - DEBUG - transformers metadata: {'client_name': 'ovos_dinkum_listener', 'source': 'audio', 'destination': ['skills'], 'transcription': 'tell me a joke'}
2024-06-19 18:01:26.365 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:769 - INFO - transcribed: tell me a joke
2024-06-19 18:01:26.369 - voice - ovos_dinkum_listener.service:_record_end_signal:642 - DEBUG - Record end
2024-06-19 18:01:26.372 - voice - ovos_dinkum_listener.service:_stt_text:660 - DEBUG - STT: tell me a joke
2024-06-19 18:01:26.372 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:792 - DEBUG - STATE: ListeningState.DETECT_WAKEWORD
2024-06-19 18:01:26.374 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:803 - DEBUG - reset VAD

with instant_listen

2024-06-19 18:02:49.490 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:201 - INFO - Starting loop in mode: ListeningMode.WAKEWORD
2024-06-19 18:02:52.848 - voice - ovos_dinkum_listener.voice_loop.hotwords:found:268 - DEBUG - Detected wake_word: hey_mycroft
2024-06-19 18:02:52.849 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_detect_ww:488 - DEBUG - Wake word detected=hey_mycroft
2024-06-19 18:02:52.849 - voice - ovos_dinkum_listener.service:_hotword_audio:614 - DEBUG - Handling listen sound: snd/start_listening.wav
2024-06-19 18:02:52.850 - voice - ovos_dinkum_listener.service:_hotword_audio:633 - DEBUG - Emitting hotword event: recognizer_loop:wakeword
2024-06-19 18:02:52.850 - voice - ovos_dinkum_listener.service:_record_begin:501 - DEBUG - Record begin
2024-06-19 18:02:52.851 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_detect_ww:524 - DEBUG - STATE: ListeningState.CONFIRMATION
2024-06-19 18:02:52.852 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:232 - INFO - Wakeword detected
2024-06-19 18:02:52.853 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:256 - DEBUG - playing listen sound
2024-06-19 18:02:52.855 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_confirmation_sound:569 - DEBUG - instant_listen is on
2024-06-19 18:02:52.856 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_confirmation_sound:572 - DEBUG - STATE: ListeningState.BEFORE_COMMAND
2024-06-19 18:02:52.975 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.103 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.231 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.359 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.487 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.615 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.743 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.871 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.999 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:54.002 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_before_cmd:625 - DEBUG - STATE: ListeningState.IN_COMMAND
2024-06-19 18:02:54.127 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:54.255 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:54.383 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:54.511 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:54.639 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:54.767 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:54.895 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:55.023 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:55.152 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:55.279 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:55.407 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:55.535 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:55.663 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:55.666 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_in_cmd:664 - DEBUG - STATE: ListeningState.AFTER_COMMAND
2024-06-19 18:02:55.792 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:266 - INFO - speech finished
2024-06-19 18:02:55.792 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_vad_remove_silence:733 - DEBUG - recorded 2.944 seconds of audio
2024-06-19 18:02:55.860 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_vad_remove_silence:737 - DEBUG - audio appears to be full silence! skipping VAD silence removal
2024-06-19 18:02:56.879 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:768 - DEBUG - transformers metadata: {'client_name': 'ovos_dinkum_listener', 'source': 'audio', 'destination': ['skills'], 'transcription': 'tell me a joke'}
2024-06-19 18:02:56.880 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:769 - INFO - transcribed: tell me a joke
2024-06-19 18:02:56.880 - voice - ovos_dinkum_listener.service:_record_end_signal:642 - DEBUG - Record end
2024-06-19 18:02:56.881 - voice - ovos_dinkum_listener.service:_stt_text:660 - DEBUG - STT: tell me a joke
2024-06-19 18:02:56.881 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:792 - DEBUG - STATE: ListeningState.DETECT_WAKEWORD
2024-06-19 18:02:56.882 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:803 - DEBUG - reset VAD
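
The "recorded N seconds of audio" values in both logs are consistent with one audio chunk per loop iteration. The sketch below checks that arithmetic. The 0.128 s chunk length is an assumption inferred from the spacing of the log timestamps, and the chunk counts are taken by counting the "playing listen sound", "waiting for speech" and "recording speech" lines above.

```python
import math

# Assumption: one loop iteration handles one chunk of about 0.128 s,
# inferred from the ~128 ms spacing between consecutive log lines.
CHUNK_SECONDS = 0.128


def recorded_seconds(n_chunks: int) -> float:
    """Length of audio made up of n_chunks equally sized chunks."""
    return n_chunks * CHUNK_SECONDS


# Without instant_listen: 4 confirmation chunks are dropped,
# leaving 6 "waiting for speech" + 9 "recording speech" chunks.
without_instant_listen = recorded_seconds(6 + 9)

# With instant_listen: the single confirmation chunk is kept,
# plus 9 "waiting for speech" + 13 "recording speech" chunks.
with_instant_listen = recorded_seconds(1 + 9 + 13)

print(math.isclose(without_instant_listen, 1.92))   # matches "recorded 1.92 seconds"
print(math.isclose(with_instant_listen, 2.944))     # matches "recorded 2.944 seconds"
```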

NeonDaniel · 2024-06-19T00:31:22Z

The instant_listen flag existed before the refactoring of the playback confirmation sound. This change appears to permanently set the behavior to match instant_listen=True, which could cause the WW confirmation sound to be recorded as part of the utterance.

I think a better solution would be to roll this back to just play the sound in this service, as it does in the latest stable release.

JarbasAl · 2024-06-19T00:43:10Z

instant_listen is True by default

remove_silence now is also True by default and removes the sound from the final recording

NeonDaniel · 2024-06-19T00:49:32Z

> instant_listen is True by default
>
> remove_silence now is also True by default and removes the sound from the final recording

Right, but instant_listen originally defaulted to False to prevent recording the listening sound as part of a user utterance. If the confirmation sound is included in the STT recording, it can cause problems with the transcription. For example, if the confirmation sound was a recording of "yes", then "yes" would be prepended to every STT audio segment.

JarbasAl · 2024-06-19T01:02:43Z

the microphone plugin reads audio in a thread https://github.com/OpenVoiceOS/ovos-microphone-plugin-alsa/blob/dev/ovos_microphone_plugin_alsa/__init__.py#L42

blocking here has no impact on what audio makes it to STT; it will still record the sound?

instant_listen only made sense in the classic listener because it was blocking; it doesn't make sense in dinkum

(i typed this before in more detail but accidentally deleted comment instead of editing 😫 )

also note instant_listen was not part of mycroft-core and was always flagged as experimental, so in my view backwards compat was not warranted anyway even if it made sense in dinkum-listener
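
A minimal sketch of the threading argument above, using a stand-in reader thread rather than the real ALSA microphone plugin. All names here (`capture`, `run_demo`) are hypothetical. It shows that when capture runs in its own thread and feeds a queue, a consumer that blocks for a while still receives every chunk afterwards.

```python
import queue
import threading
import time
from typing import List, Sequence


def capture(chunks: Sequence[bytes], out: "queue.Queue[bytes]") -> None:
    """Stand-in for a microphone reader thread: queues every chunk it reads."""
    for chunk in chunks:
        out.put(chunk)
        time.sleep(0.01)  # simulate the time between audio chunks


def run_demo(block_seconds: float = 0.05) -> List[bytes]:
    """Block the consumer while the reader thread keeps capturing."""
    audio: "queue.Queue[bytes]" = queue.Queue()
    source = [f"chunk-{i}".encode() for i in range(5)]
    reader = threading.Thread(target=capture, args=(source, audio), daemon=True)
    reader.start()

    # The consumer blocks (e.g. waiting for a sound to finish playing);
    # the reader thread is unaffected and keeps queueing audio.
    time.sleep(block_seconds)
    reader.join()

    received = []
    while not audio.empty():
        received.append(audio.get())
    return received
```

Every chunk captured during the blocking period is still in the queue, so it has to be dropped explicitly if it should not reach STT.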

JarbasAl · 2024-06-19T01:12:04Z

if we want to know when sound stops playing

    from typing import Optional

    from ovos_bus_client.message import Message
    from ovos_bus_client.session import SessionManager

    # method of the listener service (uses self.bus and self.config)
    def _play_sound(self, uri: str, timeout=0.5, message: Optional[Message] = None):
        message = message or Message("", context={
            'client_name': 'ovos_dinkum_listener', 'source': 'listener',
            'destination': ["audio"]  # default native-source
        })
        self.bus.emit(message.forward("mycroft.audio.play_sound", {"uri": uri}))
        # block waiting for ovos-audio to report sound finished playing
        if not self.config.get("instant_listen", True):
            sess = SessionManager.get(message)
            SessionManager.wait_while_speaking(timeout=timeout, session=sess)

we could use this to know how much time (and therefore how many chunks) to drop from the beginning of the STT audio

but i would put this into its own flag and do it in a separate PR
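
As a rough sketch of the time-to-chunks conversion mentioned above: the function names are hypothetical, and the 0.128 s chunk length used in the usage lines is an assumption inferred from the log timestamps in this PR, not a value taken from the code.

```python
import math
from typing import List, Sequence


def chunks_to_drop(sound_seconds: float, chunk_seconds: float) -> int:
    """Number of whole chunks needed to cover the confirmation sound."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    # Round up so a partially overlapping chunk is dropped too.
    return math.ceil(sound_seconds / chunk_seconds)


def drop_leading(chunks: Sequence[bytes], n: int) -> List[bytes]:
    """Return the chunks that remain after discarding the first n."""
    return list(chunks[n:])


# A 0.348 s sound with 0.128 s chunks covers 3 chunks.
n = chunks_to_drop(0.3484583333333333, 0.128)
remaining = drop_leading([b"a", b"b", b"c", b"d", b"e"], n)
```

Rounding up trades a small amount of cropped speech for never leaking the tail of the sound into the STT audio; rounding down would make the opposite trade.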

JarbasAl · 2024-06-19T01:57:18Z

@NeonDaniel please re-review, given your feedback i added back the confirmation state, but with a dedicated handler for chunks during that period so that they can get dropped from the STT buffer (which didn't happen before)

please test and manually inspect some recordings, as this can potentially crop the initial 0.5 seconds of audio
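
A simplified sketch of what a dedicated confirmation-state handler could look like, based only on the behavior described in this thread and visible in the logs. This is not the code from this PR: the class, its fields and the two-state enum are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class State(Enum):
    CONFIRMATION = auto()
    BEFORE_COMMAND = auto()


@dataclass
class ConfirmationSketch:
    """Hypothetical model: drop chunks while the listen sound plays."""
    sound_seconds: float
    chunk_seconds: float
    instant_listen: bool = False
    state: State = State.CONFIRMATION
    stt_buffer: List[bytes] = field(default_factory=list)
    _remaining: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self._remaining = self.sound_seconds

    def feed(self, chunk: bytes) -> None:
        if self.state is State.CONFIRMATION:
            if self.instant_listen:
                # Keep the chunk and start listening immediately.
                self.stt_buffer.append(chunk)
                self.state = State.BEFORE_COMMAND
                return
            # Drop the chunk and count down the sound duration.
            self._remaining -= self.chunk_seconds
            if self._remaining <= 0:
                self.state = State.BEFORE_COMMAND
            return
        self.stt_buffer.append(chunk)
```

With `sound_seconds=0.3` and `chunk_seconds=0.128`, the first three chunks are dropped and later chunks reach the buffer; with `instant_listen=True`, every chunk is kept.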

ovos_dinkum_listener/service.py

JarbasAl · 2024-06-19T18:08:39Z

@mikejgray and @goldyfruit can you verify this solves #107?

please test with both instant_listen set to True and to False, and if possible with docker/voice satellite also.

want to get this one right for this stable release :)

refactor/drop_confirmation_state

126764a

closes #107

JarbasAl added the refactor (code improvements with no functional changes) label Jun 19, 2024

JarbasAl requested review from mikejgray, NeonDaniel and a team June 19, 2024 00:25

JarbasAl mentioned this pull request Jun 19, 2024

Refactor minimum recording length handling #112

Closed

JarbasAl mentioned this pull request Jun 19, 2024

feat/get_response_dynamic_timeout OpenVoiceOS/OVOS-workshop#211

Merged

JarbasAl mentioned this pull request Jun 19, 2024

feat/drop_ww_sounds_from_STT_audio #126

Closed

skip listen_sound from STT buffer

02fb967

validate source

cb91269

JarbasAl changed the title ~~refactor/drop_confirmation_state~~ fix/confirmation_state Jun 19, 2024

JarbasAl added the bug (Something isn't working) label Jun 19, 2024

JarbasAl added 6 commits June 19, 2024 03:43

get default sound duration from the file itself if available

950c5ea

get default sound duration from the file itself if available

3e67444

log

47c5f83

more logs

ec9feb0

resolve sound uris

9262953

new_util/get_sound_duration

29c8212

JarbasAl mentioned this pull request Jun 19, 2024

new_util/get_sound_duration OpenVoiceOS/ovos-utils#254

Merged

JarbasAl added 4 commits June 19, 2024 18:26

new_util/get_sound_duration

2e4fd51

new_util/get_sound_duration

7e882f3

fix sound destination

e967013

utils 0.0.38 compat

241561e

fix tests

44a4fcb

JarbasAl requested a review from goldyfruit June 19, 2024 18:08

test

649760f

JarbasAl merged commit 1f0f99a into dev Jun 20, 2024
9 checks passed

JarbasAl deleted the refactor/drop_confirmation_state branch June 20, 2024 19:21

github-actions bot mentioned this pull request Sep 2, 2024

0.1.0 #129

Merged
