fix/confirmation_state #125

Merged
merged 15 commits into from
Jun 20, 2024

JarbasAl
Member

@JarbasAl JarbasAl commented Jun 19, 2024

handle confirmation-state audio chunks in a dedicated handler; drop those chunks from the STT buffer if instant_listen is False

closes #107
closes OpenVoiceOS/ovos-core#488

dynamically determines the duration of the listen sound:

2024-06-19 18:25:10.372 - voice - ovos_dinkum_listener.voice_loop.hotwords:load_hotword_engines:186 - DEBUG - snd/start_listening.wav duration: 0.3484583333333333 seconds
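
For reference, a minimal sketch of one way to read a sound's duration from a WAV file with the standard-library `wave` module. This is not the code from this PR; `wav_duration_seconds` is a hypothetical helper name, and it assumes an uncompressed PCM WAV file.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the playback length of a PCM WAV file in seconds.

    Hypothetical helper for illustration, not the PR's implementation.
    """
    with wave.open(path, "rb") as wav:
        # duration = number of audio frames / frames per second
        return wav.getnframes() / wav.getframerate()
```

The value could then be used instead of a fixed timeout when deciding how long the confirmation state should last.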

without instant_listen

2024-06-19 18:01:20.327 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:201 - INFO - Starting loop in mode: ListeningMode.WAKEWORD
2024-06-19 18:01:23.016 - voice - ovos_dinkum_listener.voice_loop.hotwords:found:268 - DEBUG - Detected wake_word: hey_mycroft
2024-06-19 18:01:23.016 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_detect_ww:488 - DEBUG - Wake word detected=hey_mycroft
2024-06-19 18:01:23.017 - voice - ovos_dinkum_listener.service:_hotword_audio:614 - DEBUG - Handling listen sound: snd/start_listening.wav
2024-06-19 18:01:23.017 - voice - ovos_dinkum_listener.service:_hotword_audio:633 - DEBUG - Emitting hotword event: recognizer_loop:wakeword
2024-06-19 18:01:23.018 - voice - ovos_dinkum_listener.service:_record_begin:501 - DEBUG - Record begin
2024-06-19 18:01:23.019 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_detect_ww:524 - DEBUG - STATE: ListeningState.CONFIRMATION
2024-06-19 18:01:23.020 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:232 - INFO - Wakeword detected
2024-06-19 18:01:23.022 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:256 - DEBUG - playing listen sound
2024-06-19 18:01:23.143 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:256 - DEBUG - playing listen sound
2024-06-19 18:01:23.271 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:256 - DEBUG - playing listen sound
2024-06-19 18:01:23.399 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:256 - DEBUG - playing listen sound
2024-06-19 18:01:23.399 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_confirmation_sound:582 - DEBUG - STATE: ListeningState.BEFORE_COMMAND
2024-06-19 18:01:23.527 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:01:23.655 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:01:23.783 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:01:23.911 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:01:24.039 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:01:24.167 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:01:24.169 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_before_cmd:625 - DEBUG - STATE: ListeningState.IN_COMMAND
2024-06-19 18:01:24.295 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:24.423 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:24.551 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:24.679 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:24.807 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:24.935 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:25.063 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:25.191 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:25.319 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:01:25.321 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_in_cmd:664 - DEBUG - STATE: ListeningState.AFTER_COMMAND
2024-06-19 18:01:25.447 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:266 - INFO - speech finished
2024-06-19 18:01:25.448 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_vad_remove_silence:733 - DEBUG - recorded 1.92 seconds of audio
2024-06-19 18:01:25.487 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_vad_remove_silence:741 - DEBUG - removed 0.5399999999999998 seconds of silence, trimmed audio has 1.3800000000000001 seconds
2024-06-19 18:01:26.364 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:768 - DEBUG - transformers metadata: {'client_name': 'ovos_dinkum_listener', 'source': 'audio', 'destination': ['skills'], 'transcription': 'tell me a joke'}
2024-06-19 18:01:26.365 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:769 - INFO - transcribed: tell me a joke
2024-06-19 18:01:26.369 - voice - ovos_dinkum_listener.service:_record_end_signal:642 - DEBUG - Record end
2024-06-19 18:01:26.372 - voice - ovos_dinkum_listener.service:_stt_text:660 - DEBUG - STT: tell me a joke
2024-06-19 18:01:26.372 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:792 - DEBUG - STATE: ListeningState.DETECT_WAKEWORD
2024-06-19 18:01:26.374 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:803 - DEBUG - reset VAD

with instant_listen

2024-06-19 18:02:49.490 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:201 - INFO - Starting loop in mode: ListeningMode.WAKEWORD
2024-06-19 18:02:52.848 - voice - ovos_dinkum_listener.voice_loop.hotwords:found:268 - DEBUG - Detected wake_word: hey_mycroft
2024-06-19 18:02:52.849 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_detect_ww:488 - DEBUG - Wake word detected=hey_mycroft
2024-06-19 18:02:52.849 - voice - ovos_dinkum_listener.service:_hotword_audio:614 - DEBUG - Handling listen sound: snd/start_listening.wav
2024-06-19 18:02:52.850 - voice - ovos_dinkum_listener.service:_hotword_audio:633 - DEBUG - Emitting hotword event: recognizer_loop:wakeword
2024-06-19 18:02:52.850 - voice - ovos_dinkum_listener.service:_record_begin:501 - DEBUG - Record begin
2024-06-19 18:02:52.851 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_detect_ww:524 - DEBUG - STATE: ListeningState.CONFIRMATION
2024-06-19 18:02:52.852 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:232 - INFO - Wakeword detected
2024-06-19 18:02:52.853 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:256 - DEBUG - playing listen sound
2024-06-19 18:02:52.855 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_confirmation_sound:569 - DEBUG - instant_listen is on
2024-06-19 18:02:52.856 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_confirmation_sound:572 - DEBUG - STATE: ListeningState.BEFORE_COMMAND
2024-06-19 18:02:52.975 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.103 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.231 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.359 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.487 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.615 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.743 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.871 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:53.999 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:260 - DEBUG - waiting for speech
2024-06-19 18:02:54.002 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_before_cmd:625 - DEBUG - STATE: ListeningState.IN_COMMAND
2024-06-19 18:02:54.127 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:54.255 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:54.383 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:54.511 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:54.639 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:54.767 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:54.895 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:55.023 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:55.152 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:55.279 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:55.407 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:55.535 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:55.663 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:263 - DEBUG - recording speech
2024-06-19 18:02:55.666 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_in_cmd:664 - DEBUG - STATE: ListeningState.AFTER_COMMAND
2024-06-19 18:02:55.792 - voice - ovos_dinkum_listener.voice_loop.voice_loop:run:266 - INFO - speech finished
2024-06-19 18:02:55.792 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_vad_remove_silence:733 - DEBUG - recorded 2.944 seconds of audio
2024-06-19 18:02:55.860 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_vad_remove_silence:737 - DEBUG - audio appears to be full silence! skipping VAD silence removal
2024-06-19 18:02:56.879 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:768 - DEBUG - transformers metadata: {'client_name': 'ovos_dinkum_listener', 'source': 'audio', 'destination': ['skills'], 'transcription': 'tell me a joke'}
2024-06-19 18:02:56.880 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:769 - INFO - transcribed: tell me a joke
2024-06-19 18:02:56.880 - voice - ovos_dinkum_listener.service:_record_end_signal:642 - DEBUG - Record end
2024-06-19 18:02:56.881 - voice - ovos_dinkum_listener.service:_stt_text:660 - DEBUG - STT: tell me a joke
2024-06-19 18:02:56.881 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:792 - DEBUG - STATE: ListeningState.DETECT_WAKEWORD
2024-06-19 18:02:56.882 - voice - ovos_dinkum_listener.voice_loop.voice_loop:_after_cmd:803 - DEBUG - reset VAD

@JarbasAl added the refactor label (code improvements with no functional changes) Jun 19, 2024
@JarbasAl requested review from mikejgray, NeonDaniel and a team June 19, 2024 00:25
@NeonDaniel
Member

The instant_listen flag existed before the refactoring of the confirmation sound playback. This change appears to permanently set the behavior to match instant_listen=True, which could cause the WW confirmation sound to be recorded as part of the utterance.

I think a better solution would be to roll this back so that the sound is simply played in this service, as in the latest stable release.

@JarbasAl
Member Author

instant_listen is True by default

remove_silence now is also True by default and removes the sound from the final recording

@NeonDaniel
Member

> instant_listen is True by default
>
> remove_silence now is also True by default and removes the sound from the final recording

Right, but instant_listen originally defaulted to False to prevent recording the listening sound as part of a user utterance. If the confirmation sound is included in the STT recording, it can cause problems with the transcription. For example, if the confirmation sound were a recording of "yes", then "yes" would be prepended to every STT audio segment.

@JarbasAl
Member Author

JarbasAl commented Jun 19, 2024

the microphone plugin reads audio in a thread https://github.com/OpenVoiceOS/ovos-microphone-plugin-alsa/blob/dev/ovos_microphone_plugin_alsa/__init__.py#L42

blocking here has no impact on what audio makes it to STT; it will still record the sound

instant_listen only made sense in the classic listener because it was blocking; it doesn't make sense in dinkum

(i typed this before in more detail but accidentally deleted the comment instead of editing it 😫)

also note instant_listen was not part of mycroft-core and was always flagged as experimental, so in my view backwards compat was not warranted anyway even if it made sense in dinkum-listener
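
To illustrate the point above about the microphone being read in its own thread, here is a small stand-alone sketch. It does not use the real ALSA plugin; `mic_reader` and `chunks_seen_after_blocking` are hypothetical names, and the "audio chunks" are just integers. It only shows that a blocked consumer still finds every chunk captured while it was blocked.

```python
import queue
import threading
import time


def mic_reader(out: queue.Queue, n_chunks: int, chunk_seconds: float) -> None:
    """Stand-in for a microphone thread: keeps queueing chunks regardless of the consumer."""
    for index in range(n_chunks):
        time.sleep(chunk_seconds)  # pretend to capture one chunk of audio
        out.put(index)


def chunks_seen_after_blocking(block_seconds: float, n_chunks: int = 5,
                               chunk_seconds: float = 0.01) -> list:
    """Block the consumer, then return every chunk the reader queued in the meantime."""
    captured = queue.Queue()
    reader = threading.Thread(target=mic_reader,
                              args=(captured, n_chunks, chunk_seconds),
                              daemon=True)
    reader.start()
    time.sleep(block_seconds)  # stands in for waiting on the confirmation sound
    reader.join()              # wait for the reader so the result is deterministic
    chunks = []
    while not captured.empty():
        chunks.append(captured.get_nowait())
    return chunks
```

Whatever `block_seconds` is, the returned list contains all queued chunks, so blocking alone does not keep the sound out of the recording; the chunks have to be dropped explicitly.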

@JarbasAl
Member Author

if we want to know when the sound stops playing:

    from typing import Optional

    from ovos_bus_client.message import Message
    from ovos_bus_client.session import SessionManager

    # method of the listener service; self.bus and self.config come from that class
    def _play_sound(self, uri: str, timeout=0.5, message: Optional[Message] = None):
        message = message or Message("", context={
            'client_name': 'ovos_dinkum_listener', 'source': 'listener',
            'destination': ["audio"]  # default native-source
        })
        self.bus.emit(message.forward("mycroft.audio.play_sound", {"uri": uri}))
        # block waiting for ovos-audio to report sound finished playing
        if not self.config.get("instant_listen", True):
            sess = SessionManager.get(message)
            SessionManager.wait_while_speaking(timeout=timeout, session=sess)

we could use this to know how much time (and therefore how many chunks) to drop from the beginning of the STT audio

but i would put this into its own flag and do it in a separate PR
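
The time-to-chunks conversion mentioned above can be sketched as follows. The chunk size (2048 samples at 16 kHz, i.e. 0.128 s) is an assumption inferred from the roughly 128 ms spacing of the log lines in the PR description, and `chunks_to_drop` is a hypothetical helper, not part of ovos-dinkum-listener.

```python
import math


def chunks_to_drop(sound_seconds: float,
                   chunk_samples: int = 2048,   # assumed chunk size
                   sample_rate: int = 16000) -> int:  # assumed sample rate
    """Number of whole audio chunks needed to cover the confirmation sound."""
    chunk_seconds = chunk_samples / sample_rate
    # round up so a partially overlapping chunk is dropped as well
    return math.ceil(sound_seconds / chunk_seconds)
```

Under these assumptions a 0.5 second timeout covers 4 chunks, which is consistent with the four "playing listen sound" lines in the log without instant_listen, while the measured 0.348 second sound would only need 3.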

@JarbasAl
Member Author

JarbasAl commented Jun 19, 2024

@NeonDaniel please re-review, given your feedback i added back the confirmation state, but with a dedicated handler for chunks during that period so that they can get dropped from the STT buffer (which didn't happen before)

please test and manually inspect some recordings, as this can potentially crop the initial 0.5 seconds of audio
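
A rough sketch of the idea described above: a dedicated handler receives chunks while in the confirmation state and keeps them out of the STT buffer. The state names and `_confirmation_sound` come from the logs; everything else (`ConfirmationSketch`, `feed`, the 0.128 s chunk length, and what happens to the first chunk when instant_listen is on) is an assumption made for illustration, not the merged implementation.

```python
from enum import Enum, auto


class ListeningState(Enum):
    CONFIRMATION = auto()
    BEFORE_COMMAND = auto()


class ConfirmationSketch:
    """Hypothetical, simplified stand-in for the voice loop."""

    def __init__(self, sound_seconds: float, chunk_seconds: float,
                 instant_listen: bool):
        self.state = ListeningState.CONFIRMATION
        self.instant_listen = instant_listen
        self.remaining = sound_seconds      # sound time still to be skipped
        self.chunk_seconds = chunk_seconds
        self.stt_buffer = []

    def feed(self, chunk: bytes) -> None:
        """Route each incoming chunk to the handler for the current state."""
        if self.state is ListeningState.CONFIRMATION:
            self._confirmation_sound(chunk)
        else:
            self.stt_buffer.append(chunk)

    def _confirmation_sound(self, chunk: bytes) -> None:
        if self.instant_listen:
            # assumption: with instant_listen nothing is dropped
            self.state = ListeningState.BEFORE_COMMAND
            self.stt_buffer.append(chunk)
            return
        # drop the chunk and leave the state once the sound has been covered
        self.remaining -= self.chunk_seconds
        if self.remaining <= 0:
            self.state = ListeningState.BEFORE_COMMAND
```

With `sound_seconds=0.5` and `chunk_seconds=0.128`, the first four chunks are dropped and later ones reach the buffer, which is also why the start of a fast speaker's utterance could be cropped.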

@JarbasAl changed the title from refactor/drop_confirmation_state to fix/confirmation_state Jun 19, 2024
@JarbasAl added the bug label (Something isn't working) Jun 19, 2024
@JarbasAl
Member Author

@mikejgray and @goldyfruit can you verify this solves #107 ?

please test with both instant_listen set to True and to False, and if possible with docker/voice satellite also.

want to get this one right for this stable release :)

@JarbasAl requested a review from goldyfruit June 19, 2024 18:08
@JarbasAl merged commit 1f0f99a into dev Jun 20, 2024
9 checks passed
@JarbasAl deleted the refactor/drop_confirmation_state branch June 20, 2024 19:21
@github-actions github-actions bot mentioned this pull request Sep 2, 2024
Labels
bug: Something isn't working
refactor: code improvements with no functional changes
Development

Successfully merging this pull request may close these issues.

Stable Dependencies for 0.0.8 Release
Delay between ww recognition and acknowledgement sound
2 participants