Backport some or all of these custom audio components and try submit to mainline ESPHome or their official "Home Assistant Voice PE" fork? #72

Hedda · 2024-10-17T08:09:54Z

First of all, thank you all for these enhancements that make audio in ESPHome much better than what upstream is by default today!

@gnumpi @nighi @nielsnl68 @johnboiles I would like to make a request, but first a little backstory. As you and others following your combined work on custom audio components for ESPHome in this repository may already be aware, Nabu Casa plans to soon release an official Home Assistant "Voice Satellite" appliance (a smart speaker voice assistant with media playback features) as an official voice assistant development platform and framework based on ESPHome. The hardware combines an ESP32-S3 with an xCORE chip from XMOS for advanced audio processing, and the PCB(s) include not only far-field microphones and a built-in speaker but, by default, also an audio-jack output for external speakers as well as GPIO pins so that it can be used as a development board.

For that reason the lead ESPHome developers currently have an official "Home Assistant Voice PE" ("home-assistant-voice-pe") fork that ESPHome developers from Nabu Casa are actively working on at a relatively fast pace, with a focus on adding and improving many features related to i2s audio, voice, and media player components for ESPHome. I understand they plan to sooner or later backport all the stable code from that home-assistant-voice-pe repository upstream to main ESPHome once they feel that the code is no longer experimental.

https://github.com/esphome/home-assistant-voice-pe

I would therefore like to ask if you and others here could consider backporting some or all of these custom i2s audio components and submitting them as patches to that experimental "home-assistant-voice-pe" (Home Assistant Voice PE) fork as a stop-gap step before mainlining, as that might have a lower threshold for entry. Alternatively, consider submitting some stable improvements directly to the main ESPHome repository if you feel the code is stable and those components belong upstream. The goal would be to improve out-of-the-box capabilities for all audio-related features in upstream mainline ESPHome.

https://github.com/esphome/esphome

Any thoughts on trying to mainline most audio enhancements from this repository to get them included in upstream ESPHome?

PS: For more info and reference, check out the "voice assistants" section in Home Assistant's Roadmap 2024 Midyear Update blog post:

https://www.home-assistant.io/blog/2024/06/12/roadmap-2024h1#voice-assistants

Voice assistants

Since last year, we have built our voice assistant framework from scratch with our “Year of the Voice” initiative. Now that the infrastructure is in place, we want to make sure that it will be usable for everyone (before the demise of Alexa and Google Assistant 😜).

Current priority 1: Improve Assist capabilities out of the box

Our research has shown users are most interested in us improving out-of-the-box capabilities of Assist, for instance, timers, reminders, and music controls.

Current priority 2: Make Assist easier to start with

At the moment, there are several things you need to install or configure to get started with voice. We want to make it easier to set up and onboard. There are already some good hardware choices to start using voice, but we’re exploring building our voice satellite hardware to create a more plug-and-play experience.

Hedda · 2024-10-17T08:38:13Z

FYI, obviously Nabu Casa's development initially focuses on controlling your smart home via the Home Assistant platform and its incredible Assist voice control pipeline.

https://community.home-assistant.io/t/voice-chapter-7-supercharged-wake-words-and-timers/743625/

However, they are also looking at music playback via such "Voice Satellite" hardware streaming from Music Assistant to ESPHome as a core feature, and as such they are going to promote audio support for ESPHome and native media player functionality.

https://www.home-assistant.io/blog/2024/05/09/music-assistant-2

So, to eventually make the enhanced ESPHome features related to audio output, voice input, and media playback useful to even average end users of Home Assistant, they have made it clear that their plan is not only to have them supported in the upstream ESPHome project by default, but also to standardize voice assistant devices in ESPHome (including audio output and media player features) along with matching functionality and integrations in the Home Assistant core. ESPHome and Nabu Casa developers are now working on several new components related to this, including a new entity component, the assist_satellite platform, which will represent a standard VoIP-based voice satellite for Home Assistant Assist voice control. As such, I also recommend that you check out these initial architecture discussions:

And the initial entity component for this new assist_satellite platform has been merged to Home Assistant core now:

Add Assist satellite entity + VoIP home-assistant/core#123830

Also follow related ongoing patches with many new related features submitted to both ESPHome and the Home Assistant core:

Bigger picture:

Standardize how voice satellites expose their capabilities
Standardize how voice satellites are configured
Automate based on the state of the satellite's pipeline
Control the behavior of a voice satellite from HA during the setup wizard
Skip wake word and listen for a command (with or without executing it)
Listen for a specific wake word (without running a pipeline)
Control a voice satellite from HA using service calls
Announce text using the TTS portion of the satellite's pipeline
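
To make the last two items more concrete, here is a rough sketch of what "announce text via a service call" could look like from an external script using Home Assistant's REST API. It assumes the `assist_satellite.announce` action with a `message` field is available in your Home Assistant version; the base URL, token, and entity ID below are placeholders, and the helper name `build_announce_request` is purely hypothetical.

```python
import json
import urllib.request


def build_announce_request(base_url: str, token: str, entity_id: str,
                           message: str) -> urllib.request.Request:
    """Build (but do not send) a REST call asking a satellite to announce text.

    Assumption: the assist_satellite.announce action exists on the target
    Home Assistant instance and accepts a "message" field.
    """
    url = f"{base_url.rstrip('/')}/api/services/assist_satellite/announce"
    payload = json.dumps({"entity_id": entity_id, "message": message})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        method="POST",
        headers={
            # Long-lived access token created in the Home Assistant user profile.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; replace with your own instance, token, and entity.
request = build_announce_request(
    "http://homeassistant.local:8123",
    "YOUR_LONG_LIVED_TOKEN",
    "assist_satellite.living_room",
    "Dinner is ready",
)
# Sending it would be: urllib.request.urlopen(request)
```

Inside Home Assistant itself the same thing would normally be done from an automation or script rather than over REST; the sketch only shows the shape of the call.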

Note also that the XMOS xCORE AI chip is technically not limited to audio input from the microphone; it can also be used on the audio output side to improve music playback, for example by running other custom AI models or algorithms that add EQ options and other features such as DRC (Digital Room Correction) to achieve improved sound fidelity. Many products use an XMOS chip just for music playback, for example network music streamers, to get great HiFi-quality audio at low cost.

On top of that @rwrozelle has started working on laying the groundwork for extending child components of Media Player in ESPHome (and Home Assistant) to allow ESPHome to be built with a much richer set of capabilities in the media_player. See:

https://github.com/rwrozelle/audio-media-player
- Extend Functionality of Media_Player esphome/aioesphomeapi#911

PS: Other than the official Home Assistant Voice Satellite development hardware, there are also already some third parties working on ESPHome voice assistant hardware products. For example, FutureProofHomes have posted a new video on their YouTube channel showing off the current design of their ESP32-based hardware prototype for the upcoming FutureProofHomes Satellite1 voice control development board, which now looks to be using an XU316-1024-QF60A-C24 based XK-VOICE-L71 (XMOS Voice Reference Design Evaluation Kit) connected externally (which, by the way, features a 3.5mm line-out jack for audio output to external speakers). Check it out:

Hedda · 2024-10-17T11:09:49Z

Off-topic but make sure that you do not miss this pull request with new related improvements in upstream that was just merged:

[speaker, i2s_audio] I2S Speaker implementation using a ring buffer esphome/esphome#7605

And the matching pull request to implement use of that in the "Home Assistant Voice PE" ("home-assistant-voice-pe") fork repo:

[nabu] Use speaker output instead of writing to I2S directly esphome/home-assistant-voice-pe#163

Also used as proof-of-concept in the nabu component in the kahrendt-i2s-audio-approach branch of home-assistant-voice-pe:

https://github.com/esphome/home-assistant-voice-pe/tree/kahrendt-i2s-audio-approach
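
For anyone unfamiliar with the ring-buffer idea named in the title of esphome/esphome#7605, here is a minimal sketch of the general technique. It is not the code from that pull request (which is C++ inside ESPHome); the class and method names are hypothetical and it is written in Python only to keep it short. A producer (e.g. a decoder) writes bytes in, a consumer (e.g. the task feeding I2S) reads them out, and a fixed block of memory is reused by wrapping around at the end.

```python
class ByteRingBuffer:
    """Fixed-capacity FIFO byte buffer (illustrative only, not ESPHome code)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._read = 0  # index of the oldest unread byte
        self._size = 0  # number of bytes currently stored

    def available(self) -> int:
        """Bytes that can be read right now."""
        return self._size

    def free(self) -> int:
        """Bytes that can be written right now."""
        return self._capacity - self._size

    def write(self, data: bytes) -> int:
        """Copy as much of data as fits and return the number of bytes accepted."""
        n = min(len(data), self.free())
        write_pos = (self._read + self._size) % self._capacity
        first = min(n, self._capacity - write_pos)  # part before the wrap
        self._buf[write_pos:write_pos + first] = data[:first]
        self._buf[0:n - first] = data[first:n]       # part after the wrap
        self._size += n
        return n

    def read(self, max_bytes: int) -> bytes:
        """Remove and return up to max_bytes of the oldest data."""
        n = min(max_bytes, self._size)
        first = min(n, self._capacity - self._read)
        out = bytes(self._buf[self._read:self._read + first])
        out += bytes(self._buf[0:n - first])
        self._read = (self._read + n) % self._capacity
        self._size -= n
        return out


rb = ByteRingBuffer(8)
accepted_first = rb.write(b"abcdef")   # buffer now holds 6 bytes
head = rb.read(4)                      # consumer drains the 4 oldest bytes
accepted_second = rb.write(b"ghijkl")  # this write wraps past the end
rejected = rb.write(b"z")              # buffer is full, nothing is accepted
rest = rb.read(100)                    # drain everything that is left
```

The point of the design is that a full buffer simply accepts fewer bytes instead of overwriting unplayed audio, so the writer can retry later while the reader keeps draining at its own pace.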

nielsnl68 · 2024-10-17T12:26:39Z

I think we should wait until Nabu Casa is done porting their audio code back into the upstream ESPHome repo. After that we can see how it all works.

At that point we can decide whether it is still necessary to make enhancements or not. From what I have seen so far, the speaker component is set up much better.

gnumpi · 2024-10-17T12:44:58Z

Thanks for sharing all the information. I agree with Niels here, we should wait until the voice-kit has been merged into ESPHome. On the other hand, Nabu Casa managed to implement their media player without depending on the ADF SDK, which is amazing. As the name already implies, the adf_pipeline component relies entirely on that SDK, so I don't see many parts that could be or should be ported. The only thing that might be interesting to port is the support for full-duplex i2s. But for this we should definitely wait. They are doing a great job rewriting the i2s component right now.

Hedda · 2024-10-18T08:29:30Z

That makes sense, thanks for that input!

I suspect that they may also make more refactoring changes that will move things around further before merging to mainline ESPHome.

For example just last night they moved the audio decoder and resampling libraries into their own separate repo at https://github.com/esphome/esp-audio-libs

Hopefully splitting things up like that, while still keeping the repos under the ESPHome organization on GitHub, will make the code more readable, get more eyes on it, and make it less daunting to contribute upstream for mainlining.

johnboiles · 2024-10-18T16:06:29Z

@Hedda thanks for pointing out esphome/home-assistant-voice-pe#163! I'll certainly port my SPDIF component in #59 to use that instead since there's nothing ADF specific about it.

Hedda · 2024-10-19T07:38:28Z

By the way, I recommend that you guys check out the new "ReSpeaker Lite" Voice Assistant Development Kit hardware from Seeed Studio, which combines an ESP32-S3 with an XMOS xCORE XU316 MCU DSP chip for advanced audio acceleration and pre/post-processing. It features both far-field microphones for voice input and a 3.5mm audio output jack for external speakers, so it can be used as an ESPHome-based Home Assistant Assist Satellite devkit (as it has the same hardware components as the upcoming official voice-kit from Home Assistant and Nabu Casa):

https://community.home-assistant.io/t/respeaker-lite-new-seeed-studio-voice-assistant-development-kit-hardware-combine-esp32-with-xmos-xu316-dsp-chip-for-advanced-audio-processing-as-a-esphome-based-home-assistant-assist-satellite-voice-devkit/756944

Hedda · 2024-10-25T13:17:38Z

FYI, they (kahrendt) have now also submitted these additional pull requests to upstream ESPHome:

[speaker, i2s_audio] I2S Speaker implementation using a ring buffer esphome/esphome#7605
[speaker, i2s_audio] Support audio_dac component, mute actions, and improved logging esphome/esphome#7664
- [speaker] Document mute actions and audio dac support esphome/esphome-docs#4378
- Document Speaker volume set action esphome/esphome-docs#4343
[speaker] Add speaker media player esphome/esphome#7672
- [speaker] Adds speaker media player documentation esphome/esphome-docs#4391
[media_player] Add new media player conditions esphome/esphome#7667
- [media_player] Document paused and announcing conditions esphome/esphome-docs#4387

They have also bumped esp-audio-libs to release version 1.0.0 (initial release) in the experimental home-assistant-voice-pe project:

https://github.com/esphome/esp-audio-libs

vuminhtuanhvtc · 2024-11-04T03:10:32Z

Thank you, @Hedda , for your updates. I’m currently using the ESP32-S3 N16R8, INMP441 microphone, and MAX98357 DAC. Could I use YAML in Home Assistant Voice with my setup, or is it specifically designed for the xCORE chip?

nighi mentioned this issue Oct 19, 2024

ES8388: Slow Audio Output and Microphone not running #2

Open
