Backport some or all of these custom audio components and try to submit them to mainline ESPHome or to the official "Home Assistant Voice PE" fork? #72

Open
Hedda opened this issue Oct 17, 2024 · 9 comments

Comments

@Hedda

Hedda commented Oct 17, 2024

First of all, thank you all for these enhancements, which make audio in ESPHome much better than what upstream offers by default today!

@gnumpi @nighi @nielsnl68 @johnboiles I would like to make a request, but first a little backstory. As you and others following your combined work on custom audio components for ESPHome in this repository may already be aware, Nabu Casa plans to release an official Home Assistant "Voice Satellite" appliance soon (a smart-speaker voice assistant with media playback features). It is intended as an official voice assistant development platform and framework based on ESPHome, with hardware that combines an ESP32-S3 and an xCORE chip from XMOS for advanced audio processing. Its PCB(s) include not only far-field microphones and a built-in speaker, but also, by default, an audio-jack output for external speakers and GPIO pins so that it can be used as a development board.

For that reason, the lead ESPHome developers currently maintain an official "Home Assistant Voice PE" ("home-assistant-voice-pe") fork, which ESPHome developers from Nabu Casa are actively working on at a relatively fast pace, focusing on adding and improving many features related to the i2s audio, voice, and media player components for ESPHome. I understand they plan, sooner or later, to backport all the stable code from that forked home-assistant-voice-pe repository upstream into main ESPHome once they feel the code is no longer experimental.

https://github.com/esphome/home-assistant-voice-pe

I would therefore like to ask whether you and others here could consider backporting some or all of these custom i2s audio components and submitting them as code patches to that experimental "home-assistant-voice-pe" (Home Assistant Voice PE) fork as a stop-gap step before mainlining, since that might have a lower barrier to entry. Alternatively, you could consider submitting stable improvements directly to the main ESPHome repository if you feel the code is stable and those components belong upstream, with the goal of improving the out-of-the-box capabilities of all audio-related features in mainline ESPHome.

https://github.com/esphome/esphome

Any thoughts on trying to mainline most of the audio enhancements from this repository so that they are included in upstream ESPHome?

PS: For more information, check out the "Voice assistants" section of the Home Assistant Roadmap 2024 Midyear Update blog post:

https://www.home-assistant.io/blog/2024/06/12/roadmap-2024h1#voice-assistants

Voice assistants

Since last year, we have built our voice assistant framework from scratch with our “Year of the Voice” initiative. Now that the infrastructure is in place, we want to make sure that it will be usable for everyone (before the demise of Alexa and Google Assistant 😜).

Current priority 1: Improve Assist capabilities out of the box

Our research has shown users are most interested in us improving out-of-the-box capabilities of Assist, for instance, timers, reminders, and music controls.

Current priority 2: Make Assist easier to start with

At the moment, there are several things you need to install or configure to get started with voice. We want to make it easier to set up and onboard. There are already some good hardware choices to start using voice, but we’re exploring building our voice satellite hardware to create a more plug-and-play experience.

@Hedda
Author

Hedda commented Oct 17, 2024

FYI, Nabu Casa development obviously focuses initially on controlling your smart home via the Home Assistant platform and its incredible Assist voice control pipeline.

However, they are also looking at music playback via such "Voice Satellite" hardware streaming from Music Assistant to ESPHome as a core feature, and as such they are going to promote audio support for ESPHome and native media player functionality.

So, to make enhanced ESPHome features related to audio output, voice input, and media playback useful even to average Home Assistant users, they have made it clear that they plan not only to support these features by default in the upstream ESPHome project, but also to standardize voice assistant devices in ESPHome (including audio output and media player features) along with matching functionality and integrations in Home Assistant core. ESPHome and Nabu Casa developers are now working on several new components related to this, including a new entity component, the assist_satellite platform, which will represent a standard VoIP-based voice satellite for Home Assistant Assist voice control. I therefore also recommend that you check out these initial architecture discussions:

The initial entity component for this new assist_satellite platform has now been merged into Home Assistant core:

Also follow the related ongoing patches, with many new features, submitted to both ESPHome and Home Assistant core:

Bigger picture:

  • Standardize how voice satellites expose their capabilities
  • Standardize how voice satellites are configured
  • Automate based on the state of the satellite's pipeline
  • Control the behavior of a voice satellite from HA during the setup wizard
  • Skip wake word and listen for a command (with or without executing it)
  • Listen for a specific wake word (without running a pipeline)
  • Control a voice satellite from HA using service calls
  • Announce text using the TTS portion of the satellite's pipeline

Note also that the XMOS xCORE AI chip is technically not limited to processing audio input from the microphones; it can also be used on the audio output path to improve music playback, for example by running custom algorithms that add EQ options and features such as DRC (Digital Room Correction) to achieve better sound fidelity. Many products, such as network music streamers, use an XMOS chip solely for music playback to get high-fidelity audio at low cost.

On top of that, @rwrozelle has started laying the groundwork for extending child components of the Media Player in ESPHome (and Home Assistant), so that ESPHome can be built with a much richer set of media_player capabilities. See:

PS: Besides the official Home Assistant Voice Satellite development hardware, some third parties are already working on ESPHome voice assistant hardware products. For example, FutureProofHomes has posted a new video on their YouTube channel showing the current design of the ESP32-based prototype of their upcoming FutureProofHomes Satellite1 voice control development board, which now appears to use an externally connected XK-VOICE-L71 (the XU316-1024-QF60A-C24-based XMOS Voice Reference Design Evaluation Kit, which by the way features a 3.5 mm line-out jack for audio output to external speakers). Check it out:

@Hedda
Author

Hedda commented Oct 17, 2024

Slightly off-topic, but make sure you do not miss this just-merged upstream pull request with related improvements:

And the matching pull request that makes use of it in the "Home Assistant Voice PE" ("home-assistant-voice-pe") fork repo:

It is also used as a proof of concept in the nabu component in the kahrendt-i2s-audio-approach branch of home-assistant-voice-pe:

@nielsnl68
Copy link

nielsnl68 commented Oct 17, 2024

I think we should wait until Nabu Casa is done porting their audio code back into the upstream esphome repo. After that we can see how it all works.

At that point we can decide whether enhancements are still needed. From what I have seen so far, the speaker component is set up much better.

@gnumpi
Owner

gnumpi commented Oct 17, 2024

Thanks for sharing all the information. I agree with Niels here: we should wait until the voice kit has been merged into ESPHome. On the other hand, Nabu Casa managed to implement their media player without depending on the ADF SDK, which is amazing. As the name already implies, the adf_pipeline component relies entirely on that SDK, so I don't see many parts that could or should be ported. The only thing that might be interesting to port is the support for full-duplex i2s, but for this we should definitely wait. They are doing a great job rewriting the i2s component right now.

@Hedda
Author

Hedda commented Oct 18, 2024

That makes sense, thanks for that input!

I suspect they may also make more refactoring changes that will shuffle things around further before merging into mainline ESPHome.

For example, just last night they moved the audio decoder and resampling libraries into their own separate repo at https://github.com/esphome/esp-audio-libs

Hopefully, splitting things up like that while still keeping the repos under the ESPHome organization on GitHub will make the code more readable, get more eyes on it, and make contributing upstream for mainlining less daunting.

@johnboiles

@Hedda thanks for pointing out esphome/home-assistant-voice-pe#163! I'll certainly port my SPDIF component in #59 to use that instead since there's nothing ADF specific about it.

@Hedda
Author

Hedda commented Oct 19, 2024

By the way, I recommend that you check out the new "ReSpeaker Lite" Voice Assistant Development Kit from Seeed Studio, which combines an ESP32-S3 with an XMOS xCORE XU316 MCU/DSP chip for advanced audio acceleration and pre/post-processing. It features both far-field microphones for voice input and a 3.5 mm audio output jack for external speakers, so it can be used as an ESPHome-based Home Assistant Assist Satellite devkit (it has the same hardware components as the upcoming official voice kit from Home Assistant and Nabu Casa):

@vuminhtuanhvtc

Thank you, @Hedda, for your updates. I'm currently using an ESP32-S3 N16R8, an INMP441 microphone, and a MAX98357 DAC. Could I use the Home Assistant Voice YAML with my setup, or is it designed specifically for the xCORE chip?
