Now with Semantic Kernel and ChatGPT!
The Conversational Speaker, a.k.a. "Friend Bot", uses a Raspberry Pi (or desktop) to enable spoken conversation with OpenAI large language models. This implementation listens to speech, processes the conversation through the OpenAI service, and responds back.
This project is written in .NET 8 which supports Linux/Raspbian, macOS, and Windows.
Read time: 15 minutes
Build time: 30 minutes
Cost:
- Hardware
- $ for Raspberry PI 4 Model B
- $ for USB Omnidirectional Speakerphone
- $ for an SD card (to setup the Raspberry Pi OS)
- Software
- Azure Cognitive Speech Services
- Free tier: 5 audio hours per month and 1 concurrent request.
- Free $200 credit: With a new Azure account that can be used during the first 30 days.
- OpenAI OR Azure OpenAI
- $0.002 / 1K tokens / ~750 words: ChatGPT (gpt-3.5-turbo)
- Free $18 credit: With a new OpenAI account that can be used during your first 90 days.
- Azure Cognitive Speech Services
You will need an instance of Azure Cognitive Services and an OpenAI account. You can run the software on nearly any platform, but let's start with a Raspberry Pi.
You may also use an instance of Azure OpenAI in place of OpenAI as well.
If you are new to Raspberry Pis, check out this getting started guide.
- Insert an SD card into your PC.
- Go to https://aka.ms/maker/rpi/software then download and run the Raspberry Pi Imager.
- Click
Choose OS
and select the default Raspberry Pi OS (32-bit). - Click
Choose Storage
, select the SD card. - Click
Write
and wait for the imaging to complete. - Put the SD card into your Raspberry Pi and connect a keyboard, mouse, and monitor.
- Complete the initial setup, making sure to configure Wi-Fi.
- Plug in the USB speaker/microphone if you have not already.
- On the Raspberry PI OS desktop, right-click on the volume icon in the top-right of the screen and make sure the USB device is selected.
- Right-click on the microphone icon in the top-right of the screen and make sure the USB device is selected.
The conversational speaker uses Azure Cognitive Service for speech-to-text and text-to-speech. Below are the steps to create an Azure account and an instance of Azure Cognitive Services.
- In a web browser, navigate to https://aka.ms/friendbot/azure and click on
Try Azure for Free
. - Click on
Start Free
to start creating a free Azure account. - Sign in with your Microsoft or GitHub account.
- After signing in, you will be prompted to enter some information.
NOTE: Even though this is a free account, Azure still requires credit card information. You will not be charged unless you change settings later.
- After your account setup is complete, navigate to https://aka.ms/friendbot/azureportal.
- Sign into your account at https://aka.ms/friendbot/azureportal.
- In the search bar at the top, enter
Cognitive Services
. UnderMarketplace
selectCognitive Services
. (It may take a few seconds to populate.) - Verify the correct subscription is selected. Under
Resource Group
selectCreate New
. Enter a resource group name (e.g.conv-speak-rg
). - Select a region and a name for your instance of Azure Cognitive Services (e.g.
my-conv-speak-cog-001
).NOTE: EastUS, WestEurope, or SoutheastAsia are recommended, as those regions tend to support the greatest number of features.
- Click on
Review + Create
. After validation passes, clickCreate
. - When deployment has completed you can click
Go to resource
to view your Azure Cognitive Services resource. - On the left side navigation bar, under
Resourse Management
, selectKeys and Endpoint
. - Copy either of the two Cognitive Services keys. Save this key in a secure location for later.
Windows 11 users: If the application is stalling when calling the text-to-speech API, make sure you have applied all current security updates (link).
The conversational speaker uses OpenAI's models to hold a friendly conversation. Below are the steps to create a new account and access the AI models.
- In a web browser, navigate to https://aka.ms/maker/openai. Click
Sign up
.NOTE: can use a Google account, Microsoft account, or email to create a new account.
- Complete the sign-up process (e.g., create a password, verify your email, etc.).
NOTE: If you are new to OpenAI, please review the usage guidelines (https://beta.openai.com/docs/usage-guidelines).
- In the top-right corner click on your account. Click on
View API keys
. - Click
+ Create new secret key
. Copy the generated key and save it in a secure location for later.
If you are curious to play with the large language models directly, check out the https://platform.openai.com/playground?mode=chat at the top of the page after logging in to https://aka.ms/maker/openai.
If you want to use Azure OpenAI instead of OpenAI directly, look at src/ConversationalSpeaker/HostedServices/HostedService.cs and follow the instructions in the constructor to switch between the two.
- On the Raspberry Pi or your PC, open a command-line terminal.
- Install .NET 8 SDK.
- For Raspberry Pi and Linux:
After installation is complete (it may take a few minutes), add dotnet to the command search paths.
curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --channel 8.0
Verify dotnet was installed successfully by checking the version.echo 'export DOTNET_ROOT=$HOME/.dotnet' >> ~/.bashrc echo 'export PATH=$PATH:$HOME/.dotnet' >> ~/.bashrc source ~/.bashrc
dotnet --version
- For Windows, go to https://aka.ms/maker/dotnet/download, under .NET 8.0 click
Download .NET SDK x64
, and run the installer.
- For Raspberry Pi and Linux:
- Clone the repo.
git clone https://github.com/microsoft/conversational-speaker.git
- Set your API keys: Replace
{MyCognitiveServicesKey}
with your Azure Cognitive Services key and{MyOpenAIKey}
with your OpenAI API key from the sections above.cd ~/conversational-speaker/src/ConversationalSpeaker dotnet user-secrets set "AzureCognitiveServices:Key" "{MyCognitiveServicesKey}" dotnet user-secrets set "AzureCognitiveServices:Region" "{MyCognitiveServicesRegion (e.g., EastUS)}" dotnet user-secrets set "OpenAI:Key" "{MyOpenAIKey}"
- Build and run the code!
cd ~/conversational-speaker/src/ConversationalSpeaker dotnet build dotnet run
There are several ways to run a program when the Raspberry Pi boots. Below is a suggested method which runs the application in a visible terminal window automatically. This allows you to not only see the output but also cancel the application by clicking on the terminal window and pressing CTRL+C.
- Create a file
/etc/xdg/autostart/friendbot.desktop
sudo nano /etc/xdg/autostart/friendbot.desktop
- Put the following content into the file.
Press CTRL+O to save the file and CTRL+X to exit. This will run the application in a terminal window after the Raspberry Pi has finished booting.
[Desktop Entry] Exec=lxterminal --command "/bin/bash -c '~/.dotnet/dotnet run --project ~/conversational-speaker/src/ConversationalSpeaker; /bin/bash'"
- To test out the changes by rebooting.
reboot
The code base has a default wake phrase ("Hey, Computer."
) already, which I suggest you use first. If you want to create your own (free!) custom wake word, then follow the steps below.
- Create a custom keyword model using the directions here: https://aka.ms/hackster/microsoft/wakeword.
- Download the model, extract the
.table
file and copy it tosrc/ConversationalSpeaker/Handlers/WakePhrases
. - Update
ConversationalSpekaer.csproj
file to include your wake phrase file in the build.<ItemGroup> <None Update="Handlers\WakePhrases\{YOUR FILE}.table"> <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory> </None> </ItemGroup>
- Rebuild and run the project to use your custom wake word.
- To start a new conversation, say "Start a new conversation".
- To set context, the following phrase is recommended: "Hello, my name is <your name> and I live in <your location>."
- Continue conversing!
- For more usage settings, view
~/conversational-speaker/src/ConversationalSpeaker/configuration.json
.- Change the AI's name (
PromptEngine:OutputPrefix
), - Change the AI's voice (
AzureCognitiveServices:SpeechSynthesisVoiceName
) - Change the AI's personality (
PromptEngine:Description
) - Switch to text input by changing the
System:TextListener
totrue
(good for testing changes).
- Change the AI's name (
- Take a look at Azure Cognitive Service's style support page to see which languages support which emotional styles, then play with the
AzureCognitiveServices:SpeechSynthesisVoiceName
andPromptEngine
settings insrc/ConverstationalSpeaker/configuration.json
.
This application uses .NET's generic "HostBuilder" paradigm. The HostBuilder encapsulates handling dependencies (i.e., dependency injection), configuration, logging, and running a set of hosted services. In this example, there is only one hosted service, ConversationLoopHostedService
, which contains the primary logic loop.
// ConversationLoopHostedService.cs
while (!cancellationToken.IsCancellationRequested)
{
// Listen to the user.
string userMessage = await _listener.ListenAsync(cancellationToken);
// Run the message through the AI and get a response.
string response = await _conversationHandler.ProcessAsync(userMessage, cancellationToken);
// Speak the response.
await _speaker.SpeakAsync(response, cancellationToken);
}
Azure Cognitive Service's has an excellent (and free!) wake word support. After generating a keyword model (see "Create a custom wake word" above), we load it into the speech SDK and wait for the system to recognize the keyword.
// AzCognitiveServicesWakeWordListener.cs
_keywordModel = KeywordRecognitionModel.FromFile(keywordModelPath);
_audioConfig = AudioConfig.FromDefaultMicrophoneInput();
_keywordRecognizer = new KeywordRecognizer(_audioConfig);
do
{
result = await _keywordRecognizer.RecognizeOnceAsync(_keywordModel);
} while (result.Reason != ResultReason.RecognizedKeyword);
To listen to the user, the application leverages Azure Cognitive Service's speech-to-text feature. The feature supports many languages and configurations. This project's default language is english (en-US
) and uses the default system microphone.
// AzCognitiveServicesListener.cs
// Configure the connection to Azure.
SpeechConfig speechConfig = SpeechConfig.FromSubscription(_options.Key, _options.Region);
speechConfig.SpeechRecognitionLanguage = _options.SpeechRecognitionLanguage;
speechConfig.SetProperty(PropertyId.SpeechServiceResponse_PostProcessingOption, "TrueText");
// Configure the local audio setup
_audioConfig = AudioConfig.FromDefaultMicrophoneInput();
_speechRecognizer = new SpeechRecognizer(speechConfig, _audioConfig);
And last, but not least, we head back to Azure Cognitive Services for its text-to-speech feature to give a voice to our AI. Since we are parsing out a style cue from OpenAI, we'll need to use the text-to-speech's Speech Synthesis Markup Language (SSML) support.
// AzCognitiveServicesSpeaker.cs
SpeechConfig speechConfig = SpeechConfig.FromSubscription(_options.Key, _options.Region);
speechConfig.SpeechSynthesisVoiceName = _options.SpeechSynthesisVoiceName;
_speechSynthesizer = new SpeechSynthesizer(speechConfig);
message = ExtractStyle(message, out string style);
string ssml = GenerateSsml(message, style, _options.SpeechSynthesisVoiceName);
await _speechSynthesizer.SpeakSsmlAsync(ssml);
In the case of speaking "That's great to hear! ~~excited~~"
, the SSML sent to Azure Cognitive Services would like like this:
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
<voice name="en-US-JennyNeural">
<mstts:express-as style="excited">That's great to hear!</mstts:express-as>
</voice>
</speak>
This can occur when the Azure Speech SDK is having trouble accessing your microphone. To get more details on
the issue, enable debug logging by setting the Logging:LogLevel:Default
setting in configution.json
to Debug
and run the application again.
Additionally, make sure your microphone is not being used by another application and is not set to "Do not allow apps to access your microphone".
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.