This repo contains a number of code samples showing how to use the Azure Speech to Text service from Python. This service is part of a suite of pre-built AI services you can use to add intelligence to your apps in only a few lines of code. These services are known as Azure Cognitive Services.
To use these samples, you will need an Azure subscription, and an Azure Speech to Text resource.
To use Azure services you will need an Azure subscription. If you don't have a subscription you can sign up for free.
-
If you are a student aged 18 and up and have an email address from an academic institution, you can sign up for the free Azure for Students offer at azure.microsoft.com/free/students without a credit card. At the time of writing this gives you $100 of credit to use over 12 months, as well as free tiers of a number of services for that 12 months. At the end of the 12 months, if you are still a student you can renew and get another $100 in credit and 12 months of free services.
-
If you are not a student, you can sign up at azure.microsoft.com/free. You will need a credit card for verification purposes only, you will not be billed unless you decide to upgrade your account to a paid offering. At the time of writing the free account will give you US$200 of free credit to spend on what you like in the first 30 days, 12 months of free services, plus a load of services that have tiers that are always free.
To use the Azure Speech to Text service, you will need to create a resource in your Azure subscription.
The speech to text service has a number of tiers, providing different pricing, with a generous free tier. At the time of writing you can get 5 hours of audio converted from speech to text a month for free. You can only create one free tier per service per subscription.
You can find the latest pricing details on the Speech to Text pricing page.
You can create this resource from the Azure Portal or the Azure CLI.
-
Launch the Azure Portal
-
Log in with the account you used to create your Azure subscription
-
Select + Create a resource from the home screen or the side menu
-
Search for
Speech
and select Speech from the drop downMake sure you select the resource called just Speech, as there are other speech related resources available on Azure.
-
Select Create
-
Fill in the details for the resource:
-
For the name, enter
speech-to-text
followed by the date or your name. Speech resources created through the portal need to have a globally unique name, so you will need something unique. -
Select the subscription you want to use
-
Select the location closest to you. You can see the regions on a map on the Azure Regions page.
-
Select the F0 pricing tier. This is the free tier
-
For the resource group, select the Create new option, and name the resource group
speech-to-text-rg
. Select OK to set the new resource group name.
-
-
Select Create
-
The resource will deploy, and you will get a notification when done. Select Go to resource from the notification.
-
From the resource blade, select Resource Management -> Keys and Endpoint*
-
Make a note of the value of Key 1 and Location
-
If you don't have the Azure CLI installed, install it by following these instructions on Microsoft Docs
-
Sign in to the Azure CLI using the following command:
az login
This will launch a browser window where you can log in with the account you used to create your Azure subscription. Once you are logged in you can close the browser window.
-
If you have more than one Azure subscription (such as a student subscription and a University subscription), ensure you have the correct subscription set.
-
Use the following command to list all your available subscriptions:
az account list \ --output table
-
Set the active subscription using the following command:
az account set --subscription <subscription_id>
Set
<subscription_id>
to the appropriate value from theSubscriptionId
column in the table output by the previous command.
-
-
Azure has multiple regions worldwide. When you create a resource, you select the region. You should select the one closest to you. To see all available regions for your subscription, use the following command:
az account list-locations --output table
Note the value from the
name
column for the location closest to you. You can see the regions on a map on the Azure Regions page. Make a note of this location as you will need it to run the samples. -
Create a resource group to contain your resource. Resource groups are logical groupings of resources that allow you to manage the resources together, for example deleting a resource group to delete all the resources that it contains. Use the following command to do this:
az group create \ --name speech-to-text-rg \ --location <region>
Replace
<region>
with the location closest to you.This will create a resource group called
speech-to-text-rg
. -
Once the resource group has been created, create the Speech to Text resource. Do this using the following command:
az cognitiveservices account create \ --name speech-to-text \ --resource-group speech-to-text-rg \ --kind SpeechServices \ --sku F0 \ --yes \ --location <region>
Replace
<region>
with the location you used to create the resource group.This will create a Speech to Text resource called
speech-to-text
in thespeech-to-text-rg
resource group. This will use the free tier.Unlike using the Azure Portal (above), speech resources created through the Azure CLI don't need a unique name, only unique per resource group.
-
To access this resource from code, you will need a key. You can list the keys using the following command:
az cognitiveservices account keys list \ --name speech-to-text \ --resource-group speech-to-text-rg \ --output table
Take a note of the value from the
Key1
column as you will need it to run the samples.
All these samples will need the location/region name and key for your Speech to Text service. Instructions on how to set up each sample are in the Project Setup Instructions.
- Basic Speech To Text - a very basic example that listens to your microphone and converts whatever it hears into text which is output to the console.
- Translation - a translation sample that listens to your microphone and outputs what it hears to the console in Chinese, English, French, and German.
- Translation with speech - a translation sample that listens to your microphone and outputs what it hears to the console in Chinese, English, French, and German, as well as playing an the Chinese version through your audio device.
- UI Control - a UI application controllable by speech. It has a label showing what you have just said. Say blue, green or black to change the color of the text.
- Learn more with hands-on, self guided learning using the Process and translate speech with Azure Cognitive Speech Services learning path on Microsoft Learn.
- Read more on the Python Speech Service SDK on Microsoft Speech SDK docs.