In disasters such as fires, floods, and shootings, victims need first-response care as soon as possible. For example, the 2018 ‘Camp Fire’ in Northern California wiped out the town of Paradise, killing 85 people and causing $16.5 billion in damages. In this case, and in many similar situations, a difference of a few seconds in response time can mean not only lives saved, but millions or even billions of dollars in valuable assets salvaged.
How can we develop a solution that reduces the response time of first responders and provides a positive impact for those who need it most? Normally, when a victim is in danger, they call 911 and begin a dialogue with a dispatcher. While this conversation is ongoing, the Watson Danger Response Tool screens the conversation for select dangers, including fires and floods. If one of those dangers is identified, the appropriate authorities are immediately alerted, shaving precious seconds, and sometimes even minutes, off the response time. In this code pattern, using the power of Watson AI, we will create this danger response web app.
Below is a demo of the final product to inspire you to complete the code pattern!
- Audio input is captured by Watson Speech-to-Text service.
- Once input text is transcribed, it is sent to the Watson Natural Language Understanding (NLU) service.
- Within the NLU, a custom Knowledge Studio machine learning model processes the text for danger.
- The danger analysis from the machine learning model is returned by the NLU.
- The final output of the NLU is presented, and a decision is made whether or not to alert the authorities.
Built with React components and a Node.js server, the app will capture audio input and stream it to a Watson Speech to Text service. After the input speech is transcribed, it will be sent to a Watson Natural Language Understanding service that will identify, categorize, and score the danger threat in the text. Both the input speech and the danger analysis will be displayed in the app.
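The hand-off from transcription to analysis can be sketched as a small helper. This is illustrative, not the pattern's actual code: it builds the parameters object the Node.js server would pass to the ibm-watson SDK's `nlu.analyze()` call, with `wksModelId` standing in for your custom Knowledge Studio model ID.

```javascript
// Hypothetical helper: build the request parameters for an NLU
// `analyze` call that uses a custom Knowledge Studio model in place of
// NLU's default system entities and relations.
function buildDangerAnalysisParams(transcript, wksModelId) {
  return {
    text: transcript,
    features: {
      // Both features reference the custom model so the danger
      // entities and relations defined in Knowledge Studio are used.
      entities: { model: wksModelId, mentions: true },
      relations: { model: wksModelId },
    },
  };
}
```

The returned object matches the shape the NLU `analyze` API expects; the server would pass it to the SDK along with the credentials collected during setup.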
The key aspect of this tool is the NLU. Using Knowledge Studio, we will train a custom machine learning model for the ‘relations’ feature of the Watson NLU in order to drive the decision-making process of identifying the danger. We will create an ‘entity’ for the danger itself, with corresponding subcategories for each of the dangers of interest, for example ‘fire’ and ‘flood’. We will also create an ‘entity’ for the object that the danger is acting on.
The model will then identify and categorize the danger when a minimum confidence threshold is reached for a particular subcategory of danger. Additionally, the model will determine the severity of the danger by analyzing the strength of the relation between the danger and object entities.
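The decision step described above can be sketched as follows. The threshold value, the entity type name `Danger`, and the response shape are assumptions for illustration (the relations array follows NLU's documented output format), not the pattern's actual code.

```javascript
// Hypothetical decision logic: scan the NLU relations output for a
// danger entity whose relation score clears a minimum confidence
// threshold. The 0.75 value is an assumed threshold, not the
// pattern's actual setting.
const CONFIDENCE_THRESHOLD = 0.75;

function assessDanger(nluResult) {
  const hits = (nluResult.relations || [])
    .filter((rel) => rel.score >= CONFIDENCE_THRESHOLD)
    .map((rel) => {
      // Each relation links two arguments; look for one whose entity
      // was typed as 'Danger' by the Knowledge Studio model.
      const dangerArg = rel.arguments.find((arg) =>
        arg.entities.some((ent) => ent.type === 'Danger'));
      return dangerArg
        ? { subtype: dangerArg.entities[0].text, severity: rel.score }
        : null;
    })
    .filter(Boolean);
  return { alert: hits.length > 0, hits };
}
```

In this sketch the relation score doubles as the severity signal, mirroring the idea that a stronger danger-to-object relation indicates a more severe incident.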
When you have completed this code pattern, you will understand how to:
- Stream audio to Speech to Text using a WebSocket
- Use Natural Language Understanding with a REST API
- Retrieve and parse text from Speech to Text using a REST API
- Integrate Speech to Text, Natural Language Understanding, and Knowledge Studio in a web app
- Use React components and a Node.js server
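One of the steps above, retrieving and parsing text from Speech to Text, can be sketched as a small helper. The response shape follows the STT service's documented JSON (an array of `results`, each with `alternatives`); the helper itself is illustrative, not the pattern's actual code.

```javascript
// Hypothetical helper: join the final (non-interim) transcripts from a
// Speech to Text recognition response into a single string.
function extractTranscript(sttResponse) {
  return (sttResponse.results || [])
    .filter((r) => r.final)
    .map((r) => r.alternatives[0].transcript.trim())
    .join(' ');
}
```

With interim results enabled on the WebSocket, non-final hypotheses arrive continuously; filtering on `final` keeps only the settled text that should be sent on to the NLU.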
NOTE: This code pattern includes instructions for running Watson services on IBM Cloud or with the Watson API Kit on IBM Cloud Pak for Data. Click here for more information about IBM Cloud Pak for Data.
Provision the following services:
- Speech to Text
- Natural Language Understanding
The instructions will depend on whether you are provisioning services using IBM Cloud Pak for Data or on IBM Cloud.
Click to expand one:
IBM Cloud Pak for Data
Use the following instructions for each of the two services.
The services are not available by default. An administrator must install them on the IBM Cloud Pak for Data platform, and you must be given access to the service. To determine whether the service is installed, click the Services icon and check whether the service is enabled.
- For production use, create a user to use for authentication. From the main navigation menu (☰), select Administer > Manage users and then + New user.
- From the main navigation menu (☰), select My instances.
- On the Provisioned instances tab, find your service instance, and then hover over the last column to find and click the ellipses icon. Choose View details.
- Copy the URL to use as the {SERVICE_NAME}_URL when you configure credentials.
- Optionally, copy the Bearer token to use in development testing only. It is not recommended to use the bearer token except during testing and development because that token does not expire.
- Use the Menu and select Users and + Add user to grant your user access to this service instance. This is the user name (and password) you will use when you configure credentials to allow the Node.js server to authenticate.
IBM Cloud
- If you do not have an IBM Cloud account, register for a free trial account here.
- Click here to create a Speech to Text instance.
- Click here to create a Natural Language Understanding instance.
- From the main navigation menu (☰), select Resource list to find your services under Services.
- Click on each service to find the Manage view where you can collect the API Key and URL to use for each service when you configure credentials.
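When you configure credentials for a local run, the server typically reads them from a `.env` file. The variable names below follow the ibm-watson Node.js SDK convention and are an assumption; check the pattern's sample environment file for the exact names.

```bash
# Hypothetical .env sketch -- variable names follow the ibm-watson SDK
# convention and may differ from the pattern's actual sample file.
SPEECH_TO_TEXT_APIKEY=<your Speech to Text API key>
SPEECH_TO_TEXT_URL=<your Speech to Text URL>
NATURAL_LANGUAGE_UNDERSTANDING_APIKEY=<your NLU API key>
NATURAL_LANGUAGE_UNDERSTANDING_URL=<your NLU URL>
```

For IBM Cloud Pak for Data, you would instead supply the username, password, and service URL collected in the Cloud Pak instructions above.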
If you would like to run this pattern locally, rather than deploying the app to the cloud, all you need is VSCode, Node.js, and your IBM service credentials.
- VSCode
- Node v12.18.2
- Run `npm install` in the VSCode terminal
NOTE: The app was developed using Chrome on macOS. Browser compatibility issues are still being worked out.
1. **Browse to your app URL**
   - Use the URL provided at the end of your selected deployment option.
2. **Select the language of your choice**
   - The drop-down will be populated with models supported by your Speech to Text service.
3. **Use the Speech to Text toggle**
   - Use the **Speak Here** button (which becomes **Stop Listening**) to begin recording audio and streaming it to Speech to Text. Press the button again to stop listening/streaming.
4. **Use the Detect Danger toggle**
   - Use the **Detect Danger** button (which becomes **Detecting Danger**) to begin running the NLU and executing danger analysis. Press the button again to stop once output has been printed.
5. **Reset the transcribed text**
   - The transcribed text will be cleared when you do any of the following:
     - Press **Speech to Text** to restart listening
     - Refresh the page
With additional time and resources, here are some features that could be implemented to supplement the current code pattern:

- **Maps API** - Parse the location of the danger incident using the ‘Entities’ NLU feature and pin the location on a visual map.
- **Database** - Store voice inputs in a database that can be used to periodically retrain and improve the model over time.
- **User Log** - Using the ‘Concepts’ NLU feature, summarize important information like location, time, and environment from the voice input. This summary can be passed on to first responders so they are prepared even before they approach the danger.
- **Diversify Input Sources** - Scrape social media (Twitter, Facebook) and other miscellaneous sources (e.g., police radio) for danger statements and analyze them accordingly.
This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.