Add C++ microphone examples for audio tagging (#749)
csukuangfj authored Apr 10, 2024
1 parent f20291c commit 042976e
Showing 24 changed files with 706 additions and 60 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test-build-wheel.yaml
@@ -89,7 +89,7 @@ jobs:
export PATH=/c/hostedtoolcache/windows/Python/3.8.10/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.9.13/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.10.11/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.11.8/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.11.9/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.12.2/x64/bin:$PATH
which sherpa-onnx
2 changes: 1 addition & 1 deletion .github/workflows/test-pip-install.yaml
@@ -67,7 +67,7 @@ jobs:
export PATH=/c/hostedtoolcache/windows/Python/3.8.10/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.9.13/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.10.11/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.11.8/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.11.9/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.12.2/x64/bin:$PATH
sherpa-onnx --help
33 changes: 29 additions & 4 deletions README.md
@@ -2,23 +2,48 @@

This repository supports running the following functions **locally**

- Speech-to-text (i.e., ASR)
- Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
- Text-to-speech (i.e., TTS)
- Speaker identification
- Speaker verification
- Spoken language identification
- Audio tagging
- VAD (e.g., [silero-vad](https://github.com/snakers4/silero-vad))

on the following platforms and operating systems:

- Linux, macOS, Windows
- Android
- x86, ``x86_64``, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64)
- Linux, macOS, Windows, openKylin
- Android, WearOS
- iOS
- Raspberry Pi
- NodeJS
- WebAssembly
- [Raspberry Pi](https://www.raspberrypi.com/)
- [RV1126](https://www.rock-chips.com/uploads/pdf/2022.8.26/191/RV1126%20Brief%20Datasheet.pdf)
- [LicheePi4A](https://sipeed.com/licheepi4a)
- [VisionFive 2](https://www.starfivetech.com/en/site/boards)
- [旭日X3派](https://developer.horizon.ai/api/v1/fileData/documents_pi/index.html)
- etc

with the following APIs

- C++
- C
- Python
- Go
- ``C#``
- JavaScript
- Java
- Kotlin
- Swift

# Useful links

- Documentation: https://k2-fsa.github.io/sherpa/onnx/
- APK for the text-to-speech engine: https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html
- APK for speaker identification: https://k2-fsa.github.io/sherpa/onnx/speaker-identification/apk.html
- APK for speech recognition: https://github.com/k2-fsa/sherpa-onnx/releases/
- Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi

# How to reach us

16 changes: 12 additions & 4 deletions android/README.md
@@ -7,14 +7,22 @@ for usage.
- [SherpaOnnx](./SherpaOnnx) It uses a streaming ASR model.

- [SherpaOnnx2Pass](./SherpaOnnx2Pass) It uses a streaming ASR model
for the first pass and uses a non-streaming ASR model for the second pass.
for the first pass and uses a non-streaming ASR model for the second pass

- [SherpaOnnxVad](./SherpaOnnxVad) It demonstrates how to use a VAD
- [SherpaOnnxKws](./SherpaOnnxKws) It demonstrates how to use keyword spotting

- [SherpaOnnxVadAsr](./SherpaOnnxVadAsr) It uses a VAD with a non-streaming
ASR model.
- [SherpaOnnxSpeakerIdentification](./SherpaOnnxSpeakerIdentification) It demonstrates
how to use speaker identification

- [SherpaOnnxTts](./SherpaOnnxTts) It is for standalone text-to-speech.

- [SherpaOnnxTtsEngine](./SherpaOnnxTtsEngine) It is for text-to-speech engine;
you can use it to replace the system TTS engine.

- [SherpaOnnxVad](./SherpaOnnxVad) It demonstrates how to use a VAD

- [SherpaOnnxVadAsr](./SherpaOnnxVadAsr) It uses a VAD with a non-streaming
ASR model.

- [SherpaOnnxWebSocket](./SherpaOnnxWebSocket) It shows how to write a websocket
client for the Python streaming websocket server.
2 changes: 1 addition & 1 deletion c-api-examples/asr-microphone-example/c-api-alsa.cc
@@ -99,7 +99,7 @@ card 3: UACDemoV10 [UACDemoV1.0], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
and if you want to select card 3 and the device 0 on that card, please use:
and if you want to select card 3 and device 0 on that card, please use:
plughw:3,0
2 changes: 2 additions & 0 deletions cmake/cmake_extension.py
@@ -50,6 +50,7 @@ def get_binaries():
"sherpa-onnx-keyword-spotter",
"sherpa-onnx-microphone",
"sherpa-onnx-microphone-offline",
"sherpa-onnx-microphone-offline-audio-tagging",
"sherpa-onnx-microphone-offline-speaker-identification",
"sherpa-onnx-offline",
"sherpa-onnx-offline-language-identification",
@@ -69,6 +70,7 @@ def get_binaries():
"sherpa-onnx-alsa-offline-speaker-identification",
"sherpa-onnx-offline-tts-play-alsa",
"sherpa-onnx-vad-alsa",
"sherpa-onnx-alsa-offline-audio-tagging",
]

if is_windows():
@@ -123,7 +123,7 @@ def get_args():
Subdevices: 1/1
Subdevice #0: subdevice #0
and if you want to select card 3 and the device 0 on that card, please use:
and if you want to select card 3 and device 0 on that card, please use:
plughw:3,0
2 changes: 1 addition & 1 deletion python-api-examples/vad-alsa.py
@@ -39,7 +39,7 @@ def get_args():
Subdevices: 1/1
Subdevice #0: subdevice #0
and if you want to select card 3 and the device 0 on that card, please use:
and if you want to select card 3 and device 0 on that card, please use:
plughw:3,0
2 changes: 1 addition & 1 deletion python-api-examples/vad-remove-non-speech-segments-alsa.py
@@ -68,7 +68,7 @@ def get_args():
Subdevices: 1/1
Subdevice #0: subdevice #0
and if you want to select card 3 and the device 0 on that card, please use:
and if you want to select card 3 and device 0 on that card, please use:
plughw:3,0
8 changes: 8 additions & 0 deletions sherpa-onnx/csrc/CMakeLists.txt
@@ -264,6 +264,7 @@ if(SHERPA_ONNX_HAS_ALSA AND SHERPA_ONNX_ENABLE_BINARY)
add_executable(sherpa-onnx-alsa-offline sherpa-onnx-alsa-offline.cc alsa.cc)
add_executable(sherpa-onnx-alsa-offline-speaker-identification sherpa-onnx-alsa-offline-speaker-identification.cc alsa.cc)
add_executable(sherpa-onnx-vad-alsa sherpa-onnx-vad-alsa.cc alsa.cc)
add_executable(sherpa-onnx-alsa-offline-audio-tagging sherpa-onnx-alsa-offline-audio-tagging.cc alsa.cc)


if(SHERPA_ONNX_ENABLE_TTS)
@@ -276,6 +277,7 @@ if(SHERPA_ONNX_HAS_ALSA AND SHERPA_ONNX_ENABLE_BINARY)
sherpa-onnx-alsa-offline-speaker-identification
sherpa-onnx-keyword-spotter-alsa
sherpa-onnx-vad-alsa
sherpa-onnx-alsa-offline-audio-tagging
)

if(SHERPA_ONNX_ENABLE_TTS)
@@ -354,6 +356,11 @@ if(SHERPA_ONNX_ENABLE_PORTAUDIO AND SHERPA_ONNX_ENABLE_BINARY)
microphone.cc
)

add_executable(sherpa-onnx-microphone-offline-audio-tagging
sherpa-onnx-microphone-offline-audio-tagging.cc
microphone.cc
)

if(BUILD_SHARED_LIBS)
set(PA_LIB portaudio)
else()
@@ -365,6 +372,7 @@ if(SHERPA_ONNX_ENABLE_PORTAUDIO AND SHERPA_ONNX_ENABLE_BINARY)
sherpa-onnx-keyword-spotter-microphone
sherpa-onnx-microphone-offline
sherpa-onnx-microphone-offline-speaker-identification
sherpa-onnx-microphone-offline-audio-tagging
sherpa-onnx-vad-microphone
sherpa-onnx-vad-microphone-offline-asr
)
2 changes: 1 addition & 1 deletion sherpa-onnx/csrc/alsa.cc
@@ -35,7 +35,7 @@ card 3: UACDemoV10 [UACDemoV1.0], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
and if you want to select card 3 and the device 0 on that card, please use:
and if you want to select card 3 and device 0 on that card, please use:
plughw:3,0
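
The help text above maps a line of `arecord -l` output to an ALSA device name of the form `plughw:<card>,<device>`. A minimal sketch of that mapping follows; `PlughwNameFromArecordLine` is a hypothetical helper written for illustration and is not part of sherpa-onnx or ALSA. It assumes the line has the `card N: ..., device M: ...` shape shown in the help text.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not part of sherpa-onnx or ALSA.
// Parses one "card N: ..., device M: ..." line printed by `arecord -l`
// and returns the matching "plughw:N,M" name, or an empty string if the
// line does not have that shape.
std::string PlughwNameFromArecordLine(const std::string &line) {
  int card = -1;
  int device = -1;
  // %*[^,] skips the card description up to (not including) the comma.
  if (std::sscanf(line.c_str(), "card %d: %*[^,], device %d", &card,
                  &device) != 2) {
    return "";
  }
  return "plughw:" + std::to_string(card) + "," + std::to_string(device);
}
```

Passing the example line `card 3: UACDemoV10 [UACDemoV1.0], device 0: USB Audio [USB Audio]` should give the `plughw:3,0` name that the help text recommends; lines such as `Subdevices: 1/1` do not match and give an empty string.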
190 changes: 190 additions & 0 deletions sherpa-onnx/csrc/sherpa-onnx-alsa-offline-audio-tagging.cc
@@ -0,0 +1,190 @@
// sherpa-onnx/csrc/sherpa-onnx-alsa-offline-audio-tagging.cc
//
// Copyright (c) 2022-2024 Xiaomi Corporation

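#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

#include <algorithm>
#include <atomic>
#include <chrono>  // NOLINT
#include <mutex>   // NOLINT
#include <sstream>
#include <string>
#include <thread>  // NOLINT
#include <vector>

#include "sherpa-onnx/csrc/alsa.h"
#include "sherpa-onnx/csrc/audio-tagging.h"
#include "sherpa-onnx/csrc/macros.h"

enum class State {
  kIdle,
  kRecording,
  kDecoding,
};

// Both flags are read and written by several threads (key press, recording,
// main), so they are atomic to avoid a data race.
std::atomic<State> state{State::kIdle};

// true to stop the program and exit
std::atomic<bool> stop{false};

// Samples captured so far; guarded by samples_mutex
std::vector<float> samples;
std::mutex samples_mutex;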

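static void DetectKeyPress() {
  SHERPA_ONNX_LOGE("Press Enter to start");
  int32_t key;
  // getchar() returns EOF (a non-zero value) once stdin is closed, so compare
  // against EOF explicitly; otherwise the loop would spin forever.
  while (!stop && (key = getchar()) != EOF) {
    if (key != 0x0a) {  // react only to Enter ('\n')
      continue;
    }

    switch (state) {
      case State::kIdle:
        SHERPA_ONNX_LOGE("Start recording. Press Enter to stop recording");
        state = State::kRecording;
        {
          // Discard whatever was captured while idle
          std::lock_guard<std::mutex> lock(samples_mutex);
          samples.clear();
        }
        break;
      case State::kRecording:
        SHERPA_ONNX_LOGE("Stop recording. Decoding ...");
        state = State::kDecoding;
        break;
      case State::kDecoding:
        // Ignore Enter while the main thread is decoding
        break;
    }
  }
}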

static void Record(const char *device_name, int32_t expected_sample_rate) {
  sherpa_onnx::Alsa alsa(device_name);

  if (alsa.GetExpectedSampleRate() != expected_sample_rate) {
    fprintf(stderr, "sample rate: %d != %d\n", alsa.GetExpectedSampleRate(),
            expected_sample_rate);
    exit(-1);
  }

  // Read 0.1 seconds of audio per call
  int32_t chunk = 0.1 * alsa.GetActualSampleRate();
  while (!stop) {
    const std::vector<float> &s = alsa.Read(chunk);
    std::lock_guard<std::mutex> lock(samples_mutex);
    samples.insert(samples.end(), s.begin(), s.end());
  }
}

static void Handler(int32_t sig) {
  stop = true;
  fprintf(stderr, "\nCaught Ctrl + C. Press Enter to exit\n");
}

int32_t main(int32_t argc, char *argv[]) {
  signal(SIGINT, Handler);

  const char *kUsageMessage = R"usage(
Audio tagging from microphone (Linux only).
Usage:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/audio-tagging-models/sherpa-onnx-zipformer-audio-tagging-2024-04-09.tar.bz2
tar xvf sherpa-onnx-zipformer-audio-tagging-2024-04-09.tar.bz2
rm sherpa-onnx-zipformer-audio-tagging-2024-04-09.tar.bz2
./bin/sherpa-onnx-alsa-offline-audio-tagging \
--zipformer-model=./sherpa-onnx-zipformer-audio-tagging-2024-04-09/model.onnx \
--labels=./sherpa-onnx-zipformer-audio-tagging-2024-04-09/class_labels_indices.csv \
device_name
Please refer to
https://github.com/k2-fsa/sherpa-onnx/releases/tag/audio-tagging-models
for a list of pre-trained models to download.
The device name specifies which microphone to use in case there are several
on your system. You can use
arecord -l
to find all available microphones on your computer. For instance, if it outputs
**** List of CAPTURE Hardware Devices ****
card 3: UACDemoV10 [UACDemoV1.0], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
and if you want to select card 3 and device 0 on that card, please use:
plughw:3,0
as the device_name.
)usage";

  sherpa_onnx::ParseOptions po(kUsageMessage);
  sherpa_onnx::AudioTaggingConfig config;
  config.Register(&po);

  po.Read(argc, argv);
  if (po.NumArgs() != 1) {
    fprintf(stderr, "Please provide exactly 1 argument: the device name\n");
    po.PrintUsage();
    exit(EXIT_FAILURE);
  }

  fprintf(stderr, "%s\n", config.ToString().c_str());

  if (!config.Validate()) {
    fprintf(stderr, "Errors in config!\n");
    return -1;
  }

  SHERPA_ONNX_LOGE("Creating audio tagger ...");
  sherpa_onnx::AudioTagging tagger(config);
  SHERPA_ONNX_LOGE("Audio tagger created!");

  std::string device_name = po.GetArg(1);
  fprintf(stderr, "Use recording device: %s\n", device_name.c_str());

  int32_t sample_rate = 16000;  // fixed to 16000Hz for all models from icefall

  std::thread t2(Record, device_name.c_str(), sample_rate);
  using namespace std::chrono_literals;  // NOLINT
  std::this_thread::sleep_for(100ms);    // sleep for 100ms
  std::thread t(DetectKeyPress);

  while (!stop) {
    switch (state) {
      case State::kIdle:
        break;
      case State::kRecording:
        break;
      case State::kDecoding: {
        std::vector<float> buf;
        {
          std::lock_guard<std::mutex> lock(samples_mutex);
          buf = std::move(samples);
        }
        SHERPA_ONNX_LOGE("Computing...");
        auto s = tagger.CreateStream();
        s->AcceptWaveform(sample_rate, buf.data(), buf.size());
        auto results = tagger.Compute(s.get());
        SHERPA_ONNX_LOGE("Result is:");

        int32_t i = 0;
        std::ostringstream os;
        for (const auto &event : results) {
          os << i << ": " << event.ToString() << "\n";
          i += 1;
        }

        SHERPA_ONNX_LOGE("\n%s\n", os.str().c_str());

        state = State::kIdle;
        SHERPA_ONNX_LOGE("Press Enter to start");
        break;
      }
    }

    std::this_thread::sleep_for(20ms);  // sleep for 20ms
  }
  t.join();
  t2.join();

  return 0;
}
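
The control flow of the example above reduces to two pieces: a three-state cycle driven by the Enter key (idle, recording, decoding) and a mutex-guarded sample buffer that the recording thread fills and the main thread drains. The sketch below isolates those two pieces so they can be exercised without ALSA or a model. `OnEnter` and `SampleBuffer` are hypothetical names introduced here for illustration; they do not exist in sherpa-onnx.

```cpp
#include <mutex>
#include <vector>

// Hypothetical names throughout; a sketch, not part of sherpa-onnx.
enum class State { kIdle, kRecording, kDecoding };

// Returns the state that follows `s` when Enter is pressed.
// Enter is ignored while decoding; the decoder itself moves back to kIdle.
State OnEnter(State s) {
  switch (s) {
    case State::kIdle:
      return State::kRecording;
    case State::kRecording:
      return State::kDecoding;
    case State::kDecoding:
      return State::kDecoding;
  }
  return s;
}

// Thread-safe buffer: one thread appends chunks, another takes everything.
class SampleBuffer {
 public:
  void Append(const std::vector<float> &chunk) {
    std::lock_guard<std::mutex> lock(mutex_);
    samples_.insert(samples_.end(), chunk.begin(), chunk.end());
  }

  void Clear() {
    std::lock_guard<std::mutex> lock(mutex_);
    samples_.clear();
  }

  // Hands over all buffered samples and leaves the buffer empty.
  std::vector<float> Take() {
    std::vector<float> out;
    std::lock_guard<std::mutex> lock(mutex_);
    out.swap(samples_);
    return out;
  }

 private:
  std::mutex mutex_;
  std::vector<float> samples_;
};
```

`Take` uses `swap` rather than a move assignment so that the internal buffer is guaranteed to be empty afterwards; a moved-from `std::vector` is only guaranteed to be in a valid but unspecified state. Holding the lock only for the swap keeps the recording thread from blocking while the tagger runs.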