Add C++ microphone examples for audio tagging #749

Merged: 8 commits, Apr 10, 2024
2 changes: 1 addition & 1 deletion .github/workflows/test-build-wheel.yaml
@@ -89,7 +89,7 @@ jobs:
export PATH=/c/hostedtoolcache/windows/Python/3.8.10/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.9.13/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.10.11/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.11.8/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.11.9/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.12.2/x64/bin:$PATH

which sherpa-onnx
2 changes: 1 addition & 1 deletion .github/workflows/test-pip-install.yaml
@@ -67,7 +67,7 @@ jobs:
export PATH=/c/hostedtoolcache/windows/Python/3.8.10/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.9.13/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.10.11/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.11.8/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.11.9/x64/bin:$PATH
export PATH=/c/hostedtoolcache/windows/Python/3.12.2/x64/bin:$PATH

sherpa-onnx --help
33 changes: 29 additions & 4 deletions README.md
@@ -2,23 +2,48 @@

This repository supports running the following functions **locally**

- Speech-to-text (i.e., ASR)
- Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
- Text-to-speech (i.e., TTS)
- Speaker identification
- Speaker verification
- Spoken language identification
- Audio tagging
- VAD (e.g., [silero-vad](https://github.com/snakers4/silero-vad))

on the following platforms and operating systems:

- Linux, macOS, Windows
- Android
- x86, ``x86_64``, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64)
- Linux, macOS, Windows, openKylin
- Android, WearOS
- iOS
- Raspberry Pi
- NodeJS
- WebAssembly
- [Raspberry Pi](https://www.raspberrypi.com/)
- [RV1126](https://www.rock-chips.com/uploads/pdf/2022.8.26/191/RV1126%20Brief%20Datasheet.pdf)
- [LicheePi4A](https://sipeed.com/licheepi4a)
- [VisionFive 2](https://www.starfivetech.com/en/site/boards)
- [旭日X3派](https://developer.horizon.ai/api/v1/fileData/documents_pi/index.html)
- etc.

with the following APIs

- C++
- C
- Python
- Go
- ``C#``
- JavaScript
- Java
- Kotlin
- Swift

# Useful links

- Documentation: https://k2-fsa.github.io/sherpa/onnx/
- APK for the text-to-speech engine: https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html
- APK for speaker identification: https://k2-fsa.github.io/sherpa/onnx/speaker-identification/apk.html
- APK for speech recognition: https://github.com/k2-fsa/sherpa-onnx/releases/
- Bilibili demo videos (演示视频): https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi

# How to reach us

16 changes: 12 additions & 4 deletions android/README.md
@@ -7,14 +7,22 @@ for usage.
- [SherpaOnnx](./SherpaOnnx) It uses a streaming ASR model.

- [SherpaOnnx2Pass](./SherpaOnnx2Pass) It uses a streaming ASR model
for the first pass and uses a non-streaming ASR model for the second pass.
for the first pass and uses a non-streaming ASR model for the second pass

- [SherpaOnnxVad](./SherpaOnnxVad) It demonstrates how to use a VAD
- [SherpaOnnxKws](./SherpaOnnxKws) It demonstrates how to use keyword spotting

- [SherpaOnnxVadAsr](./SherpaOnnxVadAsr) It uses a VAD with a non-streaming
ASR model.
- [SherpaOnnxSpeakerIdentification](./SherpaOnnxSpeakerIdentification) It demonstrates
how to use speaker identification

- [SherpaOnnxTts](./SherpaOnnxTts) It is for standalone text-to-speech.

- [SherpaOnnxTtsEngine](./SherpaOnnxTtsEngine) It is for text-to-speech engine;
you can use it to replace the system TTS engine.

- [SherpaOnnxVad](./SherpaOnnxVad) It demonstrates how to use a VAD

- [SherpaOnnxVadAsr](./SherpaOnnxVadAsr) It uses a VAD with a non-streaming
ASR model.

- [SherpaOnnxWebSocket](./SherpaOnnxWebSocket) It shows how to write a websocket
client for the Python streaming websocket server.
2 changes: 1 addition & 1 deletion c-api-examples/asr-microphone-example/c-api-alsa.cc
@@ -99,7 +99,7 @@ card 3: UACDemoV10 [UACDemoV1.0], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0

and if you want to select card 3 and the device 0 on that card, please use:
and if you want to select card 3 and device 0 on that card, please use:

plughw:3,0

2 changes: 2 additions & 0 deletions cmake/cmake_extension.py
@@ -50,6 +50,7 @@ def get_binaries():
"sherpa-onnx-keyword-spotter",
"sherpa-onnx-microphone",
"sherpa-onnx-microphone-offline",
"sherpa-onnx-microphone-offline-audio-tagging",
"sherpa-onnx-microphone-offline-speaker-identification",
"sherpa-onnx-offline",
"sherpa-onnx-offline-language-identification",
@@ -69,6 +70,7 @@ def get_binaries():
"sherpa-onnx-alsa-offline-speaker-identification",
"sherpa-onnx-offline-tts-play-alsa",
"sherpa-onnx-vad-alsa",
"sherpa-onnx-alsa-offline-audio-tagging",
]

if is_windows():
@@ -123,7 +123,7 @@ def get_args():
Subdevices: 1/1
Subdevice #0: subdevice #0

and if you want to select card 3 and the device 0 on that card, please use:
and if you want to select card 3 and device 0 on that card, please use:

plughw:3,0

2 changes: 1 addition & 1 deletion python-api-examples/vad-alsa.py
@@ -39,7 +39,7 @@ def get_args():
Subdevices: 1/1
Subdevice #0: subdevice #0

and if you want to select card 3 and the device 0 on that card, please use:
and if you want to select card 3 and device 0 on that card, please use:

plughw:3,0

2 changes: 1 addition & 1 deletion python-api-examples/vad-remove-non-speech-segments-alsa.py
@@ -68,7 +68,7 @@ def get_args():
Subdevices: 1/1
Subdevice #0: subdevice #0

and if you want to select card 3 and the device 0 on that card, please use:
and if you want to select card 3 and device 0 on that card, please use:

plughw:3,0

8 changes: 8 additions & 0 deletions sherpa-onnx/csrc/CMakeLists.txt
@@ -264,6 +264,7 @@ if(SHERPA_ONNX_HAS_ALSA AND SHERPA_ONNX_ENABLE_BINARY)
add_executable(sherpa-onnx-alsa-offline sherpa-onnx-alsa-offline.cc alsa.cc)
add_executable(sherpa-onnx-alsa-offline-speaker-identification sherpa-onnx-alsa-offline-speaker-identification.cc alsa.cc)
add_executable(sherpa-onnx-vad-alsa sherpa-onnx-vad-alsa.cc alsa.cc)
add_executable(sherpa-onnx-alsa-offline-audio-tagging sherpa-onnx-alsa-offline-audio-tagging.cc alsa.cc)


if(SHERPA_ONNX_ENABLE_TTS)
@@ -276,6 +277,7 @@ if(SHERPA_ONNX_HAS_ALSA AND SHERPA_ONNX_ENABLE_BINARY)
sherpa-onnx-alsa-offline-speaker-identification
sherpa-onnx-keyword-spotter-alsa
sherpa-onnx-vad-alsa
sherpa-onnx-alsa-offline-audio-tagging
)

if(SHERPA_ONNX_ENABLE_TTS)
@@ -354,6 +356,11 @@ if(SHERPA_ONNX_ENABLE_PORTAUDIO AND SHERPA_ONNX_ENABLE_BINARY)
microphone.cc
)

add_executable(sherpa-onnx-microphone-offline-audio-tagging
sherpa-onnx-microphone-offline-audio-tagging.cc
microphone.cc
)

if(BUILD_SHARED_LIBS)
set(PA_LIB portaudio)
else()
@@ -365,6 +372,7 @@ if(SHERPA_ONNX_ENABLE_PORTAUDIO AND SHERPA_ONNX_ENABLE_BINARY)
sherpa-onnx-keyword-spotter-microphone
sherpa-onnx-microphone-offline
sherpa-onnx-microphone-offline-speaker-identification
sherpa-onnx-microphone-offline-audio-tagging
sherpa-onnx-vad-microphone
sherpa-onnx-vad-microphone-offline-asr
)
2 changes: 1 addition & 1 deletion sherpa-onnx/csrc/alsa.cc
@@ -35,7 +35,7 @@ card 3: UACDemoV10 [UACDemoV1.0], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0

and if you want to select card 3 and the device 0 on that card, please use:
and if you want to select card 3 and device 0 on that card, please use:

plughw:3,0

190 changes: 190 additions & 0 deletions sherpa-onnx/csrc/sherpa-onnx-alsa-offline-audio-tagging.cc
@@ -0,0 +1,190 @@
// sherpa-onnx/csrc/sherpa-onnx-alsa-offline-audio-tagging.cc
//
// Copyright (c)  2022-2024  Xiaomi Corporation

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

#include <algorithm>
#include <atomic>
#include <chrono>  // NOLINT
#include <mutex>   // NOLINT
#include <sstream>
#include <string>
#include <thread>  // NOLINT
#include <utility>
#include <vector>

#include "sherpa-onnx/csrc/alsa.h"
#include "sherpa-onnx/csrc/audio-tagging.h"
#include "sherpa-onnx/csrc/macros.h"

enum class State {
  kIdle,
  kRecording,
  kDecoding,
};

// shared between the main loop and the key-press thread
std::atomic<State> state(State::kIdle);

// true to stop the program and exit; also set from the signal handler
std::atomic<bool> stop(false);

std::vector<float> samples;
std::mutex samples_mutex;

static void DetectKeyPress() {
  SHERPA_ONNX_LOGE("Press Enter to start");
  int32_t key;
  while (!stop && (key = getchar()) != EOF) {
    if (key != 0x0a) {
      continue;
    }

    switch (state) {
      case State::kIdle:
        SHERPA_ONNX_LOGE("Start recording. Press Enter to stop recording");
        state = State::kRecording;
        {
          std::lock_guard<std::mutex> lock(samples_mutex);
          samples.clear();
        }
        break;
      case State::kRecording:
        SHERPA_ONNX_LOGE("Stop recording. Decoding ...");
        state = State::kDecoding;
        break;
      case State::kDecoding:
        break;
    }
  }
}

static void Record(const char *device_name, int32_t expected_sample_rate) {
  sherpa_onnx::Alsa alsa(device_name);

  if (alsa.GetExpectedSampleRate() != expected_sample_rate) {
    fprintf(stderr, "sample rate: %d != %d\n", alsa.GetExpectedSampleRate(),
            expected_sample_rate);
    exit(-1);
  }

  int32_t chunk = 0.1 * alsa.GetActualSampleRate();
  while (!stop) {
    const std::vector<float> &s = alsa.Read(chunk);
    std::lock_guard<std::mutex> lock(samples_mutex);
    samples.insert(samples.end(), s.begin(), s.end());
  }
}

static void Handler(int32_t sig) {
  stop = true;
  fprintf(stderr, "\nCaught Ctrl + C. Press Enter to exit\n");
}

int main(int argc, char *argv[]) {
  signal(SIGINT, Handler);

  const char *kUsageMessage = R"usage(
Audio tagging from microphone (Linux only).
Usage:

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/audio-tagging-models/sherpa-onnx-zipformer-audio-tagging-2024-04-09.tar.bz2
tar xvf sherpa-onnx-zipformer-audio-tagging-2024-04-09.tar.bz2
rm sherpa-onnx-zipformer-audio-tagging-2024-04-09.tar.bz2

./bin/sherpa-onnx-alsa-offline-audio-tagging \
  --zipformer-model=./sherpa-onnx-zipformer-audio-tagging-2024-04-09/model.onnx \
  --labels=./sherpa-onnx-zipformer-audio-tagging-2024-04-09/class_labels_indices.csv \
  device_name

Please refer to
https://github.com/k2-fsa/sherpa-onnx/releases/tag/audio-tagging-models
for a list of pre-trained models to download.

The device name specifies which microphone to use in case there are several
on your system. You can use

  arecord -l

to find all available microphones on your computer. For instance, if it outputs

**** List of CAPTURE Hardware Devices ****
card 3: UACDemoV10 [UACDemoV1.0], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

and if you want to select card 3 and device 0 on that card, please use:

  plughw:3,0

as the device_name.
)usage";

  sherpa_onnx::ParseOptions po(kUsageMessage);
  sherpa_onnx::AudioTaggingConfig config;
  config.Register(&po);

  po.Read(argc, argv);
  if (po.NumArgs() != 1) {
    fprintf(stderr, "Please provide only 1 argument: the device name\n");
    po.PrintUsage();
    exit(EXIT_FAILURE);
  }

  fprintf(stderr, "%s\n", config.ToString().c_str());

  if (!config.Validate()) {
    fprintf(stderr, "Errors in config!\n");
    return -1;
  }

  SHERPA_ONNX_LOGE("Creating audio tagger ...");
  sherpa_onnx::AudioTagging tagger(config);
  SHERPA_ONNX_LOGE("Audio tagger created!");

  std::string device_name = po.GetArg(1);
  fprintf(stderr, "Use recording device: %s\n", device_name.c_str());

  int32_t sample_rate = 16000;  // fixed to 16 kHz for all models from icefall

  std::thread t2(Record, device_name.c_str(), sample_rate);
  using namespace std::chrono_literals;  // NOLINT
  std::this_thread::sleep_for(100ms);  // give the recording thread time to start
  std::thread t(DetectKeyPress);

  while (!stop) {
    switch (state) {
      case State::kIdle:
        break;
      case State::kRecording:
        break;
      case State::kDecoding: {
        std::vector<float> buf;
        {
          std::lock_guard<std::mutex> lock(samples_mutex);
          buf = std::move(samples);
          samples.clear();  // leave the moved-from vector in a known state
        }
        SHERPA_ONNX_LOGE("Computing...");
        auto s = tagger.CreateStream();
        s->AcceptWaveform(sample_rate, buf.data(), buf.size());
        auto results = tagger.Compute(s.get());
        SHERPA_ONNX_LOGE("Result is:");

        int32_t i = 0;
        std::ostringstream os;
        for (const auto &event : results) {
          os << i << ": " << event.ToString() << "\n";
          i += 1;
        }

        SHERPA_ONNX_LOGE("\n%s\n", os.str().c_str());

        state = State::kIdle;
        SHERPA_ONNX_LOGE("Press Enter to start");
        break;
      }
    }

    std::this_thread::sleep_for(20ms);  // sleep for 20ms
  }
  t.join();
  t2.join();

  return 0;
}