Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Offline decode support multi threads #306

Merged
merged 9 commits into from
Sep 19, 2023

Conversation

keanucui
Copy link
Contributor

1 sherpa-onnx offline support pass wav_scp parameter and multi thread decoding;
2 fix a bug.

@@ -80,6 +80,7 @@ if(SHERPA_ONNX_ENABLE_CHECK)
endif()

add_library(sherpa-onnx-core ${sources})
target_link_libraries(sherpa-onnx-core -pthread)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
target_link_libraries(sherpa-onnx-core -pthread)
if(WIN32)
target_link_libraries(sherpa-onnx-core -pthread)
endif()

We don't need it for Windows, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if(not WIN32)

Conditional statement of if branch should be not WIN32 if it don't need it for Windows ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes you are right. Should be

if(NOT WIN32)
...
endif()

@@ -4,14 +4,139 @@

#include <stdio.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you create a new file sherpa-onnx-offline-parallel.cc and keep sherpa-onnx-offline.cc untouched?

std::mutex mtx;

std::vector<std::vector<std::string>> split_to_batchsize(
std::vector<std::string> input, int batch_size) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::vector<std::string> input, int batch_size) {
const std::vector<std::string> &input, int32_t batch_size) {

std::vector<std::string> input, int batch_size) {
std::vector<std::vector<std::string>> outputs;
auto itr = input.cbegin();
int process_num = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int process_num = 0;
int32_t process_num = 0;

auto itr = input.cbegin();
int process_num = 0;

while (process_num + batch_size <= static_cast<unsigned>(input.size())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while (process_num + batch_size <= static_cast<unsigned>(input.size())) {
while (process_num + batch_size <= static_cast<int32_t>(input.size())) {

std::istringstream iss(line);
std::string column1, column2;
iss >> column1 >> column2;
wav_paths.emplace_back(column2);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
wav_paths.emplace_back(column2);
wav_paths.emplace_back(std::move(column2));

while(std::getline(in, line))
{
std::istringstream iss(line);
std::string column1, column2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move colum1 and column2 to the outside of this while loop.

Also, please use clang-format to reformat your code.

}


void asr_inference(std::vector<std::vector<std::string>> chunk_wav_paths,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void asr_inference(std::vector<std::vector<std::string>> chunk_wav_paths,
void asr_inference(const std::vector<std::vector<std::string>> &chunk_wav_paths,

Comment on lines 88 to 89
std::vector<sherpa_onnx::OfflineStream *>().swap(ss_pointers);
std::vector<std::unique_ptr<sherpa_onnx::OfflineStream>>().swap(ss);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::vector<sherpa_onnx::OfflineStream *>().swap(ss_pointers);
std::vector<std::unique_ptr<sherpa_onnx::OfflineStream>>().swap(ss);
ss_pointers.clear();
ss.clear();

if (chunk >= chunk_wav_paths.size()) {
break;
}
auto wav_paths = chunk_wav_paths[chunk];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto wav_paths = chunk_wav_paths[chunk];
const auto &wav_paths = chunk_wav_paths[chunk];

@keanucui
Copy link
Contributor Author

@csukuangfj I pulled the new version and there are still some errors in CI about websocket. Would you check it, please?

@csukuangfj
Copy link
Collaborator

@csukuangfj I pulled the new version and there are still some errors in CI about websocket. Would you check it, please?

Please just ignore them as long as there are similar tests that don't fail.

@@ -0,0 +1,292 @@
// sherpa-onnx/csrc/sherpa-onnx-offline.cc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// sherpa-onnx/csrc/sherpa-onnx-offline.cc
// sherpa-onnx/csrc/sherpa-onnx-offline-parallel.cc

@@ -0,0 +1,292 @@
// sherpa-onnx/csrc/sherpa-onnx-offline.cc
//
// Copyright (c) 2022-2023 Xiaomi Corporation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add your own information here if possible.

std::ifstream in(wav_scp_path);
if (!in.is_open()) {
fprintf(stderr, "Failed to open file: %s.\n", wav_scp_path.c_str());
return wav_paths;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return wav_paths;
exit(-1);

We can just exit on errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when return a empty vector, the main function will exit normally. may be it's a better way?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, it's up to you.

iss >> column1 >> column2;
wav_paths.emplace_back(std::move(column2));
}
in.close();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
in.close();

Not necessary as it is called automatically in the destructor
of std::ifstream.

const std::vector<float> samples =
sherpa_onnx::ReadWave(wav_filename, &sampling_rate, &is_ok);
if (!is_ok) {
fprintf(stderr, "Failed to read %s\n", wav_filename.c_str());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fprintf(stderr, "Failed to read %s\n", wav_filename.c_str());
fprintf(stderr, "Failed to read %s\n", wav_filename.c_str());
continue;

Use continue to skip this wave.

const std::vector<float> samples =
sherpa_onnx::ReadWave(wav_filename, &sampling_rate, &is_ok);
if (!is_ok) {
fprintf(stderr, "Failed to read %s\n", wav_filename.c_str());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fprintf(stderr, "Failed to read %s\n", wav_filename.c_str());
fprintf(stderr, "Failed to read %s\n", wav_filename.c_str());
continue;


See https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/index.html

./bin/sherpa-onnx-offline \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the help information

sherpa_onnx::ParseOptions po(kUsageMessage);
sherpa_onnx::OfflineRecognizerConfig config;
config.Register(&po);
po.Register("wav-scp", &wav_scp,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update help to include --wav-scp and --batch-size

std::atomic<int> wav_index(0);
std::mutex mtx;

std::vector<std::vector<std::string>> split_to_batchsize(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::vector<std::vector<std::string>> split_to_batchsize(
std::vector<std::vector<std::string>> SplitToBatches(

return outputs;
}

std::vector<std::string> load_scp_file(const std::string &wav_scp_path) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::vector<std::string> load_scp_file(const std::string &wav_scp_path) {
std::vector<std::string> LoadScpFile(const std::string &wav_scp_path) {

return wav_paths;
}

void asr_inference(const std::vector<std::vector<std::string>> &chunk_wav_paths,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void asr_inference(const std::vector<std::vector<std::string>> &chunk_wav_paths,
void AsrInference(const std::vector<std::vector<std::string>> &chunk_wav_paths,

@csukuangfj
Copy link
Collaborator

Thank you for your contribution!

@csukuangfj csukuangfj merged commit bd173b2 into k2-fsa:master Sep 19, 2023
133 of 142 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants