-
Notifications
You must be signed in to change notification settings - Fork 445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Offline decode support multi threads #306
Conversation
sherpa-onnx/csrc/CMakeLists.txt
Outdated
@@ -80,6 +80,7 @@ if(SHERPA_ONNX_ENABLE_CHECK) | |||
endif() | |||
|
|||
add_library(sherpa-onnx-core ${sources}) | |||
target_link_libraries(sherpa-onnx-core -pthread) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
target_link_libraries(sherpa-onnx-core -pthread) | |
if(WIN32) | |
target_link_libraries(sherpa-onnx-core -pthread) | |
endif() |
We don't need it for Windows, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if(not WIN32)
Conditional statement of if branch should be not WIN32
if it don't need it for Windows ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes you are right. Should be
if(NOT WIN32)
...
endif()
@@ -4,14 +4,139 @@ | |||
|
|||
#include <stdio.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you create a new file sherpa-onnx-offline-parallel.cc
and keep sherpa-onnx-offline.cc
untouched?
std::mutex mtx; | ||
|
||
std::vector<std::vector<std::string>> split_to_batchsize( | ||
std::vector<std::string> input, int batch_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector<std::string> input, int batch_size) { | |
const std::vector<std::string> &input, int32_t batch_size) { |
std::vector<std::string> input, int batch_size) { | ||
std::vector<std::vector<std::string>> outputs; | ||
auto itr = input.cbegin(); | ||
int process_num = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int process_num = 0; | |
int32_t process_num = 0; |
auto itr = input.cbegin(); | ||
int process_num = 0; | ||
|
||
while (process_num + batch_size <= static_cast<unsigned>(input.size())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while (process_num + batch_size <= static_cast<unsigned>(input.size())) { | |
while (process_num + batch_size <= static_cast<int32_t>(input.size())) { |
std::istringstream iss(line); | ||
std::string column1, column2; | ||
iss >> column1 >> column2; | ||
wav_paths.emplace_back(column2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wav_paths.emplace_back(column2); | |
wav_paths.emplace_back(std::move(column2)); |
while(std::getline(in, line)) | ||
{ | ||
std::istringstream iss(line); | ||
std::string column1, column2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move colum1 and column2 to the outside of this while loop.
Also, please use clang-format
to reformat your code.
} | ||
|
||
|
||
void asr_inference(std::vector<std::vector<std::string>> chunk_wav_paths, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void asr_inference(std::vector<std::vector<std::string>> chunk_wav_paths, | |
void asr_inference(const std::vector<std::vector<std::string>> &chunk_wav_paths, |
std::vector<sherpa_onnx::OfflineStream *>().swap(ss_pointers); | ||
std::vector<std::unique_ptr<sherpa_onnx::OfflineStream>>().swap(ss); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector<sherpa_onnx::OfflineStream *>().swap(ss_pointers); | |
std::vector<std::unique_ptr<sherpa_onnx::OfflineStream>>().swap(ss); | |
ss_pointers.clear(); | |
ss.clear(); |
if (chunk >= chunk_wav_paths.size()) { | ||
break; | ||
} | ||
auto wav_paths = chunk_wav_paths[chunk]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto wav_paths = chunk_wav_paths[chunk]; | |
const auto &wav_paths = chunk_wav_paths[chunk]; |
…ecoding with parallel;optimize some code
@csukuangfj I pulled the new version and there are still some errors in CI about websocket. Would you check it, please? |
Please just ignore them as long as there are similar tests that don't fail. |
@@ -0,0 +1,292 @@ | |||
// sherpa-onnx/csrc/sherpa-onnx-offline.cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// sherpa-onnx/csrc/sherpa-onnx-offline.cc | |
// sherpa-onnx/csrc/sherpa-onnx-offline-parallel.cc |
@@ -0,0 +1,292 @@ | |||
// sherpa-onnx/csrc/sherpa-onnx-offline.cc | |||
// | |||
// Copyright (c) 2022-2023 Xiaomi Corporation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add your own information here if possible.
std::ifstream in(wav_scp_path); | ||
if (!in.is_open()) { | ||
fprintf(stderr, "Failed to open file: %s.\n", wav_scp_path.c_str()); | ||
return wav_paths; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return wav_paths; | |
exit(-1); |
We can just exit on errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when return a empty vector, the main function will exit normally. may be it's a better way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, it's up to you.
iss >> column1 >> column2; | ||
wav_paths.emplace_back(std::move(column2)); | ||
} | ||
in.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in.close(); |
Not necessary as it is called automatically in the destructor
of std::ifstream
.
const std::vector<float> samples = | ||
sherpa_onnx::ReadWave(wav_filename, &sampling_rate, &is_ok); | ||
if (!is_ok) { | ||
fprintf(stderr, "Failed to read %s\n", wav_filename.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fprintf(stderr, "Failed to read %s\n", wav_filename.c_str()); | |
fprintf(stderr, "Failed to read %s\n", wav_filename.c_str()); | |
continue; |
Use continue
to skip this wave.
const std::vector<float> samples = | ||
sherpa_onnx::ReadWave(wav_filename, &sampling_rate, &is_ok); | ||
if (!is_ok) { | ||
fprintf(stderr, "Failed to read %s\n", wav_filename.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fprintf(stderr, "Failed to read %s\n", wav_filename.c_str()); | |
fprintf(stderr, "Failed to read %s\n", wav_filename.c_str()); | |
continue; |
|
||
See https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/index.html | ||
|
||
./bin/sherpa-onnx-offline \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the help information
sherpa_onnx::ParseOptions po(kUsageMessage); | ||
sherpa_onnx::OfflineRecognizerConfig config; | ||
config.Register(&po); | ||
po.Register("wav-scp", &wav_scp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update help to include --wav-scp
and --batch-size
std::atomic<int> wav_index(0); | ||
std::mutex mtx; | ||
|
||
std::vector<std::vector<std::string>> split_to_batchsize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector<std::vector<std::string>> split_to_batchsize( | |
std::vector<std::vector<std::string>> SplitToBatches( |
return outputs; | ||
} | ||
|
||
std::vector<std::string> load_scp_file(const std::string &wav_scp_path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector<std::string> load_scp_file(const std::string &wav_scp_path) { | |
std::vector<std::string> LoadScpFile(const std::string &wav_scp_path) { |
return wav_paths; | ||
} | ||
|
||
void asr_inference(const std::vector<std::vector<std::string>> &chunk_wav_paths, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void asr_inference(const std::vector<std::vector<std::string>> &chunk_wav_paths, | |
void AsrInference(const std::vector<std::vector<std::string>> &chunk_wav_paths, |
Thank you for your contribution! |
1 sherpa-onnx offline support pass wav_scp parameter and multi thread decoding;
2 fix a bug.