
Trouble using sherpa-onnx-offline-websocket-server with the cuda provider #1053

Closed
Vergissmeinicht opened this issue Jun 24, 2024 · 19 comments


@Vergissmeinicht commented Jun 24, 2024

I followed the instructions at https://k2-fsa.github.io/sherpa/onnx/websocket/offline-websocket.html to start a non-streaming websocket server with transducer models. It works well with the client, too. But when I run the client in multiple threads, that is, several threads each using the websocket client to recognize wav files one by one at the same time, the server raises a CUDA error:

2024-06-24 09:47:01.083093543 [E:onnxruntime:, cuda_call.cc:116 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=a2d9f82c2221 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=408 ; expr=cudaStreamSynchronize(static_cast<cudaStream_t>(stream_));
2024-06-24 09:47:01.083005575 [E:onnxruntime:, cuda_call.cc:116 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=a2d9f82c2221 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/gpu_data_transfer.cc ; line=73 ; expr=cudaMemcpyAsync(dst_data, src_data, bytes, cudaMemcpyDeviceToHost, static_cast<cudaStream_t>(stream.GetHandle()));
terminate called after throwing an instance of 'Ort::Exception'
  what():  CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=a2d9f82c2221 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=408 ; expr=cudaStreamSynchronize(static_cast<cudaStream_t>(stream_));
Aborted
My server runs on a GeForce RTX 4090 with driver 535.104.05 and CUDA 12.2.

I would be glad to have your help.

@csukuangfj
Collaborator

Does the server work fine when you use CPU?

@Vergissmeinicht
Author

Yes, it works fine when using the CPU provider.

@csukuangfj
Collaborator

Could you tell us how you start the server?
Please post the full command.

@Vergissmeinicht
Author

CUDA_VISIBLE_DEVICES=2 ./bin/sherpa-onnx-offline-websocket-server \
  --provider=cuda \
  --port=6006 \
  --num-work-threads=10 \
  --tokens=sherpa-onnx-zipformer-gigaspeech-2023-12-12/tokens.txt \
  --encoder=sherpa-onnx-zipformer-gigaspeech-2023-12-12/encoder-epoch-30-avg-1.onnx \
  --decoder=sherpa-onnx-zipformer-gigaspeech-2023-12-12/decoder-epoch-30-avg-1.onnx \
  --joiner=sherpa-onnx-zipformer-gigaspeech-2023-12-12/joiner-epoch-30-avg-1.onnx \
  --log-file=./log.txt \
  --max-batch-size=5

@csukuangfj
Collaborator

Could you change

lock.unlock();
// Note: DecodeStreams is thread-safe
recognizer_.DecodeStreams(p_ss.data(), size);

to

recognizer_.DecodeStreams(p_ss.data(), size);
lock.unlock();

then recompile and retry?
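For illustration, here is a minimal, self-contained sketch of the locking pattern this change produces; mutex_, Worker, and the DecodeStreams stand-in below are hypothetical names for this example, not the actual sherpa-onnx source. The point is that each worker thread keeps the mutex held across the decode call and releases it only afterwards, so the session is never entered by two threads at once.

// Sketch only: stand-ins for the real server's recognizer and worker loop.
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

std::mutex mutex_;

// Stand-in for recognizer_.DecodeStreams(): assume it touches shared
// CUDA state and therefore must not run from two threads concurrently.
void DecodeStreams(int batch_id) {
  std::printf("decoding batch %d\n", batch_id);
}

void Worker(int batch_id) {
  std::unique_lock<std::mutex> lock(mutex_);
  // ... build the batch of streams while holding the lock ...
  DecodeStreams(batch_id);  // decode while still holding the lock
  lock.unlock();            // release only after decoding has finished
}

int main() {
  std::vector<std::thread> workers;
  for (int i = 0; i != 4; ++i) workers.emplace_back(Worker, i);
  for (auto &w : workers) w.join();
  return 0;
}

(Compile with, e.g., g++ -std=c++11 -pthread.)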

@Vergissmeinicht
Author

It works fine now. So is there a bug here?

@csukuangfj
Collaborator

> It works fine now. So is there a bug here?

I think it is a bug in onnxruntime.

When using the CPU, the onnxruntime session is thread-safe.
However, it is not thread-safe when using the CUDA provider.

Please see
microsoft/onnxruntime#114

@manickavela29
Contributor

Hi @csukuangfj,

Could the issue be that @Vergissmeinicht is using a local onnxruntime? With the onnxruntime shipped by sherpa-onnx (onnxruntime 1.17.1), it is stable on my machines.

@Vergissmeinicht
Author

> Hi @csukuangfj,
>
> Could the issue be that @Vergissmeinicht is using a local onnxruntime? With the onnxruntime shipped by sherpa-onnx (onnxruntime 1.17.1), it is stable on my machines.

I built sherpa-onnx with no local onnxruntime. The onnxruntime installation is the one provided by CMake.
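For reference, a GPU build that lets CMake download onnxruntime automatically looks roughly like this; the SHERPA_ONNX_ENABLE_GPU option follows the sherpa-onnx documentation, and the exact steps may differ for your setup:

git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build
cd build
cmake -DSHERPA_ONNX_ENABLE_GPU=ON ..
make -j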

@Vergissmeinicht (Author) commented Jun 26, 2024

@csukuangfj The server has been running and has recognized 200k wav files. Everything works fine, except that memory consumption seems to have increased by nearly 3 GB. I made no further modifications to the source code. Is it possible that there is a memory leak?

@csukuangfj
Collaborator

Did the CPU RAM or the GPU RAM increase by 3 GB?

Do you mean 20,000 wav files or just 200 wav files?

@csukuangfj
Collaborator

> Hi @csukuangfj,
>
> Could the issue be that @Vergissmeinicht is using a local onnxruntime? With the onnxruntime shipped by sherpa-onnx (onnxruntime 1.17.1), it is stable on my machines.

@Vergissmeinicht Could you look into this comment?

@Vergissmeinicht
Author

> Did the CPU RAM or the GPU RAM increase by 3 GB?
>
> Do you mean 20,000 wav files or just 200 wav files?

It has been serving for 2 days and the memory consumption now stays stable. No more worries about a memory leak! :)

@Vergissmeinicht
Author

> Hi @csukuangfj,
> Could the issue be that @Vergissmeinicht is using a local onnxruntime? With the onnxruntime shipped by sherpa-onnx (onnxruntime 1.17.1), it is stable on my machines.
>
> @Vergissmeinicht Could you look into this comment?

I already replied to this comment. I built the whole project inside a Docker container without any onnxruntime installed.

@csukuangfj
Collaborator

Are you also running sherpa-onnx inside the docker container?

@Vergissmeinicht
Author

> Are you also running sherpa-onnx inside the docker container?

Yes. I use nvidia/cuda:11.1.1-cudnn8-devel-ubuntu20.04 as my base image.

@csukuangfj
Collaborator

Can it be closed now?

@Vergissmeinicht
Author

> Can it be closed now?

So does it make a difference whether the recognizer decodes before or after the unlock?

@csukuangfj
Collaborator

For the CUDA provider, since the onnxruntime session is not thread-safe, we have to decode first and then unlock.

For the CPU provider, the onnxruntime session is thread-safe, so we can unlock first and then decode.
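To make the difference concrete, here is a hedged, self-contained sketch of what a provider-dependent variant could look like; FakeRecognizer, provider_, and mutex_ are illustrative stand-ins for this example, not the actual sherpa-onnx code:

// Sketch only: the real server uses sherpa_onnx::OfflineRecognizer and its
// own configuration; these types and names are stand-ins for illustration.
#include <mutex>
#include <string>

struct FakeRecognizer {     // stand-in for the real recognizer
  void DecodeStreams() {}   // stand-in for the real batched decode call
};

std::mutex mutex_;
FakeRecognizer recognizer_;
std::string provider_ = "cuda";  // assumed config field: "cpu" or "cuda"

void Decode() {
  std::unique_lock<std::mutex> lock(mutex_);
  // ... build the batch of streams while holding the lock ...
  if (provider_ == "cuda") {
    // CUDA: the session is not thread-safe, so decode under the lock.
    recognizer_.DecodeStreams();
    lock.unlock();
  } else {
    // CPU: the session is thread-safe, so the lock can be released first
    // and decodes from different threads may overlap.
    lock.unlock();
    recognizer_.DecodeStreams();
  }
}

int main() { Decode(); return 0; }

Always holding the lock across the decode (as in the fix above) is the simpler, safe choice; the conditional version only matters if you want CPU decodes to overlap.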
