Describe the issue
We see crashes at process exit (during the static destruction phase) when running inference on a specific model with the CUDA EP:
#0 0x00007f7a5634537f in raise () at /lib64/libc.so.6
#1 0x00007f7a5632fdb5 in abort () at /lib64/libc.so.6
#2 0x00007f7a563884e7 in __libc_message () at /lib64/libc.so.6
#3 0x00007f7a5638f5ec in .annobin_top_check.start () at /lib64/libc.so.6
#4 0x00007f7a5638fe2c in unlink_chunk.isra () at /lib64/libc.so.6
#5 0x00007f7a5638ff97 in malloc_consolidate () at /lib64/libc.so.6
#6 0x00007f7a56392368 in _int_malloc () at /lib64/libc.so.6
#7 0x00007f7a563948d6 in calloc () at /lib64/libc.so.6
#8 0x00007f7aac588224 in calloc(size_t, size_t) (nelem=8, elsize=157) at develop/src/libheap/allocwrap.C:395
#9 0x00007f7a6e437817 in () at /mnt/repo/bin/libcublas.so.11
#10 0x00007f7a6e43a914 in () at /mnt/repo/bin/libcublas.so.11
#11 0x00007f7a6e42a15c in () at /mnt/repo/bin/libcublas.so.11
#12 0x00007f7a6e42bcf8 in () at /mnt/repo/bin/libcublas.so.11
#13 0x00007f7a56348037 in __cxa_finalize () at /lib64/libc.so.6
#14 0x00007f7a6dc3c8c3 in () at /mnt/repo/bin/libcublas.so.11
#15 0x00007ffc4e07d630 in ()
#16 0x00007f7aac3c5c96 in _dl_fini () at /lib64/ld-linux-x86-64.so.2
Here is a different stack I got when using a debug build of ORT:
#0 0x00007f84b391437f in raise () at /lib64/libc.so.6
#1 0x00007f84b38fedb5 in abort () at /lib64/libc.so.6
#2 0x00007f84b42ce09b in __gnu_cxx::__verbose_terminate_handler() [clone .cold.1] () at /lib64/libstdc++.so.6
#3 0x00007f84b42d453c in __cxxabiv1::__terminate(void (*)()) () at /lib64/libstdc++.so.6
#4 0x00007f84b42d4597 in () at /lib64/libstdc++.so.6
#5 0x00007f84b42d53f5 in () at /lib64/libstdc++.so.6
#6 0x00007f8490639c50 in onnxruntime::ProviderSharedLibrary::Unload() (this=0x7f84925db1c0 <onnxruntime::s_library_shared>) at /repo/onnx/onnxruntime-1.18.0/onnxruntime/core/session/provider_bridge_ort.cc:1385
#7 0x00007f849063a5ce in onnxruntime::UnloadSharedProviders() () at /repo/onnx/onnxruntime-1.18.0/onnxruntime/core/session/provider_bridge_ort.cc:1534
#8 0x00007f8490629229 in OrtEnv::~OrtEnv() (this=0x1cb10580, __in_chrg=<optimized out>) at /repo/onnx/onnxruntime-1.18.0/onnxruntime/core/session/ort_env.cc:31
#9 0x00007f849062a510 in std::default_delete<OrtEnv>::operator()(OrtEnv*) const (this=0x7f84925db118 <OrtEnv::p_instance_>, __ptr=0x1cb10580) at /usr/include/c++/8/bits/unique_ptr.h:81
#10 0x00007f8490629c83 in std::unique_ptr<OrtEnv, std::default_delete<OrtEnv> >::~unique_ptr() (this=0x7f84925db118 <OrtEnv::p_instance_>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/unique_ptr.h:277
#11 0x00007f84b3917037 in __cxa_finalize () at /lib64/libc.so.6
#12 0x00007f84905dfc17 in __do_global_dtors_aux () at /mnt/onnxruntime/lib/libonnxruntime.so.1.18.0
#13 0x00007ffd05bf6d20 in ()
#14 0x00007f8509a2bc96 in _dl_fini () at /lib64/ld-linux-x86-64.so.2
This appears to be a regression in ORT 1.18, since we could not reproduce it with ORT 1.14.1.
We are using the default OrtCUDAProviderOptions values.
We are using the default Arena allocator.
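For readers following the debug stack: frames #8 to #11 show `OrtEnv::~OrtEnv()` being invoked from `__cxa_finalize` through a static `std::unique_ptr`, so provider unloading happens after `main` has returned, at a point where other shared libraries may already be running their own finalizers. Below is a minimal sketch of that ownership shape and of one possible mitigation, namely tearing the environment down explicitly before `main` returns. All names in it (`FakeEnv`, `FakeProviderLibrary`, `g_env`, `run_demo`) are hypothetical stand-ins, not ORT code, and whether an explicit release avoids this particular crash is an assumption that has not been verified.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are ORT types.

// Event log that is intentionally leaked so it remains valid even if it is
// touched by destructors that run during static destruction.
std::vector<std::string>& event_log() {
    static auto* log = new std::vector<std::string>();
    return *log;
}

struct FakeProviderLibrary {
    bool loaded = true;
    // Idempotent unload: a second call is a no-op.
    void Unload() {
        if (!loaded) return;
        loaded = false;
        event_log().push_back("provider unloaded");
    }
};

struct FakeEnv {
    FakeProviderLibrary provider;
    ~FakeEnv() {
        provider.Unload();
        event_log().push_back("env destroyed");
    }
};

// Singleton held in a static smart pointer. If it is still set when the
// process exits, its destructor runs during static destruction.
static std::unique_ptr<FakeEnv> g_env;

void create_env() {
    if (!g_env) g_env = std::make_unique<FakeEnv>();
}

// Explicit teardown while the process is still fully alive.
void release_env() { g_env.reset(); }

// Simulates the body of main(): create, use, release, then return.
std::vector<std::string> run_demo() {
    event_log().clear();
    create_env();
    event_log().push_back("inference done");
    release_env();  // teardown happens here, not at exit
    event_log().push_back("returning from main");
    return event_log();
}
```

Calling `run_demo()` from `main` records the provider unload and environment destruction before the "returning from main" entry, and leaves `g_env` empty so nothing is left for the static destructor to do. With the ORT C API, the analogous step would presumably be calling `ReleaseSession` and `ReleaseEnv` before the program exits.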
To reproduce
Run inference on our model (which will be provided to Microsoft directly as it has some proprietary content) with any input using the CUDA EP.
The program crashes on exit.
Urgency
No response
Platform
Linux
OS Version
Rocky Linux 8.5 (gcc 11.2.1, C++17)
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.18.0
ONNX Runtime API
C
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.8