Hello,
Here is the command I ran:
singularity run --nv /data/a/zhangwencai/software/herro.sif inference --read-alns /data/b/zhangwencai/ultra_long/japo_fromGuoSong/minimap2_alignment -t 1 -b 1 -m /data/a/zhangwencai/software/herro/model_R9_v0.1.pt /data/b/zhangwencai/ultra_long/japo_fromGuoSong/DY48490_ONT_UL_200kb.fastq DY48490_ONT_UL_200kb_herro.fasta
And here is the error output:
[00:00:05] Parsed 10543 reads. [00:00:00] Processing 1/? batch ⡀ thread '' panicked at /herro/src/inference.rs:209:70:
Cannot load model.: Torch("CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.\nException raised from device_count_impl at ../c10/cuda/CUDAFunctions.cpp:69 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x6b (0x7f5cd385a6bb in /libs/libtorch/lib/libc10.so)\nframe #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0xc9 (0x7f5cd3855769 in /libs/libtorch/lib/libc10.so)\nframe #2: c10::cuda::device_count_ensure_non_zero() + 0x117 (0x7f5cd324b027 in /libs/libtorch/lib/libc10_cuda.so)\nframe #3: + 0x103931a (0x7f5ced03931a in /libs/libtorch/lib/libtorch_cuda.so)\nframe #4: + 0x2c30f36 (0x7f5ceec30f36 in /libs/libtorch/lib/libtorch_cuda.so)\nframe #5: + 0x2c30ffb (0x7f5ceec30ffb in /libs/libtorch/lib/libtorch_cuda.so)\nframe #6: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRefc10::SymInt, c10::ArrayRefc10::SymInt, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional) + 0x1fb (0x7f5cd5eb71fb in /libs/libtorch/lib/libtorch_cpu.so)\nframe #7: + 0x25ebc75 (0x7f5cd61ebc75 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #8: at::_ops::empty_strided::call(c10::ArrayRefc10::SymInt, c10::ArrayRefc10::SymInt, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional) + 0x168 (0x7f5cd5ef2328 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #9: + 0x1701f5f (0x7f5cd5301f5f in /libs/libtorch/lib/libtorch_cpu.so)\nframe #10: at::native::_to_copy(at::Tensor const&, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional, bool, c10::optionalc10::MemoryFormat) + 0x17e3 (0x7f5cd56a6cf3 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #11: + 0x27d3603 (0x7f5cd63d3603 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #12: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional, bool, c10::optionalc10::MemoryFormat) + 0x103 (0x7f5cd5b93c83 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #13: + 0x25f01c8 (0x7f5cd61f01c8 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #14: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional, bool, c10::optionalc10::MemoryFormat) + 0x103 (0x7f5cd5b93c83 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #15: + 0x3a66271 (0x7f5cd7666271 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #16: + 0x3a6681b (0x7f5cd766681b in /libs/libtorch/lib/libtorch_cpu.so)\nframe #17: at::_ops::_to_copy::call(at::Tensor const&, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional, bool, c10::optionalc10::MemoryFormat) + 0x201 (0x7f5cd5c16651 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #18: at::native::to(at::Tensor const&, c10::Device, c10::ScalarType, bool, bool, c10::optionalc10::MemoryFormat) + 0xfd (0x7f5cd56a505d in /libs/libtorch/lib/libtorch_cpu.so)\nframe #19: + 0x29a5612 (0x7f5cd65a5612 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #20: at::_ops::to_device::call(at::Tensor const&, c10::Device, c10::ScalarType, bool, bool, c10::optionalc10::MemoryFormat) + 0x1c1 (0x7f5cd5d95cd1 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #21: 
torch::jit::Unpickler::readInstruction() + 0x1719 (0x7f5cd8766789 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #22: torch::jit::Unpickler::run() + 0xa8 (0x7f5cd8767988 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #23: torch::jit::Unpickler::parse_ivalue() + 0x2e (0x7f5cd876953e in /libs/libtorch/lib/libtorch_cpu.so)\nframe #24: torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_typec10::ivalue::Object > (c10::StrongTypePtr, c10::IValue)> >, c10::optionalc10::Device, caffe2::serialize::PyTorchStreamReader&, c10::Type::SingletonOrSharedTypePtrc10::Type (*)(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&), std::shared_ptrtorch::jit::DeserializationStorageContext) + 0x529 (0x7f5cd87241a9 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #25: + 0x4b08c4b (0x7f5cd8708c4b in /libs/libtorch/lib/libtorch_cpu.so)\nframe #26: + 0x4b0b04b (0x7f5cd870b04b in /libs/libtorch/lib/libtorch_cpu.so)\nframe #27: torch::jit::import_ir_module(std::shared_ptrtorch::jit::CompilationUnit, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, c10::optionalc10::Device, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >&, bool, bool) + 0x3a2 (0x7f5cd870f6c2 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #28: torch::jit::import_ir_module(std::shared_ptrtorch::jit::CompilationUnit, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, c10::optionalc10::Device, bool) + 0x92 (0x7f5cd870fa42 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #29: torch::jit::load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, c10::optionalc10::Device, bool) + 0xd1 (0x7f5cd870fb71 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #30: + 0x1ee52e (0x55cf434da52e in herro)\nframe #31: + 0xd4bc9 (0x55cf433c0bc9 in herro)\nframe #32: + 0x1062b6 (0x55cf433f22b6 in herro)\nframe #33: + 0xc0aec (0x55cf433acaec in herro)\nframe #34: + 0xf56e5 (0x55cf433e16e5 in herro)\nframe #35: + 0x15ae9b (0x55cf43446e9b in herro)\nframe #36: + 0x94ac3 (0x7f5cd366bac3 in /lib/x86_64-linux-gnu/libc.so.6)\nframe #37: clone + 0x44 (0x7f5cd36fca04 in /lib/x86_64-linux-gnu/libc.so.6)\n")
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted (core dumped)
Could you please tell me where the error is and what I should do to fix it?
Below are my CUDA version and GPU details:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:10:22_PDT_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0
nvidia-smi
Mon Dec 2 09:55:33 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX A6000 Off | 00000000:31:00.0 Off | Off |
| 30% 58C P0 80W / 300W | 1MiB / 49140MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
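If it helps with diagnosis, I can also run a quick check that the GPU is visible from inside the container, e.g. (just a sketch, using the same image path as above):

# check GPU visibility from inside the container
singularity exec --nv /data/a/zhangwencai/software/herro.sif nvidia-smi
# confirm CUDA_VISIBLE_DEVICES has not been unset or changed before launching herro
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"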
Best wishes,
WenCai