Segmentation fault #1172

Open
hwangumn opened this issue Dec 12, 2024 · 1 comment
Comments

@hwangumn

Issue Report

Please describe the issue:

I am running Dorado v0.8.3 on our supercomputing platform with an A100 GPU, using the Slurm workload manager. Dorado ran successfully on smaller POD5 files, but once I combined 24 hours' worth of PromethION data, it keeps failing with a segmentation fault and exit code 139. Any suggestions?
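
For context on the exit code: shells conventionally report a process killed by a signal as 128 plus the signal number, so 139 corresponds to signal 11 (SIGSEGV). The snippet below is a minimal sketch of that arithmetic; the helper name `describe_exit_code` is hypothetical and is not part of Dorado or Slurm.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell exit status (hypothetical helper for illustration).

    By shell convention, a status above 128 means the process was
    terminated by signal number (status - 128).
    """
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name, e.g. 11 -> SIGSEGV
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(139))  # on Linux: terminated by SIGSEGV
```

This only confirms that the job died from a memory access violation; it does not identify the cause.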

Logs

[2024-12-11 17:19:14.736] [info] Running: "basecaller" "hac" "/home/ONT/Isolates/raw_pod5" "--kit-name" "SQK-NBD114-96"
[2024-12-11 17:19:17.326] [warning] Unknown certs location for current distribution. If you hit download issues, use the envvar SSL_CERT_FILE to specify the location manually.
[2024-12-11 17:19:17.341] [info] - downloading [email protected] with httplib
[2024-12-11 17:19:17.366] [error] Failed to download [email protected]: SSL server verification failed
[2024-12-11 17:19:17.366] [info] - downloading [email protected] with curl
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
17 14.8M 17 2586k 0 0 2728k 0 0:00:05 --:--:-- 0:00:05 2725k
100 14.8M 100 14.8M 0 0 11.4M 0 0:00:01 0:00:01 --:--:-- 11.4M
[2024-12-11 17:19:19.356] [info] Normalised: chunksize 10000 -> 9996
[2024-12-11 17:19:19.356] [info] Normalised: overlap 500 -> 498
[2024-12-11 17:19:19.356] [info] > Creating basecall pipeline
[2024-12-11 17:19:45.588] [info] cuda:0 using chunk size 9996, batch size 5760
[2024-12-11 17:19:46.041] [info] cuda:0 using chunk size 4998, batch size 9216
/var/spool/slurmd/job27986866/slurm_script: line 13: 2146773 Segmentation fault dorado basecaller hac /home/ONT/Isolates/raw_pod5 --kit-name SQK-NBD114-96 > calls.bam

@malton-ont
Collaborator

Hi @hwangumn,

To be clear, is this segfault occurring with the same data that ran successfully as smaller files?
