
Multi A100 expected CPU/GPU utilization in correction inference stage #1189

Open
ChristianKniep opened this issue Dec 19, 2024 · 2 comments


ChristianKniep commented Dec 19, 2024

I am trying to utilize a p4.24xl instance (8x A100 and plenty of CPUs), but when running this step I barely see any CPU/GPU utilization.
There are a couple of GB of memory allocated on the GPU and the CPU, but not much else going on.

I am using version 0.9.0+9dc15a8 from the upstream Docker Hub container.

$ dorado correct /shared/data.fq \
  --from-paf /shared/data_overlaps.paf \
  > /shared/data_corrected_reads.fasta

Output

[2024-12-19 10:17:19.679] [info] Running: "correct" "/shared/data.fq" "--from-paf" "/shared/data_overlaps.paf"
[2024-12-19 10:17:19.736] [info]  - downloading herro-v1 with httplib
[2024-12-19 10:17:22.201] [info] Using batch size 12 on device cuda:0 in inference thread 0.
[2024-12-19 10:17:22.201] [info] Using batch size 12 on device cuda:0 in inference thread 1.
[2024-12-19 10:17:22.374] [info] Using batch size 12 on device cuda:1 in inference thread 0.
[2024-12-19 10:17:22.374] [info] Using batch size 12 on device cuda:1 in inference thread 1.
[2024-12-19 10:17:22.581] [info] Using batch size 12 on device cuda:2 in inference thread 0.
[2024-12-19 10:17:22.587] [info] Using batch size 12 on device cuda:2 in inference thread 1.
[2024-12-19 10:17:22.824] [info] Using batch size 12 on device cuda:3 in inference thread 0.
[2024-12-19 10:17:22.824] [info] Using batch size 12 on device cuda:3 in inference thread 1.
[2024-12-19 10:17:23.007] [info] Using batch size 12 on device cuda:4 in inference thread 0.
[2024-12-19 10:17:23.007] [info] Using batch size 12 on device cuda:4 in inference thread 1.
[2024-12-19 10:17:23.190] [info] Using batch size 12 on device cuda:5 in inference thread 0.
[2024-12-19 10:17:23.190] [info] Using batch size 12 on device cuda:5 in inference thread 1.
[2024-12-19 10:17:23.379] [info] Using batch size 12 on device cuda:6 in inference thread 0.
[2024-12-19 10:17:23.380] [info] Using batch size 12 on device cuda:6 in inference thread 1.
[2024-12-19 10:17:23.569] [info] Using batch size 12 on device cuda:7 in inference thread 0.
[2024-12-19 10:17:23.570] [info] Using batch size 12 on device cuda:7 in inference thread 1.

How can I improve the utilization of the box?

HalfPhoton added the read_correction (Read error correction) label Dec 20, 2024
svc-jstone (Contributor) commented

Hi @ChristianKniep,

How long did you leave it running before observing this behavior? I.e., did the run complete, and was the GPU utilization low throughout the entire run?
There are a couple of things to consider:

  • Overlapping is a CPU-only portion of the workflow, and it runs first. Even though the GPUs are initialized, it takes some time before the first overlap piles are ready for inference (on the GPUs), especially when the input data is large.
  • You can try running the overlap step first manually by using the --to-paf option; in this stage the GPUs aren't utilized. Once that is done, you can run GPU inference using --from-paf (see the sketch after this list). This may even be more cost-effective, because the first portion can be done on a smaller compute node with CPU-only resources.
  • Additionally, you can try manually setting the batch size (--batch-size) to a larger value in case the auto-computed batch size turned out to be too low for your system.
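
A minimal sketch of the two-stage workflow described above, reusing the file paths from your original command; the --batch-size value of 64 is only an illustrative placeholder, not a tuned recommendation:

# Stage 1: CPU-only overlap computation (no GPU needed; can run on a CPU-only node)
$ dorado correct /shared/data.fq \
    --to-paf \
    > /shared/data_overlaps.paf

# Stage 2: GPU inference on the precomputed overlaps (run on the GPU node)
$ dorado correct /shared/data.fq \
    --from-paf /shared/data_overlaps.paf \
    --batch-size 64 \
    > /shared/data_corrected_reads.fasta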

Hope this helps!

ChristianKniep (Author) commented

Hey, I ran it for around 90 minutes and aborted it afterwards. I am using --from-paf; the overlapping workflow was already done by a colleague (ping @Sateesh_Peri).
I changed the batch size to 250 with no effect (though I did not let it run for long).
