
Multi A100 expected CPU/GPU utilization in correction inference stage #1189

Open
ChristianKniep opened this issue Dec 19, 2024 · 2 comments


ChristianKniep commented Dec 19, 2024

I am trying to utilize a p4.24xl instance (8x A100 and plenty of CPUs), but when running this step I barely see any CPU/GPU utilization.
There are a couple of GB of memory allocated on the GPU and the CPU, but not much else going on.

I am using version 0.9.0+9dc15a8 from the upstream Docker Hub container.

$ dorado correct /shared/data.fq \
  --from-paf /shared/data_overlaps.paf \
  > /shared/data_corrected_reads.fasta

Output

[2024-12-19 10:17:19.679] [info] Running: "correct" "/shared/data.fq" "--from-paf" "/shared/data_overlaps.paf"
[2024-12-19 10:17:19.736] [info]  - downloading herro-v1 with httplib
[2024-12-19 10:17:22.201] [info] Using batch size 12 on device cuda:0 in inference thread 0.
[2024-12-19 10:17:22.201] [info] Using batch size 12 on device cuda:0 in inference thread 1.
[2024-12-19 10:17:22.374] [info] Using batch size 12 on device cuda:1 in inference thread 0.
[2024-12-19 10:17:22.374] [info] Using batch size 12 on device cuda:1 in inference thread 1.
[2024-12-19 10:17:22.581] [info] Using batch size 12 on device cuda:2 in inference thread 0.
[2024-12-19 10:17:22.587] [info] Using batch size 12 on device cuda:2 in inference thread 1.
[2024-12-19 10:17:22.824] [info] Using batch size 12 on device cuda:3 in inference thread 0.
[2024-12-19 10:17:22.824] [info] Using batch size 12 on device cuda:3 in inference thread 1.
[2024-12-19 10:17:23.007] [info] Using batch size 12 on device cuda:4 in inference thread 0.
[2024-12-19 10:17:23.007] [info] Using batch size 12 on device cuda:4 in inference thread 1.
[2024-12-19 10:17:23.190] [info] Using batch size 12 on device cuda:5 in inference thread 0.
[2024-12-19 10:17:23.190] [info] Using batch size 12 on device cuda:5 in inference thread 1.
[2024-12-19 10:17:23.379] [info] Using batch size 12 on device cuda:6 in inference thread 0.
[2024-12-19 10:17:23.380] [info] Using batch size 12 on device cuda:6 in inference thread 1.
[2024-12-19 10:17:23.569] [info] Using batch size 12 on device cuda:7 in inference thread 0.
[2024-12-19 10:17:23.570] [info] Using batch size 12 on device cuda:7 in inference thread 1.

How can I improve the utilization of the box?

HalfPhoton added the read_correction (Read error correction) label Dec 20, 2024
svc-jstone (Contributor) commented

Hi @ChristianKniep,

How long did you leave it running before observing this behavior? I.e., did the run complete, and was the GPU utilization low throughout the entire run?
There are a couple of things to consider:

  • Overlapping is a CPU-only portion of the workflow, and it runs first. Even though the GPUs are initialized, it takes some time before the first overlap piles are ready for inference (on the GPUs), especially when the input data is large.
  • You can try running the overlap step first manually by using the --to-paf option; in this stage the GPUs aren't utilized. Once that is done, you can run GPU inference using --from-paf (see the sketch after this list). This may even be more cost-effective, because the first portion can be done on a smaller compute node with CPU-only resources.
  • Additionally, you can try manually setting the batch size (--batch-size) to a larger value in case the auto-computed batch size turned out to be too low for your system.
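
A minimal sketch of the two-stage workflow described above, reusing the file paths from your original command; the --batch-size value of 64 is only an illustrative placeholder, not a tuned recommendation:

# Stage 1: CPU-only overlap computation (no GPU needed; can run on a CPU-only node)
$ dorado correct /shared/data.fq \
    --to-paf \
    > /shared/data_overlaps.paf

# Stage 2: GPU inference on the precomputed overlaps (run on the GPU node)
$ dorado correct /shared/data.fq \
    --from-paf /shared/data_overlaps.paf \
    --batch-size 64 \
    > /shared/data_corrected_reads.fasta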

Hope this helps!

ChristianKniep (Author) commented

Hey, I ran it for around 90 minutes and aborted it afterwards. I am using --from-paf; the overlapping workflow was already done by a colleague (ping @Sateesh_Peri).
I changed the batch size to 250 with no effect (though I did not let it run for long).
