I am trying to utilize a p4.24xl (8x A100 and plenty of CPUs), but when running the correct step I barely see any CPU/GPU utilization.
There are a couple of GB of memory allocated on the GPUs and on the CPU side, but not a lot else going on.
I am using version 0.9.0+9dc15a8 from the upstream Docker Hub container.
[2024-12-19 10:17:19.679] [info] Running: "correct" "/shared/data.fq" "--from-paf" "/shared/data_overlaps.paf"
[2024-12-19 10:17:19.736] [info] - downloading herro-v1 with httplib
[2024-12-19 10:17:22.201] [info] Using batch size 12 on device cuda:0 in inference thread 0.
[2024-12-19 10:17:22.201] [info] Using batch size 12 on device cuda:0 in inference thread 1.
[2024-12-19 10:17:22.374] [info] Using batch size 12 on device cuda:1 in inference thread 0.
[2024-12-19 10:17:22.374] [info] Using batch size 12 on device cuda:1 in inference thread 1.
[2024-12-19 10:17:22.581] [info] Using batch size 12 on device cuda:2 in inference thread 0.
[2024-12-19 10:17:22.587] [info] Using batch size 12 on device cuda:2 in inference thread 1.
[2024-12-19 10:17:22.824] [info] Using batch size 12 on device cuda:3 in inference thread 0.
[2024-12-19 10:17:22.824] [info] Using batch size 12 on device cuda:3 in inference thread 1.
[2024-12-19 10:17:23.007] [info] Using batch size 12 on device cuda:4 in inference thread 0.
[2024-12-19 10:17:23.007] [info] Using batch size 12 on device cuda:4 in inference thread 1.
[2024-12-19 10:17:23.190] [info] Using batch size 12 on device cuda:5 in inference thread 0.
[2024-12-19 10:17:23.190] [info] Using batch size 12 on device cuda:5 in inference thread 1.
[2024-12-19 10:17:23.379] [info] Using batch size 12 on device cuda:6 in inference thread 0.
[2024-12-19 10:17:23.380] [info] Using batch size 12 on device cuda:6 in inference thread 1.
[2024-12-19 10:17:23.569] [info] Using batch size 12 on device cuda:7 in inference thread 0.
[2024-12-19 10:17:23.570] [info] Using batch size 12 on device cuda:7 in inference thread 1.
How can I improve the utilization of the box...?
How long did you leave it running before observing this behavior? I.e. has the run completed and was the GPU utilization low throughout the entire run?
There are a couple of things to consider:
Overlapping is a CPU-only portion of the workflow, and it runs first. Even though the GPUs are initialized, it takes some time before the first overlap piles are ready for inference (on GPUs), especially when the input data is big.
You can try running the overlap step on its own first by using the --to-paf option; GPUs aren't utilized for that stage. Once it is done, you can run GPU inference with --from-paf (see the example commands below). This may even be more cost-effective, because the first portion can be done on a smaller, CPU-only compute node.
Additionally, you can try to manually set the batch size (--batch-size) to a larger value in case the auto-computed batch size turned out to be too low for your system.
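To illustrate, a split two-stage run could look roughly like the commands below (assuming the dorado binary; the input and PAF paths are taken from your log, while the output file name and the batch size of 64 are placeholder examples rather than recommended values):

# Stage 1: CPU-only overlap computation, can run on a CPU-only node
dorado correct /shared/data.fq --to-paf > /shared/data_overlaps.paf

# Stage 2: GPU inference from the precomputed overlaps, optionally with a manual batch size
dorado correct /shared/data.fq --from-paf /shared/data_overlaps.paf --batch-size 64 > /shared/corrected.fasta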
Hey, I ran it for around 90 minutes and then aborted it. I am using --from-paf; the overlapping workflow was already done by a colleague (ping @Sateesh_Peri).
I also changed the batch size to 250 with no effect (though I did not let it run for long).