Multi GPU support for iw3 #59
pytorch/pytorch#8637 |
The above problem was fixed by nagadomi/MiDaS_iw3@22193f4. |
The register_forward_hook problem was fixed by nagadomi/MiDaS_iw3@0da1ad0 and nagadomi/ZoeDepth_iw3@55bacaf. iw3 now works with multiple GPUs.
Updating steps
for git,
# update source code
git pull
# update MiDaS and ZoeDepth
python -m iw3.download_models
for windows_package,
Examples
CLI
GUI
I tested only the 2-GPU case on the Linux CLI. @elecimage |
Oh, thank you. I'll test it soon. |
Yes, it works, but it is slower than using 1 GPU. |
@elecimage Here are some possible causes and questions,
|
With 2 GPUs it is roughly 1.2x slower than with 1. I'm using two RTX 2080 Ti cards. I've tried changing the Depth Batch Size several times, but it doesn't make much difference. |
OK, I will try to create a Windows VM on cloud and check the behavior. |
Maybe fixed by 2b7cbf9. |
Oh, thank you. I'll test it soon. |
I'm still having problems. |
When using multiple GPUs, the batch is divided among the GPUs. So for the same batch size setting, each GPU's VRAM usage will be 1/N of the single-GPU usage, where N is the number of GPUs. In my test above, I tried the following settings. With 720x720 video, 1 GPU = 2.5 FPS, 2 GPUs = 3.7 FPS.
The GPUs are Tesla T4 x2. The T4 has the same generation architecture as the RTX 2080 Ti and should have slightly worse performance. For reference, |
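To make the arithmetic behind those numbers explicit, here is a minimal sketch. It assumes the batch is cut into chunks of ceil(batch_size / N), which is how an even split usually works. The function names are hypothetical and are not part of iw3.

```python
import math
from typing import List


def split_batch(batch_size: int, num_gpus: int) -> List[int]:
    """Return the per-GPU chunk sizes for an even split of one batch.

    Assumption: the chunk size is ceil(batch_size / num_gpus), so the last
    chunk may be smaller and some GPUs may receive nothing.
    """
    if batch_size <= 0 or num_gpus <= 0:
        raise ValueError("batch_size and num_gpus must be positive")
    chunk = math.ceil(batch_size / num_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


def speedup(fps_single: float, fps_multi: float) -> float:
    """Throughput ratio of the multi-GPU run over the single-GPU run."""
    return fps_multi / fps_single


def parallel_efficiency(fps_single: float, fps_multi: float, num_gpus: int) -> float:
    """Speedup divided by the number of GPUs (1.0 would be perfect scaling)."""
    return speedup(fps_single, fps_multi) / num_gpus


print(split_batch(4, 2))                             # [2, 2]
print(round(speedup(2.5, 3.7), 2))                   # 1.48
print(round(parallel_efficiency(2.5, 3.7, 2), 2))    # 0.74
```

With the FPS values reported above (2.5 and 3.7), this gives a 1.48x speedup, or about 74% efficiency for 2 GPUs.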
Recent Changes,
The issue of FPS not improving with multiple GPUs may be caused by the Windows NVIDIA driver mode (TCC/WDDM, which seems to differ between the Tesla driver and the GeForce driver), so it may not be possible to improve it. |
And I get the same result on Windows. I'm pretty sure it has identified all the cards. |
|
All CUDA vs. single GPU |
Multi-GPU DataParallel seems to be working (first screenshot of nvidia-smi). It may just be slow. Also, you can monitor nvidia-smi with the following commands
or
|
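If you prefer to log per-GPU utilization from a script, a minimal Python sketch could look like the following. This is only an illustration and not necessarily what was suggested above: it assumes nvidia-smi is on PATH and uses its --query-gpu and --format=csv,noheader,nounits options. The helper names are hypothetical.

```python
import subprocess
from typing import Dict, List, Optional

# Fields requested from nvidia-smi, in output column order.
QUERY_FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total"]


def parse_gpu_stats(csv_text: str) -> List[Dict[str, Optional[int]]]:
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip lines that do not match the expected layout
        row: Dict[str, Optional[int]] = {}
        for name, value in zip(QUERY_FIELDS, values):
            try:
                row[name] = int(value)
            except ValueError:
                row[name] = None  # e.g. a field the driver does not report
        rows.append(row)
    return rows


def query_gpu_stats() -> List[Dict[str, Optional[int]]]:
    """Run nvidia-smi once and return the parsed per-GPU statistics."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling query_gpu_stats() once per second while iw3 is running shows whether both GPUs are busy or one of them is mostly idle.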
Turn off |
Try |
Try closing the application once and then trying again (to avoid running out of memory). The multi-GPU feature only supports depth estimation models, so if there are other bottlenecks, they will not be improved. Try low-resolution video as well. Also, when processing multiple videos, the following method is effective.
|
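The point about other bottlenecks can be illustrated with Amdahl's law. This is a generic model, not something measured in iw3: if only a fraction p of the per-frame time is depth estimation that scales across GPUs, the best possible speedup with N GPUs is 1 / ((1 - p) + p / N). A minimal sketch, with a hypothetical function name:

```python
def amdahl_speedup(parallel_fraction: float, num_gpus: int) -> float:
    """Upper bound on speedup when only part of the work scales with GPUs.

    parallel_fraction: share of the single-GPU run time spent in the part
    that is parallelized (here, depth estimation). The rest (decoding,
    encoding, CPU work) is assumed not to speed up at all.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / num_gpus)


# Illustrative fractions only; these are not measurements.
for p in (0.5, 0.8, 1.0):
    print(p, round(amdahl_speedup(p, 2), 2))
```

This model ignores the cost of copying data between GPUs and synchronizing them, so real results can be lower than the bound, and can even be slower than a single GPU.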
I tried All CUDA in a Tesla T4 x2 Linux environment.
multi gpu fps: 7.36
With Depth Anything (Any_B), the difference is even smaller.
multi gpu fps: 14.83
I have an idea about another multi-GPU strategy. |
Maybe it’s because Nvidia has cut some features from gaming graphics cards compared to professional cards. Anyway, I’m looking forward to your new multi-GPU strategy. |
I made this change. T4 x2 + Linux + 8 cores (when tested above, it was 2 cores...)
Old code for comparison
Single GPU performance is also improved. On T4 x2 + Windows Server, |
It does not seem to work on my PC.
python -m iw3.cli -i /home/ohjoij/视频/fz.mkv -o /home/ohjoij/视频/test.mkv --gpu 0 1 --depth-model Any_B --zoed-batch-size 4 --max-workers 8 --yes
ZoeD_N
python -m iw3.cli -i /home/ohjoij/视频/fz.mkv -o /home/ohjoij/视频/test.mkv --gpu 0 1 --depth-model ZoeD_N --zoed-batch-size 4 --max-workers 8 --yes |
Maybe the CPU or I/O is the bottleneck, and single-GPU performance is high relative to them. Is the single GPU performance of |
I changed part of #59 (comment) change to only enable it when |
ZoeD_N: Add --cuda-stream |
With the efficiency cores of the 13600K disabled,
python -m iw3.cli -i /home/ohjoij/视频/fz.mkv -o /home/ohjoij/视频/test.mkv --gpu 0 1 --depth-model ZoeD_N --zoed-batch-size 4 --max-workers 8 --yes --cuda-stream
fz.mkv: 100%|████████████████▉| 2230/2232 [03:13<00:00, 11.55it/s] |
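For comparing runs with and without --cuda-stream, or with different --gpu lists, it can help to build the command from a script. The following is a minimal sketch that uses only the options that appear in the commands above. The function name and the file names input.mkv and output.mkv are hypothetical placeholders.

```python
import shlex
import sys
from typing import List, Sequence


def build_iw3_command(
    input_path: str,
    output_path: str,
    gpus: Sequence[int],
    depth_model: str = "ZoeD_N",
    zoed_batch_size: int = 4,
    max_workers: int = 8,
    cuda_stream: bool = False,
) -> List[str]:
    """Assemble an argument list for `python -m iw3.cli` (sketch only)."""
    cmd = [
        sys.executable, "-m", "iw3.cli",
        "-i", input_path,
        "-o", output_path,
        "--gpu", *[str(g) for g in gpus],
        "--depth-model", depth_model,
        "--zoed-batch-size", str(zoed_batch_size),
        "--max-workers", str(max_workers),
        "--yes",
    ]
    if cuda_stream:
        cmd.append("--cuda-stream")
    return cmd


# Print the two variants; pass a list to subprocess.run(cmd, check=True) to run it.
for use_stream in (False, True):
    print(shlex.join(build_iw3_command("input.mkv", "output.mkv", [0, 1], cuda_stream=use_stream)))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces or non-ASCII characters.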
I think the multi-GPU feature is working, but it is simply not efficient. |
OK, I see. Thank you for your patient answer. |
from #28 (comment)