MultiGPU efficient densenets are slow #36
Comments
Oooh @wandering007 good catch. I'll take a look.
@gpleiss This re-implementation (https://github.com/wandering007/efficient-densenet-pytorch) has good support for
I submitted a pull request for this:
Just merged in #39. @wandering007, can you confirm that this fixes the issue?
@gpleiss Yes, it works fine.
@wandering007 hmmm that is problematic... In general, I think that the checkpointing-based approach is probably what we should be doing moving forward. The original version was using some low-level calls which are no longer available in PyTorch. Using those low-level calls would require some C code, which is in my opinion undesirable for this package. However, it sounds like the checkpointing-based code is practically unusable for the multi-GPU scenario. It's probably worthwhile bringing up an issue in the PyTorch repo about this. I'll see if there's a better solution in the meantime.
nn.DataParallel fails for checkpoint feature
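For readers unfamiliar with the checkpointing-based approach discussed above, here is a minimal sketch of the idea, not the repository's actual code. The names DenseLayer, _bottleneck, and growth_rate are hypothetical, and it assumes a PyTorch version whose torch.utils.checkpoint.checkpoint accepts the use_reentrant argument. The intermediates of the concat, batch norm, and ReLU steps are not kept after the forward pass; they are recomputed during backward, trading compute for memory.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class DenseLayer(nn.Module):
    """Hypothetical bottleneck layer: concat -> BN -> ReLU -> 1x1 conv."""

    def __init__(self, in_channels, growth_rate, efficient=True):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU()
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=1, bias=False)
        self.efficient = efficient

    def _bottleneck(self, *features):
        # The concatenated and normalized tensors are the large intermediates.
        return self.conv(self.relu(self.norm(torch.cat(features, dim=1))))

    def forward(self, *features):
        if self.efficient and any(f.requires_grad for f in features):
            # Do not store the intermediates; recompute them in backward.
            # Assumption: use_reentrant is supported by the installed PyTorch.
            return checkpoint(self._bottleneck, *features, use_reentrant=False)
        return self._bottleneck(*features)


if __name__ == "__main__":
    layer = DenseLayer(in_channels=8, growth_rate=4)
    a = torch.randn(2, 3, 5, 5, requires_grad=True)
    b = torch.randn(2, 5, 5, 5, requires_grad=True)
    out = layer(a, b)          # 3 + 5 input channels are concatenated to 8
    out.sum().backward()       # triggers recomputation of the bottleneck
    print(out.shape, a.grad.shape)
```

One caveat of this sketch: because the bottleneck runs twice per training step, the batch norm running statistics are also updated twice.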
@gpleiss It may be tough for now... To be frank, I am still in favor of the previous implementation (v0.3.1) via
Maybe this issue could have been made clearer in the readme. I followed the implementation in my project but found it doesn't work with DataParallel...
@yzcjtr you might be experiencing a different problem. According to my tests, this should work with DataParallel. Can you post the errors that you're seeing?
I just got the Segmentation fault (core dumped) error when running with multiple GPUs. Does anyone know how to solve this problem?
@theonegis can you provide more information? What version of PyTorch, what OS, what version of CUDA, what GPUs, etc.? Also, could you open up a new issue for this?
@gpleiss I have opened a new issue: Segmentation fault (core dumped) error for multiple GPUs.
Hi @gpleiss, really sorry for my previous misunderstanding. I'm confronted with a similar situation as @theonegis. I will provide more information in his new issue. Thanks.
The official PyTorch checkpointing is slow on multiple GPUs, as explained by @wandering007. https://github.com/csrhddlam/pytorch-checkpoint solves this issue.
I just want to benchmark the new implementation of efficient densenet with the code here. However, it seems that the used checkpointed modules are not broadcast to multiple GPUs as I got the following errors:
I think that the checkpoint feature provides weak support for nn.DataParallel.
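A rough way to reproduce the kind of benchmark described here is sketched below. It is an illustration only: CheckpointedBlock, build, and time_step are hypothetical names, not code from this repository, and the sketch assumes a PyTorch version that supports use_reentrant in torch.utils.checkpoint.checkpoint. The model is wrapped in nn.DataParallel only when more than one GPU is visible; otherwise it runs on a single device so the script still executes.

```python
import time

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedBlock(nn.Module):
    """Hypothetical block whose body is recomputed during backward."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):
        return checkpoint(self.body, x, use_reentrant=False)


def build(channels=4):
    """Return the block, replicated with nn.DataParallel if >1 GPU is visible."""
    model = CheckpointedBlock(channels)
    if torch.cuda.device_count() > 1:
        return nn.DataParallel(model.cuda())
    if torch.cuda.is_available():
        return model.cuda()
    return model


def time_step(model, x, steps=3):
    """Return the mean wall-clock seconds per forward+backward pass."""
    if x.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(steps):
        model.zero_grad()
        model(x).sum().backward()
    if x.is_cuda:
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    return (time.perf_counter() - start) / steps


if __name__ == "__main__":
    model = build()
    device = next(model.parameters()).device
    x = torch.randn(8, 4, 16, 16, device=device)
    print(f"{time_step(model, x):.4f} s per step")
```

Running the same script once with a single visible GPU and once with several (for example by setting CUDA_VISIBLE_DEVICES) gives a simple comparison of per-step time.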