GPU performance worse than CPU #110
Comments
Hi @touste. Thank you for your interest, and I'm glad it has made your job easier! Sadly, the GPU backend has gone unmaintained for too long. Are you mapping arguments to the GPU every iteration, and thus incurring the cost of moving data? Could it be due to the cost of sending a run command to the GPU every time step? Perhaps @gkanwar can provide some further suggestions. Regarding a roadmap, I completely agree. Simit has, unfortunately, been neglected for about a year while we have worked on the tensor algebra compiler (tensor-compiler.org). We built it to become the new compiler for Simit, but it has taken on a life of its own. However, we are trying to find the time to integrate the tensor compiler with Simit and use it to carry out several improvements to the Simit language.
Unfortunately, as @fredrikbk mentioned, the GPU backend certainly needs some maintenance at this point. I am happy to help work through the issues you're seeing, but it would be helpful if you could provide a small code sample that demonstrates the performance bug. Thanks!
Thank you for your suggestions. I was indeed able to get better performance by reducing the number of mappings between the CPU and GPU. One thing that also helped was building Simit with the release flag. Now I get a 4x speedup on the GPU compared to the CPU, which is reasonable I suppose. Thanks again!
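For readers who land on this thread with the same symptom, here is a minimal sketch of why remapping arguments every iteration can make a GPU run slower than the CPU. It is a toy cost model, not Simit code: the function name `total_time_ms` and every timing constant are hypothetical, chosen only for illustration, and are not measurements from this issue.

```python
def total_time_ms(steps, compute_ms, transfer_ms, map_every_step):
    """Toy model of a time-stepping loop run on a GPU.

    compute_ms  : assumed GPU compute time per step
    transfer_ms : assumed cost of one CPU<->GPU argument mapping
    """
    # Either one mapping per step, or a single mapping before the loop.
    transfers = steps if map_every_step else 1
    return steps * compute_ms + transfers * transfer_ms


# Hypothetical numbers (assumptions, not measured values).
STEPS = 1000
CPU_STEP_MS = 3.0   # assumed CPU time per step
GPU_STEP_MS = 1.0   # assumed GPU time per step
TRANSFER_MS = 5.0   # assumed cost of one mapping

cpu_total = STEPS * CPU_STEP_MS
gpu_remap = total_time_ms(STEPS, GPU_STEP_MS, TRANSFER_MS, map_every_step=True)
gpu_once = total_time_ms(STEPS, GPU_STEP_MS, TRANSFER_MS, map_every_step=False)

print(f"CPU:                   {cpu_total:.0f} ms")
print(f"GPU, remap every step: {gpu_remap:.0f} ms")
print(f"GPU, map once:         {gpu_once:.0f} ms")
```

Under these assumed numbers the GPU loses to the CPU when it remaps every step and wins when it maps once, which matches the qualitative behaviour described above. The real ratio depends on the problem size and hardware.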
@touste, yes, we want to get back to it. Integrating taco with Simit will let us make it much more general, including arbitrary blocked matrices and even general sparse tensor computations. It has just been a time management issue, since taco itself has required a lot of work over the last year. Also, we'll be talking to Nvidia about tricks to get fast GPU support in taco, so when they are combined you might get a nice GPU speedup. I'm so glad to hear that it's useful to you! It really makes the work worth it. In the meantime, taco is a C++ library, so you can use it apart from Simit if you wish. Of course, when it's integrated with Simit, especially tensor assemblies, it will be much more convenient. taco does support multiprocessing to some degree, so Simit will get that with taco integration. It also has some vectorization that it gets from the compiler, but I'm hoping to find a keen master's student to improve on that. Multiple concurrent assemblies is a great idea that we'll think about.
Hi, I've been using Simit for writing a piece of finite element code in order to reach real-time execution for computer-assisted surgery planning. Using this library has made my work much easier by taking care of kernel assembly and execution.
Based on the published paper, I was expecting a drastic improvement in performance when executing on the GPU. However, this is not the case: the performance is actually worse than on the CPU.
Are there plans to update the GPU backend to reach the performance advertised in the paper? If not, is it possible for me to revert to a previous version of Simit that will compile and give me better performance?
As a user, I would also like to second the need for a roadmap (#58) of some sort, in order to have better visibility into future development. This would be very helpful for planning long-term development on the user's end.
Thanks!