Autotuning takes a while, and for us most of that time is spent JIT-compiling the kernel for each configuration rather than running it. Since compilation happens on the host CPU and should not affect timings, it would be nice if the compilations could run in parallel; once they have all finished, the configurations could be benchmarked on the GPU one at a time. Is this something that might be worth supporting?
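
To make the request concrete, here is a rough sketch of the two-phase flow, using only the Python standard library. It is not the library's API: `compile_kernel`, `benchmark`, and `autotune` are hypothetical stand-ins (a sleep plays the role of JIT compilation and a plain CPU function plays the role of the compiled kernel), and the config dictionaries are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def compile_kernel(config):
    """Hypothetical stand-in for JIT-compiling one configuration (host CPU work)."""
    time.sleep(0.01)  # pretend compilation cost
    block = config["block_size"]
    # Return a callable standing in for the compiled kernel.
    return lambda n: sum(range(0, n, block))


def benchmark(kernel, n=10_000, repeats=3):
    """Hypothetical stand-in for timing one compiled kernel on the device."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(n)
        best = min(best, time.perf_counter() - start)
    return best


def autotune(configs, max_workers=4):
    # Phase 1: compile every configuration concurrently on the host.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        kernels = list(pool.map(compile_kernel, configs))
    # Phase 2: time the kernels one at a time so measurements do not interfere.
    timings = [benchmark(kernel) for kernel in kernels]
    best_index = min(range(len(configs)), key=timings.__getitem__)
    return configs[best_index], timings


configs = [{"block_size": b} for b in (1, 2, 4)]
best_config, timings = autotune(configs)
```

A thread pool only helps if the real compilation step releases the GIL or shells out to an external compiler; if it does not, a `ProcessPoolExecutor` would be the analogous choice, provided the compiled artifacts can be cached or passed back to the parent process.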