
GPU Performance Improvement

@texadactyl texadactyl released this 04 Apr 23:52
· 20 commits to master since this release
0fc6e73

This release replaces the flt function with a new implementation when turbo_seti is running in GPU mode. Thanks to Franklin Antonio (@fantonio2 on GitHub) for his code at https://github.com/UCBerkeleySETI/dedopplerperf/blob/main/CudaTaylor5demo.cu; these turbo_seti changes are based on that. Kevin Lacker (@lacker on GitHub) used a C++ template to handle multiple float types and made other miscellaneous amendments.

Note that some of the surrounding code is refactored because the CPU implementation of flt stores the rows of its output using a bit-reversal technique. The GPU implementation doesn't, so the output format is slightly different.
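For readers unfamiliar with bit-reversed ordering, here is a minimal Python sketch of the general idea. It is not taken from turbo_seti; the helper names `bit_reverse` and `bit_reversed_order` are hypothetical, and the sketch assumes the number of rows is a power of two.

```python
def bit_reverse(index: int, nbits: int) -> int:
    """Return index with its lowest nbits bits in reversed order."""
    result = 0
    for _ in range(nbits):
        # Shift the lowest remaining bit of index onto the end of result.
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(nrows: int) -> list[int]:
    """List the bit-reversed counterpart of each row index 0..nrows-1.

    Assumption for this sketch: nrows is a power of two.
    """
    if nrows < 1 or nrows & (nrows - 1):
        raise ValueError("nrows must be a power of two")
    nbits = nrows.bit_length() - 1
    return [bit_reverse(i, nbits) for i in range(nrows)]


print(bit_reversed_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Code that reads rows stored this way has to map each index through the permutation, while code that reads naturally ordered rows does not. That difference is the kind of format change the surrounding refactoring has to account for.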

This speeds up the flt function by a factor of roughly 5. The flt function previously accounted for around 30% of the time turbo_seti spent in the search_coarse_channel function. Overall, this change seems to provide a ~15% performance improvement.
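As a rough sanity check on these numbers, the sketch below applies Amdahl's law to search_coarse_channel alone. The 30% share and the 5x factor are the approximate figures quoted above; treating the rest of the function as unchanged is an assumption, and `remaining_time_fraction` is a hypothetical helper, not part of turbo_seti.

```python
def remaining_time_fraction(fraction: float, speedup: float) -> float:
    """Amdahl's law: run time left after speeding up part of the work.

    fraction: share of the original run time spent in the accelerated part.
    speedup:  factor by which that part becomes faster.
    """
    return (1.0 - fraction) + fraction / speedup


# Approximate figures from the release notes: flt was ~30% of the time
# spent in search_coarse_channel and is now ~5x faster.
remaining = remaining_time_fraction(0.30, 5.0)
print(f"{remaining:.2f}")      # 0.76 of the original function time remains
print(f"{1 / remaining:.2f}")  # 1.32, the implied speedup of the function
```

This estimate covers only the time spent inside search_coarse_channel. The ~15% overall figure is a separate measurement, and this sketch does not attempt to reproduce it.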

Note that the output of the search_coarse_channel function is unchanged. This is purely a performance change when running in GPU mode.

Profiling before this change:
https://bldata.berkeley.edu/pipeline/tmp/turboseti_profile.svg

Profiling after this change:
https://bldata.berkeley.edu/pipeline/tmp/new_turboseti_profile.svg