Delete optimization #27

Merged: 7 commits merged into dev on Oct 10, 2023

kstppd commented Oct 6, 2023

  • Ports the optimizations made to the insertion kernels during summer 2023 over to the delete kernels.
  • Also shrinks hashers.h by moving the NVIDIA and AMD kernels into separate files, kernels_NVIDIA.h and kernels_AMD.h. The two files still duplicate some code, which should be cleaned up in the near future.
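
For readers unfamiliar with this kind of split, here is a rough sketch of how a thin header could pick one backend file at compile time. It is not the code from this PR: the backend_sketch namespace, the warp widths, and the host fallback are assumptions made for illustration, and the real includes are only shown as comments so the snippet compiles on its own. __CUDACC__ and __HIP_PLATFORM_AMD__ are the macros set by nvcc and by hipcc on AMD targets.

```cpp
#include <cassert>
#include <string>

// Hypothetical dispatch layer: exactly one branch is compiled per toolchain.
#if defined(__CUDACC__)
// Real layout would do: #include "kernels_NVIDIA.h"
namespace backend_sketch {
constexpr const char* name = "NVIDIA";
constexpr int warp_size = 32;  // warp width on NVIDIA GPUs
}
#elif defined(__HIP_PLATFORM_AMD__)
// Real layout would do: #include "kernels_AMD.h"
namespace backend_sketch {
constexpr const char* name = "AMD";
constexpr int warp_size = 64;  // assumed wavefront width (typical for CDNA-class GPUs)
}
#else
// Fallback so the sketch also builds with a plain host compiler.
namespace backend_sketch {
constexpr const char* name = "host";
constexpr int warp_size = 1;
}
#endif

// Number of warps needed to cover n_threads on the selected backend.
inline int warps_needed(int n_threads) {
    assert(n_threads >= 0);
    return (n_threads + backend_sketch::warp_size - 1) / backend_sketch::warp_size;
}

// Human-readable description of the backend chosen at compile time.
inline std::string describe_backend() {
    return std::string(backend_sketch::name) + " (warp width " +
           std::to_string(backend_sketch::warp_size) + ")";
}
```

Code shared by both backends would sit outside the #if branches, which is one way the duplication mentioned above could later be reduced.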

kstppd commented Oct 6, 2023

Some Nsight Compute stats from VOLTAR (10 invocations per kernel):

DEV

void Hashinator::Hashers::delete_kernel<unsigned int, unsigned int, (unsigned int)4294967295, (unsigned int)4294967294, Hashinator::HashFunctions::Fibonacci, (int)32, (int)4>(T1 *, Hashinator::hash_pair<T1, T2> *, unsigned long *, int, unsigned long, unsigned long) (131072, 1, 1)x(1024, 1, 1), Device 0, CC 8.0, Invocations 10
Section: GPU Speed Of Light Throughput
Metric Name                                                       Metric Unit           Minimum         Maximum         Average
----------------------------------------------------------------  -------------  --------------  --------------  --------------
dram__cycles_elapsed.avg.per_second                               cycle/nsecond        1.505338        1.516568        1.511397
gpc__cycles_elapsed.avg.per_second                                cycle/nsecond        1.060273        1.068183        1.064541
gpc__cycles_elapsed.max                                           cycle          7446972.000000  7500421.000000  7464503.200000
gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed  %                   17.833765       17.962199       17.920048
gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed            %                   17.833765       17.962199       17.920048
gpu__time_duration.sum                                            msecond              6.994560        7.028800        7.011821
l1tex__throughput.avg.pct_of_peak_sustained_active                %                    9.222978        9.263118        9.239684
lts__throughput.avg.pct_of_peak_sustained_elapsed                 %                   26.295844       26.489994       26.420818
sm__cycles_active.avg                                             cycle          7435975.435185  7468405.527778  7455003.432407
sm__throughput.avg.pct_of_peak_sustained_elapsed                  %                   10.428739       10.503618       10.478983

void Hashinator::Hashers::insert_kernel<unsigned int, unsigned int, (unsigned int)4294967295, Hashinator::HashFunctions::Fibonacci, (int)32, (int)4>(T1 *, T2 *, Hashinator::hash_pair<T1, T2> *, int, unsigned long, unsigned long *, unsigned long *, unsigned long, Hashinator::status *) (131072, 1, 1)x(1024, 1, 1), Device 0, CC 8.0, Invocations 10
Section: GPU Speed Of Light Throughput
Metric Name                                                       Metric Unit           Minimum         Maximum         Average
----------------------------------------------------------------  -------------  --------------  --------------  --------------
dram__cycles_elapsed.avg.per_second                               cycle/nsecond        1.510651        1.512987        1.511663
gpc__cycles_elapsed.avg.per_second                                cycle/nsecond        1.063859        1.065505        1.064571
gpc__cycles_elapsed.max                                           cycle          4213775.000000  4218969.000000  4216988.200000
gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed  %                   35.815730       35.857784       35.832524
gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed            %                   32.626122       32.670232       32.642962
gpu__time_duration.sum                                            msecond              3.956960        3.963264        3.960874
l1tex__throughput.avg.pct_of_peak_sustained_active                %                   35.873580       35.904683       35.888333
lts__throughput.avg.pct_of_peak_sustained_elapsed                 %                   58.533260       58.634845       58.583288
sm__cycles_active.avg                                             cycle          4207948.657407  4212078.842593  4210073.062963
sm__throughput.avg.pct_of_peak_sustained_elapsed                  %                   46.175313       46.232249       46.197265

void Hashinator::Hashers::retrieve_kernel<unsigned int, unsigned int, (unsigned int)4294967295, Hashinator::HashFunctions::Fibonacci, (int)32, (int)4>(T1 *, T2 *, Hashinator::hash_pair<T1, T2> *, int, unsigned long) (131072, 1, 1)x(1024, 1, 1), Device 0, CC 8.0, Invocations 10
Section: GPU Speed Of Light Throughput
Metric Name                                                       Metric Unit           Minimum         Maximum         Average
----------------------------------------------------------------  -------------  --------------  --------------  --------------
dram__cycles_elapsed.avg.per_second                               cycle/nsecond        1.506765        1.511590        1.510017
gpc__cycles_elapsed.avg.per_second                                cycle/nsecond        1.061036        1.064440        1.063329
gpc__cycles_elapsed.max                                           cycle          2537513.000000  2539980.000000  2539065.500000
gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed  %                   45.088881       45.133333       45.106795
gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed            %                   45.088881       45.133333       45.106795
gpu__time_duration.sum                                            msecond              2.385408        2.391648        2.387571
l1tex__throughput.avg.pct_of_peak_sustained_active                %                   22.094788       22.132451       22.114941
lts__throughput.avg.pct_of_peak_sustained_elapsed                 %                   59.425200       59.494961       59.452679
sm__cycles_active.avg                                             cycle          2520763.879630  2525127.953704  2522717.659259
sm__throughput.avg.pct_of_peak_sustained_elapsed                  %                   26.102052       26.128094       26.112027

DELETE_OPTIMIZATIONS

void Hashinator::Hashers::delete_kernel<unsigned int, unsigned int, (unsigned int)4294967295, (unsigned int)4294967294, Hashinator::HashFunctions::Fibonacci, (int)32, (int)4>(T1 *, Hashinator::hash_pair<T1, T2> *, unsigned long *, int, unsigned long, unsigned long) (131072, 1, 1)x(1024, 1, 1), Device 0, CC 8.0, Invocations 10
Section: GPU Speed Of Light Throughput
Metric Name                                                       Metric Unit           Minimum         Maximum         Average
----------------------------------------------------------------  -------------  --------------  --------------  --------------
dram__cycles_elapsed.avg.per_second                               cycle/nsecond        1.507232        1.517378        1.510987
gpc__cycles_elapsed.avg.per_second                                cycle/nsecond        1.057371        1.068437        1.063286
gpc__cycles_elapsed.max                                           cycle          3677466.000000  3697696.000000  3686546.700000
gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed  %                   36.137017       36.322093       36.237116
gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed            %                   36.137017       36.322093       36.237116
gpu__time_duration.sum                                            msecond              3.458112        3.473248        3.465114
l1tex__throughput.avg.pct_of_peak_sustained_active                %                   31.170985       31.314450       31.260865
lts__throughput.avg.pct_of_peak_sustained_elapsed                 %                   49.128153       49.428405       49.272549
sm__cycles_active.avg                                             cycle          3669260.083333  3685783.824074  3675393.850000
sm__throughput.avg.pct_of_peak_sustained_elapsed                  %                   30.080758       30.368656       30.183720

void Hashinator::Hashers::insert_kernel<unsigned int, unsigned int, (unsigned int)4294967295, Hashinator::HashFunctions::Fibonacci, (int)32, (int)4>(T1 *, T2 *, Hashinator::hash_pair<T1, T2> *, int, unsigned long, unsigned long *, unsigned long *, unsigned long, Hashinator::status *) (131072, 1, 1)x(1024, 1, 1), Device 0, CC 8.0, Invocations 10
Section: GPU Speed Of Light Throughput
Metric Name                                                       Metric Unit           Minimum         Maximum         Average
----------------------------------------------------------------  -------------  --------------  --------------  --------------
dram__cycles_elapsed.avg.per_second                               cycle/nsecond        1.510984        1.513487        1.512004
gpc__cycles_elapsed.avg.per_second                                cycle/nsecond        1.064106        1.065860        1.064825
gpc__cycles_elapsed.max                                           cycle          4216802.000000  4221979.000000  4218887.900000
gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed  %                   35.793318       35.837510       35.817612
gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed            %                   32.599702       32.640707       32.624702
gpu__time_duration.sum                                            msecond              3.957856        3.964288        3.961763
l1tex__throughput.avg.pct_of_peak_sustained_active                %                   35.817812       35.892894       35.868637
lts__throughput.avg.pct_of_peak_sustained_elapsed                 %                   58.519449       58.586801       58.561299
sm__cycles_active.avg                                             cycle          4209233.851852  4218398.981481  4212581.847222
sm__throughput.avg.pct_of_peak_sustained_elapsed                  %                   46.145086       46.202654       46.179337

void Hashinator::Hashers::retrieve_kernel<unsigned int, unsigned int, (unsigned int)4294967295, Hashinator::HashFunctions::Fibonacci, (int)32, (int)4>(T1 *, T2 *, Hashinator::hash_pair<T1, T2> *, int, unsigned long) (131072, 1, 1)x(1024, 1, 1), Device 0, CC 8.0, Invocations 10
Section: GPU Speed Of Light Throughput
Metric Name                                                       Metric Unit           Minimum         Maximum         Average
----------------------------------------------------------------  -------------  --------------  --------------  --------------
dram__cycles_elapsed.avg.per_second                               cycle/nsecond        1.506886        1.511546        1.509261
gpc__cycles_elapsed.avg.per_second                                cycle/nsecond        1.061131        1.064427        1.062812
gpc__cycles_elapsed.max                                           cycle          2538442.000000  2542006.000000  2540214.800000
gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed  %                   45.049734       45.114206       45.081456
gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed            %                   45.049734       45.114206       45.081456
gpu__time_duration.sum                                            msecond              2.387328        2.393696        2.389834
l1tex__throughput.avg.pct_of_peak_sustained_active                %                   22.086863       22.117189       22.101961
lts__throughput.avg.pct_of_peak_sustained_elapsed                 %                   59.407263       59.487880       59.437980
sm__cycles_active.avg                                             cycle          2522794.518519  2526302.138889  2524542.429630
sm__throughput.avg.pct_of_peak_sustained_elapsed                  %                   26.087542       26.124117       26.105620

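So essentially we double the delete throughput! The average delete_kernel duration drops from about 7.01 ms to about 3.47 ms, while the insert and retrieve kernels stay at about 3.96 ms and 2.39 ms on both branches.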

  • dev, delete_kernel -> gpu__time_duration.sum: min 6.994560 ms, max 7.028800 ms, avg 7.011821 ms
  • this PR, delete_kernel -> gpu__time_duration.sum: min 3.458112 ms, max 3.473248 ms, avg 3.465114 ms
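
The ratio behind that claim can be checked with a few lines of host-side C++. This is only a sketch: the Timing struct and the helper names are made up for illustration, and the numbers are the delete_kernel gpu__time_duration.sum values reported for dev and for this PR.

```cpp
#include <cassert>

// Min/max/average kernel duration in milliseconds, as reported by the profiler.
struct Timing {
    double min_ms;
    double max_ms;
    double avg_ms;
};

// Speedup of `after` relative to `before`, based on the average durations.
inline double speedup(const Timing& before, const Timing& after) {
    assert(before.avg_ms > 0.0 && after.avg_ms > 0.0);
    return before.avg_ms / after.avg_ms;
}

// Fraction of the original average runtime that was removed.
inline double time_saved_fraction(const Timing& before, const Timing& after) {
    assert(before.avg_ms > 0.0);
    return 1.0 - after.avg_ms / before.avg_ms;
}

// delete_kernel timings copied from the profiler output in this thread.
constexpr Timing dev_delete{6.994560, 7.028800, 7.011821};
constexpr Timing pr_delete{3.458112, 3.473248, 3.465114};
```

With these averages, speedup(dev_delete, pr_delete) comes out at roughly 2.02, so a little over half of the kernel time is removed.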

kstppd commented Oct 6, 2023

I will keep this open as a PR for a while before merging.

kstppd merged commit fd889d9 into dev on Oct 10, 2023
1 check passed