Reduce stream syncs in split tools #40

kstppd · 2024-02-07T11:13:20Z

Reduces the number of stream syncs in split tools.

markusbattarbee

more optimization notes

markusbattarbee · 2024-02-19T11:11:34Z

include/splitvector/split_tools.h

@@ -779,6 +766,7 @@ void copy_keys_if(split::SplitVector<T, split::split_unified_allocator<T>>& inpu
 const size_t memory_for_pool = 8 * nBlocks * sizeof(uint32_t);
 Cuda_mempool mPool(memory_for_pool, s);
 auto len = copy_keys_if_raw(input, output.data(), rule, nBlocks, mPool, s);
+ SPLIT_CHECK_ERR(split_gpuStreamSynchronize(s));


This one is not needed: copy_keys_if_raw already synchronizes the stream before returning.
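To illustrate the point, here is a minimal host-side sketch. It is a toy model, not the library's code: ToyStream and toy_copy_if_raw are hypothetical names, and the "stream" is just a queue that runs its tasks on synchronize(). It assumes, as in the comment above, that the callee syncs before returning its length, so a second sync in the caller finds nothing left to wait for.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical host-side model of a GPU stream: work is queued and
// only executes when synchronize() is called.
struct ToyStream {
    std::queue<std::function<void()>> pending;
    std::size_t drained = 0;  // tasks executed by the most recent synchronize()

    void enqueue(std::function<void()> task) { pending.push(std::move(task)); }

    void synchronize() {
        drained = 0;
        while (!pending.empty()) {
            pending.front()();
            pending.pop();
            ++drained;
        }
        assert(pending.empty());  // nothing is left in flight after a sync
    }
};

// Hypothetical function shaped like copy_keys_if_raw: it queues the work,
// then syncs so that the length can be returned by value on the host.
std::size_t toy_copy_if_raw(ToyStream& s, const std::vector<int>& in,
                            std::vector<int>& out) {
    std::size_t len = 0;
    s.enqueue([&] {
        for (int v : in) {
            if (v > 0) {  // stand-in for the selection rule
                out.push_back(v);
                ++len;
            }
        }
    });
    s.synchronize();  // required here: len is read on the host right after
    return len;
}
```

Calling s.synchronize() again right after toy_copy_if_raw returns drains zero tasks in this model. On a real stream the extra call is not free, since it still costs a host-side API call.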

Addressed in a new PR

markusbattarbee · 2024-02-19T11:14:39Z

include/splitvector/split_tools.h

@@ -622,14 +622,12 @@ uint32_t copy_if_raw(split::SplitVector<T, split::split_unified_allocator<T>>& i
 uint32_t* d_counts;


Up above, I see what I believe are unnecessary syncs on lines 595, 603, and 611.

Same as below

markusbattarbee · 2024-02-19T11:20:57Z

include/splitvector/split_tools.h

@@ -622,14 +622,12 @@ uint32_t copy_if_raw(split::SplitVector<T, split::split_unified_allocator<T>>& i
 uint32_t* d_counts;
 uint32_t* d_offsets;
 d_counts = (uint32_t*)mPool.allocate(nBlocks * sizeof(uint32_t));
- SPLIT_CHECK_ERR(split_gpuStreamSynchronize(s));
 SPLIT_CHECK_ERR(split_gpuMemsetAsync(d_counts, 0, nBlocks * sizeof(uint32_t),s));


d_counts here is passed to scan_reduce_raw as output, and that kernel writes to it directly rather than incrementing it. The memset therefore appears unnecessary, as long as the kernel actually writes to all elements. The same check should be applied to the other memsets as well.
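A minimal host-side sketch of the distinction, assuming a kernel that produces one count per block. Both functions are hypothetical stand-ins written for illustration, not the actual scan_reduce_raw kernel. The first assigns every output element, so the buffer's previous contents are irrelevant; the second accumulates, so it does need a zeroed buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a kernel that WRITES one result per block.
// Every element of counts is assigned, so no prior memset is required.
void count_per_block_overwrite(const std::vector<uint32_t>& input,
                               std::size_t blockSize,
                               std::vector<uint32_t>& counts) {
    for (std::size_t b = 0; b < counts.size(); ++b) {
        uint32_t c = 0;
        const std::size_t begin = b * blockSize;
        for (std::size_t i = begin; i < begin + blockSize && i < input.size(); ++i) {
            c += (input[i] != 0) ? 1u : 0u;
        }
        counts[b] = c;  // direct write: the old value is never read
    }
}

// Hypothetical stand-in for a kernel that ACCUMULATES into counts.
// Here the result depends on the old contents, so zeroing is mandatory.
void count_per_block_accumulate(const std::vector<uint32_t>& input,
                                std::size_t blockSize,
                                std::vector<uint32_t>& counts) {
    assert(counts.size() * blockSize >= input.size());  // every index maps to a block
    for (std::size_t i = 0; i < input.size(); ++i) {
        counts[i / blockSize] += (input[i] != 0) ? 1u : 0u;
    }
}
```

If the buffer is filled with garbage before the call, the overwrite variant still returns the correct counts, while the accumulate variant returns the garbage plus the counts. The caveat from the comment still holds: this only applies if every element is written, including any tail elements past the last full block.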

To be addressed later as I am in a bit of a git wreck

kstppd · 2024-03-01T20:46:38Z

This can be closed, as it is included in PR #48.

kstppd added 2 commits February 7, 2024 13:12

Reduce stream syncs in split tools

f266573

Revert a few stream syncs that are actually needed

ca8d997

markusbattarbee reviewed Feb 19, 2024

kstppd closed this Mar 1, 2024
