Best way to iterate over a device_vector() by size N chunks #1051
-
I don't see a way to combine the existing Thrust iterators in such a way as to implement what you're looking for. The zip iterator approach would be the closest, but there are issues. The chunk size could be handled by using a factory function, e.g.:
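A minimal sketch of such a factory, assuming a compile-time chunk size N (the function names here are hypothetical):

```cpp
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

#include <cstddef>
#include <utility>

// Hypothetical factory: builds a zip iterator whose i-th component is first + i,
// so dereferencing yields a thrust::tuple of N consecutive elements.
// Incrementing it still advances every component by one, i.e. the stride is fixed.
template <typename Iterator, std::size_t... Is>
auto make_chunk_iterator_impl(Iterator first, std::index_sequence<Is...>)
{
  return thrust::make_zip_iterator(thrust::make_tuple((first + Is)...));
}

template <std::size_t N, typename Iterator>
auto make_chunk_iterator(Iterator first)
{
  // Note: thrust::tuple has a fixed maximum arity (10 in older releases),
  // which bounds the usable N here.
  return make_chunk_iterator_impl(first, std::make_index_sequence<N>{});
}

// Usage sketch: auto it = make_chunk_iterator<3>(my_device_vector.begin());
```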
But this wouldn't handle all of your use cases, since there is no way to specify the stride -- incrementing the chunk iterator will always increment the component iterators by one. A custom iterator sounds like the best solution here. You may also be interested in #1575 -- this PR adds a new iterator that could be used to implement your use case.
-
I like Option 3 best as well. You could also consider having it return a single range-like object for each chunk instead of a raw pair of iterators.
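A hedged sketch of that idea, assuming the chunk is represented by a small span-like object (all names here are hypothetical):

```cpp
// Assumes compilation as CUDA (e.g. with nvcc), where __host__ __device__ are built in.

// Hypothetical span-like chunk: one object carrying a pointer and a length.
struct chunk_view
{
  const float* data;
  int size;

  __host__ __device__ const float* begin() const { return data; }
  __host__ __device__ const float* end()   const { return data + size; }
};

// Hypothetical factory functor: chunk index i -> the i-th non-overlapping size-N chunk.
struct make_chunk
{
  const float* first;
  int chunk_size;

  __host__ __device__ chunk_view operator()(int i) const
  {
    return chunk_view{first + i * chunk_size, chunk_size};
  }
};

// Usage sketch: transform a counting_iterator of chunk indices with make_chunk,
// then pass the resulting chunk_views to the per-chunk operator.
```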
This formulation assumes the adapted range is equally divisible by the chunk size; you'd need to do a little more fix-up to handle the boundary conditions otherwise. This isn't helpful today other than as something to look forward to in the future, but the C++ ranges library is getting a chunk view (std::views::chunk) that does exactly this kind of adaptation.
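For context, a minimal host-only illustration of that ranges facility (plain C++23 standard library, nothing Thrust-specific):

```cpp
#include <iostream>
#include <ranges>
#include <vector>

int main()
{
  std::vector<int> v{0, 1, 2, 3, 4, 5, 6, 7};

  // views::chunk (C++23) yields non-overlapping sub-ranges of size 4:
  // [0,1,2,3] then [4,5,6,7]. views::slide gives the overlapping,
  // stride-1 variant instead.
  for (auto chunk : v | std::views::chunk(4))
  {
    for (int x : chunk)
      std::cout << x << ' ';
    std::cout << '\n';
  }
}
```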
-
I would like to find an elegant method to create a transform_iterator that can iterate over a device_vector by size-N chunks.
Elements in each size-N chunk are consecutive. Size-N chunks may overlap with each other, but they are evenly spaced.
e.g.: I can iterate over device_vector elements [0,1,2], [1,2,3], [2,3,4], ...
or I can iterate over device_vector elements [0,1,2,3], [4,5,6,7], [8,9,10,11].
The UnaryOperator of the transform_iterator will take ALL elements of the size-N chunk and calculate one result.
Method 1:
I tried the following method, creating a device_ptr inside operator() from the reference passed to it, but it doesn't work. (Note: this is my attempt to iterate over the device_vector by elements [0,1,2], [1,2,3], [2,3,4], ...)
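A rough sketch of that approach, assuming a float element type and placeholder names (not the exact original code):

```cpp
#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/iterator/transform_iterator.h>

// Reconstruction of the described (non-working) functor: wrap the address of the
// element reference in a device_ptr and read the next two elements through it.
struct chunk_operator
{
  __host__ __device__
  float operator()(const float& x) const
  {
    thrust::device_ptr<const float> p(&x);   // device_ptr built from the reference
    return x + *(p + 1) + *(p + 2);          // these reads return wrong values on a device_vector
  }
};

// Intended usage (hypothetical):
// thrust::device_vector<float> v(16);
// auto first = thrust::make_transform_iterator(v.begin(), chunk_operator{});
```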
The *(p+1) and *(p+2) reads do not get the correct values from the device_vector. But when the algorithm runs on a host_vector it gives the correct result.
The other two methods I can think of are:
Method 2:
For the case of iterating over the device_vector by chunks of 3 elements, [0,1,2], [1,2,3], [2,3,4], ...,
I can create a zip iterator:
thrust::make_zip_iterator(thrust::make_tuple(my_device_vector.begin(), my_device_vector.begin() + 1, my_device_vector.begin() + 2))
And the chunk_operator() will process each tuple of 3 elements.
Obviously, with this method it is difficult to change the chunk size N: if I want to iterate by size-8 or size-16 chunks, the code has to be changed.
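A fuller sketch of Method 2, assuming the per-chunk result is simply the sum of the three elements (the functor name and the reduction are placeholders):

```cpp
#include <thrust/device_vector.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/reduce.h>
#include <thrust/tuple.h>

// Placeholder chunk operator: collapses each 3-element tuple into one value (a sum here).
struct chunk_operator
{
  __host__ __device__
  float operator()(const thrust::tuple<float, float, float>& t) const
  {
    return thrust::get<0>(t) + thrust::get<1>(t) + thrust::get<2>(t);
  }
};

void example()
{
  thrust::device_vector<float> v(16, 1.0f);

  // Component iterators offset by 0, 1, 2: dereferencing the zip iterator
  // yields the overlapping chunks [0,1,2], [1,2,3], [2,3,4], ...
  auto zipped = thrust::make_zip_iterator(thrust::make_tuple(
      v.begin(), v.begin() + 1, v.begin() + 2));

  // One value per chunk; there are v.size() - 2 valid chunks.
  auto chunked = thrust::make_transform_iterator(zipped, chunk_operator{});
  float total  = thrust::reduce(chunked, chunked + (v.size() - 2), 0.0f);
  (void)total; // every chunk sums to 3.0f with the data above
}
```

Changing the chunk size still means editing the tuple arity by hand, which is exactly the limitation noted above.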
Method 3:
To make it flexible enough to iterate over chunks of any size N, I can also create a custom iterator that produces "pairs of iterators" over a given device_vector.
Each pair of iterators marks the begin and end of a size-N chunk.
And chunk_operator() would be written to take the pair of iterators as input and perform the calculation over the range between begin and end.
To me, option 3 seems to be the best so far (though I haven't implemented it yet).
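A minimal sketch of what Method 3 could look like when composed from existing Thrust iterators rather than a hand-written iterator class; the names, the float element type, and the per-chunk sum are placeholder assumptions:

```cpp
#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/pair.h>
#include <thrust/reduce.h>

// Placeholder functor: chunk index i -> pair of raw pointers delimiting the chunk
// [first + i * stride, first + i * stride + chunk_size).
struct make_chunk_bounds
{
  const float* first;
  int chunk_size;
  int stride;

  __host__ __device__
  thrust::pair<const float*, const float*> operator()(int i) const
  {
    const float* lo = first + i * stride;
    return thrust::make_pair(lo, lo + chunk_size);
  }
};

// Placeholder chunk operator: reduces whatever lies between the two pointers.
struct chunk_operator
{
  __host__ __device__
  float operator()(const thrust::pair<const float*, const float*>& bounds) const
  {
    float sum = 0.0f;
    for (const float* it = bounds.first; it != bounds.second; ++it)
      sum += *it;
    return sum;
  }
};

void example()
{
  thrust::device_vector<float> v(16, 1.0f);
  const int chunk_size = 4;
  const int stride     = 4;   // non-overlapping chunks; stride 1 gives overlapping ones
  const int num_chunks = (v.size() - chunk_size) / stride + 1;

  // counting_iterator enumerates chunk indices; two transform_iterators turn
  // each index into bounds and then into the per-chunk result.
  auto bounds = thrust::make_transform_iterator(
      thrust::make_counting_iterator(0),
      make_chunk_bounds{thrust::raw_pointer_cast(v.data()), chunk_size, stride});
  auto chunked = thrust::make_transform_iterator(bounds, chunk_operator{});

  float total = thrust::reduce(chunked, chunked + num_chunks, 0.0f);
  (void)total; // each chunk sums to 4.0f with the data above
}
```

Each chunk is reduced serially by the thread that dereferences it, which is fine for small N; for large chunks a segmented reduction such as thrust::reduce_by_key would likely be a better fit.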
Please give your recommendations on what you think is the simplest and fastest way of implementing such a transform_iterator over size-N chunks. Thanks a lot!