Best way to iterate over a device_vector() by size N chunks #1051
-
I don't see a way to combine the existing Thrust iterators in such a way as to implement what you're looking for. The zip iterator approach would be the closest, but there are issues. The chunk size could be handled by using a factory function, e.g.:
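A minimal sketch of such a factory, assuming a compile-time chunk size N (the function names here are hypothetical):

```cpp
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

#include <cstddef>
#include <utility>

// Hypothetical factory: builds a zip iterator whose i-th component is first + i,
// so dereferencing yields a thrust::tuple of N consecutive elements.
// Incrementing it still advances every component by one, i.e. the stride is fixed.
template <typename Iterator, std::size_t... Is>
auto make_chunk_iterator_impl(Iterator first, std::index_sequence<Is...>)
{
  return thrust::make_zip_iterator(thrust::make_tuple((first + Is)...));
}

template <std::size_t N, typename Iterator>
auto make_chunk_iterator(Iterator first)
{
  // Note: thrust::tuple has a fixed maximum arity (10 in older releases),
  // which bounds the usable N here.
  return make_chunk_iterator_impl(first, std::make_index_sequence<N>{});
}

// Usage sketch: auto it = make_chunk_iterator<3>(my_device_vector.begin());
```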
But this wouldn't handle all of your use cases, since there is no way to specify the stride -- incrementing the chunk iterator will always increment the component iterators by one. A custom iterator sounds like the best solution here. You may also be interested in #1575 -- this PR adds a new iterator that could be used to implement your use case.
-
I like Option 3 best as well. You could also consider having it return a single range-like object for each chunk instead of a raw pair of iterators.
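A hedged sketch of that idea, assuming the chunk is represented by a small span-like object (all names here are hypothetical):

```cpp
// Assumes compilation as CUDA (e.g. with nvcc), where __host__ __device__ are built in.

// Hypothetical span-like chunk: one object carrying a pointer and a length.
struct chunk_view
{
  const float* data;
  int size;

  __host__ __device__ const float* begin() const { return data; }
  __host__ __device__ const float* end()   const { return data + size; }
};

// Hypothetical factory functor: chunk index i -> the i-th non-overlapping size-N chunk.
struct make_chunk
{
  const float* first;
  int chunk_size;

  __host__ __device__ chunk_view operator()(int i) const
  {
    return chunk_view{first + i * chunk_size, chunk_size};
  }
};

// Usage sketch: transform a counting_iterator of chunk indices with make_chunk,
// then pass the resulting chunk_views to the per-chunk operator.
```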
This formulation assumes the adapted range is equally divisible by the chunk size; you'd need to do a little more fix-up to handle the boundary conditions otherwise. This isn't helpful today other than as something to look forward to in the future, but the C++ ranges library is getting a chunk view (std::views::chunk) that does exactly this kind of adaptation.
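For context, a minimal host-only illustration of that ranges facility (plain C++23 standard library, nothing Thrust-specific):

```cpp
#include <iostream>
#include <ranges>
#include <vector>

int main()
{
  std::vector<int> v{0, 1, 2, 3, 4, 5, 6, 7};

  // views::chunk (C++23) yields non-overlapping sub-ranges of size 4:
  // [0,1,2,3] then [4,5,6,7]. views::slide gives the overlapping,
  // stride-1 variant instead.
  for (auto chunk : v | std::views::chunk(4))
  {
    for (int x : chunk)
      std::cout << x << ' ';
    std::cout << '\n';
  }
}
```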
-
I would like to find an elegant method to create a transform_iterator that can iterate over a device_vector by size-N chunks.
Elements in each size-N chunk are consecutive. Size-N chunks may overlap with each other, but they are evenly spaced.
e.g.: I can iterate over device_vector elements [0,1,2], [1,2,3], [2,3,4], ...
or I can iterate over device_vector elements [0,1,2,3], [4,5,6,7], [8,9,10,11].
The UnaryOperator of the transform_iterator will take ALL elements of the size-N chunk and calculate one result.
Method 1:
I tried the following method, creating a device_ptr inside operator() from the reference passed to it, but it doesn't work. (Note: this is my attempt to iterate over the device_vector by elements [0,1,2], [1,2,3], [2,3,4], ...)
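A rough sketch of that approach, assuming a float element type and placeholder names (not the exact original code):

```cpp
#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/iterator/transform_iterator.h>

// Reconstruction of the described (non-working) functor: wrap the address of the
// element reference in a device_ptr and read the next two elements through it.
struct chunk_operator
{
  __host__ __device__
  float operator()(const float& x) const
  {
    thrust::device_ptr<const float> p(&x);   // device_ptr built from the reference
    return x + *(p + 1) + *(p + 2);          // these reads return wrong values on a device_vector
  }
};

// Intended usage (hypothetical):
// thrust::device_vector<float> v(16);
// auto first = thrust::make_transform_iterator(v.begin(), chunk_operator{});
```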
The *(p+1) and *(p+2) reads do not get the correct values from the device_vector. But when the algorithm runs on a host_vector it gives the correct result.
The other two methods I can think of are:
Method 2:
For the case of iterating over the device_vector by chunks of 3 elements, [0,1,2], [1,2,3], [2,3,4], ...,
I can create a zip iterator:
thrust::make_zip_iterator(thrust::make_tuple(my_device_vector.begin(), my_device_vector.begin() + 1, my_device_vector.begin() + 2))
And the chunk_operator() will process each tuple of 3 elements.
Obviously, with this method it is difficult to change the chunk size N: if I want to iterate by size-8 or size-16 chunks, the code has to be changed.
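A fuller sketch of Method 2, assuming the per-chunk result is simply the sum of the three elements (the functor name and the reduction are placeholders):

```cpp
#include <thrust/device_vector.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/reduce.h>
#include <thrust/tuple.h>

// Placeholder chunk operator: collapses each 3-element tuple into one value (a sum here).
struct chunk_operator
{
  __host__ __device__
  float operator()(const thrust::tuple<float, float, float>& t) const
  {
    return thrust::get<0>(t) + thrust::get<1>(t) + thrust::get<2>(t);
  }
};

void example()
{
  thrust::device_vector<float> v(16, 1.0f);

  // Component iterators offset by 0, 1, 2: dereferencing the zip iterator
  // yields the overlapping chunks [0,1,2], [1,2,3], [2,3,4], ...
  auto zipped = thrust::make_zip_iterator(thrust::make_tuple(
      v.begin(), v.begin() + 1, v.begin() + 2));

  // One value per chunk; there are v.size() - 2 valid chunks.
  auto chunked = thrust::make_transform_iterator(zipped, chunk_operator{});
  float total  = thrust::reduce(chunked, chunked + (v.size() - 2), 0.0f);
  (void)total; // every chunk sums to 3.0f with the data above
}
```

Changing the chunk size still means editing the tuple arity by hand, which is exactly the limitation noted above.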
Method 3:
To make it flexible enough to iterate over chunks of any size N, I can also create a custom iterator that produces "pairs of iterators" over a given device_vector.
Each pair of iterators marks the begin and end of a size-N chunk.
And chunk_operator() would be written to take the pair of iterators as input and perform the calculation over the range between begin and end.
To me, option 3 seems to be the best so far (though I haven't implemented it yet).
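A minimal sketch of what Method 3 could look like when composed from existing Thrust iterators rather than a hand-written iterator class; the names, the float element type, and the per-chunk sum are placeholder assumptions:

```cpp
#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/pair.h>
#include <thrust/reduce.h>

// Placeholder functor: chunk index i -> pair of raw pointers delimiting the chunk
// [first + i * stride, first + i * stride + chunk_size).
struct make_chunk_bounds
{
  const float* first;
  int chunk_size;
  int stride;

  __host__ __device__
  thrust::pair<const float*, const float*> operator()(int i) const
  {
    const float* lo = first + i * stride;
    return thrust::make_pair(lo, lo + chunk_size);
  }
};

// Placeholder chunk operator: reduces whatever lies between the two pointers.
struct chunk_operator
{
  __host__ __device__
  float operator()(const thrust::pair<const float*, const float*>& bounds) const
  {
    float sum = 0.0f;
    for (const float* it = bounds.first; it != bounds.second; ++it)
      sum += *it;
    return sum;
  }
};

void example()
{
  thrust::device_vector<float> v(16, 1.0f);
  const int chunk_size = 4;
  const int stride     = 4;   // non-overlapping chunks; stride 1 gives overlapping ones
  const int num_chunks = (v.size() - chunk_size) / stride + 1;

  // counting_iterator enumerates chunk indices; two transform_iterators turn
  // each index into bounds and then into the per-chunk result.
  auto bounds = thrust::make_transform_iterator(
      thrust::make_counting_iterator(0),
      make_chunk_bounds{thrust::raw_pointer_cast(v.data()), chunk_size, stride});
  auto chunked = thrust::make_transform_iterator(bounds, chunk_operator{});

  float total = thrust::reduce(chunked, chunked + num_chunks, 0.0f);
  (void)total; // each chunk sums to 4.0f with the data above
}
```

Each chunk is reduced serially by the thread that dereferences it, which is fine for small N; for large chunks a segmented reduction such as thrust::reduce_by_key would likely be a better fit.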
Please give your recommendations on what you think is the simplest and fastest way of implementing such a transform_iterator over size-N chunks. Thanks a lot!