Does cub use global memory to do prefix scan? #1996

Answered by elstehle
cyk2018 asked this question in CUB
As far as I know, "Single-pass Parallel Prefix Scan with Decoupled Look-back" describes what CUB is doing.

@pauleonix is exactly right. CUB uses the single-pass prefix scan to minimize memory traffic. That is, thread blocks only need to communicate the partial and inclusive prefix scan results of each tile (by "a tile" I mean the items that one thread block processes). To compute the results within one tile, you can assume that the CUB implementation is at least as sophisticated as the shared memory variant you referred to above.
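To make the tile communication a bit more concrete, here is a rough host-side sketch of the look-back idea in plain C++. It is a single-threaded model, not CUB's code: the names `TileStatus`, `TileDescriptor`, `tiled_inclusive_scan` and the `resolve_order` parameter are all made up for illustration, and `resolve_order` (assumed to be a permutation of the tile indices) only stands in for the arbitrary order in which thread blocks might get to their look-back. On a GPU the two phases are not separated and a tile would have to wait on a predecessor whose descriptor is still invalid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names; a sequential model of the per-tile descriptors.
enum class TileStatus { Invalid, Aggregate, Prefix };

struct TileDescriptor {
    TileStatus status = TileStatus::Invalid;
    int aggregate = 0;         // sum of this tile's own items
    int inclusive_prefix = 0;  // sum of all items up to and including this tile
};

// Inclusive sum of `in`, computed tile by tile.
// `resolve_order` must be a permutation of the tile indices.
std::vector<int> tiled_inclusive_scan(const std::vector<int>& in,
                                      std::size_t tile_size,
                                      const std::vector<std::size_t>& resolve_order)
{
    assert(tile_size > 0);
    const std::size_t num_tiles = (in.size() + tile_size - 1) / tile_size;
    assert(resolve_order.size() == num_tiles);

    std::vector<TileDescriptor> tiles(num_tiles);
    std::vector<int> out(in.size());

    // Phase 1: each tile reduces its items and publishes the aggregate.
    // Tile 0 has no predecessors, so its aggregate is already its prefix.
    for (std::size_t t = 0; t < num_tiles; ++t) {
        const std::size_t begin = t * tile_size;
        const std::size_t end = std::min(begin + tile_size, in.size());
        int sum = 0;
        for (std::size_t i = begin; i < end; ++i) sum += in[i];
        tiles[t].aggregate = sum;
        if (t == 0) {
            tiles[t].inclusive_prefix = sum;
            tiles[t].status = TileStatus::Prefix;
        } else {
            tiles[t].status = TileStatus::Aggregate;
        }
    }

    // Phase 2: each tile walks back over its predecessors, adding up
    // aggregates until it meets a tile that already knows its prefix.
    for (std::size_t t : resolve_order) {
        int exclusive_prefix = 0;
        for (std::size_t p = t; p-- > 0;) {
            if (tiles[p].status == TileStatus::Prefix) {
                exclusive_prefix += tiles[p].inclusive_prefix;
                break;  // everything before tile p is already included
            }
            exclusive_prefix += tiles[p].aggregate;
        }

        // Publish this tile's inclusive prefix for later look-backs.
        tiles[t].inclusive_prefix = exclusive_prefix + tiles[t].aggregate;
        tiles[t].status = TileStatus::Prefix;

        // Local scan of the tile, seeded with the exclusive prefix.
        const std::size_t begin = t * tile_size;
        const std::size_t end = std::min(begin + tile_size, in.size());
        int running = exclusive_prefix;
        for (std::size_t i = begin; i < end; ++i) {
            running += in[i];
            out[i] = running;
        }
    }
    return out;
}
```

For example, calling `tiled_inclusive_scan` on the values 1 to 10 with `tile_size = 3` and `resolve_order = {3, 1, 0, 2}` makes tile 3 walk back over two aggregates before it reaches tile 0's prefix, while tile 2 stops right away at tile 1's already published prefix. Each input item is read from the array and each output written once per pass over the tile data; only the small descriptors are shared between tiles.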

In general, I would strongly advise using the CUB algorithms. A lot of thought went into the design of these algorithms and you don't have to ma…

Answer selected by cyk2018
Category
CUB
4 participants