Is your feature request related to a problem? Please describe.
multi-get_json_object currently sorts the paths before chunking them up. This works fairly well in reducing thread divergence in the kernels, but it is not perfect.
The current kernel processes one input row per warp. Each warp can then handle up to 32 different threads/JSON paths, but because of memory limitations we often cannot use all 32. #2247 tracks reducing that memory usage, and if we reduce it enough we can significantly improve performance for some jobs. But in my tests, once large numbers of paths run in a single warp, performance with sorting can be worse than the previous approach of hashing the paths.
I think we can fix all of this with some simple changes. Effectively, the added computation cost of putting a path in a warp is determined by where that path diverges from the other paths in the warp. Divergence can happen at each level of processing a path, so in general we want to cluster the paths by prefix, which is why sorting works. There appears to be little if any added cost in running a diverging thread in the same warp vs. another warp, but there can be a big win if we don't arbitrarily split two paths that are very close to each other.
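To illustrate the weakness of sort-then-chunk (paths and chunk size here are made up for illustration): a fixed-size chunk boundary can land in the middle of a prefix cluster, splitting paths that share a prefix across warps even when a different grouping would keep them together.

```python
# Made-up JSON paths; sorting clusters shared prefixes together...
paths = sorted(["$.a.b", "$.a.c", "$.b.a", "$.b.b", "$.b.c", "$.c.a"])

# ...but chunking into fixed-size groups (stand-in for the per-warp
# path budget) can still cut a "$.b" cluster in half.
chunk_size = 4
chunks = [paths[i:i + chunk_size] for i in range(0, len(paths), chunk_size)]
# The "$.b.*" paths end up split across two chunks/warps.
```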
The idea would be to create a tree of paths. We can then find a sub-tree with enough leaf nodes to fit within our per-warp budget. This is essentially a bin-packing problem, so we can be a bit greedy: start with the largest sub-trees that fit, then fill the remaining budget with smaller branches.
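A rough sketch of the tree-based packing described above, assuming paths are pre-tokenized into tuples and that `budget` stands in for however many paths a warp can afford (the limit #2247 is about). This is an illustration of the idea, not the actual kernel-side implementation:

```python
LEAF = None  # sentinel key marking a complete path ending at a trie node

def build_trie(paths):
    """Build a nested-dict trie over path tokens."""
    root = {}
    for p in paths:
        node = root
        for tok in p:
            node = node.setdefault(tok, {})
        node[LEAF] = p  # remember the full path at its end node
    return root

def leaves(node):
    """All complete paths contained in this sub-tree."""
    out = []
    for key, child in node.items():
        out.append(child) if key is LEAF else out.extend(leaves(child))
    return out

def maximal_units(node, budget):
    """Split the trie into maximal sub-trees whose leaf count fits the budget.
    Keeping each unit whole keeps prefix-sharing paths in the same warp."""
    paths = leaves(node)
    if len(paths) <= budget:
        return [paths]
    units = []
    for key, child in node.items():
        if key is LEAF:
            units.append([child])
        else:
            units.extend(maximal_units(child, budget))
    return units

def pack_warps(paths, budget):
    """Greedy first-fit-decreasing: largest prefix clusters first,
    then fill remaining warp budget with smaller branches."""
    units = sorted(maximal_units(build_trie(paths), budget),
                   key=len, reverse=True)
    warps = []
    for unit in units:
        for warp in warps:
            if len(warp) + len(unit) <= budget:
                warp.extend(unit)
                break
        else:
            warps.append(list(unit))
    return warps
```

With a budget of 4 and the paths `("a","b","c")`, `("a","b","d")`, `("a","e")`, `("x","y")`, `("x","z")`, `("q",)`, the three `a`-prefixed paths stay together in one warp (with `("q",)` filling the leftover slot) and the two `x`-prefixed paths land together in another, instead of a sorted chunk boundary splitting a cluster.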