I have noticed that cuDF does not make use of concurrent streaming multiprocessors in the kernels launched during a cudf.read_csv operation.

The main idea is to exploit copy/compute overlap by issuing different chunks of data on non-default streams, so that data transfers can overlap with kernel execution and the overall operation is faster. The kernels would be prioritized based on their share of execution time during a read_csv() call (as reported by the Nsight profiler) to analyze the effect of using the streaming multiprocessors concurrently. Beyond optimizing at the SM level, the optimization could also be extended to the multi-GPU level on top of concurrent SMs.
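For reference, here is a minimal sketch of the copy/compute-overlap pattern described above. It uses CuPy rather than cuDF, since cuDF's public Python API does not currently expose streams; the chunk count, chunk size, and the "doubling" kernel are placeholders, not anything taken from cuDF's CSV reader.

```python
import numpy as np
import cupy as cp

NUM_CHUNKS = 4            # placeholder chunk count
CHUNK_ELEMS = 1 << 20     # placeholder chunk size (float32 elements)

# Pinned host memory is required for cudaMemcpyAsync to run truly asynchronously.
pinned = cp.cuda.alloc_pinned_memory(NUM_CHUNKS * CHUNK_ELEMS * 4)
h_buf = np.frombuffer(pinned, dtype=np.float32, count=NUM_CHUNKS * CHUNK_ELEMS)
h_buf[:] = 1.0

streams = [cp.cuda.Stream(non_blocking=True) for _ in range(NUM_CHUNKS)]
d_chunks = [cp.empty(CHUNK_ELEMS, dtype=cp.float32) for _ in range(NUM_CHUNKS)]

for c, stream in enumerate(streams):
    h_chunk = h_buf[c * CHUNK_ELEMS:(c + 1) * CHUNK_ELEMS]
    with stream:
        # The H2D copy of chunk c is queued on its own stream, so it can overlap
        # with the compute already in flight for earlier chunks.
        d_chunks[c].set(h_chunk, stream=stream)
        d_chunks[c] *= 2.0   # stand-in for the real per-chunk kernel work

for stream in streams:
    stream.synchronize()
```

Whether the copies and kernels actually overlap depends on the GPU having separate copy engines and on the kernels leaving SMs free; Nsight Systems will show the per-stream timelines side by side.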
This is a great question. Overall, cuDF (and more recently cudf.pandas) attempts to be as faithful to the pandas API as possible. This means that we're not planning on exposing streams as part of the public cuDF API any time in the near future, so you wouldn't be able to implement the kind of custom streaming strategy you're envisioning.
However, exposing streams is on our roadmap for pylibcudf, a lower-level library that cuDF uses under the hood and that supports CSV reading. I'm not able to provide an ETA for this feature at this time, but it's likely there will be a point where you can use pylibcudf to implement your custom chunked read and take over with cuDF/cudf.pandas afterwards.
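Until then, one thing that does work today, albeit on the default stream and therefore without the overlap you're describing, is a hand-rolled chunked read using the existing byte_range parameter of cudf.read_csv. This is just a sketch, assuming a headerless CSV with a known schema; the path, column names, and dtypes are hypothetical.

```python
import os
import cudf

PATH = "data.csv"                                   # hypothetical input file
NAMES = ["a", "b", "c"]                             # assumed (headerless) schema
DTYPES = {"a": "int64", "b": "float64", "c": "str"}
NUM_CHUNKS = 4

file_size = os.path.getsize(PATH)
chunk_size = (file_size + NUM_CHUNKS - 1) // NUM_CHUNKS

parts = []
for c in range(NUM_CHUNKS):
    offset = c * chunk_size
    size = min(chunk_size, file_size - offset)
    # byte_range parses only the rows that *start* inside [offset, offset + size),
    # so splitting at arbitrary byte offsets neither drops nor duplicates rows.
    parts.append(
        cudf.read_csv(PATH, byte_range=(offset, size),
                      header=None, names=NAMES, dtype=DTYPES)
    )

df = cudf.concat(parts, ignore_index=True)
```

Each of these calls still runs on the default stream, so the chunks serialize on the GPU; the stream-per-chunk variant only becomes possible once pylibcudf exposes stream arguments.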