
For GPU version, use cudf or dask-cudf for the input dataframe instead of pandas #1362

Open
webcoderz opened this issue Jun 29, 2023 · 8 comments

Comments

@webcoderz

webcoderz commented Jun 29, 2023

Problem
To enable full GPU data science, it would be really useful to support dask-cudf for dataframes larger than single-GPU memory, or cudf for dataframes that fit entirely on a single GPU, instead of having to move the data to the CPU for input into model training.

Solution
Allow a dask dataframe or a cudf dataframe (or both) as input to the model.
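As a rough illustration of what this could look like, here is a minimal sketch of an input-normalization helper. The function name and control flow are hypothetical (not NeuralProphet's actual API); it only shows how the entry point could dispatch on pandas, cudf, and dask dataframes without hard-requiring RAPIDS to be installed:

```python
# Hypothetical sketch of an input-normalization helper for NeuralProphet.
# The function name `normalize_input_df` and its flow are illustrative,
# not the library's actual API.
import importlib.util
import pandas as pd

def normalize_input_df(df):
    """Accept a pandas, cudf, or dask(-cudf) dataframe.

    GPU frames are kept on-device; lazy dask frames are materialized here
    (a fuller integration would keep them lazy for longer). This only
    sketches the type dispatch."""
    mod = type(df).__module__
    if mod.startswith("cudf"):
        # Already a single-GPU cudf.DataFrame: keep it on the device.
        return df
    if mod.startswith("dask"):
        # Lazy dask / dask-cudf frame: trigger computation only when the
        # model actually needs concrete data.
        return df.compute()
    if isinstance(df, pd.DataFrame):
        if importlib.util.find_spec("cudf") is not None:
            import cudf  # move to GPU when RAPIDS is available
            return cudf.DataFrame.from_pandas(df)
        return df
    raise TypeError(f"Unsupported dataframe type: {type(df)!r}")
```

On systems without RAPIDS this degrades gracefully to plain pandas, which matters for a library that must also run CPU-only.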

@leoniewgnr
Collaborator

Hi @webcoderz, thanks for bringing this up! How would this work? Would the dataframe always stay a dask or cudf dataframe throughout the whole model? And what are the benefits? Would this be faster than pandas?

@webcoderz
Author

Basically it would enable a full GPU pipeline, give a speedup, and allow much larger dataframes to pass through. A dask dataframe can hold a very large dataset: when reading GBs of data, it splits the data into some number of partitions (each a pd.DataFrame), which reduces bottlenecks from data size and avoids the delays of offramping from GPU to CPU. The cool thing about dask is that it runs in parallel and supports lazy loading, so you can delay compute until the end of your function chain (or wherever is more efficient) and then execute the whole chain in parallel. The original Prophet was GIL-locked so this wasn't possible there, but it should be here.

A lot of considerations have to be made when dealing with large data. For example, one of your df_utils makes a copy of the dataframe, which could blow out memory on a really large dataset. My suggestion would be to start at https://ml.dask.org/pytorch.html and see whether dask-ml supports some of what you're doing here out of the box before going further down this path. I can try to help a bit; I have to read more of the code to understand it better, but it's typically completely possible to use torch and dask together.

@ourownstory
Owner

@webcoderz Thank you for your excellent suggestion, and for highlighting the device-transfer issue. @leoniewgnr is currently parallelizing some of our older code paths to help speed up data processing (she is mostly done). What you are suggesting sounds like the appropriate next step; I think you are right that most of our compute is data-pipeline and device-transfer bottlenecked. If you are up for the challenge, we would love to have a chat with you and discuss how to proceed. BTW, our dev core team (all of us are open-source volunteers) is open to new members. :)

@webcoderz
Author

Hi @ourownstory, yeah, I'd be down! I have a lot on my plate currently but would be happy to help! I think it would be super cool to run this on hundreds of millions of rows of data 😀

@beckernick

beckernick commented Jul 5, 2023

Hi! I came across this issue due to the cuDF / Dask mentions (I'm part of the RAPIDS team at NVIDIA that develops cuDF, Dask-CUDA, and a variety of other projects for accelerated computing).

We'd love to see the NeuralProphet community contribute this functionality to the package! I'd be happy to join any discussions on this topic and help try to answer any questions that may come up.

@webcoderz
Author

Perfect! This is excellent.

@ourownstory
Owner

@webcoderz Please let me know if you may still be game for this challenge. I'd be happy to hop on a call to discuss it.

@webcoderz
Author

I am all here for it! I gave @leoniewgnr my email if you want to reach out to schedule something!
