Test Polars compatibility and performance #368
Replies: 4 comments
-
Thanks @toni-neurosc for mentioning that! Nice video too, with impressive speed improvements over pandas. I guess the main things for us to measure would be the time to store data in an existing data frame / array (using append/concat after feature computation) and then the IO time for saving the data frame / array.
-
Hi @timonmerk, I opened a discussion about this in #322. I did not consider numpy's .npy format, but it's actually not that crazy, since pretty much anyone who wants to use PyNM is going to be doing the data processing in Python anyway. In fact, I had already thought about the problem of the intermediate representation of the feature calculation results, which are currently written to a dictionary and then moved into a Pandas dataframe. I think the dictionary representation might be a bit troublesome, and my idea was to basically flatten the nested structure that can arise in some of the feature calculations (e.g. different frequency bands for each channel) and hold the order of the features in a separate string array, then return a [...] If we were to do that, maybe we would be able to ditch dataframes altogether. Maybe we need them for the GUI for visualization, but to send data around parts of the program, I think we could stay within numpy all the time if we wanted. Then storing to [...]
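A minimal sketch of the flattening idea, under stated assumptions: the function name `flatten_features`, the `_` separator, and the example channel and band names are all hypothetical and not part of PyNM. It only illustrates turning a nested feature dictionary into a flat numpy value array plus a separate string array that records the feature order.

```python
import numpy as np


def flatten_features(nested, prefix=""):
    """Recursively flatten a nested feature dict into (names, values) lists.

    Hypothetical helper for illustration; not an existing PyNM function.
    """
    names, values = [], []
    for key, value in nested.items():
        # Build a flat name such as "ch1_theta" from the nesting path.
        name = f"{prefix}_{key}" if prefix else str(key)
        if isinstance(value, dict):
            sub_names, sub_values = flatten_features(value, name)
            names.extend(sub_names)
            values.extend(sub_values)
        else:
            names.append(name)
            values.append(float(value))
    return names, values


# Made-up example: two channels, two frequency bands each.
features = {
    "ch1": {"theta": 0.5, "beta": 1.2},
    "ch2": {"theta": 0.7, "beta": 0.9},
}

names, values = flatten_features(features)
feature_names = np.array(names)                       # string array holding the order
feature_values = np.array(values, dtype=np.float64)   # flat numeric array
```

With this layout the names only need to be stored once, and each iteration contributes a single row of floats, which could then be written with `np.save` if the .npy route is chosen.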
-
I played with polars a bit for a different project now, and it's quite amazing! However, the core problem, that we currently accumulate all computed features in RAM, still needs to be addressed. Based on my previous calculation, I will try to implement sqlite and save the features after every iteration. This option was the fastest and should not create too much overhead. The change should also not affect the other examples, since pandas and polars both provide methods to load from a database. This all comes at the cost of not having a human-readable csv file. But we could also save a snippet / head of the features simply for debugging purposes.
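A rough sketch of what saving after every iteration could look like, using only the standard-library `sqlite3` module. The table name `features`, the column layout, and the helper names `open_feature_store` and `append_features` are assumptions made for illustration, not existing PyNM code, and the values are dummies.

```python
import sqlite3


def open_feature_store(path, feature_names):
    """Open (or create) a database with one REAL column per feature.

    Hypothetical helper; the table layout is an assumption.
    """
    conn = sqlite3.connect(path)
    columns = ", ".join(f'"{name}" REAL' for name in feature_names)
    conn.execute(f"CREATE TABLE IF NOT EXISTS features (time REAL, {columns})")
    return conn


def append_features(conn, timestamp, values):
    """Write one iteration's features as a single row and commit it."""
    placeholders = ", ".join("?" for _ in range(len(values) + 1))
    conn.execute(f"INSERT INTO features VALUES ({placeholders})", (timestamp, *values))
    conn.commit()


# Made-up feature names and values; ":memory:" keeps the example self-contained.
feature_names = ["ch1_theta", "ch1_beta"]
conn = open_feature_store(":memory:", feature_names)

for step in range(3):
    # In a real run these values would come from the feature computation.
    append_features(conn, step * 0.1, [float(step), float(step) * 2.0])

# A small "head" of the table, e.g. for a human-readable debugging snippet.
head = conn.execute("SELECT * FROM features LIMIT 2").fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]
```

Only one row is held in memory per iteration, and the full table can later be loaded in one call, for example with `pandas.read_sql_query("SELECT * FROM features", conn)`.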
-
Coincidentally earlier this morning, when I erroneously thought I had fixed the RTD, I preemptively opened a new local branch called "no_pandas" where I wanted to eventually:
-
So Polars is a replacement for Pandas written in Rust (https://pola.rs/) which can be 10-100x faster than Pandas depending on the operations.
However, it's still not fully compatible with certain things. For example, I have read that it can have problems working directly with scikit-learn.
PyNM is using Pandas dataframes to store analysis results, so I think at some point we should at least give Polars a go and see if it would fit the project.
Demo of Plotly Dash with Polars https://www.youtube.com/watch?v=_iebrqafOuM