Data Exploration and Software Tools and Resources at WRC #14
-
@Roman-Battisti mentioned dask.dataframe as a potentially useful extension to pandas. FWIW, I have been using it to read and manipulate some biggish data sets (cruise tracks with about 90 columns and anywhere from 1.9 million to 54 million rows, read from an ERDDAP server and saved to Parquet files for re-reading via dask.dataframe). dask.dataframe is a direct analog to pandas, except with lazy loading and parallel processing, so if you are used to pandas you'll find it pretty straightforward. So far, I've experimented with using datashader to make an image of all of the cruise tracks to overlay onto a map via Plotly, but eventually I'll be trying to take advantage of the supposedly fast subsetting to pull out either geographic subsets or groups of individual cruises for plotting. I have also tried a similar tool called vaex. It's less of a direct analog to pandas (though similar). I was mystified for a long time about how to even dereference the objects once the subset was completed, but finally figured it out using the examples from the Dash Enterprise sample app for vaex. Either of these tools could be used in a notebook, but if you happen to want to build a web app around them, head over to our Dash Enterprise instance [lab network or VPN required] for examples. *Edit to note: the Dash portal requires VPN.
-
In response to the Oct 11th TUG Meeting - please feel free to use this discussion forum to share software, approaches, and challenges.
The initial discussion was inspired by the following tweet - https://twitter.com/simonw/status/1572285367382061057 which asked the question:
Links to useful resources from the discussion: