Apache Arrow for Performance, GIS Data Models, & Python API #189

Open
aufdenkampe opened this issue Sep 30, 2024 · 1 comment

aufdenkampe commented Sep 30, 2024

@cbuahin, our team is looking at how we might tightly couple SWMM with other models, and I came across this exciting roadmap you presented in February: https://www.icwmm.org/files/2024-C033-06.pdf

It's really awesome that your roadmap aims to:

  • Improve SWMM’s Computational Performance (v5.4.0)
  • Adopt a GIS Based Data Model for SWMM
  • Advance SWMM’s API and Runtime Interaction Capabilities (to support component-based modeling)

As I looked over your slides, it occurred to me (based on my recent experience here: WikiWatershed/global-hydrography#1), that adopting Apache Arrow could help you with all three of those goals.

It's very possible that you're already considering this approach, given that Apache Arrow's memory/cache format (https://arrow.apache.org/overview/) looks so similar to your slide on performance. If so, then consider this issue a +1 for that approach.

If not, then it might be worthwhile to point out a few additional benefits of Apache Arrow:

  • Libraries for nearly every language (including C, C++, Python) make it easy to implement.
  • The goal of Apache Arrow is fast zero-copy data exchange among software components, even when written in different languages.
  • Arrow libraries already have exceptionally fast IO implementations for Apache Parquet, CSV, and other formats, and they are being widely adopted by other projects, including recently becoming the preferred IO engine for Python Pandas (optional since v2.0 and required/default coming with v3.0).
  • GDAL introduced some Arrow integration with release v3.8 and expanded it with v3.9.
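
To make the zero-copy point above concrete, here is a minimal stdlib-only sketch of the underlying idea: each attribute is stored as one contiguous typed buffer, and a consumer gets a view onto those bytes rather than a copy. This is not Arrow's API, and the column names and values (`node_id`, `depth_m`) are hypothetical. It only illustrates the columnar, shared-buffer pattern that Arrow standardizes across languages.

```python
from array import array

# Hypothetical columnar store: one contiguous typed buffer per attribute,
# rather than one object per node. Names and values are illustrative only.
columns = {
    "node_id": array("q", [1, 2, 3, 4]),              # 64-bit signed ints
    "depth_m": array("d", [0.50, 1.25, 0.00, 2.10]),  # 64-bit floats
}

# A consumer receives a memoryview: a window onto the same bytes, not a copy.
depth_view = memoryview(columns["depth_m"])

# Slicing a memoryview is also zero-copy.
downstream = depth_view[2:]

# A write through the view is visible to the producer, showing shared memory.
downstream[0] = 0.75
assert columns["depth_m"][2] == 0.75

# The view carries the element format and size that a consumer needs
# in order to interpret the raw buffer.
print(depth_view.format, depth_view.itemsize, depth_view.nbytes)
```

Arrow goes well beyond this sketch: it defines the buffer layout, null handling, and schema metadata, so components written in C, C++, and Python can agree on the same memory without converting it.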

cbuahin commented Sep 30, 2024

@aufdenkampe, thanks for sharing this neat project! It appears it has many of the elements that will support some of the work we are embarking on. I will look into it to see if we are able to utilize it. If you have some specific suggestions regarding implementation, I am happy to discuss.

Generally, it is a tricky proposition to tie SWMM to external dependencies. However, there are so many great libraries out there that provide so much value that it becomes harder to justify not utilizing them.
