-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace use of pandas
in pyrealm.demography
#292
Replace use of pandas
in pyrealm.demography
#292
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #292 +/- ##
===========================================
- Coverage 95.29% 95.22% -0.07%
===========================================
Files 28 32 +4
Lines 1720 2115 +395
===========================================
+ Hits 1639 2014 +375
- Misses 81 101 +20
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks fine to me overall.
One issue to fix:
There's a straggler issue in that the community.py module docstring still makes reference to using pandas as the internal data vehicle.
Internally, the cohort data in the Community class is represented as a pandas dataframe, which makes it possible to update cohort attributes in parallel across all cohorts but also provide a clean interface for adding and removing cohorts to a Community.
And one query:
I see the type hints for the new NDArrays are now float32
. Is there a specific reason for this? Will Python type promotion not mean that this doesn't survive contact with any float64
data?
Fixed
That's a good point - the typing of arrays has got cleaner and I think the earlier code was written when this was not such a transparent thing to do. I don't think we need the precision of So - not sure. I guess I'd like to leave this PR as is and tackle the array typing more widely as another issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sorting the docstring. Happy for the float32 types hints to be left in place for the time being. My gut feel is that any data read in from csv/toml etc will default to float64, and would need to be explicitly converted to float32.
Description
In #277, I used
pandas.Dataframe
withinFlora
to provide an array-like view onto the plant functional type data. That then led to using it forCommunity.cohort_data
(#282) and also typing the inputs to the T Model functions aspandas.Series
. That turns out to be awkward for a number of core use cases:Flora
Series
tonp.ndarray
when broadcasting needed.pandas
inpyrealm.demography
#291 which summarises the issues and describes what this PR should do.This PR starts to replace the usage with a pure numpy alternative. There is code in the currently open PR #288 that will also need updating, but this seems like the right way to go now.
The code:
Fixes #291 (issue)
Type of change
Key checklist
pre-commit
checks:$ pre-commit run -a
$ poetry run pytest
Further checks