Replace use of `pandas` in `pyrealm.demography` #292

davidorme · 2024-09-21T21:29:48Z

Description

In #277, I used pandas.DataFrame within Flora to provide an array-like view onto the plant functional type data. That then led to using it for Community.cohort_data (#282) and to typing the inputs to the T Model functions as pandas.Series. That turns out to be awkward for a number of core use cases:

See the comment on "Add helper functions for PFT geometry and canopy shape" (#290) about the awkwardness of using this structure to create allometry predictions for a Flora.
See the comment on "Implementation of the canopy model" (#288) about the repeated casting of Series to np.ndarray when broadcasting is needed.
See issue "Look at replacing pandas in pyrealm.demography" (#291), which summarises the issues and describes what this PR should do.

This PR starts to replace the usage with a pure numpy alternative. There is code in the currently open PR #288 that will also need updating, but this seems like the right way to go now.
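As a rough illustration of the "dict of arrays" idea, the sketch below stores trait data as plain numpy arrays, maps cohort PFT names onto rows of that data, and then broadcasts the result directly. The keys (`name`, `h_max`), the PFT names and all values are placeholders invented for this example, not the actual contents of Flora or Community.cohort_data.

```python
import numpy as np

# Hypothetical trait data for three plant functional types (placeholder values).
flora_data = {
    "name": np.array(["broadleaf", "conifer", "shrub"]),
    "h_max": np.array([30.0, 40.0, 5.0]),
}

# Cohorts refer to PFTs by name; map each name to its row in the flora arrays.
cohort_pft_names = np.array(["shrub", "broadleaf", "shrub", "conifer"])
row_of = {name: idx for idx, name in enumerate(flora_data["name"])}
cohort_rows = np.array([row_of[name] for name in cohort_pft_names])

# Per-cohort trait values are then a plain integer-indexing operation.
cohort_h_max = flora_data["h_max"][cohort_rows]

# A column of relative heights with shape (2, 1) broadcasts against the
# (4,) cohort array without any intermediate casting step.
relative_height = np.array([[0.5], [1.0]])
heights = relative_height * cohort_h_max
```

Because every value is already an `np.ndarray`, the broadcast in the last line needs no conversion from a Series first, which is the friction described in the comments linked above.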

The code:

Replaces the use of pandas structures throughout.
Adds a prototype validation function and unit test for that function to validate array inputs to T model functions.
I have not yet added that validation to the functions - I wanted to wait for feedback before moving on.
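For readers unfamiliar with this kind of check, here is a minimal sketch of what validating array inputs could look like, assuming the goal is to confirm that all inputs broadcast to a common shape. The function name `validate_broadcastable` and its behaviour are hypothetical and are not the prototype added in this PR.

```python
import numpy as np
from numpy.typing import NDArray


def validate_broadcastable(*arrays: NDArray[np.float64]) -> tuple[int, ...]:
    """Return the common broadcast shape of the inputs or raise ValueError.

    Hypothetical helper for illustration only, not the function in this PR.
    """
    try:
        # np.broadcast_shapes applies numpy's broadcasting rules to the shapes
        # alone, so no array data is copied or computed.
        return np.broadcast_shapes(*(np.shape(arr) for arr in arrays))
    except ValueError as excep:
        raise ValueError("Inputs cannot be broadcast to a common shape") from excep
```

Calling `validate_broadcastable(np.ones(3), np.ones((2, 1)))` returns `(2, 3)`, while passing arrays of shape `(3,)` and `(4,)` raises a `ValueError`.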

Fixes #291 (issue)

Type of change

New feature (non-breaking change which adds functionality)
Optimization (back-end change that speeds up the code)
Bug fix (non-breaking change which fixes an issue)

Key checklist

Make sure you've run the pre-commit checks: $ pre-commit run -a
All tests pass: $ poetry run pytest

Further checks

Code is commented, particularly in hard-to-understand areas
Tests added that prove fix is effective or that feature works

codecov-commenter · 2024-09-21T21:35:20Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.22%. Comparing base (1f315ba) to head (97ad0ed).
Report is 88 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #292      +/-   ##
===========================================
- Coverage    95.29%   95.22%   -0.07%     
===========================================
  Files           28       32       +4     
  Lines         1720     2115     +395     
===========================================
+ Hits          1639     2014     +375     
- Misses          81      101      +20

Flag	Coverage Δ
	`95.22% <100.00%> (?)`


j-emberton

The implementation looks fine to me overall.

One issue to fix:
There's a straggler issue in that the community.py module docstring still makes reference to using pandas as the internal data vehicle.

> Internally, the cohort data in the Community class is represented as a pandas dataframe, which makes it possible to update cohort attributes in parallel across all cohorts but also provide a clean interface for adding and removing cohorts to a Community.

And one query:
I see the type hints for the new NDArrays are now float32. Is there a specific reason for this? Will Python type promotion not mean that this doesn't survive contact with any float64 data?

davidorme · 2024-09-23T12:36:50Z

> One issue to fix: There's a straggler issue in that the community.py module docstring still makes reference to using pandas as the internal data vehicle.

Fixed

> And one query: I see the type hints for the new NDArrays are now float32. Is there a specific reason for this? Will Python type promotion not mean that this doesn't survive contact with any float64 data?

That's a good point - the typing of arrays has become cleaner, and I think the earlier code was written when this was not such a transparent thing to do. I don't think we need the precision of float64, so we use less RAM by doing this, but equally I don't know if that has speed implications (good or bad) on 64-bit architectures.

So - not sure. I guess I'd like to leave this PR as is and tackle the array typing more widely as another issue.

j-emberton

Thanks for sorting the docstring. Happy for the float32 type hints to be left in place for the time being. My gut feel is that any data read in from CSV/TOML etc. will default to float64, and would need to be explicitly converted to float32.
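To make the promotion question concrete, here is a minimal sketch of how numpy treats mixed float32 and float64 arrays. The array names and values are made up for illustration and nothing here is taken from the pyrealm code. Note also that Python does not enforce annotations at runtime, so a float32 type hint on its own does not change the dtype of the data passed in.

```python
import numpy as np

# An array explicitly stored at single precision.
stored = np.array([1.0, 2.0, 3.0], dtype=np.float32)

# An array created without a dtype: numpy defaults to float64 for Python floats.
parsed = np.array([0.5, 0.5, 0.5])

# Mixing the two promotes the result to float64.
mixed = stored * parsed

# Keeping float32 requires an explicit conversion of the float64 input.
explicit = stored * parsed.astype(np.float32)

# The memory saving is a factor of two per element.
bytes_single = stored.nbytes
bytes_double = parsed.nbytes
```

Here `mixed.dtype` is float64 and `explicit.dtype` is float32, which matches the intuition that float32 storage only survives if inputs are converted explicitly.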

davidorme added 7 commits September 20, 2024 22:31

Flora.data now a dict of arrays

e40ae25

Re-typing Community.cohort_data to numpy, splitting into PFT v cohort dicts

f9f69f6

Converting t_model_functions to use NDArray not pd.Series

dc541ce

Fixing up indexing of cohort PFT names onto flora data

2c41147

Fixing display of community cohort data in docstring

3e73ba5

Draft validation function for T Model arguments

621531a

Fixing up _validate_t_model_args and adding tests

f291ebf

davidorme linked an issue Sep 21, 2024 that may be closed by this pull request

Look at replacing pandas in pyrealm.demography #291

Closed

davidorme added this to the Demography and allocation model milestone Sep 21, 2024

davidorme requested review from omarjamil and j-emberton September 21, 2024 21:33

j-emberton requested changes Sep 23, 2024

Out of date docstring

97ad0ed

davidorme requested a review from j-emberton September 23, 2024 12:37

davidorme mentioned this pull request Sep 23, 2024

Add helper functions for PFT geometry and canopy shape. #290

Closed

j-emberton approved these changes Sep 23, 2024

davidorme merged commit 8793252 into develop Sep 23, 2024
12 checks passed

davidorme deleted the 291-look-at-replacing-pandas-in-pyrealmdemography branch September 23, 2024 13:30

davidorme mentioned this pull request Sep 23, 2024

Implementation of the canopy model #288

Merged

7 tasks
