Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of Community object init and factory methods. #282

Merged
merged 26 commits into from
Sep 16, 2024

Conversation

davidorme
Copy link
Collaborator

@davidorme davidorme commented Sep 12, 2024

Description

This PR brings in a Community object based of @AmyOctoCat's work in PR #230. From the features listed in #278, we have a (mostly) complete implementation with some elements postponed for later PRs:

  • provide something close to a final definition of the Community building on @AmyOctoCat's work and review by @davidorme,

Note

  • The community object takes cohort data as arrays, merges in PFT traits for those arrays and then calculates the cored predictions of the T Model for the initialisation data.
  • I've switched to using a pandas.DataFrame for storing cohort data. I think that reverses something I told @AmyOctoCat (sorry!), but although there is overhead in using the Dataframe rather than simple arrays as attributes, it does simplify the namespace of the class a lot and it is also much cleaner to add and delete cohorts as rows rather than having to iterate over individual attributes.
  • methods to create instances from data in files - and also handle multiple communities in files.

Note

  • This does not implement loading multiple Community objects yet - might as well sense check the single case before doing that!
  • The serialisation formats for CSV, JSON and TOML are revised
  • Factory methods are now implemented to load from those formats.
  • Marshmallow schemas are used to validate and post-process all of those formats into the format needed to initialise a Community object.
  • test that code.

Note

  • It currently tests the T Model functions code rather lightly and it would be better to have deeper regression tests against the R implementation. I'd like to break that out into a new PR as complete tests would include some elements missing from the T model functions at the moment and I don't want to bloat this basic Community PR.
  • provide API docs

This PR also updates Flora to provide a pandas.Dataframe view of the flora data, which makes it super easy to merge onto the cohort data.

There's some complexity in the serialisation formats and schemas:

  • the Community object needs single values for cell_area and cell_id but arrays for the cohort data.
  • CSV is great for rows of cohort data, but isn't structured for the single values, so I've imposed those as needing to be constant fields in the CSV. That is easy for users to maintain and feeds forward into having multiple cells in a file.
  • JSON/TOML allows structured data, but editing arrays of cohort data is ugly and confusing because they're not neatly aligned in grids. So here, I've made the cohort data a list of objects, which is much easier to maintain in these formats.

The validation schemas take these two slightly different approaches, validate them and then coerce to the common arguments used for initialisation.

Fixes #278

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@davidorme davidorme linked an issue Sep 12, 2024 that may be closed by this pull request
@codecov-commenter
Copy link

codecov-commenter commented Sep 12, 2024

Codecov Report

Attention: Patch coverage is 93.82716% with 10 lines in your changes missing coverage. Please review.

Project coverage is 95.13%. Comparing base (1f315ba) to head (89e5670).
Report is 48 commits behind head on develop.

Files with missing lines Patch % Lines
pyrealm/demography/community.py 93.54% 8 Missing ⚠️
pyrealm/demography/t_model_functions.py 92.30% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #282      +/-   ##
===========================================
- Coverage    95.29%   95.13%   -0.16%     
===========================================
  Files           28       32       +4     
  Lines         1720     2077     +357     
===========================================
+ Hits          1639     1976     +337     
- Misses          81      101      +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@davidorme
Copy link
Collaborator Author

There is still a doc failure - sphinx is failing to link to pandas DataFrame from the flora.data attribute. It is linking other refs to pandas.DataFrame but not here. No idea why.

We could just add this to the growing pile of noise in our nitpick_ignore section in the sphinx conf.py but I think there's something wrong in the setup - we surely shouldn't be getting this much trouble from intersphinx and numpy?

@j-emberton
Copy link
Collaborator

j-emberton commented Sep 13, 2024

I've had a good poke around this code and it looks ok to my eye. I can't claim to be completely abreast of the science, but I like the fact you're rolling out this 'schema' approach that flows from what we discussed for the prior flora PR. The approach seems sensible.

I did I quick check and there is a GPU drop (cuDF) in for pandas. Hopefully this should mean that all this is compatible with the array api stuff I've been talking about.

Happy to approve once the docs are fixed

@j-emberton
Copy link
Collaborator

There is still a doc failure - sphinx is failing to link to pandas DataFrame from the flora.data attribute. It is linking other refs to pandas.DataFrame but not here. No idea why.

We could just add this to the growing pile of noise in our nitpick_ignore section in the sphinx conf.py but I think there's something wrong in the setup - we surely shouldn't be getting this much trouble from intersphinx and numpy?

I haven't had time to dig into this issue at all I'm afraid. I'd hope we can fix it rather than ignore.

@davidorme
Copy link
Collaborator Author

@j-emberton Thanks for the worked example and other requests. Should all be good now.

I have just nitpick-ignored the intersphinx issue with pandas. I'd much rather solve it properly, but this kind of thing always seems to turn into a huge time sink and I don't have the time now.

@j-emberton j-emberton self-requested a review September 16, 2024 09:40
Copy link
Collaborator

@j-emberton j-emberton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following discussion and changes, happy to review

@davidorme davidorme merged commit ce78db1 into develop Sep 16, 2024
12 checks passed
@davidorme davidorme deleted the 278-implementation-of-community-object branch September 16, 2024 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Implementation of Community object
3 participants