Better names for classes in code #2

Open
mnlevy1981 opened this issue Nov 28, 2018 · 1 comment
@mnlevy1981

Issue Description

Right now, the way data is organized across our various classes is rather confusing. @klindsay28 and I talked through the current layout, and I think the following accurately captures the result of that conversation.

Current Hierarchy of Classes

  1. Data is stored in an xarray dataset

  2. A "collection" is a container class that has the xarray dataset and the following

    • Location of data on disk
    • Source of data: CESM output, observational data, etc
    • Type of data: history files, time series, climatology, etc
    • Role of data: should dataset be used as a reference for "truth"?
    • Years of data requested for analysis
    • Variables from the dataset to include in analysis
    • Operations to perform on data
  3. An "analysis element" is a container class that has a dictionary of "collections" and the following

    • Description of analysis being done
    • Variables to analyze
    • Plots / tables requested
    • Where on disk to write output
    • Horizontal grid / vertical levels to use for output (if needed)

Proposed Hierarchy of Classes

  1. Data is stored in an xarray dataset

  2. There is a container class that has an xarray dataset and metadata

    • Location of data on disk
    • Source of data: CESM output, observational data, etc
    • Type of data: history files, time series, climatology, etc
  3. There is a dictionary of these container class objects (a "collection")

  4. There is a container class of the collection that contains information like

    • Description of analysis being done
    • Variables to analyze
    • Time period to use for analysis (default = all available data)
    • Which collection should be considered "truth"?
    • Plots / tables requested
    • Where on disk to write output
    • Horizontal grid / vertical levels to use for output (if needed)
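
To make the four levels concrete, here is a minimal sketch of how they might map onto Python classes. It is only an illustration: the names `DataSource` and `AnalysisElement`, every field name, and the example paths are assumptions rather than what the code currently uses, and `dataset` is typed as `Any` so the sketch runs without xarray installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class DataSource:
    """Level 2: one dataset (level 1) plus metadata about where it came from."""
    dataset: Any      # an xarray dataset in practice; Any keeps the sketch dependency-free
    location: str     # location of data on disk
    source: str       # e.g. "CESM output", "observational data"
    data_type: str    # e.g. "history files", "time series", "climatology"


@dataclass
class AnalysisElement:
    """Level 4: a collection (level 3) plus analysis-wide settings."""
    description: str
    collection: Dict[str, DataSource]              # level 3: dictionary of DataSource objects
    variables: List[str]                           # the only place variables are requested
    truth: Optional[str] = None                    # key of the entry to treat as "truth"
    time_period: Optional[Tuple[int, int]] = None  # None means all available data
    plots: List[str] = field(default_factory=list)
    output_dir: str = "."
    grid: Optional[str] = None                     # horizontal grid / vertical levels, if needed

    def __post_init__(self):
        # Fail early if "truth" names something that is not in the collection.
        if self.truth is not None and self.truth not in self.collection:
            raise KeyError(f"truth={self.truth!r} is not a key in the collection")


# Hypothetical usage; the paths and variable name are placeholders.
collection = {
    "model": DataSource(None, "/path/to/model", "CESM output", "history files"),
    "obs": DataSource(None, "/path/to/obs", "observational data", "climatology"),
}
element = AnalysisElement(
    description="Surface nutrient comparison",
    collection=collection,
    variables=["NO3"],
    truth="obs",
)
```

Note that nothing in `DataSource` mentions variables, years, operations, or "truth" any more; all of that lives one level up.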

Notable changes

  1. The container class of collections is the only source of variables to analyze.

    • Users will specify at the highest level if a variable should be ignored in some datasets (but included in others)
    • For the time being, requesting a variable that is not actually in the dataset will result in an error. We can open a separate issue to discuss whether this is the best behavior, or whether missing variables should just be omitted from the analysis.
  2. The container class for the xarray dataset no longer explicitly specifies what operations are needed on the dataset -- this should be determined by the operations requested in the container class of collections.

    • E.g. plot_state knows it needs monthly climatologies, so it will request monthly climatologies of each dataset. It's up to the collection of datasets to know if the data is already a climatology or if another operation is needed.
  3. The container class for the xarray dataset no longer explicitly specifies which dataset should be used as the reference for "truth"; that is now recorded in the container class of collections.
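
The variable handling described in item 1 could look roughly like the sketch below. The function name `resolve_variables` and its arguments are hypothetical. `available` maps each dataset name to the set of variables it actually contains, which for an xarray dataset `ds` could be built as `set(ds.data_vars)`.

```python
from typing import Dict, List, Optional, Set


def resolve_variables(
    requested: List[str],
    available: Dict[str, Set[str]],
    ignore: Optional[Dict[str, Set[str]]] = None,
) -> Dict[str, List[str]]:
    """Return the variables to analyze for each dataset.

    requested: variables listed once, at the highest level
    available: dataset name -> variables present in that dataset
    ignore:    dataset name -> requested variables to skip for that dataset
    """
    ignore = ignore or {}
    resolved = {}
    for name, present in available.items():
        # Drop variables the user explicitly excluded for this dataset.
        wanted = [v for v in requested if v not in ignore.get(name, set())]
        # Current behavior: a requested variable that is missing is an error.
        missing = [v for v in wanted if v not in present]
        if missing:
            raise KeyError(f"{name}: requested variables not in dataset: {missing}")
        resolved[name] = wanted
    return resolved
```

If missing variables should instead be silently omitted (the open question in item 1), only the `if missing` branch would need to change.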


The issue is that we need names for the container classes. I think the container for the xarray dataset can now be called a data_source, because its constructor only needs to know where the dataset should be read from and what kind of data it is. The compute_monthly_climatology function still exists at this level. At some point we will replace the xarray dataset object with the esmlab dataset object, and compute_monthly_climatology will then come from esmlab.
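
As a rough sketch of that division of labor: the analysis asks each data source for a monthly climatology, and the data source decides whether any work is needed. This is an illustration under stated assumptions, not the actual implementation. The dataset is replaced by a plain list of `(month, value)` pairs so the example has no dependencies, `monthly_climatology` is a hypothetical method name, and this `plot_state` is a stand-in that only gathers the data it would plot.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Samples = List[Tuple[int, float]]  # (month, value) pairs; stand-in for an xarray dataset


class DataSource:
    def __init__(self, samples: Samples, data_type: str):
        self.samples = samples
        self.data_type = data_type  # e.g. "time series" or "climatology"

    def compute_monthly_climatology(self) -> Dict[int, float]:
        # Average all samples that fall in the same calendar month.
        # With xarray this would typically be ds.groupby("time.month").mean("time").
        by_month = defaultdict(list)
        for month, value in self.samples:
            by_month[month].append(value)
        return {month: mean(values) for month, values in sorted(by_month.items())}

    def monthly_climatology(self) -> Dict[int, float]:
        # The data source knows whether it already holds a climatology.
        if self.data_type == "climatology":
            return dict(self.samples)
        return self.compute_monthly_climatology()


def plot_state(data_sources: Dict[str, DataSource]) -> Dict[str, Dict[int, float]]:
    """Stand-in for the real plot_state: request a climatology from every source."""
    return {name: ds.monthly_climatology() for name, ds in data_sources.items()}
```

The caller never checks `data_type` itself, which is the point of moving the list of operations out of the dataset container.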

I don't have a good name for the container of collections + metadata... we've been calling it an analysis element, but it's odd to call something that contains multiple collections an element. Maybe it's a diagnostic evaluation object?

@mnlevy1981

As of 2e0733c, the code is mostly in line with the proposed hierarchy above - the time period for the analysis is currently still part of the collection class (corresponding to 2 in the list of four levels) but everything else is tied to the correct component. I'm going to rename collection -> data_source and similarly collections -> data_sources (though it will be hard to break the habit of referring to multiple data sources as a single collection).

I haven't yet come up with a good name for what we currently call the analysis element.
