
Develop metadata solution for reporting #77

Open
fschreyer opened this issue Jul 28, 2021 · 4 comments

@fschreyer
Contributor

Dear all,

following the remind2 task force meeting (see notes here), we decided that we need a solution to store metadata about mif files. In particular, a list of output variables with their definitions. From the discussion, I understood that the solution should include

  • a system to automatically generate a list of all REMIND output variables with their definitions
  • a system to make developers add/modify variable definitions to that list
  • ideally: a system of quality flags for each of the variables (e.g. "model input", "high confidence", "low confidence")
  • information on the origin/setup of the scenario, stored in the mif file or in a related metadata file in the run folder, to make it easier to trace mif files back to the actual runs and their configuration

I guess we do not need all features at once, but the key aspect would be to have a system to document variable definitions. Please add to or correct this if I misrepresented something.

Best,
Felix

@cchrisgong
Contributor

cchrisgong commented Jul 28, 2021

On the last point, I opened an issue in magclass and quitte:
pik-piam/magclass#101
pik-piam/quitte#19

In my opinion, we can add the run path, model version, and reporting library version as comments at the top of the mif file. That won't be more than three lines, so it won't noticeably enlarge the mif file, and R scripts reading it will automatically skip these lines.

Variable documentation, in my opinion, should go into a separate file, since it might become large once all variables are defined. However, the comment header above can point to the path of this metadata file, so people can trace a mif back to both the run and the bespoke variable definitions for cross-comparison.
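A hypothetical sketch of what such a comment header could look like; the path, versions, and key names are purely illustrative, not an agreed format:

```
# run folder: /path/to/remind/output/SSP2-Base_2021-07-28
# model version: REMIND x.y.z; remind2 x.y.z
# variable definitions: /path/to/remind/output/SSP2-Base_2021-07-28/variable_definitions.csv
Model;Scenario;Region;Variable;Unit;2005;2010;2015;
```

Base R readers such as read.table(file, sep = ";", comment.char = "#") would skip these lines without any changes.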

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Member

Turning the eye of Sauron LOD-GEOSS on this issue …
As for variable definitions, this is an ongoing task in LOD-GEOSS, where it is pursued in tedious, painful detail. @giannou is involved with that, too. In the meantime, https://github.com/openENTRANCE/nomenclature might be a useful building block for this.

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Member

> On the last point, I opened an issue in magclass and quitte:
> pik-piam/magclass#101
> pik-piam/quitte#19

I updated quitte (0.3093.0). read.quitte() now ignores any comment header, and write.mif() can add such a header. But since the number of .mif files written by write.mif() per year is probably in the single digits, this feature really depends on magclass, and RSE will need some prodding to look into it.
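A minimal usage sketch of the new quitte behaviour; the comment_header argument name and the quitte_example_data object are assumptions based on the description above, so check the quitte documentation for the actual interface:

```r
library(quitte)

# write a .mif with a comment header (comment_header is an assumed argument name)
write.mif(
  quitte_example_data,
  path = "example.mif",
  comment_header = c(
    "run folder: /path/to/run",   # illustrative content only
    "remind2 version: x.y.z"
  )
)

# read.quitte() skips the comment header automatically
x <- read.quitte("example.mif")
```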


> • a system to automatically generate a list of all REMIND output variables with their definitions

Tall order. This will require a system parallel to remind2::convGDX2MIF() which has to be updated along with the individual reportX() functions.
As a sketch, we could have a function variable_definitions() that in turn calls variable_definition_X() functions that return definitions for all variables returned by the reportX() functions (see the code sketch below). The variable_definition_X() functions would go into the same files as the reportX() functions, and we could automatically test that all variables returned by convGDX2MIF() also appear in the output of variable_definitions().
The problem is that the set of variables in the returned .mif file depends on the module realisations of the .gdx it is based on. Or at least it will, since industry/fixed_shares will not report subsector information, only aggregate industry information. So that is something to be worked out. Possibly we could test the output of several .mif files with different realisations collectively against the variable definitions.
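A minimal sketch of how such a pair of functions could look; none of this exists in remind2 yet, and all function names, variables, and columns are illustrative:

```r
# hypothetical sketch: one definition function per reportX() function,
# returning a data frame with one row per reported variable
variable_definition_Emissions <- function() {
  data.frame(
    variable   = c("Emi|CO2", "Emi|CO2|Energy"),
    unit       = c("Mt CO2/yr", "Mt CO2/yr"),
    definition = c("total CO2 emissions",
                   "CO2 emissions from the energy system"),
    stringsAsFactors = FALSE
  )
}

# collects all definitions, analogous to convGDX2MIF() collecting reportX() output
variable_definitions <- function() {
  do.call(rbind, list(
    variable_definition_Emissions()
    # , variable_definition_SE(), variable_definition_FE(), ...
  ))
}
```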

> • a system to make developers add/modify variable definitions to that list

Since variables are only added or changed when code is added to or changed in remind2, that system should be remind2 code as well.
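A sketch of the automatic test mentioned above, assuming the variable_definitions() function from the previous sketch; the .mif path is a placeholder:

```r
library(testthat)
library(quitte)

test_that("all reported variables have a definition", {
  # placeholder path to a reporting file written by convGDX2MIF()
  mif  <- read.quitte("output/SSP2-Base/REMIND_generic_SSP2-Base.mif")
  defs <- variable_definitions()

  undefined <- setdiff(unique(as.character(mif$variable)), defs$variable)
  expect_length(undefined, 0)
})
```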

> • ideally: a system of quality flags for each of the variables (e.g. "model input", "high confidence", "low confidence")

We would need a consensus on what these flags mean. There's some work on data quality being done in LOD-GEOSS; I can poke around to see if they came up with something useful. "Confidence" is a term implying a quality that isn't actually what we want to communicate. It is probably more useful to distinguish between "proper model outputs" and "downscaled figures".
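If flags are added to the definitions, keeping them to an agreed vocabulary is straightforward to enforce; a sketch (the flag names are just the ones floated above, nothing is decided):

```r
# hypothetical controlled vocabulary for quality flags
allowed_flags <- c("model input", "proper model output", "downscaled figure")

check_flags <- function(defs) {
  unknown <- setdiff(unique(defs$flag), allowed_flags)
  if (length(unknown) > 0)
    stop("unknown quality flags: ", paste(unknown, collapse = ", "))
  invisible(defs)
}
```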

@fschreyer
Contributor Author

OK, thanks for the comments, Michaja. I added you to the reporting task force email list; we will meet again next week and can discuss it there.
