Existing tools for managing our metadata? #2546
zaneselvans started this conversation in Ideas
Replies: 2 comments 1 reply
-
Sounds like we need some kind of meta-tool to standardize the metadata across all these frameworks :)
-
Hi there, we are developing a metadata standard for energy data: https://github.com/OpenEnergyPlatform/oemetadata
-
We have a lot of metadata describing the hundreds of tables and thousands of columns that are part of PUDL, and a somewhat homebrew system for managing it, using a mix of Pydantic and SQLAlchemy. It's hard to believe that we're facing a novel problem here, and I wonder if there are some off-the-shelf solutions we could be using or working toward?
Exactly how and where the data ends up being output can change over time, and we don't want to maintain and synchronize several different definitions by hand. So at a high level, we have generic descriptions of the tables (Resources) and columns (Fields), originally modeled on the Frictionless tabular data package standard. Database-like information is stored in the Table Schemas that are part of each Resource definition.
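To make the "one definition, several outputs" idea concrete, here is a minimal sketch. It is not PUDL's actual code: plain dataclasses stand in for the Pydantic models, the SQLite DDL is built by hand rather than through SQLAlchemy, and the `plants` table, its columns, and the `SQLITE_TYPES` mapping are all made up for illustration. Only the descriptor keys (`name`, `schema`, `fields`, `primaryKey`) and the field type names come from the Frictionless Table Schema vocabulary.

```python
from __future__ import annotations

import json
import sqlite3
from dataclasses import dataclass

# Hypothetical mapping from Frictionless-style field types to SQLite types.
SQLITE_TYPES = {"integer": "INTEGER", "number": "REAL", "string": "TEXT", "date": "TEXT"}


@dataclass(frozen=True)
class Field:
    """Generic description of one column."""

    name: str
    type: str
    description: str = ""


@dataclass(frozen=True)
class Resource:
    """Generic description of one table, independent of any output format."""

    name: str
    fields: tuple[Field, ...]
    primary_key: tuple[str, ...] = ()
    description: str = ""

    def to_descriptor(self) -> dict:
        """Render a Frictionless-style resource descriptor as a plain dict."""
        schema: dict = {
            "fields": [
                {"name": f.name, "type": f.type, "description": f.description}
                for f in self.fields
            ]
        }
        if self.primary_key:
            schema["primaryKey"] = list(self.primary_key)
        return {"name": self.name, "description": self.description, "schema": schema}

    def to_sqlite_ddl(self) -> str:
        """Render a CREATE TABLE statement from the same definition."""
        cols = [f'"{f.name}" {SQLITE_TYPES[f.type]}' for f in self.fields]
        if self.primary_key:
            keys = ", ".join(f'"{k}"' for k in self.primary_key)
            cols.append(f"PRIMARY KEY ({keys})")
        return f'CREATE TABLE "{self.name}" ({", ".join(cols)})'


# Made-up example table; the names are illustrative only.
plants = Resource(
    name="plants",
    description="One row per power plant.",
    fields=(
        Field("plant_id", "integer", "Unique plant identifier."),
        Field("plant_name", "string", "Name of the plant."),
        Field("capacity_mw", "number", "Nameplate capacity in MW."),
    ),
    primary_key=("plant_id",),
)

# Output 1: a datapackage.json-style document.
datapackage_json = json.dumps({"name": "example", "resources": [plants.to_descriptor()]}, indent=2)

# Output 2: a database schema, created here in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(plants.to_sqlite_ddl())
```

Both outputs are derived from the single `plants` definition, so a renamed column or changed type only has to be edited in one place. A real system would also need to handle foreign keys, constraints, and types that don't map cleanly between formats.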
What do we do with our metadata?
Manage Data Types
Create database schemas
Validate Data Contents / Structure
Export metadata for use by other systems
datapackage.json files that will annotate SQLite, Parquet, or other file-based tabular data products we distribute. This will allow the creation of relatively lightweight data catalogs, and export to systems like Kaggle.
Build human-readable documentation
What don't we do with our metadata?
ORM stuff
Existing tools
All of these intersect with the above needs somehow, and we're already using many of them. Is there a better way for them to be glued together into a coherent whole?
dbt