A LinkML schema for the Datasheets for Datasets model, as published in Datasheets for Datasets. Inspired by datasheets as used in the electronics and other industries, Gebru et al. proposed that every dataset "be accompanied with a datasheet that documents its motivation, composition, collection process, recommended uses, and so on". To this end, the authors created a series of topics and over 50 questions addressing different aspects of datasets, which are also useful in an AI/ML context. An example of a completed datasheet for a dataset can be found here: Structured dataset documentation: a datasheet for CheXpert
Google is working with a different model called Data Cards, which in practice is close to the original Datasheets for Datasets template.
This repository stores a LinkML schema representation of the original Datasheets for Datasets model. The schema, which is a work in progress, represents the topics, the sets of questions, and the entities and fields expected in the answers. Beyond a less structured Markdown template for this model (e.g. template for datasheet for dataset), we are not aware of any other structured representation of Datasheets for Datasets.
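To give a rough sense of what "topics, sets of questions, and answers" can look like as structured data, here is a minimal Python sketch. It is not the schema in this repository and does not use any LinkML tooling; the class names (`Question`, `Topic`, `Datasheet`), the `unanswered` helper, the dataset name, and the question texts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Question:
    """One datasheet question with an optional free-text answer (hypothetical shape)."""
    text: str
    answer: Optional[str] = None


@dataclass
class Topic:
    """A datasheet topic that groups related questions (hypothetical shape)."""
    name: str
    questions: List[Question] = field(default_factory=list)


@dataclass
class Datasheet:
    """A datasheet for one dataset: an ordered list of topics (hypothetical shape)."""
    dataset_name: str
    topics: List[Topic] = field(default_factory=list)

    def unanswered(self) -> List[Tuple[str, str]]:
        """Return (topic name, question text) pairs that have no answer yet."""
        return [
            (topic.name, question.text)
            for topic in self.topics
            for question in topic.questions
            if not question.answer
        ]


# Illustrative data only; names and texts are placeholders, not taken from the schema.
sheet = Datasheet(
    dataset_name="example-dataset",
    topics=[
        Topic(
            "Motivation",
            [Question("For what purpose was the dataset created?", "Illustrative answer.")],
        ),
        Topic("Composition", [Question("What do the instances represent?")]),
    ],
)

print(sheet.unanswered())
```

A structured form like this makes it straightforward to check a datasheet for completeness, which is harder to do with a free-text Markdown template.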
We are also tracking related developments, such as augmented Datasheets for Datasets models as in Augmented Datasheets for Speech Datasets and Ethical Decision-Making.
- examples/ - example data
- project/ - project files (do not edit these)
- src/ - source files (edit these)
- tests/ - Python tests
make all
: make everything

make deploy
: deploys site
This project was made with linkml-project-cookiecutter.