A dataset datasheet for the ChestX-ray8 dataset, a.k.a. ChestX-ray14.
See the PDF generated from the LaTeX files here.
Dataset datasheets "[d]ocument [the dataset] motivation, composition, collection process, recommended uses, and so on. [They] have the potential to increase transparency and accountability within the machine learning community, mitigate unwanted biases in machine learning systems, facilitate greater reproducibility of machine learning results, and help researchers and practitioners select more appropriate datasets for their chosen tasks."
On the left side we see the prose format of the paper. On the right side we see the structured format of the dataset datasheet.
An item of the "Uses" section, describing the image format and label details.
I researched publicly-available information to create this dataset datasheet. Ideally, the team working on the dataset should create the datasheet as they develop it.
Given the lack of access to first-hand information, consider this an illustration of what a dataset datasheet should look like, not necessarily an accurate description of this particular dataset.
This datasheet was created with the Overleaf template. There is also a Markdown template for datasheets for datasets.
In collaboration with the CheXpert team, we created a datasheet for the CheXpert dataset.
Accuracy is essential when documenting the details of a dataset. Here is an example of using code to create the tables in a datasheet to make the process reproducible, transparent, and auditable.
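As a rough illustration of that idea (not the code used for this datasheet), the sketch below counts labels from a CSV and emits a LaTeX table that could be pasted into a datasheet. It assumes a metadata file with one row per image and a `Finding Labels` column holding `|`-separated labels; the column names, the helper names `count_labels` and `to_latex_table`, and the sample rows are all hypothetical, and the numbers are not real dataset statistics.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows for illustration only, NOT real dataset statistics.
SAMPLE_CSV = """Image Index,Finding Labels
img_0001.png,Cardiomegaly|Effusion
img_0002.png,No Finding
img_0003.png,Effusion
img_0004.png,No Finding
"""


def count_labels(csv_text, label_column="Finding Labels", separator="|"):
    """Return (label counts, number of images) for a CSV given as text.

    The column name and separator are assumptions about the metadata layout.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    num_images = 0
    for row in reader:
        num_images += 1
        # An image may carry several labels, so each one is counted.
        for label in row[label_column].split(separator):
            counts[label.strip()] += 1
    return counts, num_images


def to_latex_table(counts, num_images):
    """Render the counts as a LaTeX tabular, most frequent label first."""
    lines = [
        r"\begin{tabular}{lrr}",
        r"\hline",
        r"Label & Images & \% of images \\",
        r"\hline",
    ]
    # Sort by descending count, then alphabetically for a stable order.
    for label, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        pct = 100.0 * n / num_images
        lines.append(rf"{label} & {n} & {pct:.1f} \\")
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)


if __name__ == "__main__":
    counts, num_images = count_labels(SAMPLE_CSV)
    print(to_latex_table(counts, num_images))
```

Because the table is regenerated from the metadata file each time, a reviewer can rerun the script and check every number in the datasheet against its source.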
If you are interested in datasheets for datasets, you may also want to review model cards.