From 2e77a38cf65067c5ea0f16bc2c390f3ff944f2c9 Mon Sep 17 00:00:00 2001
From: Ethan Steinberg
Date: Fri, 16 Aug 2024 05:56:49 -0700
Subject: [PATCH] Add comment about all codes

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 5800383..00b217a 100644
--- a/README.md
+++ b/README.md
@@ -45,7 +45,8 @@ found in the following subfolders:
     organized into _data schema_ files, sharded by subject and sorted, for each subject, by time.
   - `$MEDS_ROOT/metadata/codes.parquet`: This file contains per-code metadata in the _code metadata schema_
-    about the MEDS dataset. As this dataset describes all codes observed in the full MEDS dataset, it is _not_
+    about the MEDS dataset. All codes within the dataset should have an entry in this file.
+    As this dataset describes all codes observed in the full MEDS dataset, it is _not_
     sharded. Note that some pre-processing operations may, at times, produce sharded code metadata files, but
     these will always appear in subdirectories of `$MEDS_ROOT/metadata/` rather than at the top level, and
     should generally not be used for overall metadata operations.
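
Note: the sentence added by this patch ("All codes within the dataset should have an entry in this file") states an invariant that can be checked mechanically. The following is a minimal sketch of such a check, not part of the patch or of any MEDS tooling. The helper names `find_missing_codes` and `load_code_column` are hypothetical, and the column name `code` is an assumption about how the data and code metadata files are laid out.

```python
from typing import Iterable, Set


def find_missing_codes(
    observed_codes: Iterable[str], metadata_codes: Iterable[str]
) -> Set[str]:
    """Return codes seen in the data that have no entry in the code metadata."""
    return set(observed_codes) - set(metadata_codes)


def load_code_column(parquet_paths: Iterable[str], column: str = "code") -> Set[str]:
    """Collect the distinct non-null values of `column` across parquet files.

    Assumption: each file has a string column named `column` ("code" by default).
    pyarrow is imported lazily so the pure set check above has no dependencies.
    """
    import pyarrow.parquet as pq

    codes: Set[str] = set()
    for path in parquet_paths:
        # Read only the one column to avoid loading whole shards into memory.
        table = pq.read_table(path, columns=[column])
        codes.update(c for c in table.column(column).to_pylist() if c is not None)
    return codes
```

Under these assumptions, one would pass the paths of the data shards to `load_code_column` for the observed codes, pass `$MEDS_ROOT/metadata/codes.parquet` for the metadata codes, and treat a non-empty result from `find_missing_codes` as a violation of the documented invariant.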