From dd44300050e171395c7fe5ebc8715996b4092aae Mon Sep 17 00:00:00 2001
From: Alyssa Dai
Date: Sun, 12 Nov 2023 21:58:32 -0500
Subject: [PATCH] add instructions for updating graph data following newer data model

---
 docs/cli.md              |  6 ++++++
 docs/updating_dataset.md | 19 +++++++++++++++----
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/docs/cli.md b/docs/cli.md
index b0eda700..9f9ad466 100644
--- a/docs/cli.md
+++ b/docs/cli.md
@@ -146,6 +146,12 @@ You could run the CLI as follows:
 ...
 ```
 
+## Upgrading to a newer version of the CLI
+New releases of the Neurobagel CLI will occasionally introduce breaking changes to the data model for subject-level information in a `.jsonld` graph file.
+
+_If you have already created `.jsonld` files for your Neurobagel graph database using the CLI_,
+follow the instructions [here](updating_dataset.md#following-a-change-in-the-neurobagel-data-model) to regenerate your existing graph data so that they will not conflict with dataset `.jsonld` files generated using the latest CLI version.
+
 ## Development environment
 
 To set up a development environment, please run
diff --git a/docs/updating_dataset.md b/docs/updating_dataset.md
index a081839a..6525f9b4 100644
--- a/docs/updating_dataset.md
+++ b/docs/updating_dataset.md
@@ -1,17 +1,19 @@
 # Updating a harmonized dataset
 
+## Following a change in my _dataset_
+
 When using Neurobagel tools on a dataset that is still undergoing data collection, you may need to update the Neurobagel annotations and/or graph-ready data for the dataset when you want to add new subjects or measurements or to correct mistakes in prior data versions.
 
 For any of the below types of changes, you will need to regenerate a graph-ready `.jsonld` file for the dataset which reflects the change.
 
-## If the phenotypic (tabular) data have changed
+### If the phenotypic (tabular) data have changed
 If new variables have been added to the dataset such that there are new columns in the phenotypic TSV you previously annotated using Neurobagel's annotation tool, you will need to:
 
 1. **Generate an updated data dictionary** by annotating the new variables in your TSV following the [annotation workflow](annotation_tool.md)
 
 2. **Generate a new graph-ready data file** for the dataset by [re-running the CLI](cli.md) on your updated TSV and data dictionary
 
-## If only the imaging data have changed
+### If only the imaging data have changed
 If the BIDS data for a dataset have changed without changes in the corresponding phenotypic TSV (e.g., if new modalities or scans have been acquired for a subject), you have two options:
 
 - If you still have access to the dataset's phenotypic JSONLD generated from the `pheno` command of the `bagel-cli` (step 1), you may choose to [rerun only the `bids` CLI command](cli.md) on the updated BIDS directory.
@@ -23,11 +25,20 @@ OR
 
 _When in doubt, rerun both CLI commands._
 
-## If only the subjects have changed
+### If only the subjects have changed
 If subjects have been added to or removed from the dataset but the phenotypic TSV is otherwise unchanged (i.e., only new or removed rows, without changes to the available variables), you will need to:
 
 - **Generate a new graph-ready data file** for the dataset by [re-running the CLI](cli.md) (`pheno` and `bids` steps) on your updated TSV and existing data dictionary
 
+## Following a change in the _Neurobagel data model_
+
+As Neurobagel refines the way that specific subject properties are modeled as graph data, new tool releases will occasionally introduce breaking changes to the data model for subject-level information in a `.jsonld` graph data file.
+
+_If you have already created `.jsonld` files for a Neurobagel graph database_ but want to update your graph data to the latest Neurobagel data model following such a change, the easiest way is to [rerun the CLI](cli.md) on the existing data dictionaries and phenotypic TSVs for the dataset(s) in the graph.
+This will ensure that if you use the latest version of the Neurobagel CLI to process new datasets (i.e., generate new `.jsonld` files) for your database, the resulting data will not conflict with existing data in the graph.
+
+Note that when upgrading to a newer version of the data model, **you should regenerate the `.jsonld` files for _all_ datasets in your existing graph**.
+
 ## Updating the graph database
-To allow easy (re-)uploading of the updated `.jsonld` for your dataset to a graph database, make a copy of it in a [central directory on your research data fileserver for storing local Neurobagel `jsonld` datasets](infrastructure.md#where-to-store-neurobagel-graph-ready-data).
+To allow easy (re-)uploading of the updated `.jsonld` for your dataset(s) to a graph database, make a copy of each file in a [central directory on your research data fileserver for storing local Neurobagel `jsonld` datasets](infrastructure.md#where-to-store-neurobagel-graph-ready-data).
 Then, follow the steps for [uploading/updating a dataset in the graph database](infrastructure.md#uploading-data-to-the-graph) (needs to be completed by user with database write access).