From 6bffc0522aa93f17e5cbab48a8b88b5fb8bab9cb Mon Sep 17 00:00:00 2001
From: rae McCollum
Date: Mon, 8 Jan 2024 11:51:14 -0600
Subject: [PATCH 1/2] update documentation to include new scripts

---
 docs/lookup.md  | 22 ++++++++++++++++++----
 docs/scripts.md | 14 +++++++++-----
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/docs/lookup.md b/docs/lookup.md
index 042bda8..3bf7e25 100644
--- a/docs/lookup.md
+++ b/docs/lookup.md
@@ -4,20 +4,34 @@ Create a lookup.csv to be used in prepare.py.
 
 The prepare.py script uses the lookup.csv file to determine which subjects and sessions will be uploaded.
 
-1. Run the `make_lookupcsv.py` with `abcd_mri01.txt` as the input under the `utilities` directory within this repository.
+1. Run the `make_lookupcsv.py` under the `utilities` directory within this repository.
 
-2. Verify that all of the subjects you want are included.
+    * The `--info_file` can either be the `abcd_mri01.txt` or the `abcd_fastqc01.txt` file. Specify which file you're using with the `--abcdmri` or `--fasttrack` flag.
 
-## How to Download the `abcd_mri01.txt` File
+    * While either file can be used, the fastqc file typically will have more subjects than the mri file.
+
+    * You will also need to provide the path to where you want your lookup.csv with the `--lookup_csv` flag.
+
+2. Verify that all of the subjects you want are included by running `validate-lookupcsv.py` also under the `utilities` directory.
+
+    * This script will make sure that all of the subjects in your subject list are present in the lookup.csv.
+
+    * This script will also remove any lines with a duplicate subject,session pair that have a different interview date by choosing whichever line has the earliest date.
+
+    * Please note that this will not fix the issue of the same subject,session pair having different age/sex markers (an issue found in the fastqc file). That will have to be fixed manually by comparing the lookup.csv to the abcd_mri file.
+
+## How to Download the Information File
 
 1. Login to the [NIMH Data Archive](https://nda.nih.gov/)
 
-2. Navigate to [this page](https://nda.nih.gov/data_structure.html?short_name=abcd_mri01)
+2. For the `abcd_mri01.txt` file, navigate to [this page](https://nda.nih.gov/data_structure.html?short_name=abcd_mri01). Navigate to [this page](https://nda.nih.gov/data_structure.html?short_name=abcd_fastqc01) for the `abcd_fastqc.txt` file.
 
 3. Click *Add to Filter Cart* at the bottom
 
 4. Once the filter cart in the top right corner updates, click on *Create Data Package/Add Data to Study*
 
+    * For the fastqc file, double check that the ABCD Dataset and ABCD Fasttrack QC Instrument checkboxes are selected (they should be by default)
+
 5. Click *Create Data Package* and name it something identifiable to you
 
 6. Make sure *Include Documentation* is selected before clicking *Create Data Package*. It will take a while to create
diff --git a/docs/scripts.md b/docs/scripts.md
index fc24687..82c4b84 100644
--- a/docs/scripts.md
+++ b/docs/scripts.md
@@ -44,15 +44,19 @@ to upload.
 
 Once this script has been run you will want to spot check the results.
 
-You can validate that all of the expected subjects are present in each
+You can validate that all of the expected subjects are present in at least one
 of the datatype folders by running the `validate-prepare.py` script with
-your `subject_list.csv` and your working directory as the inputs. This script
-will loop through each datatype folder and compare the subject list to the
-subject IDs present in the folder. It will output a text file containing the
-subject IDs that are in the subject list but not in the datatype folders.
+your `subject_list.csv` and your working directory as the inputs. You can also
+specify an output file. This script will loop through each datatype folder and
+build a list of subject IDs that have a datatype folder. Then it will compare
+that list to the subject list. It will output a text file containing the
+subject IDs that are in the subject list but not in any of the datatype folders.
 If there are many subjects missing, part of the issue could be that the lookup.csv does not contain
 all of the subjects that it should, so be sure to double check that all of the subjects listed in the
 `subject_list.csv` are also in the `lookup.csv`
+and that there aren't any duplicate subject,session pairs. The output log of the `prepare.py`
+run should have warnings for any subjects that weren't found in the lookup.csv, which means
+that they are either completely missing or that there are duplicates.
 
 This script should also create several types of files. It will create pairs of csv and
 txt files that separate out the subject files in separate batches of 500 files each, with the

From 5812f5f3b873723b86d4b7ffbce40f1d29838c72 Mon Sep 17 00:00:00 2001
From: rae McCollum
Date: Tue, 9 Jan 2024 10:07:00 -0600
Subject: [PATCH 2/2] add missing ,

---
 utilities/validate-prepare.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/utilities/validate-prepare.py b/utilities/validate-prepare.py
index a00ad07..d24eb23 100755
--- a/utilities/validate-prepare.py
+++ b/utilities/validate-prepare.py
@@ -24,7 +24,7 @@ def _cli():
         help='Path to subject_list.csv containing the subjects you wish to upload'
     )
     parser.add_argument(
-        '--output_file', required=False, default = "prepare_validation.txt"
+        '--output_file', required=False, default = "prepare_validation.txt",
         help='Path to where you want the text file with the missing subjects to be output (including the name of the file). Default is cwd/prepare_validation.txt.'
     )
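
Reviewer note: the two checks that docs/lookup.md attributes to `validate-lookupcsv.py` (keep the earliest interview date for a duplicated subject,session pair, and report subjects from the subject list that are absent from the lookup) could look roughly like the sketch below. This is not the script's actual code. The column names `subject`, `session` and `interview_date`, the `MM/DD/YYYY` date format, the sample rows, and the helper names `dedupe_earliest` and `missing_subjects` are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names; the real lookup.csv header may differ.
SUBJECT, SESSION, DATE = "subject", "session", "interview_date"


def dedupe_earliest(rows, date_format="%m/%d/%Y"):
    """Keep one row per (subject, session) pair, choosing the earliest date.

    The date format is an assumption, not taken from the repository.
    """
    earliest = {}
    for row in rows:
        key = (row[SUBJECT], row[SESSION])
        when = datetime.strptime(row[DATE], date_format)
        # Replace the stored row only if this one has an earlier date.
        if key not in earliest or when < earliest[key][0]:
            earliest[key] = (when, row)
    return [row for _, row in earliest.values()]


def missing_subjects(subject_list, rows):
    """Return subjects from subject_list that never appear in the rows."""
    present = {row[SUBJECT] for row in rows}
    return sorted(set(subject_list) - present)


# Made-up sample data standing in for a lookup.csv.
sample = io.StringIO(
    "subject,session,interview_date\n"
    "sub-A,ses-1,02/01/2019\n"
    "sub-A,ses-1,01/15/2019\n"
    "sub-B,ses-1,03/02/2020\n"
)
rows = list(csv.DictReader(sample))
deduped = dedupe_earliest(rows)
missing = missing_subjects(["sub-A", "sub-B", "sub-C"], deduped)
```

As the documentation points out, a rule like this only resolves conflicting interview dates. Rows for the same pair that disagree on age or sex would still need a manual comparison against the abcd_mri file.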