update documentation to include new scripts
rae McCollum committed Jan 8, 2024
1 parent f33c1f8 commit 6bffc05
Showing 2 changed files with 27 additions and 9 deletions.
22 changes: 18 additions & 4 deletions docs/lookup.md
@@ -4,20 +4,34 @@ Create a lookup.csv to be used in prepare.py. The prepare.py script uses
the lookup.csv file to determine which subjects and sessions will be
uploaded. 

1. Run the `make_lookupcsv.py` with `abcd_mri01.txt` as the input under the `utilities` directory within this repository.
1. Run the `make_lookupcsv.py` under the `utilities` directory within this repository.

2. Verify that all of the subjects you want are included.
* The `--info_file` can either be the `abcd_mri01.txt` or the `abcd_fastqc01.txt` file. Specify which file you're using with the `--abcdmri` or `--fasttrack` flag.

## How to Download the `abcd_mri01.txt` File
* While either file can be used, the fastqc file typically will have more subjects than the mri file.

* You will also need to provide the path to where you want your lookup.csv with the `--lookup_csv` flag.

2. Verify that all of the subjects you want are included by running `validate-lookupcsv.py` also under the `utilities` directory.

* This script will make sure that all of the subjects in your subject list are present in the lookup.csv.

* This script will also remove any lines with a duplicate subject,session pair that have a different interview date by choosing whichever line has the earliest date.

* Please note that this will not fix the issue of the same subject,session pair having different age/sex markers (an issue found in the fastqc file). That will have to be fixed manually by comparing the lookup.csv to the `abcd_mri01.txt` file.
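
The checks described above can be sketched roughly as follows. This is not the code in `validate-lookupcsv.py`; it is a minimal illustration that assumes hypothetical column names (`subject`, `session`, `interview_date`, `age`, `sex`) and an assumed `MM/DD/YYYY` date layout, both of which may differ from the real lookup.csv.

```python
import csv
from datetime import datetime

# Hypothetical column names and date layout; adjust to the real lookup.csv header.
SUBJECT, SESSION, DATE = "subject", "session", "interview_date"
DATE_FORMAT = "%m/%d/%Y"


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def _parse(row):
    return datetime.strptime(row[DATE], DATE_FORMAT)


def dedupe_earliest(rows):
    """Drop rows whose (subject, session) pair also appears with an earlier date.

    Rows that share the earliest date are all kept, so pairs that differ only
    in age/sex survive and still need manual review.
    """
    earliest = {}
    for row in rows:
        key = (row[SUBJECT], row[SESSION])
        when = _parse(row)
        if key not in earliest or when < earliest[key]:
            earliest[key] = when
    return [r for r in rows if _parse(r) == earliest[(r[SUBJECT], r[SESSION])]]


def missing_subjects(rows, subject_list):
    """Return the subjects in subject_list that never appear in rows."""
    present = {row[SUBJECT] for row in rows}
    return sorted(set(subject_list) - present)


def conflicting_pairs(rows, fields=("age", "sex")):
    """Return (subject, session) pairs whose rows disagree on any of `fields`."""
    seen = {}
    for row in rows:
        key = (row[SUBJECT], row[SESSION])
        seen.setdefault(key, set()).add(tuple(row.get(f) for f in fields))
    return sorted(key for key, values in seen.items() if len(values) > 1)
```

Running `conflicting_pairs` on the output of `dedupe_earliest` gives a list of the pairs that would still need to be reconciled by hand.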

## How to Download the Information File

1. Log in to the [NIMH Data Archive](https://nda.nih.gov/)

2. Navigate to [this page](https://nda.nih.gov/data_structure.html?short_name=abcd_mri01)
2. For the `abcd_mri01.txt` file, navigate to [this page](https://nda.nih.gov/data_structure.html?short_name=abcd_mri01). For the `abcd_fastqc01.txt` file, navigate to [this page](https://nda.nih.gov/data_structure.html?short_name=abcd_fastqc01).

3. Click *Add to Filter Cart* at the bottom

4. Once the filter cart in the top right corner updates, click on *Create Data Package/Add Data to Study*

* For the fastqc file, double check that the ABCD Dataset and ABCD Fasttrack QC Instrument checkboxes are selected (they should be by default)

5. Click *Create Data Package* and name it something identifiable to you

6. Make sure *Include Documentation* is selected before clicking *Create Data Package*. It will take a while to create
14 changes: 9 additions & 5 deletions docs/scripts.md
@@ -44,15 +44,19 @@ to upload.

Once this script has been run you will want to spot check the results.

You can validate that all of the expected subjects are present in each
You can validate that all of the expected subjects are present in at least one
of the datatype folders by running the `validate-prepare.py` script with
your `subject_list.csv` and your working directory as the inputs. This script
will loop through each datatype folder and compare the subject list to the
subject IDs present in the folder. It will output a text file containing the
subject IDs that are in the subject list but not in the datatype folders.
your `subject_list.csv` and your working directory as the inputs. You can also
specify an output file. This script will loop through each datatype folder and
build a list of subject IDs that have a datatype folder. Then it will compare
that list to the subject list. It will output a text file containing the
subject IDs that are in the subject list but not in any of the datatype folders.
If there are many subjects missing, part of the issue could be that the lookup.csv
does not contain all of the subjects that it should, so be sure to double check
that all of the subjects listed in the `subject_list.csv` are also in the `lookup.csv`
and that there aren't any duplicate subject,session pairs. The output log of the `prepare.py`
run should have warnings for any subjects that weren't found in the lookup.csv, which means
that they are either completely missing or that there are duplicates.
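
The comparison described above can be sketched as follows. This is not the code in `validate-prepare.py`; it is a minimal illustration that assumes each datatype folder sits directly under the working directory and that entry names inside it begin with the subject ID followed by an underscore. The function names and that layout are hypothetical.

```python
from pathlib import Path


def subjects_with_data(working_dir):
    """Collect subject IDs that appear in at least one datatype folder."""
    found = set()
    for datatype_dir in Path(working_dir).iterdir():
        if not datatype_dir.is_dir():
            continue  # skip stray files in the working directory
        for entry in datatype_dir.iterdir():
            # Assumption: names look like "<subject>_<rest>".
            found.add(entry.name.split("_")[0])
    return found


def missing_from_prepare(subject_list, working_dir):
    """Return subjects in subject_list with no entry in any datatype folder."""
    return sorted(set(subject_list) - subjects_with_data(working_dir))


def write_missing(missing, output_file):
    """Write one missing subject ID per line to output_file."""
    Path(output_file).write_text("".join(f"{subject}\n" for subject in missing))
```

A subject counts as present as soon as it shows up in any one datatype folder, which matches the "at least one" wording above.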

This script should also create several types of files. It will create pairs of csv and
txt files that separate out the subject files in separate batches of 500 files each, with the
