# Run and Validate Prepare and Upload Scripts

## Using `prepare.py`

The `prepare.py` script runs filemapper on the data to be uploaded as
specified in the `lookup.csv` contained in the upload folder (i.e., the
destination folder for `prepare.py`). If any of the parent directories
are completely empty, it deletes those folders. It then runs `records.py` on all of
the file-mapped folders to create manifest JSONs and `records.csv`, which
contains a list of full paths to all files to upload.

When using
[prepare.py](https://github.com/DCAN-Labs/nda-bids-upload/blob/master/prepare.py),
there are four mandatory flags:

```
--source (-s): The directory under which all data desired
for upload is found. This is usually the output of a pipeline like
Dcm2Bids or abcd-hcp-pipeline. It is the directory your file mapper
JSONs will be mapping from.
--destination (-d): The upload directory you began in [step two](workingdirectory.md).
This directory is going to be where all of the data will be
organized after prepare.py has finished.
--subject-list: A list of subjects and session pairs within a .csv
file with column labels "bids_subject_id" and "bids_session_id,"
respectively.
--datatypes: A list of NDA data types within a .txt file you plan
to upload.
```
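
Putting the four flags together, a typical invocation might look like the sketch below. The paths and file names are placeholders, not outputs of any real run; substitute your own pipeline output directory, upload directory, subject list, and datatypes file.

```
python3 prepare.py \
    --source /path/to/pipeline/output \
    --destination /path/to/upload_dir \
    --subject-list subject_list.csv \
    --datatypes datatypes.txt
```

The `subject_list.csv` uses the two column labels given above; a minimal (hypothetical) example:

```
bids_subject_id,bids_session_id
sub-01,ses-baseline
sub-02,ses-baseline
```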

`prepare.py` should create the following:

* parent/child directories for each datatype
* `complete_folders.txt`: contains one line with the path to the location of each file prepped for upload
* `complete_records.csv`: contains all data for that subject pulled from `lookup.csv`
* `folders_1_500_1.txt`: `prepare.py` splits the files to upload into batches of 500. For example, if there were 1,400 files, there would be three batches: two with 500 files each and one with 400 files (see the example after this list)
* a companion .csv for each .txt, containing the manifest.txt information for the files being uploaded
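
To make the batching concrete: 1,400 files would produce three batch files like the listing below. Only `folders_1_500_1.txt` is a name confirmed above; the names for the later batches assume the `folders_<first>_<last>_<batch>` pattern generalizes.

```
folders_1_500_1.txt       # batch 1: files 1-500
folders_501_1000_2.txt    # batch 2: files 501-1000 (assumed naming)
folders_1001_1400_3.txt   # batch 3: files 1001-1400 (assumed naming)
```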


## Validating `prepare.py`

Once this script has been run you will want to spot check the results.

You can validate that all of the expected subjects are present in each
of the datatype folders by running the `validate-prepare.py` script with
your `subject_list.csv` and your working directory as the inputs. This script
loops through each datatype folder and compares the subject list to the
subject IDs present in the folder. It outputs a text file containing the
subject IDs that are in the subject list but not in the datatype folders.
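
The exact command-line interface of `validate-prepare.py` is not documented here, so the argument order below is an assumption; check the script itself before running.

```
# Assumed interface: subject list first, working directory second.
python3 validate-prepare.py subject_list.csv /path/to/upload_dir
```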

You should also validate the directory structure of the datatype folders
that were created. In the upload directory, you will find a parent/child directory setup.
You should have a parent directory for each of the JSON/YAML file pairs
where relevant files were found.
They should have the same name as their corresponding JSON/YAML files
without the extensions. Underneath you should find a child directory for
every subject (and session if used) that was found to have the relevant
files listed in the corresponding file mapper JSON. If there are no
child files under the parent directory then the script couldn't find any
files listed in the corresponding file mapper JSON.
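
For orientation, the overall layout has the general shape sketched below. The parent names are placeholders; real parent directories take the names of your JSON/YAML pairs, and the child naming convention is described in the next paragraph.

```
upload_dir/
├── <json_yaml_pair_name_1>/    # one parent per JSON/YAML pair with matched files
│   ├── <child dir for subject/session 1>
│   └── <child dir for subject/session 2>
└── <json_yaml_pair_name_2>/
    └── <child dir for subject/session 1>
```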

You will notice a common naming convention for the "child" directories
as well. At the child directory level the naming convention has four
[...]

mapper JSON files for proper formatting.

## Using `upload.py`

Before you begin the upload process, we recommend sending the NDA a courtesy email letting them know that you are about to start uploading data. If possible, provide an estimate of how much data is going to be uploaded.

The `upload.py` script uses `records.csv` (generated by `prepare.py` above) to
split the files to be uploaded into batches of 500. For each batch, the
script loops through each of the file paths to generate and run the
necessary upload command using NDA-tools.
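
For orientation, the NDA-tools command generated for a batch is a `vtcmd` call. A representative shape is sketched below with placeholder values; the exact flags `upload.py` passes are not shown in this document, so treat this as illustrative rather than the script's literal output.

```
# Illustrative vtcmd call for one batch; <nda_username> and
# <collection_id> are placeholders for your own values.
vtcmd folders_1_500_1.txt -b -u <nda_username> -c <collection_id> \
    -t "batch 1" -d "first batch of prepared files"
```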
[...] username and password, which is then stored in
**~/.NDATools/settings.cfg**. Find the upload logs, validation results,
and submission package here: **~/NDA/nda-tools/vtcmd**.

## Validate `upload.py`

After running `upload.py`, check that the submission was successful. First check the output logs for any errors. Then, follow the steps below to check the submission on the NDA:

1. Log into the NDA.

2. Navigate to your dashboard.

3. Click on "Collections (#)".

4. Click on the collection name you uploaded to.

5. Click on the "Submissions" tab and check the "Submission Loading Status".
