Preprocessing
When task-related BOLD fMRI data have been collected (resting-state data may work too, though a resting-specific script may be created), we need to run these scripts.
They submit jobs that perform all of the preprocessing steps: motion correction, slice-timing correction, temporal filtering, co-registration, spatial smoothing, etc. For an exhaustive explanation of all of these steps and why we do them, please see Chapter 3.
First you'll need to navigate to the following directory:
/home/data/madlab/scripts/preproc_scripts
In this directory you'll find a lot of scripts (including the scripts for converting DICOMs to NIfTI files). For preprocessing you'll specifically be looking for two files:
1) {PROJECT NAME}_preproc.py (e.g., wmaze_preproc.py)
2) {PROJECT NAME}_preproc_submit.py (e.g., wmaze_preproc_submit.py)
We create preprocessing scripts for each project given that slightly different preprocessing steps may be implemented based on differences in study design (e.g., different frequency cutoffs for high-pass filters). However, most projects will be preprocessed very similarly, and we may converge on a single preprocessing script for all projects (TO BE DETERMINED).
- In the {PROJECT NAME}_preproc.py file a few things need to be edited. First, the name of the 'create_workflow' function should correspond to the project that is being preprocessed. For example, if it is currently called 'wmaze', change it to your relevant project name (e.g., 'seqtrd'). You'll also need to change the datasource.inputs.template values so they correctly reference your project-specific data location on the cluster. These changes should be made around lines 278-287 of the script. Look for the following string:
'/home/data/madlab/data/mri/{PROJECT_NAME}/%s/bold/bold%03d/bold.nii.gz'
You would need to change the {PROJECT_NAME} to your project...for example:
'/home/data/madlab/data/mri/wmaze/%s/bold/bold%03d/bold.nii.gz'
This line tells the script where it can find your data and it will iterate over your subject id (%s) and over your runs (%03d). These inputs are defined by the infields of the DataGrabber node ['subject_id', 'run'], which are defined by your submit script.
THESE ARE ALL THE EDITS THAT NEED TO BE MADE TO THIS SCRIPT. ANY OTHER CHANGES SHOULD ONLY BE MADE ONCE YOU ARE VERY COMFORTABLE WITH NIPYPE SCRIPTING AND THE IMPORTANCE OF ORDER OF OPERATIONS IN PREPROCESSING.
- In the {PROJECT NAME}_preproc_submit.py file you need to change many things. The first thing you'll need to change is the list of subject ids called subjs. This will always be a list variable and it should correspond to the subjects you want to preprocess. In practice I typically define this variable to contain only one subject until I debug the script; once everything is working as it should, I run the preprocessing on the remaining participants.
The next variables you'll need to change are your working, output, and surfaces directories: workdir, outdir, and surf_dir. Your workdir is where intermediate files are written during the preprocessing steps. These files are necessary for the scripts to run successfully but are not part of the final output. These should ALWAYS be written to the
/scratch/madlab/{PROJECT NAME}/preproc
directory. When things don't work properly, this is where you can look to see how and why things are crashing. The /scratch directory should always be considered a temporary directory: data are automatically removed from it if not touched within a 40-day period.
The outdir is the sink_directory defined in the scripts; it should be the location on the cluster where you want your outputs to reside. It should ALWAYS be something like (again, replace {PROJECT NAME} with your relevant project's name):
/home/data/madlab/data/mri/{PROJECT NAME}/preproc
The surf_dir is the freesurfer project directory where all participants' surface data reside. It is important that this is set appropriately (AND YOUR PROJECT-SPECIFIC ENVIRONMENT is also set) given that many steps in the preprocessing script rely on the files created by freesurfer. It should look something like:
/home/data/madlab/surfaces/{PROJECT NAME}
The script then enters a 'for' loop that iterates over your subject list, submitting a separate job to the cluster queue for each participant. The 'for' loop creates a shell script that ultimately gets submitted. Within that shell script the following lines are written; inputs to these lines will need to be changed depending on your project and specifications:
It joins the following strings together with a single white space between each element:
'python {PROJECT NAME}_preproc.py
-s {SUBJECT ID -- this is defined and changed above in the script}
-o {OUTDIR -- this is defined and changed above in the script}
-w {WORKDIR -- this is defined and changed above in the script}
-r {NUMBER OF FUNCTIONAL RUNS}
-d {SURF_DIR}
-u {HIGH-PASS FILTER CUTOFF IN HZ, or -1 if not using}
-l {LOW-PASS FILTER CUTOFF IN HZ, or -1 if not using}
--do_slice_times={TRUE or FALSE depending on whether you want to do simultaneous slice-timing correction}
--use_fsl_bp={TRUE or FALSE depending on whether you want to use FSL's filter algorithm or the python one defined in the main script}
-t {YOUR PROJECT SPECIFIC REPETITION TIME [TR]}
-k {THE SPATIAL SMOOTHNESS - FWHM - YOU WANT TO IMPOSE ON YOUR DATA}
-n {NUMBER OF PRINCIPAL COMPONENTS YOU WANT TO EXTRACT FROM YOUR WHITE MATTER AND CSF MASKS}
-v {WHICH VOLUME YOU WANT TO USE AS YOUR TEMPLATE IN YOUR MOTION CORRECTION ALGORITHM}'
The components of the convertcmd that you'll most likely need to change (i.e., the ones that vary the most across projects) are your script name (change from wmaze_preproc.py to seqtrd_preproc.py), the number of runs in your functional data (-r flag), your TR (-t flag; this may go down once we start utilizing multiband more on the new scanner), and the size of the smoothing kernel you want to impose (-k flag; a general rule of thumb is about 2x your acquired resolution, although for smaller anatomical areas the averaging of unique signals (a.k.a. partial voluming) should be taken into account). A sketch of the full submission loop is shown below.
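To make this concrete, here is a minimal sketch of what the submission loop might look like, assuming an LSF-style scheduler ('bsub'); the queue name, run count, and filter settings below are illustrative placeholders, not the actual project values:

```python
# Hypothetical sketch of the submit script's 'for' loop; values are illustrative.
import os

subjs = ['WMAZE_001']  # debug with a single subject first

workdir = '/scratch/madlab/wmaze/preproc'
outdir = '/home/data/madlab/data/mri/wmaze/preproc'
surf_dir = '/home/data/madlab/surfaces/wmaze'

for sid in subjs:
    # Join the pieces of the command with a single space between each element
    convertcmd = ' '.join(['python', 'wmaze_preproc.py',
                           '-s', sid,
                           '-o', outdir,
                           '-w', workdir,
                           '-r', '6',            # illustrative number of runs
                           '-d', surf_dir,
                           '-u', '0.008',        # high-pass cutoff in Hz
                           '-l', '-1',           # no low-pass filter
                           '--do_slice_times=True',
                           '--use_fsl_bp=True',
                           '-t', '2.0',          # illustrative TR in seconds
                           '-k', '5.0',          # illustrative FWHM in mm
                           '-n', '3',
                           '-v', '0'])
    script_file = 'wmaze_preproc-{0}.sh'.format(sid)
    with open(script_file, 'wt') as fp:
        fp.writelines(['#!/bin/bash\n', convertcmd])
    # Queue name is a placeholder; error/output files go to /scratch/madlab/crash
    outcmd = ('bsub -q PQ_madlab '
              '-e /scratch/madlab/crash/wmaze_preproc_err_{0} '
              '-o /scratch/madlab/crash/wmaze_preproc_out_{0} < {1}').format(sid, script_file)
    os.system(outcmd)
```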
The 'for' loop will then submit the shell script it created to the queue, with output and error text files written to /scratch/madlab/crash, which can be evaluated with the 'cat' or 'less' command to check on the progress of the scripts. For example:
cat /scratch/madlab/crash/wmaze_preproc_out_WMAZE_001
In this file you should see the progress of the nipype script that you would typically view in the standard output. If things work normally everything should look very orderly in this file (jobs submitted, jobs executed, etc.). If things go wrong (which they always do), a Traceback error will be generated and a crash file will be saved. Look into the crash file using the following command:
nipype_display_crash /full/location/to/crash_file.pklz
This is a pickle file that contains the Traceback output telling you why the script crashed. Most likely it will be something like an IOError where a file can't be found because of a typo or something...but this is where you need to debug things.
************* Read On For Detailed Description of the main script ************
Below I'll briefly describe what this script is doing. The main function in this script is the create_workflow function, which is first defined on line 266. Into this function we input: the number of functional runs (which will vary across projects), the subject ID, the freesurfer subjects directory (subjects_dir), the amount we want to spatially smooth (fwhm), how we intend to do slice-timing correction, the temporal frequency filtering bands, the repetition time (TR), where to save the outputs (sink_directory), whether to use FSL's filter function or our own python-defined function, how many components to extract from the white matter and CSF to account for physiological and scanner-related noise, and which volume to use for motion correction.
The first node in this nipype script is the datasource node. It is responsible for loading the relevant data using glob-like functionality (i.e., it can use wildcards '*' to load the contents of directories). This is one of the two places that you'll need to modify for each project. It should look something like:
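A minimal sketch of that datasource node, assuming nipype's DataGrabber interface and the wmaze template from above; the node name and output field are illustrative:

```python
import nipype.pipeline.engine as pe
import nipype.interfaces.io as nio

# Grab each subject's BOLD runs; infields match the submit script's inputs
datasource = pe.Node(nio.DataGrabber(infields=['subject_id', 'run'],
                                     outfields=['func']),
                     name='datasource')
datasource.inputs.template = ('/home/data/madlab/data/mri/wmaze/'
                              '%s/bold/bold%03d/bold.nii.gz')
datasource.inputs.template_args = dict(func=[['subject_id', 'run']])
datasource.inputs.sort_filelist = True
```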
The next node renames files so that identically named files don't conflict (i.e., bold.nii.gz from the first run doesn't conflict with bold.nii.gz from the second run). This should also be changed to reflect the specific project.
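A sketch of such a renaming node, assuming nipype's Rename utility interface; the format string and run numbering are illustrative:

```python
from nipype.interfaces.utility import Rename
from nipype.pipeline.engine import MapNode

# Give each run a unique, run-numbered filename (e.g., wmaze_01.nii.gz)
name_unique = MapNode(Rename(format_string='wmaze_%(run)02d', keep_ext=True),
                      iterfield=['in_file', 'run'],
                      name='rename')
name_unique.inputs.run = [1, 2, 3]  # illustrative run numbers
```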
The next list defines the output fields that will be used with our outputspec (next node below) for all of the data that we want to save following the completion of this script. If we want to save additional files, this list will have to be modified.
We'll go over towards the end what all of these outputs are for - ultimately some of the outputs are important for subsequent analysis steps (e.g., 'smoothed_files', 'reference', 'motionandoutlier_noise_file') while others are important for quality control (e.g., 'motion_plots').
The next two nodes of the nipype script first convert the data to floating-point format for precision and then run AFNI's despiking algorithm, which goes through the BOLD fMRI time series and identifies single time points whose signal exceeds 3 standard deviations of the adjacent time points' signal. If such a spike is detected and the adjacent time points are not corrupted, the spike is replaced with an interpolation of the adjacent time points. Spikes like this are typically due to scanner-related radio frequency (RF) noise and are not physiologically or psychologically meaningful...however, caution should be taken with functions like this because we are changing our original data.
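A sketch of those two nodes, assuming FSL's ImageMaths for the float conversion and AFNI's Despike interface; node names are illustrative:

```python
import nipype.interfaces.fsl as fsl
import nipype.interfaces.afni as afni
from nipype.pipeline.engine import MapNode

# Convert each run to 32-bit floating point for numerical precision
img2float = MapNode(fsl.ImageMaths(out_data_type='float',
                                   op_string='',
                                   suffix='_dtype'),
                    iterfield=['in_file'],
                    name='img2float')

# Identify and interpolate over isolated signal spikes in the time series
despiker = MapNode(afni.Despike(outputtype='NIFTI_GZ'),
                   iterfield=['in_file'],
                   name='despike')
```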
*********** A COUPLE OF THINGS TO CONSIDER **********
This script is created using nipype, which is implemented in python. The syntax is somewhat unique to nipype and python, but there are a couple of things to keep in mind. In python, whitespace matters. A lot of the code in this script is indented 4 spaces because the nodes are all defined within a function. The remaining spacing is used to aid readability; it is not strictly necessary, but once you go down that spacing path you should stick with it. The typical rule of thumb is to align according to brackets '[]' or parentheses '()'.
The other thing to consider is that in nipype each node is a processing step, and when everything is put together the pipeline can be conceptualized as a directed 'acyclic graph'. Each node in the graph needs to be connected, and there are many ways (i.e., syntaxes) for doing that. See here. The syntax that I've adopted is to connect each node after it is created with the following connect function:
workflow_name.connect(previous_node_name, 'out_file_name', next_node_name, 'input_file_name')
Out/in file names are either defined by you (for example, when you create function nodes) OR they're defined by nipype itself and can be found in the documentation of each interface, which can be found here.
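For example, here is a self-contained sketch of that connect syntax; the node names are illustrative, and the two nodes anticipate the reference-extraction and motion-correction steps described next:

```python
import nipype.pipeline.engine as pe
import nipype.interfaces.fsl as fsl

wf = pe.Workflow(name='preproc')

# Pull out a single volume to serve as the motion-correction reference
extractref = pe.Node(fsl.ExtractROI(t_min=0, t_size=1), name='extractref')
realign = pe.Node(fsl.MCFLIRT(save_plots=True), name='realign')

# 'roi_file' is ExtractROI's output field; 'ref_file' is MCFLIRT's input
# field for the reference volume (both documented in the interface docs)
wf.connect(extractref, 'roi_file', realign, 'ref_file')
```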
The next node extracts the reference volume from your BOLD fMRI scans that will be used for co-registration to your anatomical scan and for motion correction.
The next node implements simultaneous slice-timing and motion correction if the variable slice_times is defined as TRUE in your submission script; otherwise the script will only perform motion correction. The reason we do this is that slice-timing correction, if performed before motion correction, can propagate motion-related intensity differences (which can be quite large) across time. In contrast, if motion correction is performed before slice-timing correction, data that were actually acquired at one point in time may be moved to a different slice, so the acquisition time defined in the algorithm may not match the actual acquisition time. Thus, IF slice-timing correction is to be implemented at all (and there are arguments that it doesn't need to be), it should be performed simultaneously in order to minimize the propagation of motion-related noise across space and time.
If you end up using the simultaneous algorithm you need to calculate, or obtain from the DICOM header files, the actual slice times for your data. We calculate this based on the TR and the interleaved acquisition at the bottom of this script.
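A hedged sketch of how interleaved slice times can be computed from the TR and the slice count (0-based, odd-indexed slices acquired after even-indexed ones); verify against your sequence's actual acquisition order in the DICOM header:

```python
def calc_slice_times(tr, n_slices):
    """Return the acquisition time (in seconds) of each slice, assuming an
    interleaved even-then-odd acquisition with no gap between slices."""
    acquisition_order = list(range(0, n_slices, 2)) + list(range(1, n_slices, 2))
    time_per_slice = tr / n_slices
    return [acquisition_order.index(s) * time_per_slice for s in range(n_slices)]

# e.g., a 2.0 s TR with 38 slices (illustrative values)
slice_times = calc_slice_times(2.0, 38)
```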
The next node calculates the temporal signal-to-noise ratio (TSNR) of your scans. This is very important to check for quality control to make sure you are getting sufficient signal from your regions of interest.
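A minimal sketch of a TSNR node; note that the interface's location varies by nipype version (nipype.algorithms.misc in older releases, nipype.algorithms.confounds in newer ones):

```python
from nipype.algorithms.confounds import TSNR
from nipype.pipeline.engine import MapNode

# Compute the temporal SNR per run, removing a 2nd-order polynomial trend
tsnr = MapNode(TSNR(regress_poly=2), iterfield=['in_file'], name='tsnr')
```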
The next three nodes plot your motion parameters as a .png file. It is a good habit to take a look at this file to make sure that your subjects don't have lots of motion that is not being appropriately identified by the artifact detection node (discussed later). The fssource node defines your freesurfer environment so you can take advantage of the files created by freesurfer (e.g., aparc+aseg.mgz) and the bbreg (boundary-based registration) algorithm for coregistering functional and anatomical scans. Last, the fs_threshold node binarizes the aparc+aseg file so that you have an anatomical mask that can be used for skull-stripping or for moving into BOLD fMRI space once the transformation has been computed (in the next node).
The next node performs the boundary-based registration algorithm from freesurfer (bbreg). This calculates the affine transformation matrix to co-register your BOLD fMRI scan to your anatomical scan. This transformation will only be as good as the boundary segmentation obtained from the freesurfer algorithm. So...if there are lots of errors (pial/skull included in brainmask.mgz OR gyri missing due to improper intensity normalization) then your co-registration will be poor. That is why, in the freesurfer step before preprocessing, we need to do a lot of work (but not too much) quality checking the performance of the algorithm and adding control points or editing the brainmask when necessary.
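A sketch of the bbreg node, assuming nipype's freesurfer BBRegister interface; the initialization, contrast type, and subject values are illustrative:

```python
import nipype.interfaces.freesurfer as fs
from nipype.pipeline.engine import Node

# Boundary-based registration of the BOLD reference to the anatomical
register = Node(fs.BBRegister(init='fsl',            # initialize with FLIRT
                              contrast_type='t2',    # BOLD has T2*-like contrast
                              out_fsl_file=True),    # also save an FSL-style .mat
                name='bbregister')
register.inputs.subject_id = 'WMAZE_001'                      # illustrative
register.inputs.subjects_dir = '/home/data/madlab/surfaces/wmaze'
```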
Only the transformations are saved, given that we run all analyses in BOLD fMRI space; only once we are ready to move to template space for group-level analyses do we warp things to the group common space.
The following two nodes first binarize the CSF and WM masks obtained from the freesurfer aparc+aseg.mgz file and erode them by 2 voxels so that these masks do not have problems with partial voluming with grey matter voxels. These masks will be used to obtain the first three principal components of the fMRI signal in order to statistically regress out physiological (i.e., respiration and heart rate) and scanner-related noise. The other node applies the inverse transformation to warp the WM and CSF masks from anatomical space into BOLD fMRI space.
The values used in the wmcsf.inputs.match input are obtained from here. Similar syntax can be used to isolate anatomical regions of interest from the freesurfer aparc+aseg file.
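A sketch of the WM/CSF masking node, assuming freesurfer's Binarize interface; the match values shown are the standard FreeSurferColorLUT codes for left/right cerebral white matter (2, 41) and left/right lateral ventricles (4, 43):

```python
import nipype.interfaces.freesurfer as fs
from nipype.pipeline.engine import Node

# Binarize WM + lateral-ventricle labels and erode 2 voxels to avoid
# partial voluming with grey matter
wmcsf = Node(fs.Binarize(match=[2, 41, 4, 43], erode=2), name='wmcsfmask')
# in_file (the subject's aparc+aseg.mgz) would be connected from fssource
```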
The next two nodes apply the inverse transformation to the binarized aparc+aseg.mgz file, bringing the mask into BOLD fMRI space, and then dilate it by 1 voxel. This provides a brain-only mask that can be used to aid visualization, computational efficiency, etc.
Note that the last node applies the same function (fs.Binarize) but has a different name. Nodes within the same pipeline CANNOT have the same name.
The next node is the artifact detection node, which identifies the time points that exceed either a z-intensity threshold of 3 standard deviations OR 1 mm of normalized (composite) motion. This node saves the normalized motion as a text file, as well as text files that identify the time points (0-based) that exceeded the thresholds. These time points will be used (in a node that is coming up) to create outlier regressors (0's for every OK time point and 1's for bad time points) to regress out the egregious time points in the general linear model.
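A sketch of that node, assuming nipype's rapidart ArtifactDetect interface with the thresholds described above; the parameter source ('FSL') and iterfields are illustrative:

```python
from nipype.algorithms.rapidart import ArtifactDetect
from nipype.pipeline.engine import MapNode

art = MapNode(ArtifactDetect(use_differences=[True, False],
                             use_norm=True,
                             norm_threshold=1.0,        # 1 mm composite motion
                             zintensity_threshold=3.0,  # 3 SD intensity spikes
                             mask_type='file',
                             parameter_source='FSL'),
              iterfield=['realigned_files',
                         'realignment_parameters',
                         'mask_file'],
              name='artifactdetect')
```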
The next three nodes are function nodes that call on functions defined at the beginning of the script to create text files of noise events, specifically the motion parameters, the first derivative of the motion, the composite-norm motion, the outlier events, and the first three Legendre polynomials. These will be useful for accounting for motion-related noise when analyzing data at the first level.
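A hedged sketch of what assembling such a noise-regressor matrix could look like; the function below is hypothetical, not the one in the script, and the column layout is illustrative:

```python
import numpy as np

def build_noise_matrix(motion, comp_norm, outliers, n_vols, order=3):
    """motion: (n_vols, 6) parameters; comp_norm: (n_vols,) composite norm;
    outliers: 0-based indices of bad time points."""
    # First temporal derivative of the motion parameters (zero-padded)
    deriv = np.vstack([np.zeros((1, motion.shape[1])), np.diff(motion, axis=0)])
    # Legendre polynomial drift terms evaluated over the scan duration
    x = np.linspace(-1, 1, n_vols)
    legendre = np.array([np.polynomial.legendre.Legendre.basis(i)(x)
                         for i in range(order + 1)]).T
    # One indicator regressor per outlier: 1 at the bad volume, 0 elsewhere
    out_cols = np.zeros((n_vols, len(outliers)))
    for col, idx in enumerate(outliers):
        out_cols[idx, col] = 1
    return np.hstack([motion, deriv, comp_norm[:, None], out_cols, legendre])
```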
The next node masks the motion-corrected (or simultaneously slice-time- and motion-corrected) functional data with the binarized, 1-voxel-dilated anatomical mask that has been transformed into BOLD fMRI space. This outputs a brain-only BOLD fMRI dataset.
The next four nodes are necessary for the implementation of FSL's SUSAN spatial smoothing algorithm. More about SUSAN can be found here. The short story is that SUSAN uses the underlying anatomy (based on voxel intensities) to constrain averaging to local voxels with similar intensity (i.e., gray matter with gray matter, versus gray matter with CSF or gray matter with white matter). Alternatively, an isotropic smoothing algorithm can be applied.
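A minimal sketch using nipype's canned SUSAN sub-workflow, which older nipype releases ship as nipype.workflows.fmri.fsl.create_susan_smooth (newer packaging moved these workflows out of the core package):

```python
from nipype.workflows.fmri.fsl import create_susan_smooth

# Returns a small sub-workflow; in_files and mask_file are supplied via
# connections to its inputnode, and fwhm can be set directly
susan = create_susan_smooth()
susan.inputs.inputnode.fwhm = 5.0  # illustrative kernel size in mm
```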
The remaining nodes are used to temporally filter the data. You can either employ a band-pass filter, which is often used for resting-state fMRI, or a high-pass filter for task-related fMRI data (NOTE: a low-pass filter is rarely used with task-related data due to the potential of filtering out your signal of interest in fast event-related designs). With our submission script we can choose to use our own filter, written with numpy, or FSL's temporal filter via fslmaths.
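A sketch of the FSL route, using fslmaths' -bptf flag via nipype's ImageMaths; -bptf takes high-pass and low-pass sigmas in units of volumes (a negative value disables that band), so Hz cutoffs must first be converted:

```python
import nipype.interfaces.fsl as fsl
from nipype.pipeline.engine import MapNode

tr, highpass_hz, lowpass_hz = 2.0, 0.008, -1  # illustrative values

# Convert cutoff frequencies (Hz) to filter sigmas in units of volumes
hp_sigma = 1.0 / (2.0 * tr * highpass_hz) if highpass_hz > 0 else -1
lp_sigma = 1.0 / (2.0 * tr * lowpass_hz) if lowpass_hz > 0 else -1

tempfilt = MapNode(fsl.ImageMaths(suffix='_tempfilt',
                                  op_string='-bptf {0:.6f} {1:.6f}'.format(hp_sigma,
                                                                           lp_sigma)),
                   iterfield=['in_file'],
                   name='tempfilt')
```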
The last node (datasink) saves the output files to your output directory. See here for how datasink syntax can be implemented to create specific directory structures.
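A minimal sketch of the datasink node; the base directory and container below are illustrative:

```python
import nipype.interfaces.io as nio
from nipype.pipeline.engine import Node

datasink = Node(nio.DataSink(), name='datasink')
datasink.inputs.base_directory = '/home/data/madlab/data/mri/wmaze/preproc'
datasink.inputs.container = 'WMAZE_001'  # one output folder per subject
# e.g., wf.connect(susan, 'outputnode.smoothed_files', datasink, 'smoothed_files')
```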
The remaining bit of code creates the flags for the command-line inputs: the number of functional runs, whether to temporally filter, etc. If further command-line flags are needed you can modify the script here.
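A hedged sketch of that flag parsing, assuming python's argparse module; only a subset of the flags described above is shown, and the dest names are illustrative:

```python
from argparse import ArgumentParser

parser = ArgumentParser(description='project-specific preprocessing workflow')
parser.add_argument('-s', dest='subject_id', required=True)
parser.add_argument('-o', dest='sink_dir', required=True)
parser.add_argument('-w', dest='work_dir', required=True)
parser.add_argument('-r', dest='n_runs', type=int, required=True)
parser.add_argument('-t', dest='tr', type=float, required=True)
parser.add_argument('-k', dest='fwhm', type=float, required=True)
parser.add_argument('-u', dest='highpass_freq', type=float, default=-1)
parser.add_argument('-l', dest='lowpass_freq', type=float, default=-1)
args = parser.parse_args()
```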