Build context for a Flywheel Gear that runs CMRRExtractPhysio. The gear extracts physiological log files from the encoded "_PHYSIO" DICOM file generated by CMRR MB sequences (>=R015, >=VD13A).
The raw extracted .log files are always left as the original, untouched data. If there are problems with the data, processing can optionally be applied to the BIDS-compliant output files. The user sets the input file and processing options as follows:
- DICOM_ARCHIVE: The input file. A DICOM zip archive containing the DICOM with the physiological recordings.
- Dry-Run: simply prints command calls to the log rather than actually processing. (Default: off)
- gear-log-level: what level of detail you would like in the log. (Default: INFO)
- Generate_Bids: Check this to generate the .tsv BIDS data. If this is unchecked, only the raw .log files will be created. (Default: checked)
- Generate_json: Check this to generate the BIDS .json file for you to view. Flywheel does not need this file to export to BIDS. It will automatically generate it from the .tsv file's metadata when you export the data to BIDS format. This is just an option to create the file in case you want to look at it. (Default: off)
- Generate_QA: Check this to generate a QA image of what the BIDS data will look like based on the processing methods you select. An image is generated for each type of physiological recording. (Default: On)
- Generate_Raw: Check this to keep the raw .log files extracted from the DICOM. If you uncheck this, the .log files are used to generate the BIDS .tsv files, and are then deleted. This can help you limit file clutter. (Default: On)
- Missing_Data: The strategy to use for missing data points. This ONLY describes how the gear generates a new "tic times" array for the data. Choices are:
  - gap_fill: Keeps all original sample tic times and adds new time points at the specified sampling rate wherever there is a gap. If the time skipped between samples is not a whole multiple of the sample period, this may still result in a small offset (less than 1/2 the sample time). Because tics are only added or modified where they are missing, these are the only points that require interpolation.
  - uniform: Takes the first tic time and the last tic time, and generates a uniform time array based on the specified sampling rate. If the time skipped between samples is not a whole multiple of the sample period, the new time array may be slightly shifted from the original, so interpolation may need to be carried out for all of the shifted timepoints. However small these shifts may be, interpolation can introduce some error. This is probably more acceptable for a slow signal (like RESP) than a fast signal (like ECG).
  - upsample: Upsamples the entire array to the maximum sampling rate. For Siemens, the minimum time between samples is 1 "tic", or 2.5 ms, so the maximum sampling rate is 400 Hz. (Typically, ECG is sampled at 1 "tic" per sample, and RESP is sampled at 8 "tics" per sample.) Since every sample, regardless of the intended sampling rate, must happen at an integer number of tics, upsampling everything to 1 "tic" per sample preserves the original data while also giving you the option of resampling at a constant rate. You can also pass the new, upsampled array directly to BIDS. It will reflect the new sampling rate and should be handled by any BIDS app without error.
  - none: Does nothing to address skipped samples. If you have missing data in the signal, this method adds zeros to the end of the array until the array length matches the length of the fMRI scan, assuming a constant sampling rate (which BIDS does).
- Interpolation_Method: If you chose to handle the missing data with "gap_fill", "uniform", or "upsample", some kind of interpolation must be carried out. This option determines how the gear calculates signal values at the newly generated timepoints. The interpolation types are:
  - Standard interpolation options (linear, cubic, nearest, etc.)
  - fill, which fills any missing data with the numeric value found in the config option "Fill_Value"
  - nan, which fills any missing data with "nan". (Currently buggy)
- Fill_Value: A numeric value to fill any missing data with (if not "nan")
- Process Data: A remnant we will probably remove in the release version. Checking this means it will do the interpolation/data filling processing. Unchecking is essentially equivalent to selecting "none" for "Missing_Data"
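To make the Missing_Data and Interpolation_Method options more concrete, here is a minimal Python sketch of how the three tic-time strategies and two of the fill methods could behave. It is not the gear's actual code: the function names (`gap_fill_tics`, `uniform_tics`, `upsample_tics`, `interpolate`) and the toy data are hypothetical, and the gap-filling rule shown (step forward from the last sample before the gap) is an assumption.

```python
import numpy as np

TIC_MS = 2.5  # duration of one Siemens "tic" in ms, as stated above


def sampling_rate_hz(tics_per_sample):
    """Sampling rate implied by a spacing of N tics per sample."""
    return 1000.0 / (tics_per_sample * TIC_MS)


def gap_fill_tics(tics, step):
    """Keep every original tic; add tics every `step` inside any gap."""
    out = [int(tics[0])]
    for t in tics[1:]:
        nxt = out[-1] + step
        while nxt < t:  # only true when samples were skipped
            out.append(nxt)
            nxt += step
        out.append(int(t))
    return np.array(out)


def uniform_tics(tics, step):
    """Uniform grid from the first to the last tic at the given spacing."""
    return np.arange(tics[0], tics[-1] + 1, step)


def upsample_tics(tics):
    """Grid with one sample per tic (the maximum rate)."""
    return np.arange(tics[0], tics[-1] + 1, 1)


def interpolate(new_tics, tics, values, method="linear", fill_value=0.0):
    """Compute signal values on `new_tics` (only two methods sketched)."""
    new_tics, tics, values = map(np.asarray, (new_tics, tics, values))
    if method == "linear":
        return np.interp(new_tics, tics, values)
    if method == "fill":
        out = np.full(len(new_tics), fill_value, dtype=float)
        known = np.isin(new_tics, tics)  # tics that exist in the raw data
        out[known] = values[np.searchsorted(tics, new_tics[known])]
        return out
    raise ValueError(f"method not sketched here: {method}")


# Toy RESP-like recording (8 tics per sample) with two samples missing.
raw_tics = np.array([0, 8, 16, 40, 48])
raw_vals = np.array([1.0, 2.0, 3.0, 6.0, 7.0])

filled = gap_fill_tics(raw_tics, step=8)
linear = interpolate(filled, raw_tics, raw_vals, "linear")
const = interpolate(filled, raw_tics, raw_vals, "fill", fill_value=-1.0)
print(filled, linear, const, sep="\n")
```

In this toy case the gap is a whole multiple of the sample spacing, so gap_fill and uniform produce the same grid. They would differ only when the skipped time is not a whole multiple of the sample period.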
This gear uses the CMRRExtractPhysio program to extract physiological data from the dicom to separate .log files. These log files are then used to generate visual plots of the physiological recording, with indicator lines marking the beginning and end of the scan, for validation purposes.
If Generate_Bids is set to True, then two additional files are created for each physio .log file generated from CMRRExtractPhysio, following the BIDS naming conventions.
While BIDS allows multiple physiological recordings to be placed in the same .tsv.gz file, this can only be done if they have the same sampling rate. Typically, this is not the case. Because of this, this gear always creates individual .tsv.gz files for each physiological recording.
BIDS also allows for a "scanner trigger" column in each recording's .tsv.gz file. Though not explicitly stated, this logically is referring to the trigger that is sent by the scanner when a new volume acquisition begins. This column is included with each physiological recording, synced to each individual sampling rate and time.
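As a rough illustration of the per-recording output described above, the following Python sketch writes one signal plus a trigger column to a headerless `.tsv.gz` file with a `.json` sidecar. The helper names (`trigger_column`, `write_bids_physio`), the rule of flagging the first sample at or after each volume start, and the example values are assumptions for illustration, not the gear's implementation.

```python
import gzip
import json
import os
import tempfile

import numpy as np


def trigger_column(sample_tics, volume_start_tics):
    """Return 1 at the first sample at or after each volume start, else 0."""
    sample_tics = np.asarray(sample_tics)
    trig = np.zeros(len(sample_tics), dtype=int)
    idx = np.searchsorted(sample_tics, volume_start_tics, side="left")
    trig[idx[idx < len(sample_tics)]] = 1
    return trig


def write_bids_physio(basename, signal, trig, sampling_hz, start_time_s,
                      name="respiratory"):
    """Write <basename>.tsv.gz (no header row) and <basename>.json."""
    with gzip.open(basename + ".tsv.gz", "wt") as f:
        for s, t in zip(signal, trig):
            f.write(f"{s}\t{t}\n")
    sidecar = {
        "SamplingFrequency": sampling_hz,
        "StartTime": start_time_s,
        "Columns": [name, "trigger"],
    }
    with open(basename + ".json", "w") as f:
        json.dump(sidecar, f, indent=2)


# Toy data: five samples spaced 8 tics apart, volumes starting at tics 0 and 20.
sample_tics = [0, 8, 16, 24, 32]
signal = [0.1, 0.2, 0.3, 0.4, 0.5]
trig = trigger_column(sample_tics, [0, 20])

with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "example_physio")  # placeholder file name
    write_bids_physio(base, signal, trig, sampling_hz=50.0, start_time_s=0.0)
    with gzip.open(base + ".tsv.gz", "rt") as f:
        rows = [line.rstrip("\n").split("\t") for line in f]
    with open(base + ".json") as f:
        sidecar = json.load(f)

print(rows)
print(sidecar)
```

Because each file carries its own `SamplingFrequency`, recordings with different rates (for example ECG and RESP) stay in separate files, each with a trigger column aligned to its own sample times.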
This gear looks for metadata info on the acquisition name for the BIDS naming convention. If this isn't available, it will try to pull the "SeriesDescription" tag directly from the header. Missing metadata in both of these locations will result in an output file named "UnknownAcquisition_.tsv.gz", and the user will need to manually set these file names.
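The fallback order described above can be sketched in a few lines of Python. The function name `resolve_acquisition_name` and the `acquisition_label` metadata key are hypothetical; only the "SeriesDescription" tag and the "UnknownAcquisition" placeholder come from the description above.

```python
def resolve_acquisition_name(metadata, dicom_header):
    """Pick a name: metadata first, then SeriesDescription, then a placeholder."""
    name = (metadata or {}).get("acquisition_label")  # hypothetical key
    if not name:
        name = (dicom_header or {}).get("SeriesDescription")
    return name or "UnknownAcquisition"


print(resolve_acquisition_name({"acquisition_label": "task-rest"}, {}))
print(resolve_acquisition_name({}, {"SeriesDescription": "fMRI_rest_PhysioLog"}))
print(resolve_acquisition_name(None, None))
```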
Every physiological dicom generates one "info.log" file, which has information about the acquisition time of the volumes in the scan. This contains no physiological information, but is necessary to synchronize the physiological recordings with the scan.
An additional ".log" file is created for each physiological measurement stored in the dicom. An optional validation .png image is generated for each ".log" file.
If BIDS generation is selected, an additional ".tsv.gz" and ".json" file are created for each ".log" file. The following directory structure represents output for a DICOM with a single physiological measurement (respiration: "RESP"). Files surrounded by "[]" indicate files that are only generated for BIDS. Files surrounded by "{}" indicate files that will be generated for each individual physiological measurement, if present:
Output_Directory
|
|---> Physio_..._info.log
|---> { Physio_..._RESP.log }
|---> { RESP.png }
|---> [ { BIDS_name_for_RESP.tsv.gz } ]
|---> [ { BIDS_name_for_RESP.json } ]
The QA images produced are for researchers to quickly determine if there's something wrong with their data. QA is just a plot of the acquired signal, along with a plot of any triggers recorded during the scan (usually below the signal's timeseries, as shown below). Common problems in data can be signal dropout or signal clipping. Below are some examples of good and bad physio data:
Note that there is no clipping of the signal from start to finish, and the signal is smooth, well defined, and consistent.
As with the RESP signal, notice that the signal is uninterrupted, with no clipping.
ECG typically has multiple channels recorded, so each channel is plotted. While it's hard to see the details in this case, we're looking for continuity and clipping, and there are no issues in this recording.
Below is a plot of respiration data with signal clipping that occurs towards the end of the scan. Notice how the signal becomes saturated for long periods of time. The signal also becomes irregular and unstable. This is likely due to a loose belt.
Below is a plot of signal dropout from RESP. Note that for the times in the signal gaps, there are NO values in the .log file, simply a large jump in time. The lines you see in this plot are placed there by our gear, based off the acquisition time of each sample, to better help you visually inspect your data. The value in these plots is interpolated linearly from the tics preceding and following the gap. Choosing "fill" or "nan" would put different values in these gaps.
The text in the bottom left is letting you know the extent of the data dropout, giving the following values:
- raw offset: The raw time offset between the time covered by the sampled tics and the expected length of the scan (-3949.875 ms in this case, almost 4 seconds!).
- raw largest gap: The largest gap between consecutive samples in the raw data (over 16 seconds in this case)
- proc offset: The offset time between the number of tics and the expected length in the processed data (1.1 ms, down from 3949).
- proc largest gap: The largest gap between consecutive samples in the processed data (8 ms, the sampling interval of the respiration data).
- % tic match in interp: How many tic times used in the processed data for interpolation match exactly with real tic times. 100% in this case, but shifts/skips that are a fraction of the sample period could result in tic times that don't line up with the raw data, and interpolation may introduce extra error for these timepoints.
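The summary values listed above could be computed along the following lines. This is a hedged sketch, not the gear's code: the function name `dropout_summary` is hypothetical, the offset is assumed to be (number of samples x sample period) minus the expected scan length, and the tic match is assumed to be the percentage of raw tic times that reappear unchanged in the processed time array.

```python
import numpy as np

TIC_MS = 2.5  # one Siemens "tic" in ms


def dropout_summary(raw_tics, proc_tics, tics_per_sample, expected_scan_ms):
    """Summarise dropout before and after processing (assumed definitions)."""
    raw_tics = np.asarray(raw_tics)
    proc_tics = np.asarray(proc_tics)

    def offset_ms(tics):
        # Time covered by the samples minus the expected scan duration.
        return len(tics) * tics_per_sample * TIC_MS - expected_scan_ms

    def largest_gap_ms(tics):
        return float(np.max(np.diff(tics))) * TIC_MS

    return {
        "raw offset": offset_ms(raw_tics),
        "raw largest gap": largest_gap_ms(raw_tics),
        "proc offset": offset_ms(proc_tics),
        "proc largest gap": largest_gap_ms(proc_tics),
        "% tic match in interp": 100.0 * float(np.mean(np.isin(raw_tics, proc_tics))),
    }


# Toy example: two RESP samples (8 tics apart) are missing from the raw data.
raw = [0, 8, 16, 40, 48]
proc = [0, 8, 16, 24, 32, 40, 48]
summary = dropout_summary(raw, proc, tics_per_sample=8, expected_scan_ms=140.0)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these toy numbers the raw data falls 40 ms short of the expected scan length and has a 60 ms gap, while the processed array covers the full scan with evenly spaced samples.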
Another dropout example. This ECG was recorded in the same session as the respiration example above. Notice that the gaps line up perfectly. This is because the respiration and ECG are recorded from the same device in this scan.