- Overview of Nipype
- Semantics of Nipype
- Playing with interfaces
- Creating workflows
- Advanced features
- Future directions
From design to databases [1]
a plethora of evolving options
- different algorithms
- different assumptions
- different platforms
- different interfaces
- different file formats
neuroscientist:
- which packages should I use?
- why should I use these packages?
- how do they differ?
- how should I use these packages?
developer:
- which package(s) should I develop for?
- how do I disseminate my software?
How do we:
Coming at it from a developer's perspective, we needed something that:
- was lightweight
- was scriptable
- provided formal, common semantics
- allowed interactive exploration
- supported efficient batch processing
- enabled rapid algorithm prototyping
- was flexible and adaptive
shell scripting:
Quick to write and powerful, but scalability is application specific, and scripts are not easy to port across different architectures.
make/CMake:
Similar in concept to workflow execution in Nipype, but again limited by the need for command-line tools and by inflexibility in scaling across hardware architectures (although see makeflow).
Octave/MATLAB:
Integration with other tools is ad hoc (i.e., system calls) and dataflow is managed at a programmatic level. However, see PSOM, which offers a very nice alternative to some aspects of Nipype for Octave/MATLAB users.
Graphical options: (e.g., LONI pipeline)
Adding or reusing components across different projects requires XML manipulation or subscribing to specific databases.
- easy to learn
- coding style makes for easy readability
- cross-platform
- extensive infrastructure for
  - development and distribution
  - scientific computing
  - brain imaging
- several institutions are adopting it in computer science classes
- scripting (like shell scripts, e.g., bash, csh)
- making web sites (like these slides)
- science (like R, MATLAB, IDL, Octave, Scilab)
- etc.
You only need to know one language to do almost everything!
- IPython, an advanced Python shell: http://ipython.org
- Numpy : provides powerful numerical arrays objects, and routines to manipulate them: http://www.numpy.org
- Scipy : high-level data processing routines. Optimization, regression, interpolation, etc: http://www.scipy.org
- Matplotlib a.k.a. Pylab: 2-D visualization, "publication-ready" plots http://matplotlib.sourceforge.net
- Mayavi : 3-D visualization http://code.enthought.com/projects/mayavi
- Scikit-learn, machine learning: http://scikit-learn.org
- Scikit-Image, image processing: http://scikits-image.org
- RPy2, communicating with R: http://rpy.sourceforge.net/rpy2.html
- NiPy, an umbrella project for Neuroimaging in Python: http://nipy.org
  - DiPy, diffusion imaging
  - Nibabel, file reading and writing
  - NiPy, preprocessing and statistical routines
  - Nipype, interfaces and workflows
  - Nitime, time series analysis
  - PySurfer, surface visualization
- PyMVPA, machine learning for neuroimaging: http://pymvpa.org
- PsychoPy, stimulus presentation: http://psychopy.org
Nipype architecture [2]
- Interface
- Engine
- Executable Plugins
- Interface: Wraps a program or function
- Node/MapNode: Wraps an Interface for use in a Workflow that provides caching and other goodies (e.g., pseudo-sandbox)
- Workflow: A graph or forest of graphs whose nodes are of type Node, MapNode or Workflow and whose edges represent data flow
- Plugin: A component that describes how a Workflow should be executed
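A minimal sketch of how these pieces compose (the input filename and workflow name are hypothetical; real usage follows throughout the tutorial):

from nipype.interfaces.fsl import BET
from nipype.pipeline.engine import Node, Workflow

# An Interface (BET) wrapped in a Node, composed into a Workflow,
# and executed through a Plugin ('Linear' runs nodes serially).
skullstrip = Node(BET(), name='skullstrip')
skullstrip.inputs.in_file = 'struct.nii'  # hypothetical input file
wf = Workflow(name='architecture_sketch')
wf.base_dir = '.'
wf.add_nodes([skullstrip])
wf.run(plugin='Linear')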
Currently supported (as of 2012-04-02); see the online documentation for the latest list.
AFNI | ANTS |
BRAINS | Camino |
Camino-TrackVis | ConnectomeViewerToolkit |
dcm2nii | Diffusion Toolkit |
FreeSurfer | FSL |
MRtrix | Nipy |
Nitime | PyXNAT |
Slicer | SPM |
Inclusion follows a most-used/most-contributed policy!
Not every component of these packages is available.
Properties:
Allows seamless execution across many architectures
- local
  - serial
  - multicore
- Clusters
  - Condor
  - PBS/Torque
  - SGE
  - SSH (via IPython)
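As a preview (the distributed-computing section below walks through this in detail), switching back ends is a one-line change; wf here stands for any already-built Workflow:

wf.run()                                                    # serial, local
wf.run(plugin='MultiProc', plugin_args={'n_procs': 4})      # multicore
wf.run(plugin='SGE', plugin_args={'qsub_args': '-q many'})  # SGE cluster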
- Environment and installing
- Nipype as a brain imaging library
- Building and executing workflows
- Contributing to Nipype
- Imperative-style caching
- Workflow concepts
- Hello World! of workflows
- Grabbing and Sinking
- iterables and iterfields
- Distributed computing
- The Function interface
- Config options
- Debugging
- Actual workflows (resting-state, task, diffusion)
Scientific Python:
- Debian/Ubuntu/Scientific Fedora
- Enthought Python Distribution (EPD)
Installing Nipype:
- Available from NeuroDebian, PyPI, and GitHub
- Dependencies: networkx, nibabel, numpy, scipy, traits
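For example, installing from PyPI might look like this (a sketch; the NeuroDebian and GitHub routes differ):

$ pip install nipype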
Running Nipype (Quickstart):
- Ensure tools are installed and accessible
- Nipype is a wrapper, not a substitute for AFNI, ANTS, FreeSurfer, FSL, SPM, NiPy, etc.
At MIT you can configure your environment as:
source /software/python/EPD/virtualenvs/7.2/nipype0.5/bin/activate
export TUT_DIR=/mindhive/scratch/mri_class/$LOGNAME/nipype-tutorial
mkdir -p $TUT_DIR
cd $TUT_DIR
ln -s /mindhive/xnat/data/nki_test_retest nki
ln -s /mindhive/xnat/data/openfmri/ds107 ds107
ln -s /mindhive/xnat/surfaces/nki_test_retest nki_surfaces
ln -s /mindhive/xnat/surfaces/openfmri/ds107 ds107_surfaces
module add torque
export ANTSPATH=/software/ANTS/versions/120325/bin/
export PATH=/software/common/bin:$ANTSPATH:$PATH
. fss 5.1.0
. /etc/fsl/4.1/fsl.sh
For our interactive session we will use IPython:
ipython notebook --pylab=inline
- ds107:
  - sub001
  - sub049
- nki:
  - 2475376
  - 0021006
Surfaces reconstructed with FreeSurfer 5.1 without editing
- Nipype as a library
- Imperative programming with caching
- Workflow concepts
- Hello World! of workflows
- Data grabbing and sinking
- Loops: iterables and iterfields
- The IdentityInterface and Function interfaces
- Config options, Debugging, Distributed computing
Importing functionality
>>> from nipype.interfaces.camino import DTIFit
>>> from nipype.interfaces.spm import Realign
Finding interface inputs, outputs, and examples
>>> DTIFit.help()
>>> Realign.help()
Executing the interfaces
>>> fitter = DTIFit(scheme_file='A.sch',
in_file='data.bfloat')
>>> fitter.run()
>>> aligner = Realign(in_file='A.nii')
>>> aligner.run()
import os
from shutil import copyfile
library_dir = os.path.join(os.getenv('TUT_DIR'), 'as_a_library')
os.mkdir(library_dir)
os.chdir(library_dir)
We will use FreeSurfer to convert the file to uncompressed NIfTI:
from nipype.interfaces.freesurfer import MRIConvert
MRIConvert(in_file='../ds107/sub001/BOLD/task001_run001/bold.nii.gz',
out_file='ds107.nii').run()
Normally:
$ mri_convert ../ds107/sub001/BOLD/task001_run001/bold.nii.gz ds107.nii
Shell script wins!
Import the motion-correction interfaces
from nipype.interfaces.spm import Realign
from nipype.interfaces.fsl import MCFLIRT
Run SPM first
>>> results1 = Realign(in_files='ds107.nii',
register_to_mean=False).run()
>>> ls
ds107.mat ds107.nii meands107.nii pyscript_realign.m rds107.mat
rds107.nii rp_ds107.txt
The shell script goes into hiding. Of course, it could still keep up ;)
$ python -c "from nipype.interfaces.spm import Realign;
Realign(...).run()"
but how do we know what inputs to set?
>>> MCFLIRT.help()
or see the online MCFLIRT documentation
>>> results2 = MCFLIRT(in_file='ds107.nii', ref_vol=0,
save_plots=True).run()
Now we can look at some results
# pylab inline mode provides plot, subplot, title, and numpy's genfromtxt
subplot(211); plot(genfromtxt('ds107_mcf.nii.gz.par')[:, 3:]); title('FSL')
subplot(212); plot(genfromtxt('rp_ds107.txt')[:, :3]); title('SPM')
If I execute the MCFLIRT line again, well, it runs again!
Setup
>>> from nipype.caching import Memory
>>> mem = Memory('.')
Create cacheable objects
>>> spm_realign = mem.cache(Realign)
>>> fsl_realign = mem.cache(MCFLIRT)
Execute interfaces
>>> spm_results = spm_realign(in_files='./as_a_library/ds107.nii',
register_to_mean=False)
>>> fsl_results = fsl_realign(in_file='./as_a_library/ds107.nii',
ref_vol=0, save_plots=True)
Compare
subplot(211);plot(genfromtxt(fsl_results.outputs.par_file)[:, 3:])
subplot(212);
plot(genfromtxt(spm_results.outputs.realignment_parameters)[:,:3])
Execute interfaces again
>>> spm_results = spm_realign(in_files='./as_a_library/ds107.nii',
register_to_mean=False)
>>> fsl_results = fsl_realign(in_file='./as_a_library/ds107.nii',
ref_vol=0, save_plots=True)
Output
- 120401-23:16:21,144 workflow INFO:
- Executing node 43650b0cabb14ef502659398b944be8b in dir: /mindhive/gablab/satra/mri_class/nipype_mem/nipype-interfaces-spm-preprocess-Realign/43650b0cabb14ef502659398b944be8b
- 120401-23:16:21,145 workflow INFO:
- Collecting precomputed outputs
- 120401-23:16:21,158 workflow INFO:
- Executing node e91bcd85558ecd0a2786c9fdd2bcb65a in dir: /mindhive/gablab/satra/mri_class/nipype_mem/nipype-interfaces-fsl-preprocess-MCFLIRT/e91bcd85558ecd0a2786c9fdd2bcb65a
- 120401-23:16:21,159 workflow INFO:
- Collecting precomputed outputs
What if we had more files?
>>> from os.path import abspath as opap
>>> files = [opap('ds107/sub001/BOLD/task001_run001/bold.nii.gz'),
opap('ds107/sub001/BOLD/task001_run002/bold.nii.gz')]
>>> fsl_results = fsl_realign(in_file=files, ref_vol=0,
save_plots=True)
>>> spm_results = spm_realign(in_files=files, register_to_mean=False)
They will both break, but for different reasons:

1. Interface incompatibility: MCFLIRT's in_file takes a single file, not a list.
2. File format: SPM cannot read compressed NIfTI (.nii.gz) files.
converter = mem.cache(MRIConvert)
newfiles = []
for fname in files:
    newfiles.append(converter(in_file=fname,
                              out_type='nii').outputs.out_file)
Where:
>>> from nipype.pipeline.engine import Node, MapNode, Workflow
Node (shown alongside the cached version for comparison):
>>> spm_realign = mem.cache(Realign)
>>> realign_spm = Node(Realign(), name='motion_correct')
MapNode:
>>> realign_fsl = MapNode(MCFLIRT(), iterfield=['in_file'],
name='motion_correct_with_fsl')
Workflow:
>>> myflow = Workflow(name='realign')
>>> myflow.add_nodes([realign_spm, realign_fsl])
Node:
>>> realign_spm.inputs.in_files = newfiles
>>> realign_spm.inputs.register_to_mean = False
>>> realign_spm.run()
MapNode:
>>> realign_fsl.inputs.in_file = files
>>> realign_fsl.inputs.ref_vol = 0
>>> realign_fsl.run()
Workflow:
>>> myflow = Workflow(name='realign')
>>> myflow.add_nodes([realign_spm, realign_fsl])
>>> myflow.base_dir = opap('.')
>>> myflow.run()
Workflow:
>>> myflow = Workflow(name='realign')
>>> myflow.add_nodes([realign_spm, realign_fsl])
>>> myflow.base_dir = opap('.')
>>> myflow.inputs.motion_correct.in_files = newfiles
>>> myflow.inputs.motion_correct.register_to_mean = False
>>> myflow.inputs.motion_correct_with_fsl.in_file = files
>>> myflow.inputs.motion_correct_with_fsl.ref_vol = 0
>>> myflow.run()
Create two nodes:
>>> convert2nii = MapNode(MRIConvert(out_type='nii'),
iterfield=['in_file'],
name='convert2nii')
>>> realign_spm = Node(Realign(), name='motion_correct')
Set inputs:
>>> convert2nii.inputs.in_file = files
>>> realign_spm.inputs.register_to_mean = False
Connect them up:
>>> realignflow = Workflow(name='realign_with_spm')
>>> realignflow.connect(convert2nii, 'out_file',
realign_spm, 'in_files')
>>> realignflow.base_dir = opap('.')
>>> realignflow.run()
>>> realignflow.write_graph()
>>> realignflow.write_graph(graph2use='orig')
Instead of assigning data ourselves, let's glob it
>>> from nipype.interfaces.io import DataGrabber
>>> ds = Node(DataGrabber(infields=['subject_id'],
outfields=['func']),
name='datasource')
>>> ds.inputs.base_directory = opap('ds107')
>>> ds.inputs.template = '%s/BOLD/task001*/bold.nii.gz'
>>> ds.inputs.subject_id = 'sub001'
>>> ds.run().outputs
func = ['...mri_class/ds107/sub001/BOLD/task001_run001/bold.nii.gz',
'...mri_class/ds107/sub001/BOLD/task001_run002/bold.nii.gz']
>>> ds.inputs.subject_id = 'sub049'
>>> ds.run().outputs
func = ['...mri_class/ds107/sub049/BOLD/task001_run001/bold.nii.gz',
'...mri_class/ds107/sub049/BOLD/task001_run002/bold.nii.gz']
A little more practical usage
>>> ds = Node(DataGrabber(infields=['subject_id', 'task_id'],
outfields=['func', 'anat']),
name='datasource')
>>> ds.inputs.base_directory = opap('ds107')
>>> ds.inputs.template = '*'
>>> ds.inputs.template_args = {'func': [['subject_id', 'task_id']],
'anat': [['subject_id']]}
>>> ds.inputs.field_template =
{'func': '%s/BOLD/task%03d*/bold.nii.gz',
'anat': '%s/anatomy/highres001.nii.gz'}
>>> ds.inputs.subject_id = 'sub001'
>>> ds.inputs.task_id = 1
>>> ds.run().outputs
anat = '...mri_class/ds107/sub001/anatomy/highres001.nii.gz'
func = ['...mri_class/ds107/sub001/BOLD/task001_run001/bold.nii.gz',
'...mri_class/ds107/sub001/BOLD/task001_run002/bold.nii.gz']
MapNode + iterfield: runs the underlying interface several times, once per element of each iterfield input
>>> convert2nii = MapNode(MRIConvert(out_type='nii'),
iterfield=['in_file'],
name='convert2nii')
Workflow + iterables: runs the subgraph several times; iterables is a node attribute, not an input
>>> multiworkflow = Workflow(name='iterables')
>>> ds.iterables = ('subject_id', ['sub001', 'sub049'])
>>> multiworkflow.add_nodes([ds])
>>> multiworkflow.run()
>>> convert2nii = MapNode(MRIConvert(out_type='nii'),
iterfield=['in_file'],
name='convert2nii')
>>> realign_spm = Node(Realign(), name='motion_correct')
Set inputs:
>>> convert2nii.inputs.in_file = files
>>> realign_spm.inputs.register_to_mean = False
Connect them up:
>>> realignflow = Workflow(name='realign_with_spm')
>>> realignflow.connect(convert2nii, 'out_file',
realign_spm, 'in_files')
>>> ds = Node(DataGrabber(infields=['subject_id', 'task_id'],
outfields=['func']),
name='datasource')
>>> ds.inputs.base_directory = opap('ds107')
>>> ds.inputs.template = '%s/BOLD/task%03d*/bold.nii.gz'
>>> ds.inputs.template_args = {'func': [['subject_id', 'task_id']]}
>>> ds.inputs.task_id = 1
>>> convert2nii = MapNode(MRIConvert(out_type='nii'),
iterfield=['in_file'],
name='convert2nii')
>>> realign_spm = Node(Realign(), name='motion_correct')
>>> realign_spm.inputs.register_to_mean = False
>>> connectedworkflow = Workflow(name='connectedtogether')
>>> ds.iterables = ('subject_id', ['sub001', 'sub049'])
>>> connectedworkflow.connect(ds, 'func', convert2nii, 'in_file')
>>> connectedworkflow.connect(convert2nii, 'out_file',
realign_spm, 'in_files')
>>> connectedworkflow.run()
Takes outputs computed in a workflow out of it.
>>> from nipype.interfaces.io import DataSink
>>> sinker = Node(DataSink(), name='sinker')
>>> sinker.inputs.base_directory = opap('output')
>>> connectedworkflow.connect(realign_spm, 'realigned_files',
sinker, 'realigned')
>>> connectedworkflow.connect(realign_spm, 'realignment_parameters',
sinker, 'realigned.@parameters')
How to determine the output location:

    base_directory/container/parameterization/destloc/filename

where destloc = string[[.[@]]string[[.[@]]string]] and filename comes from the input to the connect statement.
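To illustrate with the connections above (a sketch inferred from the naming rule, assuming base_directory='output' and no container set):

# 'realigned'             -> output/realigned/<realigned files>
# 'realigned.@parameters' -> output/realigned/<realignment parameters>
# The '@' keeps the files inside 'realigned' instead of creating a
# 'parameters' subdirectory; a plain '.' would create one.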
iterables + MapNode + Node + Workflow + DataGrabber + DataSink
- IdentityInterface: Whatever comes in goes out
- Function: The do anything you want card
>>> from nipype.interfaces.utility import IdentityInterface
>>> subject_id = Node(IdentityInterface(fields=['subject_id']),
name='subject_id')
>>> subject_id.iterables = ('subject_id', [0, 1, 2, 3])
or my usual test mode (here subjects is a predefined list of subject IDs):
>>> subject_id.iterables = ('subject_id', subjects[:1])
or
>>> subject_id.iterables = ('subject_id', subjects[:10])
The do-anything-you-want-in-Nipype card!
>>> from nipype.interfaces.utility import Function
>>> def myfunc(input1, input2):
        """Add and subtract two inputs"""
        return input1 + input2, input1 - input2
>>> calcfunc = Node(Function(input_names=['input1', 'input2'],
output_names = ['sum', 'difference'],
function=myfunc),
name='mycalc')
>>> calcfunc.inputs.input1 = 1
>>> calcfunc.inputs.input2 = 2
>>> res = calcfunc.run()
>>> res.outputs
sum = 3
difference = -1
Normally, calling run executes the workflow serially
>>> connectedworkflow.run()
but you can scale to a cluster very easily
>>> connectedworkflow.run('MultiProc', plugin_args={'n_procs': 4})
>>> connectedworkflow.run('PBS', plugin_args={'qsub_args': '-q many'})
>>> connectedworkflow.run('SGE', plugin_args={'qsub_args': '-q many'})
>>> connectedworkflow.run('Condor',
plugin_args={'qsub_args': '-q many'})
>>> connectedworkflow.run('IPython')
Requirement: shared filesystem
Where art thou, shell script?
>>> from nipype.interfaces.io import XNATSource
>>> from nipype.pipeline.engine import Node, Workflow
>>> from nipype.interfaces.fsl import BET
>>> dg = Node(XNATSource(infields=['subject_id', 'mpr_id'],
outfields=['struct'],
config='/Users/satra/xnatconfig'),
name='xnatsource')
>>> dg.inputs.query_template = ('/projects/CENTRAL_OASIS_CS/subjects/'
'%s/experiments/%s_MR1/scans/mpr-%d/'
'resources/files')
>>> dg.inputs.query_template_args['struct'] = [['subject_id',
'subject_id',
'mpr_id']]
>>> dg.inputs.subject_id = 'OAS1_0002'
>>> dg.inputs.mpr_id = 1
>>> bet = Node(BET(), name='skull_stripper')
>>> wf = Workflow(name='testxnat')
>>> wf.base_dir = '/software/temp/xnattest'
>>> wf.connect(dg, ('struct', select_img), bet, 'in_file')
['/var/.../c67d371..._OAS1_0002_MR1_mpr-1_anon.img',
'/var/.../c67d371..._OAS1_0002_MR1_mpr-1_anon.hdr',
'/var/.../c67d371..._OAS1_0002_MR1_mpr-1_anon_sag_66.gif']
>>> wf.connect(dg, ('struct', select_img), bet, 'in_file')

where select_img, which must be defined before the connect call, picks the .img file out of the list:

>>> def select_img(central_list):
        for fname in central_list:
            if fname.endswith('img'):
                return fname
- Config options: controlling behavior
>>> from nipype import config, logging
>>> config.set_debug_mode()
>>> logging.update_logging()
>>> config.set('execution', 'keep_unnecessary_outputs', 'true')
- Reusing workflows
>>> from nipype.workflows.smri.freesurfer.utils import create_getmask_flow
>>> getmask = create_getmask_flow()
>>> getmask.inputs.inputspec.source_file = 'mean.nii'
>>> getmask.inputs.inputspec.subject_id = 's1'
>>> getmask.inputs.inputspec.subjects_dir = '.'
>>> getmask.inputs.inputspec.contrast_type = 't2'
>>> getmask.run()
- Quickstart
- Links on the right (connect with the mailing lists)
- Debugging recommendations
- Reproducible research (standards)
- Scalability
- AWS
- Graph submission with depth first order
- Social collaboration and workflow development
- Google docs for scientific workflows
[1] Poline J, Breeze JL, Ghosh SS, Gorgolewski K, Halchenko YO, Hanke M, Haselgrove C, Helmer KG, Marcus DS, Poldrack RA, Schwartz Y, Ashburner J and Kennedy DN (2012). Data sharing in neuroimaging research. Front. Neuroinform. 6:9. http://dx.doi.org/10.3389/fninf.2012.00009
[2] Gorgolewski K, Burns CD, Madison C, Clark D, Halchenko YO, Waskom ML, Ghosh SS (2011). Nipype: a flexible, lightweight and extensible neuroimaging data processing framework in Python. Front. Neuroinform. 5:13. http://dx.doi.org/10.3389/fninf.2011.00013