images/nipype.png

{'Nipype': ('why', 'what', 'how')}

[email protected]

http://satra.github.com/intro2nipype

CREDITS


What we will cover today

  • Overview of Nipype
  • Semantics of Nipype
  • Playing with interfaces
  • Creating workflows
  • Advanced features
  • Future directions

Why Nipype?


... one ring to bind them ...


Brain imaging: the process

From design to databases [1]

images/EDC.png


Brain imaging software

a plethora of evolving options

images/brainimagingsoftware.png


Brain imaging software: issues

  • different algorithms
  • different assumptions
  • different platforms
  • different interfaces
  • different file formats

This leads to many questions:

neuroscientist:

  • which packages should I use?
  • why should I use these packages?
  • how do they differ?
  • how should I use these packages?

developer:

  • which package(s) should I develop for?
  • how do I disseminate my software?

... and more questions

How do we:

  • Install, use, maintain and test multiple packages
  • Reduce manual intervention
  • Train people
  • Tailor to specific projects
  • Develop new tools
  • Perform reproducible research

images/fmri.png

Many workflow systems out there


Solution requirements

Coming at it from a developer's perspective, we needed something that:

  • was lightweight
  • was scriptable
  • provided formal, common semantics
  • allowed interactive exploration
  • supported efficient batch processing
  • enabled rapid algorithm prototyping
  • was flexible and adaptive

Existing technologies

shell scripting:

Quick to write and powerful, but scalability is application specific and scripts are not easy to port across different architectures.

make/CMake:

Similar in concept to workflow execution in Nipype, but limited by the need for command line tools and by inflexibility in scaling across hardware architectures (although see makeflow).

Octave/MATLAB:

Integration with other tools is ad hoc (i.e., via system calls) and dataflow is managed at a programmatic level. However, see PSOM, which offers a very nice alternative to some aspects of Nipype for Octave/MATLAB users.

Graphical options: (e.g., LONI pipeline)

Adding or reusing components across different projects requires XML manipulation or subscribing to specific databases.

We built Nipype in Python


Why Python?

  • easy to learn
  • coding style makes for easy readability
  • cross-platform
  • extensive infrastructure for
    • development and distribution
    • scientific computing
    • brain imaging
  • several institutions are adopting it in computer science classes

What can we use Python for?

  • scripting (like shell scripts e.g. bash, csh)
  • make web sites (like these slides)
  • science (like R, Matlab, IDL, Octave, Scilab)
  • etc.

You only need to know one language to do almost everything!


Scientific Python building blocks


Brain Imaging in Python

  • NiPy, an umbrella project for Neuroimaging in Python: http://nipy.org
    • DiPy, diffusion imaging
    • Nibabel, file reading and writing
    • NiPy, preprocessing and statistical routines
    • Nipype, interfaces and workflows
    • Nitime, time series analysis
    • PySurfer, surface visualization
  • PyMVPA, machine learning for neuroimaging: http://pymvpa.org
  • PsychoPy, stimulus presentation: http://psychopy.org

What is Nipype?


Nipype architecture [2]

  • Interface
  • Engine
  • Executable Plugins

images/arch.png


Semantics: Interface

  • Interface: Wraps a program or function

images/arch.png
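
For example, an Interface turns a command line tool into a Python object with typed inputs and outputs. A minimal sketch using FSL's BET (assuming FSL is installed; file names are illustrative):

>>> from nipype.interfaces.fsl import BET
>>> bet = BET(in_file='struct.nii', frac=0.5)
>>> bet.cmdline                 # inspect the command without running it
'bet struct.nii .../struct_brain.nii.gz -f 0.50'
>>> res = bet.run()             # execute and collect typed outputs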


Semantics: Engine

  • Node/MapNode: Wraps an Interface for use in a Workflow that provides caching and other goodies (e.g., pseudo-sandbox)
  • Workflow: A graph or forest of graphs whose nodes are of type Node, MapNode or Workflow and whose edges represent data flow

images/arch.png
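
A minimal sketch of wrapping an interface in a Node and adding it to a Workflow (BET imported as on the previous slide; the workflow slides later walk through this in full):

>>> from nipype.pipeline.engine import Node, Workflow
>>> skullstrip = Node(BET(frac=0.5), name='skullstrip')  # cached: reruns only when inputs change
>>> wf = Workflow(name='example')
>>> wf.add_nodes([skullstrip])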


Semantics

  • Plugin: A component that describes how a Workflow should be executed

images/arch.png
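
A sketch of running the same workflow under different plugins (see the distributed computing slide later for more):

>>> wf.run()                       # default plugin: serial, local
>>> wf.run(plugin='MultiProc')     # local multicore
>>> wf.run(plugin='SGE')           # submit jobs to a cluster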


Software interfaces

Currently supported (as of April 2, 2012); see the Nipype documentation for the latest list.

AFNI, ANTS, BRAINS, Camino, Camino-TrackVis, ConnectomeViewer Toolkit, dcm2nii, Diffusion Toolkit, FreeSurfer, FSL, MRtrix, Nipy, Nitime, PyXNAT, Slicer, SPM

Coverage follows a most-used / most-contributed policy!

Not every component of these packages is available.


Workflows

Properties:

  • processing pipeline is a directed acyclic graph (DAG)
  • nodes are processes
  • edges represent data flow
  • compact representation for any process
  • code and data separation

images/workflow.png

Execution Plugins

Allows seamless execution across many architectures

  • local
    • serially
    • multicore
  • clusters
    • Condor
    • PBS/Torque
    • SGE
    • SSH (via IPython)

How can I use Nipype?

  • Environment and installing
  • Nipype as a brain imaging library
  • Building and executing workflows
  • Contributing to Nipype

Presenter Notes

  • imperative style caching
  • Workflow concepts
  • Hello World! of workflows
  • Grabbing and Sinking
  • iterables and iterfields
  • Distributed computing
  • The Function interface
  • Config options
  • Debugging
  • actual workflows (resting, task, diffusion)

Installing and environment

Scientific Python:

  • Debian/Ubuntu/Scientific Fedora
  • Enthought Python Distribution (EPD)

Installing Nipype:
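
Common options (check the Nipype website for current instructions):

  • from PyPI: pip install nipype (or easy_install nipype)
  • on Debian-based systems, via the NeuroDebian repository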

Running Nipype (Quickstart):

  • Ensure tools are installed and accessible
  • Nipype is a wrapper, not a substitute for AFNI, ANTS, FreeSurfer, FSL, SPM, NiPy, etc.

For today's tutorial

At MIT you can configure your environment as:

source /software/python/EPD/virtualenvs/7.2/nipype0.5/bin/activate
export TUT_DIR=/mindhive/scratch/mri_class/$LOGNAME/nipype-tutorial
mkdir -p $TUT_DIR
cd $TUT_DIR
ln -s /mindhive/xnat/data/nki_test_retest nki
ln -s /mindhive/xnat/data/openfmri/ds107 ds107
ln -s /mindhive/xnat/surfaces/nki_test_retest nki_surfaces
ln -s /mindhive/xnat/surfaces/openfmri/ds107 ds107_surfaces
module add torque
export ANTSPATH=/software/ANTS/versions/120325/bin/
export PATH=/software/common/bin:$ANTSPATH:$PATH
. fss 5.1.0
. /etc/fsl/4.1/fsl.sh

For our interactive session we will use IPython:

ipython notebook --pylab=inline

Tutorial data and subject ids


Hello nipype!

  • Nipype as a library
  • Imperative programming with caching
  • Workflow concepts
  • Hello World! of workflows
  • Data grabbing and sinking
  • Loops: iterables and iterfields
  • The IdentityInterface and Function interfaces
  • Config options, Debugging, Distributed computing

Nipype as a library

Importing functionality

>>> from nipype.interfaces.camino import DTIFit
>>> from nipype.interfaces.spm import Realign

Finding interface inputs, outputs, and examples

>>> DTIFit.help()
>>> Realign.help()

Executing the interfaces

>>> fitter = DTIFit(scheme_file='A.sch',
                    in_file='data.bfloat')
>>> fitter.run()

>>> aligner = Realign(in_file='A.nii')
>>> aligner.run()

Work in a directory

import os
from shutil import copyfile
library_dir = os.path.join(os.getenv('TUT_DIR'), 'as_a_library')
os.mkdir(library_dir)
os.chdir(library_dir)

Using interfaces: comparison

We will use FreeSurfer to convert the file to uncompressed NIfTI

from nipype.interfaces.freesurfer import MRIConvert
MRIConvert(in_file='../ds107/sub001/BOLD/task001_run001/bold.nii.gz',
           out_file='ds107.nii').run()

Normally:

$ mri_convert ../ds107/sub001/BOLD/task001_run001/bold.nii.gz
       ds107.nii

Shell script wins!


Using interfaces: more Interfaces

Import the motion-correction interfaces

from nipype.interfaces.spm import Realign
from nipype.interfaces.fsl import MCFLIRT

Run SPM first

>>> results1 = Realign(in_files='ds107.nii',
                       register_to_mean=False).run()
>>> ls
ds107.mat  ds107.nii  meands107.nii  pyscript_realign.m  rds107.mat
rds107.nii  rp_ds107.txt

Shell script goes into hiding. Of course, it could do the same ;)

$ python -c "from nipype.interfaces.spm import Realign;
             Realign(...).run()"

Let's use FSL

but how?

>>> MCFLIRT.help()

or go to: MCFLIRT help

>>> results2 = MCFLIRT(in_file='ds107.nii', ref_vol=0,
                       save_plots=True).run()

Now we can look at some results (plot, subplot, and genfromtxt are available at the prompt thanks to --pylab=inline)

subplot(211);plot(genfromtxt('ds107_mcf.nii.gz.par')[:, 3:]);
title('FSL')
subplot(212);plot(genfromtxt('rp_ds107.txt')[:,:3]);title('SPM')

If I execute the MCFLIRT line again, well, it runs again!


Using Nipype caching

Setup

>>> from nipype.caching import Memory
>>> mem = Memory('.')

Create cacheable objects

>>> spm_realign = mem.cache(Realign)
>>> fsl_realign = mem.cache(MCFLIRT)

Execute interfaces

>>> spm_results = spm_realign(in_files='./as_a_library/ds107.nii',
                              register_to_mean=False)
>>> fsl_results = fsl_realign(in_file='./as_a_library/ds107.nii',
                              ref_vol=0, save_plots=True)

Compare

subplot(211);plot(genfromtxt(fsl_results.outputs.par_file)[:, 3:])
subplot(212);
plot(genfromtxt(spm_results.outputs.realignment_parameters)[:,:3])

More caching

Execute interfaces again

>>> spm_results = spm_realign(in_files='./as_a_library/ds107.nii',
                              register_to_mean=False)
>>> fsl_results = fsl_realign(in_file='./as_a_library/ds107.nii',
                              ref_vol=0, save_plots=True)

Output

120401-23:16:21,144 workflow INFO:
Executing node 43650b0cabb14ef502659398b944be8b in dir: /mindhive/gablab/satra/mri_class/nipype_mem/nipype-interfaces-spm-preprocess-Realign/43650b0cabb14ef502659398b944be8b
120401-23:16:21,145 workflow INFO:
Collecting precomputed outputs
120401-23:16:21,158 workflow INFO:
Executing node e91bcd85558ecd0a2786c9fdd2bcb65a in dir: /mindhive/gablab/satra/mri_class/nipype_mem/nipype-interfaces-fsl-preprocess-MCFLIRT/e91bcd85558ecd0a2786c9fdd2bcb65a
120401-23:16:21,159 workflow INFO:
Collecting precomputed outputs
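
Note that both nodes now report "Collecting precomputed outputs": since the inputs are unchanged, nothing is recomputed.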

More files to process

What if we had more files?

>>> from os.path import abspath as opap
>>> files = [opap('ds107/sub001/BOLD/task001_run001/bold.nii.gz'),
             opap('ds107/sub001/BOLD/task001_run002/bold.nii.gz')]
>>> fsl_results = fsl_realign(in_file=files, ref_vol=0,
                              save_plots=True)
>>> spm_results = spm_realign(in_files=files, register_to_mean=False)

They will both break, but for different reasons:

1. Interface incompatibility: MCFLIRT's in_file takes a single file, not a list
2. File format: SPM cannot read compressed NIfTI (.nii.gz)

Converting the files first addresses both:

converter = mem.cache(MRIConvert)
newfiles = []
for fname in files:
    newfiles.append(converter(in_file=fname,
                              out_type='nii').outputs.out_file)

Workflow concepts

Where:

>>> from nipype.pipeline.engine import Node, MapNode, Workflow

Node:

>>> spm_realign = mem.cache(Realign)
>>> realign_spm = Node(Realign(), name='motion_correct')

Mapnode:

>>> realign_fsl = MapNode(MCFLIRT(), iterfield=['in_file'],
                          name='motion_correct_with_fsl')

Workflow:

>>> myflow = Workflow(name='realign')
>>> myflow.add_nodes([realign_spm, realign_fsl])

Workflow: set inputs and run

Node:

>>> realign_spm.inputs.in_files = newfiles
>>> realign_spm.inputs.register_to_mean = False
>>> realign_spm.run()

Mapnode:

>>> realign_fsl.inputs.in_file = files
>>> realign_fsl.inputs.ref_vol = 0
>>> realign_fsl.run()

Workflow:

>>> myflow = Workflow(name='realign')
>>> myflow.add_nodes([realign_spm, realign_fsl])
>>> myflow.base_dir = opap('.')
>>> myflow.run()

Workflow: setting inputs

Workflow:

>>> myflow = Workflow(name='realign')
>>> myflow.add_nodes([realign_spm, realign_fsl])
>>> myflow.base_dir = opap('.')
>>> myflow.inputs.motion_correct.in_files = newfiles
>>> myflow.inputs.motion_correct.register_to_mean = False
>>> myflow.inputs.motion_correct_with_fsl.in_file = files
>>> myflow.inputs.motion_correct_with_fsl.ref_vol = 0
>>> myflow.run()

"Hello World" of Nipype workflows

Create two nodes:

>>> convert2nii = MapNode(MRIConvert(out_type='nii'),
                          iterfield=['in_file'],
                          name='convert2nii')
>>> realign_spm = Node(Realign(), name='motion_correct')

Set inputs:

>>> convert2nii.inputs.in_file = files
>>> realign_spm.inputs.register_to_mean = False

Connect them up:

>>> realignflow = Workflow(name='realign_with_spm')
>>> realignflow.connect(convert2nii, 'out_file',
                        realign_spm, 'in_files')
>>> realignflow.base_dir = opap('.')
>>> realignflow.run()

Visualize the workflow

>>> realignflow.write_graph()

images/graph.dot.png

>>> realignflow.write_graph(graph2use='orig')

images/graph_detailed.dot.png


Data grabbing

Instead of assigning data ourselves, let's glob it

>>> from nipype.interfaces.io import DataGrabber
>>> ds = Node(DataGrabber(infields=['subject_id'],
                          outfields=['func']),
              name='datasource')
>>> ds.inputs.base_directory = opap('ds107')
>>> ds.inputs.template = '%s/BOLD/task001*/bold.nii.gz'

>>> ds.inputs.subject_id = 'sub001'
>>> ds.run().outputs
func = ['...mri_class/ds107/sub001/BOLD/task001_run001/bold.nii.gz',
        '...mri_class/ds107/sub001/BOLD/task001_run002/bold.nii.gz']

>>> ds.inputs.subject_id = 'sub049'
>>> ds.run().outputs
func = ['...mri_class/ds107/sub049/BOLD/task001_run001/bold.nii.gz',
        '...mri_class/ds107/sub049/BOLD/task001_run002/bold.nii.gz']

Multiple files

A little more practical usage

>>> ds = Node(DataGrabber(infields=['subject_id', 'task_id'],
                          outfields=['func', 'anat']),
              name='datasource')
>>> ds.inputs.base_directory = opap('ds107')
>>> ds.inputs.template = '*'
>>> ds.inputs.template_args = {'func': [['subject_id', 'task_id']],
                               'anat': [['subject_id']]}
>>> ds.inputs.field_template = {'func': '%s/BOLD/task%03d*/bold.nii.gz',
                                'anat': '%s/anatomy/highres001.nii.gz'}

>>> ds.inputs.subject_id = 'sub001'
>>> ds.inputs.task_id = 1
>>> ds.run().outputs
anat = '...mri_class/ds107/sub001/anatomy/highres001.nii.gz'
func = ['...mri_class/ds107/sub001/BOLD/task001_run001/bold.nii.gz',
        '...mri_class/ds107/sub001/BOLD/task001_run002/bold.nii.gz']
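
Here template_args names, for each outfield, the infields whose values fill the printf-style placeholders in the corresponding field_template; with subject_id='sub001' and task_id=1, 'func' expands to 'sub001/BOLD/task001*/bold.nii.gz', which is then globbed relative to base_directory.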

Loops: iterfield (MapNode)

MapNode + iterfield: runs the underlying interface several times, once per element of each iterfield

>>> convert2nii = MapNode(MRIConvert(out_type='nii'),
                          iterfield=['in_file'],
                          name='convert2nii')

images/mapnode.png
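
At run time the MapNode calls MRIConvert once per element of in_file and collects the results into a list. A sketch (file names are illustrative):

>>> convert2nii.inputs.in_file = ['a.nii.gz', 'b.nii.gz']
>>> res = convert2nii.run()
>>> res.outputs.out_file
['.../a_out.nii', '.../b_out.nii']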


Loops: iterables (subgraph)

Workflow + iterables: runs the dependent subgraph several times, once per value; note that iterables is a node attribute, not an input

>>> multiworkflow = Workflow(name='iterables')
>>> ds.iterables = ('subject_id', ['sub001', 'sub049'])
>>> multiworkflow.add_nodes([ds])
>>> multiworkflow.run()

images/iterables.png


Reminder

>>> convert2nii = MapNode(MRIConvert(out_type='nii'),
                          iterfield=['in_file'],
                          name='convert2nii')
>>> realign_spm = Node(Realign(), name='motion_correct')

Set inputs:

>>> convert2nii.inputs.in_file = files
>>> realign_spm.inputs.register_to_mean = False

Connect them up:

>>> realignflow = Workflow(name='realign_with_spm')
>>> realignflow.connect(convert2nii, 'out_file',
                        realign_spm, 'in_files')

Connecting to computation

>>> ds = Node(DataGrabber(infields=['subject_id', 'task_id'],
                          outfields=['func']),
              name='datasource')
>>> ds.inputs.base_directory = opap('ds107')
>>> ds.inputs.template = '%s/BOLD/task%03d*/bold.nii.gz'
>>> ds.inputs.template_args = {'func': [['subject_id', 'task_id']]}
>>> ds.inputs.task_id = 1
>>> convert2nii = MapNode(MRIConvert(out_type='nii'),
                          iterfield=['in_file'],
                          name='convert2nii')
>>> realign_spm = Node(Realign(), name='motion_correct')
>>> realign_spm.inputs.register_to_mean = False

>>> connectedworkflow = Workflow(name='connectedtogether')
>>> ds.iterables = ('subject_id', ['sub001', 'sub049'])
>>> connectedworkflow.connect(ds, 'func', convert2nii, 'in_file')
>>> connectedworkflow.connect(convert2nii, 'out_file',
                              realign_spm, 'in_files')
>>> connectedworkflow.run()

Data sinking

Take outputs computed in a workflow out of it.

>>> from nipype.interfaces.io import DataSink
>>> sinker = Node(DataSink(), name='sinker')
>>> sinker.inputs.base_directory = opap('output')
>>> connectedworkflow.connect(realign_spm, 'realigned_files',
                              sinker, 'realigned')
>>> connectedworkflow.connect(realign_spm, 'realignment_parameters',
                              sinker, 'realigned.@parameters')

How to determine the output location:

'base_directory/container/parameterization/destloc/filename'

where destloc = string[[.[@]]string[[.[@]]string]] and
filename comes from the output named in the connect statement.
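
For the connections above, with base_directory='output': a '.' in destloc adds a subfolder, while the '@' prefix suppresses folder creation, so 'realigned.@parameters' puts the motion parameters next to the realigned files, roughly (a sketch; parameterization folders come from the iterables):

output/realigned/_subject_id_sub001/r....nii
output/realigned/_subject_id_sub001/rp_....txt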

Putting it all together

iterables + MapNode + Node + Workflow + DataGrabber + DataSink

images/alltogether.png


Two utility interfaces

  1. IdentityInterface: Whatever comes in goes out
  2. Function: the 'do anything you want' card

IdentityInterface

>>> from nipype.interfaces.utility import IdentityInterface
>>> subject_id = Node(IdentityInterface(fields=['subject_id']),
                      name='subject_id')
>>> subject_id.iterables = ('subject_id', [0, 1, 2, 3])

or my usual test mode

>>> subject_id.iterables = ('subject_id', subjects[:1])

or

>>> subject_id.iterables = ('subject_id', subjects[:10])

Function Interface

The 'do anything you want' card of Nipype!

>>> from nipype.interfaces.utility import Function

>>> def myfunc(input1, input2):
        """Add and subtract two inputs
        """
        return input1 + input2, input1 - input2

>>> calcfunc = Node(Function(input_names=['input1', 'input2'],
                             output_names=['sum', 'difference'],
                             function=myfunc),
                    name='mycalc')
>>> calcfunc.inputs.input1 = 1
>>> calcfunc.inputs.input2 = 2
>>> res = calcfunc.run()
>>> res.outputs
sum = 3
difference = -1

Distributed computing

Normally, calling run executes the workflow serially

>>> connectedworkflow.run()

but you can scale to a cluster very easily

>>> connectedworkflow.run('MultiProc', plugin_args={'n_procs': 4})
>>> connectedworkflow.run('PBS', plugin_args={'qsub_args': '-q many'})
>>> connectedworkflow.run('SGE', plugin_args={'qsub_args': '-q many'})
>>> connectedworkflow.run('Condor',
                           plugin_args={'qsub_args': '-q many'})
>>> connectedworkflow.run('IPython')

Requirement: shared filesystem

Where art thou, shell script?


Databases

>>> from nipype.interfaces.io import XNATSource
>>> from nipype.pipeline.engine import Node, Workflow
>>> from nipype.interfaces.fsl import BET

>>> dg = Node(XNATSource(infields=['subject_id', 'mpr_id'],
                         outfields=['struct'],
                         config='/Users/satra/xnatconfig'),
              name='xnatsource')
>>> dg.inputs.query_template = ('/projects/CENTRAL_OASIS_CS/subjects/'
                                '%s/experiments/%s_MR1/scans/mpr-%d/'
                                'resources/files')
>>> dg.inputs.query_template_args['struct'] = [['subject_id',
                                                'subject_id',
                                                'mpr_id']]
>>> dg.inputs.subject_id = 'OAS1_0002'
>>> dg.inputs.mpr_id = 1

>>> bet = Node(BET(), name='skull_stripper')
>>> wf = Workflow(name='testxnat')
>>> wf.base_dir = '/software/temp/xnattest'
>>> wf.connect(dg, ('struct', select_img), bet, 'in_file')

(select_img is defined on the next slide)

Databases

The 'struct' output resolves to a list of files:

['/var/.../c67d371..._OAS1_0002_MR1_mpr-1_anon.img',
 '/var/.../c67d371..._OAS1_0002_MR1_mpr-1_anon.hdr',
 '/var/.../c67d371..._OAS1_0002_MR1_mpr-1_anon_sag_66.gif']

so we pick out the .img file before handing it to BET:

>>> def select_img(central_list):
        for fname in central_list:
            if fname.endswith('img'):
                return fname

>>> wf.connect(dg, ('struct', select_img), bet, 'in_file')
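
Passing a (source_output, function) tuple to connect makes Nipype run the output through the function before handing it to the destination input, so BET receives only the .img file from the list above.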

Miscellaneous topics

  1. Config options: controlling behavior
>>> from nipype import config, logging

>>> config.set_debug_mode()
>>> logging.update_logging()

>>> config.set('execution', 'keep_unnecessary_outputs', 'true')
  2. Reusing workflows
>>> from nipype.workflows.smri.freesurfer.utils import (
        create_getmask_flow)

>>> getmask = create_getmask_flow()
>>> getmask.inputs.inputspec.source_file = 'mean.nii'
>>> getmask.inputs.inputspec.subject_id = 's1'
>>> getmask.inputs.inputspec.subjects_dir = '.'
>>> getmask.inputs.inputspec.contrast_type = 't2'
>>> getmask.run()

Where to go from here

Nipype website


Future Directions

  • Reproducible research (standards)
  • Scalability
    • AWS
    • Graph submission with depth first order
  • Social collaboration and workflow development
    • Google docs for scientific workflows

References

[1] Poline J, Breeze JL, Ghosh SS, Gorgolewski K, Halchenko YO, Hanke M, Haselgrove C, Helmer KG, Marcus DS, Poldrack RA, Schwartz Y, Ashburner J and Kennedy DN (2012). Data sharing in neuroimaging research. Front. Neuroinform. 6:9. http://dx.doi.org/10.3389/fninf.2012.00009
[2] Gorgolewski K, Burns CD, Madison C, Clark D, Halchenko YO, Waskom ML, Ghosh SS (2011). Nipype: a flexible, lightweight and extensible neuroimaging data processing framework in Python. Front. Neuroinform. 5:13. http://dx.doi.org/10.3389/fninf.2011.00013