IonTorrent Analysis

Stephen Kelly edited this page Jun 23, 2017 · 16 revisions

The latest documentation on running the IonTorrent pipeline can be found in the pipeline's repo here

The following instructions may not be up to date with the current best practices.

Synopsis

# change to the pipeline's directory
cd /ifs/data/molecpathlab/IonTorrent_reporter/pipeline

# check for runs
code/get_server_run_list.sh

# Make a Sample sheet
code/make_samplesheet.py -p <analysis_ID_1> -p <analysis_ID_2> -n <analysis_ID>

# Check your Sample sheet
cat samplesheets/<analysis_ID>.tsv


# download run data
code/run_samplesheet.py samplesheets/<analysis_ID>.tsv -d

# annotate VCF files
code/run_samplesheet.py samplesheets/<analysis_ID>.tsv -aq

# wait for jobs to finish...

# create reports & snapshots
code/run_samplesheet.py samplesheets/<analysis_ID>.tsv -pq

# manually review in CyberDuck

# mail the results if it looks good
code/mail_analysis_report.sh <analysis_ID_1> <analysis_ID_2>

Setup

First, change to the directory holding the IonTorrent analysis reporting pipeline

cd /ifs/data/molecpathlab/IonTorrent_reporter/pipeline

Check for the latest runs

code/get_server_run_list.sh

Run

Sample Sheet Method

Make a Sample sheet

If you want to process several runs easily, you can make a sample sheet. Paired analyses go on the same line, separated by a tab; unpaired analyses each go on their own line.

You can use the samplesheet creation script to make a tab-separated sample sheet:

code/make_samplesheet.py -p <analysis_ID_1> -p <analysis_ID_2> -n <analysis_ID>

You should check your sample sheet with the cat command to make sure it looks right:

$ cat samplesheets/<analysis_ID>.tsv

Example

For example, to make a sample sheet for the pair of analyses Auto_user_SN2-271-IT17-19-1_327_355 and Auto_user_SN2-272-IT17-19-2_329_356, you can use this command:

$ code/make_samplesheet.py -p Auto_user_SN2-271-IT17-19-1_327_355 -p Auto_user_SN2-272-IT17-19-2_329_356 -n IT17-19
Single IDs:
[]
Paired IDs:
['Auto_user_SN2-271-IT17-19-1_327_355', 'Auto_user_SN2-272-IT17-19-2_329_356']
New samplesheet file:
samplesheets/IT17-19.tsv

To check the samplesheet samplesheets/IT17-19.tsv, you would run this:

$ cat samplesheets/IT17-19.tsv
Auto_user_SN2-271-IT17-19-1_327_355	Auto_user_SN2-272-IT17-19-2_329_356

To make a sample sheet that includes non-paired analysis runs, simply add the single analysis IDs without any flags:

$ code/make_samplesheet.py -p Auto_user_SN2-271-IT17-19-1_327_355 -p Auto_user_SN2-272-IT17-19-2_329_356 -n mixed_runs Auto_user_SN2-231-IT16-056-1_290_319 SN2-211-IT16-048-2_11-08-2016_300
Single IDs:
['Auto_user_SN2-231-IT16-056-1_290_319', 'SN2-211-IT16-048-2_11-08-2016_300']
Paired IDs:
['Auto_user_SN2-271-IT17-19-1_327_355', 'Auto_user_SN2-272-IT17-19-2_329_356']
New samplesheet file:
samplesheets/mixed_runs.tsv

$ cat samplesheets/mixed_runs.tsv
Auto_user_SN2-231-IT16-056-1_290_319
SN2-211-IT16-048-2_11-08-2016_300
Auto_user_SN2-271-IT17-19-1_327_355	Auto_user_SN2-272-IT17-19-2_329_356
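
As a quick sanity check of the layout shown above (one ID per line for an unpaired run, two tab-separated IDs for a pair), something like the following could be used. This is only a sketch: check_samplesheet is a hypothetical helper, not part of the pipeline, and it assumes that analysis IDs never contain spaces or tabs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): report how many single and
# paired entries a sample sheet holds, and flag lines that break the layout.
check_samplesheet() {
    awk -F'\t' '
        NF == 0 { next }    # ignore blank lines
        NF > 2  { printf "line %d: %d fields, expected 1 or 2\n", NR, NF; bad = 1; next }
        {
            # assumption: IDs contain no spaces, so a space suggests a wrong separator
            for (i = 1; i <= NF; i++)
                if ($i == "" || $i ~ / /) {
                    printf "line %d: empty field or space in ID\n", NR; bad = 1
                }
        }
        NF == 1 { single++ }
        NF == 2 { paired++ }
        END { printf "single: %d, paired: %d\n", single, paired; exit bad + 0 }
    ' "$1"
}

# Demo on a temporary copy of the mixed_runs layout shown above
demo_sheet=$(mktemp)
printf '%s\n' 'Auto_user_SN2-231-IT16-056-1_290_319' > "$demo_sheet"
printf '%s\n' 'SN2-211-IT16-048-2_11-08-2016_300' >> "$demo_sheet"
printf '%s\t%s\n' 'Auto_user_SN2-271-IT17-19-1_327_355' 'Auto_user_SN2-272-IT17-19-2_329_356' >> "$demo_sheet"
check_samplesheet "$demo_sheet"    # prints: single: 2, paired: 1
```

The function exits with a non-zero status when any line is malformed, so it can be combined with && in front of a later command.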

Run your Sample Sheet

To run the sample sheet, run the following steps in order, replacing samplesheets/<analysis_ID>.tsv with the path to your sample sheet file:

  • download the files for the analyses
code/run_samplesheet.py samplesheets/<analysis_ID>.tsv -d
  • annotate the VCFs and make variant tables (submit job to the cluster)
code/run_samplesheet.py samplesheets/<analysis_ID>.tsv -aq
  • generate paired-analysis snapshots and reports (submit job to the cluster)
code/run_samplesheet.py samplesheets/<analysis_ID>.tsv -pq

Example

# download
code/run_samplesheet.py samplesheets/IT17-18.tsv -d

# annotate
code/run_samplesheet.py samplesheets/IT17-18.tsv -aq

# wait for jobs to finish...

# report
code/run_samplesheet.py samplesheets/IT17-18.tsv -pq
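
The download step (-d) runs in the current terminal session and can take several minutes, so it may be worth starting it inside screen. The sketch below is one possible way to do that. It assumes screen is installed on the server, and download_in_screen is a hypothetical helper, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the pipeline: start the download step
# in a detached, named screen session so it survives a dropped connection.
download_in_screen() {
    local sheet="$1"
    local session
    # e.g. samplesheets/IT17-18.tsv -> download_IT17-18
    session="download_$(basename "$sheet" .tsv)"
    # -d -m starts the session detached; -S gives it a name
    screen -dmS "$session" code/run_samplesheet.py "$sheet" -d
    echo "started screen session: $session"
}

# Usage, from the pipeline directory:
#   download_in_screen samplesheets/IT17-18.tsv
#   screen -ls                    # list running sessions
#   screen -r download_IT17-18    # reattach; detach again with Ctrl-a d
```

A session started this way closes as soon as the download exits, so its messages are not kept. To watch the output, start an interactive session with screen -S <name> and run the download command inside it instead.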

Should look something like this

Manual Method

If you don't want to use a samplesheet to run the analysis, you can run each step yourself by calling the desired scripts, followed by the IDs of the analyses to be run.

Review the Output

The analysis report(s) should be manually reviewed; use a desktop program such as CyberDuck or WinSCP for this


Currently, only the overview_report.html is used.

Some things to look for in the report:

  • make sure that the number of variants present in the variant table matches the number of IGV snapshot entries (compare the table row numbers with the Table of Contents section numbers)
  • make sure the BAM files loaded in each IGV snapshot match the given sample (read the small filename on the left side of the image)
  • if a control is included on the lower panel of the IGV snapshot, make sure it is from the correct pair of analysis runs

If everything looks good, then mail the results:

code/mail_analysis_report.sh <analysis_ID_1> <analysis_ID_2>

Example

code/mail_analysis_report.sh Auto_user_SN2-269-IT17-18-1_325_353 Auto_user_SN2-270-IT17-18-2_326_354


Notes

  • File download steps are always run in the current terminal session and may take several minutes to complete. Output messages may not be visible immediately. It is therefore a good idea to run the download inside screen, or otherwise make sure that you do not terminate the current session or process while the download is running.

  • The reporting pipeline currently requires Python 2.7. If you try to run it without using Python 2.7, you'll probably get errors like this:

Running pipeline with the following parameters:

Traceback (most recent call last):
  File "code/run_samplesheet.py", line 37, in <module>
    print "Samplesheet file: {:>29}".format(samplesheet_file)
ValueError: zero length field name in format

  • To check if you have Python 2.7 loaded, run this command:
python --version
  • For example:
# GOOD
$ python --version
Python 2.7.3
# BAD
$ python --version
Python 2.6.6
  • If you don't have Python 2.7 loaded, run this command to set it to automatically load on login, then exit the Terminal, log back in, and check again:
echo 'module load python/2.7' >> ~/.bashrc
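
The version check and the .bashrc change above can also be scripted. The following is a minimal sketch, not part of the pipeline: is_python27 and append_once are hypothetical helper names. append_once only adds the line if it is not already present, so running it again does not add a duplicate 'module load' line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the pipeline.

# Succeeds if a version string such as "Python 2.7.3" is a 2.7 release.
is_python27() {
    case "$1" in
        "Python 2.7"|"Python 2.7."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Append a line to a file only if that exact line is not already in it.
append_once() {
    local line="$1" file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Python 2 prints its version to stderr, hence the 2>&1
if is_python27 "$(python --version 2>&1)"; then
    echo "Python 2.7 is loaded"
else
    echo "Python 2.7 is NOT loaded"
    # Uncomment to load it automatically on login, then log out and back in:
    # append_once 'module load python/2.7' ~/.bashrc
fi
```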

[Full reporting pipeline documentation is found here]