neoANT-HILL is a Python toolkit that integrates several pipelines for fully automated identification of potential neoantigens (pNeoAgs), which could be used in personalized immunotherapy because of their ability to elicit and boost T-cell immune responses. It is available as a pre-built Docker image and supports the analysis of single or multiple samples. The required input is RNA sequencing reads and/or somatic DNA mutations derived from next-generation sequencing.

1. Quick Start Guide

After cloning the repository, build the container image (the final argument is the build context, i.e. the directory that contains the Dockerfile):

$ docker build -t neoanthill:1.0 /path/to/Dockerfile

Run the container, replacing host with a free port on the host machine (it is mapped to port 80 inside the container):

$ docker run -v path/to/input:/home/biodocker/input -v path/to/output:/home/biodocker/output -p host:80 -it neoanthill:1.0 /bin/bash
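When the container is launched from a script, the command above can be assembled programmatically. The following is a minimal sketch, not part of neoANT-HILL: the function name `docker_run_command` and the example paths are hypothetical, and the mount points, container port and image tag are copied from the command above.

```python
import shlex
from pathlib import Path


def docker_run_command(input_dir, output_dir, host_port, image="neoanthill:1.0"):
    """Return the `docker run` argument list shown in the Quick Start Guide.

    Hypothetical helper: only the mount points, the container port (80)
    and the image tag come from the README.
    """
    # Bind mounts need absolute host paths, so resolve them first.
    input_dir = Path(input_dir).resolve()
    output_dir = Path(output_dir).resolve()
    host_port = int(host_port)
    if not 1 <= host_port <= 65535:
        raise ValueError(f"invalid host port: {host_port}")
    return [
        "docker", "run",
        "-v", f"{input_dir}:/home/biodocker/input",
        "-v", f"{output_dir}:/home/biodocker/output",
        "-p", f"{host_port}:80",
        "-it", image, "/bin/bash",
    ]


if __name__ == "__main__":
    # Example paths and port are placeholders; print the command for review.
    print(shlex.join(docker_run_command("data/input", "data/output", 8080)))
```

Printing the command before running it makes it easy to confirm the resolved paths and the chosen port.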

2. Running neoANT-HILL

To execute neoANT-HILL, run the following command:

$ python app.py

Then open a web browser and enter the following address to load the interface, where [host] is the host port passed to docker run:

 localhost:[host]
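If the page does not load, it can help to check first whether anything is listening on the chosen port. The sketch below uses only the Python standard library; the function name `interface_is_listening` is hypothetical, and the check only confirms that a TCP connection is accepted, not that the neoANT-HILL interface itself is healthy.

```python
import socket


def interface_is_listening(host_port, host="localhost", timeout=2.0):
    """Return True if a TCP connection to host:host_port is accepted."""
    try:
        with socket.create_connection((host, host_port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or name resolution failure.
        return False


if __name__ == "__main__":
    port = 8080  # placeholder: use the host port given to `docker run -p`
    state = "reachable" if interface_is_listening(port) else "not reachable"
    print(f"localhost:{port} is {state}")
```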

3. Input Files

  • Somatic variant list (VCF format) and/or
  • RNA-seq data (aligned and/or raw reads)

Note: RNA-seq files should follow the naming convention sampleID{_1,2}.extension, where:

  • sampleID is the identifier of the sample;
  • {_1,2} identifies the read pair in paired-end samples (FASTQ);
  • extension is the file extension, e.g. sam, bam, fastq, fastq.gz, etc.

Note: The sampleID of the VCF file should match the sampleID of the RNA-seq FASTQ files.
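The naming convention above can be checked before a run with a short script. This is a sketch under stated assumptions rather than the toolkit's own validation: the extension list contains only the examples named above, the `_1`/`_2` suffix is treated as a read-pair marker for FASTQ files only, and the VCF sampleID is assumed to be the file name without `.vcf`. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Extensions named in the note above; the source says "etc.", so this
# list is an assumption and may need extending.
EXTENSIONS = ("fastq.gz", "fastq", "sam", "bam")


def parse_rnaseq_name(filename):
    """Split a file name into (sample_id, mate, extension).

    mate is "1" or "2" for paired-end FASTQ files, otherwise None.
    Raises ValueError when the name does not fit the convention.
    """
    for ext in EXTENSIONS:
        suffix = "." + ext
        if filename.endswith(suffix):
            stem = filename[: -len(suffix)]
            break
    else:
        raise ValueError(f"unrecognised extension: {filename}")
    mate = None
    if ext.startswith("fastq"):
        # Assumption: only FASTQ files carry the _1/_2 read-pair suffix.
        match = re.fullmatch(r"(.+)_([12])", stem)
        if match:
            stem, mate = match.groups()
    if not stem:
        raise ValueError(f"missing sampleID: {filename}")
    return stem, mate, ext


def group_by_sample(filenames):
    """Map each sampleID to a {mate: filename} dictionary."""
    samples = defaultdict(dict)
    for name in filenames:
        sample_id, mate, _ext = parse_rnaseq_name(name)
        samples[sample_id][mate] = name
    return dict(samples)


def unmatched_vcf_samples(vcf_filenames, rnaseq_filenames):
    """Return VCF sampleIDs that have no RNA-seq file with the same sampleID."""
    rna_ids = set(group_by_sample(rnaseq_filenames))
    # Assumption: the VCF sampleID is the file name without ".vcf".
    vcf_ids = {n[:-4] for n in vcf_filenames if n.lower().endswith(".vcf")}
    return sorted(vcf_ids - rna_ids)


if __name__ == "__main__":
    reads = ["S1_1.fastq.gz", "S1_2.fastq.gz", "S2.bam"]
    print(group_by_sample(reads))
    print(unmatched_vcf_samples(["S1.vcf", "S3.vcf"], reads))
```

Running the check on the input folder before starting the container catches misnamed files early.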

4. Output Files

The pipeline writes its results to a directory named by the user (default: datestamp). Inside this directory there is one folder per sample, named after its sampleID.

For each sample, the following output files can be created:

| Output | Description |
| --- | --- |
| variant_calling | Somatic mutations called from the RNA-seq data |
| mutations | FASTA sequences (WT and MT) |
| allele_prediction | Predicted HLA haplotypes |
| gene_expression | Gene expression abundance |
| immune_infiltrating | Quantification of tumor-infiltrating immune cells |
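After a run, the output layout described above can be summarized per sample. The sketch below is hypothetical and assumes that each output appears inside the sampleID folder as a file or sub-folder whose name starts with the output name listed above; the exact file names and extensions are not stated here, so adjust the matching rule if they differ.

```python
from pathlib import Path

# Output names as listed in the table above.
EXPECTED_OUTPUTS = (
    "variant_calling",
    "mutations",
    "allele_prediction",
    "gene_expression",
    "immune_infiltrating",
)


def summarize_outputs(run_dir):
    """Map each sampleID folder in run_dir to the expected outputs found in it."""
    summary = {}
    sample_dirs = sorted(p for p in Path(run_dir).iterdir() if p.is_dir())
    for sample_dir in sample_dirs:
        names = [entry.name for entry in sample_dir.iterdir()]
        # Assumption: an output counts as present when any entry name
        # starts with the output name (file or sub-folder).
        summary[sample_dir.name] = [
            out for out in EXPECTED_OUTPUTS
            if any(name.startswith(out) for name in names)
        ]
    return summary


if __name__ == "__main__":
    import sys

    # Pass the run directory (e.g. the datestamp folder) as the first argument.
    for sample, outputs in summarize_outputs(sys.argv[1]).items():
        print(sample, ", ".join(outputs) or "(no outputs found)")
```

Because every output is optional ("can be created"), a missing entry is reported rather than treated as an error.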

Licensing

neoANT-HILL uses the following software components and tools:

  • GATK 4.0
    GATK4 is open source under a BSD 3-Clause license (https://software.broadinstitute.org/gatk/).

  • snpEff
    SnpEff is open source, released under the LGPLv3 (http://snpeff.sourceforge.net/).

  • IEDB
    By using the IEDB software, you are consenting to be bound by and become a "Licensee" for the use of IEDB tools and are consenting to the terms and conditions of the Non-Profit Open Software License ("Non-Profit OSL") version 3.0.

    Please read these two license agreements here before proceeding. If you do not agree to all of the terms of these two agreements, you must not install or use the product. Companies (for-profit entities) interested in downloading the command-line versions of the IEDB tools, or in running the entire analysis resource locally, should contact us ([email protected]) for details on licensing options.

    Citing the IEDB: All publications or presentations of data generated by use of the IEDB Resource Analysis tools should include citations to the relevant reference(s), found here.

  • MHCflurry
    MHCflurry is available under the Apache License 2.0 (https://github.com/openvax/mhcflurry).

  • Kallisto
    Kallisto is distributed under the BSD 2-Clause License, with permission to use, copy, modify, and distribute the software and its documentation for educational and research not-for-profit purposes (https://pachterlab.github.io/kallisto/).

  • OptiType
    OptiType is licensed under the open-source BSD 3-Clause license (https://github.com/FRED-2/OptiType).

  • quanTIseq
    The quanTIseq project is released under the BSD 3-Clause License (https://icbi.i-med.ac.at/software/quantiseq/doc/index.html).

References

1. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, Daly MJ. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics. 2011;43(5):491-498.

2. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella KV, Altshuler D, Gabriel S, DePristo MA. From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline. Current Protocols in Bioinformatics. 2013:11.10.1-11.10.33.

3. Cingolani P, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012;6(2):80-92.

4. Vita R, et al. The immune epitope database (IEDB): 2018 update. Nucleic Acids Research. 2018;47(D1):D339-D343.

5. O'Donnell TJ, et al. MHCflurry: open-source class I MHC binding affinity prediction. Cell Systems. 2018;7(1):129-132.e4.

6. Bray NL, et al. Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology. 2016;34(5):525.

7. Szolek A, et al. OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics. 2014;30(23):3310-3316.

8. Finotello F, et al. Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data. Genome Medicine. 2019;11(1):34.

Limitations

This release supports only the human genome version GRCh37.
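Because of this limitation, it can be worth checking the genome build of a VCF before submitting it. The following is a heuristic sketch, not part of neoANT-HILL: it inspects the `##reference` and `##contig` header lines, which are optional in VCF files, so an inconclusive result (`None`) is common. The function name `guess_is_grch37` and the build tags it searches for are assumptions.

```python
import re

GRCH37_CHR1_LENGTH = 249250621  # length of chromosome 1 in GRCh37/hg19


def guess_is_grch37(header_lines):
    """Return True or False if the VCF header hints at the build, else None."""
    for line in header_lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        lower = line.lower()
        if lower.startswith("##contig="):
            # Compare the declared chromosome 1 length with the GRCh37 value.
            match = re.search(r"ID=(?:chr)?1,.*?length=(\d+)", line)
            if match:
                return int(match.group(1)) == GRCH37_CHR1_LENGTH
        if lower.startswith(("##reference=", "##contig=")):
            # Assumption: common build tags appear in the path or assembly field.
            if any(tag in lower for tag in ("grch37", "hg19", "b37")):
                return True
            if any(tag in lower for tag in ("grch38", "hg38")):
                return False
    return None


if __name__ == "__main__":
    import sys

    # Pass an uncompressed VCF file as the first argument.
    with open(sys.argv[1]) as handle:
        print(guess_is_grch37(handle))
```

A `False` result suggests the variants should be lifted over or re-called against GRCh37 before use; `None` means the header gave no usable hint.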