neoANT-HILL is a Python toolkit that integrates several pipelines for fully automated identification of potential neoantigens (pNeoAgs), which could be used in personalized immunotherapy because of their ability to elicit and boost the T-cell immune response. It is available as a pre-built Docker image and supports the analysis of single or multiple samples. The required input files are RNA sequencing reads and/or somatic DNA mutations derived from next-generation sequencing.
After cloning the repository, build the Docker image, passing the directory that contains the Dockerfile as the build context:
$ docker build -t neoanthill:1.0 /path/to/Dockerfile
Run the container, mounting the input and output directories and mapping a free host port (host) to container port 80:
$ docker run -v /path/to/input:/home/biodocker/input -v /path/to/output:/home/biodocker/output -p host:80 -it neoanthill:1.0 /bin/bash
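Docker bind mounts generally expect absolute host paths, so it can help to resolve the input and output directories before launching the container. The following is a minimal sketch of that step, not part of neoANT-HILL itself: the helper name build_run_command, the directory names input and output, and the port 8080 are illustrative assumptions.

```python
from pathlib import Path
import shlex


def build_run_command(input_dir, output_dir, host_port, image="neoanthill:1.0"):
    """Return the `docker run` argument list with absolute host paths.

    Hypothetical helper: it only assembles the command shown above.
    """
    in_path = Path(input_dir).resolve()    # absolute path for the bind mount
    out_path = Path(output_dir).resolve()
    return [
        "docker", "run",
        "-v", f"{in_path}:/home/biodocker/input",
        "-v", f"{out_path}:/home/biodocker/output",
        "-p", f"{host_port}:80",           # host port -> container port 80
        "-it", image, "/bin/bash",
    ]


# Example with assumed directory names and an assumed free port.
print(shlex.join(build_run_command("input", "output", 8080)))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run from an interactive terminal.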
To execute neoANT-HILL, run the following command:
$ python app.py
Then open a web browser and enter the following address, where [host] is the host port chosen in the docker run command:
localhost:[host]
Note: RNA-seq files should match the following naming convention: sampleID{_1,2}.extension
where:
sampleID is the identifier of the sample;
{_1,2} is the read pair in paired-end samples (FASTQ);
extension is the file extension, e.g. sam, bam, fastq or fastq.gz.
Note: The sampleID in the VCF should match the sampleID of the RNA-seq FASTQ files.
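Before starting a run, the naming convention above can be checked with a short script. This is a sketch under stated assumptions, not the validation neoANT-HILL performs: it reads {_1,2} as an optional _1 or _2 suffix, accepts only the four extensions listed above, and the function names are hypothetical.

```python
import re

# Assumed reading of the convention: sampleID, optional _1/_2, known extension.
NAME_PATTERN = re.compile(
    r"(?P<sample>.+?)(?:_(?P<read>[12]))?\.(?P<ext>fastq\.gz|fastq|sam|bam)"
)


def parse_rnaseq_name(filename):
    """Return (sampleID, read, extension), or None if the name does not fit."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None:
        return None
    return match.group("sample"), match.group("read"), match.group("ext")


def group_by_sample(filenames):
    """Group matching file names by sampleID and collect the rest."""
    groups, unmatched = {}, []
    for name in filenames:
        parsed = parse_rnaseq_name(name)
        if parsed is None:
            unmatched.append(name)
        else:
            groups.setdefault(parsed[0], []).append(name)
    return groups, unmatched
```

The keys of the returned dictionary are the sample identifiers, which can then be compared by hand with the identifiers used in the VCF files.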
The pipeline writes its results to a general directory specified by the user (default: datestamp). Inside this directory there will be folders named sampleID.
For each sample the following output files can be created:
| Output | Description |
|---|---|
| variant_calling | Somatic mutations called from the RNA-seq data |
| mutations | FASTA sequences (WT and MT) |
| allele_prediction | HLA predicted haplotypes |
| gene_expression | Gene expression abundance |
| immune_infiltrating | Quantification of tumor-infiltrating immune cells |
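To see at a glance which outputs were produced for each sample, the output directory can be scanned with a few lines of Python. This is only a sketch: it assumes that each output appears inside the sampleID folder as a file or folder whose name starts with the output name from the table, which may not match the actual layout, and summarize_outputs is a hypothetical helper.

```python
from pathlib import Path

# Output names taken from the table above.
EXPECTED_OUTPUTS = [
    "variant_calling",
    "mutations",
    "allele_prediction",
    "gene_expression",
    "immune_infiltrating",
]


def summarize_outputs(run_dir):
    """Map each sampleID folder to {output name: found?}.

    Assumption: an output counts as found when any entry in the sample
    folder has a name starting with the output name.
    """
    summary = {}
    for sample_dir in sorted(Path(run_dir).iterdir()):
        if not sample_dir.is_dir():
            continue  # skip stray files next to the sample folders
        entries = [entry.name for entry in sample_dir.iterdir()]
        summary[sample_dir.name] = {
            name: any(entry.startswith(name) for entry in entries)
            for name in EXPECTED_OUTPUTS
        }
    return summary
```

Calling summarize_outputs on the mounted output directory returns one dictionary per sample. Since not every output is created for every input type, a False entry is not necessarily an error.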
neoANT-HILL uses the following software components and tools:
#GATK4 is open-source under a BSD 3-clause license (https://software.broadinstitute.org/gatk/).
#SnpEff is open source, released as "LGPLv3" (http://snpeff.sourceforge.net/).
#By using the IEDB software, you are consenting to be bound by and become a "Licensee" for the use of IEDB tools and are consenting to the terms and conditions of the Non-Profit Open Software License ("Non-Profit OSL") version 3.0
Please read these two license agreements here before proceeding. If you do not agree to all of the terms of these two agreements, you must not install or use the product. Companies (for-profit entities) interested in downloading the command-line versions of the IEDB tools or running the entire analysis resource locally, should contact us ([email protected]) for details on licensing options.
Citing the IEDB: All publications or presentations of data generated by use of the IEDB Resource Analysis tools should include citations to the relevant reference(s), found here.
#MHCflurry is available under the Apache License 2.0 (https://github.com/openvax/mhcflurry).
#Kallisto is distributed under BSD 2-Clause License with permission to use, copy, modify, and distribute the software and its documentation for educational and research not-for-profit purposes (https://pachterlab.github.io/kallisto/).
#OptiType is licensed under the open-source BSD 3-Clause license (https://github.com/FRED-2/OptiType).
#quanTIseq project is released under BSD 3-Clause License (https://icbi.i-med.ac.at/software/quantiseq/doc/index.html).
1. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, Daly MJ. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics. 2011;43(5):491-498.
2. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella KV, Altshuler D, Gabriel S, DePristo MA. From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline. Current Protocols in Bioinformatics. 2013;:11.10.1-11.10.33.
3. CINGOLANI, Pablo et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly, v. 6, n. 2, p. 80-92, 2012.
4. VITA, Randi et al. The immune epitope database (IEDB): 2018 update. Nucleic acids research, v. 47, n. D1, p. D339-D343, 2018.
5. O'DONNELL, Timothy J. et al. MHCflurry: open-source class I MHC binding affinity prediction. Cell systems, v. 7, n. 1, p. 129-132. e4, 2018.
6. BRAY, Nicolas L. et al. Near-optimal probabilistic RNA-seq quantification. Nature biotechnology, v. 34, n. 5, p. 525, 2016.
7. SZOLEK, András et al. OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics, v. 30, n. 23, p. 3310-3316, 2014.
8. FINOTELLO, Francesca et al. Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data. Genome medicine, v. 11, n. 1, p. 34, 2019.
This release only supports the human genome version GRCh37.