Egzotek is a bioinformatics pipeline for annotating transcripts in non-model species with incomplete or poorly annotated genomes. It builds a consensus annotated genome using long RNA reads.
- Read orientation
  - Oriented protocol (eoulsan)
  - Non-oriented protocol (restrander)
- Transcript annotation with RNA-Bloom
  - Transcript annotation (rna-bloom) (with optional short-read polishing)
  - Genome mapping (minimap2)
  - Bam to bed file conversion (minimap2-paftools)
  - Bed to gff file conversion (agat)
  - Gff to gtf file conversion (agat)
- Transcript annotation with Isoquant
- Complement annotation (agat)
- Clusterisation (gffread)
- Merge annotation
$ git clone git@github.com:GenomiqueENS/egzotek.git
$ cd egzotek
Customize runs by editing the nextflow.config file and/or specifying parameters at the command line.
$ nextflow run transcript_annotation.nf
Here are the primary input parameters for configuring the workflow:
Parameter | Description | Default Value |
---|---|---|
reads | Path to the fastq files (required) | test_data/*.fasta |
samplesheet | Path to the samplesheet file (required) | test_data/samplesheet.csv |
genome | Path to the genome .fasta file (required) | test_data/Treesei_QM6a.fasta |
annotation | Path to the reference transcriptome .gtf file (required) | test_data/transcriptome.gtf |
oriented | Orientation of reads based on library protocol (required) | false |
sam | Path to sam files after eoulsan (required if oriented=true) | null |
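
As a sketch of overriding the input parameters on the command line, the snippet below assembles a launch command and prints it instead of running it. The file paths are hypothetical placeholders, and it assumes each entry in the table is exposed as a `--<name>` option, which is how Nextflow normally passes pipeline parameters.

```shell
#!/bin/sh
# Hypothetical input locations; replace them with your own files.
READS='data/*.fastq'
SAMPLESHEET='data/samplesheet.csv'
GENOME='data/genome.fasta'
ANNOTATION='data/reference.gtf'

# Assemble the launch command as a string so it can be reviewed first.
# Assumption: each parameter in the table maps to a --<name> option.
# The glob stays inside single quotes so the shell does not expand it.
CMD="nextflow run transcript_annotation.nf \
--reads '$READS' \
--samplesheet '$SAMPLESHEET' \
--genome '$GENOME' \
--annotation '$ANNOTATION' \
--oriented false"

# Print the command for inspection.
echo "$CMD"
```

Once the printed command looks right, it can be executed with `eval "$CMD"` or pasted into the terminal.
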
Configuration of tools used for annotation process:
Parameter | Description | Default Value |
---|---|---|
config | Path to the Restrander configuration file (TSO and RTP sequences) (required if reads are non-oriented) | /assets/PCB111.json |
intron_length | Maximum intron length for Minimap2 | 20000 |
junc_bed | Junction BED annotation file for Minimap2 | null |
model_strategy | Transcript model construction algorithm | default_ont |
optional_shortread | Path to the Illumina short-read .fasta file for RNA-Bloom | null |
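
When several tool settings change together, keeping them in a parameters file can be easier than a long command line. The sketch below writes such a file and relies on Nextflow's `-params-file` option; the file name `egzotek_params.yaml` is a hypothetical choice, and the keys are assumed to match the parameter names in the table.

```shell
#!/bin/sh
# Write tool settings to a params file (the file name is a hypothetical choice).
# Keys mirror the parameter names in the table; values are the listed defaults.
cat > egzotek_params.yaml <<'EOF'
config: "/assets/PCB111.json"
intron_length: 20000
model_strategy: "default_ont"
EOF

# Show what was written before launching.
cat egzotek_params.yaml

# Then launch with Nextflow's -params-file option:
#   nextflow run transcript_annotation.nf -params-file egzotek_params.yaml
```

Parameters left at `null` in the table (`junc_bed`, `optional_shortread`) are simply omitted from the file here.
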
Parameter | Description | Default Value |
---|---|---|
outdir | Output directory for results | "result" |
Configuration for running the workflow:
Parameter | Description | Default Value |
---|---|---|
threads | Number of threads to use | 4 |
docker.runOptions | Docker run options to use | '-u $(id -u):$(id -g)' |
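
The default `docker.runOptions` value relies on shell command substitution. A quick way to see what it resolves to on your machine, assuming a POSIX shell with the `id` utility available (the variable name `RUN_OPTIONS` is only for illustration):

```shell
#!/bin/sh
# Expand the same substitutions used in the default docker.runOptions value.
# id -u prints the numeric user ID, id -g the numeric primary group ID.
RUN_OPTIONS="-u $(id -u):$(id -g)"

# Prints something like "-u 1000:1000" (the numbers depend on your account).
echo "$RUN_OPTIONS"
```

Passing these IDs to Docker's `-u` option runs the container as the invoking user, so files written to the output directory are not owned by root.
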
After execution, results will be available in the specified --outdir. This includes the SAM and BAM files produced for IsoQuant and RNA-Bloom, and the GTF files containing the annotated transcriptomes.
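
To check what a run produced, a small helper along these lines can list the alignment and annotation files. It is a minimal sketch: the function name `list_results` is hypothetical, and the layout of subdirectories inside the output directory is not assumed, so the search is recursive.

```shell
#!/bin/sh
# list_results DIR: print SAM, BAM and GTF files found under DIR, sorted.
list_results() {
    find "$1" -type f \( -name '*.sam' -o -name '*.bam' -o -name '*.gtf' \) | sort
}

# "result" is the default --outdir; adjust it if you changed that parameter.
if [ -d result ]; then
    list_results result
fi
```
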
To clean up temporary files generated by Nextflow:
nextflow clean -f
For support, please open an issue in the repository's "Issues" section. Contributions via pull requests are welcome; please follow the contribution guidelines in CONTRIBUTING.md.
Egzotek is distributed under a specific license. Check the LICENSE file in the GitHub repository for details.