From 766ab6c9c3059004c7c3f205621909b2d8b0b26d Mon Sep 17 00:00:00 2001
From: Toni Verbeiren <toni.verbeiren@gmail.com>
Date: Wed, 21 Aug 2024 13:32:48 +0200
Subject: [PATCH 01/42] Qualimap rnaseq (#74)

* first version

* complete script for qualimap

* add escaping character before leading hashtag (#50)

* add escaping character before leading hashtag

* update changelog

* Update CHANGELOG.md

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* replace escaping \ by \\

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* Samtools collate (#49)

* initial commit dedup

* Revert "initial commit dedup"

This reverts commit 38f586bec0ac9e4312b016e29c3aa0bd53f292b2.

* Initial commit, whole component is functional

* Update viash (#51)

* update viash

* update readme

* update changelog

* update changelog

* fix incorrect heading detection

* update again

* clean up readme

* Samtools view (#48)

* initial commit dedup

* Revert "initial commit dedup"

This reverts commit 38f586bec0ac9e4312b016e29c3aa0bd53f292b2.

* initial version with a few tests, script, and config file

* update changelog, add one test

* add a 4th test, fix option names in the script

* Fix name of component in config

* remove option named with a number

* add must_exist to input file argument

* removed "default: null" from one of the arguments in config

* remove utf8 characters from config

* Update CHANGELOG.md

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* Samtools fastq (#52)

* initial commit dedup

* Revert "initial commit dedup"

This reverts commit 38f586bec0ac9e4312b016e29c3aa0bd53f292b2.

* Initial commit, config, script, help and test_data

* Update changelog, add tests, fix argument naming errors, add test data

* update changelog, remove gffread namespace field

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* format URL in the description (#55)

* format URL in the description

* update changelog

* Change name in _viash.yaml (#60)

* Update operational code (#63)

* update readme

* switch ci to toolbox

* update to viash 0.9.0-RC6

* edit keywords

* fix version

* update biobox

* cutadapt (#7)

* First commit, clone of cutadapt in htrnaseq + help.txt

* Add config

* Don't allow multiple: true when providing a FASTA file with adapters

* First version of script

* Updates and fixes - se/pe

* Add tests and fix --json argument

* Add software version

* Better consistency in using snake_case

* Update src/cutadapt/config.vsh.yaml

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* Update src/cutadapt/config.vsh.yaml

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* Update src/cutadapt/config.vsh.yaml

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* Specify --input and --input_r2 as separate arguments

* Avoid specifying default arg values

* Add more information to `--minimum_length` and `maximum_length`

* Add --cpus by means of $meta_cpus and set proper default

* Allow multiple for adapters/fasta and add test

* change multiple_sep to ';'

* add example

* simplify code with a helper function

* create directories in test

* use a different output extension if --fasta is provided

* decrease code duplication by separating optional outputs from paired/unpaired output arguments

* write custom tests for cutadapt

* fix _r2 arguments

* add debug flag as not to always print the cli command

* remove comment

* Update to Viash 0.9.0-RC4

* Ability to specify output globbing patterns

* Avoid the need for both output_dir and output

* Move fields from `info` to `links`

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* Move references back to the info field

* apologies, I proposed a wrong syntax

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* update changelog

* update readme

* Update salmon quant arguments (#57)

* Make index an optional argument

* Fix argument type and add optional argument

* FEAT: add bedtools getfasta. (#59)

* FEAT: add bedtools getfasta.

* Add PR number to CHANGELOG

* Add star genomegenerate component (#58)

* Add star genomegenerate component

* Update changelog

* Rename component

* Update test

* Update CHANGELOG.md

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* fix package config (#65)

* Delete src/bgzip directory (#64)

It was moved to toolbox

* Output alignments to the transcriptome (#56)

* Output alignments to  the transcriptome

* Change argument name

* BUG: pear component failure is ignored (#70)

* FEAT + BUG: cutadapt; allowing disabling demultiplexing and fix par_quality_cutoff_r2 (#69)

* FEAT: Disable cutadapt demultiplexing by default

* Cutadapt: fix --par_quality_cutoff_r2

* FEAT: update busco to 5.7.1 (#72)

* FEAT: update busco to 5.7.1

* Typo

* Samtools fasta (#53)

* initial commit dedup

* Revert "initial commit dedup"

This reverts commit 38f586bec0ac9e4312b016e29c3aa0bd53f292b2.

* Fasta component

* change script resource to samtools_fastq script, with dummy argument to specify the command

* add dummy argument to samtools_fastq to share the script with samtools_fasta

* fix path to script in config

* Update src/samtools/samtools_fastq/script.sh

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* Change default fields to examples

* Two more default fields changed to examples

* Minor formatting changes

* Markdown formatting changes in configs

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* Umi tools dedup (#54)

* initial commit dedup

* Revert "initial commit dedup"

This reverts commit 38f586bec0ac9e4312b016e29c3aa0bd53f292b2.

* initial commit dedup

* Working component with one test

* Update test 1 and test data, fix some arg types in config and script

* test data files and changes to script

* Add third test and test data

* Fix typo in script

* remove utf8 characters in config

* Add choices fields and change default fields to examples

* Minor formatting changes

* md formatting changes in config

* Fix typo (#79)

* add vscode to gitignore

* update multiple separator (#81)

* update multiple separator

* update changelog

* Update src/multiqc/config.vsh.yaml

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* Update src/multiqc/config.vsh.yaml

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* Update src/multiqc/config.vsh.yaml

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* Update src/multiqc/config.vsh.yaml

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* update ifs

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* add test data

* add tests

* update changelog

* remove unrequired test data

* update descriptions

* update changelog

* update help text

* Update src/qualimap/qualimap_rnaseq/script.sh

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* update unit tests

* update unit tests

* address PR change requests

* add version

* remove whitespace multiqc

* Apply suggestions from code review

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* address pr comments

* Update CHANGELOG.md

* fix doi

* Fix name

* update version and container image

* write software version to file

---------

Co-authored-by: dorien-er <roosen.dorien@gmail.com>
Co-authored-by: Leila011 <leilapaquay@gmail.com>
Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
Co-authored-by: emmarousseau <emmarou1@icloud.com>
Co-authored-by: Sai Nirmayi Yasa <92786623+sainirmayi@users.noreply.github.com>
Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>
Co-authored-by: Dorien <41797896+dorien-er@users.noreply.github.com>
---
 CHANGELOG.md                                  |   2 +
 src/qualimap/qualimap_rnaseq/config.vsh.yaml  | 103 ++++++++++++++++
 src/qualimap/qualimap_rnaseq/help.txt         |  52 ++++++++
 src/qualimap/qualimap_rnaseq/script.sh        |  50 ++++++++
 src/qualimap/qualimap_rnaseq/test.sh          | 112 ++++++++++++++++++
 src/qualimap/qualimap_rnaseq/test_data/a.bam  | Bin 0 -> 2447 bytes
 .../qualimap_rnaseq/test_data/annotation.gtf  |  10 ++
 .../qualimap_rnaseq/test_data/script.sh       |  10 ++
 8 files changed, 339 insertions(+)
 create mode 100644 src/qualimap/qualimap_rnaseq/config.vsh.yaml
 create mode 100644 src/qualimap/qualimap_rnaseq/help.txt
 create mode 100644 src/qualimap/qualimap_rnaseq/script.sh
 create mode 100755 src/qualimap/qualimap_rnaseq/test.sh
 create mode 100644 src/qualimap/qualimap_rnaseq/test_data/a.bam
 create mode 100644 src/qualimap/qualimap_rnaseq/test_data/annotation.gtf
 create mode 100755 src/qualimap/qualimap_rnaseq/test_data/script.sh
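
Note on argument handling in script.sh: the wrapper turns optional
parameters into qualimap flags and derives -outformat from the extension of
the report file. The snippet below is a minimal sketch of that idea using a
bash array, which keeps paths with spaces intact. It is not the code in this
patch; `build_args` and its positional parameters are hypothetical names
used for illustration only, and it prints the arguments instead of running
qualimap.

```shell
#!/bin/bash
# Sketch only: builds the qualimap argument list in an array instead of
# running the tool. build_args and its positional parameters are hypothetical.
build_args() {
  local bam="$1" gtf="$2" paired="$3" report="$4"
  local args=(rnaseq -bam "$bam" -gtf "$gtf")
  # boolean_true flags arrive as the strings "true" or "false"
  if [ "$paired" = "true" ]; then args+=(--paired); fi
  # derive the report format from the file extension, as script.sh does
  if [ -n "$report" ]; then
    local name="${report##*/}"
    args+=(-outformat "${name##*.}" -outfile "$name")
  fi
  # one argument per line, so word boundaries are visible
  printf '%s\n' "${args[@]}"
}

build_args "my sample.bam" annotation.gtf false report.html
```

With an array, each element is passed as exactly one word, so no extra
quoting logic is needed for the optional parts.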

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6bd21a1e..2f4c0c71 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -31,6 +31,8 @@
   - `bedtools/bedtools_sort`: Sorts a feature file (bed/gff/vcf) by chromosome and other criteria (PR #98).
   - `bedtools/bedtools_bamtofastq`: Convert BAM alignments to FASTQ files (PR #101).
   - `bedtools/bedtools_bedtobam`: Converts genomic feature records (bed/gff/vcf) to BAM format (PR #111).
+ 
+* `qualimap/qualimap_rnaseq`: RNA-seq QC analysis using qualimap (PR #74). 
 
 ## MINOR CHANGES
 
diff --git a/src/qualimap/qualimap_rnaseq/config.vsh.yaml b/src/qualimap/qualimap_rnaseq/config.vsh.yaml
new file mode 100644
index 00000000..ffc807ab
--- /dev/null
+++ b/src/qualimap/qualimap_rnaseq/config.vsh.yaml
@@ -0,0 +1,103 @@
+name: qualimap_rnaseq
+namespace: qualimap
+keywords: [RNA-seq, quality control, QC Report]
+description: |
+  Qualimap RNA-seq QC reports quality control metrics and bias estimations 
+  which are specific for whole transcriptome sequencing, including reads genomic 
+  origin, junction analysis, transcript coverage and 5'-3' bias computation.
+links:
+  homepage: http://qualimap.conesalab.org/
+  documentation: http://qualimap.conesalab.org/doc_html/analysis.html#rna-seq-qc
+  issue_tracker: https://bitbucket.org/kokonech/qualimap/issues?status=new&status=open
+  repository: https://bitbucket.org/kokonech/qualimap/commits/branch/master
+references:
+  doi: 10.1093/bioinformatics/btv566
+license: GPL-2.0
+authors:
+  - __merge__: /src/_authors/dorien_roosen.yaml
+    roles: [ author, maintainer ]
+argument_groups:
+  - name: "Input"
+    arguments: 
+    - name: "--bam"
+      type: file
+      required: true
+      example: alignment.bam
+      description: Path to the sequence alignment file in BAM format, produced by a splicing-aware aligner.
+    - name: "--gtf"
+      type: file
+      required: true
+      example: annotations.gtf
+      description: Path to genomic annotations in Ensembl GTF format.
+
+  - name: "Output"
+    arguments: 
+    - name: "--qc_results"
+      direction: output
+      type: file
+      required: true
+      example: rnaseq_qc_results.txt
+      description: Text file containing the RNAseq QC results.
+    - name: "--counts"
+      type: file
+      required: false
+      direction: output
+      description: Output file for computed counts.
+    - name: "--report"
+      type: file
+      direction: output
+      required: false
+      example: report.html
+      description: Report output file. Supported formats are PDF or HTML.
+
+  - name: "Optional"
+    arguments: 
+    - name: "--num_pr_bases"
+      type: integer
+      required: false
+      min: 1
+      description: Number of upstream/downstream nucleotide bases to compute 5'-3' bias (default = 100).
+    - name: "--num_tr_bias"
+      type: integer
+      required: false
+      min: 1
+      description: Number of top highly expressed transcripts to compute 5'-3' bias (default = 1000).
+    - name: "--algorithm"
+      type: string
+      required: false
+      choices: ["uniquely-mapped-reads", "proportional"]
+      description: Counting algorithm (uniquely-mapped-reads (default) or proportional).
+    - name: "--sequencing_protocol"
+      type: string
+      required: false
+      choices: ["non-strand-specific", "strand-specific-reverse", "strand-specific-forward"]
+      description: Sequencing library protocol (strand-specific-forward, strand-specific-reverse or non-strand-specific (default)).
+    - name: "--paired"
+      type: boolean_true
+      description: Setting this flag for paired-end experiments will result in counting fragments instead of reads.
+    - name: "--sorted"
+      type: boolean_true
+      description: Setting this flag indicates that the input file is already sorted by name. If the flag is not set, additional sorting by name will be performed. Only required for paired-end analysis.
+    - name: "--java_memory_size"
+      type: string
+      required: false
+      description: Maximum Java heap memory size (default = 4G).
+
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - path: test_data/
+
+engines:
+  - type: docker
+    image: quay.io/biocontainers/qualimap:2.3--hdfd78af_0
+    setup:   
+      - type: docker
+        run: |
+          echo QualiMap: $(qualimap 2>&1 | grep QualiMap | sed 's/^.*QualiMap//') > /var/software_versions.txt
+runners: 
+  - type: executable
+  - type: nextflow
diff --git a/src/qualimap/qualimap_rnaseq/help.txt b/src/qualimap/qualimap_rnaseq/help.txt
new file mode 100644
index 00000000..c6493ed9
--- /dev/null
+++ b/src/qualimap/qualimap_rnaseq/help.txt
@@ -0,0 +1,52 @@
+QualiMap v.2.3
+Built on 2023-05-19 16:57
+
+usage: qualimap <tool> [options]
+
+To launch GUI leave <tool> empty.
+
+Available tools:
+
+    bamqc            Evaluate NGS mapping to a reference genome
+    rnaseq           Evaluate RNA-seq alignment data
+    counts           Counts data analysis (further RNA-seq data evaluation)
+    multi-bamqc      Compare QC reports from multiple NGS mappings
+    clustering       Cluster epigenomic signals
+    comp-counts      Compute feature counts
+
+Special arguments: 
+
+    --java-mem-size  Use this argument to set Java memory heap size. Example:
+                     qualimap bamqc -bam very_large_alignment.bam --java-mem-size=4G
+                     
+usage: qualimap rnaseq [-a <arg>] -bam <arg> -gtf <arg> [-npb <arg>] [-ntb
+       <arg>] [-oc <arg>] [-outdir <arg>] [-outfile <arg>] [-outformat <arg>]
+       [-p <arg>] [-pe] [-s]
+ -a,--algorithm <arg>             Counting algorithm:
+                                  uniquely-mapped-reads(default) or
+                                  proportional.
+ -bam <arg>                       Input mapping file in BAM format.
+ -gtf <arg>                       Annotations file in Ensembl GTF format.
+ -npb,--num-pr-bases <arg>        Number of upstream/downstream nucleotide bases
+                                  to compute 5'-3' bias (default is 100).
+ -ntb,--num-tr-bias <arg>         Number of top highly expressed transcripts to
+                                  compute 5'-3' bias (default is 1000).
+ -oc <arg>                        Output file for computed counts. If only name
+                                  of the file is provided, then the file will be
+                                  saved in the output folder.
+ -outdir <arg>                    Output folder for HTML report and raw data.
+ -outfile <arg>                   Output file for PDF report (default value is
+                                  report.pdf).
+ -outformat <arg>                 Format of the output report (PDF, HTML or both
+                                  PDF:HTML, default is HTML).
+ -p,--sequencing-protocol <arg>   Sequencing library protocol:
+                                  strand-specific-forward,
+                                  strand-specific-reverse or non-strand-specific
+                                  (default)
+ -pe,--paired                     Setting this flag for paired-end experiments
+                                  will result in counting fragments instead of
+                                  reads
+ -s,--sorted                      This flag indicates that the input file is
+                                  already sorted by name. If not set, additional
+                                  sorting by name will be performed. Only
+                                  required for paired-end analysis.
\ No newline at end of file
diff --git a/src/qualimap/qualimap_rnaseq/script.sh b/src/qualimap/qualimap_rnaseq/script.sh
new file mode 100644
index 00000000..351e5159
--- /dev/null
+++ b/src/qualimap/qualimap_rnaseq/script.sh
@@ -0,0 +1,50 @@
+#!/bin/bash
+
+set -eo pipefail
+
+tmp_dir=$(mktemp -d -p "$meta_temp_dir" qualimap_XXXXXXXXX)
+
+# Handle output parameters
+if [ -n "$par_report" ]; then
+    outfile=$(basename "$par_report")
+    report_extension="${outfile##*.}"
+fi
+
+if [ -n "$par_counts" ]; then
+    counts=$(basename "$par_counts")
+fi
+
+# disable flags
+[[ "$par_paired" == "false" ]] && unset par_paired
+[[ "$par_sorted" == "false" ]] && unset par_sorted
+
+# Run qualimap
+qualimap rnaseq \
+    ${meta_memory_mb:+--java-mem-size=${meta_memory_mb}M} \
+    ${par_algorithm:+--algorithm $par_algorithm} \
+    ${par_sequencing_protocol:+--sequencing-protocol $par_sequencing_protocol} \
+    -bam "$par_bam" \
+    -gtf "$par_gtf" \
+    -outdir "$tmp_dir" \
+    ${par_num_pr_bases:+--num-pr-bases $par_num_pr_bases} \
+    ${par_num_tr_bias:+--num-tr-bias $par_num_tr_bias} \
+    ${par_report:+-outformat $report_extension} \
+    ${par_paired:+--paired} \
+    ${par_sorted:+--sorted} \
+    ${par_report:+-outfile "$outfile"} \
+    ${par_counts:+-oc "$counts"}
+
+# Move output files
+mv "$tmp_dir/rnaseq_qc_results.txt" "$par_qc_results"
+
+if [ -n "$par_report" ] && [ "$report_extension" = "html" ]; then
+    mv "$tmp_dir/qualimapReport.html" "$par_report"
+fi
+
+if [ -n "$par_report" ] && [ "$report_extension" = "pdf" ]; then
+    mv "$tmp_dir/$outfile" "$par_report"
+fi
+
+if [ -n "$par_counts" ]; then
+    mv "$tmp_dir/$counts" "$par_counts"
+fi
diff --git a/src/qualimap/qualimap_rnaseq/test.sh b/src/qualimap/qualimap_rnaseq/test.sh
new file mode 100755
index 00000000..2e1b647b
--- /dev/null
+++ b/src/qualimap/qualimap_rnaseq/test.sh
@@ -0,0 +1,112 @@
+set -e
+
+#############################################
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_doesnt_exist() {
+  [ ! -f "$1" ] || { echo "File '$1' exists but shouldn't" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+#############################################
+
+
+test_dir="$meta_resources_dir/test_data"
+
+mkdir "run_qualimap_rnaseq_html"
+cd "run_qualimap_rnaseq_html"
+
+echo "> Running qualimap with html output report"
+
+"$meta_executable" \
+    --bam "$test_dir/a.bam" \
+    --gtf "$test_dir/annotation.gtf" \
+    --report report.html \
+    --counts counts.txt \
+    --qc_results output.txt
+
+echo ">> Checking output"
+assert_file_exists "report.html"
+assert_file_exists "counts.txt"
+assert_file_exists "output.txt"
+assert_file_doesnt_exist "report.pdf"
+
+echo ">> Checking if output is empty"
+assert_file_not_empty "report.html"
+assert_file_not_empty "counts.txt"
+assert_file_not_empty "output.txt"
+
+echo ">> Checking output contents"
+assert_file_contains "output.txt" ">>>>>>> Input"
+assert_file_contains "output.txt" ">>>>>>> Reads alignment"
+assert_file_contains "output.txt" ">>>>>>> Reads genomic origin"
+assert_file_contains "output.txt" ">>>>>>> Transcript coverage profile"
+assert_file_contains "output.txt" ">>>>>>> Junction analysis"
+assert_file_contains "output.txt" ">>>>>>> Transcript coverage profile"
+
+assert_file_contains "counts.txt" "ENSG00000125841.12"
+
+assert_file_contains "report.html" "<title>Qualimap report: RNA Seq QC</title>"
+assert_file_contains "report.html" "<h3>Input</h3>"
+assert_file_contains "report.html" "<h3>Reads alignment</h3>"
+assert_file_contains "report.html" "<h3>Reads genomic origin</h3>"
+assert_file_contains "report.html" "<h3>Transcript coverage profile</h3>"
+assert_file_contains "report.html" "<h3>Junction analysis</h3>"
+
+
+cd ..
+rm -r run_qualimap_rnaseq_html
+
+mkdir "run_qualimap_rnaseq_pdf"
+cd "run_qualimap_rnaseq_pdf"
+
+echo "> Running qualimap with pdf output report"
+
+"$meta_executable" \
+    --bam "$test_dir/a.bam" \
+    --gtf "$test_dir/annotation.gtf" \
+    --report report.pdf \
+    --counts counts.txt \
+    --qc_results output.txt
+
+echo ">> Checking output"
+assert_file_exists "report.pdf"
+assert_file_exists "counts.txt"
+assert_file_exists "output.txt"
+assert_file_doesnt_exist "report.html"
+
+echo ">> Checking if output is empty"
+assert_file_not_empty "report.pdf"
+assert_file_not_empty "counts.txt"
+assert_file_not_empty "output.txt"
+
+cd ..
+rm -r run_qualimap_rnaseq_pdf
+
+mkdir "run_qualimap_rnaseq"
+cd "run_qualimap_rnaseq"
+
+echo "> Running qualimap without report and counts output"
+
+"$meta_executable" \
+    --bam "$test_dir/a.bam" \
+    --gtf "$test_dir/annotation.gtf" \
+    --qc_results output.txt
+
+echo ">> Checking output"
+assert_file_doesnt_exist "report.pdf"
+assert_file_doesnt_exist "report.html"
+assert_file_doesnt_exist "counts.txt"
+assert_file_exists "output.txt"
+
+echo ">> Checking if output is empty"
+assert_file_not_empty "output.txt"
+
+cd ..
+rm -r run_qualimap_rnaseq
\ No newline at end of file
diff --git a/src/qualimap/qualimap_rnaseq/test_data/a.bam b/src/qualimap/qualimap_rnaseq/test_data/a.bam
new file mode 100644
index 0000000000000000000000000000000000000000..c8ea1065e89ca06cf12711850c36f85fba0d31b3
GIT binary patch
literal 2447
zcmV;A32^owiwFb&00000{{{d;LjnL`0CRHmWi(=7U~uqo;SBS$GSoBU4EDE5&d)DO
z$;?YEN#$|~4&)5>vr5h=GBV)w@v|~B0Rlrab1p`pE;b+r%P_EqUuOmY%wr|6OaK4?
zABzYC000000RIL6LPG)o5edy%O^9Si9e+KOiG~T1sx7fvv$$2fVM}u9e*H1sjo?;I
z5lhI2uzOI^AnR^U8Z~?JpzbRQErbQl3SK-2f`LWULr$WbKo(sQKk$-+AUU{t5Q(5D
zM$GuDneLvqJ^fzq?2aM-nVz0kJw5dA_y79+|8Jv}?b(+vZe*u-pQ7v8ce9K8N7*A!
zZ)60o>rQ7p=uNssf8ri39&`5WM?N|!Cf(k!nDjfN-lQ1!yQAzocnihjM;lqDESFJA
z<vkT8mC}0_qL+e3B~>g9vtGth8%dO8P7oi?Y}Hu%w&%^tE1R2pm+h@(w*2%7x6^VT
z+G**tJ2#$qe5due{aCm2)Xun@y<P#Gh-hVJ=R_oOUVCGA_YR5p@cmill@x?VSxO<K
zQeF}(m9SQ5Lb;MrQ0b(kUU{Vq*D~4}0n0!5=GFrEW!A2}nA-vY*&YGi>_1ba&wcj(
zOzobhtx6cYh_mOi`Y?b^y2D~Po)o=e)ZK$0ERW6yTUu_l6Utgfxm6}arFE&5V%7*9
zbPyInz;dQ*{NHZ4T59c@-}kKo|K<|-FV5i~I6qPH%x=B&m%>*7&p4yLv{o{1IU=4(
zVYFq&E6Y4Z*juYZutr%bbZz(_tF3_lmorZS{#Q;^z5jOeWajIOK+zox@ngF`niPor
z@gDr>Y1EXt2pZsU>zNH!TkZf_BmILGA{wHFBPP!g2I|z-QEb7lE7GDpe_8=QLEVRc
z;Qd6&m+Di0DSRc6)mjPQS0P&X*{BdCB{GO$ohhx0gbEg%$cg1X@>7RD?lK7XXf^O>
zmG9{vTxgx%{r$UI*{8A#)3a~=@IvcKLkk#o`;+dVKO9U(-F_!K1N1^ljsPIU86K^&
z(J`ca<YkHhf}$veir)LcMT}9DjMbIAdChC5zi4mmmHT$p-P7aQ`!{E=&3eVbKdxZ^
z$+YnIJLd-D?7tPH3H@4jb?*Gc$fbt5zkvQFG`?@9LMc3D3^8DQDUtQkbHkmGsTG!`
zQHD6_ot0D}|208KoY|U(@$z79-yQ_*g<lp3oL+7&94~N?{68+{(Vq>z)>7-A=Wu^(
zM!<QT{k*>P7rpLa(kpt~-AQk-U5uv5Z+tYuiyU<>7tZE6SGn{w2&z0)GT10s&OP|y
z!YFAU+B$42BK{5&qWWh*zXtScHOYT|4*%*LexhVtU;3BAx2!biex<OvK+cOCRlgL`
ztH@)CfafHZipmf})PWMM?5RT^^^Wb1q|g543iuy?@Cx96kylIqb(|g5`$jkwJQtKu
zW=ltSSrWntDTyWAF=dUj!Ej@gBVIB^{i*uTylD5p(G-4N>AyaQe^B}pB}etX&$U%a
z)iB}>dZ1+524bUOff=p6F+v&R1z6x#FqV^t4t>;>b{Dx(1Al$0{@?lX$u+h6(z~nb
z{}aDHxpv{f2K8T{=Z%YgZ_?`zI@{T0RKJ(6W!Z1>UcnFtP8jAB6^=>nyrRS?<z&#t
zqB%kz2m(dObRZ}cM-u>dYn2N4%Ox}<$on)LL_wnF+kbD+{tGm`x|U^^HnZ$=>siKt
z64hNY29*m96a-2GV@`pTkOIXh2Bi=*N4$V6zO)n`ztzw%*Fhf`frj_Jr&<fH;p}Zq
zLZCkc7yW)`(%tSB+k462N+=<~Le2MRhZND^OC25hzQC6=m=L+dFvJ*ivL;r%B~<XW
zJ@^;btF<6e^6w@kumpZ!9!e-`%@u^OD{W-l1Qr5~NPQx5Bk*9FJF_R1NqAGd-(CU#
zwd{Gozx8mEfiw8OYCXTU(=Y@KhM0YO<3WGY>-GozeeY97$|Ugt-vz>mg7S}wsu&HW
z0xJQo3Aa(93kEC;+>wTW<{H@a>I(P?>OTAf9hfNjX~Ph(41P(dvWiQVdY?|SZxVlm
zDHk*&%*sfJ$HRdgC<7|U<*hrmdzc4S3<0OETwgoAyK}Od`>){a;fCBVa8PUyN23WS
z*dKwk4$el*eoo^X1x^Paf+dj#fxK6O3X5d`bfQFrPz(x7o!@}7y<9Ua6Uh}@f&P^_
z`h(O@jQp`7^_QUMvJ~8urA5h8GBrBQOCL;*vhQi>koXRLfU3R9&xo~edi$%QpN&BF
zneMvUeaCyNng1-#u4YXG;CRyQY<D`7-mo*;7eT6&C6m}|LGd#Z3Fbd^xe_W6gE9BW
z_R>(oN)l8m4_>&H@Y}0aUw2{^_&=M@fW6+i;r2ZDpPdtMko$?0&+InMfQJcaKq?g1
z%3&NpKt!CC9>L+&vc`8jLx0s42$=bwOy$p?yDzMt-ffHfD*nGUd13w02kQMl7;I0v
zqak)+gYI~k{R40LdQw1FAO!Jja-}i!M2U)zmWN7@K}dRyOPc3M0&Ih$mY4!Vo|nj#
zcy|)us9aT;xB@0Nz(mnI*V!qGozc$VgGDcEwGIeKq@=yTY%7w8`td*OV`71n8~9h3
z@5?garz`PWqA?)l-eO!}`<su2jmOjuXGC3&v>R7s3skIwiqRek{n0!z{&4Fi664dN
zS{WYs)6MlSY}9Ll{&rf9bbDhoMXX2nhhuaF8Zjg}NwEOI04#Fu0(aKoR)Go<(lANn
z5U?K(m=k1eWw`t7*gSk$;ouib=7Iz@!C$Y`g+$3$U$55$i}3%3dEnAxS@ua}g?3I`
ztj2OnrL{ppD4-qUD9uBGgCCQpDFG{Tj7l~)<~aL9yY{$-hFKK6|6jiW4e_=pxPi03
zHtYzBAtdp5G#XBNy}_W^-w~w!uTh?9$qX?RpmAqpc4`k!6GxOX8IKEoa~D~m3~$Wg
zA4EZ-<V3@cU>SVif#U^c0D*-tP@+*Lq=Xrul{vE_Z3xoRTri0x$+5w2PMaF|uTG&p
zaN>7>{|%9p;knA$xdwI6>!oT~bnsGl8_n^|MjDDupcsTu39Q8J0Mx)d;0__a@_??1
zBNebIP-g4@U=@(xn}Qtmvo{tnu2uA3I{SWu09eGxP&5)w=dcHr+T&)Y(jq7ubZ+uM
zvHy_HVC+>A&rx6=^wDA*3diW?23yX+{{U)jjNb?z001A02m}BC000301^_}s0stET
N0{{R300000002N$thE3D

literal 0
HcmV?d00001

diff --git a/src/qualimap/qualimap_rnaseq/test_data/annotation.gtf b/src/qualimap/qualimap_rnaseq/test_data/annotation.gtf
new file mode 100644
index 00000000..976de753
--- /dev/null
+++ b/src/qualimap/qualimap_rnaseq/test_data/annotation.gtf
@@ -0,0 +1,10 @@
+chr20	HAVANA	transcript	347024	354868	.	+	.	gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
+chr20	HAVANA	exon	347024	347142	.	+	.	gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 1; exon_id "ENSE00001831391.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
+chr20	HAVANA	exon	349249	349363	.	+	.	gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 2; exon_id "ENSE00001491647.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
+chr20	HAVANA	exon	349638	349832	.	+	.	gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 3; exon_id "ENSE00003710328.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
+chr20	HAVANA	CDS	349644	349832	.	+	0	gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 3; exon_id "ENSE00003710328.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
+chr20	HAVANA	start_codon	349644	349646	.	+	0	gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 3; exon_id "ENSE00003710328.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
+chr20	HAVANA	exon	353210	354868	.	+	.	gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 4; exon_id "ENSE00001822456.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
+chr20	HAVANA	CDS	353210	353632	.	+	0	gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 4; exon_id "ENSE00001822456.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
+chr20	HAVANA	stop_codon	353633	353635	.	+	0	gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 4; exon_id "ENSE00001822456.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
+chr20	HAVANA	UTR	347024	347142	.	+	.	gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 1; exon_id "ENSE00001831391.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
diff --git a/src/qualimap/qualimap_rnaseq/test_data/script.sh b/src/qualimap/qualimap_rnaseq/test_data/script.sh
new file mode 100755
index 00000000..801fe405
--- /dev/null
+++ b/src/qualimap/qualimap_rnaseq/test_data/script.sh
@@ -0,0 +1,10 @@
+# qualimap test data
+
+# Test data was obtained from https://github.com/snakemake/snakemake-wrappers/raw/master/bio/qualimap/rnaseq/test
+
+if [ ! -d /tmp/snakemake-wrappers ]; then
+  git clone --depth 1 --single-branch --branch master https://github.com/snakemake/snakemake-wrappers /tmp/snakemake-wrappers
+fi
+
+cp -r /tmp/snakemake-wrappers/bio/qualimap/rnaseq/test/mapped/a.bam src/qualimap/qualimap_rnaseq/test_data
+cp -r /tmp/snakemake-wrappers/bio/qualimap/rnaseq/test/annotation.gtf src/qualimap/qualimap_rnaseq/test_data

From c4ea23a0f508b93b31bb1a36418ad4868fdb5bc3 Mon Sep 17 00:00:00 2001
From: Sai Nirmayi Yasa <92786623+sainirmayi@users.noreply.github.com>
Date: Wed, 21 Aug 2024 17:31:32 +0200
Subject: [PATCH 02/42] Add RSEM prepare reference component (#89)

* initial commit

* incorporate some requested changes

* update test

* change argument reference_fasta_files to multiple true and update docker setup

* Update src/rsem/rsem_prepare_reference/config.vsh.yaml

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* Update src/rsem/rsem_prepare_reference/config.vsh.yaml

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* Update src/rsem/rsem_prepare_reference/script.sh

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* set multiple true

* update changelog

* Apply suggestions from code review

* fix script

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
---
 CHANGELOG.md                                  |   2 +
 .../rsem_prepare_reference/config.vsh.yaml    | 196 +++++++++++++++++
 src/rsem/rsem_prepare_reference/help.txt      | 207 ++++++++++++++++++
 src/rsem/rsem_prepare_reference/script.sh     |  42 ++++
 src/rsem/rsem_prepare_reference/test.sh       |  37 ++++
 5 files changed, 484 insertions(+)
 create mode 100644 src/rsem/rsem_prepare_reference/config.vsh.yaml
 create mode 100644 src/rsem/rsem_prepare_reference/help.txt
 create mode 100644 src/rsem/rsem_prepare_reference/script.sh
 create mode 100644 src/rsem/rsem_prepare_reference/test.sh

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2f4c0c71..3e9f40fc 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -34,6 +34,8 @@
  
 * `qualimap/qualimap_rnaseq`: RNA-seq QC analysis using qualimap (PR #74). 
 
+* `rsem/rsem_prepare_reference`: Prepare transcript references for RSEM (PR #89).
+
 ## MINOR CHANGES
 
 * `busco` components: update BUSCO to `5.7.1` (PR #72).
diff --git a/src/rsem/rsem_prepare_reference/config.vsh.yaml b/src/rsem/rsem_prepare_reference/config.vsh.yaml
new file mode 100644
index 00000000..44915a2f
--- /dev/null
+++ b/src/rsem/rsem_prepare_reference/config.vsh.yaml
@@ -0,0 +1,196 @@
+name: rsem_prepare_reference
+namespace: rsem
+description: | 
+  RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. This component prepares transcript references for RSEM.
+keywords: ["Transcriptome", "Index"]
+links:
+  homepage: http://deweylab.github.io/RSEM
+  documentation: https://deweylab.github.io/RSEM/rsem-prepare-reference.html
+  repository: https://github.com/deweylab/RSEM
+references: 
+  doi: 10.1186/1471-2105-12-323
+license: GPL-3.0 
+requirements:
+  commands: [ rsem-prepare-reference ]
+authors:
+  - __merge__: /src/_authors/sai_nirmayi_yasa.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --reference_fasta_files
+        type: file
+        description: | 
+          Semi-colon separated list of Multi-FASTA formatted files OR a directory name. If a directory name is specified, RSEM will read all files with suffix ".fa" or ".fasta" in this directory. The files should contain either the sequences of transcripts or an entire genome, depending on whether the '--gtf' option is used.
+        required: true
+        multiple: true
+        example: read1.fasta
+      - name: --reference_name
+        type: string
+        description: | 
+          The name of the reference used. RSEM will generate several reference-related files that are prefixed by this name. This name can contain path information (e.g. '/ref/mm9').
+        required: true
+        example: /ref/mm9 
+  
+  - name: Outputs
+    arguments:
+      - name: --output
+        type: file
+        description: Directory containing reference files generated by RSEM. 
+        required: true
+        direction: output
+
+  - name: Other options
+    arguments: 
+      - name: --gtf
+        type: file
+        description: Assume that 'reference_fasta_files' contains the sequence of a genome, and extract transcript reference sequences using the gene annotations specified in the GTF file. If this and '--gff3' options are not provided, RSEM will assume 'reference_fasta_files' contains the reference transcripts. In this case, RSEM assumes that name of each sequence in the Multi-FASTA files is its transcript_id.
+        example: annotations.gtf
+      - name: --gff3
+        type: file
+        description: GFF3 annotation file. Converted to GTF format with the file name 'reference_name.gtf'. Please make sure that 'reference_name.gtf' does not exist. 
+        example: annotations.gff
+      - name: --gff3_rna_patterns
+        type: string
+        description: List of transcript categories (separated by semi-colon). Only transcripts that match the string will be extracted.
+        multiple: true
+        example: mRNA;rRNA
+      - name: --gff3_genes_as_transcripts
+        type: boolean_true
+        description: This option is designed for untypical organisms, such as viruses, whose GFF3 files only contain genes. RSEM will assume each gene as a unique transcript when it converts the GFF3 file into GTF format.
+      - name: --trusted_sources
+        type: string
+        description: List of trusted sources (separated by semi-colon). Only transcripts coming from these sources will be extracted. If this option is off, all sources are accepted.
+        multiple: true
+        example: ENSEMBL;HAVANA
+      - name: --transcript_to_gene_map
+        type: file
+        description: | 
+          Use information from this file to map from transcript (isoform) ids to gene ids. Each line of this file should be of the form: 
+            gene_id transcript_id
+          with the two fields separated by a tab character.
+          If you are using a GTF file for the "UCSC Genes" gene set from the UCSC Genome Browser, then the "knownIsoforms.txt" file (obtained from the "Downloads" section of the UCSC Genome Browser site) is of this format. 
+          If this option is off, then the mapping of isoforms to genes depends on whether the '--gtf' option is specified. If '--gtf' is specified, then RSEM uses the "gene_id" and "transcript_id" attributes in the GTF file. Otherwise, RSEM assumes that each sequence in the reference sequence files is a separate gene.
+        example: isoforms.txt
+      - name: --allele_to_gene_map 
+        type: file
+        description: |
+          Use information from <file> to provide gene_id and transcript_id information for each allele-specific transcript. Each line of <file> should be of the form:
+            gene_id transcript_id allele_id
+          with the fields separated by a tab character.
+          This option is designed for quantifying allele-specific expression. It is only valid if '--gtf' option is not specified. allele_id should be the sequence names presented in the Multi-FASTA-formatted files.
+      - name: --polyA
+        type: boolean_true
+        description: Add poly(A) tails to the end of all reference isoforms. The length of poly(A) tail added is specified by the '--polyA_length' option. STAR aligner users may not want to use this option. 
+      - name: --polyA_length 
+        type: integer
+        description: The length of the poly(A) tails to be added. 
+        example: 125
+      - name: --no_polyA_subset 
+        type: file
+        description: Only meaningful if '--polyA' is specified. Do not add poly(A) tails to those transcripts listed in this file containing a list of transcript_ids.
+        example: transcript_ids.txt
+      - name: --bowtie
+        type: boolean_true
+        description: Build Bowtie indices. 
+      - name: --bowtie2
+        type: boolean_true
+        description: Build Bowtie 2 indices.
+      - name: --star
+        type: boolean_true
+        description: Build STAR indices.
+      - name: --star_sjdboverhang
+        type: integer
+        description: Length of the genomic sequence around annotated junction. It is only used for STAR to build splice junctions database and not needed for Bowtie or Bowtie2. It will be passed as the --sjdbOverhang option to STAR. According to STAR's manual, its ideal value is max(ReadLength)-1, e.g. for 2x101 paired-end reads, the ideal value is 101-1=100. In most cases, the default value of 100 will work as well as the ideal value. (Default is 100)
+        example: 100
+      - name: --hisat2_hca
+        type: boolean_true
+        description: Build HISAT2 indices on the transcriptome according to Human Cell Atlas (HCA) SMART-Seq2 pipeline.
+      - name: --quiet
+        alternatives: -q
+        type: boolean_true
+        description: Suppress the output of logging information. 
+  
+  - name: Prior-enhanced RSEM options
+    arguments: 
+      - name: --prep_pRSEM
+        type: boolean_true
+        description: A Boolean indicating whether to prepare reference files for pRSEM, including building Bowtie indices for a genome and selecting training set isoforms. The index files will be used for aligning ChIP-seq reads in prior-enhanced RSEM and the training set isoforms will be used for learning prior. A path to Bowtie executables and a mappability file in bigWig format are required when this option is on. Currently, Bowtie2 is not supported for prior-enhanced RSEM. 
+      - name: --mappability_bigwig_file 
+        type: file
+        description: Full path to a whole-genome mappability file in bigWig format. This file is required for running prior-enhanced RSEM. It is used for selecting a training set of isoforms for prior-learning. This file can be either downloaded from UCSC Genome Browser or generated by GEM (Derrien et al., 2012, PLoS One). 
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+    
+engines:
+- type: docker
+  image: ubuntu:22.04
+  setup:
+    - type: apt
+      packages: 
+        - build-essential 
+        - gcc 
+        - g++ 
+        - make 
+        - wget 
+        - zlib1g-dev 
+        - unzip xxd 
+        - perl 
+        - r-base
+        - bowtie2
+        - pip 
+        - git
+    - type: python
+      packages: bowtie
+    - type: docker
+      env: 
+        - STAR_VERSION=2.7.11b
+        - RSEM_VERSION=1.3.3
+        - BOWTIE_VERSION=1.3.1
+        - TZ=Europe/Brussels
+      run: |
+        ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone && \
+        cd /tmp && \
+        wget --no-check-certificate https://github.com/alexdobin/STAR/archive/refs/tags/${STAR_VERSION}.zip && \
+        unzip ${STAR_VERSION}.zip && \
+        cd STAR-${STAR_VERSION}/source && \
+        make STARstatic CXXFLAGS_SIMD=-std=c++11 && \
+        cp STAR /usr/local/bin && \
+        cd /tmp && \
+        wget --no-check-certificate https://github.com/deweylab/RSEM/archive/refs/tags/v${RSEM_VERSION}.zip && \
+        unzip v${RSEM_VERSION}.zip && \
+        cd RSEM-${RSEM_VERSION} && \
+        make && \
+        make install && \
+        cd /tmp && \
+        wget --no-check-certificate -O bowtie-${BOWTIE_VERSION}-linux-x86_64.zip https://sourceforge.net/projects/bowtie-bio/files/bowtie/${BOWTIE_VERSION}/bowtie-${BOWTIE_VERSION}-linux-x86_64.zip/download  && \
+        unzip bowtie-${BOWTIE_VERSION}-linux-x86_64.zip && \
+        cp bowtie-${BOWTIE_VERSION}-linux-x86_64/bowtie* /usr/local/bin && \
+        cd /tmp && \
+        git clone https://github.com/DaehwanKimLab/hisat2.git /tmp/hisat2 && \
+        cd /tmp/hisat2 && \
+        make && \
+        cp -r hisat2* /usr/local/bin && \
+        cd && \
+        rm -rf /tmp/STAR-${STAR_VERSION} /tmp/${STAR_VERSION}.zip /tmp/bowtie-${BOWTIE_VERSION}-linux-x86_64 /tmp/hisat2 && \
+        apt-get --purge autoremove -y ${PACKAGES} && \
+        apt-get clean 
+
+    - type: docker
+      run: |
+        echo "RSEM: `rsem-calculate-expression --version | sed -e 's/Current version: RSEM v//g'`" > /var/software_versions.txt && \
+        echo "STAR: `STAR --version`" >> /var/software_versions.txt && \
+        echo "bowtie2: `bowtie2 --version | grep -oP '\d+\.\d+\.\d+'`" >> /var/software_versions.txt && \
+        echo "bowtie: `bowtie --version | grep -oP 'bowtie-align-s version \K\d+\.\d+\.\d+'`" >> /var/software_versions.txt && \
+        echo "HISAT2: `hisat2 --version | grep -oP 'hisat2-align-s version \K\d+\.\d+\.\d+'`" >> /var/software_versions.txt
+
+runners:
+  - type: executable
+  - type: nextflow
\ No newline at end of file
diff --git a/src/rsem/rsem_prepare_reference/help.txt b/src/rsem/rsem_prepare_reference/help.txt
new file mode 100644
index 00000000..c69899ec
--- /dev/null
+++ b/src/rsem/rsem_prepare_reference/help.txt
@@ -0,0 +1,207 @@
+```bash
+rsem-prepare-reference --help
+```
+
+NAME
+rsem-prepare-reference - Prepare transcript references for RSEM and optionally build BOWTIE/BOWTIE2/STAR/HISAT2(transcriptome) indices.
+
+SYNOPSIS
+ rsem-prepare-reference [options] reference_fasta_file(s) reference_name
+ARGUMENTS
+reference_fasta_file(s)
+Either a comma-separated list of Multi-FASTA formatted files OR a directory name. If a directory name is specified, RSEM will read all files with suffix ".fa" or ".fasta" in this directory. The files should contain either the sequences of transcripts or an entire genome, depending on whether the '--gtf' option is used.
+
+reference name
+The name of the reference used. RSEM will generate several reference-related files that are prefixed by this name. This name can contain path information (e.g. '/ref/mm9').
+
+OPTIONS
+--gtf <file>
+If this option is on, RSEM assumes that 'reference_fasta_file(s)' contains the sequence of a genome, and will extract transcript reference sequences using the gene annotations specified in <file>, which should be in GTF format.
+
+If this and '--gff3' options are off, RSEM will assume 'reference_fasta_file(s)' contains the reference transcripts. In this case, RSEM assumes that name of each sequence in the Multi-FASTA files is its transcript_id.
+
+(Default: off)
+
+--gff3 <file>
+The annotation file is in GFF3 format instead of GTF format. RSEM will first convert it to GTF format with the file name 'reference_name.gtf'. Please make sure that 'reference_name.gtf' does not exist. (Default: off)
+
+--gff3-RNA-patterns <pattern>
+<pattern> is a comma-separated list of transcript categories, e.g. "mRNA,rRNA". Only transcripts that match the <pattern> will be extracted. (Default: "mRNA")
+
+--gff3-genes-as-transcripts
+This option is designed for untypical organisms, such as viruses, whose GFF3 files only contain genes. RSEM will assume each gene as a unique transcript when it converts the GFF3 file into GTF format.
+
+--trusted-sources <sources>
+<sources> is a comma-separated list of trusted sources, e.g. "ENSEMBL,HAVANA". Only transcripts coming from these sources will be extracted. If this option is off, all sources are accepted. (Default: off)
+
+--transcript-to-gene-map <file>
+Use information from <file> to map from transcript (isoform) ids to gene ids. Each line of <file> should be of the form:
+
+gene_id transcript_id
+
+with the two fields separated by a tab character.
+
+If you are using a GTF file for the "UCSC Genes" gene set from the UCSC Genome Browser, then the "knownIsoforms.txt" file (obtained from the "Downloads" section of the UCSC Genome Browser site) is of this format.
+
+If this option is off, then the mapping of isoforms to genes depends on whether the '--gtf' option is specified. If '--gtf' is specified, then RSEM uses the "gene_id" and "transcript_id" attributes in the GTF file. Otherwise, RSEM assumes that each sequence in the reference sequence files is a separate gene.
+
+(Default: off)
+
+--allele-to-gene-map <file>
+Use information from <file> to provide gene_id and transcript_id information for each allele-specific transcript. Each line of <file> should be of the form:
+
+gene_id transcript_id allele_id
+
+with the fields separated by a tab character.
+
+This option is designed for quantifying allele-specific expression. It is only valid if '--gtf' option is not specified. allele_id should be the sequence names presented in the Multi-FASTA-formatted files.
+
+(Default: off)
+
+--polyA
+Add poly(A) tails to the end of all reference isoforms. The length of poly(A) tail added is specified by '--polyA-length' option. STAR aligner users may not want to use this option. (Default: do not add poly(A) tail to any of the isoforms)
+
+--polyA-length <int>
+The length of the poly(A) tails to be added. (Default: 125)
+
+--no-polyA-subset <file>
+Only meaningful if '--polyA' is specified. Do not add poly(A) tails to those transcripts listed in <file>. <file> is a file containing a list of transcript_ids. (Default: off)
+
+--bowtie
+Build Bowtie indices. (Default: off)
+
+--bowtie-path <path>
+The path to the Bowtie executables. (Default: the path to Bowtie executables is assumed to be in the user's PATH environment variable)
+
+--bowtie2
+Build Bowtie 2 indices. (Default: off)
+
+--bowtie2-path <path>
+The path to the Bowtie 2 executables. (Default: the path to Bowtie 2 executables is assumed to be in the user's PATH environment variable)
+
+--star
+Build STAR indices. (Default: off)
+
+--star-path <path>
+The path to STAR's executable. (Default: the path to STAR executable is assumed to be in user's PATH environment variable)
+
+--star-sjdboverhang <int>
+Length of the genomic sequence around annotated junction. It is only used for STAR to build splice junctions database and not needed for Bowtie or Bowtie2. It will be passed as the --sjdbOverhang option to STAR. According to STAR's manual, its ideal value is max(ReadLength)-1, e.g. for 2x101 paired-end reads, the ideal value is 101-1=100. In most cases, the default value of 100 will work as well as the ideal value. (Default: 100)
+
+--hisat2-hca
+Build HISAT2 indices on the transcriptome according to Human Cell Atlas (HCA) SMART-Seq2 pipeline. (Default: off)
+
+--hisat2-path <path>
+The path to the HISAT2 executables. (Default: the path to HISAT2 executables is assumed to be in the user's PATH environment variable)
+
+-p/--num-threads <int>
+Number of threads to use for building STAR's genome indices. (Default: 1)
+
+-q/--quiet
+Suppress the output of logging information. (Default: off)
+
+-h/--help
+Show help information.
+
+PRIOR-ENHANCED RSEM OPTIONS
+--prep-pRSEM
+A Boolean indicating whether to prepare reference files for pRSEM, including building Bowtie indices for a genome and selecting training set isoforms. The index files will be used for aligning ChIP-seq reads in prior-enhanced RSEM and the training set isoforms will be used for learning prior. A path to Bowtie executables and a mappability file in bigWig format are required when this option is on. Currently, Bowtie2 is not supported for prior-enhanced RSEM. (Default: off)
+
+--mappability-bigwig-file <string>
+Full path to a whole-genome mappability file in bigWig format. This file is required for running prior-enhanced RSEM. It is used for selecting a training set of isoforms for prior-learning. This file can be either downloaded from UCSC Genome Browser or generated by GEM (Derrien et al., 2012, PLoS One). (Default: "")
+
+DESCRIPTION
+This program extracts/preprocesses the reference sequences for RSEM and prior-enhanced RSEM. It can optionally build Bowtie indices (with '--bowtie' option) and/or Bowtie 2 indices (with '--bowtie2' option) using their default parameters. It can also optionally build STAR indices (with '--star' option) using parameters from ENCODE3's STAR-RSEM pipeline. For prior-enhanced RSEM, it can build Bowtie genomic indices and select training set isoforms (with options '--prep-pRSEM' and '--mappability-bigwig-file <string>'). If an alternative aligner is to be used, indices for that particular aligner can be built from either 'reference_name.idx.fa' or 'reference_name.n2g.idx.fa' (see OUTPUT for details). This program is used in conjunction with the 'rsem-calculate-expression' program.
+
+OUTPUT
+This program will generate 'reference_name.grp', 'reference_name.ti', 'reference_name.transcripts.fa', 'reference_name.seq', 'reference_name.chrlist' (if '--gtf' is on), 'reference_name.idx.fa', 'reference_name.n2g.idx.fa', optional Bowtie/Bowtie 2 index files, and optional STAR index files.
+
+'reference_name.grp', 'reference_name.ti', 'reference_name.seq', and 'reference_name.chrlist' are used by RSEM internally.
+
+'reference_name.transcripts.fa' contains the extracted reference transcripts in Multi-FASTA format. Poly(A) tails are not added and it may contain lower case bases in its sequences if the corresponding genomic regions are soft-masked.
+
+'reference_name.idx.fa' and 'reference_name.n2g.idx.fa' are used by aligners to build their own indices. In these two files, all sequence bases are converted into upper case. In addition, poly(A) tails are added if '--polyA' option is set. The only difference between 'reference_name.idx.fa' and 'reference_name.n2g.idx.fa' is that 'reference_name.n2g.idx.fa' in addition converts all 'N' characters to 'G' characters. This conversion is in particular desired for aligners (e.g. Bowtie) that do not allow reads to overlap with 'N' characters in the reference sequences. Otherwise, 'reference_name.idx.fa' should be used to build the aligner's index files. RSEM uses 'reference_name.idx.fa' to build Bowtie 2 indices and 'reference_name.n2g.idx.fa' to build Bowtie indices. For visualizing the transcript-coordinate-based BAM files generated by RSEM in IGV, 'reference_name.idx.fa' should be imported as a "genome" (see Visualization section in README.md for details).
+
+If the whole genome is indexed for prior-enhanced RSEM, all the index files will be generated with prefix as 'reference_name_prsem'. Selected isoforms for training set are listed in the file 'reference_name_prsem.training_tr_crd'
+
+EXAMPLES
+1) Suppose we have mouse RNA-Seq data and want to use the UCSC mm9 version of the mouse genome. We have downloaded the UCSC Genes transcript annotations in GTF format (as mm9.gtf) using the Table Browser and the knownIsoforms.txt file for mm9 from the UCSC Downloads. We also have all chromosome files for mm9 in the directory '/data/mm9'. We want to put the generated reference files under '/ref' with name 'mouse_0'. We do not add any poly(A) tails. Please note that GTF files generated from UCSC's Table Browser do not contain isoform-gene relationship information. For the UCSC Genes annotation, this information can be obtained from the knownIsoforms.txt file. Suppose we want to build Bowtie indices and Bowtie executables are found in '/sw/bowtie'.
+
+There are two ways to write the command:
+
+ rsem-prepare-reference --gtf mm9.gtf \
+                        --transcript-to-gene-map knownIsoforms.txt \
+                        --bowtie \
+                        --bowtie-path /sw/bowtie \                  
+                        /data/mm9/chr1.fa,/data/mm9/chr2.fa,...,/data/mm9/chrM.fa \
+                        /ref/mouse_0
+OR
+
+ rsem-prepare-reference --gtf mm9.gtf \
+                        --transcript-to-gene-map knownIsoforms.txt \
+                        --bowtie \
+                        --bowtie-path /sw/bowtie \
+                        /data/mm9 \
+                        /ref/mouse_0
+2) Suppose we also want to build Bowtie 2 indices in the above example and Bowtie 2 executables are found in '/sw/bowtie2', the command will be:
+
+ rsem-prepare-reference --gtf mm9.gtf \
+                        --transcript-to-gene-map knownIsoforms.txt \
+                        --bowtie \
+                        --bowtie-path /sw/bowtie \
+                        --bowtie2 \
+                        --bowtie2-path /sw/bowtie2 \
+                        /data/mm9 \
+                        /ref/mouse_0
+3) Suppose we want to build STAR indices in the above example and save index files under '/ref' with name 'mouse_0'. Assuming STAR executable is '/sw/STAR', the command will be:
+
+ rsem-prepare-reference --gtf mm9.gtf \
+                        --transcript-to-gene-map knownIsoforms.txt \
+                        --star \
+                        --star-path /sw/STAR \
+                        -p 8 \
+                        /data/mm9/chr1.fa,/data/mm9/chr2.fa,...,/data/mm9/chrM.fa \
+                        /ref/mouse_0
+OR
+
+ rsem-prepare-reference --gtf mm9.gtf \
+                        --transcript-to-gene-map knownIsoforms.txt \
+                        --star \
+                        --star-path /sw/STAR \
+                        -p 8 \
+                        /data/mm9
+                        /ref/mouse_0
+STAR genome index files will be saved under '/ref/'.
+
+4) Suppose we want to prepare references for prior-enhanced RSEM in the above example. In this scenario, both STAR and Bowtie are required to build genomic indices - STAR for RNA-seq reads and Bowtie for ChIP-seq reads. Assuming their executables are under '/sw/STAR' and '/sw/Bowtie', respectively. Also, assuming the mappability file for mouse genome is '/data/mm9.bigWig'. The command will be:
+
+ rsem-prepare-reference --gtf mm9.gtf \
+                        --transcript-to-gene-map knownIsoforms.txt \
+                        --star \
+                        --star-path /sw/STAR \
+                        -p 8 \
+                        --prep-pRSEM \
+                        --bowtie-path /sw/Bowtie \
+                        --mappability-bigwig-file /data/mm9.bigWig \
+                        /data/mm9/chr1.fa,/data/mm9/chr2.fa,...,/data/mm9/chrM.fa \
+                        /ref/mouse_0
+OR
+
+ rsem-prepare-reference --gtf mm9.gtf \
+                        --transcript-to-gene-map knownIsoforms.txt \
+                        --star \
+                        --star-path /sw/STAR \
+                        -p 8 \
+                        --prep-pRSEM \
+                        --bowtie-path /sw/Bowtie \
+                        --mappability-bigwig-file /data/mm9.bigWig \
+                        /data/mm9
+                        /ref/mouse_0
+Both STAR and Bowtie's index files will be saved under '/ref/'. Bowtie files will have name prefix 'mouse_0_prsem'
+
+5) Suppose we only have transcripts from EST tags stored in 'mm9.fasta' and isoform-gene information stored in 'mapping.txt'. We want to add 125bp long poly(A) tails to all transcripts. The reference_name is set as 'mouse_125'. In addition, we do not want to build Bowtie/Bowtie 2 indices, and will use an alternative aligner to align reads against either 'mouse_125.idx.fa' or 'mouse_125.idx.n2g.fa':
+
+ rsem-prepare-reference --transcript-to-gene-map mapping.txt \
+                        --polyA
+                        mm9.fasta \
+                        mouse_125
\ No newline at end of file
diff --git a/src/rsem/rsem_prepare_reference/script.sh b/src/rsem/rsem_prepare_reference/script.sh
new file mode 100644
index 00000000..806804d8
--- /dev/null
+++ b/src/rsem/rsem_prepare_reference/script.sh
@@ -0,0 +1,42 @@
+#!/bin/bash
+
+set -eo pipefail
+
+unset_if_false=( par_gff3_genes_as_transcripts par_polyA par_bowtie par_bowtie2 par_star par_hisat2_hca par_quiet par_prep_pRSEM )
+
+for par in "${unset_if_false[@]}"; do
+    test_val="${!par}"
+    [[ "$test_val" == "false" ]] && unset "$par"
+done
+
+# replace ';' with ','
+par_reference_fasta_files=$(echo "$par_reference_fasta_files" | tr ';' ',')
+par_gff3_rna_patterns=$(echo "$par_gff3_rna_patterns" | tr ';' ',')
+par_trusted_sources=$(echo "$par_trusted_sources" | tr ';' ',')
+
+echo "$par_reference_fasta_files"
+rsem-prepare-reference \
+    ${par_gtf:+--gtf "${par_gtf}"} \
+    ${par_gff3:+--gff3 "${par_gff3}"} \
+    ${par_gff3_rna_patterns:+--gff3-RNA-patterns "${par_gff3_rna_patterns}"} \
+    ${par_gff3_genes_as_transcripts:+--gff3-genes-as-transcripts} \
+    ${par_trusted_sources:+--trusted-sources "${par_trusted_sources}"} \
+    ${par_transcript_to_gene_map:+--transcript-to-gene-map "${par_transcript_to_gene_map}"} \
+    ${par_allele_to_gene_map:+--allele-to-gene-map "${par_allele_to_gene_map}"} \
+    ${par_polyA:+--polyA} \
+    ${par_polyA_length:+--polyA-length "${par_polyA_length}"} \
+    ${par_no_polyA_subset:+--no-polyA-subset "${par_no_polyA_subset}"} \
+    ${par_bowtie:+--bowtie} \
+    ${par_bowtie2:+--bowtie2} \
+    ${par_star:+--star} \
+    ${par_star_sjdboverhang:+--star-sjdboverhang "${par_star_sjdboverhang}"} \
+    ${par_hisat2_hca:+--hisat2-hca} \
+    ${par_quiet:+--quiet} \
+    ${par_prep_pRSEM:+--prep-pRSEM} \
+    ${par_mappability_bigwig_file:+--mappability-bigwig-file "${par_mappability_bigwig_file}"} \
+    ${meta_cpus:+--num-threads "${meta_cpus}"} \
+    "${par_reference_fasta_files}" \
+    "${par_reference_name}"
+
+mkdir -p "${par_output}"
+mv "${par_reference_name}".* "${par_output}/"
diff --git a/src/rsem/rsem_prepare_reference/test.sh b/src/rsem/rsem_prepare_reference/test.sh
new file mode 100644
index 00000000..b38dd0a9
--- /dev/null
+++ b/src/rsem/rsem_prepare_reference/test.sh
@@ -0,0 +1,37 @@
+
+#!/bin/bash
+
+set -eo pipefail
+
+echo ">>> Testing $meta_functionality_name"
+
+cat > genome.fasta <<'EOF'
+>Sheila
+GCTAGCTCAGAAAAaaaNNN
+EOF
+
+echo ">>> Prepare RSEM reference without gene annotations"
+"$meta_executable" \
+  --reference_fasta_files genome.fasta \
+  --reference_name test \
+  --output RSEM_index
+
+echo ">>> Checking whether output files exist"
+[ ! -d "RSEM_index" ] && echo "RSEM index does not exist!" && exit 1
+[ ! -f "RSEM_index/test.grp" ] && echo "test.grp does not exist!" && exit 1
+[ ! -f "RSEM_index/test.n2g.idx.fa" ] && echo "test.n2g.idx.fa does not exist!" && exit 1
+[ ! -f "RSEM_index/test.ti" ] && echo "test.ti does not exist!" && exit 1
+[ ! -f "RSEM_index/test.idx.fa" ] && echo "test.idx.fa does not exist!" && exit 1
+[ ! -f "RSEM_index/test.seq" ] && echo "test.seq does not exist!" && exit 1
+[ ! -f "RSEM_index/test.transcripts.fa" ] && echo "test.transcripts.fa does not exist!" && exit 1
+
+echo ">>> Checking whether output is correct"
+[ ! -s "RSEM_index/test.grp" ] && echo "test.grp is empty!" && exit 1
+[ ! -s "RSEM_index/test.ti" ] && echo "test.ti is empty!" && exit 1
+[ ! -s "RSEM_index/test.seq" ] && echo "test.seq is empty!" && exit 1
+grep -q "GCTAGCTCAGAAAAaaaNNN" "RSEM_index/test.transcripts.fa" || { echo "The content of file 'test.transcripts.fa' seems to be incorrect." && exit 1; }
+grep -q "GCTAGCTCAGAAAAAAANNN" "RSEM_index/test.idx.fa" || { echo "The content of file 'test.idx.fa' seems to be incorrect." && exit 1; }
+grep -q "GCTAGCTCAGAAAAAAAGGG" "RSEM_index/test.n2g.idx.fa" || { echo "The content of file 'test.n2g.idx.fa' seems to be incorrect." && exit 1; }
+
+echo "All tests succeeded!"
+exit 0

From 2d0a990cac4bf2d194ba9c610e00cc99b1c2c4c5 Mon Sep 17 00:00:00 2001
From: Theodoro Gasperin Terra Camargo
 <98555209+tgaspe@users.noreply.github.com>
Date: Mon, 2 Sep 2024 14:41:55 +0200
Subject: [PATCH 03/42] Bedtools merge (#118)

* Initial Commit

* Script file

* strand option tests

* -bed option test

* distance option test

* all test implemented

* Update CHANGELOG.md

* Update config.vsh.yaml

* adding more links

* exit on error

* suggested changes

* working on suggested changes

---------

Co-authored-by: Jakub Majercik <57993790+jakubmajercik@users.noreply.github.com>
---
 CHANGELOG.md                                  |   1 +
 src/bedtools/bedtools_merge/config.vsh.yaml   | 160 +++++++++++++
 src/bedtools/bedtools_merge/help.txt          |  85 +++++++
 src/bedtools/bedtools_merge/script.sh         |  35 +++
 src/bedtools/bedtools_merge/test.sh           | 222 ++++++++++++++++++
 .../bedtools_merge/test_data/feature.bam      | Bin 0 -> 287 bytes
 6 files changed, 503 insertions(+)
 create mode 100644 src/bedtools/bedtools_merge/config.vsh.yaml
 create mode 100644 src/bedtools/bedtools_merge/help.txt
 create mode 100644 src/bedtools/bedtools_merge/script.sh
 create mode 100644 src/bedtools/bedtools_merge/test.sh
 create mode 100644 src/bedtools/bedtools_merge/test_data/feature.bam

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3e9f40fc..8c1af805 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -29,6 +29,7 @@
 * `bedtools`:
   - `bedtools/bedtools_intersect`: Allows one to screen for overlaps between two sets of genomic features (PR #94).
   - `bedtools/bedtools_sort`: Sorts a feature file (bed/gff/vcf) by chromosome and other criteria (PR #98).
+  - `bedtools/bedtools_merge`: Merges overlapping BED/GFF/VCF entries into a single interval (PR #118).
   - `bedtools/bedtools_bamtofastq`: Convert BAM alignments to FASTQ files (PR #101).
   - `bedtools/bedtools_bedtobam`: Converts genomic feature records (bed/gff/vcf) to BAM format (PR #111).
  
diff --git a/src/bedtools/bedtools_merge/config.vsh.yaml b/src/bedtools/bedtools_merge/config.vsh.yaml
new file mode 100644
index 00000000..45e4a01d
--- /dev/null
+++ b/src/bedtools/bedtools_merge/config.vsh.yaml
@@ -0,0 +1,160 @@
+name: bedtools_merge
+namespace: bedtools
+description: | 
+  Merges overlapping BED/GFF/VCF entries into a single interval.
+links:
+  documentation: https://bedtools.readthedocs.io/en/latest/content/tools/merge.html
+  repository: https://github.com/arq5x/bedtools2
+  homepage: https://bedtools.readthedocs.io/en/latest/#
+  issue_tracker: https://github.com/arq5x/bedtools2/issues
+references:
+  doi: 10.1093/bioinformatics/btq033
+license: MIT
+requirements:
+  commands: [bedtools]
+authors:
+  - __merge__: /src/_authors/theodoro_gasperin.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --input
+        alternatives: -i
+        type: file
+        description: Input file (BED/GFF/VCF) to be merged.
+        required: true
+    
+  - name: Outputs
+    arguments:
+      - name: --output
+        type: file
+        direction: output
+        description: Output merged BED file to be written.
+        required: true
+
+  - name: Options
+    arguments:
+      - name: --strand
+        alternatives: -s
+        type: boolean_true
+        description: | 
+          Force strandedness. That is, only merge features
+          that are on the same strand.
+          - By default, merging is done without respect to strand.
+
+      - name: --specific_strand
+        alternatives: -S
+        type: string
+        choices: ["+", "-"]
+        description: | 
+          Force merge for one specific strand only.
+          Follow with + or - to force merge from only
+          the forward or reverse strand, respectively.
+          - By default, merging is done without respect to strand.
+
+      - name: --distance
+        alternatives: -d
+        type: integer
+        description: | 
+          Maximum distance between features allowed for features
+          to be merged.
+          - Def. 0. That is, overlapping & book-ended features are merged.
+          - (INTEGER)
+          - Note: negative values enforce the number of b.p. required for overlap.
+
+      - name: --columns
+        alternatives: -c
+        type: string
+        description: | 
+          Specify columns from the input file to operate upon (see --operation).
+          Default: 5.
+          Multiple columns can be specified in a comma-delimited list.
+
+      - name: --operation
+        alternatives: -o
+        type: string
+        description: | 
+          Specify the operation that should be applied to -c.
+          Valid operations:
+              sum, min, max, absmin, absmax,
+              mean, median, mode, antimode
+              stdev, sstdev
+              collapse (i.e., print a delimited list (duplicates allowed)), 
+              distinct (i.e., print a delimited list (NO duplicates allowed)), 
+              distinct_sort_num (as distinct, sorted numerically, ascending),
+              distinct_sort_num_desc (as distinct, sorted numerically, descending),
+              distinct_only (delimited list of only unique values),
+              count
+              count_distinct (i.e., a count of the unique values in the column), 
+              first (i.e., just the first value in the column), 
+              last (i.e., just the last value in the column), 
+          Default: sum
+          Multiple operations can be specified in a comma-delimited list.
+
+          If there is only one column, but multiple operations, all operations will be
+          applied on that column. Likewise, if there is only one operation, but
+          multiple columns, that operation will be applied to all columns.
+          Otherwise, the number of columns must match the number of operations,
+          and will be applied in respective order.
+          E.g., "-c 5,4,6 -o sum,mean,count" will give the sum of column 5,
+          the mean of column 4, and the count of column 6.
+          The order of output columns will match the ordering given in the command.
+      
+      - name: --delimiter
+        alternatives: -delim
+        type: string
+        description: | 
+          Specify a custom delimiter for the collapse operations.
+        example: "|"
+        default: ","
+
+      - name: --precision
+        alternatives: -prec
+        type: integer
+        description: | 
+          Sets the decimal precision for output (Default: 5).
+      
+      - name: --bed
+        type: boolean_true
+        description: | 
+          If using BAM input, write output as BED.
+
+      - name: --header
+        type: boolean_true
+        description: | 
+          Print the header from the input file prior to results.
+
+      - name: --no_buffer
+        alternatives: -nobuf
+        type: boolean_true
+        description: | 
+          Disable buffered output. Using this option will cause each line
+          of output to be printed as it is generated, rather than saved
+          in a buffer. This will make printing large output files 
+          noticeably slower, but can be useful in conjunction with
+          other software tools and scripts that need to process one
+          line of bedtools output at a time.
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - path: test_data
+
+engines:
+  - type: docker
+    image: debian:stable-slim
+    setup:
+      - type: apt
+        packages: [bedtools, procps]
+      - type: docker
+        run: |
+          echo "bedtools: \"$(bedtools --version | sed -n 's/^bedtools //p')\"" > /var/software_versions.txt
+
+runners:
+  - type: executable
+  - type: nextflow
\ No newline at end of file
diff --git a/src/bedtools/bedtools_merge/help.txt b/src/bedtools/bedtools_merge/help.txt
new file mode 100644
index 00000000..bc78fc67
--- /dev/null
+++ b/src/bedtools/bedtools_merge/help.txt
@@ -0,0 +1,85 @@
+```bash
+bedtools merge
+```
+
+Tool:    bedtools merge (aka mergeBed)
+Version: v2.30.0
+Summary: Merges overlapping BED/GFF/VCF entries into a single interval.
+
+Usage:   bedtools merge [OPTIONS] -i <bed/gff/vcf>
+
+Options: 
+	-s	Force strandedness.  That is, only merge features
+		that are on the same strand.
+		- By default, merging is done without respect to strand.
+
+	-S	Force merge for one specific strand only.
+		Follow with + or - to force merge from only
+		the forward or reverse strand, respectively.
+		- By default, merging is done without respect to strand.
+
+	-d	Maximum distance between features allowed for features
+		to be merged.
+		- Def. 0. That is, overlapping & book-ended features are merged.
+		- (INTEGER)
+		- Note: negative values enforce the number of b.p. required for overlap.
+
+	-c	Specify columns from the B file to map onto intervals in A.
+		Default: 5.
+		Multiple columns can be specified in a comma-delimited list.
+
+	-o	Specify the operation that should be applied to -c.
+		Valid operations:
+		    sum, min, max, absmin, absmax,
+		    mean, median, mode, antimode
+		    stdev, sstdev
+		    collapse (i.e., print a delimited list (duplicates allowed)), 
+		    distinct (i.e., print a delimited list (NO duplicates allowed)), 
+		    distinct_sort_num (as distinct, sorted numerically, ascending),
+		    distinct_sort_num_desc (as distinct, sorted numerically, desscending),
+		    distinct_only (delimited list of only unique values),
+		    count
+		    count_distinct (i.e., a count of the unique values in the column), 
+		    first (i.e., just the first value in the column), 
+		    last (i.e., just the last value in the column), 
+		Default: sum
+		Multiple operations can be specified in a comma-delimited list.
+
+		If there is only column, but multiple operations, all operations will be
+		applied on that column. Likewise, if there is only one operation, but
+		multiple columns, that operation will be applied to all columns.
+		Otherwise, the number of columns must match the the number of operations,
+		and will be applied in respective order.
+		E.g., "-c 5,4,6 -o sum,mean,count" will give the sum of column 5,
+		the mean of column 4, and the count of column 6.
+		The order of output columns will match the ordering given in the command.
+
+
+	-delim	Specify a custom delimiter for the collapse operations.
+		- Example: -delim "|"
+		- Default: ",".
+
+	-prec	Sets the decimal precision for output (Default: 5)
+
+	-bed	If using BAM input, write output as BED.
+
+	-header	Print the header from the A file prior to results.
+
+	-nobuf	Disable buffered output. Using this option will cause each line
+		of output to be printed as it is generated, rather than saved
+		in a buffer. This will make printing large output files 
+		noticeably slower, but can be useful in conjunction with
+		other software tools and scripts that need to process one
+		line of bedtools output at a time.
+
+	-iobuf	Specify amount of memory to use for input buffer.
+		Takes an integer argument. Optional suffixes K/M/G supported.
+		Note: currently has no effect with compressed files.
+
+Notes: 
+	(1) The input file (-i) file must be sorted by chrom, then start.
+
+
+
+
+***** ERROR: No input file given. Exiting. *****
diff --git a/src/bedtools/bedtools_merge/script.sh b/src/bedtools/bedtools_merge/script.sh
new file mode 100644
index 00000000..db50dd83
--- /dev/null
+++ b/src/bedtools/bedtools_merge/script.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# Exit on error
+set -eo pipefail
+
+# Unset parameters
+unset_if_false=(
+    par_strand
+    par_bed
+    par_header
+    par_no_buffer
+)
+
+for par in "${unset_if_false[@]}"; do
+    test_val="${!par}"
+    [[ "$test_val" == "false" ]] && unset $par
+done
+
+# Execute bedtools merge with the provided arguments
+bedtools merge \
+    ${par_strand:+-s} \
+    ${par_specific_strand:+-S "$par_specific_strand"} \
+    ${par_bed:+-bed} \
+    ${par_header:+-header} \
+    ${par_no_buffer:+-nobuf} \
+    ${par_distance:+-d "$par_distance"} \
+    ${par_columns:+-c "$par_columns"} \
+    ${par_operation:+-o "$par_operation"} \
+    ${par_delimiter:+-delim "$par_delimiter"} \
+    ${par_precision:+-prec "$par_precision"} \
+    -i "$par_input" \
+    > "$par_output"
diff --git a/src/bedtools/bedtools_merge/test.sh b/src/bedtools/bedtools_merge/test.sh
new file mode 100644
index 00000000..e2b46c15
--- /dev/null
+++ b/src/bedtools/bedtools_merge/test.sh
@@ -0,0 +1,222 @@
+#!/bin/bash
+
+# exit on error
+set -eo pipefail
+
+## VIASH START
+meta_executable="target/executable/bedtools/bedtools_merge/bedtools_merge"
+meta_resources_dir="src/bedtools/bedtools_merge"
+## VIASH END
+
+# directory of the bam file
+test_data="$meta_resources_dir/test_data"
+
+#############################################
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_identical_content() {
+  diff -a "$2" "$1" \
+    || (echo "Files are not identical!" && exit 1)
+}
+#############################################
+
+# Create directories for tests
+echo "Creating Test Data..."
+TMPDIR=$(mktemp -d "$meta_temp_dir/XXXXXX")
+function clean_up {
+  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
+}
+trap clean_up EXIT
+
+# Create and populate example files
+printf "chr1\t100\t200\nchr1\t150\t250\nchr1\t300\t400\n" > "$TMPDIR/featureA.bed"
+printf "chr1\t100\t200\ta1\t1\t+\nchr1\t180\t250\ta2\t2\t+\nchr1\t250\t500\ta3\t3\t-\nchr1\t501\t1000\ta4\t4\t+\n" > "$TMPDIR/featureB.bed"
+printf "chr1\t100\t200\ta1\t1.9\t+\nchr1\t180\t250\ta2\t2.5\t+\nchr1\t250\t500\ta3\t3.3\t-\nchr1\t501\t1000\ta4\t4\t+\n" > "$TMPDIR/feature_precision.bed"
+
+# Create and populate feature.gff file
+printf "##gff-version 3\n" > "$TMPDIR/feature.gff"
+printf "chr1\t.\tgene\t1000\t2000\t.\t+\t.\tID=gene1;Name=Gene1\n" >> "$TMPDIR/feature.gff"
+printf "chr1\t.\texon\t1000\t1200\t.\t+\t.\tID=exon1;Parent=transcript1\n" >> "$TMPDIR/feature.gff"
+printf "chr1\t.\tCDS\t1000\t1200\t.\t+\t0\tID=cds1;Parent=transcript1\n" >> "$TMPDIR/feature.gff"
+printf "chr1\t.\tCDS\t1500\t1700\t.\t+\t2\tID=cds2;Parent=transcript1\n" >> "$TMPDIR/feature.gff"
+printf "chr2\t.\texon\t1500\t1700\t.\t+\t.\tID=exon2;Parent=transcript1\n" >> "$TMPDIR/feature.gff"
+printf "chr3\t.\tmRNA\t1000\t2000\t.\t+\t.\tID=transcript1;Parent=gene1\n" >> "$TMPDIR/feature.gff"
+
+# Create expected output files
+printf "chr1\t100\t250\nchr1\t300\t400\n" > "$TMPDIR/expected.bed"
+printf "chr1\t100\t250\nchr1\t250\t500\nchr1\t501\t1000\n" > "$TMPDIR/expected_strand.bed"
+printf "chr1\t100\t250\nchr1\t501\t1000\n" > "$TMPDIR/expected_specific_strand.bed"
+printf "chr1\t128\t228\nchr1\t428\t528\n" > "$TMPDIR/expected_bam.bed"
+printf "chr1\t100\t400\n" > "$TMPDIR/expected_distance.bed"
+printf "chr1\t100\t500\t2\t1\t3\nchr1\t501\t1000\t4\t4\t4\n" > "$TMPDIR/expected_operation.bed"
+printf "chr1\t100\t500\ta1|a2|a3\nchr1\t501\t1000\ta4\n" > "$TMPDIR/expected_delim.bed"
+printf "chr1\t100\t500\t2.567\nchr1\t501\t1000\t4\n" > "$TMPDIR/expected_precision.bed"
+printf "##gff-version 3\nchr1\t999\t2000\nchr2\t1499\t1700\nchr3\t999\t2000\n" > "$TMPDIR/expected_header.bed"
+
+# Test 1: Default merge on BED file
+mkdir "$TMPDIR/test1" && pushd "$TMPDIR/test1" > /dev/null
+
+echo "> Run bedtools_merge on BED file"
+"$meta_executable" \
+  --input "../featureA.bed" \
+  --output "output.bed"
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected.bed"
+echo "- test1 succeeded -"
+
+popd > /dev/null
+
+# Test 2: strand option
+mkdir "$TMPDIR/test2" && pushd "$TMPDIR/test2" > /dev/null
+
+echo "> Run bedtools_merge on BED file with strand option"
+"$meta_executable" \
+  --input "../featureB.bed" \
+  --output "output.bed" \
+  --strand
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_strand.bed"
+echo "- test2 succeeded -"
+
+popd > /dev/null
+
+# Test 3: specific strand option
+mkdir "$TMPDIR/test3" && pushd "$TMPDIR/test3" > /dev/null
+
+echo "> Run bedtools_merge on BED file with specific strand option"
+"$meta_executable" \
+  --input "../featureB.bed" \
+  --output "output.bed" \
+  --specific_strand "+" 
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_specific_strand.bed"
+echo "- test3 succeeded -"
+
+popd > /dev/null
+
+# Test 4: BED option
+mkdir "$TMPDIR/test4" && pushd "$TMPDIR/test4" > /dev/null
+
+echo "> Run bedtools_merge on BAM file with BED option"
+"$meta_executable" \
+  --input "$test_data/feature.bam" \
+  --output "output.bed" \
+  --bed
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_bam.bed"
+echo "- test4 succeeded -"
+
+popd > /dev/null
+
+# Test 5: distance option
+mkdir "$TMPDIR/test5" && pushd "$TMPDIR/test5" > /dev/null
+
+echo "> Run bedtools_merge on BED file with distance option"
+"$meta_executable" \
+  --input "../featureA.bed" \
+  --output "output.bed" \
+  --distance -5 
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected.bed"
+echo "- test5 succeeded -"
+
+popd > /dev/null
+
+# Test 6: columns option & operation option
+mkdir "$TMPDIR/test6" && pushd "$TMPDIR/test6" > /dev/null
+
+echo "> Run bedtools_merge on BED file with columns & operation options"
+"$meta_executable" \
+  --input "../featureB.bed" \
+  --output "output.bed" \
+  --columns 5 \
+  --operation "mean,min,max"
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_operation.bed"
+echo "- test6 succeeded -"
+
+popd > /dev/null
+
+# Test 7: delimiter option
+mkdir "$TMPDIR/test7" && pushd "$TMPDIR/test7" > /dev/null
+
+echo "> Run bedtools_merge on BED file with delimiter option"
+"$meta_executable" \
+  --input "../featureB.bed" \
+  --output "output.bed" \
+  --columns 4 \
+  --operation "collapse" \
+  --delimiter "|"
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_delim.bed"
+echo "- test7 succeeded -"
+
+popd > /dev/null
+
+# Test 8: precision option
+mkdir "$TMPDIR/test8" && pushd "$TMPDIR/test8" > /dev/null
+
+echo "> Run bedtools_merge on BED file with precision option"
+"$meta_executable" \
+  --input "../feature_precision.bed" \
+  --output "output.bed" \
+  --columns 5 \
+  --operation "mean" \
+  --precision 4
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_precision.bed"
+echo "- test8 succeeded -"
+
+popd > /dev/null
+
+# Test 9: header option
+mkdir "$TMPDIR/test9" && pushd "$TMPDIR/test9" > /dev/null
+
+echo "> Run bedtools_merge on GFF file with header option"
+"$meta_executable" \
+  --input "../feature.gff" \
+  --output "output.gff" \
+  --header
+
+# checks
+assert_file_exists "output.gff"
+assert_file_not_empty "output.gff"
+assert_identical_content "output.gff" "../expected_header.bed"
+echo "- test9 succeeded -"
+
+popd > /dev/null
+
+echo "---- All tests succeeded! ----"
+exit 0
diff --git a/src/bedtools/bedtools_merge/test_data/feature.bam b/src/bedtools/bedtools_merge/test_data/feature.bam
new file mode 100644
index 0000000000000000000000000000000000000000..3d56a6317ba2f31f1df17f2f4247a9ad8a0585ae
GIT binary patch
literal 287
zcmb2|=3rp}f&Xj_PR>jWyBLa#zNFqcap1s%2M-TPK1)wsk$yn(O@8RC@Hz1zlV%-y
zD)8Xk%a=({pS%*9G=F}u%={^{geJY8GJ`uvIxKBTdd`CM15X8HPDs8<pE@Tn<-|*g
zoT>9>O`kt|Rd`U~p_h--Q&W->Gqci?Qd*Xrk(h0y?ChN!tn96l>NevPkFIWyj<2t;
z?#>yFeQB&o6V~$N9AEd*(efD2)ZkBvGRkb<n8kN8uC=zi_XOx-c{HcHGh9w?Xkb%#
z_Upao5ss(-+nEHcw;S!#Fg0|&KjC0wZ=)mssw7@zpLPa6S+@jPh95_57pGt`N1A~d
I6d+&#0J{lptpET3

literal 0
HcmV?d00001


From 7269ae4e7b4d4aa2b8e1631a216a5531eb7165b6 Mon Sep 17 00:00:00 2001
From: Theodoro Gasperin Terra Camargo
 <98555209+tgaspe@users.noreply.github.com>
Date: Mon, 2 Sep 2024 14:42:44 +0200
Subject: [PATCH 04/42] Bedtools links (#137)

* Initial Commit

* Tests

* Adding help file

* Adding more description

* Update test.sh

* Update help.txt

* Update CHANGELOG.md
---
 CHANGELOG.md                                |  1 +
 src/bedtools/bedtools_links/config.vsh.yaml | 91 +++++++++++++++++++
 src/bedtools/bedtools_links/help.txt        | 25 ++++++
 src/bedtools/bedtools_links/script.sh       | 14 +++
 src/bedtools/bedtools_links/test.sh         | 98 +++++++++++++++++++++
 5 files changed, 229 insertions(+)
 create mode 100644 src/bedtools/bedtools_links/config.vsh.yaml
 create mode 100644 src/bedtools/bedtools_links/help.txt
 create mode 100644 src/bedtools/bedtools_links/script.sh
 create mode 100644 src/bedtools/bedtools_links/test.sh
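
Note on how script.sh forwards optional arguments: each flag is wrapped in a `${var:+...}` expansion, so it is passed only when the matching `par_*` variable is set and non-empty. The sketch below shows the idiom in isolation. The function `show_args` is a hypothetical stand-in for the real tool, and the variable values are made up for illustration.

```bash
#!/bin/bash
# Hypothetical stand-in for the real command: prints each received
# argument on its own line, wrapped in brackets.
show_args() { printf '[%s]\n' "$@"; }

par_base_url="http://mymirror.myuniversity.edu"
par_organism=""            # empty: ':+' treats this like an unset variable
unset par_database         # unset: the whole expansion disappears
par_input="my genes.bed"   # a value with a space stays a single argument

args_seen=$(show_args \
    ${par_base_url:+-base "$par_base_url"} \
    ${par_organism:+-org "$par_organism"} \
    ${par_database:+-db "$par_database"} \
    -i "$par_input")
printf '%s\n' "$args_seen"
```

Only `-base` with its value and `-i` with its value reach the command. The inner double quotes keep a value containing spaces together even though the outer expansion is unquoted.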

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8c1af805..6dda7ab4 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -32,6 +32,7 @@
   - `bedtools/bedtools_merge`: Merges overlapping BED/GFF/VCF entries into a single interval (PR #118).
   - `bedtools/bedtools_bamtofastq`: Convert BAM alignments to FASTQ files (PR #101).
   - `bedtools/bedtools_bedtobam`: Converts genomic feature records (bed/gff/vcf) to BAM format (PR #111).
+  - `bedtools/bedtools_links`: Creates an HTML file with links to an instance of the UCSC Genome Browser for all features / intervals in a (bed/gff/vcf) file (PR #137).
  
 * `qualimap/qualimap_rnaseq`: RNA-seq QC analysis using qualimap (PR #74). 
 
diff --git a/src/bedtools/bedtools_links/config.vsh.yaml b/src/bedtools/bedtools_links/config.vsh.yaml
new file mode 100644
index 00000000..b4e43cd3
--- /dev/null
+++ b/src/bedtools/bedtools_links/config.vsh.yaml
@@ -0,0 +1,91 @@
+name: bedtools_links
+namespace: bedtools
+description: | 
+  Creates an HTML file with links to an instance of the UCSC Genome Browser for all features / intervals in a file. 
+  This is useful when one wants to manually inspect a large set of annotations or features.
+keywords: [Links, BED, GFF, VCF]
+links:
+  documentation: https://bedtools.readthedocs.io/en/latest/content/tools/links.html
+  repository: https://github.com/arq5x/bedtools2
+  homepage: https://bedtools.readthedocs.io/en/latest/#
+  issue_tracker: https://github.com/arq5x/bedtools2/issues
+references:
+  doi: 10.1093/bioinformatics/btq033
+license: MIT
+requirements:
+  commands: [bedtools]
+authors:
+  - __merge__: /src/_authors/theodoro_gasperin.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --input
+        alternatives: -i
+        type: file
+        description: Input file (bed/gff/vcf).
+        required: true
+    
+  - name: Outputs
+    arguments:
+      - name: --output
+        alternatives: -o
+        type: file
+        direction: output
+        description: Output HTML file to be written.
+
+  - name: Options
+    description: |
+      By default, the links created will point to human (hg18) UCSC browser.
+      If you have a local mirror, you can override this behavior by supplying
+      the --base_url, --organism, and --database options.
+
+      For example, if the URL of your local mirror for mouse MM9 is called: 
+      http://mymirror.myuniversity.edu, then you would use the following:
+      --base_url http://mymirror.myuniversity.edu
+      --organism mouse
+      --database mm9
+    arguments:
+      - name: --base_url
+        alternatives: -base
+        type: string
+        description: | 
+          The "basename" for the UCSC browser.
+        default: http://genome.ucsc.edu
+      
+      - name: --organism
+        alternatives: -org
+        type: string
+        description: | 
+          The organism (e.g. mouse, human). 
+        default: human
+
+      - name: --database
+        alternatives: -db
+        type: string
+        description: | 
+          The genome build. 
+        default: hg18
+      
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+
+engines:
+  - type: docker
+    image: debian:stable-slim
+    setup:
+      - type: apt
+        packages: [bedtools, procps]
+      - type: docker
+        run: |
+          echo "bedtools: \"$(bedtools --version | sed -n 's/^bedtools //p')\"" > /var/software_versions.txt
+
+runners:
+  - type: executable
+  - type: nextflow
diff --git a/src/bedtools/bedtools_links/help.txt b/src/bedtools/bedtools_links/help.txt
new file mode 100644
index 00000000..d848d989
--- /dev/null
+++ b/src/bedtools/bedtools_links/help.txt
@@ -0,0 +1,25 @@
+```
+bedtools links -h
+```
+
+Tool:    bedtools links (aka linksBed)
+Version: v2.30.0
+Summary: Creates HTML links to an UCSC Genome Browser from a feature file.
+
+Usage:   bedtools links [OPTIONS] -i <bed/gff/vcf> > out.html
+
+Options: 
+	-base	The browser basename.  Default: http://genome.ucsc.edu 
+	-org	The organism. Default: human
+	-db	The build.  Default: hg18
+
+Example: 
+	By default, the links created will point to human (hg18) UCSC browser.
+	If you have a local mirror, you can override this behavior by supplying
+	the -base, -org, and -db options.
+
+	For example, if the URL of your local mirror for mouse MM9 is called: 
+	http://mymirror.myuniversity.edu, then you would use the following:
+	-base http://mymirror.myuniversity.edu
+	-org mouse
+	-db mm9
diff --git a/src/bedtools/bedtools_links/script.sh b/src/bedtools/bedtools_links/script.sh
new file mode 100644
index 00000000..b8ee9a56
--- /dev/null
+++ b/src/bedtools/bedtools_links/script.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+set -eo pipefail
+
+# Execute bedtools links
+bedtools links \
+    ${par_base_url:+-base "$par_base_url"} \
+    ${par_organism:+-org "$par_organism"} \
+    ${par_database:+-db "$par_database"} \
+    -i "$par_input" \
+    > "$par_output"
diff --git a/src/bedtools/bedtools_links/test.sh b/src/bedtools/bedtools_links/test.sh
new file mode 100644
index 00000000..d79cbd6c
--- /dev/null
+++ b/src/bedtools/bedtools_links/test.sh
@@ -0,0 +1,98 @@
+#!/bin/bash
+
+# exit on error
+set -eo pipefail
+
+#############################################
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_identical_content() {
+  diff -a "$2" "$1" \
+    || (echo "Files are not identical!" && exit 1)
+}
+#############################################
+
+# Create directories for tests
+echo "Creating Test Data..."
+TMPDIR=$(mktemp -d "$meta_temp_dir/XXXXXX")
+function clean_up {
+  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
+}
+trap clean_up EXIT
+
+# Create test data
+cat <<EOF > "$TMPDIR/genes.bed"
+chr21	9928613	10012791	uc002yip.1	0	-
+chr21	9928613	10012791	uc002yiq.1	0	-
+chr21	9928613	10012791	uc002yir.1	0	-
+chr21	9928613	10012791	uc010gkv.1	0	-
+chr21	9928613	10061300	uc002yis.1	0	-
+chr21	10042683	10120796	uc002yit.1	0	-
+chr21	10042683	10120808	uc002yiu.1	0	-
+chr21	10079666	10120808	uc002yiv.1	0	-
+chr21	10080031	10081687	uc002yiw.1	0	-
+chr21	10081660	10120796	uc002yix.2	0	-
+EOF
+
+# Test 1: Default Use
+mkdir "$TMPDIR/test1" && pushd "$TMPDIR/test1" > /dev/null
+
+echo "> Run bedtools_links on BED file"
+"$meta_executable" \
+  --input "../genes.bed" \
+  --output "genes.html"
+
+# checks
+assert_file_exists "genes.html"
+assert_file_not_empty "genes.html"
+assert_file_contains "genes.html" "uc002yip.1"
+echo "- test1 succeeded -"
+
+popd > /dev/null
+
+# Test 2: Base URL
+mkdir "$TMPDIR/test2" && pushd "$TMPDIR/test2" > /dev/null
+
+echo "> Run bedtools_links with base option"
+"$meta_executable" \
+  --input "../genes.bed" \
+  --output "genes.html" \
+  --base_url "http://genome.ucsc.edu"
+
+# checks
+assert_file_exists "genes.html"
+assert_file_not_empty "genes.html"
+assert_file_contains "genes.html" "uc002yip.1"
+echo "- test2 succeeded -"
+
+popd > /dev/null
+
+# Test 3: Organism and Genome Database Build
+mkdir "$TMPDIR/test3" && pushd "$TMPDIR/test3" > /dev/null
+
+echo "> Run bedtools_links with organism option and genome database build"
+"$meta_executable" \
+  --input "../genes.bed" \
+  --output "genes.html" \
+  --base_url "http://genome.ucsc.edu" \
+  --organism "mouse" \
+  --database "mm9"
+
+# checks
+assert_file_exists "genes.html"
+assert_file_not_empty "genes.html"
+assert_file_contains "genes.html" "uc002yip.1"
+echo "- test3 succeeded -"
+
+popd > /dev/null
+
+echo "---- All tests succeeded! ----"
+exit 0

From 2b29a47575db9dbdff8448b287925c25d9a8b01d Mon Sep 17 00:00:00 2001
From: Theodoro Gasperin Terra Camargo
 <98555209+tgaspe@users.noreply.github.com>
Date: Mon, 2 Sep 2024 15:00:09 +0200
Subject: [PATCH 05/42] Bedtools GroupBY (#123)

* Initial Commit

* Update config.vsh.yaml

* config file

* script.sh

* adding some tests

* more test

* Update CHANGELOG.md

* deleted test_data

* bug fix

* Update config.vsh.yaml

* adding more links

* exit on error

* $TMPDIR

* Update script.sh

* Update config.vsh.yaml

* Suggested change on column option

---------

Co-authored-by: Jakub Majercik <57993790+jakubmajercik@users.noreply.github.com>
---
 CHANGELOG.md                                  |   2 +
 src/bedtools/bedtools_groupby/config.vsh.yaml | 155 ++++++++++++++
 src/bedtools/bedtools_groupby/help.txt        |  93 ++++++++
 src/bedtools/bedtools_groupby/script.sh       |  36 ++++
 src/bedtools/bedtools_groupby/test.sh         | 198 ++++++++++++++++++
 5 files changed, 484 insertions(+)
 create mode 100644 src/bedtools/bedtools_groupby/config.vsh.yaml
 create mode 100644 src/bedtools/bedtools_groupby/help.txt
 create mode 100644 src/bedtools/bedtools_groupby/script.sh
 create mode 100644 src/bedtools/bedtools_groupby/test.sh
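
Note on the "group by" idea behind this component: consecutive rows that share the same grouping columns are reduced to one row plus a summary value. The awk sketch below groups on columns 1 to 3 and sums column 5, roughly what `--groupby 1,2,3 --column 5 --operation sum` is meant to express. It assumes the input is already sorted by the grouping columns. The helper name `groupby_sum` and the sample records are hypothetical, and this is not the bedtools implementation.

```bash
#!/bin/bash
# Hypothetical helper (not part of this patch): group consecutive rows on
# columns 1-3 of a tab-separated file and sum column 5 for each group.
groupby_sum() {
  awk 'BEGIN { FS = OFS = "\t" }
    { key = $1 OFS $2 OFS $3 }
    # A new key closes the previous group: print it and reset the total.
    NR > 1 && key != prev { print prev, total; total = 0 }
    { total += $5; prev = key }
    END { if (NR > 0) print prev, total }' "$1"
}

# Made-up sample records, already sorted by the grouping columns.
tmp_bed=$(mktemp)
printf "chr1\t100\t200\ta1\t1\t+\nchr1\t100\t200\ta2\t2\t+\nchr1\t300\t400\ta3\t5\t-\n" > "$tmp_bed"
grouped=$(groupby_sum "$tmp_bed")
rm -f "$tmp_bed"
printf '%s\n' "$grouped"
```

The two rows for chr1:100-200 collapse into one row with a sum of 3, and the single chr1:300-400 row keeps its value of 5.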

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6dda7ab4..29fb8cfa 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -29,6 +29,7 @@
 * `bedtools`:
   - `bedtools/bedtools_intersect`: Allows one to screen for overlaps between two sets of genomic features (PR #94).
   - `bedtools/bedtools_sort`: Sorts a feature file (bed/gff/vcf) by chromosome and other criteria (PR #98).
+  - `bedtools/bedtools_groupby`: Summarizes a dataset column based upon common column groupings. Akin to the SQL "group by" command (PR #123).
   - `bedtools/bedtools_merge`: Merges overlapping BED/GFF/VCF entries into a single interval (PR #118).
   - `bedtools/bedtools_bamtofastq`: Convert BAM alignments to FASTQ files (PR #101).
   - `bedtools/bedtools_bedtobam`: Converts genomic feature records (bed/gff/vcf) to BAM format (PR #111).
@@ -38,6 +39,7 @@
 
 * `rsem/rsem_prepare_reference`: Prepare transcript references for RSEM (PR #89).
 
+
 ## MINOR CHANGES
 
 * `busco` components: update BUSCO to `5.7.1` (PR #72).
diff --git a/src/bedtools/bedtools_groupby/config.vsh.yaml b/src/bedtools/bedtools_groupby/config.vsh.yaml
new file mode 100644
index 00000000..89c4845b
--- /dev/null
+++ b/src/bedtools/bedtools_groupby/config.vsh.yaml
@@ -0,0 +1,155 @@
+name: bedtools_groupby
+namespace: bedtools
+description: |
+  Summarizes a dataset column based upon common column groupings. 
+  Akin to the SQL "group by" command.
+keywords: [groupby, BED]
+links:
+  documentation: https://bedtools.readthedocs.io/en/latest/content/tools/groupby.html
+  repository: https://github.com/arq5x/bedtools2
+  homepage: https://bedtools.readthedocs.io/en/latest/#
+  issue_tracker: https://github.com/arq5x/bedtools2/issues
+references:
+  doi: 10.1093/bioinformatics/btq033
+license: MIT
+requirements:
+  commands: [bedtools]
+authors:
+  - __merge__: /src/_authors/theodoro_gasperin.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --input
+        alternatives: -i
+        type: file
+        direction: input
+        description: |
+          The input BED file to be used.
+        required: true
+        example: input_a.bed
+        
+  - name: Outputs
+    arguments:
+      - name: --output
+        type: file
+        direction: output
+        description: | 
+          The output groupby BED file. 
+        required: true
+        example: output.bed
+  
+  - name: Options
+    arguments:
+      - name: --groupby
+        alternatives: [-g, -grp]
+        type: string
+        description: |
+          Specify the columns (1-based) for the grouping.
+          The columns must be comma separated.
+          - Default: 1,2,3  
+        required: true 
+
+      - name: --column
+        alternatives: [-c, -opCols]
+        type: integer
+        description: |
+          Specify the column (1-based) that should be summarized.
+        required: true   
+
+      - name: --operation
+        alternatives: [-o, -ops]
+        type: string
+        description: |
+          Specify the operation that should be applied to opCol.
+          Valid operations:
+              sum, count, count_distinct, min, max,
+              mean, median, mode, antimode,
+              stdev, sstdev (sample standard dev.),
+              collapse (i.e., print a comma separated list (duplicates allowed)), 
+              distinct (i.e., print a comma separated list (NO duplicates allowed)), 
+              distinct_sort_num (as distinct, but sorted numerically, ascending), 
+              distinct_sort_num_desc (as distinct, but sorted numerically, descending), 
+              concat   (i.e., merge values into a single, non-delimited string), 
+              freqdesc (i.e., print desc. list of values:freq)
+              freqasc (i.e., print asc. list of values:freq)
+              first (i.e., print first value)
+              last (i.e., print last value)
+          
+          Default value: sum   
+
+          If there is only one column, but multiple operations, all operations will be
+          applied on that column. Likewise, if there is only one operation, but
+          multiple columns, that operation will be applied to all columns.
+          Otherwise, the number of columns must match the number of operations,
+          and will be applied in respective order.
+          E.g., "-c 5,4,6 -o sum,mean,count" will give the sum of column 5,
+          the mean of column 4, and the count of column 6.
+          The order of output columns will match the ordering given in the command.
+
+      - name: --full
+        type: boolean_true
+        description: |
+          Print all columns from input file. The first line in the group is used.
+          Default: print only grouped columns.
+
+      - name: --inheader
+        type: boolean_true
+        description: |
+          Input file has a header line - the first line will be ignored.
+
+      - name: --outheader
+        type: boolean_true
+        description: |
+          Print header line in the output, detailing the column names. 
+          If the input file has headers (--inheader), the output file
+          will use the input's column names.
+          If the input file has no headers, the output file
+          will use "col_1", "col_2", etc. as the column names.
+      
+      - name: --header
+        type: boolean_true
+        description: Same as '--inheader --outheader'.
+
+      - name: --ignorecase
+        type: boolean_true
+        description: |
+          Group values regardless of upper/lower case.
+
+      - name: --precision
+        alternatives: -prec
+        type: integer
+        description: |
+          Sets the decimal precision for output. 
+        default: 5
+
+      - name: --delimiter
+        alternatives: -delim
+        type: string
+        description: |
+          Specify a custom delimiter for the collapse operations.
+        example: "|"
+        default: ","
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+
+engines:
+  - type: docker
+    image: debian:stable-slim
+    setup:
+      - type: apt
+        packages: [bedtools, procps]
+      - type: docker
+        run: |
+          echo "bedtools: \"$(bedtools --version | sed -n 's/^bedtools //p')\"" > /var/software_versions.txt
+
+runners:
+  - type: executable
+  - type: nextflow
diff --git a/src/bedtools/bedtools_groupby/help.txt b/src/bedtools/bedtools_groupby/help.txt
new file mode 100644
index 00000000..a631b4b1
--- /dev/null
+++ b/src/bedtools/bedtools_groupby/help.txt
@@ -0,0 +1,93 @@
+```bash
+bedtools groupby
+```
+
+Tool:    bedtools groupby 
+Version: v2.30.0
+Summary: Summarizes a dataset column based upon
+	 common column groupings. Akin to the SQL "group by" command.
+
+Usage:	 bedtools groupby -g [group_column(s)] -c [op_column(s)] -o [ops] 
+	 cat [FILE] | bedtools groupby -g [group_column(s)] -c [op_column(s)] -o [ops] 
+
+Options: 
+	-i		Input file. Assumes "stdin" if omitted.
+
+	-g -grp		Specify the columns (1-based) for the grouping.
+			The columns must be comma separated.
+			- Default: 1,2,3
+
+	-c -opCols	Specify the column (1-based) that should be summarized.
+			- Required.
+
+	-o -ops		Specify the operation that should be applied to opCol.
+			Valid operations:
+			    sum, count, count_distinct, min, max,
+			    mean, median, mode, antimode,
+			    stdev, sstdev (sample standard dev.),
+			    collapse (i.e., print a comma separated list (duplicates allowed)), 
+			    distinct (i.e., print a comma separated list (NO duplicates allowed)), 
+			    distinct_sort_num (as distinct, but sorted numerically, ascending), 
+			    distinct_sort_num_desc (as distinct, but sorted numerically, descending), 
+			    concat   (i.e., merge values into a single, non-delimited string), 
+			    freqdesc (i.e., print desc. list of values:freq)
+			    freqasc (i.e., print asc. list of values:freq)
+			    first (i.e., print first value)
+			    last (i.e., print last value)
+			- Default: sum
+
+		If there is only column, but multiple operations, all operations will be
+		applied on that column. Likewise, if there is only one operation, but
+		multiple columns, that operation will be applied to all columns.
+		Otherwise, the number of columns must match the the number of operations,
+		and will be applied in respective order.
+		E.g., "-c 5,4,6 -o sum,mean,count" will give the sum of column 5,
+		the mean of column 4, and the count of column 6.
+		The order of output columns will match the ordering given in the command.
+
+
+	-full		Print all columns from input file.  The first line in the group is used.
+			Default: print only grouped columns.
+
+	-inheader	Input file has a header line - the first line will be ignored.
+
+	-outheader	Print header line in the output, detailing the column names. 
+			If the input file has headers (-inheader), the output file
+			will use the input's column names.
+			If the input file has no headers, the output file
+			will use "col_1", "col_2", etc. as the column names.
+
+	-header		same as '-inheader -outheader'
+
+	-ignorecase	Group values regardless of upper/lower case.
+
+	-prec	Sets the decimal precision for output (Default: 5)
+
+	-delim	Specify a custom delimiter for the collapse operations.
+		- Example: -delim "|"
+		- Default: ",".
+
+Examples: 
+	$ cat ex1.out
+	chr1 10  20  A   chr1    15  25  B.1 1000    ATAT
+	chr1 10  20  A   chr1    25  35  B.2 10000   CGCG
+
+	$ groupBy -i ex1.out -g 1,2,3,4 -c 9 -o sum
+	chr1 10  20  A   11000
+
+	$ groupBy -i ex1.out -grp 1,2,3,4 -opCols 9,9 -ops sum,max
+	chr1 10  20  A   11000   10000
+
+	$ groupBy -i ex1.out -g 1,2,3,4 -c 8,9 -o collapse,mean
+	chr1 10  20  A   B.1,B.2,    5500
+
+	$ cat ex1.out | groupBy -g 1,2,3,4 -c 8,9 -o collapse,mean
+	chr1 10  20  A   B.1,B.2,    5500
+
+	$ cat ex1.out | groupBy -g 1,2,3,4 -c 10 -o concat
+	chr1 10  20  A   ATATCGCG
+
+Notes: 
+	(1)  The input file/stream should be sorted/grouped by the -grp. columns
+	(2)  If -i is unspecified, input is assumed to come from stdin.
+
diff --git a/src/bedtools/bedtools_groupby/script.sh b/src/bedtools/bedtools_groupby/script.sh
new file mode 100644
index 00000000..b8a40cdc
--- /dev/null
+++ b/src/bedtools/bedtools_groupby/script.sh
@@ -0,0 +1,36 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# Exit on error
+set -eo pipefail
+
+# Unset parameters
+unset_if_false=(
+    par_full
+    par_inheader
+    par_outheader
+    par_header
+    par_ignorecase
+)
+
+for par in "${unset_if_false[@]}"; do
+    test_val="${!par}"
+    [[ "$test_val" == "false" ]] && unset "$par"
+done
+
+bedtools groupby \
+    ${par_full:+-full} \
+    ${par_inheader:+-inheader} \
+    ${par_outheader:+-outheader} \
+    ${par_header:+-header} \
+    ${par_ignorecase:+-ignorecase} \
+    ${par_precision:+-prec "$par_precision"} \
+    ${par_delimiter:+-delim "$par_delimiter"} \
+    -i "$par_input" \
+    -g "$par_groupby" \
+    -c "$par_column" \
+    ${par_operation:+-o "$par_operation"} \
+    > "$par_output"
+    
\ No newline at end of file
diff --git a/src/bedtools/bedtools_groupby/test.sh b/src/bedtools/bedtools_groupby/test.sh
new file mode 100644
index 00000000..ce99a1ec
--- /dev/null
+++ b/src/bedtools/bedtools_groupby/test.sh
@@ -0,0 +1,198 @@
+#!/bin/bash
+
+# exit on error
+set -eo pipefail
+
+## VIASH START
+meta_executable="target/executable/bedtools/bedtools_groupby/bedtools_groupby"
+meta_resources_dir="src/bedtools/bedtools_groupby"
+## VIASH END
+
+#############################################
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_identical_content() {
+  diff -a "$2" "$1" \
+    || { echo "Files are not identical!" && exit 1; }
+}
+#############################################
+
+# Create directories for tests
+echo "Creating Test Data..."
+TMPDIR=$(mktemp -d "$meta_temp_dir/XXXXXX")
+function clean_up {
+  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
+}
+trap clean_up EXIT
+
+# Create and populate example.bed
+cat << EOF > "$TMPDIR/example.bed"
+# Header
+chr21	9719758	9729320	variant1	chr21	9719768	9721892	ALR/Alpha	1004	+
+chr21	9719758	9729320	variant1	chr21	9721905	9725582	ALR/Alpha	1010	+
+chr21	9719758	9729320	variant1	chr21	9725582	9725977	L1PA3	3288	+
+chr21	9719758	9729320	variant1	chr21	9726021	9729309	ALR/Alpha	1051	+
+chr21	9729310	9757478	variant2	chr21	9729320	9729809	L1PA3	3897	-
+chr21	9729310	9757478	variant2	chr21	9729809	9730866	L1P1	8367	+
+chr21	9729310	9757478	variant2	chr21	9730866	9734026	ALR/Alpha	1036	-
+chr21	9729310	9757478	variant2	chr21	9734037	9757471	ALR/Alpha	1182	-
+chr21	9795588	9796685	variant3	chr21	9795589	9795713	(GAATG)n	308	+
+chr21	9795588	9796685	variant3	chr21	9795736	9795894	(GAATG)n	683	+
+chr21	9795588	9796685	variant3	chr21	9795911	9796007	(GAATG)n	345	+
+chr21	9795588	9796685	variant3	chr21	9796028	9796187	(GAATG)n	756	+
+chr21	9795588	9796685	variant3	chr21	9796202	9796615	(GAATG)n	891	+
+chr21	9795588	9796685	variant3	chr21	9796637	9796824	(GAATG)n	621	+
+EOF
+
+# Create and populate expected output files for different tests
+cat << EOF > "$TMPDIR/expected.bed"
+chr21	9719758	9729320	6353
+chr21	9729310	9757478	14482
+chr21	9795588	9796685	3604
+EOF
+cat << EOF > "$TMPDIR/expected_max.bed"
+chr21	9719758	9729320	variant1	3288
+chr21	9729310	9757478	variant2	8367
+chr21	9795588	9796685	variant3	891
+EOF
+cat << EOF > "$TMPDIR/expected_full.bed"
+chr21	9719758	9729320	variant1	chr21	9719768	9721892	ALR/Alpha	1004	+	6353
+chr21	9729310	9757478	variant2	chr21	9729320	9729809	L1PA3	3897	-	14482
+chr21	9795588	9796685	variant3	chr21	9795589	9795713	(GAATG)n	308	+	3604
+EOF
+cat << EOF > "$TMPDIR/expected_delimited.bed"
+chr21	9719758	9729320	variant1	1004;1010;3288;1051
+chr21	9729310	9757478	variant2	3897;8367;1036;1182
+chr21	9795588	9796685	variant3	308;683;345;756;891;621
+EOF
+cat << EOF > "$TMPDIR/expected_precision.bed"
+chr21	9719758	9729320	variant1	1.6e+03
+chr21	9729310	9757478	variant2	3.6e+03
+chr21	9795588	9796685	variant3	6e+02
+EOF
+
+# Test 1: without operation option, default operation is sum
+mkdir "$TMPDIR/test1" && pushd "$TMPDIR/test1" > /dev/null
+
+echo "> Run bedtools groupby on BED file"
+"$meta_executable" \
+  --input "../example.bed" \
+  --groupby "1,2,3" \
+  --column "9" \
+  --output "output.bed"
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected.bed"
+echo "- test1 succeeded -"
+
+popd > /dev/null
+
+# Test 2: with operation max option
+mkdir "$TMPDIR/test2" && pushd "$TMPDIR/test2" > /dev/null
+
+echo "> Run bedtools groupby on BED file with max operation"
+"$meta_executable" \
+  --input "../example.bed" \
+  --groupby "1-4" \
+  --column "9" \
+  --operation "max" \
+  --output "output.bed"
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_max.bed"
+echo "- test2 succeeded -"
+
+popd > /dev/null
+
+# Test 3: full option
+mkdir "$TMPDIR/test3" && pushd "$TMPDIR/test3" > /dev/null
+
+echo "> Run bedtools groupby on BED file with full option"
+"$meta_executable" \
+  --input "../example.bed" \
+  --groupby "1-4" \
+  --column "9" \
+  --full \
+  --output "output.bed"
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_full.bed"
+echo "- test3 succeeded -"
+
+popd > /dev/null
+
+# Test 4: header option
+mkdir "$TMPDIR/test4" && pushd "$TMPDIR/test4" > /dev/null
+
+echo "> Run bedtools groupby on BED file with header option"
+"$meta_executable" \
+  --input "../example.bed" \
+  --groupby "1-4" \
+  --column "9" \
+  --header \
+  --output "output.bed"
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_file_contains "output.bed" "# Header"
+echo "- test4 succeeded -"
+
+popd > /dev/null
+
+# Test 5: Delimiter and collapse
+mkdir "$TMPDIR/test5" && pushd "$TMPDIR/test5" > /dev/null
+
+echo "> Run bedtools groupby on BED file with delimiter and collapse options"
+"$meta_executable" \
+  --input "../example.bed" \
+  --groupby "1-4" \
+  --column "9" \
+  --operation "collapse" \
+  --delimiter ";" \
+  --output "output.bed"
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_delimited.bed"
+echo "- test5 succeeded -"
+
+popd > /dev/null
+
+# Test 6: precision option
+mkdir "$TMPDIR/test6" && pushd "$TMPDIR/test6" > /dev/null
+
+echo "> Run bedtools groupby on BED file with precision option"
+"$meta_executable" \
+  --input "../example.bed" \
+  --groupby "1-4" \
+  --column "9" \
+  --operation "mean" \
+  --precision 2 \
+  --output "output.bed"
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_precision.bed"
+echo "- test6 succeeded -"
+
+popd > /dev/null
+
+echo "---- All tests succeeded! ----"
+exit 0

From f3e87e58c921a4ef59fe8946edcd066cdfc8de9c Mon Sep 17 00:00:00 2001
From: Theodoro Gasperin Terra Camargo
 <98555209+tgaspe@users.noreply.github.com>
Date: Mon, 2 Sep 2024 15:06:37 +0200
Subject: [PATCH 06/42] Bedtools bed12tobed6 (#140)

* Initial commit

* Update test.sh

* help file + n option

* adding n_score option

* small changes

* Update CHANGELOG.md

* Update CHANGELOG.md

---------

Co-authored-by: Jakub Majercik <57993790+jakubmajercik@users.noreply.github.com>
---
 CHANGELOG.md                                  |  1 +
 .../bedtools_bed12tobed6/config.vsh.yaml      | 67 +++++++++++++++
 src/bedtools/bedtools_bed12tobed6/help.txt    | 13 +++
 src/bedtools/bedtools_bed12tobed6/script.sh   | 15 ++++
 src/bedtools/bedtools_bed12tobed6/test.sh     | 85 +++++++++++++++++++
 5 files changed, 181 insertions(+)
 create mode 100644 src/bedtools/bedtools_bed12tobed6/config.vsh.yaml
 create mode 100644 src/bedtools/bedtools_bed12tobed6/help.txt
 create mode 100644 src/bedtools/bedtools_bed12tobed6/script.sh
 create mode 100644 src/bedtools/bedtools_bed12tobed6/test.sh
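
Note for reviewers: to make the conversion described in config.vsh.yaml
concrete, here is a minimal awk sketch of the BED12-to-BED6 split. It assumes
well-formed, tab-separated BED12 input. The function name
`bed12_to_bed6_sketch` is hypothetical and not part of this patch, the sketch
ignores the -n option, and it is an illustration only, not how bedtools
implements the conversion.

```bash
# Sketch: emit one BED6 line per block of each BED12 record.
# Assumption: columns 10-12 hold blockCount, blockSizes and blockStarts.
bed12_to_bed6_sketch() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    n = split($11, sizes, ",")   # blockSizes; may end with a trailing comma
    split($12, starts, ",")      # blockStarts, relative to chromStart
    for (i = 1; i <= $10 && i <= n; i++) {
      s = $2 + starts[i]         # absolute start of block i
      print $1, s, s + sizes[i], $4, $5, $6
    }
  }' "$@"
}
```

Fed a record spanning 100-200 with block sizes `10,20,` and block starts
`0,80,`, the sketch prints two BED6 lines covering 100-110 and 180-200, each
keeping the original name, score and strand.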

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 29fb8cfa..828253f0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -33,6 +33,7 @@
   - `bedtools/bedtools_merge`: Merges overlapping BED/GFF/VCF entries into a single interval (PR #118).
   - `bedtools/bedtools_bamtofastq`: Convert BAM alignments to FASTQ files (PR #101).
   - `bedtools/bedtools_bedtobam`: Converts genomic feature records (bed/gff/vcf) to BAM format (PR #111).
+  - `bedtools/bedtools_bed12tobed6`: Converts BED12 files to BED6 files (PR #140).
   - `bedtools/bedtools_links`: Creates an HTML file with links to an instance of the UCSC Genome Browser for all features / intervals in a (bed/gff/vcf) file (PR #137).
  
 * `qualimap/qualimap_rnaseq`: RNA-seq QC analysis using qualimap (PR #74). 
diff --git a/src/bedtools/bedtools_bed12tobed6/config.vsh.yaml b/src/bedtools/bedtools_bed12tobed6/config.vsh.yaml
new file mode 100644
index 00000000..8dd6328c
--- /dev/null
+++ b/src/bedtools/bedtools_bed12tobed6/config.vsh.yaml
@@ -0,0 +1,67 @@
+name: bedtools_bed12tobed6
+namespace: bedtools
+description: | 
+  Converts BED features in BED12 (a.k.a. "blocked" BED features such as genes) to discrete BED6 features.
+  For example, in the case of a gene with six exons, bed12ToBed6 would create six separate BED6 features (i.e., one for each exon).
+keywords: [Converts, BED12, BED6]
+links:
+  documentation: https://bedtools.readthedocs.io/en/latest/content/tools/bed12tobed6.html
+  repository: https://github.com/arq5x/bedtools2
+  homepage: https://bedtools.readthedocs.io/en/latest/#
+  issue_tracker: https://github.com/arq5x/bedtools2/issues
+references:
+  doi: 10.1093/bioinformatics/btq033
+license: MIT
+requirements:
+  commands: [bedtools]
+authors:
+  - __merge__: /src/_authors/theodoro_gasperin.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+
+  - name: Inputs
+    arguments:
+      - name: --input
+        alternatives: -i
+        type: file
+        description: Input BED12 file.
+        required: true
+    
+  - name: Outputs
+    arguments:
+      - name: --output
+        alternatives: -o
+        type: file
+        direction: output
+        description: Output BED6 file to be written.
+
+  - name: Options
+    arguments:
+      - name: --n_score
+        alternatives: -n
+        type: boolean_true
+        description: | 
+          Force the score to be the (1-based) block number from the BED12.
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+
+engines:
+  - type: docker
+    image: debian:stable-slim
+    setup:
+      - type: apt
+        packages: [bedtools, procps]
+      - type: docker
+        run: |
+          echo "bedtools: \"$(bedtools --version | sed -n 's/^bedtools //p')\"" > /var/software_versions.txt
+
+runners:
+  - type: executable
+  - type: nextflow
diff --git a/src/bedtools/bedtools_bed12tobed6/help.txt b/src/bedtools/bedtools_bed12tobed6/help.txt
new file mode 100644
index 00000000..17af6983
--- /dev/null
+++ b/src/bedtools/bedtools_bed12tobed6/help.txt
@@ -0,0 +1,13 @@
+```
+bedtools bed12tobed6 -h
+```
+
+Tool:    bedtools bed12tobed6 (aka bed12ToBed6)
+Version: v2.30.0
+Summary: Splits BED12 features into discrete BED6 features.
+
+Usage:   bedtools bed12tobed6 [OPTIONS] -i <bed12>
+
+Options: 
+	-n	Force the score to be the (1-based) block number from the BED12.
+
diff --git a/src/bedtools/bedtools_bed12tobed6/script.sh b/src/bedtools/bedtools_bed12tobed6/script.sh
new file mode 100644
index 00000000..bbfaddc6
--- /dev/null
+++ b/src/bedtools/bedtools_bed12tobed6/script.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+set -eo pipefail
+
+# Unset parameters
+[[ "$par_n_score" == "false" ]] && unset par_n_score
+
+# Execute bedtools bed12tobed6 conversion 
+bedtools bed12tobed6 \
+    ${par_n_score:+-n} \
+    -i "$par_input" \
+    > "$par_output"
diff --git a/src/bedtools/bedtools_bed12tobed6/test.sh b/src/bedtools/bedtools_bed12tobed6/test.sh
new file mode 100644
index 00000000..2ef596d9
--- /dev/null
+++ b/src/bedtools/bedtools_bed12tobed6/test.sh
@@ -0,0 +1,85 @@
+#!/bin/bash
+
+# exit on error
+set -eo pipefail
+
+#############################################
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_identical_content() {
+  diff -a "$2" "$1" \
+    || { echo "Files are not identical!" && exit 1; }
+}
+#############################################
+
+# Create directories for tests
+echo "Creating Test Data..."
+TMPDIR=$(mktemp -d "$meta_temp_dir/XXXXXX")
+function clean_up {
+  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
+}
+trap clean_up EXIT
+
+# Create example BED12 file
+cat <<EOF > "$TMPDIR/example.bed12"
+chr21	10079666	10120808	uc002yiv.1	0	-	10081686	1	0	1	2	0	6	0	8	0	4	528,91,101,215,	0,1930,39750,40927,
+chr21	10080031	10081687	uc002yiw.1	0	-	10080031	1	0	0	8	0	0	3	1	0	2	200,91,	0,1565,
+chr21	10081660	10120796	uc002yix.2	0	-	10081660	1	0	0	8	1	6	6	0	0	3	27,101,223,	0,37756,38913,
+EOF
+
+# Expected output bed6 file
+cat <<EOF > "$TMPDIR/expected.bed6"
+chr21	10079666	10120808	uc002yiv.1	0	-
+chr21	10080031	10081687	uc002yiw.1	0	-
+chr21	10081660	10120796	uc002yix.2	0	-
+EOF
+# Expected output bed6 file with -n option
+cat <<EOF > "$TMPDIR/expected_n.bed6"
+chr21	10079666	10120808	uc002yiv.1	1	-
+chr21	10080031	10081687	uc002yiw.1	1	-
+chr21	10081660	10120796	uc002yix.2	1	-
+EOF
+
+# Test 1: Default conversion BED12 to BED6
+mkdir "$TMPDIR/test1" && pushd "$TMPDIR/test1" > /dev/null
+
+echo "> Run bedtools_bed12tobed6 on BED12 file"
+"$meta_executable" \
+  --input "../example.bed12" \
+  --output "output.bed6"
+
+# checks
+assert_file_exists "output.bed6"
+assert_file_not_empty "output.bed6"
+assert_identical_content "output.bed6" "../expected.bed6"
+echo "- test1 succeeded -"
+
+popd > /dev/null
+
+# Test 2: Conversion BED12 to BED6 with -n option
+mkdir "$TMPDIR/test2" && pushd "$TMPDIR/test2" > /dev/null
+
+echo "> Run bedtools_bed12tobed6 on BED12 file with -n option"
+"$meta_executable" \
+  --input "../example.bed12" \
+  --output "output.bed6" \
+  --n_score
+
+# checks
+assert_file_exists "output.bed6"
+assert_file_not_empty "output.bed6"
+assert_identical_content "output.bed6" "../expected_n.bed6"
+echo "- test2 succeeded -"
+
+popd > /dev/null
+
+echo "---- All tests succeeded! ----"
+exit 0

From da3272d0118227ee788cd93b222201f557729397 Mon Sep 17 00:00:00 2001
From: Theodoro Gasperin Terra Camargo
 <98555209+tgaspe@users.noreply.github.com>
Date: Mon, 2 Sep 2024 15:25:41 +0200
Subject: [PATCH 07/42] Bcftools sort (#141)

* Initial commit

* Update on config file

* Update

* Update config.vsh.yaml

* Update config.vsh.yaml

* Update test.sh

* Update help.txt

* adding meta variables

* Adding test for bcf file

* Update CHANGELOG.md

* Update config.vsh.yaml

* requested changes

---------

Co-authored-by: Jakub Majercik <57993790+jakubmajercik@users.noreply.github.com>
---
 CHANGELOG.md                                  |   2 +
 src/bcftools/bcftools_sort/config.vsh.yaml    |  73 +++++++
 src/bcftools/bcftools_sort/help.txt           |  14 ++
 src/bcftools/bcftools_sort/script.sh          |  16 ++
 src/bcftools/bcftools_sort/test.sh            | 185 ++++++++++++++++++
 .../bcftools_sort/test_data/example.bcf       | Bin 0 -> 1183 bytes
 6 files changed, 290 insertions(+)
 create mode 100644 src/bcftools/bcftools_sort/config.vsh.yaml
 create mode 100644 src/bcftools/bcftools_sort/help.txt
 create mode 100644 src/bcftools/bcftools_sort/script.sh
 create mode 100644 src/bcftools/bcftools_sort/test.sh
 create mode 100644 src/bcftools/bcftools_sort/test_data/example.bcf
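
Note for reviewers: script.sh relies on the `${var:+word}` expansion so that an
optional flag is passed only when the corresponding parameter is set and
non-empty. The sketch below shows the idiom in isolation. `build_sort_args` is
a hypothetical helper, not part of this patch; it prints the argument list it
would hand to `bcftools sort`, one argument per line, instead of running the
tool.

```bash
# Sketch of the optional-flag idiom used in script.sh.
# ${var:+word} expands to word only if var is set and non-empty; the inner
# double quotes keep a value containing spaces as a single argument.
build_sort_args() {
  set -- -o "$par_output"
  set -- "$@" ${par_output_type:+-O "$par_output_type"}
  set -- "$@" ${meta_memory_mb:+-m "${meta_memory_mb}M"}
  set -- "$@" "$par_input"
  printf '%s\n' "$@"
}
```

With `par_output_type=v` and `meta_memory_mb` unset, the printed list contains
`-O` and `v` but no `-m` entry, which is the behaviour the component depends on
when a user omits an option.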

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 828253f0..11052113 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -40,6 +40,8 @@
 
 * `rsem/rsem_prepare_reference`: Prepare transcript references for RSEM (PR #89).
 
+* `bcftools`:
+  - `bcftools/bcftools_sort`: Sorts BCF/VCF files by position and other criteria (PR #141).
 
 ## MINOR CHANGES
 
diff --git a/src/bcftools/bcftools_sort/config.vsh.yaml b/src/bcftools/bcftools_sort/config.vsh.yaml
new file mode 100644
index 00000000..71a15309
--- /dev/null
+++ b/src/bcftools/bcftools_sort/config.vsh.yaml
@@ -0,0 +1,73 @@
+name: bcftools_sort
+namespace: bcftools
+description: | 
+  Sorts VCF/BCF files.
+keywords: [Sort, VCF, BCF]
+links:
+  homepage: https://samtools.github.io/bcftools/
+  documentation: https://samtools.github.io/bcftools/bcftools.html#sort
+  repository: https://github.com/samtools/bcftools
+  issue_tracker: https://github.com/samtools/bcftools/issues
+references:
+  doi: 10.1093/gigascience/giab008
+license: MIT/Expat, GNU
+requirements:
+  commands: [bcftools]
+authors:
+  - __merge__: /src/_authors/theodoro_gasperin.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --input
+        alternatives: -i
+        type: file
+        description: Input VCF/BCF file.
+        required: true
+    
+  - name: Outputs
+    arguments:
+      - name: --output
+        alternatives: -o
+        direction: output
+        type: file
+        description: Output sorted VCF/BCF file.
+        required: true
+         
+  - name: Options
+    arguments:
+      - name: --output_type
+        alternatives: -O
+        type: string
+        choices: [b, u, z, v]
+        description: | 
+          Compresses or uncompresses the output.
+          The options are:
+            b: compressed BCF, 
+            u: uncompressed BCF, 
+            z: compressed VCF, 
+            v: uncompressed VCF.        
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - path: test_data
+
+engines:
+  - type: docker
+    image: debian:stable-slim
+    setup:
+      - type: apt
+        packages: [bcftools, procps]
+      - type: docker
+        run: |
+          echo "bcftools: \"$(bcftools --version | grep 'bcftools' | sed -n 's/^bcftools //p')\"" > /var/software_versions.txt
+
+runners:
+  - type: executable
+  - type: nextflow
diff --git a/src/bcftools/bcftools_sort/help.txt b/src/bcftools/bcftools_sort/help.txt
new file mode 100644
index 00000000..3b5fa80b
--- /dev/null
+++ b/src/bcftools/bcftools_sort/help.txt
@@ -0,0 +1,14 @@
+```
+bcftools sort
+```
+
+About:   Sort VCF/BCF file.
+Usage:   bcftools sort [OPTIONS] <FILE.vcf>
+
+Options:
+    -m, --max-mem FLOAT[kMG]       maximum memory to use [768M]
+    -o, --output FILE              output file name [stdout]
+    -O, --output-type b|u|z|v      b: compressed BCF, u: uncompressed BCF, z: compressed VCF, v: uncompressed VCF [v]
+    -O, --output-type u|b|v|z[0-9] u/b: un/compressed BCF, v/z: un/compressed VCF, 0-9: compression level [v]
+    -T, --temp-dir DIR             temporary files [/tmp/bcftools.XXXXXX]
+
diff --git a/src/bcftools/bcftools_sort/script.sh b/src/bcftools/bcftools_sort/script.sh
new file mode 100644
index 00000000..e9afb223
--- /dev/null
+++ b/src/bcftools/bcftools_sort/script.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# Exit on error
+set -eo pipefail
+
+# Execute bcftools sort with the provided arguments
+bcftools sort \
+    -o "$par_output" \
+    ${par_output_type:+-O "$par_output_type"} \
+    ${meta_memory_mb:+-m "${meta_memory_mb}M"} \
+    ${meta_temp_dir:+-T "$meta_temp_dir"} \
+    "$par_input"
+
diff --git a/src/bcftools/bcftools_sort/test.sh b/src/bcftools/bcftools_sort/test.sh
new file mode 100644
index 00000000..f406b8e2
--- /dev/null
+++ b/src/bcftools/bcftools_sort/test.sh
@@ -0,0 +1,185 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# Exit on error
+set -eo pipefail
+
+test_data="$meta_resources_dir/test_data"
+
+#############################################
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_identical_content() {
+  diff -a "$2" "$1" \
+    || { echo "Files are not identical!" && exit 1; }
+}
+#############################################
+
+# Create directories for tests
+echo "Creating Test Data..."
+TMPDIR=$(mktemp -d "$meta_temp_dir/XXXXXX")
+function clean_up {
+  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
+}
+trap clean_up EXIT
+
+# Create test data
+cat <<EOF > "$TMPDIR/example.vcf"
+##fileformat=VCFv4.0
+##fileDate=20090805
+##source=myImputationProgramV3.1
+##reference=1000GenomesPilot-NCBI36
+##contig=<ID=19,length=58617616>
+##contig=<ID=20,length=58617616>
+##phasing=partial
+##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
+##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
+##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
+##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
+##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
+##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
+##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
+##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
+##FILTER=<ID=q10,Description="Quality below 10">
+##FILTER=<ID=s50,Description="Less than 50% of samples have data">
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
+##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
+##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
+##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
+##ALT=<ID=DEL:ME:ALU,Description="Deletion of ALU element">
+##ALT=<ID=CNV,Description="Copy number variable region">
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA00001	NA00002	NA00003
+19	112	.	A	G	10	.	.	GT:HQ	0|0:10,10	0|0:10,10	0/1:3,3
+19	111	.	A	C	9.6	.	.	GT:HQ	0|0:10,10	0|0:10,10	0/1:3,3
+20	1235237	.	T	.	.	.	.	GT	0/0	0|0	./.
+20	14370	rs6054257	G	A	29	PASS	NS=3;DP=14;AF=0.5;DB;H2	GT:GQ:DP:HQ	0|0:48:1:51,51	1|0:48:8:51,51	1/1:43:5:.,.
+20	17330	.	T	A	3	q10	NS=3;DP=11;AF=0.017	GT:GQ:DP:HQ	0|0:49:3:58,50	0|1:3:5:65,3	0/0:41:3:.,.
+20	1110696	rs6040355	A	G,T	67	PASS	NS=2;DP=10;AF=0.333,0.667;AA=T;DB	GT:GQ:DP:HQ	1|2:21:6:23,27	2|1:2:0:18,2	2/2:35:4:.,.
+20	1230237	.	T	.	47	PASS	NS=3;DP=13;AA=T	GT:GQ:DP:HQ	0|0:54:.:56,60	0|0:48:4:51,51	0/0:61:2:.,.
+20	1234567	microsat1	G	GA,GAC	50	PASS	NS=3;DP=9;AA=G;AN=6;AC=3,1	GT:GQ:DP	0/1:.:4	0/2:17:2	1/1:40:3
+EOF
+
+# Create expected output
+cat <<EOF > "$TMPDIR/expected_output.vcf"
+##fileformat=VCFv4.0
+##FILTER=<ID=PASS,Description="All filters passed">
+##fileDate=20090805
+##source=myImputationProgramV3.1
+##reference=1000GenomesPilot-NCBI36
+##contig=<ID=19,length=58617616>
+##contig=<ID=20,length=58617616>
+##phasing=partial
+##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
+##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
+##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
+##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
+##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
+##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
+##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
+##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
+##FILTER=<ID=q10,Description="Quality below 10">
+##FILTER=<ID=s50,Description="Less than 50% of samples have data">
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
+##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
+##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
+##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
+##ALT=<ID=DEL:ME:ALU,Description="Deletion of ALU element">
+##ALT=<ID=CNV,Description="Copy number variable region">
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA00001	NA00002	NA00003
+19	111	.	A	C	9.6	.	.	GT:HQ	0|0:10,10	0|0:10,10	0/1:3,3
+19	112	.	A	G	10	.	.	GT:HQ	0|0:10,10	0|0:10,10	0/1:3,3
+20	14370	rs6054257	G	A	29	PASS	NS=3;DP=14;AF=0.5;DB;H2	GT:GQ:DP:HQ	0|0:48:1:51,51	1|0:48:8:51,51	1/1:43:5:.,.
+20	17330	.	T	A	3	q10	NS=3;DP=11;AF=0.017	GT:GQ:DP:HQ	0|0:49:3:58,50	0|1:3:5:65,3	0/0:41:3:.,.
+20	1110696	rs6040355	A	G,T	67	PASS	NS=2;DP=10;AF=0.333,0.667;AA=T;DB	GT:GQ:DP:HQ	1|2:21:6:23,27	2|1:2:0:18,2	2/2:35:4:.,.
+20	1230237	.	T	.	47	PASS	NS=3;DP=13;AA=T	GT:GQ:DP:HQ	0|0:54:.:56,60	0|0:48:4:51,51	0/0:61:2:.,.
+20	1234567	microsat1	G	GA,GAC	50	PASS	NS=3;DP=9;AA=G;AN=6;AC=3,1	GT:GQ:DP	0/1:.:4	0/2:17:2	1/1:40:3
+20	1235237	.	T	.	.	.	.	GT	0/0	0|0	./.
+EOF
+
+cat <<EOF > "$TMPDIR/expected_bcf.vcf"
+##fileformat=VCFv4.0
+##FILTER=<ID=PASS,Description="All filters passed">
+##fileDate=20090805
+##source=myImputationProgramV3.1
+##reference=1000GenomesPilot-NCBI36
+##contig=<ID=19,length=58617616>
+##contig=<ID=20,length=58617616>
+##phasing=partial
+##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
+##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
+##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
+##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
+##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
+##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
+##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
+##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
+##FILTER=<ID=q10,Description="Quality below 10">
+##FILTER=<ID=s50,Description="Less than 50% of samples have data">
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
+##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
+##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
+##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
+##ALT=<ID=DEL:ME:ALU,Description="Deletion of ALU element">
+##ALT=<ID=CNV,Description="Copy number variable region">
+##bcftools_viewVersion=1.16+htslib-1.16
+##bcftools_viewCommand=view -O b -o example.bcf example.vcf.gz; Date=Mon Aug 26 13:00:22 2024
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA00001	NA00002	NA00003
+19	111	.	A	C	9.6	.	.	GT:HQ	0|0:10,10	0|0:10,10	0/1:3,3
+19	112	.	A	G	10	.	.	GT:HQ	0|0:10,10	0|0:10,10	0/1:3,3
+20	14370	rs6054257	G	A	29	PASS	NS=3;DP=14;AF=0.5;DB;H2	GT:GQ:DP:HQ	0|0:48:1:51,51	1|0:48:8:51,51	1/1:43:5:.,.
+20	17330	.	T	A	3	q10	NS=3;DP=11;AF=0.017	GT:GQ:DP:HQ	0|0:49:3:58,50	0|1:3:5:65,3	0/0:41:3:.,.
+20	1110696	rs6040355	A	G,T	67	PASS	NS=2;DP=10;AF=0.333,0.667;AA=T;DB	GT:GQ:DP:HQ	1|2:21:6:23,27	2|1:2:0:18,2	2/2:35:4:.,.
+20	1230237	.	T	.	47	PASS	NS=3;DP=13;AA=T	GT:GQ:DP:HQ	0|0:54:.:56,60	0|0:48:4:51,51	0/0:61:2:.,.
+20	1234567	microsat1	G	GA,GAC	50	PASS	NS=3;DP=9;AA=G;AN=6;AC=3,1	GT:GQ:DP	0/1:.:4	0/2:17:2	1/1:40:3
+20	1235237	.	T	.	.	.	.	GT	0/0	0|0	./.
+EOF
+
+
+# Test 1: Default Use
+mkdir "$TMPDIR/test1" && pushd "$TMPDIR/test1" > /dev/null
+
+echo "> Run bcftools_sort on VCF file"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "output.vcf" \
+  --output_type "v" \
+  &> /dev/null
+
+# checks
+assert_file_exists "output.vcf"
+assert_file_not_empty "output.vcf"
+assert_identical_content "output.vcf" "../expected_output.vcf"
+echo "- test1 succeeded -"
+
+popd > /dev/null
+
+# Test 2: BCF file input
+mkdir "$TMPDIR/test2" && pushd "$TMPDIR/test2" > /dev/null
+
+echo "> Run bcftools_sort on BCF file"
+"$meta_executable" \
+  --input "${test_data}/example.bcf" \
+  --output "output.vcf" \
+  --output_type "v" \
+  &> /dev/null
+
+# checks
+assert_file_exists "output.vcf"
+assert_file_not_empty "output.vcf"
+assert_identical_content "output.vcf" "../expected_bcf.vcf"
+echo "- test2 succeeded -"
+
+popd > /dev/null
+
+echo "---- All tests succeeded! ----"
+exit 0
diff --git a/src/bcftools/bcftools_sort/test_data/example.bcf b/src/bcftools/bcftools_sort/test_data/example.bcf
new file mode 100644
index 0000000000000000000000000000000000000000..d78ae010b4f1b6924f72b296a50e1cab5ca5f0ca
GIT binary patch
literal 1183
zcmV;Q1Yr9giwFb&00000{{{d;LjnMT1dWu>Z{tK1$6Y%~*IgFjN>zkFt7*dumQACv
z)1*JByX!bj8YxZFCR@0nNis<+`PbT$F697`koLlz6^H{jByRi%T=*A|_zyU9<izuj
zq;aU0P856WdGGVS@68)m^cv6ql;OBsZfKZv=y(=k>ZM-0+9}|*T&~t=c8@RA!$wtY
zYn@KMO8vfPxXf^DbxSi%1YV4KK5~ig)4^80QT<HCM{5zTN*GlYjQ8;#F2OtBnRtDw
zTGtKBoiGwm+MYA=h;>;k$Y9{nA@!&YHZsPzPHo4ce%mk|w%yVzjba&W`i{+vQ7B2?
z&zsa9v9Vg(E6clOxqK6U!A!xon#qBgJ0`wik5rd<%pfMbX|!r*p<AnyZ%wQo^;9|E
zy>_YEuo)duFIj}h1UW<0A(m@WAAM@DF@n^|)=ii>RBPP@+I1K)kv&Hf)1(2~uu(sl
z56}o|!@%E<f+)!#`YcG{4MDUJiu%sPW&wGQ%p>RnL`i>)v}QLBo)1jem?EE86gl1i
zlo219hR<gEUQCFnw(p*4mAY&kL3Wav)Rr=-nGRvM=27$LiwX43b;KzpIogt#4)b8!
zmq5^XKJ!3Ngp5GtWLA|K2+stOCCs4LX|<6>11o=QxOvo@80G-U%6>%LO%P9%|0SLf
z_msHG0y6^b9VL(G3mGRJ&nLt**fr`=(|L(x9J%c;x%s6fw<h&{*lS{=5`Gl)pC7mO
zm|RgbxM^)1lFlwpwQfkNckfW<dfaC*LtbW_-=%M65EQ2v!;~f1-J0-%lEHs^aDj3%
zm!!Ob0=s11rYSrYaF8NF$IXM&;{&bvJn3A8ehX$nwP6v#qeX4Dwdkw2E|X5W<6h6c
z5?91CNDt0|M@O(2I`;ZQ<~XMR`ISLmUP9*tUUES$KN&OMG<w^?eA!NSEQ{C!HJGFA
zGt@)d4x(R#9Z&$v`TVLsER4SV1chJuQ;@DrMo1|mxp;u_fubM<D?4jB{p8~8RBWGh
z#70%TIIfA1j(GlDYl=||ior<`qiTyS4a1P*yAt1vYurPQ<JP%oIPhNyXGr5Xo)h4*
zn$@y(PP)Iz3-C)x4>s3e2*RSD9sB+8{ksW2{T9FX!A^|rKlpP!ysvQ6uY;=qm(^@H
z^x}DVQep97Y`nQQ6Ze;vj)J6Kdwv<0c9ha?ww~1_PFj%!DJ8uvr8mItskF*T_aY4`
zA(fKek}!S(>nr!tpr;69IxQ@O_*I{Uc=k+)Lx3J2i!u2=kiXNvJ&f{7^E@zR;W4ZT
zX#toAiY82_)dl9yf4>tmUscjToXg(9#ZoCKZnhf==K-=Nr62wC)h|ci{PyuvDFX|u
zf%?>HCY{MZqzyg?Mj0-sX7;^xemZ@;5b4L?*W*Zs6(Oo-pnhJJ2O%-ukU)4DtMGc7
z;sLjJ7!;!d&KoiA)gLq5$xOKL+A{i{;}a%l>mSy&-MXgF@gm^skVQrkHieZSPKwWj
xMW0F^!Fq}p{sk88Nlg_A001A02m}BC000301^_}s0stET0{{R300000005SPILiP4

literal 0
HcmV?d00001


From e6627ec728761fe63fe75b0a10ba51da2bccec21 Mon Sep 17 00:00:00 2001
From: Theodoro Gasperin Terra Camargo
 <98555209+tgaspe@users.noreply.github.com>
Date: Tue, 3 Sep 2024 10:55:42 +0200
Subject: [PATCH 08/42] FastQC (#92)

* Starting Component

* Creating Files

* update on config file

* Update on test.sh

* Update on config

* Update script.sh

* Update on script.sh

* trying to figure multiple: true

* Update on script.sh

* Update on script

* Adding some tests

* More tests

* Update on script.sh

* Added more tests

* Small Changes

* Update test.sh

* Update on Script and Test

- change the --zip and --html to take wild card '*'

* Added one more test

* Removed test_data dir

* More description

* Update CHANGELOG.md

* Update on config and script

- meta_cpus
- meta_tmp_dir

* Bug Fixed

* unset_if_false

* Updating Tests

* Update script.sh

* debugging

* Minor changes

* Update config.vsh.yaml

* Update config.vsh.yaml

* Required Changes

- large changes on script.sh

* Update config.vsh.yaml

* Adding extra links

* tmpdir bug

* Updating tests

* minor changes

* Adding extra output options

--summary
--data

* minor change

* Update script.sh

* small change in config
---
 CHANGELOG.md               |   2 +
 src/fastqc/config.vsh.yaml | 209 +++++++++++++++++++++++++++++++++
 src/fastqc/help.txt        | 125 ++++++++++++++++++++
 src/fastqc/script.sh       |  86 ++++++++++++++
 src/fastqc/test.sh         | 235 +++++++++++++++++++++++++++++++++++++
 5 files changed, 657 insertions(+)
 create mode 100644 src/fastqc/config.vsh.yaml
 create mode 100644 src/fastqc/help.txt
 create mode 100644 src/fastqc/script.sh
 create mode 100644 src/fastqc/test.sh

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 11052113..98e78c17 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -43,6 +43,8 @@
 * `bcftools`:
   - `bcftools/bcftools_sort`: Sorts BCF/VCF files by position and other criteria (PR #141).
 
+* `fastqc`: High throughput sequence quality control analysis tool (PR #92).
+
 ## MINOR CHANGES
 
 * `busco` components: update BUSCO to `5.7.1` (PR #72).
diff --git a/src/fastqc/config.vsh.yaml b/src/fastqc/config.vsh.yaml
new file mode 100644
index 00000000..75b16f36
--- /dev/null
+++ b/src/fastqc/config.vsh.yaml
@@ -0,0 +1,209 @@
+name: fastqc
+description: FastQC - A high throughput sequence QC analysis tool.
+keywords: [Quality control, BAM, SAM, FASTQ]
+links:
+  homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
+  documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/
+  repository: https://github.com/s-andrews/FastQC
+  issue_tracker: https://github.com/s-andrews/FastQC/issues
+license: GPL-3.0, Apache-2.0
+authors:
+  - __merge__: /src/_authors/theodoro_gasperin.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --input
+        type: file
+        direction: input
+        multiple: true
+        description: | 
+          FASTQ file(s) to be analyzed.
+        required: true
+        example: input.fq
+        
+  - name: Outputs
+    description: |
+      At least one of the output options (--html, --zip, --summary, --data) must be used.
+    arguments:
+
+      - name: --html
+        type: file
+        direction: output
+        multiple: true
+        description: |
+          Create the HTML report of the results. 
+          '*' wild card must be provided in the output file name. 
+          Wild card will be replaced by the input file basename.
+          e.g. 
+            --input "sample_1.fq"
+            --html "*.html"
+            would create an output html file named sample_1.html
+        example: "*.html"
+      
+      - name: --zip
+        type: file
+        direction: output
+        multiple: true
+        description: |
+          Create the zip file(s) containing: html report, data, images, icons, summary, etc.
+          '*' wild card must be provided in the output file name.
+          Wild card will be replaced by the input basename.
+          e.g. 
+            --input "sample_1.fq"
+            --zip "*.zip"
+            would create an output zip file named sample_1.zip
+        example: "*.zip"   
+
+      - name: --summary
+        type: file
+        direction: output
+        multiple: true
+        description: |
+          Create the summary file(s).
+          '*' wild card must be provided in the output file name.
+          Wild card will be replaced by the input basename.
+          e.g. 
+            --input "sample_1.fq"
+            --summary "*_summary.txt"
+            would create an output summary.txt file named sample_1_summary.txt
+        example: "*_summary.txt"
+
+      - name: --data
+        type: file
+        direction: output
+        multiple: true
+        description: |
+          Create the data file(s).
+          '*' wild card must be provided in the output file name.
+          Wild card will be replaced by the input basename.
+          e.g. 
+            --input "sample_1.fq"
+            --data "*_data.txt"
+            would create an output data.txt file named sample_1_data.txt
+        example: "*_data.txt"
+
+  - name: Options
+    arguments:  
+      - name: --casava
+        type: boolean_true
+        description: | 
+          Files come from raw casava output. Files in the same sample
+          group (differing only by the group number) will be analysed
+          as a set rather than individually. Sequences with the filter
+          flag set in the header will be excluded from the analysis.
+          Files must have the same names given to them by casava
+          (including being gzipped and ending with .gz) otherwise they
+          won't be grouped together correctly.
+      
+      - name: --nano
+        type: boolean_true
+        description: |
+          Files come from nanopore sequences and are in fast5 format. In
+          this mode you can pass in directories to process and the program
+          will take in all fast5 files within those directories and produce
+          a single output file from the sequences found in all files.
+      
+      - name: --nofilter
+        type: boolean_true
+        description: |
+          If running with --casava then don't remove reads flagged by
+          casava as poor quality when performing the QC analysis.
+
+      - name: --nogroup
+        type: boolean_true
+        description: |
+          Disable grouping of bases for reads >50bp. 
+          All reports will show data for every base in the read. 
+          WARNING: Using this option will cause fastqc to crash 
+          and burn if you use it on really long reads, and your 
+          plots may end up a ridiculous size. You have been warned!
+
+      - name: --min_length
+        type: integer
+        description: |
+          Sets an artificial lower limit on the length of the 
+          sequence to be shown in the report. As long as you 
+          set this to a value greater than or equal to your longest
+          read length then this will be the sequence length used 
+          to create your read groups. This can be useful for making
+          directly comparable statistics from datasets with somewhat 
+          variable read lengths.
+        example: 0
+
+      - name: --format
+        alternatives: -f
+        type: string
+        description: |
+          Bypasses the normal sequence file format detection and 
+          forces the program to use the specified format. 
+          Valid formats are bam, sam, bam_mapped, sam_mapped, and fastq.
+        example: bam
+        
+      - name: --contaminants
+        alternatives: -c
+        type: file
+        description: |
+          Specifies a non-default file which contains the list 
+          of contaminants to screen overrepresented sequences against. 
+          The file must contain sets of named contaminants in the form
+          name[tab]sequence. Lines prefixed with a hash will be ignored.
+        example: contaminants.txt
+        
+      - name: --adapters
+        alternatives: -a
+        type: file
+        description: |
+          Specifies a non-default file which contains the list of 
+          adapter sequences which will be explicitly searched against 
+          the library. The file must contain sets of named adapters 
+          in the form name[tab]sequence. Lines prefixed with a hash will be ignored.
+        example: adapters.txt
+
+      - name: --limits
+        alternatives: -l
+        type: file
+        description: |
+          Specifies a non-default file which contains 
+          a set of criteria which will be used to determine 
+          the warn/error limits for the various modules. 
+          This file can also be used to selectively remove 
+          some modules from the output altogether. The format 
+          needs to mirror the default limits.txt file found in 
+          the Configuration folder.
+        example: limits.txt
+
+      - name: --kmers
+        alternatives: -k
+        type: integer
+        description: |
+          Specifies the length of Kmer to look for in the Kmer 
+          content module. Specified Kmer length must be between 
+          2 and 10. Default length is 7 if not specified.
+        example: 7
+        
+      - name: --quiet
+        alternatives: -q
+        type: boolean_true
+        description: |
+          Suppress all progress messages on stdout and only report errors.
+        
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+
+engines:
+  - type: docker
+    image: biocontainers/fastqc:v0.11.9_cv8
+    setup:
+      - type: docker
+        run: |
+          echo "fastqc: $(fastqc --version | sed -n 's/^FastQC //p')" > /var/software_versions.txt
+
+runners:
+  - type: executable
+  - type: nextflow
diff --git a/src/fastqc/help.txt b/src/fastqc/help.txt
new file mode 100644
index 00000000..502aebc0
--- /dev/null
+++ b/src/fastqc/help.txt
@@ -0,0 +1,125 @@
+```bash
+fastqc --help
+```
+
+            FastQC - A high throughput sequence QC analysis tool
+
+SYNOPSIS
+
+	fastqc seqfile1 seqfile2 .. seqfileN
+
+    fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam] 
+           [-c contaminant file] seqfile1 .. seqfileN
+
+DESCRIPTION
+
+    FastQC reads a set of sequence files and produces from each one a quality
+    control report consisting of a number of different modules, each one of 
+    which will help to identify a different potential type of problem in your
+    data.
+    
+    If no files to process are specified on the command line then the program
+    will start as an interactive graphical application.  If files are provided
+    on the command line then the program will run with no user interaction
+    required.  In this mode it is suitable for inclusion into a standardised
+    analysis pipeline.
+    
+    The options for the program as as follows:
+    
+    -h --help       Print this help file and exit
+    
+    -v --version    Print the version of the program and exit
+    
+    -o --outdir     Create all output files in the specified output directory.
+                    Please note that this directory must exist as the program
+                    will not create it.  If this option is not set then the 
+                    output file for each sequence file is created in the same
+                    directory as the sequence file which was processed.
+                    
+    --casava        Files come from raw casava output. Files in the same sample
+                    group (differing only by the group number) will be analysed
+                    as a set rather than individually. Sequences with the filter
+                    flag set in the header will be excluded from the analysis.
+                    Files must have the same names given to them by casava
+                    (including being gzipped and ending with .gz) otherwise they
+                    won't be grouped together correctly.
+                    
+    --nano          Files come from nanopore sequences and are in fast5 format. In
+                    this mode you can pass in directories to process and the program
+                    will take in all fast5 files within those directories and produce
+                    a single output file from the sequences found in all files.                    
+                    
+    --nofilter      If running with --casava then don't remove read flagged by
+                    casava as poor quality when performing the QC analysis.
+                   
+    --extract       If set then the zipped output file will be uncompressed in
+                    the same directory after it has been created.  By default
+                    this option will be set if fastqc is run in non-interactive
+                    mode.
+                    
+    -j --java       Provides the full path to the java binary you want to use to
+                    launch fastqc. If not supplied then java is assumed to be in
+                    your path.
+                   
+    --noextract     Do not uncompress the output file after creating it.  You
+                    should set this option if you do not wish to uncompress
+                    the output when running in non-interactive mode.
+                    
+    --nogroup       Disable grouping of bases for reads >50bp. All reports will
+                    show data for every base in the read.  WARNING: Using this
+                    option will cause fastqc to crash and burn if you use it on
+                    really long reads, and your plots may end up a ridiculous size.
+                    You have been warned!
+                    
+    --min_length    Sets an artificial lower limit on the length of the sequence
+                    to be shown in the report.  As long as you set this to a value
+                    greater or equal to your longest read length then this will be
+                    the sequence length used to create your read groups.  This can
+                    be useful for making directly comaparable statistics from 
+                    datasets with somewhat variable read lengths.
+                    
+    -f --format     Bypasses the normal sequence file format detection and
+                    forces the program to use the specified format.  Valid
+                    formats are bam,sam,bam_mapped,sam_mapped and fastq
+                    
+    -t --threads    Specifies the number of files which can be processed
+                    simultaneously.  Each thread will be allocated 250MB of
+                    memory so you shouldn't run more threads than your
+                    available memory will cope with, and not more than
+                    6 threads on a 32 bit machine
+                  
+    -c              Specifies a non-default file which contains the list of
+    --contaminants  contaminants to screen overrepresented sequences against.
+                    The file must contain sets of named contaminants in the
+                    form name[tab]sequence.  Lines prefixed with a hash will
+                    be ignored.
+
+    -a              Specifies a non-default file which contains the list of
+    --adapters      adapter sequences which will be explicity searched against
+                    the library. The file must contain sets of named adapters
+                    in the form name[tab]sequence.  Lines prefixed with a hash
+                    will be ignored.
+                    
+    -l              Specifies a non-default file which contains a set of criteria
+    --limits        which will be used to determine the warn/error limits for the
+                    various modules.  This file can also be used to selectively 
+                    remove some modules from the output all together.  The format
+                    needs to mirror the default limits.txt file found in the
+                    Configuration folder.
+                    
+   -k --kmers       Specifies the length of Kmer to look for in the Kmer content
+                    module. Specified Kmer length must be between 2 and 10. Default
+                    length is 7 if not specified.
+                    
+   -q --quiet       Supress all progress messages on stdout and only report errors.
+   
+   -d --dir         Selects a directory to be used for temporary files written when
+                    generating report images. Defaults to system temp directory if
+                    not specified.
+                    
+BUGS
+
+    Any bugs in fastqc should be reported either to simon.andrews@babraham.ac.uk
+    or in www.bioinformatics.babraham.ac.uk/bugzilla/
+                   
+    
diff --git a/src/fastqc/script.sh b/src/fastqc/script.sh
new file mode 100644
index 00000000..5cf55868
--- /dev/null
+++ b/src/fastqc/script.sh
@@ -0,0 +1,86 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# exit on error
+set -eo pipefail
+
+# Check whether all output arguments are empty; at least one must be passed.
+if [[ -z "$par_html" ]] && [[ -z "$par_zip" ]] && [[ -z "$par_summary" ]] && [[ -z "$par_data" ]]; then
+  echo "Error: At least one of the output arguments (--html, --zip, --summary, and --data) must be passed."
+  exit 1
+fi
+
+# unset flags
+unset_if_false=(
+  par_casava
+  par_nano
+  par_nofilter
+  par_extract
+  par_noextract
+  par_nogroup
+  par_quiet
+)
+
+for par in "${unset_if_false[@]}"; do
+    test_val="${!par}"
+    [[ "$test_val" == "false" ]] && unset $par
+done
+
+tmpdir=$(mktemp -d "${meta_temp_dir}/${meta_name}-XXXXXXXX")
+function clean_up {
+  rm -rf "$tmpdir"
+}
+trap clean_up EXIT
+
+# Create input array 
+IFS=";" read -ra input <<< "$par_input"
+
+# Run fastqc
+fastqc \
+  --extract \
+  ${par_casava:+--casava} \
+  ${par_nano:+--nano} \
+  ${par_nofilter:+--nofilter} \
+  ${par_nogroup:+--nogroup} \
+  ${par_min_length:+--min_length "$par_min_length"} \
+  ${par_format:+--format "$par_format"} \
+  ${par_contaminants:+--contaminants "$par_contaminants"} \
+  ${par_adapters:+--adapters "$par_adapters"} \
+  ${par_limits:+--limits "$par_limits"} \
+  ${par_kmers:+--kmers "$par_kmers"} \
+  ${par_quiet:+--quiet} \
+  ${meta_cpus:+--threads "$meta_cpus"} \
+  ${meta_temp_dir:+--dir "$meta_temp_dir"} \
+  --outdir "${tmpdir}" \
+  "${input[@]}"
+ 
+# Move output files
+for file in "${input[@]}"; do
+  # Remove everything after the first dot of the basename
+  sample_name=$(basename "${file}" | sed 's/\..*$//')
+  if [[ -n "$par_html" ]]; then
+    input_html="${tmpdir}/${sample_name}_fastqc.html"
+    html_file="${par_html//\*/$sample_name}"
+    mv "$input_html" "$html_file"
+  fi
+  if [[ -n "$par_zip" ]]; then
+    input_zip="${tmpdir}/${sample_name}_fastqc.zip"
+    zip_file="${par_zip//\*/$sample_name}"
+    mv "$input_zip" "$zip_file"
+  fi
+  if [[ -n "$par_summary" ]]; then
+    summary_file="${tmpdir}/${sample_name}_fastqc/summary.txt"
+    new_summary="${par_summary//\*/$sample_name}"
+    mv "$summary_file" "$new_summary"
+  fi
+  if [[ -n "$par_data" ]]; then
+    data_file="${tmpdir}/${sample_name}_fastqc/fastqc_data.txt"
+    new_data="${par_data//\*/$sample_name}"
+    mv "$data_file" "$new_data"
+  fi
+  # Remove the extracted directory
+  rm -r "${tmpdir}/${sample_name}_fastqc"
+done
+
diff --git a/src/fastqc/test.sh b/src/fastqc/test.sh
new file mode 100644
index 00000000..8c581ac8
--- /dev/null
+++ b/src/fastqc/test.sh
@@ -0,0 +1,235 @@
+#!/bin/bash
+
+# exit on error
+set -eo pipefail
+
+## VIASH START
+# meta_executable="target/executable/fastqc"
+# meta_resources_dir="src/fastqc"
+## VIASH END
+
+#############################################
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_identical_content() {
+  diff -a "$2" "$1" \
+    || { echo "Files are not identical!" && exit 1; }
+}
+#############################################
+
+# Create directories for tests
+echo "Creating Test Data..."
+TMPDIR=$(mktemp -d "$meta_temp_dir/XXXXXX")
+function clean_up {
+  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
+}
+trap clean_up EXIT
+
+# Create and populate the FASTQ inputs (input_1.fq and input_2.fq)
+cat > "$TMPDIR/input_1.fq" <<EOL
+@HWI-ST330:304:H045HADXX:1:1101:1111:61397
+CACTTGTAAGGGCAGGCCCCCTTCACCCTCCCGCTCCTGGGGGANNNNNNNNNNANNNCGAGGCCCTGGGGTAGAGGGNNNNNNNNNNNNNNGATCTTGG
++
+@?@DDDDDDHHH?GH:?FCBGGB@C?DBEGIIIIAEF;FCGGI#########################################################
+EOL
+
+cat > "$TMPDIR/input_2.fq" <<EOL
+@HWI-ST330:304:H045HADXX:1:1101:1111:61397
+CACTTGTAAGGGCAGGCCCCCTTCACCCTCCCGCTCCTGGGGGANNNNNNNNNNANNNCGAGGCCCTGGGGTAGAGGGNNNNNNNNNNNNNNGATCTTGG
++
+@?@DDDDDDHHH?GH:?FCBGGB@C?DBEGIIIIAEF;FCGGI#########################################################
+EOL
+
+# Create and populate contaminants.txt
+printf "contaminant_sequence1\tCACTTGTAAGGGCAGGCCCCCTTCACCCTCCCGCTCCTGGGGGA\n" > "$TMPDIR/contaminants.txt"
+printf "contaminant_sequence2\tGATCTTGG\n" >> "$TMPDIR/contaminants.txt"
+
+# Create and populate SAM file 
+printf "@HD\tVN:1.0\tSO:unsorted\n" > "$TMPDIR/example.sam"
+printf "@SQ\tSN:chr1\tLN:248956422\n" >> "$TMPDIR/example.sam"
+printf "@SQ\tSN:chr2\tLN:242193529\n" >> "$TMPDIR/example.sam"
+printf "@PG\tID:bowtie2\tPN:bowtie2\tVN:2.3.4.1\tCL:\"/usr/bin/bowtie2-align-s --wrapper basic-0 -x genome -U reads.fq -S output.sam\"\n" >> "$TMPDIR/example.sam"
+printf "read1\t0\tchr1\t100\t255\t50M\t*\t0\t0\tACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT\tIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\tAS:i:-10\tXN:i:0\tXM:i:0\tXO:i:0\tXG:i:0\tNM:i:0\tMD:Z:50\tYT:Z:UU\n" >> "$TMPDIR/example.sam"
+printf "read2\t0\tchr2\t150\t255\t50M\t*\t0\t0\tTGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC\tIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\tAS:i:-8\tXN:i:0\tXM:i:0\tXO:i:0\tXG:i:0\tNM:i:0\tMD:Z:50\tYT:Z:UU\n" >> "$TMPDIR/example.sam"
+printf "read3\t16\tchr1\t200\t255\t50M\t*\t0\t0\tGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTA\tIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\tAS:i:-12\tXN:i:0\tXM:i:0\tXO:i:0\tXG:i:0\tNM:i:0\tMD:Z:50\tYT:Z:UU" >> "$TMPDIR/example.sam"
+
+cat > "$TMPDIR/expected_summary.txt" <<EOL
+PASS	Basic Statistics	input_1.fq
+PASS	Per base sequence quality	input_1.fq
+FAIL	Per sequence quality scores	input_1.fq
+FAIL	Per base sequence content	input_1.fq
+FAIL	Per sequence GC content	input_1.fq
+FAIL	Per base N content	input_1.fq
+PASS	Sequence Length Distribution	input_1.fq
+PASS	Sequence Duplication Levels	input_1.fq
+FAIL	Overrepresented sequences	input_1.fq
+PASS	Adapter Content	input_1.fq
+EOL
+
+cat > "$TMPDIR/expected_summary2.txt" <<EOL
+PASS	Basic Statistics	input_2.fq
+PASS	Per base sequence quality	input_2.fq
+FAIL	Per sequence quality scores	input_2.fq
+FAIL	Per base sequence content	input_2.fq
+FAIL	Per sequence GC content	input_2.fq
+FAIL	Per base N content	input_2.fq
+PASS	Sequence Length Distribution	input_2.fq
+PASS	Sequence Duplication Levels	input_2.fq
+FAIL	Overrepresented sequences	input_2.fq
+PASS	Adapter Content	input_2.fq
+EOL
+
+cat > "$TMPDIR/expected_summary_sam.txt" <<EOL
+PASS	Basic Statistics	example.sam
+PASS	Per base sequence quality	example.sam
+FAIL	Per sequence quality scores	example.sam
+FAIL	Per base sequence content	example.sam
+WARN	Per sequence GC content	example.sam
+PASS	Per base N content	example.sam
+WARN	Sequence Length Distribution	example.sam
+PASS	Sequence Duplication Levels	example.sam
+FAIL	Overrepresented sequences	example.sam
+PASS	Adapter Content	example.sam
+EOL
+
+# Test 1: Run fastqc with default parameters
+mkdir "$TMPDIR/test1" && pushd "$TMPDIR/test1" > /dev/null
+
+echo "-> Run Test1: one input"
+"$meta_executable" \
+  --input "../input_1.fq" \
+  --html "*_fastqc.html" \
+  --zip "*_fastqc.zip" \
+  --summary "*_summary.txt" \
+  --data "*_data.txt" \
+  --quiet
+
+assert_file_exists "input_1_fastqc.html"
+assert_file_exists "input_1_fastqc.zip"
+assert_file_exists "input_1_summary.txt"
+assert_file_not_empty "input_1_fastqc.html"
+assert_file_not_empty "input_1_fastqc.zip"
+assert_identical_content "input_1_summary.txt" "../expected_summary.txt"
+echo "- test succeeded -"
+
+popd > /dev/null
+
+
+# Test 2: Run fastqc with multiple inputs
+mkdir "$TMPDIR/test2" && pushd "$TMPDIR/test2" > /dev/null
+
+echo "-> Run Test2: two inputs"
+"$meta_executable" \
+  --input "../input_1.fq" \
+  --input "../input_2.fq" \
+  --html "*_fastqc.html" \
+  --zip "*_fastqc.zip" \
+  --summary "*_summary.txt" \
+  --data "*_data.txt" \
+  --quiet
+
+# File 1
+assert_file_exists "input_1_fastqc.html"
+assert_file_exists "input_1_fastqc.zip"
+assert_file_exists "input_1_summary.txt"
+assert_file_not_empty "input_1_fastqc.html"
+assert_file_not_empty "input_1_fastqc.zip"
+assert_identical_content "input_1_summary.txt" "../expected_summary.txt"
+# File 2
+assert_file_exists "input_2_fastqc.html"
+assert_file_exists "input_2_fastqc.zip"
+assert_file_exists "input_2_summary.txt"
+assert_file_not_empty "input_2_fastqc.html"
+assert_file_not_empty "input_2_fastqc.zip"
+assert_identical_content "input_2_summary.txt" "../expected_summary2.txt"
+echo "- test succeeded -"
+
+popd > /dev/null
+
+# Test 3: Run fastqc with contaminants
+mkdir "$TMPDIR/test3" && pushd "$TMPDIR/test3" > /dev/null
+
+echo "-> Run Test3: contaminants"
+"$meta_executable" \
+  --input "../input_1.fq" \
+  --contaminants "../contaminants.txt" \
+  --html "*_fastqc.html" \
+  --zip "*_fastqc.zip" \
+  --summary "*_summary.txt" \
+  --data "*_data.txt" \
+  --quiet
+
+assert_file_exists "input_1_fastqc.html"
+assert_file_exists "input_1_fastqc.zip"
+assert_file_exists "input_1_summary.txt"
+assert_file_not_empty "input_1_fastqc.html"
+assert_file_not_empty "input_1_fastqc.zip"
+assert_identical_content "input_1_summary.txt" "../expected_summary.txt"
+assert_file_contains "input_1_data.txt" "contaminant"
+echo "- test succeeded -"
+
+popd > /dev/null
+
+# Test 4: Run fastqc with sam file
+mkdir "$TMPDIR/test4" && pushd "$TMPDIR/test4" > /dev/null
+
+echo "-> Run Test4: sam file"
+"$meta_executable" \
+  --input "../example.sam" \
+  --format "sam" \
+  --html "*_fastqc.html" \
+  --zip "*_fastqc.zip" \
+  --summary "*_summary.txt" \
+  --data "*_data.txt" \
+  --quiet
+
+assert_file_exists "example_fastqc.html"
+assert_file_exists "example_fastqc.zip"
+assert_file_exists "example_summary.txt"
+assert_file_not_empty "example_fastqc.html"
+assert_file_not_empty "example_fastqc.zip"
+assert_identical_content "example_summary.txt" "../expected_summary_sam.txt"
+echo "- test succeeded -"
+
+popd > /dev/null
+
+# Test 5: Run fastqc with multiple options
+mkdir "$TMPDIR/test5" && pushd "$TMPDIR/test5" > /dev/null
+
+echo "-> Run Test5: multiple options"
+"$meta_executable" \
+  --input "../input_1.fq" \
+  --contaminants "../contaminants.txt" \
+  --format "fastq" \
+  --nofilter \
+  --nogroup \
+  --min_length 10 \
+  --kmers 5 \
+  --html "*_fastqc.html" \
+  --zip "*_fastqc.zip" \
+  --summary "*_summary.txt" \
+  --data "*_data.txt" \
+  --quiet
+# --casava \
+
+assert_file_exists "input_1_fastqc.html"
+assert_file_exists "input_1_fastqc.zip"
+assert_file_exists "input_1_summary.txt"
+assert_file_not_empty "input_1_fastqc.html"
+assert_file_not_empty "input_1_fastqc.zip"
+assert_identical_content "input_1_summary.txt" "../expected_summary.txt"
+assert_file_contains "input_1_data.txt" "contaminant"
+echo "- test succeeded -"
+
+popd > /dev/null
+
+echo "All tests succeeded!"
+exit 0

From 99dec5923bfb3da165601a3f13502d498395b14d Mon Sep 17 00:00:00 2001
From: Toni Verbeiren <toni.verbeiren@gmail.com>
Date: Fri, 6 Sep 2024 23:46:11 +0200
Subject: [PATCH 09/42] Bedtools genomecov (#150 and #128)

* Initial commit

* Update config.vsh.yaml

* Update script.sh

* update on test.sh

* bug fixing

* adding ibam option tests

* depthzero and strand option tests

* 5prime and max tests

* more tests

* Changelog

* Update config.vsh.yaml

* Update config.vsh.yaml

* Update script.sh

* Update test.sh

* TMPDIR

* Unset Variables

* par_trackopts multiple: true

* Minor update to CHANGELOG

---------

Co-authored-by: tgaspe <theodorogtc@gmail.com>
---
 CHANGELOG.md                                  |   2 +
 .../bedtools_genomecov/config.vsh.yaml        | 208 +++++++++++
 src/bedtools/bedtools_genomecov/help.txt      | 101 ++++++
 src/bedtools/bedtools_genomecov/script.sh     |  55 +++
 src/bedtools/bedtools_genomecov/test.sh       | 333 ++++++++++++++++++
 .../bedtools_genomecov/test_data/example.bam  | Bin 0 -> 334 bytes
 6 files changed, 699 insertions(+)
 create mode 100644 src/bedtools/bedtools_genomecov/config.vsh.yaml
 create mode 100644 src/bedtools/bedtools_genomecov/help.txt
 create mode 100644 src/bedtools/bedtools_genomecov/script.sh
 create mode 100644 src/bedtools/bedtools_genomecov/test.sh
 create mode 100644 src/bedtools/bedtools_genomecov/test_data/example.bam
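
Reviewer note on the flag handling in script.sh: the component relies on boolean parameters arriving as the strings "true" or "false", unsets the false ones, and then lets `${var:+flag}` expand to nothing for unset values. The sketch below reproduces that idea in isolation. The function name `build_genomecov_flags` and its positional arguments are hypothetical and only serve the illustration; they are not part of this patch.

```shell
#!/bin/bash
# Hypothetical helper (not in the patch): map Viash-style par_* values to
# bedtools genomecov flags using the unset-if-false pattern from script.sh.
build_genomecov_flags() {
  local par_depth="$1" par_bed_graph="$2" par_scale="$3"
  local par

  # Drop boolean parameters whose value is the string "false", so that
  # ${var:+...} below expands to nothing for them.
  for par in par_depth par_bed_graph; do
    if [[ "${!par}" == "false" ]]; then
      unset "$par"
    fi
  done

  # Unquoted ${var:+...} yields zero words when var is unset or empty,
  # and the quoted value inside keeps the argument as a single word.
  local flags=(
    ${par_depth:+-d}
    ${par_bed_graph:+-bg}
    ${par_scale:+-scale "$par_scale"}
  )
  echo "${flags[*]}"
}

# Example: depth off, BedGraph on, scale factor 100
build_genomecov_flags "false" "true" "100"
```

Collecting the flags in an array first is a design choice of this sketch; the component itself passes the expansions directly on the `bedtools genomecov` command line.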

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 98e78c17..8f772450 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -29,6 +29,7 @@
 * `bedtools`:
   - `bedtools/bedtools_intersect`: Allows one to screen for overlaps between two sets of genomic features (PR #94).
   - `bedtools/bedtools_sort`: Sorts a feature file (bed/gff/vcf) by chromosome and other criteria (PR #98).
+  - `bedtools/bedtools_genomecov`: Compute the coverage of a feature file (bed/gff/vcf/bam) among a genome (PR #128).
   - `bedtools/bedtools_groupby`: Summarizes a dataset column based upon common column groupings. Akin to the SQL "group by" command (PR #123).
   - `bedtools/bedtools_merge`: Merges overlapping BED/GFF/VCF entries into a single interval (PR #118).
   - `bedtools/bedtools_bamtofastq`: Convert BAM alignments to FASTQ files (PR #101).
@@ -45,6 +46,7 @@
 
 * `fastqc`: High throughput sequence quality control analysis tool (PR #92).
 
+
 ## MINOR CHANGES
 
 * `busco` components: update BUSCO to `5.7.1` (PR #72).
diff --git a/src/bedtools/bedtools_genomecov/config.vsh.yaml b/src/bedtools/bedtools_genomecov/config.vsh.yaml
new file mode 100644
index 00000000..775587de
--- /dev/null
+++ b/src/bedtools/bedtools_genomecov/config.vsh.yaml
@@ -0,0 +1,208 @@
+name: bedtools_genomecov
+namespace: bedtools
+description: |
+  Compute the coverage of a feature file among a genome.
+keywords: [genome coverage, BED, GFF, VCF, BAM]
+links:
+  homepage: https://bedtools.readthedocs.io/en/latest/#
+  documentation: https://bedtools.readthedocs.io/en/latest/content/tools/genomecov.html
+  repository: https://github.com/arq5x/bedtools2
+  issue_tracker: https://github.com/arq5x/bedtools2/issues
+references:
+  doi: 10.1093/bioinformatics/btq033
+license: MIT
+requirements:
+  commands: [bedtools]
+authors:
+  - __merge__: /src/_authors/theodoro_gasperin.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --input
+        alternatives: -i
+        type: file
+        direction: input
+        description: |
+          The input file (BED/GFF/VCF) to be used.
+        example: input.bed
+      
+      - name: --input_bam
+        alternatives: -ibam
+        type: file
+        description: |
+          The input file is in BAM format.
+          Note: BAM _must_ be sorted by positions.
+          '--genome' option is ignored if you use '--input_bam' option!
+
+      - name: --genome
+        alternatives: -g
+        type: file
+        direction: input
+        description: |
+          The genome file to be used.
+        example: genome.txt
+    
+  - name: Outputs
+    arguments:
+      - name: --output
+        type: file
+        direction: output
+        description: | 
+          The output BED file. 
+        required: true
+        example: output.bed
+  
+  - name: Options
+    arguments:
+
+      - name: --depth
+        alternatives: -d
+        type: boolean_true
+        description: |
+          Report the depth at each genome position (with one-based coordinates).
+          Default behavior is to report a histogram.
+
+      - name: --depth_zero
+        alternatives: -dz
+        type: boolean_true
+        description: |
+          Report the depth at each genome position (with zero-based coordinates).
+          Reports only non-zero positions.
+          Default behavior is to report a histogram.
+
+      - name: --bed_graph
+        alternatives: -bg
+        type: boolean_true
+        description: |
+          Report depth in BedGraph format. For details, see:
+          genome.ucsc.edu/goldenPath/help/bedgraph.html
+
+      - name: --bed_graph_zero_coverage
+        alternatives: -bga
+        type: boolean_true
+        description: |
+          Report depth in BedGraph format, as above (-bg).
+          However with this option, regions with zero 
+          coverage are also reported. This allows one to
+          quickly extract all regions of a genome with 0 
+          coverage by applying: "grep -w 0$" to the output.
+
+      - name: --split
+        type: boolean_true
+        description: |
+          Treat "split" BAM or BED12 entries as distinct BED intervals
+          when computing coverage.
+          For BAM files, this uses the CIGAR "N" and "D" operations 
+          to infer the blocks for computing coverage.
+          For BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds
+          fields (i.e., columns 10,11,12).
+
+      - name: --ignore_deletion
+        alternatives: -ignoreD
+        type: boolean_true
+        description: |
+          Ignore local deletions (CIGAR "D" operations) in BAM entries
+          when computing coverage.
+
+      - name: --strand
+        type: string
+        choices: ["+", "-"]
+        description: |
+          Calculate coverage of intervals from a specific strand.
+          With BED files, requires at least 6 columns (strand is column 6). 
+
+      - name: --pair_end_coverage
+        alternatives: -pc
+        type: boolean_true
+        description: |
+          Calculate coverage of paired-end fragments.
+          Works for BAM files only.
+
+      - name: --fragment_size
+        alternatives: -fs
+        type: boolean_true
+        description: |
+          Force the use of the provided fragment size instead of the read length.
+          Works for BAM files only.
+
+      - name: --du
+        type: boolean_true
+        description: |
+          Change strand of the mate read (so both reads are from the same strand). Useful for strand-specific data.
+          Works for BAM files only.
+
+      - name: --five_prime
+        alternatives: -5
+        type: boolean_true
+        description: |
+          Calculate coverage of 5' positions (instead of entire interval).
+
+      - name: --three_prime
+        alternatives: -3
+        type: boolean_true
+        description: |
+          Calculate coverage of 3' positions (instead of entire interval).
+
+      - name: --max
+        type: integer
+        min: 0
+        description: |
+          Combine all positions with a depth >= max into
+          a single bin in the histogram. Irrelevant
+          for -d and -bedGraph
+          - (INTEGER)
+
+      - name: --scale
+        type: double
+        min: 0
+        description: |
+          Scale the coverage by a constant factor.
+          Each coverage value is multiplied by this factor before being reported.
+          Useful for normalizing coverage by, e.g., reads per million (RPM).
+          - Default is 1.0; i.e., unscaled.
+          - (FLOAT)
+
+      - name: --trackline
+        type: boolean_true
+        description: |
+          Adds a UCSC/Genome-Browser track line definition in the first line of the output.
+          - See here for more details about track line definition:
+                http://genome.ucsc.edu/goldenPath/help/bedgraph.html
+          - NOTE: When adding a trackline definition, the output BedGraph can be easily
+                uploaded to the Genome Browser as a custom track,
+                BUT CAN NOT be converted into a BigWig file (w/o removing the first line).
+      
+      - name: --trackopts
+        type: string
+        description: |
+          Writes additional track line definition parameters in the first line.
+          - Example:
+            -trackopts 'name="My Track" visibility=2 color=255,30,30'
+            Note the use of single-quotes if you have spaces in your parameters.
+          - (TEXT)
+        multiple: true
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - path: test_data
+
+engines:
+  - type: docker
+    image: debian:stable-slim
+    setup:
+      - type: apt
+        packages: [bedtools, procps]
+      - type: docker
+        run: |
+          echo "bedtools: \"$(bedtools --version | sed -n 's/^bedtools //p')\"" > /var/software_versions.txt
+
+runners:
+  - type: executable
+  - type: nextflow
\ No newline at end of file
diff --git a/src/bedtools/bedtools_genomecov/help.txt b/src/bedtools/bedtools_genomecov/help.txt
new file mode 100644
index 00000000..f13a71d3
--- /dev/null
+++ b/src/bedtools/bedtools_genomecov/help.txt
@@ -0,0 +1,101 @@
+```bash
+bedtools genomecov
+```
+
+Tool:    bedtools genomecov (aka genomeCoverageBed)
+Version: v2.30.0
+Summary: Compute the coverage of a feature file among a genome.
+
+Usage: bedtools genomecov [OPTIONS] -i <bed/gff/vcf> -g <genome>
+
+Options: 
+	-ibam		The input file is in BAM format.
+			Note: BAM _must_ be sorted by position
+
+	-d		Report the depth at each genome position (with one-based coordinates).
+			Default behavior is to report a histogram.
+
+	-dz		Report the depth at each genome position (with zero-based coordinates).
+			Reports only non-zero positions.
+			Default behavior is to report a histogram.
+
+	-bg		Report depth in BedGraph format. For details, see:
+			genome.ucsc.edu/goldenPath/help/bedgraph.html
+
+	-bga		Report depth in BedGraph format, as above (-bg).
+			However with this option, regions with zero 
+			coverage are also reported. This allows one to
+			quickly extract all regions of a genome with 0 
+			coverage by applying: "grep -w 0$" to the output.
+
+	-split		Treat "split" BAM or BED12 entries as distinct BED intervals.
+			when computing coverage.
+			For BAM files, this uses the CIGAR "N" and "D" operations 
+			to infer the blocks for computing coverage.
+			For BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds
+			fields (i.e., columns 10,11,12).
+
+	-ignoreD	Ignore local deletions (CIGAR "D" operations) in BAM entries
+			when computing coverage.
+
+	-strand		Calculate coverage of intervals from a specific strand.
+			With BED files, requires at least 6 columns (strand is column 6). 
+			- (STRING): can be + or -
+
+	-pc		Calculate coverage of pair-end fragments.
+			Works for BAM files only
+	-fs		Force to use provided fragment size instead of read length
+			Works for BAM files only
+	-du		Change strand af the mate read (so both reads from the same strand) useful for strand specific
+			Works for BAM files only
+	-5		Calculate coverage of 5" positions (instead of entire interval).
+
+	-3		Calculate coverage of 3" positions (instead of entire interval).
+
+	-max		Combine all positions with a depth >= max into
+			a single bin in the histogram. Irrelevant
+			for -d and -bedGraph
+			- (INTEGER)
+
+	-scale		Scale the coverage by a constant factor.
+			Each coverage value is multiplied by this factor before being reported.
+			Useful for normalizing coverage by, e.g., reads per million (RPM).
+			- Default is 1.0; i.e., unscaled.
+			- (FLOAT)
+
+	-trackline	Adds a UCSC/Genome-Browser track line definition in the first line of the output.
+			- See here for more details about track line definition:
+			      http://genome.ucsc.edu/goldenPath/help/bedgraph.html
+			- NOTE: When adding a trackline definition, the output BedGraph can be easily
+			      uploaded to the Genome Browser as a custom track,
+			      BUT CAN NOT be converted into a BigWig file (w/o removing the first line).
+
+	-trackopts	Writes additional track line definition parameters in the first line.
+			- Example:
+			   -trackopts 'name="My Track" visibility=2 color=255,30,30'
+			   Note the use of single-quotes if you have spaces in your parameters.
+			- (TEXT)
+
+Notes: 
+	(1) The genome file should tab delimited and structured as follows:
+	 <chromName><TAB><chromSize>
+
+	For example, Human (hg19):
+	chr1	249250621
+	chr2	243199373
+	...
+	chr18_gl000207_random	4262
+
+	(2) The input BED (-i) file must be grouped by chromosome.
+	 A simple "sort -k 1,1 <BED> > <BED>.sorted" will suffice.
+
+	(3) The input BAM (-ibam) file must be sorted by position.
+	 A "samtools sort <BAM>" should suffice.
+
+Tips: 
+	One can use the UCSC Genome Browser's MySQL database to extract
+	chromosome sizes. For example, H. sapiens:
+
+	mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \
+	"select chrom, size from hg19.chromInfo" > hg19.genome
+
diff --git a/src/bedtools/bedtools_genomecov/script.sh b/src/bedtools/bedtools_genomecov/script.sh
new file mode 100644
index 00000000..20fbd968
--- /dev/null
+++ b/src/bedtools/bedtools_genomecov/script.sh
@@ -0,0 +1,55 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# Exit on error
+set -eo pipefail
+
+# Unset variables
+unset_if_false=(
+    par_input_bam
+    par_depth
+    par_depth_zero
+    par_bed_graph
+    par_bed_graph_zero_coverage
+    par_split
+    par_ignore_deletion
+    par_pair_end_coverage
+    par_fragment_size
+    par_du
+    par_five_prime
+    par_three_prime
+    par_trackline
+)
+
+for par in "${unset_if_false[@]}"; do
+    test_val="${!par}"
+    [[ "$test_val" == "false" ]] && unset "$par"
+done
+
+# Split the semicolon-separated trackopts values into an array
+IFS=";" read -ra trackopts <<< "$par_trackopts"
+
+bedtools genomecov \
+    ${par_depth:+-d} \
+    ${par_depth_zero:+-dz} \
+    ${par_bed_graph:+-bg} \
+    ${par_bed_graph_zero_coverage:+-bga} \
+    ${par_split:+-split} \
+    ${par_ignore_deletion:+-ignoreD} \
+    ${par_du:+-du} \
+    ${par_five_prime:+-5} \
+    ${par_three_prime:+-3} \
+    ${par_trackline:+-trackline} \
+    ${par_strand:+-strand "$par_strand"} \
+    ${par_max:+-max "$par_max"} \
+    ${par_scale:+-scale "$par_scale"} \
+    ${par_trackopts:+-trackopts "${trackopts[*]}"} \
+    ${par_input_bam:+-ibam "$par_input_bam"} \
+    ${par_input:+-i "$par_input"} \
+    ${par_genome:+-g "$par_genome"} \
+    ${par_pair_end_coverage:+-pc} \
+    ${par_fragment_size:+-fs} \
+    > "$par_output"
+    
\ No newline at end of file
diff --git a/src/bedtools/bedtools_genomecov/test.sh b/src/bedtools/bedtools_genomecov/test.sh
new file mode 100644
index 00000000..7e4487da
--- /dev/null
+++ b/src/bedtools/bedtools_genomecov/test.sh
@@ -0,0 +1,333 @@
+#!/bin/bash
+
+# exit on error
+set -eo pipefail
+
+## VIASH START
+meta_executable="target/executable/bedtools/bedtools_genomecov/bedtools_genomecov"
+meta_resources_dir="src/bedtools/bedtools_genomecov"
+## VIASH END
+
+# directory of the bam file
+test_data="$meta_resources_dir/test_data"
+
+#############################################
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_identical_content() {
+  diff -a "$2" "$1" \
+    || { echo "Files are not identical!" && exit 1; }
+}
+#############################################
+
+# Create directories for tests
+echo "Creating Test Data..."
+TMPDIR=$(mktemp -d "$meta_temp_dir/XXXXXX")
+function clean_up {
+  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
+}
+trap clean_up EXIT
+
+# Create and populate input files
+printf "chr1\t248956422\nchr2\t198295559\nchr3\t242193529\n" > "$TMPDIR/genome.txt"
+printf "chr2\t128\t228\tmy_read/1\t37\t+\nchr2\t428\t528\tmy_read/2\t37\t-\n" > "$TMPDIR/example.bed"
+printf "chr2\t128\t228\tmy_read/1\t60\t+\t128\t228\t255,0,0\t1\t100\t0\nchr2\t428\t528\tmy_read/2\t60\t-\t428\t528\t255,0,0\t1\t100\t0\n" > "$TMPDIR/example.bed12"
+printf "chr2\t100\t103\n" > "$TMPDIR/example_dz.bed"
+
+# expected outputs
+cat > "$TMPDIR/expected_default.bed" <<EOF
+chr2	0	198295359	198295559	0.999999
+chr2	1	200	198295559	1.0086e-06
+chr1	0	248956422	248956422	1
+chr3	0	242193529	242193529	1
+genome	0	689445310	689445510	1
+genome	1	200	689445510	2.90088e-07
+EOF
+cat > "$TMPDIR/expected_ibam.bed" <<EOF
+chr2:172936693-172938111	0	1218	1418	0.858956
+chr2:172936693-172938111	1	200	1418	0.141044
+genome	0	1218	1418	0.858956
+genome	1	200	1418	0.141044
+EOF
+cat > "$TMPDIR/expected_ibam_pc.bed" <<EOF
+chr2:172936693-172938111	0	1018	1418	0.717913
+chr2:172936693-172938111	1	400	1418	0.282087
+genome	0	1018	1418	0.717913
+genome	1	400	1418	0.282087
+EOF
+cat > "$TMPDIR/expected_ibam_fs.bed" <<EOF
+chr2:172936693-172938111	0	1218	1418	0.858956
+chr2:172936693-172938111	1	200	1418	0.141044
+genome	0	1218	1418	0.858956
+genome	1	200	1418	0.141044
+EOF
+cat > "$TMPDIR/expected_dz.bed" <<EOF
+chr2	100	1
+chr2	101	1
+chr2	102	1
+EOF
+cat > "$TMPDIR/expected_strand.bed" <<EOF
+chr2	0	198295459	198295559	1
+chr2	1	100	198295559	5.04298e-07
+chr1	0	248956422	248956422	1
+chr3	0	242193529	242193529	1
+genome	0	689445410	689445510	1
+genome	1	100	689445510	1.45044e-07
+EOF
+cat > "$TMPDIR/expected_5.bed" <<EOF
+chr2	0	198295557	198295559	1
+chr2	1	2	198295559	1.0086e-08
+chr1	0	248956422	248956422	1
+chr3	0	242193529	242193529	1
+genome	0	689445508	689445510	1
+genome	1	2	689445510	2.90088e-09
+EOF
+cat > "$TMPDIR/expected_bg_scale.bed" <<EOF
+chr2	128	228	100
+chr2	428	528	100
+EOF
+cat > "$TMPDIR/expected_trackopts.bed" <<EOF
+track type=bedGraph name=example llama=Alpaco
+chr2	128	228	1
+chr2	428	528	1
+EOF
+cat > "$TMPDIR/expected_split.bed" <<EOF
+chr2	0	198295359	198295559	0.999999
+chr2	1	200	198295559	1.0086e-06
+chr1	0	248956422	248956422	1
+chr3	0	242193529	242193529	1
+genome	0	689445310	689445510	1
+genome	1	200	689445510	2.90088e-07
+EOF
+cat > "$TMPDIR/expected_ignoreD_du.bed" <<EOF
+chr2:172936693-172938111	0	1218	1418	0.858956
+chr2:172936693-172938111	1	200	1418	0.141044
+genome	0	1218	1418	0.858956
+genome	1	200	1418	0.141044
+EOF
+
+# Test 1: default histogram output
+mkdir "$TMPDIR/test1" && pushd "$TMPDIR/test1" > /dev/null
+
+echo "> Run bedtools_genomecov on BED file"
+"$meta_executable" \
+  --input "../example.bed" \
+  --genome "../genome.txt" \
+  --output "output.bed"
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_default.bed"
+echo "- test1 succeeded -"
+
+popd > /dev/null
+
+# Test 2: ibam option 
+mkdir "$TMPDIR/test2" && pushd "$TMPDIR/test2" > /dev/null
+
+echo "> Run bedtools_genomecov on BAM file with -ibam"
+"$meta_executable" \
+  --input_bam "$test_data/example.bam" \
+  --output "output.bed"
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_ibam.bed"
+echo "- test2 succeeded -"
+
+popd > /dev/null
+
+# Test 3: depth_zero option
+mkdir "$TMPDIR/test3" && pushd "$TMPDIR/test3" > /dev/null
+
+echo "> Run bedtools_genomecov on BED file with -dz"
+"$meta_executable" \
+  --input "../example_dz.bed" \
+  --genome "../genome.txt" \
+  --output "output.bed" \
+  --depth_zero
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_dz.bed"
+echo "- test3 succeeded -"
+
+popd > /dev/null
+
+# Test 4: strand option
+mkdir "$TMPDIR/test4" && pushd "$TMPDIR/test4" > /dev/null
+
+echo "> Run bedtools_genomecov on BED file with -strand"
+"$meta_executable" \
+  --input "../example.bed" \
+  --genome "../genome.txt" \
+  --output "output.bed" \
+  --strand "-"
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_strand.bed"
+echo "- test4 succeeded -"
+
+popd > /dev/null
+
+# Test 5: 5' end option
+mkdir "$TMPDIR/test5" && pushd "$TMPDIR/test5" > /dev/null
+
+echo "> Run bedtools_genomecov on BED file with -5"
+"$meta_executable" \
+  --input "../example.bed" \
+  --genome "../genome.txt" \
+  --output "output.bed" \
+  --five_prime
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_5.bed"
+echo "- test5 succeeded -"
+
+popd > /dev/null
+
+# Test 6: max option
+mkdir "$TMPDIR/test6" && pushd "$TMPDIR/test6" > /dev/null
+
+echo "> Run bedtools_genomecov on BED file with -max"
+"$meta_executable" \
+  --input "../example.bed" \
+  --genome "../genome.txt" \
+  --output "output.bed" \
+  --max 100
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_default.bed"
+echo "- test6 succeeded -"
+
+popd > /dev/null
+
+# Test 7: bedgraph and scale option
+mkdir "$TMPDIR/test7" && pushd "$TMPDIR/test7" > /dev/null
+
+echo "> Run bedtools_genomecov on BED file with -bg and -scale"
+"$meta_executable" \
+  --input "../example.bed" \
+  --genome "../genome.txt" \
+  --output "output.bed" \
+  --bed_graph \
+  --scale 100
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_bg_scale.bed"
+echo "- test7 succeeded -"
+
+popd > /dev/null
+
+# Test 8: trackopts option
+mkdir "$TMPDIR/test8" && pushd "$TMPDIR/test8" > /dev/null
+
+echo "> Run bedtools_genomecov on BED file with -bg and -trackopts"
+"$meta_executable" \
+  --input "../example.bed" \
+  --genome "../genome.txt" \
+  --output "output.bed" \
+  --bed_graph \
+  --trackopts "name=example" \
+  --trackopts "llama=Alpaco"
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_trackopts.bed"
+echo "- test8 succeeded -"
+
+popd > /dev/null
+
+# Test 9: ibam with pc and fs options
+mkdir "$TMPDIR/test9" && pushd "$TMPDIR/test9" > /dev/null
+
+echo "> Run bedtools_genomecov on BAM file with -ibam, -pc and -fs"
+"$meta_executable" \
+  --input_bam "$test_data/example.bam" \
+  --output "output.bed" \
+  --fragment_size \
+  --pair_end_coverage
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_ibam_pc.bed"
+echo "- test9 succeeded -"
+
+popd > /dev/null
+
+# Test 10: ibam fs options
+mkdir "$TMPDIR/test10" && pushd "$TMPDIR/test10" > /dev/null
+
+echo "> Run bedtools_genomecov on BAM file with -ibam, -fs"
+"$meta_executable" \
+  --input_bam "$test_data/example.bam" \
+  --output "output.bed" \
+  --fragment_size
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_ibam_fs.bed"
+echo "- test10 succeeded -"
+
+popd > /dev/null
+
+# Test 11: split 
+mkdir "$TMPDIR/test11" && pushd "$TMPDIR/test11" > /dev/null
+
+echo "> Run bedtools_genomecov on BED12 file with -split"
+"$meta_executable" \
+  --input "../example.bed12" \
+  --genome "../genome.txt" \
+  --output "output.bed" \
+  --split
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_split.bed"
+echo "- test11 succeeded -"
+
+popd > /dev/null
+
+# Test 12: ignore deletion and du
+mkdir "$TMPDIR/test12" && pushd "$TMPDIR/test12" > /dev/null
+
+echo "> Run bedtools_genomecov on BAM file with -ignoreD and -du"
+"$meta_executable" \
+  --input_bam "$test_data/example.bam" \
+  --output "output.bed" \
+  --ignore_deletion \
+  --du
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected_ignoreD_du.bed"
+echo "- test12 succeeded -"
+
+popd > /dev/null
+
+echo "---- All tests succeeded! ----"
+exit 0
diff --git a/src/bedtools/bedtools_genomecov/test_data/example.bam b/src/bedtools/bedtools_genomecov/test_data/example.bam
new file mode 100644
index 0000000000000000000000000000000000000000..ffc075ab83a83a98ed1edbf88b26cc27ad8946c6
GIT binary patch
literal 334
zcmb2|=3rp}f&Xj_PR>jWAq>SuUsA6mBqS7Y@IB%Aw%O~PhS4S?6Z1_bX2zRMuCZ>`
z;o;@Ato^fw$CpQUheTtRYNNz-r#8JXHa3Ry>s4lk0?m>~GxQF_-U<7&m>dP#pU;|5
z)~CHK)-&PMX8(zQnRkjz7tzu&Q_9lpm^;_nXXDZb**~)OH9hZA+GbYw!F2!1eU^u&
z=6?J8db>^n+vnS58VqGOpQde!^LhT^FPno$sK1R;RVb&i_o5|>-LG;Kg??MHCx&;~
ziZww?R#r16X1LX_ZFYQZ=WBLl9Y-y@V*W$>-;Wo3eOwoN_@m-GsXhDI<L7O#D!rR*
zAbon#WHIeSf_p=Iw@vX;QcA7~nW$-K`s?tN6@ejF_co|EotL|LtN#%rrst#?n85)E
FA^@Dlf^q-=

literal 0
HcmV?d00001


From 9f813862592fb10f8d15df59697bdaae82c7921a Mon Sep 17 00:00:00 2001
From: emmarousseau <emmarou1@icloud.com>
Date: Mon, 9 Sep 2024 08:19:44 +0200
Subject: [PATCH 10/42] Fq subsample (#147)

---
 CHANGELOG.md                            |   2 +
 src/fq_subsample/config.vsh.yaml        |  68 ++++++++++++++++++++++++
 src/fq_subsample/help.txt               |  20 +++++++
 src/fq_subsample/script.sh              |  26 +++++++++
 src/fq_subsample/test.sh                |  36 +++++++++++++
 src/fq_subsample/test_data/a.3.fastq.gz | Bin 0 -> 292 bytes
 src/fq_subsample/test_data/a.4.fastq.gz | Bin 0 -> 301 bytes
 7 files changed, 152 insertions(+)
 create mode 100644 src/fq_subsample/config.vsh.yaml
 create mode 100644 src/fq_subsample/help.txt
 create mode 100755 src/fq_subsample/script.sh
 create mode 100644 src/fq_subsample/test.sh
 create mode 100644 src/fq_subsample/test_data/a.3.fastq.gz
 create mode 100644 src/fq_subsample/test_data/a.4.fastq.gz
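
Reviewer note on the argument check in script.sh: `fq subsample` needs exactly one of `--probability` and `--record-count`, so the wrapper rejects both the "neither" and the "both" case before calling the tool. A minimal sketch of that check as a reusable function follows. The name `check_exactly_one` is hypothetical and not part of this patch, and the function returns a status instead of exiting so it can be tried interactively.

```shell
#!/bin/bash
# Hypothetical helper (not in the patch): succeed only when exactly one of
# the two values is non-empty, mirroring the check in fq_subsample/script.sh.
check_exactly_one() {
  local probability="$1" record_count="$2"
  if [[ -n "$probability" && -n "$record_count" ]] \
     || [[ -z "$probability" && -z "$record_count" ]]; then
    echo "specify exactly one of --probability or --record_count" >&2
    return 1
  fi
  return 0
}

# Example: only a record count is given, so the check passes
check_exactly_one "" "3" && echo "ok: record_count only"
```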

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8f772450..6534eed1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -142,6 +142,8 @@
     - `bedtools_getfasta`: extract sequences from a FASTA file for each of the
                            intervals defined in a BED/GFF/VCF file (PR #59).
 
+* `fq_subsample`: Sample a subset of records from single or paired FASTQ files (PR #147).
+
 ## MINOR CHANGES
 
 * Uniformize component metadata (PR #23).
diff --git a/src/fq_subsample/config.vsh.yaml b/src/fq_subsample/config.vsh.yaml
new file mode 100644
index 00000000..2455a341
--- /dev/null
+++ b/src/fq_subsample/config.vsh.yaml
@@ -0,0 +1,68 @@
+name: fq_subsample
+description: fq subsample outputs a subset of records from single or paired FASTQ files.
+keywords: [fastq, subsample, subset]
+links:
+  homepage: https://github.com/stjude-rust-labs/fq/blob/master/README.md
+  documentation: https://github.com/stjude-rust-labs/fq/blob/master/README.md
+  repository: https://github.com/stjude-rust-labs/fq
+license: MIT
+
+argument_groups: 
+- name: "Input"
+  arguments: 
+  - name: "--input_1"
+    type: file
+    required: true
+    description: First input fastq file to subsample. Accepts both raw and gzipped FASTQ inputs.
+  - name: "--input_2"
+    type: file
+    description: Second input fastq file to subsample. Accepts both raw and gzipped FASTQ inputs.
+
+- name: "Output"
+  arguments: 
+  - name: "--output_1"
+    type: file
+    direction: output
+    description: Sampled read 1 fastq file. Output will be gzipped if the file name ends in `.gz`.
+  - name: "--output_2"
+    type: file
+    direction: output
+    description: Sampled read 2 fastq file. Output will be gzipped if the file name ends in `.gz`.
+
+- name: "Options"
+  arguments: 
+  - name: "--probability"
+    type: double
+    description: The probability a record is kept, as a fraction in the interval (0.0, 1.0). Cannot be used with `--record_count`.
+  - name: "--record_count"
+    type: integer
+    description: The exact number of records to keep. Cannot be used with `--probability`.
+  - name: "--seed"
+    type: integer
+    description: Seed to use for the random number generator
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - path: test_data
+
+engines:  
+  - type: docker
+    image: rust:1.81-slim
+    setup:
+      - type: docker
+        run: |
+          apt-get update && apt-get install -y git procps && \
+          git clone --depth 1 --branch v0.12.0 https://github.com/stjude-rust-labs/fq.git && \
+          cd fq && \
+          cargo install --locked --path . && \
+          mv target/release/fq /usr/local/bin/ && \
+          cd / && rm -rf /fq
+
+runners:
+  - type: executable
+  - type: nextflow
diff --git a/src/fq_subsample/help.txt b/src/fq_subsample/help.txt
new file mode 100644
index 00000000..6f4a9acf
--- /dev/null
+++ b/src/fq_subsample/help.txt
@@ -0,0 +1,20 @@
+```
+fq subsample -h
+```
+
+Outputs a subset of records
+
+Usage: fq subsample [OPTIONS] --r1-dst <R1_DST> <--probability <PROBABILITY>|--record-count <RECORD_COUNT>> <R1_SRC> [R2_SRC]
+
+Arguments:
+  <R1_SRC>  Read 1 source. Accepts both raw and gzipped FASTQ inputs
+  [R2_SRC]  Read 2 source. Accepts both raw and gzipped FASTQ inputs
+
+Options:
+  -p, --probability <PROBABILITY>    The probability a record is kept, as a percentage (0.0, 1.0). Cannot be used with `record-count`
+  -n, --record-count <RECORD_COUNT>  The exact number of records to keep. Cannot be used with `probability`
+  -s, --seed <SEED>                  Seed to use for the random number generator
+      --r1-dst <R1_DST>              Read 1 destination. Output will be gzipped if ends in `.gz`
+      --r2-dst <R2_DST>              Read 2 destination. Output will be gzipped if ends in `.gz`
+  -h, --help                         Print help
+  -V, --version   
\ No newline at end of file
diff --git a/src/fq_subsample/script.sh b/src/fq_subsample/script.sh
new file mode 100755
index 00000000..bcc81b40
--- /dev/null
+++ b/src/fq_subsample/script.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+set -eo pipefail
+
+
+# fq subsample accepts --probability or --record_count, but not both.
+
+# Enforce the exclusive OR of $par_probability and $par_record_count
+if [[ -n "$par_probability" && -n "$par_record_count" ]] || [[ -z "$par_probability" && -z "$par_record_count" ]]; then
+    echo "fq_subsample requires exactly one of --probability or --record_count to be specified" >&2
+    exit 1
+fi
+
+
+fq subsample \
+    ${par_output_1:+--r1-dst "${par_output_1}"} \
+    ${par_output_2:+--r2-dst "${par_output_2}"} \
+    ${par_probability:+--probability "${par_probability}"} \
+    ${par_record_count:+--record-count "${par_record_count}"} \
+    ${par_seed:+--seed "${par_seed}"} \
+    "${par_input_1}" \
+    ${par_input_2:+"${par_input_2}"}
+
diff --git a/src/fq_subsample/test.sh b/src/fq_subsample/test.sh
new file mode 100644
index 00000000..1de48e95
--- /dev/null
+++ b/src/fq_subsample/test.sh
@@ -0,0 +1,36 @@
+#!/bin/bash
+
+echo ">>> Testing $meta_executable"
+
+echo ">>> Testing for paired-end reads"
+"$meta_executable" \
+    --input_1 "$meta_resources_dir/test_data/a.3.fastq.gz" \
+    --input_2 "$meta_resources_dir/test_data/a.4.fastq.gz" \
+    --record_count 3 \
+    --seed 1 \
+    --output_1 a.1.subsampled.fastq \
+    --output_2 a.2.subsampled.fastq
+
+echo ">> Checking if the correct files are present"
+[ ! -f "a.1.subsampled.fastq" ] && echo "Subsampled FASTQ file for read 1 is missing!" && exit 1
+[ $(wc -l < a.1.subsampled.fastq) -ne 12 ] && echo "Subsampled FASTQ file for read 1 does not contain the expected number of records" && exit 1
+[ ! -f "a.2.subsampled.fastq" ] && echo "Subsampled FASTQ file for read 2 is missing" && exit 1
+[ $(wc -l < a.2.subsampled.fastq) -ne 12 ] && echo "Subsampled FASTQ file for read 2 does not contain the expected number of records" && exit 1
+
+rm a.1.subsampled.fastq a.2.subsampled.fastq
+
+echo ">>> Testing for single-end reads"
+"$meta_executable" \
+    --input_1 $meta_resources_dir/test_data/a.3.fastq.gz \
+    --record_count 3 \
+    --seed 1 \
+    --output_1  a.1.subsampled.fastq 
+
+    
+echo ">> Checking if the correct files are present"
+[ ! -f "a.1.subsampled.fastq" ] && echo "Subsampled FASTQ file is missing" && exit 1
+[ $(wc -l < a.1.subsampled.fastq) -ne 12 ] && echo "Subsampled FASTQ file does not contain the expected number of records" && exit 1
+
+echo ">>> Tests finished successfully"
+exit 0
+
diff --git a/src/fq_subsample/test_data/a.3.fastq.gz b/src/fq_subsample/test_data/a.3.fastq.gz
new file mode 100644
index 0000000000000000000000000000000000000000..3e38d06dc5213e2b60cf8feab54214ef6ae72095
GIT binary patch
literal 292
zcmV+<0o(o`iwFopgw<vM17R*RE@okKba4QslEF>{Aq<A^JOy_XfnjDC42Oj#_P+K7
zHqnFG#DnqR!H2hn!MH9aCN@m}7D)O{%i-a8T>TPku(#7<U6?kfdtjVCzn;!dFL!UJ
z_vgotIr>Qf5avvPnFZM+Apx**Oa!?txr3QV-KZjV->mD<O8wN~8`Bz`W!1Y5iLjO<
zLIINM*mr<9A<B{n6%L60U>8KoHv7#-z3Z4r{+?NKtIv%ALBp1?Z)Zpi_@6{>nZp{l
zpnEW6Vuo5f3k)_CS+Enz@c0PCa|geeO1hrf6{+00VDfRbahk2}!7qJ+#!w%N%nT$&
q=?+h;4lV-+mG4KNiZN0-nItN^Op$ogq>5BhG3pb)YY7nG0ssKu1BeL#

literal 0
HcmV?d00001

diff --git a/src/fq_subsample/test_data/a.4.fastq.gz b/src/fq_subsample/test_data/a.4.fastq.gz
new file mode 100644
index 0000000000000000000000000000000000000000..3164c6148650e36532545b7946efa9a16055db5d
GIT binary patch
literal 301
zcmV+|0n+{-iwFpdgVkmL17R*SE@okKba4QclD}%iFbs$HJcai{?fi98G@K%^_p4su
zpdHG=4W&beK793aE?fgCy*k8_6@xxL<?wtw4s9Pp43AA8>e!66TNB^7^ZV)idU^Ud
zeZIYXbyM3^!osXsSp}(z+L2i-5vyb?=8QX%SyifsYQ{=`tlNd^@PlcHb+G8JahHhM
z8WtRM#0J7;1BHM|@mewSy+pUQA?nAj9oxxW<1dbKR<vV0Mu37zgC;w2DdM>7_YiGA
zZiwo>i^DWVw<i0RKm!dJ8f*w+si@J~;IfF#H47|dWAgv8G;f*OX$Vu(1-B;qOfZ7~
z1|!pgQF7Qb(8XN&m&$O9Cr>k#hC~KOM9F)iI44T9dZJXoXJ35-@igEx-~s>uZ@7@Z

literal 0
HcmV?d00001


From 320d044fe45e565fbc9772640ebf6f39c5584b4a Mon Sep 17 00:00:00 2001
From: emmarousseau <emmarou1@icloud.com>
Date: Mon, 9 Sep 2024 08:49:14 +0200
Subject: [PATCH 11/42] Sortmerna (#146)

---
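Note, not part of the applied change: script.sh assembles the repeated
--ref and --reads options by string concatenation, so a path containing
whitespace would be split into several words. A minimal sketch of an
array-based alternative, assuming par_ref holds a semicolon-separated list
as it does in script.sh. The names ref_files and ref_args and the sample
file names are made up for illustration.

```bash
#!/bin/bash
set -eo pipefail

# Stand-in for the value viash would pass; these file names are invented.
par_ref="db 1.fa;db2.fa"

# Split the semicolon-separated list, then append one flag/value pair per file.
IFS=";" read -ra ref_files <<< "$par_ref"
ref_args=()
for ref_file in "${ref_files[@]}"; do
    ref_args+=(--ref "$ref_file")
done

# Each element stays a single word when expanded as "${ref_args[@]}".
printf '[%s]\n' "${ref_args[@]}"
```

The array would then be passed to the tool as "${ref_args[@]}" in place of
the unquoted $refs string.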
 CHANGELOG.md                              |   3 +
 src/sortmerna/config.vsh.yaml             | 290 ++++++++++++++++++++
 src/sortmerna/help.txt                    | 319 ++++++++++++++++++++++
 src/sortmerna/script.sh                   | 108 ++++++++
 src/sortmerna/test.sh                     | 101 +++++++
 src/sortmerna/test_data/rRNA/database1.fa |  24 ++
 src/sortmerna/test_data/rRNA/database2.fa |  16 ++
 src/sortmerna/test_data/reads_1.fq.gz     | Bin 0 -> 189 bytes
 src/sortmerna/test_data/reads_2.fq.gz     | Bin 0 -> 147 bytes
 src/sortmerna/test_data/script.sh         |   8 +
 10 files changed, 869 insertions(+)
 create mode 100644 src/sortmerna/config.vsh.yaml
 create mode 100644 src/sortmerna/help.txt
 create mode 100755 src/sortmerna/script.sh
 create mode 100644 src/sortmerna/test.sh
 create mode 100644 src/sortmerna/test_data/rRNA/database1.fa
 create mode 100644 src/sortmerna/test_data/rRNA/database2.fa
 create mode 100644 src/sortmerna/test_data/reads_1.fq.gz
 create mode 100644 src/sortmerna/test_data/reads_2.fq.gz
 create mode 100755 src/sortmerna/test_data/script.sh

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6534eed1..5041f082 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -142,6 +142,9 @@
     - `bedtools_getfasta`: extract sequences from a FASTA file for each of the
                            intervals defined in a BED/GFF/VCF file (PR #59).
 
+* `sortmerna`: Local sequence alignment tool for mapping, clustering, and filtering rRNA from metatranscriptomic 
+               data. (PR #146)
+
 *  `fq_subsample`: Sample a subset of records from single or paired FASTQ files (PR #147).
 
 ## MINOR CHANGES
diff --git a/src/sortmerna/config.vsh.yaml b/src/sortmerna/config.vsh.yaml
new file mode 100644
index 00000000..6477660f
--- /dev/null
+++ b/src/sortmerna/config.vsh.yaml
@@ -0,0 +1,290 @@
+name: sortmerna
+description: | 
+  Local sequence alignment tool for filtering, mapping and clustering. The main 
+  application of SortMeRNA is filtering rRNA from metatranscriptomic data.
+keywords: [sort, mRNA, rRNA, alignment, filtering, mapping, clustering]
+links:
+  homepage: https://sortmerna.readthedocs.io/en/latest/
+  documentation: https://sortmerna.readthedocs.io/en/latest/manual4.0.html
+  repository: https://github.com/sortmerna/sortmerna
+references: 
+  doi: 10.1093/bioinformatics/bts611
+license: GPL-3.0
+
+argument_groups:
+- name: "Input"
+  arguments: 
+  - name: "--paired"
+    type: boolean_true
+    description: |
+      Reads are paired-end. If a single reads file is provided, use this option 
+      to indicate the file contains interleaved paired reads when neither
+      'paired_in' | 'paired_out' | 'out2' | 'sout' are specified.
+  - name: "--input"
+    type: file
+    multiple: true
+    description: Input fastq
+  - name: "--ref"
+    type: file
+    multiple: true
+    description: Reference fasta file(s) for rRNA database.
+  - name: "--ribo_database_manifest"
+    type: file
+    description: Text file containing paths to fasta files (one per line) that will be used to create the database for SortMeRNA.
+
+- name: "Output"
+  arguments:     
+  - name: "--log"
+    type: file
+    direction: output
+    must_exist: false
+    example: $id.sortmerna.log
+    description: Sortmerna log file.
+  - name: "--output"
+    alternatives: ["--aligned"]
+    type: string
+    description: |
+      Directory and file prefix for aligned output. The appropriate extension: 
+      (fasta|fastq|blast|sam|etc) is automatically added.
+      If 'dir' is not specified, the output is created in the WORKDIR/out/.
+      If 'pfx' is not specified, the prefix 'aligned' is used.
+  - name: "--other"
+    type: string
+    description: Create non-aligned reads output file with this path/prefix. Must be used with --fastx.
+
+- name: "Options"
+  arguments:
+  - name: "--kvdb"
+    type: string
+    description: Path to directory of the key-value database file, used for storing the alignment results.
+  - name: "--idx_dir"
+    type: string
+    description: Path to the directory for storing the reference index files.
+  - name: "--readb"
+    type: string
+    description: Path to the directory for storing pre-processed reads.
+  - name: "--fastx"
+    type: boolean_true
+    description: Output aligned reads into FASTA/FASTQ file
+  - name: "--sam"
+    type: boolean_true
+    description: Output SAM alignment for aligned reads.
+  - name: "--sq"
+    type: boolean_true
+    description: Add SQ tags to the SAM file
+  - name: "--blast"
+    type: string
+    description: | 
+      Blast options:
+      * '0'                    - pairwise
+      * '1'                    - tabular(Blast - m 8 format)
+      * '1 cigar'              - tabular + column for CIGAR
+      * '1 cigar qcov'         - tabular + columns for CIGAR and query coverage
+      * '1 cigar qcov qstrand' - tabular + columns for CIGAR, query coverage and strand
+    choices: ['0', '1', '1 cigar', '1 cigar qcov', '1 cigar qcov qstrand']
+  - name: "--num_alignments"
+    type: integer
+    description: |
+      Report first INT alignments per read reaching E-value. If INT = 0, all alignments will be output. Default: '0'
+    example: 0
+  - name: "--min_lis"
+    type: integer
+    description: |
+      Search all alignments having the first INT longest LIS. LIS stands for Longest Increasing Subsequence; it is
+      computed using seeds’ positions to expand hits into longer matches prior to Smith-Waterman alignment. Default: '2'.
+    example: 2
+  - name: "--print_all_reads"
+    type: boolean_true
+    description: Output null alignment strings for non-aligned reads to SAM and/or BLAST tabular files.
+  - name: "--paired_in"
+    type: boolean_true
+    description: |
+      In the case where a pair of reads is aligned with a score above the threshold, the output of the reads is controlled
+      by the following options:
+      * --paired_in and --paired_out are both false: Only one read per pair is output to the aligned fasta file.
+      * --paired_in is true and --paired_out is false: Both reads of the pair are output to the aligned fasta file.
+      * --paired_in is false and --paired_out is true: Both reads are output to the other fasta file (if it is specified).
+  - name: "--paired_out"
+    type: boolean_true
+    description: See description of --paired_in.
+  - name: "--out2"
+    type: boolean_true
+    description: |
+      Output paired reads into separate files. Must be used with '--fastx'. If a single reads file is provided, this option
+      implies interleaved paired reads. When used with 'sout', four (4) output files for aligned reads will be generated:
+      'aligned-paired-fwd, aligned-paired-rev, aligned-singleton-fwd, aligned-singleton-rev'. If 'other' option is also used,
+      eight (8) output files will be generated.
+  - name: "--sout"
+    type: boolean_true
+    description: |
+      Separate paired and singleton aligned reads. Must be used with '--fastx'. If a single reads file is provided,
+      this option implies interleaved paired reads. Cannot be used with '--paired_in' or '--paired_out'.
+  - name: "--zip_out"
+    type: string
+    description: |
+      Compress the output files. The possible values are: 
+      * '1/true/t/yes/y'
+      * '0/false/f/no/n'
+      * '-1' (the same format as input - default)
+      The values are not case sensitive.
+    choices: ['1', 'true', 't', 'yes', 'y', '0', 'false', 'f', 'no', 'n', '-1']
+    example: "-1"
+  - name: "--match"
+    type: integer
+    description: |
+      Smith-Waterman score for a match (positive integer). Default: '2'.
+    example: 2
+  - name: "--mismatch"
+    type: integer
+    description: |
+      Smith-Waterman penalty for a mismatch (negative integer). Default: '-3'.
+    example: -3
+  - name: "--gap_open"
+    type: integer
+    description: |
+      Smith-Waterman penalty for introducing a gap (positive integer). Default: '5'.
+    example: 5
+  - name: "--gap_ext"
+    type: integer
+    description: |
+      Smith-Waterman penalty for extending a gap (positive integer). Default: '2'.
+    example: 2
+  - name: "--N"
+    type: integer
+    description: |
+      Smith-Waterman penalty for ambiguous letters (N’s) scored as --mismatch. Default: '-1'.
+    example: -1
+  - name: "--a"
+    type: integer
+    description: |
+      Number of threads to use. Default: '1'.
+    example: 1
+  - name: "--e"
+    type: double
+    description: |
+      E-value threshold. Default: '1'.
+    example: 1
+  - name: "--F"
+    type: boolean_true
+    description: Search only the forward strand.
+  - name: "--R"
+    type: boolean_true
+    description: Search only the reverse-complementary strand.
+  - name: "--num_alignment"
+    type: integer
+    description: |
+       Report first INT alignments per read reaching E-value (--num_alignments 0 signifies all alignments will be output).
+       Default: '-1'
+    example: -1
+  - name: "--best"
+    type: integer
+    description: |
+      Report INT best alignments per read reaching E-value by searching --min_lis INT candidate alignments (--best 0
+      signifies all candidate alignments will be searched) Default: '1'.
+    example: 1
+  - name: "--verbose"
+    alternatives: ["-v"]
+    type: boolean_true
+    description: Verbose output.
+
+- name: "OTU picking options"
+  arguments:
+    - name: "--id"
+      type: double
+      description: |
+        %id similarity threshold (the alignment must still pass the E-value threshold). Default: '0.97'.
+      example: 0.97
+    - name: "--coverage"
+      type: double
+      description: |
+        %query coverage threshold (the alignment must still pass the E-value threshold). Default: '0.97'.
+      example: 0.97
+    - name: "--de_novo"
+      type: boolean_true
+      description: |
+        FASTA/FASTQ file for reads matching database < %id (set using --id) and < %cov (set using --coverage)
+        (alignment must still pass the E-value threshold).
+    - name: "--otu_map"
+      type: boolean_true
+      description: |
+        Output OTU map (input to QIIME’s make_otu_table.py).
+
+- name: "Advanced options"
+  arguments:
+  - name: "--num_seed"
+    type: integer
+    description: |
+      Number of seeds matched before searching for candidate LIS. Default: '2'.
+    example: 2
+  - name: "--passes"
+    type: integer
+    multiple: true
+    description: |
+      Three intervals at which to place the seed on the read L,L/2,3 (L is the seed length set in ./indexdb_rna).
+  - name: "--edge"
+    type: string
+    description: |
+      The number (or percentage if followed by %) of nucleotides to add to each edge of the alignment region on the
+      reference sequence before performing Smith-Waterman alignment. Default: '4'.
+    example: 4
+  - name: "--full_search"
+    type: boolean_true
+    description: |
+      Search for all 0-error and 1-error seed matches in the index rather than stopping after finding a 0-error match
+      (<1% gain in sensitivity with up to a four-fold decrease in speed).
+
+- name: "Indexing Options"
+  arguments:
+  - name: "--index"
+    type: integer
+    description: |
+      Create index files for the reference database. By default when this option is not used, the program checks the
+      reference index and builds it if not already existing.
+      This can be changed by using '-index' as follows:
+      * '-index 0' - skip indexing. If the index does not exist, the program will terminate
+                              and warn to build the index prior to performing the alignment
+      * '-index 1' - only perform the indexing and terminate
+      * '-index 2' - the default behaviour, the same as when not using this option at all
+    example: 2
+    choices: [0, 1, 2]
+  - name: "-L"
+    type: double
+    description: |
+      Indexing seed length. Default: '18'
+    example: 18
+  - name: "--interval"
+    type: integer
+    description: |
+      Index every Nth L-mer in the reference database. Default: '1'
+    example: 1
+  - name: "--max_pos"
+    type: integer
+    description: |
+      Maximum number of positions to store for each unique L-mer. Set to 0 to store all positions. Default: '1000'
+    example: 1000
+  
+  
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - path: test_data
+  
+engines:
+- type: docker
+  image: ubuntu:22.04
+  setup: 
+    - type: docker
+      run: |
+        apt-get update && \
+        apt-get install -y --no-install-recommends gzip cmake g++ wget && \
+        apt-get clean && \
+        wget --no-check-certificate https://github.com/sortmerna/sortmerna/releases/download/v4.3.6/sortmerna-4.3.6-Linux.sh && \
+        bash sortmerna-4.3.6-Linux.sh --skip-license
+runners: 
+- type: executable
+- type: nextflow 
\ No newline at end of file
diff --git a/src/sortmerna/help.txt b/src/sortmerna/help.txt
new file mode 100644
index 00000000..f0842707
--- /dev/null
+++ b/src/sortmerna/help.txt
@@ -0,0 +1,319 @@
+```
+sortmerna -h
+```
+
+
+  Program:      SortMeRNA version 4.3.6
+  Copyright:    2016-2020 Clarity Genomics BVBA:
+                Turnhoutseweg 30, 2340 Beerse, Belgium
+                2014-2016 Knight Lab:
+                Department of Pediatrics, UCSD, La Jolla
+                2012-2014 Bonsai Bioinformatics Research Group:
+                LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
+  Disclaimer:   SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
+                implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+                See the GNU Lesser General Public License for more details.
+  Contributors: Jenya Kopylova   jenya.kopylov@gmail.com
+                Laurent Noé      laurent.noe@lifl.fr
+                Pierre Pericard  pierre.pericard@lifl.fr
+                Daniel McDonald  wasade@gmail.com
+                Mikaël Salson    mikael.salson@lifl.fr
+                Hélène Touzet    helene.touzet@lifl.fr
+                Rob Knight       robknight@ucsd.edu
+
+  Usage:   sortmerna -ref FILE [-ref FILE] -reads FWD_READS [-reads REV_READS] [OPTIONS]:
+  -------------------------------------------------------------------------------------------------------------
+  | option            type-format           description                                          default      |
+  -------------------------------------------------------------------------------------------------------------
+
+    [REQUIRED]
+    --ref             PATH        Required  Reference file (FASTA) absolute or relative path.
+
+       Use mutliple times, once per a reference file
+
+
+    --reads           PATH        Required  Raw reads file (FASTA/FASTQ/FASTA.GZ/FASTQ.GZ).
+
+       Use twice for files with paired reads.
+       The file extensions are Not important. The program automatically
+       recognizes the file format as flat/compressed, fasta/fastq
+
+
+
+    [COMMON]
+    --workdir         PATH        Optional  Workspace directory                         USRDIR/sortmerna/run/
+
+       Default structure: WORKDIR/
+                              idx/   (References index)
+                              kvdb/  (Key-value storage for alignments)
+                              out/   (processing output)
+                              readb/ (pre-processed reads/index)
+
+
+    --kvdb            PATH        Optional  Directory for Key-value database            WORKDIR/kvdb
+
+       KVDB is used for storing the alignment results.
+
+
+    --idx-dir         PATH        Optional  Directory for storing Reference index.      WORKDIR/idx
+
+
+    --readb           PATH        Optional  Storage for pre-processed reads             WORKDIR/readb/
+
+       Directory storing the split reads, or the random access index of compressed reads
+
+
+    --fastx           BOOL        Optional  Output aligned reads into FASTA/FASTQ file
+    --sam             BOOL        Optional  Output SAM alignment for aligned reads.
+
+
+    --SQ              BOOL        Optional  Add SQ tags to the SAM file
+
+
+    --blast           STR         Optional  output alignments in various Blast-like formats
+
+       Sample values: '0'                    - pairwise
+                      '1'                    - tabular (Blast - m 8 format)
+                      '1 cigar'              - tabular + column for CIGAR
+                      '1 cigar qcov'         - tabular + columns for CIGAR and query coverage
+                      '1 cigar qcov qstrand' - tabular + columns for CIGAR, query coverage,
+                                               and strand
+
+
+    --aligned         STR/BOOL    Optional  Aligned reads file prefix [dir/][pfx]       WORKDIR/out/aligned
+
+       Directory and file prefix for aligned output i.e. each
+       output file goes into the specified directory with the given prefix.
+       The appropriate extension: (fasta|fastq|blast|sam|etc) is automatically added.
+       Both 'dir' and 'pfx' are optional.
+       The 'dir' can be a relative or an absolute path.
+       If 'dir' is not specified, the output is created in the WORKDIR/out/
+       If 'pfx' is not specified, the prefix 'aligned' is used
+       Examples:
+       '-aligned $MYDIR/dir_1/dir_2/1' -> $MYDIR/dir_1/dir_2/1.fasta
+       '-aligned dir_1/apfx'           -> $PWD/dir_1/apfx.fasta
+       '-aligned dir_1/'               -> $PWD/aligned.fasta
+       '-aligned apfx'                 -> $PWD/apfx.fasta
+       '-aligned  (no argument)'       -> WORKDIR/out/aligned.fasta
+
+
+    --other           STR/BOOL    Optional  Non-aligned reads file prefix [dir/][pfx]   WORKDIR/out/other
+
+       Directory and file prefix for non-aligned output i.e. each
+       output file goes into the specified directory with the given prefix.
+       The appropriate extension: (fasta|fastq|blast|sam|etc) is automatically added.
+       Must be used with 'fastx'.
+       Both 'dir' and 'pfx' are optional.
+       The 'dir' can be a relative or an absolute path.
+       If 'dir' is not specified, the output is created in the WORKDIR/out/
+       If 'pfx' is not specified, the prefix 'other' is used
+       Examples:
+       '-other $MYDIR/dir_1/dir_2/1' -> $MYDIR/dir_1/dir_2/1.fasta
+       '-other dir_1/apfx'           -> $PWD/dir_1/apfx.fasta
+       '-other dir_1/'               -> $PWD/dir_1/other.fasta
+       '-other apfx'                 -> $PWD/apfx.fasta
+       '-other  (no argument)'       -> aligned_out/other.fasta
+                                        i.e. the same output directory
+                                        as used for aligned output
+
+
+    --num_alignments  INT         Optional  Positive integer (INT >=0).
+
+       If used with '-no-best' reports first INT alignments per read reaching
+       E-value threshold, which allows to lower the CPU time and memory use.
+       Otherwise outputs INT best alignments.
+       If INT = 0, all alignments are output
+
+
+    --no-best         BOOL        Optional  Disable best alignments search                          False
+
+       The 'best' alignment is the highest scoring alignment out of All alignments of a read,
+       and the read can potentially be aligned (reaching E-value threshold) to multiple reference
+       sequences.
+       By default the program searches for best alignments i.e. performs an exhaustive search
+       over all references. Using '-no-best' will make the program to search just
+       the first N alignments, where N is set using '-num_alignments' i.e. 1 by default.
+
+
+    --min_lis         INT         Optional  Search only alignments that have the LIS                2
+                                            of at least N seeds long
+
+       LIS stands for Longest Increasing Subsequence. It is computed using seeds, which
+       are k-mers common to the read and the reference sequence. Sorted sequences of such seeds
+       are used to filter the candidate references prior performing the Smith-Waterman alignment.
+
+
+    --print_all_reads BOOL        Optional  Output null alignment strings for non-aligned reads     False
+                                            to SAM and/or BLAST tabular files
+
+    --paired          BOOL        Optional  Flags paired reads                                      False
+
+        If a single reads file is provided, use this option to indicate
+        the file contains interleaved paired reads when neither
+        'paired_in' | 'paired_out' | 'out2' | 'sout' are specified.
+
+
+    --paired_in       BOOL        Optional  Flags the paired-end reads as Aligned,                  False
+                                            when either of them is Aligned.
+
+        With this option both reads are output into Aligned FASTA/Q file
+        Must be used with 'fastx'.
+        Mutually exclusive with 'paired_out'.
+
+
+    --paired_out      BOOL        Optional  Flags the paired-end reads as Non-aligned,              False
+                                            when either of them is non-aligned.
+
+        With this option both reads are output into Non-Aligned FASTA/Q file
+        Must be used with 'fastx'.
+        Mutually exclusive with 'paired_in'.
+
+
+    --out2            BOOL        Optional  Output paired reads into separate files.                False
+
+       Must be used with 'fastx'.
+       If a single reads file is provided, this options implies interleaved paired reads
+       When used with 'sout', four (4) output files for aligned reads will be generated:
+       'aligned-paired-fwd, aligned-paired-rev, aligned-singleton-fwd, aligned-singleton-rev'.
+       If 'other' option is also used, eight (8) output files will be generated.
+
+
+    --sout            BOOL        Optional  Separate paired and singleton aligned reads.            False
+
+       To be used with 'fastx'.
+       If a single reads file is provided, this options implies interleaved paired reads
+       Cannot be used with 'paired_in' | 'paired_out'
+
+
+    --zip-out         STR/BOOL    Optional  Controls the output compression                        '-1'
+
+       By default the report files are produced in the same format as the input i.e.
+       if the reads files are compressed (gz), the output is also compressed.
+       The default behaviour can be overriden by using '-zip-out'.
+       The possible values: '1/true/t/yes/y'
+                            '0/false/f/no/n'
+                            '-1' (the same format as input - default)
+       The values are Not case sensitive i.e. 'Yes, YES, yEs, Y, y' are all OK
+       Examples:
+       '-reads freads.gz -zip-out n' : generate flat output when the input is compressed
+       '-reads freads.flat -zip-out' : compress the output when the input files are flat
+
+
+    --match           INT         Optional  SW score (positive integer) for a match.                2
+
+    --mismatch        INT         Optional  SW penalty (negative integer) for a mismatch.          -3
+
+    --gap_open        INT         Optional  SW penalty (positive integer) for introducing a gap.    5
+
+    --gap_ext         INT         Optional  SW penalty (positive integer) for extending a gap.      2
+
+    -e                DOUBLE      Optional  E-value threshold.                                      1
+
+       Defines the 'statistical significance' of a local alignment.
+       Exponentially correllates with the Minimal Alignment score.
+       Higher E-values (100, 1000, ...) cause More reads to Pass the alignment threshold
+
+
+    -F                BOOL        Optional  Search only the forward strand.                         False
+
+    -N                BOOL        Optional  SW penalty for ambiguous letters (N's) scored
+                                            as --mismatch
+
+    -R                BOOL        Optional  Search only the reverse-complementary strand.           False
+
+
+    [OTU_PICKING]
+    --id              INT         Optional  %%id similarity threshold (the alignment                0.97
+                                            must still pass the E-value threshold).
+
+    --coverage        INT         Optional  %%query coverage threshold (the alignment must          0.97
+                                            still pass the E-value threshold)
+
+    --de_novo_otu     BOOL        Optional  Output FASTA file with 'de novo' reads                  False
+
+       Read is 'de novo' if its alignment score passes E-value threshold, but both the identity
+       '-id', and the '-coverage' are below their corresponding thresholds
+       i.e. ID < %%id and COV < %%cov
+
+
+    --otu_map         BOOL        Optional  Output OTU map (input to QIIME's make_otu_table.py).    False
+                                            Cannot be used with 'no-best because
+                                            the grouping is done around the best alignment'
+
+
+    [ADVANCED]
+    --passes          INT,INT,INT Optional  Three intervals at which to place the seed on           L,L/2,3
+                                             the read (L is the seed length)
+
+    --edges           INT         Optional  Number (or percent if INT followed by %% sign) of       4
+                                            nucleotides to add to each edge of the read
+                                            prior to SW local alignment
+
+    --num_seeds       BOOL        Optional  Number of seeds matched before searching                2
+                                            for candidate LIS
+
+    --full_search     INT         Optional  Search for all 0-error and 1-error seed                 False
+                                            matches in the index rather than stopping
+                                            after finding a 0-error match (<1%% gain in
+                                            sensitivity with up four-fold decrease in speed)
+
+    --pid             BOOL        Optional  Add pid to output file names.                           False
+
+    -a                INT         Optional  DEPRECATED in favour of '-threads'. Number of           numCores
+                                            processing threads to use.
+                                            Automatically redirects to '-threads'
+
+    --threads         INT         Optional  Number of Processing threads to use                     2
+
+
+    [INDEXING]
+    --index           INT         Optional  Build reference database index                          2
+
+       By default when this option is not used, the program checks the reference index and
+       builds it if not already existing.
+       This can be changed by using '-index' as follows:
+       '-index 0' - skip indexing. If the index does not exist, the program will terminate
+                                and warn to build the index prior performing the alignment
+       '-index 1' - only perform the indexing and terminate
+       '-index 2' - the default behaviour, the same as when not using this option at all
+
+
+    -L                DOUBLE      Optional  Indexing: seed length.                                  18
+
+    -m                DOUBLE      Optional  Indexing: the amount of memory (in Mbytes) for          3072
+                                            building the index.
+
+    -v                BOOL        Optional  Produce verbose output when building the index          True
+
+    --interval        INT         Optional  Indexing: Positive integer: index every Nth L-mer in    1
+                                            the reference database e.g. '-interval 2'.
+
+    --max_pos         INT         Optional  Indexing: maximum (integer) number of positions to      1000
+                                            store for each unique L-mer.
+                                            If 0 - all positions are stored.
+
+
+    [HELP]
+    -h                BOOL        Optional  Print help information
+
+    --version         BOOL        Optional  Print SortMeRNA version number
+
+
+    [DEVELOPER]
+    --dbg_put_db      BOOL        Optional  
+    --cmd             BOOL        Optional  Launch an interactive session (command prompt)          False
+
+    --task            INT         Optional  Processing Task                                         4
+
+       Possible values: 0 - align. Only perform alignment
+                        1 - post-processing (log writing)
+                        2 - generate reports
+                        3 - align and post-process
+                        4 - all
+
+
+    --dbg-level       INT         Optional  Debug level                                             0
+
+      Controls verbosity of the execution trace. Default value of 0 corresponds to
+      the least verbose output.
+      The highest value currently is 2.
diff --git a/src/sortmerna/script.sh b/src/sortmerna/script.sh
new file mode 100755
index 00000000..8dda3d60
--- /dev/null
+++ b/src/sortmerna/script.sh
@@ -0,0 +1,108 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+set -eo pipefail
+
+unset_if_false=( par_fastx par_sq par_print_all_reads par_paired_in par_paired_out
+                 par_F par_R par_verbose par_de_novo par_otu_map par_full_search par_out2
+                 par_sout par_sam par_paired )
+
+
+for var in "${unset_if_false[@]}"; do
+    if [ "${!var}" == "false" ]; then
+        unset $var
+    fi
+done
+
+reads=()
+IFS=";" read -ra input <<< "$par_input"
+if [ "${#input[@]}" -eq 2 ]; then
+    reads="--reads ${input[0]} --reads ${input[1]}"
+    # set paired to true in case it's not
+    par_paired=true
+else
+    reads="--reads ${input[0]}"
+    par_paired=false
+fi
+
+refs=()
+
+# check if references are input normally or through a manifest file
+if [[ ! -z "$par_ribo_database_manifest" ]]; then
+    while IFS= read -r path || [[ -n $path ]]; do
+        refs=$refs" --ref $path"
+    done < "$par_ribo_database_manifest"
+
+elif [[ ! -z "$par_ref" ]]; then
+    IFS=";" read -ra ref <<< "$par_ref"
+    # check if length is 2 and par_paired is set to true
+    if [[ "${#ref[@]}" -eq 2 && "$par_paired" == "true" ]]; then
+        refs="--ref ${ref[0]} --ref ${ref[1]}"
+    # check if length is 1 and par_paired is set to false
+    elif [[ "${#ref[@]}" -eq 1 && "$par_paired" == "false" ]]; then
+        refs="--ref $par_ref"
+    else # any other combination of reference count and read layout is rejected
+        echo "Expected two reference fasta files for paired-end reads or one for single-end reads"
+        exit 1
+    fi
+else 
+    echo "No reference fasta file(s) provided"
+    exit 1
+fi
+
+
+sortmerna \
+    $refs \
+    $reads \
+    --workdir . \
+    ${par_output:+--aligned "${par_output}"} \
+    ${par_fastx:+--fastx} \
+    ${par_other:+--other "${par_other}"} \
+    ${par_kvdb:+--kvdb "${par_kvdb}"} \
+    ${par_idx_dir:+--idx-dir "${par_idx_dir}"} \
+    ${par_readb:+--readb "${par_readb}"} \
+    ${par_sam:+--sam} \
+    ${par_sq:+--sq} \
+    ${par_blast:+--blast "${par_blast}"} \
+    ${par_num_alignments:+--num_alignments "${par_num_alignments}"} \
+    ${par_min_lis:+--min_lis "${par_min_lis}"} \
+    ${par_print_all_reads:+--print_all_reads} \
+    ${par_paired_in:+--paired_in} \
+    ${par_paired_out:+--paired_out} \
+    ${par_out2:+--out2} \
+    ${par_sout:+--sout} \
+    ${par_zip_out:+--zip-out "${par_zip_out}"} \
+    ${par_match:+--match "${par_match}"} \
+    ${par_mismatch:+--mismatch "${par_mismatch}"} \
+    ${par_gap_open:+--gap_open "${par_gap_open}"} \
+    ${par_gap_ext:+--gap_ext "${par_gap_ext}"} \
+    ${par_N:+-N "${par_N}"} \
+    ${par_a:+-a "${par_a}"} \
+    ${par_e:+-e "${par_e}"} \
+    ${par_F:+-F} \
+    ${par_R:+-R} \
+    ${par_num_alignment:+--num_alignment "${par_num_alignment}"} \
+    ${par_best:+--best "${par_best}"} \
+    ${par_verbose:+--verbose} \
+    ${par_id:+--id "${par_id}"} \
+    ${par_coverage:+--coverage "${par_coverage}"} \
+    ${par_de_novo:+--de_novo} \
+    ${par_otu_map:+--otu_map} \
+    ${par_num_seed:+--num_seed "${par_num_seed}"} \
+    ${par_passes:+--passes "${par_passes}"} \
+    ${par_edge:+--edge "${par_edge}"} \
+    ${par_full_search:+--full_search} \
+    ${par_index:+--index "${par_index}"} \
+    ${par_L:+-L "${par_L}"} \
+    ${par_interval:+--interval "${par_interval}"} \
+    ${par_max_pos:+--max_pos "${par_max_pos}"}
+
+
+if [ -n "$par_log" ]; then
+    mv "${par_output}.log" "$par_log"
+fi
+
+exit 0
+
diff --git a/src/sortmerna/test.sh b/src/sortmerna/test.sh
new file mode 100644
index 00000000..390b9307
--- /dev/null
+++ b/src/sortmerna/test.sh
@@ -0,0 +1,101 @@
+#!/bin/bash
+
+echo ">>> Testing $meta_functionality_name"
+
+find $meta_resources_dir/test_data/rRNA -type f > test_data/rrna-db.txt
+
+echo ">>> Testing for paired-end reads and database manifest"
+# out2 separates the read pairs into two files (one fwd and one rev)
+# paired_in outputs both reads of a pair
+# other is the output file for non-rRNA reads
+"$meta_executable" \
+    --output "rRNA_reads" \
+    --other "non_rRNA_reads" \
+    --input "$meta_resources_dir/test_data/reads_1.fq.gz;$meta_resources_dir/test_data/reads_2.fq.gz" \
+    --ribo_database_manifest test_data/rrna-db.txt \
+    --log test_log.log \
+    --paired_in \
+    --fastx \
+    --out2
+    
+
+echo ">> Checking if the correct files are present"
+[[ -f "rRNA_reads_fwd.fq.gz" ]] && [[ -f "rRNA_reads_rev.fq.gz" ]] || { echo "rRNA output fastq file is missing!"; exit 1; }
+[[ -s "rRNA_reads_fwd.fq.gz" ]] && [[ -s "rRNA_reads_rev.fq.gz" ]] || { echo "rRNA output fastq file is empty!"; exit 1; }
+[[ -f "non_rRNA_reads_fwd.fq.gz" ]] && [[ -f "non_rRNA_reads_rev.fq.gz" ]] || { echo "Non-rRNA output fastq file is missing!"; exit 1; }
+gzip -dk non_rRNA_reads_fwd.fq.gz
+gzip -dk non_rRNA_reads_rev.fq.gz
+[[ ! -s "non_rRNA_reads_fwd.fq" ]] && [[ ! -s "non_rRNA_reads_rev.fq" ]] || { echo "Non-rRNA output fastq file is not empty!"; exit 1;}
+
+rm -f rRNA_reads_fwd.fq.gz rRNA_reads_rev.fq.gz non_rRNA_reads_fwd.fq.gz non_rRNA_reads_rev.fq.gz test_log.log
+rm -rf kvdb/
+
+################################################################################
+echo ">>> Testing for paired-end reads and --ref and --paired_out arguments"
+"$meta_executable" \
+    --output "rRNA_reads" \
+    --other "non_rRNA_reads" \
+    --input "$meta_resources_dir/test_data/reads_1.fq.gz;$meta_resources_dir/test_data/reads_2.fq.gz" \
+    --ref "$meta_resources_dir/test_data/rRNA/database1.fa;$meta_resources_dir/test_data/rRNA/database2.fa" \
+    --log test_log.log \
+    --paired_out \
+    --fastx \
+    --out2
+
+echo ">> Checking if the correct files are present"
+[[ -f "rRNA_reads_fwd.fq.gz" ]] && [[ -f "rRNA_reads_rev.fq.gz" ]] || { echo "rRNA output fastq file is missing!"; exit 1; }
+gzip -dkf rRNA_reads_fwd.fq.gz rRNA_reads_rev.fq.gz
+[[ ! -s "rRNA_reads_fwd.fq" ]] && [[ ! -s "rRNA_reads_rev.fq" ]] || { echo "rRNA output fastq file is not empty!"; exit 1; }
+[[ -f "non_rRNA_reads_fwd.fq.gz" ]] && [[ -f "non_rRNA_reads_rev.fq.gz" ]] || { echo "Non-rRNA output fastq file is missing!"; exit 1; }
+gzip -dkf non_rRNA_reads_fwd.fq.gz
+gzip -dkf non_rRNA_reads_rev.fq.gz
+[[ -s "non_rRNA_reads_fwd.fq" ]] && [[ -s "non_rRNA_reads_rev.fq" ]] || { echo "Non-rRNA output fastq file is empty!"; exit 1; }
+
+rm -f rRNA_reads_fwd.fq.gz rRNA_reads_rev.fq.gz non_rRNA_reads_fwd.fq.gz non_rRNA_reads_rev.fq.gz test_log.log
+rm -rf kvdb/
+
+################################################################################
+
+echo ">>> Testing for single-end reads and --ref argument"
+"$meta_executable" \
+    --aligned "rRNA_reads" \
+    --other "non_rRNA_reads" \
+    --input $meta_resources_dir/test_data/reads_1.fq.gz \
+    --ref $meta_resources_dir/test_data/rRNA/database1.fa \
+    --log test_log.log \
+    --fastx
+
+echo ">> Checking if the correct files are present"
+[[ ! -f "rRNA_reads.fq.gz" ]] && echo "rRNA output fastq file is missing!" && exit 1
+gzip -dk rRNA_reads.fq.gz
+[[ -s "rRNA_reads.fq" ]] && echo "rRNA output fastq file is not empty!" && exit 1
+[[ ! -f "non_rRNA_reads.fq.gz" ]] && echo "Non-rRNA output fastq file is missing!" && exit 1
+[[ ! -s "non_rRNA_reads.fq.gz" ]] && echo "Non-rRNA output fastq file is empty!" && exit 1
+
+rm -f rRNA_reads.fq.gz non_rRNA_reads.fq.gz test_log.log
+rm -rf kvdb/
+
+################################################################################
+
+echo ">>> Testing for paired-end reads with singleton output files"
+"$meta_executable" \
+    --aligned "rRNA_reads" \
+    --other "non_rRNA_reads" \
+    --input "$meta_resources_dir/test_data/reads_1.fq.gz;$meta_resources_dir/test_data/reads_2.fq.gz" \
+    --ribo_database_manifest test_data/rrna-db.txt \
+    --log test_log.log \
+    --fastx \
+    --sout
+
+echo ">> Checking if the correct files are present"
+[[ ! -f "rRNA_reads_paired.fq.gz" ]] && echo "Aligned paired output fastq file is missing!" && exit 1
+[[ ! -f "rRNA_reads_singleton.fq.gz" ]] && echo "Aligned singleton output fastq file is missing!" && exit 1
+[[ ! -f "non_rRNA_reads_fwd.fq" ]] && echo "Non-rRNA fwd output fastq file is missing!" && exit 1
+[[ ! -f "non_rRNA_reads_rev.fq" ]] && echo "Non-rRNA rev output fastq file is missing!" && exit 1
+[[ ! -f "non_rRNA_reads_singleton.fq.gz" ]] && echo "Non-rRNA singleton output fastq file is missing!" && exit 1
+[[ ! -f "non_rRNA_reads_paired.fq.gz" ]] && echo "Non-rRNA paired output fastq file is missing!" && exit 1
+
+
+
+echo ">>> All tests passed"
+exit 0
\ No newline at end of file
diff --git a/src/sortmerna/test_data/rRNA/database1.fa b/src/sortmerna/test_data/rRNA/database1.fa
new file mode 100644
index 00000000..bae23aba
--- /dev/null
+++ b/src/sortmerna/test_data/rRNA/database1.fa
@@ -0,0 +1,24 @@
+>AY846379.1.1791 Eukaryota;Archaeplastida;Chloroplastida;Chlorophyta;Chlorophyceae;Sphaeropleales;Monoraphidium;Monoraphidium sp. Itas 9/21 14-6w
+CCUGGUUGAUCCUGCCAGUAGUCAUAUGCUUGUCUCAAAGAUUAAGCCAUGCAUGUCUAAGUAUAAACUGCUUAUACUGU
+GAAACUGCGAAUGGCUCAUUAAAUCAGUUAUAGUUUAUUUGAUGGUACCUCUACACGGAUAACCGUAGUAAUUCUAGAGC
+UAAUACGUGCGUAAAUCCCGACUUCUGGAAGGGACGUAUUUAUUAGAUAAAAGGCCGACCGAGCUUUGCUCGACCCGCGG
+UGAAUCAUGAUAACUUCACGAAUCGCAUAGCCUUGUGCUGGCGAUGUUUCAUUCAAAUUUCUGCCCUAUCAACUUUCGAU
+GGUAGGAUAGAGGCCUACCAUGGUGGUAACGGGUGACGGAGGAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGG
+CUACCACAUCCAAGGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUGAUACGGGGAGGUAGUGACAAUAAAUAACAAUGC
+CGGGCAUUUCAUGUCUGGCAAUUGGAAUGAGUACAAUCUAAAUCCCUUAACGAGGAUCAAUUGGAGGGCAAGUCUGGUGC
+CAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUUAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGAUUUCGGGUG
+GGUUCCAGCGGUCCGCCUAUGGUGAGUACUGCUGUGGCCCUCCUUUUUGUCGGGGACGGGCUCCUGGGCUUCAUUGUCCG
+GGACUCGGAGUCGACGAUGAUACUUUGAGUAAAUUAGAGUGUUCAAAGCAAGCCUACGCUCUGAAUACUUUAGCAUGGAA
+UAUCGCGAUAGGACUCUGGCCUAUCUCGUUGGUCUGUAGGACCGGAGUAAUGAUUAAGAGGGACAGUCGGGGGCAUUCGU
+AUUUCAUUGUCAGAGGUGAAAUUCUUGGAUUUAUGAAAGACGAACUACUGCGAAAGCAUUUGCCAAGGAUGUUUUCAUUA
+AUCAAGAACGAAAGUUGGGGGCUCGAAGACGAUUAGAUACCGUCGUAGUCUCAACCAUAAACGAUGCCGACUAGGGAUUG
+GAGGAUGUUCUUUUGAUGACUUCUCCAGCACCUUAUGAGAAAUCAAAGUUUUUGGGUUCCGGGGGGAGUAUGGUCGCAAG
+GCUGAAACUUAAAGGAAUUGACGGAAGGGCACCACCAGGCGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGAAAACU
+UACCAGGUCCAGACAUAGUGAGGAUUGACAGAUUGAGAGCUCUUUCUUGAUUCUAUGGGUGGUGGUGCAUGGCCGUUCUU
+AGUUGGUGGGUUGCCUUGUCAGGUUGAUUCCGGUAACGAACGAGACCUCAGCCUGCUAAAUAUGUCACAUUCGCUUUUUG
+CGGAUGGCCGACUUCUUAGAGGGACUAUUGGCGUUUAGUCAAUGGAAGUAUGAGGCAAUAACAGGUCUGUGAUGCCCUUA
+GAUGUUCUGGGCCGCACGCGCGCUACACUGACGCAUUCAGCAAGCCUAUCCUUGACCGAGAGGUCUGGGUAAUCUUUGAA
+ACUGCGUCGUGAUGGGGAUAGAUUAUUGCAAUUAUUAGUCUUCAACGAGGAAUGCCUAGUAAGCGCAAGUCAUCAGCUUG
+CGUUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUUGGGUGUGCUGGUGAAGUGUUCGGAUUGG
+CAGAGCGGGUGGCAACACUUGCUUUUGCCGAGAAGUUCAUUAAACCCUCCCACCUAGAGGAAGGAGAAGUCGUAACAAGG
+UUUCCGUAGGUGAACCUGCAGAAG
\ No newline at end of file
diff --git a/src/sortmerna/test_data/rRNA/database2.fa b/src/sortmerna/test_data/rRNA/database2.fa
new file mode 100644
index 00000000..87b5bc99
--- /dev/null
+++ b/src/sortmerna/test_data/rRNA/database2.fa
@@ -0,0 +1,16 @@
+>AB001445.1.1538 Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;Pseudomonas amygdali pv. morsprunorum
+AGAGUUUGAUCAUGGCUCAGAUUGAACGCUGGCGGCAGGCCUAACACAUGCAAGUCGAGCGGCAGCACGGGUACUUGUAC
+CUGGUGGCGAGCGGCGGACGGGUGAGUAAUGCCUAGGAAUCUGCCUGGUAGUGGGGGAUAACGCUCGGAAACGGACGCUA
+AUACCGCAUACGUCCUACGGGAGAAAGCAGGGGACCUUCGGGCCUUGCGCUAUCAGAUGAGCCUAGGUCGGAUUAGCUAG
+UUGGUGAGGUAAUGGCUCACCAAGGCGACGAUCCGUAACUGGUCUGAGAGGAUGAUCAGUCACACUGGAACUGAGACACG
+GUCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGGCGAAAGCCUGAUCCAGCCAUGCCGCGUGUGUGA
+AGAAGGUCUUCGGAUUGUAAAGCACUUUAAGUUGGGAGGAAGGGCAGUUACCUAAUACGUAUCUGUUUUGACGUUACCGA
+CAGAAUAAGCACCGGCUAACUCUGUGCCAGCAGCCGCGGUAAUACAGAGGGUGCAAGCGUUAAUCGGAAUUACUGGGCGU
+AAAGCGCGCGUAGGUGGUUUGUUAAGUUGAAUGUGAAAUCCCCGGGCUCAACCUGGGAACUGCAUCCAAAACUGGCAAGC
+UAGAGUAUGGUAGAGGGUGGUGGAAUUUCCUGUGUAGCGGUGAAAUGCGUAGAUAUAGGAAGGAACACCAGUGGCGAAGG
+CGACCACCUGGACUGAUACUGACACUGAGGUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCC
+GUAAACGAUGUCAACUAGCCGUUGGGAGCCUUGAGCUCUUAGUGGCGCAGCUAACGCAUUAAGUUGACCGCCUGGGGAGU
+ACGGCCGCAAGGUUAAAACUCAAAUGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACG
+CGAAGAACCUUACCAGGCCUUGACAUCCAAUGAAUCCUUUAGAGAUAGAGGAGUGCCUUCGGGAGCAUUGAGACAGGUGC
+UGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGUAACGAGCGCAACCCUUGUCCUUAGUUACCAG
+CACGUCAUGGUGGGCACUCUAAGGAGACUGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGGCCC
diff --git a/src/sortmerna/test_data/reads_1.fq.gz b/src/sortmerna/test_data/reads_1.fq.gz
new file mode 100644
index 0000000000000000000000000000000000000000..41c02a22dbbae13db84acf1e79bc4fc3fa8589e6
GIT binary patch
literal 189
zcmV;u07CyCiwFo$iqvKR19D|yWOH9JE@p86wU0dx!Y~Yl_nZQWu>(o}P^}JqwIX+b
zPO-%OPl6LsC<}stmpJjW<4E7M&Ycg<xuv0{WvQ>1S#Apj3L$tq`=RbA_@+MuTFFyl
z78alq=EMofi84dLb}3jyAYujED%nAymVqrZpNFl>Dky^%D&<ox3$Aj=I>v_(cRIcb
rrwC*NyuH|rwZ}M?)J<PfHX9|Ll<5=Yj_XIzKTzHQmEYsV%K-oY|JGAn

literal 0
HcmV?d00001

diff --git a/src/sortmerna/test_data/reads_2.fq.gz b/src/sortmerna/test_data/reads_2.fq.gz
new file mode 100644
index 0000000000000000000000000000000000000000..9d0f8d3f82dc114add66bde14727742aa60d87ee
GIT binary patch
literal 147
zcmV;E0BrvsiwFqp`$S~`19D|yWOH9KE@p86Rf{_g!!Qg(cb}p_#tgOcE2624VAw;N
z$wTjdl2T59whz#U6!ko|Im-B$be*)6;k9r1T~t&=BKxuqvq~J7o9LlYt68=T^x3Rh
zMGeS=Cf&2pc5v_jP&ehk104RrBv1Ym`T(a(7f3&JU*nzt7r<Yli4T(P4++Tt002W4
BLoEOR

literal 0
HcmV?d00001

diff --git a/src/sortmerna/test_data/script.sh b/src/sortmerna/test_data/script.sh
new file mode 100755
index 00000000..b2531248
--- /dev/null
+++ b/src/sortmerna/test_data/script.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+
+if [ ! -d /tmp/sortmerna_source ]; then
+  git clone --depth 2 --single-branch --branch master https://github.com/snakemake/snakemake-wrappers.git /tmp/sortmerna_source
+fi
+
+# copy test data
+cp -r /tmp/sortmerna_source/bio/sortmerna/test/* .

From 8fe9d66b0c689776846dcb0ecb01a30f3ef1b66b Mon Sep 17 00:00:00 2001
From: Theodoro Gasperin Terra Camargo
 <98555209+tgaspe@users.noreply.github.com>
Date: Tue, 10 Sep 2024 15:51:12 +0200
Subject: [PATCH 12/42] Bcftools stats (#142)

* Initial Commit

* Adding options to config

* Update on script

* update

* Adding test 2 and 3

* Update on config and test

* adding more tests

* debugging and adding tests

* Adding last tests

* removing test_data dir

* Update CHANGELOG.md

* small changes

* small change in help file

* Requested changes

---------

Co-authored-by: Jakub Majercik <57993790+jakubmajercik@users.noreply.github.com>
---
 CHANGELOG.md                                |   1 +
 src/bcftools/bcftools_stats/config.vsh.yaml | 240 +++++++++++++++
 src/bcftools/bcftools_stats/help.txt        |  35 +++
 src/bcftools/bcftools_stats/script.sh       |  56 ++++
 src/bcftools/bcftools_stats/test.sh         | 306 ++++++++++++++++++++
 5 files changed, 638 insertions(+)
 create mode 100644 src/bcftools/bcftools_stats/config.vsh.yaml
 create mode 100644 src/bcftools/bcftools_stats/help.txt
 create mode 100644 src/bcftools/bcftools_stats/script.sh
 create mode 100644 src/bcftools/bcftools_stats/test.sh

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5041f082..2dd152bb 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -42,6 +42,7 @@
 * `rsem/rsem_prepare_reference`: Prepare transcript references for RSEM (PR #89).
 
 * `bcftools`:
+  - `bcftools/bcftools_stats`: Parses VCF or BCF and produces a txt stats file which can be plotted using plot-vcfstats (PR #142).
   - `bcftools/bcftools_sort`: Sorts BCF/VCF files by position and other criteria (PR #141).
 
 * `fastqc`: High throughput sequence quality control analysis tool (PR #92).
diff --git a/src/bcftools/bcftools_stats/config.vsh.yaml b/src/bcftools/bcftools_stats/config.vsh.yaml
new file mode 100644
index 00000000..8fb57f7a
--- /dev/null
+++ b/src/bcftools/bcftools_stats/config.vsh.yaml
@@ -0,0 +1,240 @@
+name: bcftools_stats
+namespace: bcftools
+description: | 
+  Parses VCF or BCF and produces a txt stats file which can be plotted using plot-vcfstats.
+  When two files are given, the program generates separate stats for intersection
+  and the complements. By default only sites are compared, -s/-S must be given to include
+  also sample columns.
+keywords: [Stats, VCF, BCF]
+links:
+  homepage: https://samtools.github.io/bcftools/
+  documentation: https://samtools.github.io/bcftools/bcftools.html#stats
+  repository: https://github.com/samtools/bcftools
+  issue_tracker: https://github.com/samtools/bcftools/issues
+references:
+  doi: https://doi.org/10.1093/gigascience/giab008
+license: MIT/Expat, GNU
+requirements:
+  commands: [bcftools]
+authors:
+  - __merge__: /src/_authors/theodoro_gasperin.yaml
+    roles: [ author ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --input
+        alternatives: -i
+        type: file
+        multiple: true
+        description: Input VCF/BCF file. Maximum of two files.
+        required: true
+    
+  - name: Outputs
+    arguments:
+      - name: --output
+        alternatives: -o
+        direction: output
+        type: file
+        description: Output txt statistics file.
+        required: true
+         
+  - name: Options
+    arguments:
+      
+      - name: --allele_frequency_bins
+        alternatives: --af_bins
+        type: string
+        description: | 
+          Allele frequency bins, a list of bin values (0.1,0.5,1).
+        example: 0.1,0.5,1
+
+      - name: --allele_frequency_bins_file
+        alternatives: --af_bins_file
+        type: file
+        description: | 
+          Same as allele_frequency_bins, but in a file.
+          Format of file is one value per line. 
+          e.g. 
+            0.1
+            0.5
+            1
+
+      - name: --allele_frequency_tag
+        alternatives: --af_tag
+        type: string
+        description: | 
+          Allele frequency tag to use, by default estimated from AN,AC or GT.
+
+      - name: --first_allele_only
+        alternatives: --first_only
+        type: boolean_true
+        description: | 
+          Include only 1st allele at multiallelic sites.
+
+      - name: --collapse
+        alternatives: --c
+        type: string
+        choices: [ snps, indels, both, all, some, none ]
+        description: | 
+          Treat as identical records with <snps|indels|both|all|some|none>.
+          See https://samtools.github.io/bcftools/bcftools.html#common_options for details.
+
+      - name: --depth
+        alternatives: --d
+        type: string
+        description: | 
+          Depth distribution: min,max,bin size.
+        example: 0,500,1
+
+      - name: --exclude
+        alternatives: --e
+        type: string
+        description: | 
+          Exclude sites for which the expression is true.
+          See https://samtools.github.io/bcftools/bcftools.html#expressions for details.
+        example: 'QUAL < 30 && DP < 10'
+
+      - name: --exons
+        alternatives: --E
+        type: file
+        description: | 
+          Tab-delimited file with exons for indel frameshift statistics.
+          The columns of the file are CHR, FROM, TO, with 1-based, inclusive, positions. 
+          The file is BGZF-compressed and indexed with tabix (e.g. tabix -s1 -b2 -e3 file.gz).
+
+      - name: --apply_filters
+        alternatives: --f
+        type: string
+        description: | 
+          Require at least one of the listed FILTER strings (e.g. "PASS,.").
+
+      - name: --fasta_reference
+        alternatives: --F
+        type: file
+        description: | 
+          Faidx indexed reference sequence file to determine INDEL context.
+
+      - name: --include
+        alternatives: --i
+        type: string
+        description: | 
+          Select sites for which the expression is true.
+          See https://samtools.github.io/bcftools/bcftools.html#expressions for details.
+        example: 'QUAL >= 30 && DP >= 10'
+      
+      - name: --split_by_ID
+        alternatives: --I
+        type: boolean_true
+        description: | 
+          Collect stats for sites with ID separately (known vs novel).
+
+      - name: --regions
+        alternatives: --r
+        type: string
+        description: | 
+          Restrict to comma-separated list of regions. 
+          The following formats are supported: chr|chr:pos|chr:beg-end|chr:beg-[,...].
+        example: '20:1000000-2000000'
+
+      - name: --regions_file
+        alternatives: --R
+        type: file
+        description: | 
+          Restrict to regions listed in a file. 
+          Regions can be specified either on a VCF, BED, or tab-delimited file (the default). 
+          For more information check manual.
+
+      - name: --regions_overlap
+        type: string
+        choices: ['pos', 'record', 'variant', '0', '1', '2']
+        description: | 
+          This option controls how overlapping records are determined: 
+          set to 'pos' or '0' if the VCF record has to have POS inside a region (this corresponds to the default behavior of -t/-T); 
+          set to 'record' or '1' if also overlapping records with POS outside a region should be included (this is the default behavior of -r/-R, 
+          and includes indels with POS at the end of a region, which are technically outside the region); 
+          or set to 'variant' or '2' to include only true overlapping variation (compare the full VCF representation "TA>T-" vs the true sequence variation "A>-").
+
+      - name: --samples
+        alternatives: --s
+        type: string
+        description: | 
+          List of samples for sample stats, "-" to include all samples.
+
+      - name: --samples_file
+        alternatives: --S
+        type: file
+        description: | 
+          File of samples to include.
+          e.g. 
+            sample1    1
+            sample2    2
+            sample3    2
+
+      - name: --targets
+        alternatives: --t
+        type: string
+        description: | 
+          Similar to -r, --regions, but the next position is accessed by streaming the whole VCF/BCF
+          rather than using the tbi/csi index. Both -r and -t options can be applied simultaneously: -r uses the 
+          index to jump to a region and -t discards positions which are not in the targets. Unlike -r, targets 
+          can be prefixed with "^" to request logical complement. For example, "^X,Y,MT" indicates that 
+          sequences X, Y and MT should be skipped. Yet another difference between the -t/-T and -r/-R is 
+          that -r/-R checks for proper overlaps and considers both POS and the end position of an indel, 
+          while -t/-T considers the POS coordinate only (by default; see also --regions-overlap and --targets-overlap). 
+          Note that -t cannot be used in combination with -T.
+          The following formats are supported: chr|chr:pos|chr:beg-end|chr:beg-[,...].
+        example: '20:1000000-2000000'
+      
+      - name: --targets_file
+        alternatives: --T
+        type: file
+        description: | 
+          Similar to --regions_file option but streams rather than index-jumps.
+
+      - name: --targets_overlaps
+        type: string
+        choices: ['pos', 'record', 'variant', '0', '1', '2']
+        description: | 
+          Include if POS in the region (0), record overlaps (1), variant overlaps (2).
+
+      - name: --user_tstv
+        alternatives: --u
+        type: string
+        description: | 
+          Collect Ts/Tv stats for any tag using the given binning [0:1:100].
+          Format is <TAG[:min:max:n]>.
+          A subfield can be selected as e.g. 'PV4[0]', here the first value of the PV4 tag.
+          
+      
+      - name: --verbose 
+        alternatives: --v
+        type: boolean_true
+        description: | 
+          Produce verbose per-site and per-sample output.
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+
+engines:
+  - type: docker
+    image: debian:stable-slim
+    setup:
+      - type: apt
+        packages: [bcftools, procps]
+      - type: docker
+        run: |
+          echo "bcftools: \"$(bcftools --version | grep 'bcftools' | sed -n 's/^bcftools //p')\"" > /var/software_versions.txt
+    test_setup:  
+      - type: apt  
+        packages: [tabix]
+
+runners:
+  - type: executable
+  - type: nextflow
+
diff --git a/src/bcftools/bcftools_stats/help.txt b/src/bcftools/bcftools_stats/help.txt
new file mode 100644
index 00000000..e702e838
--- /dev/null
+++ b/src/bcftools/bcftools_stats/help.txt
@@ -0,0 +1,35 @@
+```
+bcftools stats -h
+```
+
+About:   Parses VCF or BCF and produces stats which can be plotted using plot-vcfstats.
+         When two files are given, the program generates separate stats for intersection
+         and the complements. By default only sites are compared, -s/-S must given to include
+         also sample columns.
+Usage:   bcftools stats [options] <A.vcf.gz> [<B.vcf.gz>]
+
+Options:
+        --af-bins LIST               Allele frequency bins, a list (0.1,0.5,1) or a file (0.1\n0.5\n1)
+        --af-tag STRING              Allele frequency tag to use, by default estimated from AN,AC or GT
+    -1, --1st-allele-only            Include only 1st allele at multiallelic sites
+    -c, --collapse STRING            Treat as identical records with <snps|indels|both|all|some|none>, see man page for details [none]
+    -d, --depth INT,INT,INT          Depth distribution: min,max,bin size [0,500,1]
+    -e, --exclude EXPR               Exclude sites for which the expression is true (see man page for details)
+    -E, --exons FILE.gz              Tab-delimited file with exons for indel frameshifts (chr,beg,end; 1-based, inclusive, bgzip compressed)
+    -f, --apply-filters LIST         Require at least one of the listed FILTER strings (e.g. "PASS,.")
+    -F, --fasta-ref FILE             Faidx indexed reference sequence file to determine INDEL context
+    -i, --include EXPR               Select sites for which the expression is true (see man page for details)
+    -I, --split-by-ID                Collect stats for sites with ID separately (known vs novel)
+    -r, --regions REGION             Restrict to comma-separated list of regions
+    -R, --regions-file FILE          Restrict to regions listed in a file
+        --regions-overlap 0|1|2      Include if POS in the region (0), record overlaps (1), variant overlaps (2) [1]
+    -s, --samples LIST               List of samples for sample stats, "-" to include all samples
+    -S, --samples-file FILE          File of samples to include
+    -t, --targets REGION             Similar to -r but streams rather than index-jumps
+    -T, --targets-file FILE          Similar to -R but streams rather than index-jumps
+        --targets-overlap 0|1|2      Include if POS in the region (0), record overlaps (1), variant overlaps (2) [0]
+    -u, --user-tstv TAG[:min:max:n]  Collect Ts/Tv stats for any tag using the given binning [0:1:100]
+                                       A subfield can be selected as e.g. 'PV4[0]', here the first value of the PV4 tag
+        --threads INT                Use multithreading with <int> worker threads [0]
+    -v, --verbose                    Produce verbose per-site and per-sample output
+
diff --git a/src/bcftools/bcftools_stats/script.sh b/src/bcftools/bcftools_stats/script.sh
new file mode 100644
index 00000000..119502fd
--- /dev/null
+++ b/src/bcftools/bcftools_stats/script.sh
@@ -0,0 +1,56 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# Exit on error
+set -eo pipefail
+
+# Unset parameters
+unset_if_false=(
+    par_first_allele_only
+    par_split_by_ID
+    par_verbose 
+)
+
+for par in ${unset_if_false[@]}; do
+    test_val="${!par}"
+    [[ "$test_val" == "false" ]] && unset $par
+done
+
+# Create input array 
+IFS=";" read -ra input <<< "$par_input"
+
+# Check the size of the input array
+if [[ ${#input[@]} -gt 2 ]]; then
+    echo "Error: --input only takes a max of two files!"
+    exit 1
+fi
+
+# Execute bcftools stats with the provided arguments
+bcftools stats \
+    ${par_first_allele_only:+--1st-allele-only} \
+    ${par_split_by_ID:+--split-by-ID} \
+    ${par_verbose:+--verbose} \
+    ${par_allele_frequency_bins:+--af-bins "${par_allele_frequency_bins}"} \
+    ${par_allele_frequency_bins_file:+--af-bins "${par_allele_frequency_bins_file}"} \
+    ${par_allele_frequency_tag:+--af-tag "${par_allele_frequency_tag}"} \
+    ${par_collapse:+-c "${par_collapse}"} \
+    ${par_depth:+-d "${par_depth}"} \
+    ${par_exclude:+-e "${par_exclude}"} \
+    ${par_exons:+-E "${par_exons}"} \
+    ${par_apply_filters:+-f "${par_apply_filters}"} \
+    ${par_fasta_reference:+-F "${par_fasta_reference}"} \
+    ${par_include:+-i "${par_include}"} \
+    ${par_regions:+-r "${par_regions}"} \
+    ${par_regions_file:+-R "${par_regions_file}"} \
+    ${par_regions_overlap:+--regions-overlap "${par_regions_overlap}"} \
+    ${par_samples:+-s "${par_samples}"} \
+    ${par_samples_file:+-S "${par_samples_file}"} \
+    ${par_targets:+-t "${par_targets}"} \
+    ${par_targets_file:+-T "${par_targets_file}"} \
+    ${par_targets_overlaps:+--targets-overlap "${par_targets_overlaps}"} \
+    ${par_user_tstv:+-u "${par_user_tstv}"} \
+    "${input[@]}" \
+    > "$par_output"
+
diff --git a/src/bcftools/bcftools_stats/test.sh b/src/bcftools/bcftools_stats/test.sh
new file mode 100644
index 00000000..18f0256b
--- /dev/null
+++ b/src/bcftools/bcftools_stats/test.sh
@@ -0,0 +1,306 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# Exit on error
+set -eo pipefail
+
+#test_data="$meta_resources_dir/test_data"
+
+#############################################
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_identical_content() {
+  diff -a "$2" "$1" \
+    || { echo "Files are not identical!" && exit 1; }
+}
+#############################################
+
+# Create directories for tests
+echo "Creating Test Data..."
+TMPDIR=$(mktemp -d "$meta_temp_dir/XXXXXX")
+function clean_up {
+  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
+}
+trap clean_up EXIT
+
+# Create test data
+cat <<EOF > "$TMPDIR/example.vcf"
+##fileformat=VCFv4.0
+##fileDate=20090805
+##source=myImputationProgramV3.1
+##reference=1000GenomesPilot-NCBI36
+##contig=<ID=19,length=58617616>
+##contig=<ID=20,length=58617616>
+##phasing=partial
+##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
+##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
+##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
+##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
+##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
+##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
+##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
+##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
+##FILTER=<ID=q10,Description="Quality below 10">
+##FILTER=<ID=s50,Description="Less than 50% of samples have data">
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
+##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
+##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
+##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
+##ALT=<ID=DEL:ME:ALU,Description="Deletion of ALU element">
+##ALT=<ID=CNV,Description="Copy number variable region">
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA00001	NA00002	NA00003
+19	111	.	A	C	9.6	.	.	GT:HQ	0|0:10,10	0|0:10,10	0/1:3,3
+19	112	.	A	G	10	.	.	GT:HQ	0|0:10,10	0|0:10,10	0/1:3,3
+20	14370	rs6054257	G	A	29	PASS	NS=3;DP=14;AF=0.5;DB;H2	GT:GQ:DP:HQ	0|0:48:1:51,51	1|0:48:8:51,51	1/1:43:5:.,.
+20	17330	.	T	A	3	q10	NS=3;DP=11;AF=0.017	GT:GQ:DP:HQ	0|0:49:3:58,50	0|1:3:5:65,3	0/0:41:3:.,.
+20	1110696	rs6040355	A	G,T	67	PASS	NS=2;DP=10;AF=0.333,0.667;AA=T;DB	GT:GQ:DP:HQ	1|2:21:6:23,27	2|1:2:0:18,2	2/2:35:4:.,.
+20	1230237	.	T	.	47	PASS	NS=3;DP=13;AA=T	GT:GQ:DP:HQ	0|0:54:.:56,60	0|0:48:4:51,51	0/0:61:2:.,.
+20	1234567	microsat1	G	GA,GAC	50	PASS	NS=3;DP=9;AA=G;AN=6;AC=3,1	GT:GQ:DP	0/1:.:4	0/2:17:2	1/1:40:3
+20	1235237	.	T	.	.	.	.	GT	0/0	0|0	./.
+EOF
+
+bgzip -c "$TMPDIR/example.vcf" > "$TMPDIR/example.vcf.gz"
+tabix -p vcf "$TMPDIR/example.vcf.gz"
+
+cat <<EOF > "$TMPDIR/exons.bed"
+chr19	12345	12567
+chr20	23456	23789
+EOF
+
+# Compressing and indexing the exons file
+bgzip -c "$TMPDIR/exons.bed" > "$TMPDIR/exons.bed.gz"
+tabix -s1 -b2 -e3 "$TMPDIR/exons.bed.gz"
+
+# Create fai test file
+# cat <<EOF > "$TMPDIR/reference.fasta.fai"
+# 19	100	895464957	60	61
+# 20	10000	1083893029	60	61
+# EOF
+
+# Create allele frequency bins file
+cat <<EOF > "$TMPDIR/allele_frequency_bins.txt"
+0.1
+0.2
+0.3
+0.4
+0.5
+0.6
+0.7
+0.8
+0.9
+EOF
+
+# Test 1: Default Use
+mkdir "$TMPDIR/test1" && pushd "$TMPDIR/test1" > /dev/null
+
+echo "> Run bcftools_stats on VCF file"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "stats.txt" \
+
+# checks
+assert_file_exists "stats.txt"
+assert_file_not_empty "stats.txt"
+assert_file_contains "stats.txt" "bcftools stats  ../example.vcf"
+echo "- test1 succeeded -"
+
+popd > /dev/null
+
+# Test 2: First allele only
+mkdir "$TMPDIR/test2" && pushd "$TMPDIR/test2" > /dev/null
+
+echo "> Run bcftools_stats on VCF file with first allele only"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "stats.txt" \
+  --first_allele_only \
+  --allele_frequency_bins "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9" \
+  --allele_frequency_tag "AF" \
+
+# checks
+assert_file_exists "stats.txt"
+assert_file_not_empty "stats.txt"
+assert_file_contains "stats.txt" "bcftools stats  --1st-allele-only --af-bins 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9 --af-tag AF ../example.vcf"
+echo "- test2 succeeded -"
+
+popd > /dev/null
+
+# Test 3: Split by ID
+mkdir "$TMPDIR/test3" && pushd "$TMPDIR/test3" > /dev/null
+
+echo "> Run bcftools_stats on VCF file with split by ID"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "stats.txt" \
+  --split_by_ID \
+
+# checks
+assert_file_exists "stats.txt"
+assert_file_not_empty "stats.txt"
+assert_file_contains "stats.txt" "bcftools stats  --split-by-ID ../example.vcf"
+echo "- test3 succeeded -"
+
+popd > /dev/null
+
+# Test 4: Collapse, Depth, Exclude
+mkdir "$TMPDIR/test4" && pushd "$TMPDIR/test4" > /dev/null
+
+echo "> Run bcftools_stats on VCF file with collapse, depth, and exclude"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "stats.txt" \
+  --depth "0,500,1" \
+  --exclude "GT='mis'" \
+  --collapse "snps" \
+
+# checks
+assert_file_exists "stats.txt"
+assert_file_not_empty "stats.txt"
+assert_file_contains "stats.txt" "bcftools stats  -c snps -d 0,500,1 -e GT='mis' ../example.vcf"
+echo "- test4 succeeded -"
+
+popd > /dev/null
+
+# Test 5: Exons, Apply Filters
+mkdir "$TMPDIR/test5" && pushd "$TMPDIR/test5" > /dev/null
+
+echo "> Run bcftools_stats on VCF file with exons and apply filters"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "stats.txt" \
+  --exons "../exons.bed.gz" \
+  --apply_filters "PASS" \
+#  --fasta_reference "../reference.fasta.fai" \
+
+# NOTE: fasta_reference option not included in testing because of error from bcftools stats.
+
+# checks
+assert_file_exists "stats.txt"
+assert_file_not_empty "stats.txt"
+assert_file_contains "stats.txt" "bcftools stats  -E ../exons.bed.gz -f PASS ../example.vcf"
+#assert_file_contains "stats.txt" "bcftools stats  -E ../exons.bed.gz -f PASS -F ../reference.fasta.fai ../example.vcf"
+echo "- test5 succeeded -"
+
+popd > /dev/null
+
+# Test 6: Include, Regions
+mkdir "$TMPDIR/test6" && pushd "$TMPDIR/test6" > /dev/null
+
+echo "> Run bcftools_stats on VCF file with include and regions options"
+"$meta_executable" \
+  --input "../example.vcf.gz" \
+  --output "stats.txt" \
+  --include "GT='mis'" \
+  --regions "20:1000000-2000000" \
+
+# checks
+assert_file_exists "stats.txt"
+assert_file_not_empty "stats.txt"
+assert_file_contains "stats.txt" "bcftools stats  -i GT='mis' -r 20:1000000-2000000 ../example.vcf.gz"
+echo "- test6 succeeded -"
+
+popd > /dev/null
+
+# Test 7: Regions Overlap, Samples
+mkdir "$TMPDIR/test7" && pushd "$TMPDIR/test7" > /dev/null
+
+echo "> Run bcftools_stats on VCF file with regions overlap, and samples options"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "stats.txt" \
+  --regions_overlap "record" \
+  --samples "NA00001,NA00002" \
+
+# checks
+assert_file_exists "stats.txt"
+assert_file_not_empty "stats.txt"
+assert_file_contains "stats.txt" "bcftools stats  --regions-overlap record -s NA00001,NA00002 ../example.vcf"
+echo "- test7 succeeded -"
+
+popd > /dev/null
+
+# Test 8: Targets, Targets Overlaps
+mkdir "$TMPDIR/test8" && pushd "$TMPDIR/test8" > /dev/null
+
+echo "> Run bcftools_stats on VCF file with targets and targets overlaps"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "stats.txt" \
+  --targets "20:1000000-2000000" \
+  --targets_overlaps "pos" \
+
+# checks
+assert_file_exists "stats.txt"
+assert_file_not_empty "stats.txt"
+assert_file_contains "stats.txt" "bcftools stats  -t 20:1000000-2000000 --targets-overlap pos ../example.vcf"
+echo "- test8 succeeded -"
+
+popd > /dev/null
+
+# Test 9: User TSTV and Verbose
+mkdir "$TMPDIR/test9" && pushd "$TMPDIR/test9" > /dev/null
+
+echo "> Run bcftools_stats on VCF file with user TSTV and verbose"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "stats.txt" \
+  --user_tstv "DP" \
+  --verbose \
+
+# checks
+assert_file_exists "stats.txt"
+assert_file_not_empty "stats.txt"
+assert_file_contains "stats.txt" "bcftools stats  --verbose -u DP ../example.vcf"
+echo "- test9 succeeded -"
+
+popd > /dev/null
+
+# Test 10: Two vcf files
+mkdir "$TMPDIR/test10" && pushd "$TMPDIR/test10" > /dev/null
+
+echo "> Run bcftools_stats on two VCF files"
+"$meta_executable" \
+  --input "../example.vcf.gz" \
+  --input "../example.vcf.gz" \
+  --output "stats.txt" \
+
+# checks
+assert_file_exists "stats.txt"
+assert_file_not_empty "stats.txt"
+assert_file_contains "stats.txt" "bcftools stats  ../example.vcf.gz ../example.vcf.gz"
+echo "- test10 succeeded -"
+
+popd > /dev/null
+
+# Test 11: with allele frequency bins file option
+mkdir "$TMPDIR/test11" && pushd "$TMPDIR/test11" > /dev/null
+
+echo "> Run bcftools_stats on VCF file with allele frequency bins file option"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "stats.txt" \
+  --allele_frequency_bins "../allele_frequency_bins.txt" \
+
+# checks
+assert_file_exists "stats.txt"
+assert_file_not_empty "stats.txt"
+assert_file_contains "stats.txt" "bcftools stats  --af-bins ../allele_frequency_bins.txt ../example.vcf"
+echo "- test11 succeeded -"
+
+popd > /dev/null
+
+
+echo "---- All tests succeeded! ----"
+exit 0
+
+

From c3ba4a78497f7518725bb7d3e213b2a9bcee511e Mon Sep 17 00:00:00 2001
From: Theodoro Gasperin Terra Camargo
 <98555209+tgaspe@users.noreply.github.com>
Date: Tue, 10 Sep 2024 15:53:13 +0200
Subject: [PATCH 13/42] Bcftools annotate (#143)

* Initial commit

* Update config.vsh.yaml

* changes in config file

* Update script.sh

* Help File

* Update script.sh

* Update test.sh

* bug fixing and adding tests

* Update test.sh

* Update test.sh

* adding 3rd test

* More tests

* Moreee tests

* Update test.sh

* small changes

* Update CHANGELOG.md

* Update config.vsh.yaml

* bug fixing on config

* Requested changes

---------

Co-authored-by: Jakub Majercik <57993790+jakubmajercik@users.noreply.github.com>
---
 CHANGELOG.md                                  |   2 +-
 .../bcftools_annotate/config.vsh.yaml         | 250 ++++++++++++++
 src/bcftools/bcftools_annotate/help.txt       |  41 +++
 src/bcftools/bcftools_annotate/script.sh      |  54 ++++
 src/bcftools/bcftools_annotate/test.sh        | 305 ++++++++++++++++++
 5 files changed, 651 insertions(+), 1 deletion(-)
 create mode 100644 src/bcftools/bcftools_annotate/config.vsh.yaml
 create mode 100644 src/bcftools/bcftools_annotate/help.txt
 create mode 100644 src/bcftools/bcftools_annotate/script.sh
 create mode 100644 src/bcftools/bcftools_annotate/test.sh
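Note on script.sh: the wrapper in this patch appears to rely on two bash idioms, unsetting boolean parameters whose value is the string "false", and then using `${var:+word}` so that a flag is emitted only when its parameter is set and non-empty. The following is a minimal sketch of that pattern, not the component itself. The parameter values are hypothetical stand-ins for what Viash would export, and the `args` array is an illustrative assumption (the script passes the expansions to bcftools directly).

```shell
#!/bin/bash
set -eo pipefail

# Hypothetical values, standing in for what Viash would export.
par_force="false"
par_keep_sites="true"
par_columns="CHROM,POS"
par_remove=""

# Boolean flags arrive as the strings "true"/"false"; unset the false ones
# so that the ${var:+...} expansions below drop them entirely.
unset_if_false=(par_force par_keep_sites)
for par in "${unset_if_false[@]}"; do
  test_val="${!par}"   # indirect expansion: value of the variable named by $par
  if [[ "$test_val" == "false" ]]; then
    unset "$par"
  fi
done

# ${var:+word} expands to word only when var is set and non-empty,
# so unset or empty parameters contribute no arguments at all.
args=(
  ${par_force:+--force}
  ${par_keep_sites:+-k}
  ${par_columns:+-c "$par_columns"}
  ${par_remove:+-x "$par_remove"}
)

printf '%s\n' "${args[*]}"
```

With the values above, only `-k` and `-c CHROM,POS` survive: `par_force` has been unset and `par_remove` is empty. Collecting the expansions in an array is one way to inspect the final argument list before handing it to the tool.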

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2dd152bb..bb640d50 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -42,12 +42,12 @@
 * `rsem/rsem_prepare_reference`: Prepare transcript references for RSEM (PR #89).
 
 * `bcftools`:
+  - `bcftools_annotate`: Add or remove annotations from a VCF/BCF file (PR #143).
   - `bcftools/bcftools_stats`: Parses VCF or BCF and produces a txt stats file which can be plotted using plot-vcfstats (PR #142).
   - `bcftools/bcftools_sort`: Sorts BCF/VCF files by position and other criteria (PR #141).
 
 * `fastqc`: High throughput sequence quality control analysis tool (PR #92).
 
-
 ## MINOR CHANGES
 
 * `busco` components: update BUSCO to `5.7.1` (PR #72).
diff --git a/src/bcftools/bcftools_annotate/config.vsh.yaml b/src/bcftools/bcftools_annotate/config.vsh.yaml
new file mode 100644
index 00000000..67e8f46e
--- /dev/null
+++ b/src/bcftools/bcftools_annotate/config.vsh.yaml
@@ -0,0 +1,250 @@
+name: bcftools_annotate
+namespace: bcftools
+description: | 
+  Add or remove annotations from a VCF/BCF file.
+keywords: [Annotate, VCF, BCF]
+links:
+  homepage: https://samtools.github.io/bcftools/
+  documentation: https://samtools.github.io/bcftools/bcftools.html#annotate
+  repository: https://github.com/samtools/bcftools
+  issue_tracker: https://github.com/samtools/bcftools/issues
+references:
+  doi: https://doi.org/10.1093/gigascience/giab008
+license: MIT/Expat, GNU
+requirements:
+  commands: [bcftools]
+authors:
+  - __merge__: /src/_authors/theodoro_gasperin.yaml
+    roles: [author]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --input
+        alternatives: -i
+        type: file
+        multiple: true
+        description: Input VCF/BCF file.
+        required: true
+    
+  - name: Outputs
+    arguments:
+      - name: --output
+        alternatives: -o
+        direction: output
+        type: file
+        description: Output annotated file.
+        required: true
+         
+  - name: Options
+    description: | 
+      For examples of how to use bcftools annotate, see http://samtools.github.io/bcftools/howtos/annotate.html.
+      For more details on the options see https://samtools.github.io/bcftools/bcftools.html#annotate.
+    arguments:
+      
+      - name: --annotations
+        alternatives: --a
+        type: file
+        description: | 
+          VCF file or tabix-indexed FILE with annotations: CHR\tPOS[\tVALUE]+ . 
+
+      - name: --columns
+        alternatives: --c
+        type: string
+        description: | 
+          List of columns in the annotation file, e.g. CHROM,POS,REF,ALT,-,INFO/TAG. 
+          See man page for details.
+
+      - name: --columns_file
+        alternatives: --C
+        type: file
+        description: | 
+          Read -c columns from FILE, one name per row, with optional --merge_logic TYPE: NAME[ TYPE].
+
+      - name: --exclude
+        alternatives: --e
+        type: string
+        description: | 
+          Exclude sites for which the expression is true.
+          See https://samtools.github.io/bcftools/bcftools.html#expressions for details.
+        example: 'QUAL >= 30 && DP >= 10'
+
+      - name: --force
+        type: boolean_true
+        description: | 
+          Continue even when parsing errors, such as undefined tags, are encountered. 
+          Note this can be an unsafe operation and can result in corrupted BCF files. 
+          If this option is used, make sure to sanity check the result thoroughly.
+
+      - name: --header_line
+        alternatives: --H
+        type: string
+        description: | 
+          Header line which should be appended to the VCF header, can be given multiple times.
+
+      - name: --header_lines
+        alternatives: --h
+        type: file
+        description: | 
+          File with header lines to append to the VCF header.
+          For example:
+            ##INFO=<ID=NUMERIC_TAG,Number=1,Type=Integer,Description="Example header line">
+            ##INFO=<ID=STRING_TAG,Number=1,Type=String,Description="Yet another header line">
+
+      - name: --set_id
+        alternatives: --I
+        type: string
+        description: | 
+          Set ID column using a `bcftools query`-like expression, see man page for details.
+
+      - name: --include
+        type: string
+        description: | 
+          Select sites for which the expression is true.
+          See https://samtools.github.io/bcftools/bcftools.html#expressions for details.
+        example: 'QUAL >= 30 && DP >= 10'
+      
+      - name: --keep_sites
+        alternatives: --k
+        type: boolean_true
+        description: | 
+          Leave --include/--exclude sites unchanged instead of discarding them.
+
+      - name: --merge_logic
+        alternatives: --l
+        type: string
+        choices: 
+        description: | 
+          When multiple regions overlap a single record, this option defines how to treat multiple annotation values.
+          See man page for more details.
+
+      - name: --mark_sites
+        alternatives: --m
+        type: string
+        description: | 
+          Annotate sites which are present ("+") or absent ("-") in the -a file with a new INFO/TAG flag.
+
+      - name: --min_overlap
+        type: string
+        description: | 
+          Minimum overlap required as a fraction of the variant in the annotation -a file (ANN), 
+          in the target VCF file (:VCF), or both for reciprocal overlap (ANN:VCF). 
+          By default overlaps of arbitrary length are sufficient. 
+          The option can be used only with the tab-delimited annotation -a file and with BEG and END columns present.
+
+      - name: --no_version
+        type: boolean_true
+        description: | 
+          Do not append version and command line information to the output VCF header.
+
+      - name: --output_type
+        alternatives: --O
+        type: string
+        choices: ['u', 'z', 'b', 'v']
+        description: | 
+          Output type:
+            u: uncompressed BCF
+            z: compressed VCF
+            b: compressed BCF
+            v: uncompressed VCF
+      
+      - name: --pair_logic
+        type: string
+        choices: ['snps', 'indels', 'both', 'all', 'some', 'exact']
+        description: | 
+          Controls how to match records from the annotation file to the target VCF. 
+          Effective only when -a is a VCF or BCF file. 
+          The option replaces the former unintuitive --collapse. 
+          See Common Options for more.
+      
+      - name: --regions
+        alternatives: --r
+        type: string
+        description: | 
+          Restrict to comma-separated list of regions. 
+          The following formats are supported: chr|chr:pos|chr:beg-end|chr:beg-[,...].
+        example: '20:1000000-2000000'
+
+      - name: --regions_file
+        alternatives: --R
+        type: file
+        description: | 
+          Restrict to regions listed in a file. 
+          Regions can be specified either on a VCF, BED, or tab-delimited file (the default). 
+          For more information, check the manual.
+
+      - name: --regions_overlap
+        type: string
+        choices: ['pos', 'record', 'variant', '0', '1', '2']
+        description: | 
+          This option controls how overlapping records are determined: 
+          set to 'pos' or '0' if the VCF record has to have POS inside a region (this corresponds to the default behavior of -t/-T); 
+          set to 'record' or '1' if also overlapping records with POS outside a region should be included (this is the default behavior of -r/-R, 
+          and includes indels with POS at the end of a region, which are technically outside the region); 
+          or set to 'variant' or '2' to include only true overlapping variation (compare the full VCF representation "TA>T-" vs the true sequence variation "A>-").
+
+      - name: --rename_annotations 
+        type: file
+        description: | 
+          Rename annotations: TYPE/old\tnew, where TYPE is one of FILTER,INFO,FORMAT.
+
+      - name: --rename_chromosomes
+        type: file
+        description: | 
+          Rename chromosomes according to the map in file, with "old_name new_name\n" pairs 
+          separated by whitespace, each on a separate line.
+
+      - name: --samples
+        type: string
+        description: | 
+          Subset of samples to annotate.
+          See also https://samtools.github.io/bcftools/bcftools.html#common_options.
+
+      - name: --samples_file
+        type: file
+        description: | 
+          File listing the subset of samples to annotate.
+          See also https://samtools.github.io/bcftools/bcftools.html#common_options.
+
+      - name: --single_overlaps
+        type: boolean_true
+        description: | 
+          Use this option to keep memory requirements low with very large annotation files. 
+          Note, however, that this comes at a cost: only single overlapping intervals are considered in this mode. 
+          This was the default mode until the commit af6f0c9 (Feb 24 2019).
+
+      - name: --remove
+        alternatives: --x
+        type: string
+        description: | 
+          List of annotations to remove. 
+          Use "FILTER" to remove all filters or "FILTER/SomeFilter" to remove a specific filter. 
+          Similarly, "INFO" can be used to remove all INFO tags and "FORMAT" to remove all FORMAT tags except GT. 
+          To remove all INFO tags except "FOO" and "BAR", use "^INFO/FOO,INFO/BAR" (and similarly for FORMAT and FILTER). 
+          "INFO" can be abbreviated to "INF" and "FORMAT" to "FMT".
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+
+engines:
+  - type: docker
+    image: debian:stable-slim
+    setup:
+      - type: apt
+        packages: [bcftools, procps]
+      - type: docker
+        run: |
+          echo "bcftools: \"$(bcftools --version | grep 'bcftools' | sed -n 's/^bcftools //p')\"" > /var/software_versions.txt
+    test_setup:  
+      - type: apt  
+        packages: [tabix]
+
+runners:
+  - type: executable
+  - type: nextflow
+
diff --git a/src/bcftools/bcftools_annotate/help.txt b/src/bcftools/bcftools_annotate/help.txt
new file mode 100644
index 00000000..2d1c7807
--- /dev/null
+++ b/src/bcftools/bcftools_annotate/help.txt
@@ -0,0 +1,41 @@
+```
+bcftools annotate -h
+```
+
+annotate: option requires an argument -- 'h'
+
+About:   Annotate and edit VCF/BCF files.
+Usage:   bcftools annotate [options] VCF
+
+Options:
+   -a, --annotations FILE          VCF file or tabix-indexed FILE with annotations: CHR\tPOS[\tVALUE]+
+   -c, --columns LIST              List of columns in the annotation file, e.g. CHROM,POS,REF,ALT,-,INFO/TAG. See man page for details
+   -C, --columns-file FILE         Read -c columns from FILE, one name per row, with optional --merge-logic TYPE: NAME[ TYPE]
+   -e, --exclude EXPR              Exclude sites for which the expression is true (see man page for details)
+       --force                     Continue despite parsing error (at your own risk!)
+   -H, --header-line STR           Header line which should be appended to the VCF header, can be given multiple times
+   -h, --header-lines FILE         Lines which should be appended to the VCF header
+   -I, --set-id [+]FORMAT          Set ID column using a `bcftools query`-like expression, see man page for details
+   -i, --include EXPR              Select sites for which the expression is true (see man page for details)
+   -k, --keep-sites                Leave -i/-e sites unchanged instead of discarding them
+   -l, --merge-logic TAG:TYPE      Merge logic for multiple overlapping regions (see man page for details), EXPERIMENTAL
+   -m, --mark-sites [+-]TAG        Add INFO/TAG flag to sites which are ("+") or are not ("-") listed in the -a file
+       --min-overlap ANN:VCF       Required overlap as a fraction of variant in the -a file (ANN), the VCF (:VCF), or reciprocal (ANN:VCF)
+       --no-version                Do not append version and command line to the header
+   -o, --output FILE               Write output to a file [standard output]
+   -O, --output-type u|b|v|z[0-9]  u/b: un/compressed BCF, v/z: un/compressed VCF, 0-9: compression level [v]
+       --pair-logic STR            Matching records by <snps|indels|both|all|some|exact>, see man page for details [some]
+   -r, --regions REGION            Restrict to comma-separated list of regions
+   -R, --regions-file FILE         Restrict to regions listed in FILE
+       --regions-overlap 0|1|2     Include if POS in the region (0), record overlaps (1), variant overlaps (2) [1]
+       --rename-annots FILE        Rename annotations: TYPE/old\tnew, where TYPE is one of FILTER,INFO,FORMAT
+       --rename-chrs FILE          Rename sequences according to the mapping: old\tnew
+   -s, --samples [^]LIST           Comma separated list of samples to annotate (or exclude with "^" prefix)
+   -S, --samples-file [^]FILE      File of samples to annotate (or exclude with "^" prefix)
+       --single-overlaps           Keep memory low by avoiding complexities arising from handling multiple overlapping intervals
+   -x, --remove LIST               List of annotations (e.g. ID,INFO/DP,FORMAT/DP,FILTER) to remove (or keep with "^" prefix). See man page for details
+       --threads INT               Number of extra output compression threads [0]
+
+Examples:
+   http://samtools.github.io/bcftools/howtos/annotate.html
+
diff --git a/src/bcftools/bcftools_annotate/script.sh b/src/bcftools/bcftools_annotate/script.sh
new file mode 100644
index 00000000..18137bbf
--- /dev/null
+++ b/src/bcftools/bcftools_annotate/script.sh
@@ -0,0 +1,54 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# Exit on error
+set -eo pipefail
+
+# Unset parameters
+unset_if_false=(
+    par_force
+    par_keep_sites
+    par_no_version
+    par_single_overlaps
+)
+
+for par in "${unset_if_false[@]}"; do
+    test_val="${!par}"
+    [[ "$test_val" == "false" ]] && unset "$par"
+done
+
+# Execute bcftools annotate with the provided arguments
+bcftools annotate \
+    ${par_annotations:+-a "$par_annotations"} \
+    ${par_columns:+-c "$par_columns"} \
+    ${par_columns_file:+-C "$par_columns_file"} \
+    ${par_exclude:+-e "$par_exclude"} \
+    ${par_force:+--force} \
+    ${par_header_line:+-H "$par_header_line"} \
+    ${par_header_lines:+-h "$par_header_lines"} \
+    ${par_set_id:+-I "$par_set_id"} \
+    ${par_include:+-i "$par_include"} \
+    ${par_keep_sites:+-k} \
+    ${par_merge_logic:+-l "$par_merge_logic"} \
+    ${par_mark_sites:+-m "$par_mark_sites"} \
+    ${par_min_overlap:+--min-overlap "$par_min_overlap"} \
+    ${par_no_version:+--no-version} \
+    ${par_samples_file:+-S "$par_samples_file"} \
+    ${par_output_type:+-O "$par_output_type"} \
+    ${par_pair_logic:+--pair-logic "$par_pair_logic"} \
+    ${par_regions:+-r "$par_regions"} \
+    ${par_regions_file:+-R "$par_regions_file"} \
+    ${par_regions_overlap:+--regions-overlap "$par_regions_overlap"} \
+    ${par_rename_annotations:+--rename-annots "$par_rename_annotations"} \
+    ${par_rename_chromosomes:+--rename-chrs "$par_rename_chromosomes"} \
+    ${par_samples:+-s "$par_samples"} \
+    ${par_single_overlaps:+--single-overlaps} \
+    ${par_threads:+--threads "$par_threads"} \
+    ${par_remove:+-x "$par_remove"} \
+    -o "$par_output" \
+    "$par_input"
+    
+
+    
\ No newline at end of file
diff --git a/src/bcftools/bcftools_annotate/test.sh b/src/bcftools/bcftools_annotate/test.sh
new file mode 100644
index 00000000..39835c82
--- /dev/null
+++ b/src/bcftools/bcftools_annotate/test.sh
@@ -0,0 +1,305 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# Exit on error
+set -eo pipefail
+
+#test_data="$meta_resources_dir/test_data"
+
+#############################################
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_identical_content() {
+  diff -a "$2" "$1" \
+    || { echo "Files are not identical!" && exit 1; }
+}
+#############################################
+
+# Create directories for tests
+echo "Creating Test Data..."
+TMPDIR=$(mktemp -d "$meta_temp_dir/XXXXXX")
+function clean_up {
+  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
+}
+trap clean_up EXIT
+
+# Create test data
+cat <<EOF > "$TMPDIR/example.vcf"
+##fileformat=VCFv4.1
+##contig=<ID=1,length=249250621,assembly=b37>
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMPLE1
+1	752567	llama	A	C	.	.	.	.	.
+1	752722	.	G	A	.	.	.	.	.
+EOF
+
+bgzip -c "$TMPDIR/example.vcf" > "$TMPDIR/example.vcf.gz"
+tabix -p vcf "$TMPDIR/example.vcf.gz"
+
+cat <<EOF > "$TMPDIR/annots.tsv"
+1	752567	752567	FooValue1	12345
+1	752722	752722	FooValue2	67890
+EOF
+
+cat <<EOF > "$TMPDIR/rename.tsv"
+INFO/.	Luigi
+EOF
+
+bgzip "$TMPDIR/annots.tsv"
+tabix -s1 -b2 -e3 "$TMPDIR/annots.tsv.gz"
+
+cat <<EOF > "$TMPDIR/header.hdr"
+##FORMAT=<ID=FOO,Number=1,Type=String,Description="Some description">
+##INFO=<ID=BAR,Number=1,Type=Integer,Description="Some description">
+EOF
+
+cat <<EOF > "$TMPDIR/rename_chrm.tsv"
+1	chr1
+2	chr2
+EOF
+
+# Test 1: Remove ID annotations
+mkdir "$TMPDIR/test1" && pushd "$TMPDIR/test1" > /dev/null
+
+echo "> Run bcftools_annotate remove annotations"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "annotated.vcf" \
+  --remove "ID" \
+
+# checks
+assert_file_exists "annotated.vcf"
+assert_file_not_empty "annotated.vcf"
+assert_file_contains "annotated.vcf" "1	752567	.	A	C"
+echo "- test1 succeeded -"
+
+popd > /dev/null
+
+# Test 2: Annotate with -a, -c and -h options
+mkdir "$TMPDIR/test2" && pushd "$TMPDIR/test2" > /dev/null
+
+echo "> Run bcftools_annotate with -a, -c and -h options"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "annotated.vcf" \
+  --annotations "../annots.tsv.gz" \
+  --header_lines "../header.hdr" \
+  --columns "CHROM,FROM,TO,FMT/FOO,BAR" \
+  --mark_sites "BAR" \
+
+# checks
+assert_file_exists "annotated.vcf"
+assert_file_not_empty "annotated.vcf"
+assert_file_contains "annotated.vcf" "$(echo -e "1\t752567\tllama\tA\tC\t.\t.\tBAR=12345\tFOO\tFooValue1")"
+echo "- test2 succeeded -"
+
+popd > /dev/null
+
+# Test 3: Set ID
+mkdir "$TMPDIR/test3" && pushd "$TMPDIR/test3" > /dev/null
+
+echo "> Run bcftools_annotate with --set_id option"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "annotated.vcf" \
+  --set_id "+'%CHROM\_%POS\_%REF\_%FIRST_ALT'" \
+
+# checks
+assert_file_exists "annotated.vcf"
+assert_file_not_empty "annotated.vcf"
+assert_file_contains "annotated.vcf" "'1_752722_G_A'"
+echo "- test3 succeeded -"
+
+popd > /dev/null
+
+# Test 4: Rename annotations
+mkdir "$TMPDIR/test4" && pushd "$TMPDIR/test4" > /dev/null
+
+echo "> Run bcftools_annotate with --rename-annotations option"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "annotated.vcf" \
+  --rename_annotations "../rename.tsv"
+
+# checks
+assert_file_exists "annotated.vcf"
+assert_file_not_empty "annotated.vcf"
+assert_file_contains "annotated.vcf" "##bcftools_annotateCommand=annotate --rename-annots ../rename.tsv -o annotated.vcf"
+echo "- test4 succeeded -"
+
+popd > /dev/null
+
+# Test 5: Rename chromosomes
+mkdir "$TMPDIR/test5" && pushd "$TMPDIR/test5" > /dev/null
+
+echo "> Run bcftools_annotate with --rename-chromosomes option"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "annotated.vcf" \
+  --rename_chromosomes "../rename_chrm.tsv"
+
+# checks
+assert_file_exists "annotated.vcf"
+assert_file_not_empty "annotated.vcf"
+assert_file_contains "annotated.vcf" "chr1"
+echo "- test5 succeeded -"
+
+popd > /dev/null
+
+# Test 6: Sample option
+mkdir "$TMPDIR/test6" && pushd "$TMPDIR/test6" > /dev/null
+
+echo "> Run bcftools_annotate with -s option"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "annotated.vcf" \
+  --samples "SAMPLE1"
+
+# checks
+assert_file_exists "annotated.vcf"
+assert_file_not_empty "annotated.vcf"
+assert_file_contains "annotated.vcf" "##bcftools_annotateCommand=annotate -s SAMPLE1 -o annotated.vcf ../example.vcf"
+echo "- test6 succeeded -"
+
+popd > /dev/null
+
+# Test 7: Single overlaps
+mkdir "$TMPDIR/test7" && pushd "$TMPDIR/test7" > /dev/null
+
+echo "> Run bcftools_annotate with --single-overlaps option"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "annotated.vcf" \
+  --single_overlaps \
+  --keep_sites \
+
+# checks
+assert_file_exists "annotated.vcf"
+assert_file_not_empty "annotated.vcf"
+assert_file_contains "annotated.vcf" "annotate -k --single-overlaps -o annotated.vcf ../example.vcf"
+echo "- test7 succeeded -"
+
+popd > /dev/null
+
+# Test 8: Min overlap
+mkdir "$TMPDIR/test8" && pushd "$TMPDIR/test8" > /dev/null
+
+echo "> Run bcftools_annotate with --min-overlap option"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "annotated.vcf" \
+  --annotations "../annots.tsv.gz" \
+  --columns "CHROM,FROM,TO,FMT/FOO,BAR" \
+  --header_lines "../header.hdr" \
+  --min_overlap "1"
+
+# checks
+assert_file_exists "annotated.vcf"
+assert_file_not_empty "annotated.vcf"
+assert_file_contains "annotated.vcf" "annotate -a ../annots.tsv.gz -c CHROM,FROM,TO,FMT/FOO,BAR -h ../header.hdr --min-overlap 1 -o annotated.vcf ../example.vcf"
+echo "- test8 succeeded -"
+
+popd > /dev/null
+
+# Test 9: Regions
+mkdir "$TMPDIR/test9" && pushd "$TMPDIR/test9" > /dev/null
+
+echo "> Run bcftools_annotate with -r option"
+"$meta_executable" \
+  --input "../example.vcf.gz" \
+  --output "annotated.vcf" \
+  --regions "1:752567-752722"
+
+# checks
+assert_file_exists "annotated.vcf"
+assert_file_not_empty "annotated.vcf"
+assert_file_contains "annotated.vcf" "annotate -r 1:752567-752722 -o annotated.vcf ../example.vcf.gz"
+echo "- test9 succeeded -"
+
+popd > /dev/null
+
+# Test 10: pair-logic
+mkdir "$TMPDIR/test10" && pushd "$TMPDIR/test10" > /dev/null
+
+echo "> Run bcftools_annotate with --pair-logic option"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "annotated.vcf" \
+  --pair_logic "all"
+
+# checks
+assert_file_exists "annotated.vcf"
+assert_file_not_empty "annotated.vcf"
+assert_file_contains "annotated.vcf" "annotate --pair-logic all -o annotated.vcf ../example.vcf"
+echo "- test10 succeeded -"
+
+popd > /dev/null
+
+# Test 11: regions-overlap
+mkdir "$TMPDIR/test11" && pushd "$TMPDIR/test11" > /dev/null
+
+echo "> Run bcftools_annotate with --regions-overlap option"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "annotated.vcf" \
+  --regions_overlap "1"
+
+# checks
+assert_file_exists "annotated.vcf"
+assert_file_not_empty "annotated.vcf"
+assert_file_contains "annotated.vcf" "annotate --regions-overlap 1 -o annotated.vcf ../example.vcf"
+echo "- test11 succeeded -"
+
+popd > /dev/null
+
+# Test 12: include 
+mkdir "$TMPDIR/test12" && pushd "$TMPDIR/test12" > /dev/null
+
+echo "> Run bcftools_annotate with -i option"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "annotated.vcf" \
+  --include "FILTER='PASS'" \
+
+# checks
+assert_file_exists "annotated.vcf"
+assert_file_not_empty "annotated.vcf"
+assert_file_contains "annotated.vcf" "annotate -i FILTER='PASS' -o annotated.vcf ../example.vcf"
+echo "- test12 succeeded -"
+
+popd > /dev/null
+
+# Test 13: exclude
+mkdir "$TMPDIR/test13" && pushd "$TMPDIR/test13" > /dev/null
+
+echo "> Run bcftools_annotate with -e option"
+"$meta_executable" \
+  --annotations "../annots.tsv.gz" \
+  --input "../example.vcf" \
+  --output "annotated.vcf" \
+  --exclude "FILTER='PASS'" \
+  --header_lines "../header.hdr" \
+  --columns "CHROM,FROM,TO,FMT/FOO,BAR" \
+  --merge_logic "FOO:first" \
+
+# checks
+assert_file_exists "annotated.vcf"
+assert_file_not_empty "annotated.vcf"
+assert_file_contains "annotated.vcf" "annotate -a ../annots.tsv.gz -c CHROM,FROM,TO,FMT/FOO,BAR -e FILTER='PASS' -h ../header.hdr -l FOO:first -o annotated.vcf ../example.vcf"
+echo "- test13 succeeded -"
+
+popd > /dev/null
+
+
+echo "---- All tests succeeded! ----"
+exit 0
+

From dc7b33d51f274cb156b1f1b0fbdc6fed0b757720 Mon Sep 17 00:00:00 2001
From: Theodoro Gasperin Terra Camargo
 <98555209+tgaspe@users.noreply.github.com>
Date: Tue, 10 Sep 2024 16:15:44 +0200
Subject: [PATCH 14/42] Bcftools Norm (#144)

* Initial Commit

* config and help.txt

* script.sh

* test template

* More tests and debugging

* test 5 and 6

* test 7, 8, 9

* Update test.sh

* fixing bug on config

* Changelog

* Update config.vsh.yaml

* Requested changes

* Bug fixing

---------

Co-authored-by: Jakub Majercik <57993790+jakubmajercik@users.noreply.github.com>
---
 CHANGELOG.md                               |   1 +
 src/bcftools/bcftools_norm/config.vsh.yaml | 194 +++++++++++++++++
 src/bcftools/bcftools_norm/help.txt        |  41 ++++
 src/bcftools/bcftools_norm/script.sh       |  49 +++++
 src/bcftools/bcftools_norm/test.sh         | 231 +++++++++++++++++++++
 5 files changed, 516 insertions(+)
 create mode 100644 src/bcftools/bcftools_norm/config.vsh.yaml
 create mode 100644 src/bcftools/bcftools_norm/help.txt
 create mode 100644 src/bcftools/bcftools_norm/script.sh
 create mode 100644 src/bcftools/bcftools_norm/test.sh
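
Note on script.sh: boolean options reach the script as "true"/"false" strings, so the false ones are unset first and every option is then passed through a `${var:+flag}` expansion, which yields nothing for unset or empty variables. A minimal sketch of that pattern follows; the parameter values are hypothetical, and the arguments are collected in an `args` array (a name introduced only for this illustration) instead of being handed to bcftools, so the sketch runs without bcftools installed.

```shell
#!/bin/bash
set -eo pipefail

# Hypothetical values standing in for what the wrapper would provide.
par_atomize="true"
par_do_not_normalize="false"
par_check_ref="w"

# Unset boolean parameters that are "false" so that the ${var:+...}
# expansions below produce nothing for them.
for par in par_atomize par_do_not_normalize; do
    if [[ "${!par}" == "false" ]]; then
        unset "$par"
    fi
done

# Collect the resulting arguments in an array (illustration only).
args=(
    ${par_atomize:+--atomize}
    ${par_check_ref:+-c "$par_check_ref"}
    ${par_do_not_normalize:+-N}
)

printf '%s\n' "${args[*]}"
```

Here `-N` is dropped because `par_do_not_normalize` was unset, while `-c` and its value are kept as two separate words.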

diff --git a/CHANGELOG.md b/CHANGELOG.md
index bb640d50..25850193 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -42,6 +42,7 @@
 * `rsem/rsem_prepare_reference`: Prepare transcript references for RSEM (PR #89).
 
 * `bcftools`:
+  - `bcftools_norm`: Left-align and normalize indels, check if REF alleles match the reference, split multiallelic sites into multiple rows; recover multiallelics from multiple rows (PR #144).
   - `bcftools_annotate`: Add or remove annotations from a VCF/BCF file (PR #143).
   - `bcftools/bcftools_stats`: Parses VCF or BCF and produces a txt stats file which can be plotted using plot-vcfstats (PR #142).
   - `bcftools/bcftools_sort`: Sorts BCF/VCF files by position and other criteria (PR #141).
diff --git a/src/bcftools/bcftools_norm/config.vsh.yaml b/src/bcftools/bcftools_norm/config.vsh.yaml
new file mode 100644
index 00000000..5c525d3a
--- /dev/null
+++ b/src/bcftools/bcftools_norm/config.vsh.yaml
@@ -0,0 +1,194 @@
+name: bcftools_norm
+namespace: bcftools
+description: | 
+  Left-align and normalize indels, check if REF alleles match the reference, split multiallelic sites into multiple rows; 
+  recover multiallelics from multiple rows. 
+keywords: [Normalize, VCF, BCF]
+links:
+  homepage: https://samtools.github.io/bcftools/
+  documentation: https://samtools.github.io/bcftools/bcftools.html#norm
+  repository: https://github.com/samtools/bcftools
+  issue_tracker: https://github.com/samtools/bcftools/issues
+references:
+  doi: https://doi.org/10.1093/gigascience/giab008
+license: MIT/Expat, GNU
+requirements:
+  commands: [bcftools]
+authors:
+  - __merge__: /src/_authors/theodoro_gasperin.yaml
+    roles: [author]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --input
+        alternatives: -i
+        type: file
+        description: Input VCF/BCF file.
+        required: true
+    
+  - name: Outputs
+    arguments:
+      - name: --output
+        alternatives: -o
+        direction: output
+        type: file
+        description: Output normalized VCF/BCF file.
+        required: true
+         
+  - name: Options
+    arguments:
+      
+      - name: --atomize
+        alternatives: -a
+        type: boolean_true
+        description: |
+          Decompose complex variants (e.g., MNVs become consecutive SNVs).
+
+      - name: --atom_overlaps
+        type: string
+        choices: [".", "*"]
+        description: | 
+          Use the star allele (*) for overlapping alleles or set to missing (.).
+
+      - name: --check_ref
+        alternatives: -c
+        type: string
+        choices: ['e', 'w', 'x', 's']
+        description: | 
+          Check REF alleles and exit (e), warn (w), exclude (x), or set (s) bad sites.
+
+      - name: --remove_duplicates
+        alternatives: -d
+        type: string
+        choices: ['snps', 'indels', 'both', 'all', 'exact', 'none']
+        description: Remove duplicate snps, indels, both, all, exact matches, or none (old -D option).
+
+      - name: --fasta_ref
+        alternatives: -f
+        type: file
+        description: Reference fasta sequence file.
+
+      - name: --force
+        type: boolean_true
+        description: | 
+          Try to proceed even if malformed tags are encountered. 
+          Experimental, use at your own risk.
+
+      - name: --keep_sum
+        type: string
+        description: | 
+          Keep vector sum constant when splitting multiallelics (see github issue #360).
+
+      - name: --multiallelics
+        alternatives: -m
+        type: string
+        choices: ['+snps', '+indels', '+both', '+any', '-snps', '-indels', '-both', '-any']
+        description: | 
+          Split multiallelics (-) or join biallelics (+), type: snps, indels, both, any [default: both].
+
+      - name: --no_version
+        type: boolean_true
+        description: Do not append version and command line information to the header.
+
+      - name: --do_not_normalize
+        alternatives: -N
+        type: boolean_true
+        description: Do not normalize indels (with -m or -c s).
+      
+      - name: --output_type
+        alternatives: -O
+        type: string
+        choices: ['u', 'z', 'b', 'v']
+        description: | 
+          Output type:
+            u: uncompressed BCF
+            z: compressed VCF
+            b: compressed BCF
+            v: uncompressed VCF
+      
+      - name: --old_rec_tag
+        type: string
+        description: Annotate modified records with INFO/STR indicating the original variant.
+
+      - name: --regions
+        alternatives: -r
+        type: string
+        description: | 
+          Restrict to comma-separated list of regions. 
+          Following formats are supported: chr|chr:pos|chr:beg-end|chr:beg-[,…].
+        example: '20:1000000-2000000'
+
+      - name: --regions_file
+        alternatives: -R
+        type: file
+        description: | 
+          Restrict to regions listed in a file. 
+          Regions can be specified either on a VCF, BED, or tab-delimited file (the default). 
+          For more information check manual.
+
+      - name: --regions_overlap
+        type: string
+        choices: ['pos', 'record', 'variant', '0', '1', '2']
+        description: | 
+          This option controls how overlapping records are determined: 
+          set to 'pos' or '0' if the VCF record has to have POS inside a region (this corresponds to the default behavior of -t/-T); 
+          set to 'record' or '1' if also overlapping records with POS outside a region should be included (this is the default behavior of -r/-R, 
+          and includes indels with POS at the end of a region, which are technically outside the region); 
+          or set to 'variant' or '2' to include only true overlapping variation (compare the full VCF representation "TA>T-" vs the true sequence variation "A>-").
+
+      - name: --site_win
+        alternatives: -w
+        type: integer
+        description: | 
+          Buffer for sorting lines that changed position during realignment.
+
+      - name: --strict_filter
+        alternatives: -s
+        type: boolean_true
+        description: When merging (-m+), merged site is PASS only if all sites being merged PASS.
+
+      - name: --targets
+        alternatives: -t
+        type: string
+        description: Similar to --regions but streams rather than index-jumps.
+        example: '20:1000000-2000000'
+
+      - name: --targets_file
+        alternatives: -T
+        type: file
+        description: Similar to --regions_file but streams rather than index-jumps.
+
+      - name: --targets_overlap
+        type: string
+        choices: ['pos', 'record', 'variant', '0', '1', '2']
+        description: | 
+          Include if POS in the region (0), record overlaps (1), variant overlaps (2).
+          Similar to --regions_overlap.
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+
+engines:
+  - type: docker
+    image: debian:stable-slim
+    setup:
+      - type: apt
+        packages: [bcftools, procps]
+      - type: docker
+        run: |
+          echo "bcftools: \"$(bcftools --version | grep 'bcftools' | sed -n 's/^bcftools //p')\"" > /var/software_versions.txt
+    test_setup:  
+      - type: apt  
+        packages: [tabix]
+
+runners:
+  - type: executable
+  - type: nextflow
+
+
diff --git a/src/bcftools/bcftools_norm/help.txt b/src/bcftools/bcftools_norm/help.txt
new file mode 100644
index 00000000..02e9761a
--- /dev/null
+++ b/src/bcftools/bcftools_norm/help.txt
@@ -0,0 +1,41 @@
+```
+bcftools norm -h
+```
+
+About:   Left-align and normalize indels; check if REF alleles match the reference;
+         split multiallelic sites into multiple rows; recover multiallelics from
+         multiple rows.
+Usage:   bcftools norm [options] <in.vcf.gz>
+
+Options:
+    -a, --atomize                   Decompose complex variants (e.g. MNVs become consecutive SNVs)
+        --atom-overlaps '*'|.       Use the star allele (*) for overlapping alleles or set to missing (.) [*]
+    -c, --check-ref e|w|x|s         Check REF alleles and exit (e), warn (w), exclude (x), or set (s) bad sites [e]
+    -D, --remove-duplicates         Remove duplicate lines of the same type.
+    -d, --rm-dup TYPE               Remove duplicate snps|indels|both|all|exact
+    -f, --fasta-ref FILE            Reference sequence
+        --force                     Try to proceed even if malformed tags are encountered. Experimental, use at your own risk
+        --keep-sum TAG,..           Keep vector sum constant when splitting multiallelics (see github issue #360)
+    -m, --multiallelics -|+TYPE     Split multiallelics (-) or join biallelics (+), type: snps|indels|both|any [both]
+        --no-version                Do not append version and command line to the header
+    -N, --do-not-normalize          Do not normalize indels (with -m or -c s)
+        --old-rec-tag STR           Annotate modified records with INFO/STR indicating the original variant
+    -o, --output FILE               Write output to a file [standard output]
+    -O, --output-type u|b|v|z[0-9]  u/b: un/compressed BCF, v/z: un/compressed VCF, 0-9: compression level [v]
+    -r, --regions REGION            Restrict to comma-separated list of regions
+    -R, --regions-file FILE         Restrict to regions listed in a file
+        --regions-overlap 0|1|2     Include if POS in the region (0), record overlaps (1), variant overlaps (2) [1]
+    -s, --strict-filter             When merging (-m+), merged site is PASS only if all sites being merged PASS
+    -t, --targets REGION            Similar to -r but streams rather than index-jumps
+    -T, --targets-file FILE         Similar to -R but streams rather than index-jumps
+        --targets-overlap 0|1|2     Include if POS in the region (0), record overlaps (1), variant overlaps (2) [0]
+        --threads INT               Use multithreading with <int> worker threads [0]
+    -w, --site-win INT              Buffer for sorting lines which changed position during realignment [1000]
+
+Examples:
+   # normalize and left-align indels
+   bcftools norm -f ref.fa in.vcf
+
+   # split multi-allelic sites
+   bcftools norm -m- in.vcf
+
diff --git a/src/bcftools/bcftools_norm/script.sh b/src/bcftools/bcftools_norm/script.sh
new file mode 100644
index 00000000..0f43e593
--- /dev/null
+++ b/src/bcftools/bcftools_norm/script.sh
@@ -0,0 +1,49 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# Exit on error
+set -eo pipefail
+
+# Unset parameters
+unset_if_false=(
+    par_atomize
+    par_remove_duplicates
+    par_force
+    par_no_version
+    par_do_not_normalize
+    par_strict_filter
+)
+
+for par in "${unset_if_false[@]}"; do
+    test_val="${!par}"
+    [[ "$test_val" == "false" ]] && unset "$par"
+done
+
+# Execute bcftools norm with the provided arguments
+bcftools norm \
+    ${par_atomize:+--atomize} \
+    ${par_atom_overlaps:+--atom-overlaps "$par_atom_overlaps"} \
+    ${par_check_ref:+-c "$par_check_ref"} \
+    ${par_remove_duplicates:+-d "$par_remove_duplicates"} \
+    ${par_fasta_ref:+-f "$par_fasta_ref"} \
+    ${par_force:+--force} \
+    ${par_keep_sum:+--keep-sum "$par_keep_sum"} \
+    ${par_multiallelics:+-m "$par_multiallelics"} \
+    ${par_no_version:+--no-version} \
+    ${par_do_not_normalize:+-N} \
+    ${par_old_rec_tag:+--old-rec-tag "$par_old_rec_tag"} \
+    ${par_regions:+-r "$par_regions"} \
+    ${par_regions_file:+-R "$par_regions_file"} \
+    ${par_regions_overlap:+--regions-overlap "$par_regions_overlap"} \
+    ${par_site_win:+-w "$par_site_win"} \
+    ${par_strict_filter:+-s} \
+    ${par_targets:+-t "$par_targets"} \
+    ${par_targets_file:+-T "$par_targets_file"} \
+    ${par_targets_overlap:+--targets-overlap "$par_targets_overlap"} \
+    ${meta_cpus:+--threads "$meta_cpus"} \
+    ${par_output_type:+-O "$par_output_type"} \
+    -o "$par_output" \
+    "$par_input"
+    
diff --git a/src/bcftools/bcftools_norm/test.sh b/src/bcftools/bcftools_norm/test.sh
new file mode 100644
index 00000000..254c7176
--- /dev/null
+++ b/src/bcftools/bcftools_norm/test.sh
@@ -0,0 +1,231 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# Exit on error
+set -eo pipefail
+
+#test_data="$meta_resources_dir/test_data"
+
+#############################################
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_identical_content() {
+  diff -a "$2" "$1" \
+    || { echo "Files are not identical!" && exit 1; }
+}
+#############################################
+
+# Create directories for tests
+echo "Creating Test Data..."
+TMPDIR=$(mktemp -d "$meta_temp_dir/XXXXXX")
+function clean_up {
+  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
+}
+trap clean_up EXIT
+
+# Create test data
+cat <<EOF > "$TMPDIR/example.vcf"
+##fileformat=VCFv4.1
+##contig=<ID=1,length=249250621,assembly=b37>
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMPLE1
+1	752567	llama	G	C,A	.	.	.	.	1/2
+1	752722	.	G	A,AAA	.	.	.	.	./.
+EOF
+
+bgzip -c "$TMPDIR/example.vcf" > "$TMPDIR/example.vcf.gz"
+tabix -p vcf "$TMPDIR/example.vcf.gz"
+
+cat <<EOF > "$TMPDIR/reference.fa"
+>1
+ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG
+>2
+CGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGAT
+EOF
+
+# Test 1: Atomize and atom overlaps
+mkdir "$TMPDIR/test1" && pushd "$TMPDIR/test1" > /dev/null
+
+echo "> Run bcftools_norm"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "normalized.vcf" \
+  --atomize \
+  --atom_overlaps "." \
+  &> /dev/null
+
+# checks
+assert_file_exists "normalized.vcf"
+assert_file_not_empty "normalized.vcf"
+assert_file_contains "normalized.vcf" "bcftools_normCommand=norm --atomize --atom-overlaps . -o normalized.vcf ../example.vcf"
+echo "- test1 succeeded -"
+
+popd > /dev/null
+
+# Test 2: Remove duplicates
+mkdir "$TMPDIR/test2" && pushd "$TMPDIR/test2" > /dev/null
+
+echo "> Run bcftools_norm with remove duplicates"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "normalized.vcf" \
+  --atomize \
+  --remove_duplicates 'all' \
+  &> /dev/null
+
+# checks
+assert_file_exists "normalized.vcf"
+assert_file_not_empty "normalized.vcf"
+assert_file_contains "normalized.vcf" "norm --atomize -d all -o normalized.vcf ../example.vcf"
+echo "- test2 succeeded -"
+
+popd > /dev/null
+
+# Test 3: Check reference and fasta reference
+mkdir "$TMPDIR/test3" && pushd "$TMPDIR/test3" > /dev/null
+
+echo "> Run bcftools_norm with check reference and fasta reference"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "normalized.vcf" \
+  --atomize \
+  --fasta_ref "../reference.fa" \
+  --check_ref "e" \
+  &> /dev/null
+
+# checks
+assert_file_exists "normalized.vcf"
+assert_file_not_empty "normalized.vcf"
+assert_file_contains "normalized.vcf" "norm --atomize -c e -f ../reference.fa -o normalized.vcf ../example.vcf"
+echo "- test3 succeeded -"
+
+popd > /dev/null
+
+# Test 4: Multiallelics
+mkdir "$TMPDIR/test4" && pushd "$TMPDIR/test4" > /dev/null
+
+echo "> Run bcftools_norm with multiallelics"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "normalized.vcf" \
+  --multiallelics "-any" \
+  --old_rec_tag "wazzaaa" \
+  &> /dev/null
+
+# checks
+assert_file_exists "normalized.vcf"
+assert_file_not_empty "normalized.vcf"
+assert_file_contains "normalized.vcf" "norm -m -any --old-rec-tag wazzaaa -o normalized.vcf ../example.vcf"
+echo "- test4 succeeded -"
+
+popd > /dev/null
+
+# Test 5: Regions
+mkdir "$TMPDIR/test5" && pushd "$TMPDIR/test5" > /dev/null
+
+echo "> Run bcftools_norm with regions"
+"$meta_executable" \
+  --input "../example.vcf.gz" \
+  --output "normalized.vcf" \
+  --atomize \
+  --regions "1:752567-752722" \
+  &> /dev/null
+
+# checks
+assert_file_exists "normalized.vcf"
+assert_file_not_empty "normalized.vcf"
+assert_file_contains "normalized.vcf" "norm --atomize -r 1:752567-752722 -o normalized.vcf ../example.vcf.gz"
+echo "- test5 succeeded -"
+
+popd > /dev/null
+
+# Test 6: Targets
+mkdir "$TMPDIR/test6" && pushd "$TMPDIR/test6" > /dev/null
+
+echo "> Run bcftools_norm with targets"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "normalized.vcf" \
+  --atomize \
+  --targets "1:752567-752722" \
+  &> /dev/null
+
+# checks
+assert_file_exists "normalized.vcf"
+assert_file_not_empty "normalized.vcf"
+assert_file_contains "normalized.vcf" "norm --atomize -t 1:752567-752722 -o normalized.vcf ../example.vcf"
+echo "- test6 succeeded -"
+
+popd > /dev/null
+
+# Test 7: Regions overlap
+mkdir "$TMPDIR/test7" && pushd "$TMPDIR/test7" > /dev/null
+
+echo "> Run bcftools_norm with regions overlap"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "normalized.vcf" \
+  --atomize \
+  --regions_overlap "pos" \
+  &> /dev/null
+
+# checks
+assert_file_exists "normalized.vcf"
+assert_file_not_empty "normalized.vcf"
+assert_file_contains "normalized.vcf" "norm --atomize --regions-overlap pos -o normalized.vcf ../example.vcf"
+echo "- test7 succeeded -"
+
+popd > /dev/null
+
+# Test 8: Strict filter and targets overlap
+mkdir "$TMPDIR/test8" && pushd "$TMPDIR/test8" > /dev/null
+
+echo "> Run bcftools_norm with strict filter and targets overlap"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "normalized.vcf" \
+  --atomize \
+  --strict_filter \
+  --targets_overlap "1" \
+  &> /dev/null
+
+# checks
+assert_file_exists "normalized.vcf"
+assert_file_not_empty "normalized.vcf"
+assert_file_contains "normalized.vcf" "norm --atomize -s --targets-overlap 1 -o normalized.vcf ../example.vcf"
+echo "- test8 succeeded -"
+
+popd > /dev/null
+
+# Test 9: Do not normalize
+mkdir "$TMPDIR/test9" && pushd "$TMPDIR/test9" > /dev/null
+
+echo "> Run bcftools_norm with do not normalize"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --output "normalized.vcf" \
+  --do_not_normalize \
+  --atomize \
+  &> /dev/null
+
+# checks
+assert_file_exists "normalized.vcf"
+assert_file_not_empty "normalized.vcf"
+assert_file_contains "normalized.vcf" "norm --atomize -N -o normalized.vcf ../example.vcf"
+echo "- test9 succeeded -"
+
+popd > /dev/null
+
+echo "---- All tests succeeded! ----"
+exit 0
+
+

From bd8ca889d13784c5a7502bb977c6659fe420d973 Mon Sep 17 00:00:00 2001
From: Theodoro Gasperin Terra Camargo
 <98555209+tgaspe@users.noreply.github.com>
Date: Tue, 10 Sep 2024 16:17:22 +0200
Subject: [PATCH 15/42] Bcftools Concat (#145)

* Initial Commit

* Create help.txt

* Update config.vsh.yaml

* Update config.vsh.yaml

* Update config.vsh.yaml

* Update script.sh

* add template for tests

* Update test.sh

* small changes in config file

* adding more tests

* adding more test

* Update CHANGELOG.md

---------

Co-authored-by: Jakub Majercik <57993790+jakubmajercik@users.noreply.github.com>
---
 CHANGELOG.md                                 |   5 +-
 src/bcftools/bcftools_concat/config.vsh.yaml | 172 ++++++++++++++
 src/bcftools/bcftools_concat/help.txt        |  36 +++
 src/bcftools/bcftools_concat/script.sh       |  54 +++++
 src/bcftools/bcftools_concat/test.sh         | 227 +++++++++++++++++++
 5 files changed, 492 insertions(+), 2 deletions(-)
 create mode 100644 src/bcftools/bcftools_concat/config.vsh.yaml
 create mode 100644 src/bcftools/bcftools_concat/help.txt
 create mode 100644 src/bcftools/bcftools_concat/script.sh
 create mode 100644 src/bcftools/bcftools_concat/test.sh
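
Note on script.sh: the script splits `par_input` on `;` before passing the files to bcftools, which suggests that multiple `--input` values arrive as a single `;`-separated string. A minimal sketch of that split follows, assuming this separator; the file names are hypothetical.

```shell
#!/bin/bash
set -eo pipefail

# Hypothetical value: two input files joined by ';'.
par_input="chr1.vcf.gz;chr2.vcf.gz"

# Split on ';' into an array. IFS is changed only for this read command,
# and quoting the here-string keeps paths containing spaces intact.
IFS=";" read -ra input <<< "$par_input"

# Each element can now be passed as a separate, quoted argument.
printf '%s\n' "${input[@]}"
```

Expanding the array as `"${input[@]}"` keeps each path as one argument even when it contains whitespace.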

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 25850193..034e2422 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -42,8 +42,9 @@
 * `rsem/rsem_prepare_reference`: Prepare transcript references for RSEM (PR #89).
 
 * `bcftools`:
-  - `bcftools_norm`: Left-align and normalize indels, check if REF alleles match the reference, split multiallelic sites into multiple rows; recover multiallelics from multiple rows (PR #144).
-  - `bcftools_annotate`: Add or remove annotations from a VCF/BCF file (PR #143).
+  - `bcftools/bcftools_concat`: Concatenate or combine VCF/BCF files (PR #145).
+  - `bcftools/bcftools_norm`: Left-align and normalize indels, check if REF alleles match the reference, split multiallelic sites into multiple rows; recover multiallelics from multiple rows (PR #144).
+  - `bcftools/bcftools_annotate`: Add or remove annotations from a VCF/BCF file (PR #143).
   - `bcftools/bcftools_stats`: Parses VCF or BCF and produces a txt stats file which can be plotted using plot-vcfstats (PR #142).
   - `bcftools/bcftools_sort`: Sorts BCF/VCF files by position and other criteria (PR #141).
 
diff --git a/src/bcftools/bcftools_concat/config.vsh.yaml b/src/bcftools/bcftools_concat/config.vsh.yaml
new file mode 100644
index 00000000..2bb32f1c
--- /dev/null
+++ b/src/bcftools/bcftools_concat/config.vsh.yaml
@@ -0,0 +1,172 @@
+name: bcftools_concat
+namespace: bcftools
+description: | 
+  Concatenate or combine VCF/BCF files. All source files must have the same sample
+  columns appearing in the same order. The program can be used, for example, to
+  concatenate chromosome VCFs into one VCF, or combine a SNP VCF and an indel
+  VCF into one. The input files must be sorted by chr and position. The files
+  must be given in the correct order to produce sorted VCF on output unless
+  the -a, --allow-overlaps option is specified. With the --naive option, the files
+  are concatenated without being recompressed, which is very fast.
+keywords: [Concatenate, VCF, BCF]
+links:
+  homepage: https://samtools.github.io/bcftools/
+  documentation: https://samtools.github.io/bcftools/bcftools.html#concat
+  repository: https://github.com/samtools/bcftools
+  issue_tracker: https://github.com/samtools/bcftools/issues
+references:
+  doi: https://doi.org/10.1093/gigascience/giab008
+license: MIT/Expat, GNU
+requirements:
+  commands: [bcftools]
+authors:
+  - __merge__: /src/_authors/theodoro_gasperin.yaml
+    roles: [author]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --input
+        alternatives: -i
+        type: file
+        multiple: true
+        description: Input VCF/BCF files to concatenate.
+      
+      - name: --file_list
+        alternatives: -f
+        type: file
+        description: Read the list of VCF/BCF files from a file, one file name per line.
+    
+  - name: Outputs
+    arguments:
+      - name: --output
+        alternatives: -o
+        direction: output
+        type: file
+        description: Output concatenated VCF/BCF file.
+        required: true
+         
+  - name: Options
+    arguments:
+      
+      - name: --allow_overlaps
+        alternatives: -a
+        type: boolean_true
+        description:  | 
+          First coordinate of the next file can precede last record of the current file.
+      
+      - name: --compact_PS
+        alternatives: -c
+        type: boolean_true
+        description: | 
+          Do not output PS tag at each site, only at the start of a new phase set block.
+      
+      - name: --remove_duplicates
+        alternatives: -d
+        type: string
+        choices: ['snps', 'indels', 'both', 'all', 'exact', 'none']
+        description: |
+          Output duplicate records present in multiple files only once: <snps|indels|both|all|exact>.
+              
+      - name: --ligate
+        alternatives: -l
+        type: boolean_true
+        description: Ligate phased VCFs by matching phase at overlapping haplotypes.
+      
+      - name: --ligate_force
+        type: boolean_true
+        description: Ligate even non-overlapping chunks, keep all sites.
+      
+      - name: --ligate_warn
+        type: boolean_true
+        description: Drop sites in imperfect overlaps.
+
+      - name: --no_version
+        type: boolean_true
+        description: Do not append version and command line information to the header.
+        
+      - name: --naive
+        alternatives: -n
+        type: boolean_true
+        description: Concatenate files without recompression; a header compatibility check is performed.
+      
+      - name: --naive_force
+        type: boolean_true
+        description: | 
+          Same as --naive, but header compatibility is not checked. 
+          Dangerous, use with caution.
+
+      - name: --output_type
+        alternatives: -O
+        type: string
+        choices: ['u', 'z', 'b', 'v']
+        description: | 
+          Output type:
+            u: uncompressed BCF
+            z: compressed VCF
+            b: compressed BCF
+            v: uncompressed VCF
+    
+      - name: --min_PQ
+        alternatives: -q
+        type: integer
+        description: Break phase set if phasing quality is lower than <int>.
+        example: 30
+
+      - name: --regions
+        alternatives: -r
+        type: string
+        description: | 
+          Restrict to comma-separated list of regions. 
+          Following formats are supported: chr|chr:pos|chr:beg-end|chr:beg-[,…].
+        example: '20:1000000-2000000'
+
+      - name: --regions_file
+        alternatives: -R
+        type: file
+        description: | 
+          Restrict to regions listed in a file. 
+          Regions can be specified either on a VCF, BED, or tab-delimited file (the default). 
+          For more information check manual.
+
+      - name: --regions_overlap
+        type: string
+        choices: ['pos', 'record', 'variant', '0', '1', '2']
+        description: | 
+          This option controls how overlapping records are determined: 
+          set to 'pos' or '0' if the VCF record has to have POS inside a region (this corresponds to the default behavior of -t/-T); 
+          set to 'record' or '1' if also overlapping records with POS outside a region should be included (this is the default behavior of -r/-R, 
+          and includes indels with POS at the end of a region, which are technically outside the region); 
+          or set to 'variant' or '2' to include only true overlapping variation (compare the full VCF representation "TA>T-" vs the true sequence variation "A>-").
+
+      #PS: Verbose seems to be broken in this version of bcftools
+      # - name: --verbose
+      #   alternatives: -v
+      #   type: integer
+      #   choices: [0, 1]
+      #   description: Set verbosity level.
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+
+engines:
+  - type: docker
+    image: debian:stable-slim
+    setup:
+      - type: apt
+        packages: [bcftools, procps]
+      - type: docker
+        run: |
+          echo "bcftools: \"$(bcftools --version | grep 'bcftools' | sed -n 's/^bcftools //p')\"" > /var/software_versions.txt
+    test_setup:  
+      - type: apt  
+        packages: [tabix]
+
+runners:
+  - type: executable
+  - type: nextflow
\ No newline at end of file
diff --git a/src/bcftools/bcftools_concat/help.txt b/src/bcftools/bcftools_concat/help.txt
new file mode 100644
index 00000000..fc0f1914
--- /dev/null
+++ b/src/bcftools/bcftools_concat/help.txt
@@ -0,0 +1,36 @@
+```
+bcftools concat -h
+```
+
+concat: option requires an argument -- 'h'
+
+About:   Concatenate or combine VCF/BCF files. All source files must have the same sample
+         columns appearing in the same order. The program can be used, for example, to
+         concatenate chromosome VCFs into one VCF, or combine a SNP VCF and an indel
+         VCF into one. The input files must be sorted by chr and position. The files
+         must be given in the correct order to produce sorted VCF on output unless
+         the -a, --allow-overlaps option is specified. With the --naive option, the files
+         are concatenated without being recompressed, which is very fast.
+Usage:   bcftools concat [options] <A.vcf.gz> [<B.vcf.gz> [...]]
+
+Options:
+   -a, --allow-overlaps           First coordinate of the next file can precede last record of the current file.
+   -c, --compact-PS               Do not output PS tag at each site, only at the start of a new phase set block.
+   -d, --rm-dups STRING           Output duplicate records present in multiple files only once: <snps|indels|both|all|exact>
+   -D, --remove-duplicates        Alias for -d exact
+   -f, --file-list FILE           Read the list of files from a file.
+   -l, --ligate                   Ligate phased VCFs by matching phase at overlapping haplotypes
+       --ligate-force             Ligate even non-overlapping chunks, keep all sites
+       --ligate-warn              Drop sites in imperfect overlaps
+       --no-version               Do not append version and command line to the header
+   -n, --naive                    Concatenate files without recompression, a header check compatibility is performed
+       --naive-force              Same as --naive, but header compatibility is not checked. Dangerous, use with caution.
+   -o, --output FILE              Write output to a file [standard output]
+   -O, --output-type u|b|v|z[0-9] u/b: un/compressed BCF, v/z: un/compressed VCF, 0-9: compression level [v]
+   -q, --min-PQ INT               Break phase set if phasing quality is lower than <int> [30]
+   -r, --regions REGION           Restrict to comma-separated list of regions
+   -R, --regions-file FILE        Restrict to regions listed in a file
+       --regions-overlap 0|1|2    Include if POS in the region (0), record overlaps (1), variant overlaps (2) [1]
+       --threads INT              Use multithreading with <int> worker threads [0]
+   -v, --verbose 0|1              Set verbosity level [1]
+
diff --git a/src/bcftools/bcftools_concat/script.sh b/src/bcftools/bcftools_concat/script.sh
new file mode 100644
index 00000000..5614cd1b
--- /dev/null
+++ b/src/bcftools/bcftools_concat/script.sh
@@ -0,0 +1,54 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# Exit on error
+set -eo pipefail
+
+# Unset parameters
+unset_if_false=(
+    par_allow_overlaps
+    par_compact_PS
+    par_ligate
+    par_ligate_force
+    par_ligate_warn
+    par_no_version
+    par_naive
+    par_naive_force
+)
+
+for par in "${unset_if_false[@]}"; do
+    test_val="${!par}"
+    [[ "$test_val" == "false" ]] && unset "$par"
+done
+
+# Check to see whether the par_input or the par_file_list is set
+if [[ -z "${par_input}" && -z "${par_file_list}" ]]; then
+    echo "Error: One of the parameters '--input' or '--file_list' must be used."
+    exit 1
+fi
+
+# Create input array 
+IFS=";" read -ra input <<< "$par_input"
+
+# Execute bcftools concat with the provided arguments
+bcftools concat \
+    ${par_allow_overlaps:+-a} \
+    ${par_compact_PS:+-c} \
+    ${par_remove_duplicates:+-d "$par_remove_duplicates"} \
+    ${par_ligate:+-l} \
+    ${par_ligate_force:+--ligate-force} \
+    ${par_ligate_warn:+--ligate-warn} \
+    ${par_no_version:+--no-version} \
+    ${par_naive:+-n} \
+    ${par_naive_force:+--naive-force} \
+    ${par_output_type:+-O "$par_output_type"} \
+    ${par_min_PQ:+-q "$par_min_PQ"} \
+    ${par_regions:+-r "$par_regions"} \
+    ${par_regions_file:+-R "$par_regions_file"} \
+    ${par_regions_overlap:+--regions-overlap "$par_regions_overlap"} \
+    ${meta_cpus:+--threads "$meta_cpus"} \
+    -o "$par_output" \
+    ${par_file_list:+-f "$par_file_list"} \
+    "${input[@]}"
\ No newline at end of file
diff --git a/src/bcftools/bcftools_concat/test.sh b/src/bcftools/bcftools_concat/test.sh
new file mode 100644
index 00000000..3c1c7bb6
--- /dev/null
+++ b/src/bcftools/bcftools_concat/test.sh
@@ -0,0 +1,227 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# Exit on error
+set -eo pipefail
+
+#test_data="$meta_resources_dir/test_data"
+
+#############################################
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_identical_content() {
+  diff -a "$2" "$1" \
+    || { echo "Files are not identical!" && exit 1; }
+}
+#############################################
+
+# Create directories for tests
+echo "Creating Test Data..."
+TMPDIR=$(mktemp -d "$meta_temp_dir/XXXXXX")
+function clean_up {
+  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
+}
+trap clean_up EXIT
+
+# Create test data
+cat <<EOF > "$TMPDIR/example.vcf"
+##fileformat=VCFv4.1
+##contig=<ID=1,length=249250621,assembly=b37>
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMPLE1
+1	752567	llama	G	C,A	15	.	.	.	1/2
+1	752752	.	G	A,AAA	20	.	.	.	./.
+EOF
+
+bgzip -c "$TMPDIR/example.vcf" > "$TMPDIR/example.vcf.gz"
+tabix -p vcf "$TMPDIR/example.vcf.gz"
+
+cat <<EOF > "$TMPDIR/example_2.vcf"
+##fileformat=VCFv4.1
+##contig=<ID=1,length=249250621,assembly=b37>
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMPLE1
+1	752569	cat	G	C,A	15	.	.	.	1/2
+1	752739	.	G	A,AAA	20	.	.	.	./.
+EOF
+
+bgzip -c "$TMPDIR/example_2.vcf" > "$TMPDIR/example_2.vcf.gz"
+tabix -p vcf "$TMPDIR/example_2.vcf.gz"
+
+cat <<EOF > "$TMPDIR/file_list.txt"
+$TMPDIR/example.vcf.gz
+$TMPDIR/example_2.vcf.gz
+EOF
+
+# Test 1: Default test
+mkdir "$TMPDIR/test1" && pushd "$TMPDIR/test1" > /dev/null
+
+echo "> Run bcftools_concat default test"
+"$meta_executable" \
+  --input "../example.vcf" \
+  --input "../example_2.vcf" \
+  --output "concatenated.vcf" \
+  &> /dev/null
+
+# checks
+assert_file_exists "concatenated.vcf"
+assert_file_not_empty "concatenated.vcf"
+assert_file_contains "concatenated.vcf" "concat -o concatenated.vcf ../example.vcf ../example_2.vcf"
+echo "- test1 succeeded -"
+
+popd > /dev/null
+
+# Test 2: Allow overlaps and remove duplicates
+mkdir "$TMPDIR/test2" && pushd "$TMPDIR/test2" > /dev/null
+
+echo "> Run bcftools_concat test with allow overlaps, and remove duplicates"
+"$meta_executable" \
+  --input "../example.vcf.gz" \
+  --input "../example_2.vcf.gz" \
+  --output "concatenated.vcf" \
+  --allow_overlaps \
+  --remove_duplicates 'none' \
+  &> /dev/null
+
+# checks
+assert_file_exists "concatenated.vcf"
+assert_file_not_empty "concatenated.vcf"
+assert_file_contains "concatenated.vcf" "concat -a -d none -o concatenated.vcf ../example.vcf.gz ../example_2.vcf.gz"  
+echo "- test2 succeeded -"
+
+popd > /dev/null
+
+
+# Test 3: Ligate and compact PS
+mkdir "$TMPDIR/test3" && pushd "$TMPDIR/test3" > /dev/null
+
+echo "> Run bcftools_concat test with ligate and compact PS"
+"$meta_executable" \
+  --input "../example.vcf.gz" \
+  --input "../example_2.vcf.gz" \
+  --output "concatenated.vcf" \
+  --ligate \
+  --compact_PS \
+  &> /dev/null
+
+
+# checks
+assert_file_exists "concatenated.vcf"
+assert_file_not_empty "concatenated.vcf"
+assert_file_contains "concatenated.vcf" "concat -c -l -o concatenated.vcf ../example.vcf.gz ../example_2.vcf.gz"
+echo "- test3 succeeded -"
+
+popd > /dev/null
+
+# Test 4: file list with ligate force
+mkdir "$TMPDIR/test4" && pushd "$TMPDIR/test4" > /dev/null
+
+echo "> Run bcftools_concat test with file list and ligate force"
+"$meta_executable" \
+  --file_list "../file_list.txt" \
+  --output "concatenated.vcf" \
+  --ligate_force \
+  &> /dev/null
+
+# checks
+assert_file_exists "concatenated.vcf"
+assert_file_not_empty "concatenated.vcf"
+assert_file_contains "concatenated.vcf" "concat --ligate-force -o concatenated.vcf -f ../file_list.txt"
+echo "- test4 succeeded -"
+
+popd > /dev/null
+
+# Test 5: ligate warn and naive
+mkdir "$TMPDIR/test5" && pushd "$TMPDIR/test5" > /dev/null
+
+echo "> Run bcftools_concat test with ligate warn and naive"
+"$meta_executable" \
+  --input "../example.vcf.gz" \
+  --input "../example_2.vcf.gz" \
+  --output "concatenated.vcf.gz" \
+  --ligate_warn \
+  --naive \
+  &> /dev/null
+
+bgzip -d concatenated.vcf.gz
+
+# checks
+assert_file_exists "concatenated.vcf"
+assert_file_not_empty "concatenated.vcf"
+assert_file_contains "concatenated.vcf" "##fileformat=VCFv4.1"
+echo "- test5 succeeded -"
+
+popd > /dev/null
+
+# Test 6: minimal PQ
+mkdir "$TMPDIR/test6" && pushd "$TMPDIR/test6" > /dev/null
+
+echo "> Run bcftools_concat test with minimal PQ"
+"$meta_executable" \
+  --input "../example.vcf.gz" \
+  --input "../example_2.vcf.gz" \
+  --output "concatenated.vcf" \
+  --min_PQ 20 \
+  &> /dev/null
+
+# checks
+assert_file_exists "concatenated.vcf"
+assert_file_not_empty "concatenated.vcf"
+assert_file_contains "concatenated.vcf" "concat -q 20 -o concatenated.vcf ../example.vcf.gz ../example_2.vcf.gz"
+echo "- test6 succeeded -"
+
+popd > /dev/null
+
+# Test 7: regions
+mkdir "$TMPDIR/test7" && pushd "$TMPDIR/test7" > /dev/null
+
+echo "> Run bcftools_concat test with regions"
+"$meta_executable" \
+  --input "../example.vcf.gz" \
+  --input "../example_2.vcf.gz" \
+  --output "concatenated.vcf" \
+  --allow_overlaps \
+  --regions "1:752569-752739" \
+  &> /dev/null
+
+# checks
+assert_file_exists "concatenated.vcf"
+assert_file_not_empty "concatenated.vcf"
+assert_file_contains "concatenated.vcf" "concat -a -r 1:752569-752739 -o concatenated.vcf ../example.vcf.gz ../example_2.vcf.gz"
+echo "- test7 succeeded -"
+
+popd > /dev/null
+
+# Test 8: regions overlap
+mkdir "$TMPDIR/test8" && pushd "$TMPDIR/test8" > /dev/null
+
+echo "> Run bcftools_concat test with regions overlap"
+"$meta_executable" \
+  --input "../example.vcf.gz" \
+  --input "../example_2.vcf.gz" \
+  --output "concatenated.vcf" \
+  --allow_overlaps \
+  --regions_overlap 'pos' \
+  &> /dev/null
+
+# checks
+assert_file_exists "concatenated.vcf"
+assert_file_not_empty "concatenated.vcf"
+assert_file_contains "concatenated.vcf" "concat -a --regions-overlap pos -o concatenated.vcf ../example.vcf.gz ../example_2.vcf.gz"
+echo "- test8 succeeded -"
+
+popd > /dev/null
+
+echo "---- All tests succeeded! ----"
+exit 0
+
+
+

From 3f6a1b52f8aedb15ec3bd6e243de3267a94e4e2e Mon Sep 17 00:00:00 2001
From: Emma Rousseau <emmarou1@icloud.com>
Date: Fri, 13 Sep 2024 09:08:23 +0200
Subject: [PATCH 16/42] Umitools prepare for rsem (#148)

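The wrapper script in this patch handles boolean flags by unsetting every
`par_*` variable whose value is the string "false" and then letting
`${var:+...}` expand to nothing for unset variables. The following is a
minimal sketch of that idiom, not the component script itself: the values
assigned to the `par_*` variables are hypothetical stand-ins for what Viash
would inject, and the `args` array is an illustrative name used here to
collect the arguments instead of calling `umi_tools`.

```shell
#!/bin/bash
set -eo pipefail

# Hypothetical values standing in for the par_* variables Viash injects.
par_sam="true"
par_log2stderr="false"
par_tags="UG,BX"

# Boolean flags arrive as the strings "true"/"false"; unset the false ones
# so that ${var:+...} expands to nothing for them.
unset_if_false=(par_sam par_log2stderr)
for var in "${unset_if_false[@]}"; do
    if [[ "${!var}" == "false" ]]; then
        unset "$var"
    fi
done

# Collect the arguments in an array (illustrative) instead of running the tool.
args=(
    ${par_sam:+--sam}
    ${par_log2stderr:+--log2stderr}
    ${par_tags:+--tags "$par_tags"}
)
printf '%s\n' "${args[@]}"
```

Because each expansion stands alone, every line of the real command except
the last must end in a backslash; a missing continuation silently turns the
remaining options into a separate command.
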
---
 CHANGELOG.md                                  |   3 +-
 .../umi_tools_prepareforrsem/config.vsh.yaml  | 107 +++++++
 .../umi_tools_prepareforrsem/help.txt         |  54 ++++
 .../prepare-for-rsem.py                       | 271 ++++++++++++++++++
 .../umi_tools_prepareforrsem/script.sh        |  32 +++
 .../umi_tools_prepareforrsem/test.sh          |  55 ++++
 .../test_data/log.log                         | 103 +++++++
 .../test_data/test.bam                        | Bin 0 -> 11123 bytes
 .../test_data/test.sam                        | 119 ++++++++
 .../test_data/test_dedup.bam                  | Bin 0 -> 18822 bytes
 .../test_data/test_dedup.sam                  | 201 +++++++++++++
 11 files changed, 944 insertions(+), 1 deletion(-)
 create mode 100644 src/umi_tools/umi_tools_prepareforrsem/config.vsh.yaml
 create mode 100644 src/umi_tools/umi_tools_prepareforrsem/help.txt
 create mode 100644 src/umi_tools/umi_tools_prepareforrsem/prepare-for-rsem.py
 create mode 100755 src/umi_tools/umi_tools_prepareforrsem/script.sh
 create mode 100644 src/umi_tools/umi_tools_prepareforrsem/test.sh
 create mode 100644 src/umi_tools/umi_tools_prepareforrsem/test_data/log.log
 create mode 100644 src/umi_tools/umi_tools_prepareforrsem/test_data/test.bam
 create mode 100644 src/umi_tools/umi_tools_prepareforrsem/test_data/test.sam
 create mode 100644 src/umi_tools/umi_tools_prepareforrsem/test_data/test_dedup.bam
 create mode 100644 src/umi_tools/umi_tools_prepareforrsem/test_data/test_dedup.sam

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 034e2422..d88d0996 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -137,7 +137,8 @@
     - `samtools/samtools_fastq`: Converts a SAM/BAM/CRAM file to FASTA (PR #53).
 
 * `umi_tools`:
-    -`umi_tools/umi_tools_extract`: Flexible removal of UMI sequences from fastq reads (PR #71).
+    - `umi_tools/umi_tools_extract`: Flexible removal of UMI sequences from fastq reads (PR #71).
+    - `umi_tools/umi_tools_prepareforrsem`: Fix paired-end reads in name sorted BAM file to prepare for RSEM (PR #148).
 
 * `falco`: A C++ drop-in replacement of FastQC to assess the quality of sequence read data (PR #43).
 
diff --git a/src/umi_tools/umi_tools_prepareforrsem/config.vsh.yaml b/src/umi_tools/umi_tools_prepareforrsem/config.vsh.yaml
new file mode 100644
index 00000000..ceac2052
--- /dev/null
+++ b/src/umi_tools/umi_tools_prepareforrsem/config.vsh.yaml
@@ -0,0 +1,107 @@
+name: "umi_tools_prepareforrsem"
+namespace: "umi_tools"
+description: Make the output from umi-tools dedup or group compatible with RSEM
+keywords: [umi_tools, rsem, bam, sam]
+links:
+  homepage: https://umi-tools.readthedocs.io/en/latest/
+  documentation: https://umi-tools.readthedocs.io/en/latest/reference/extract.html
+  repository: https://github.com/CGATOxford/UMI-tools
+references: 
+  doi: 10.1101/gr.209601.116
+license: MIT
+
+argument_groups:
+- name: "Input"
+  arguments:  
+  - name: "--input"
+    alternatives: ["-I", "--stdin"]
+    type: file
+    required: true
+    example: $id.transcriptome.bam
+
+- name: "Output"
+  arguments:    
+  - name: "--output"
+    alternatives: ["-S", "--stdout"]
+    type: file
+    direction: output
+    example: $id.transcriptome_sorted.bam
+  - name: "--log"
+    alternatives: ["-L"]
+    type: file
+    direction: output
+    description: File with logging information [default = stdout].
+  - name: "--error"
+    alternatives: ["-E"]
+    type: file
+    direction: output
+    description: File with error information [default = stderr].
+  - name: "--log2stderr"
+    type: boolean_true
+    description: Send logging information to stderr [default = False].
+  - name: "--temp_dir"
+    type: string
+    description: |
+      Directory for temporary files. If not set, the bash environment variable
+      TMPDIR is used.
+  - name: "--compresslevel"
+    type: integer
+    description: |
+      Level of Gzip compression to use. Default (6) matches GNU gzip rather than python
+      gzip default (which is 9).
+
+- name: "Options"
+  arguments:
+  - name: "--tags"
+    type: string
+    description: |
+      Comma-separated list of tags to transfer from read1 to read2 (Default: 'UG,BX')
+    example: "UG,BX"
+  - name: "--sam"
+    type: boolean_true
+    description: Input and output SAM rather than BAM.
+  - name: "--timeit"
+    type: string
+    description: |
+      Store timing information in file [none].
+  - name: "--timeit_name"
+    type: string
+    description: |
+      Name in timing file for this class of jobs [all].
+  - name: "--timeit_header"
+    type: boolean_true
+    description: Add header for timing information [none].
+  - name: "--verbose"
+    alternatives: ["-v"]
+    type: integer
+    description: |
+      Loglevel [1]. The higher, the more output.
+  - name: "--random_seed"
+    type: integer
+    description: |
+      Random seed to initialize number generator with [none].
+  
+
+resources:
+  - type: bash_script
+    path: script.sh
+  # copied from https://github.com/nf-core/rnaseq/blob/3.12.0/bin/prepare-for-rsem.py
+  - path: prepare-for-rsem.py
+test_resources:
+  - type: bash_script
+    path: test.sh  
+  - type: file
+    path: test_data
+  
+engines:
+  - type: docker
+    image: quay.io/biocontainers/umi_tools:1.1.5--py38h0020b31_3
+    setup:
+      - type: docker
+        run: |
+          umi_tools -v | sed 's/ version//g' > /var/software_versions.txt
+
+
+runners: 
+- type: executable
+- type: nextflow
\ No newline at end of file
diff --git a/src/umi_tools/umi_tools_prepareforrsem/help.txt b/src/umi_tools/umi_tools_prepareforrsem/help.txt
new file mode 100644
index 00000000..efaf4de6
--- /dev/null
+++ b/src/umi_tools/umi_tools_prepareforrsem/help.txt
@@ -0,0 +1,54 @@
+```
+umi_tools prepare-for-rsem --help
+```
+
+prepare_for_rsem - make output from dedup or group compatible with RSEM
+
+Usage: umi_tools prepare_for_rsem [OPTIONS] [--stdin=IN_BAM] [--stdout=OUT_BAM]
+
+       note: If --stdout is ommited, standard out is output. To
+             generate a valid BAM file on standard out, please
+             redirect log with --log=LOGFILE or --log2stderr 
+
+For full UMI-tools documentation, see https://umi-tools.readthedocs.io/en/latest/
+
+Options:
+  --version             show program's version number and exit
+
+  RSEM preparation specific options:
+    --tags=TAGS         Comma-seperated list of tags to transfer from read1 to
+                        read2
+    --sam               input and output SAM rather than BAM
+
+  input/output options:
+    -I FILE, --stdin=FILE
+                        file to read stdin from [default = stdin].
+    -L FILE, --log=FILE
+                        file with logging information [default = stdout].
+    -E FILE, --error=FILE
+                        file with error information [default = stderr].
+    -S FILE, --stdout=FILE
+                        file where output is to go [default = stdout].
+    --temp-dir=FILE     Directory for temporary files. If not set, the bash
+                        environmental variable TMPDIR is used[default = None].
+    --log2stderr        send logging information to stderr [default = False].
+    --compresslevel=COMPRESSLEVEL
+                        Level of Gzip compression to use. Default (6)
+                        matchesGNU gzip rather than python gzip default (which
+                        is 9)
+
+  profiling options:
+    --timeit=TIMEIT_FILE
+                        store timeing information in file [none].
+    --timeit-name=TIMEIT_NAME
+                        name in timing file for this class of jobs [all].
+    --timeit-header     add header for timing information [none].
+
+  common options:
+    -v LOGLEVEL, --verbose=LOGLEVEL
+                        loglevel [1]. The higher, the more output.
+    -h, --help          output short help (command line options only).
+    --help-extended     Output full documentation
+    --random-seed=RANDOM_SEED
+                        random seed to initialize number generator with
+                        [none].
\ No newline at end of file
diff --git a/src/umi_tools/umi_tools_prepareforrsem/prepare-for-rsem.py b/src/umi_tools/umi_tools_prepareforrsem/prepare-for-rsem.py
new file mode 100644
index 00000000..b53d30ac
--- /dev/null
+++ b/src/umi_tools/umi_tools_prepareforrsem/prepare-for-rsem.py
@@ -0,0 +1,271 @@
+#!/usr/bin/env python3
+
+"""
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Credits
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This script is a clone of the "prepare-for-rsem.py" script written by
+Ian Sudbury, Tom Smith and other contributors to the UMI-tools package:
+https://github.com/CGATOxford/UMI-tools
+
+It has been included here to address problems encountered with
+Salmon quant and RSEM as discussed in the issue below:
+https://github.com/CGATOxford/UMI-tools/issues/465
+
+When the "umi_tools prepare-for-rsem" command becomes available in an official
+UMI-tools release this script will be replaced and deprecated.
+
+Commit:
+https://github.com/CGATOxford/UMI-tools/blob/bf8608d6a172c5ca0dcf33c126b4e23429177a72/umi_tools/prepare-for-rsem.py
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+prepare_for_rsem - make the output from dedup or group compatible with RSEM
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The SAM format specification states that the mnext and mpos fields should point
+to the primary alignment of a read's mate. However, not all aligners adhere to
+this standard. In addition, the RSEM software requires that the mate of a read1
+appears directly after it in its input BAM. This requires that there is exactly
+one read1 alignment for every read2 and vice versa.
+
+In general (except in a few edge cases) UMI-tools outputs only the read2 that
+corresponds to the read specified in the mnext and mpos positions of a selected
+read1, and only outputs this read once, even if multiple read1s point to it.
+This makes UMI-tools outputs incompatible with RSEM. This script takes the output
+from dedup or group and ensures that each read1 has exactly one read2 (and vice
+versa), that read2 always appears directly after read1, and that pairs point to
+each other (note this is technically not valid SAM format). Copy any specified
+tags from read1 to read2 if they are present (by default, UG and BX, the unique
+group and correct UMI tags added by _group_).
+
+Input must be name sorted.
+
+
+https://raw.githubusercontent.com/CGATOxford/UMI-tools/master/LICENSE
+
+"""
+
+from umi_tools import Utilities as U
+from collections import defaultdict, Counter
+import pysam
+import sys
+
+
+usage = """
+prepare_for_rsem - make output from dedup or group compatible with RSEM
+
+Usage: umi_tools prepare_for_rsem [OPTIONS] [--stdin=IN_BAM] [--stdout=OUT_BAM]
+
+       note: If --stdout is omitted, standard out is output. To
+             generate a valid BAM file on standard out, please
+             redirect log with --log=LOGFILE or --log2stderr """
+
+
+def chunk_bam(bamfile):
+    """Take in an iterator of pysam.AlignedSegment entries and yield
+    lists of reads that all share the same name"""
+
+    last_query_name = None
+    output_buffer = list()
+
+    for read in bamfile:
+        if last_query_name is not None and last_query_name != read.query_name:
+            yield (output_buffer)
+            output_buffer = list()
+
+        last_query_name = read.query_name
+        output_buffer.append(read)
+
+    yield (output_buffer)
+
+
+def copy_tags(tags, read1, read2):
+    """Given a list of tags, copies the values of these tags from read1
+    to read2, if the tag is set"""
+
+    for tag in tags:
+        try:
+            read1_tag = read1.get_tag(tag, with_value_type=True)
+            read2.set_tag(tag, value=read1_tag[0], value_type=read1_tag[1])
+        except KeyError:
+            pass
+
+    return read2
+
+
+def pick_mate(read, template_dict, mate_key):
+    """Find the mate of read in the template dict using key. It will retrieve
+    all reads at that key, and then scan to pick the one that refers to _read_
+    as its mate. If there is no such read, it picks the first one it comes to"""
+
+    mate = None
+
+    # get a list of secondary reads at the correct alignment position
+    potential_mates = template_dict[not read.is_read1][mate_key]
+
+    # search through one at a time to find a read that points to the current read
+    # as its mate.
+    for candidate_mate in potential_mates:
+        if (
+            candidate_mate.next_reference_name == read.reference_name
+            and candidate_mate.next_reference_start == read.pos
+        ):
+            mate = candidate_mate
+
+    # if no such read is found, then pick any old secondary alignment at that position
+    # note: this happens when UMI-tools outputs the wrong read as something's pair.
+    if mate is None and len(potential_mates) > 0:
+        mate = potential_mates[0]
+
+    return mate
+
+
+def main(argv=None):
+    if argv is None:
+        argv = sys.argv
+
+    # setup command line parser
+    parser = U.OptionParser(version="%prog version: $Id$", usage=usage, description=globals()["__doc__"])
+    group = U.OptionGroup(parser, "RSEM preparation specific options")
+
+    group.add_option(
+        "--tags",
+        dest="tags",
+        type="string",
+        default="UG,BX",
+        help="Comma-separated list of tags to transfer from read1 to read2",
+    )
+    group.add_option(
+        "--sam", dest="sam", action="store_true", default=False, help="input and output SAM rather than BAM"
+    )
+
+    parser.add_option_group(group)
+
+    # add common options (-h/--help, ...) and parse command line
+    (options, args) = U.Start(
+        parser, argv=argv, add_group_dedup_options=False, add_umi_grouping_options=False, add_sam_options=False
+    )
+
+    skipped_stats = Counter()
+
+    if options.stdin != sys.stdin:
+        in_name = options.stdin.name
+        options.stdin.close()
+    else:
+        in_name = "-"
+
+    if options.sam:
+        mode = ""
+    else:
+        mode = "b"
+
+    inbam = pysam.AlignmentFile(in_name, "r" + mode)
+
+    if options.stdout != sys.stdout:
+        out_name = options.stdout.name
+        options.stdout.close()
+    else:
+        out_name = "-"
+
+    outbam = pysam.AlignmentFile(out_name, "w" + mode, template=inbam)
+
+    options.tags = options.tags.split(",")
+
+    for template in chunk_bam(inbam):
+        assert len(set(r.query_name for r in template)) == 1
+        current_template = {True: defaultdict(list), False: defaultdict(list)}
+
+        for read in template:
+            key = (read.reference_name, read.pos, not read.is_secondary)
+            current_template[read.is_read1][key].append(read)
+
+        output = set()
+
+        for read in template:
+            mate = None
+
+            # if this read is a non_primary alignment, we first want to check if it has a mate
+            # with the non-primary alignment flag set.
+
+            mate_key_primary = True
+            mate_key_secondary = (read.next_reference_name, read.next_reference_start, False)
+
+            # First look for a read that has the same primary/secondary status
+            # as read (i.e. secondary mate for secondary read, and primary mate
+            # for primary read)
+            mate_key = (read.next_reference_name, read.next_reference_start, read.is_secondary)
+            mate = pick_mate(read, current_template, mate_key)
+
+            # If none was found then look for the opposite (primary mate of secondary
+            # read or secondary mate of primary read)
+            if mate is None:
+                mate_key = (read.next_reference_name, read.next_reference_start, not read.is_secondary)
+                mate = pick_mate(read, current_template, mate_key)
+
+            # If we still don't have a mate, then there can't be one?
+            if mate is None:
+                skipped_stats["no_mate"] += 1
+                U.warn(
+                    "Alignment {} has no mate -- skipped".format(
+                        "\t".join(map(str, [read.query_name, read.flag, read.reference_name, int(read.pos)]))
+                    )
+                )
+                continue
+
+            # because we might want to make changes to the read, but not have those changes reflected
+            # if we need the read again, we copy the read. This is the only way I can find to do this.
+            read = pysam.AlignedSegment().from_dict(read.to_dict(), read.header)
+            mate = pysam.AlignedSegment().from_dict(mate.to_dict(), read.header)
+
+            # Make it so that if our read is secondary, the mate is also secondary. We don't make the
+            # mate primary if the read is primary because we would otherwise end up with multiple
+            # primary alignments.
+            if read.is_secondary:
+                mate.is_secondary = True
+
+            # In a situation where there is already one mate for each read, then we will come across
+            # each pair twice - once when we scan read1 and once when we scan read2. Thus we need
+            # to make sure we don't output something already output.
+            if read.is_read1:
+                mate = copy_tags(options.tags, read, mate)
+                output_key = str(read) + str(mate)
+
+                if output_key not in output:
+                    output.add(output_key)
+                    outbam.write(read)
+                    outbam.write(mate)
+                    skipped_stats["pairs_output"] += 1
+
+            elif read.is_read2:
+                read = copy_tags(options.tags, mate, read)
+                output_key = str(mate) + str(read)
+
+                if output_key not in output:
+                    output.add(output_key)
+                    outbam.write(mate)
+                    outbam.write(read)
+                    skipped_stats["pairs_output"] += 1
+
+            else:
+                skipped_stats["skipped_not_read_12"] += 1
+                U.warn(
+                    "Alignment {} is neither read1 nor read2 -- skipped".format(
+                        "\t".join(map(str, [read.query_name, read.flag, read.reference_name, int(read.pos)]))
+                    )
+                )
+                continue
+
+    if not out_name == "-":
+        outbam.close()
+
+    U.info(
+        "Total pairs output: {}, Pairs skipped - no mates: {},"
+        " Pairs skipped - not read1 or 2: {}".format(
+            skipped_stats["pairs_output"], skipped_stats["no_mate"], skipped_stats["skipped_not_read_12"]
+        )
+    )
+    U.Stop()
+
+
+if __name__ == "__main__":
+    sys.exit(main(sys.argv))
diff --git a/src/umi_tools/umi_tools_prepareforrsem/script.sh b/src/umi_tools/umi_tools_prepareforrsem/script.sh
new file mode 100755
index 00000000..d6b3775f
--- /dev/null
+++ b/src/umi_tools/umi_tools_prepareforrsem/script.sh
@@ -0,0 +1,32 @@
+#!/bin/bash
+
+set -eo pipefail
+
+unset_if_false=(
+    par_sam
+    par_error
+    par_log2stderr
+    par_timeit_header )
+
+for var in "${unset_if_false[@]}"; do
+    test_val="${!var}"
+    [[ "$test_val" == "false" ]] && unset "$var"
+done
+
+umi_tools prepare-for-rsem \
+    ${par_log:+--log "${par_log}"} \
+    ${par_tags:+--tags "${par_tags}"} \
+    ${par_sam:+--sam} \
+    --stdin="${par_input}" \
+    ${par_output:+--stdout "${par_output}"} \
+    ${par_error:+--error "${par_error}"} \
+    ${par_temp_dir:+--temp-dir "${par_temp_dir}"} \
+    ${par_log2stderr:+--log2stderr} \
+    ${par_verbose:+--verbose "${par_verbose}"} \
+    ${par_random_seed:+--random-seed "${par_random_seed}"} \
+    ${par_compresslevel:+--compresslevel "${par_compresslevel}"} \
+    ${par_timeit:+--timeit "${par_timeit}"} \
+    ${par_timeit_name:+--timeit-name "${par_timeit_name}"} \
+    ${par_timeit_header:+--timeit-header}
+
+
diff --git a/src/umi_tools/umi_tools_prepareforrsem/test.sh b/src/umi_tools/umi_tools_prepareforrsem/test.sh
new file mode 100644
index 00000000..c94a202d
--- /dev/null
+++ b/src/umi_tools/umi_tools_prepareforrsem/test.sh
@@ -0,0 +1,55 @@
+#!/bin/bash
+
+test_dir="$meta_resources_dir/test_data"
+apt-get -q update && apt-get -q install -y samtools
+
+################################################################################
+echo ">>> Test 1: with --sam:"
+
+"${meta_executable}" \
+    --input "$test_dir/test_dedup.sam" \
+    --output "$test_dir/test_output.sam" \
+    --sam
+
+echo ">>> Check if output is present"
+[[ ! -f "$test_dir/test_output.sam" ]] && echo "Output file not found" && exit 1
+[[ ! -s "$test_dir/test_output.sam" ]] && echo "Output file is empty" && exit 1
+
+echo ">>> Check if output is correct"
+# use diff but ignoring the header lines (which start with @) as they may differ slightly
+diff <(grep -v "^@" "$test_dir/test_output.sam") <(grep -v "^@" "$test_dir/test.sam") && echo "Output is correct" || { echo "Output is incorrect"; exit 1; }
+
+################################################################################
+echo ">>> Test 2: without --sam:"
+
+"${meta_executable}" \
+    --input "$test_dir/test_dedup.bam" \
+    --output "$test_dir/test_output.bam"
+
+echo ">>> Check if output is present"
+[[ ! -f "$test_dir/test_output.bam" ]] && echo "Output file not found" && exit 1
+[[ ! -s "$test_dir/test_output.bam" ]] && echo "Output file is empty" && exit 1
+
+echo ">>> Check if output is correct"
+diff <(samtools view "$test_dir/test_output.bam") <(samtools view "$test_dir/test.bam") || { echo "Output is incorrect"; exit 1; }
+
+################################################################################
+echo ">>> Test 3: with --log:"
+
+"${meta_executable}" \
+    --log "$test_dir/test_log.log" \
+    --input "$test_dir/test_dedup.sam" \
+    --output "$test_dir/test_output.sam" \
+    --sam
+
+echo ">>> Check if output is present"
+[[ ! -f "$test_dir/test_output.sam" ]] && echo "Output file not found" && exit 1
+[[ ! -s "$test_dir/test_output.sam" ]] && echo "Output file is empty" && exit 1
+[[ ! -f "$test_dir/test_log.log" ]] && echo "Log file not found" && exit 1
+[[ ! -s "$test_dir/test_log.log" ]] && echo "Log file is empty" && exit 1
+
+echo ">>> Check if log file is correct"
+diff <(grep -v '^#' "$test_dir/test_log.log" | sed 's/^[0-9-]* [0-9:]*,[0-9]\{3\} //') <(grep -v '^#' "$test_dir/log.log" | sed 's/^[0-9-]* [0-9:]*,[0-9]\{3\} //') || { echo "Log file is incorrect"; exit 1; }
+
+echo ">>> All tests succeeded"
+exit 0
\ No newline at end of file
diff --git a/src/umi_tools/umi_tools_prepareforrsem/test_data/log.log b/src/umi_tools/umi_tools_prepareforrsem/test_data/log.log
new file mode 100644
index 00000000..e4b56e57
--- /dev/null
+++ b/src/umi_tools/umi_tools_prepareforrsem/test_data/log.log
@@ -0,0 +1,103 @@
+# UMI-tools version: 1.1.5
+# output generated by prepare-for-rsem.py --log test_data/log.log --sam --stdin test_data/test_dedup.sam --stdout jnfgioeurg.sam
+# job started at Tue Sep 10 06:43:30 2024 on 4855b4607095 -- 07ae7548-56e8-4772-9b48-7406710fd838
+# pid: 28, system: Linux 6.10.0-linuxkit #1 SMP PREEMPT_DYNAMIC Wed Jul 17 10:54:05 UTC 2024 x86_64
+# compresslevel                           : 6
+# log2stderr                              : False
+# loglevel                                : 1
+# random_seed                             : None
+# sam                                     : True
+# short_help                              : None
+# stderr                                  : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
+# stdin                                   : <_io.TextIOWrapper name='test_data/test_dedup.sam' mode='r' encoding='UTF-8'>
+# stdlog                                  : <_io.TextIOWrapper name='test_data/log.log' mode='a' encoding='UTF-8'>
+# stdout                                  : <_io.TextIOWrapper name='jnfgioeurg.sam' mode='w' encoding='UTF-8'>
+# tags                                    : UG,BX
+# timeit_file                             : None
+# timeit_header                           : None
+# timeit_name                             : all
+# tmpdir                                  : None
+2024-09-10 06:43:30,918 WARNING Alignment ERR5069949.114870	99	MT192765.1	642 has no mate -- skipped
+2024-09-10 06:43:30,918 WARNING Alignment ERR5069949.147998	163	MT192765.1	673 has no mate -- skipped
+2024-09-10 06:43:30,919 WARNING Alignment ERR5069949.114870	147	MT192765.1	747 has no mate -- skipped
+2024-09-10 06:43:30,920 WARNING Alignment ERR5069949.147998	83	MT192765.1	918 has no mate -- skipped
+2024-09-10 06:43:30,921 WARNING Alignment ERR5069949.184542	99	MT192765.1	1054 has no mate -- skipped
+2024-09-10 06:43:30,921 WARNING Alignment ERR5069949.184542	147	MT192765.1	1254 has no mate -- skipped
+2024-09-10 06:43:30,922 WARNING Alignment ERR5069949.376959	99	MT192765.1	4104 has no mate -- skipped
+2024-09-10 06:43:30,924 WARNING Alignment ERR5069949.376959	147	MT192765.1	4189 has no mate -- skipped
+2024-09-10 06:43:30,925 WARNING Alignment ERR5069949.532979	99	MT192765.1	5567 has no mate -- skipped
+2024-09-10 06:43:30,926 WARNING Alignment ERR5069949.540529	163	MT192765.1	5569 has no mate -- skipped
+2024-09-10 06:43:30,926 WARNING Alignment ERR5069949.532979	147	MT192765.1	5620 has no mate -- skipped
+2024-09-10 06:43:30,927 WARNING Alignment ERR5069949.540529	83	MT192765.1	5658 has no mate -- skipped
+2024-09-10 06:43:30,930 WARNING Alignment ERR5069949.856527	99	MT192765.1	10117 has no mate -- skipped
+2024-09-10 06:43:30,931 WARNING Alignment ERR5069949.870926	99	MT192765.1	10117 has no mate -- skipped
+2024-09-10 06:43:30,931 WARNING Alignment ERR5069949.856527	147	MT192765.1	10198 has no mate -- skipped
+2024-09-10 06:43:30,931 WARNING Alignment ERR5069949.885966	99	MT192765.1	10229 has no mate -- skipped
+2024-09-10 06:43:30,932 WARNING Alignment ERR5069949.870926	147	MT192765.1	10244 has no mate -- skipped
+2024-09-10 06:43:30,932 WARNING Alignment ERR5069949.885966	147	MT192765.1	10276 has no mate -- skipped
+2024-09-10 06:43:30,932 WARNING Alignment ERR5069949.937422	99	MT192765.1	10421 has no mate -- skipped
+2024-09-10 06:43:30,933 WARNING Alignment ERR5069949.937422	147	MT192765.1	10590 has no mate -- skipped
+2024-09-10 06:43:30,934 WARNING Alignment ERR5069949.1066259	99	MT192765.1	11336 has no mate -- skipped
+2024-09-10 06:43:30,935 WARNING Alignment ERR5069949.1062611	163	MT192765.1	11426 has no mate -- skipped
+2024-09-10 06:43:30,936 WARNING Alignment ERR5069949.1067032	163	MT192765.1	11433 has no mate -- skipped
+2024-09-10 06:43:30,936 WARNING Alignment ERR5069949.1062611	83	MT192765.1	11453 has no mate -- skipped
+2024-09-10 06:43:30,936 WARNING Alignment ERR5069949.1066259	147	MT192765.1	11479 has no mate -- skipped
+2024-09-10 06:43:30,937 WARNING Alignment ERR5069949.1067032	83	MT192765.1	11480 has no mate -- skipped
+2024-09-10 06:43:30,938 WARNING Alignment ERR5069949.1258508	163	MT192765.1	12424 has no mate -- skipped
+2024-09-10 06:43:30,939 WARNING Alignment ERR5069949.1261808	99	MT192765.1	12592 has no mate -- skipped
+2024-09-10 06:43:30,940 WARNING Alignment ERR5069949.1258508	83	MT192765.1	12637 has no mate -- skipped
+2024-09-10 06:43:30,940 WARNING Alignment ERR5069949.1261808	147	MT192765.1	12653 has no mate -- skipped
+2024-09-10 06:43:30,941 WARNING Alignment ERR5069949.1372331	163	MT192765.1	13010 has no mate -- skipped
+2024-09-10 06:43:30,941 WARNING Alignment ERR5069949.1372331	83	MT192765.1	13131 has no mate -- skipped
+2024-09-10 06:43:30,942 WARNING Alignment ERR5069949.1552198	99	MT192765.1	13943 has no mate -- skipped
+2024-09-10 06:43:30,943 WARNING Alignment ERR5069949.1561137	163	MT192765.1	13990 has no mate -- skipped
+2024-09-10 06:43:30,943 WARNING Alignment ERR5069949.1552198	147	MT192765.1	14026 has no mate -- skipped
+2024-09-10 06:43:30,944 WARNING Alignment ERR5069949.1561137	83	MT192765.1	14080 has no mate -- skipped
+2024-09-10 06:43:30,947 WARNING Alignment ERR5069949.2098070	99	MT192765.1	17114 has no mate -- skipped
+2024-09-10 06:43:30,947 WARNING Alignment ERR5069949.2064910	99	MT192765.1	17122 has no mate -- skipped
+2024-09-10 06:43:30,947 WARNING Alignment ERR5069949.2125592	99	MT192765.1	17179 has no mate -- skipped
+2024-09-10 06:43:30,947 WARNING Alignment ERR5069949.2064910	147	MT192765.1	17179 has no mate -- skipped
+2024-09-10 06:43:30,948 WARNING Alignment ERR5069949.2098070	147	MT192765.1	17269 has no mate -- skipped
+2024-09-10 06:43:30,948 WARNING Alignment ERR5069949.2125592	147	MT192765.1	17288 has no mate -- skipped
+2024-09-10 06:43:30,948 WARNING Alignment ERR5069949.2185111	163	MT192765.1	17405 has no mate -- skipped
+2024-09-10 06:43:30,949 WARNING Alignment ERR5069949.2151832	163	MT192765.1	17415 has no mate -- skipped
+2024-09-10 06:43:30,949 WARNING Alignment ERR5069949.2176303	99	MT192765.1	17441 has no mate -- skipped
+2024-09-10 06:43:30,949 WARNING Alignment ERR5069949.2151832	83	MT192765.1	17452 has no mate -- skipped
+2024-09-10 06:43:30,949 WARNING Alignment ERR5069949.2205229	99	MT192765.1	17475 has no mate -- skipped
+2024-09-10 06:43:30,950 WARNING Alignment ERR5069949.2216307	163	MT192765.1	17503 has no mate -- skipped
+2024-09-10 06:43:30,950 WARNING Alignment ERR5069949.2176303	147	MT192765.1	17518 has no mate -- skipped
+2024-09-10 06:43:30,950 WARNING Alignment ERR5069949.2185111	83	MT192765.1	17536 has no mate -- skipped
+2024-09-10 06:43:30,951 WARNING Alignment ERR5069949.2205229	147	MT192765.1	17584 has no mate -- skipped
+2024-09-10 06:43:30,951 WARNING Alignment ERR5069949.2216307	83	MT192765.1	17600 has no mate -- skipped
+2024-09-10 06:43:30,952 WARNING Alignment ERR5069949.2270078	163	MT192765.1	17969 has no mate -- skipped
+2024-09-10 06:43:30,953 WARNING Alignment ERR5069949.2270078	83	MT192765.1	18102 has no mate -- skipped
+2024-09-10 06:43:30,953 WARNING Alignment ERR5069949.2328704	163	MT192765.1	18285 has no mate -- skipped
+2024-09-10 06:43:30,954 WARNING Alignment ERR5069949.2342766	99	MT192765.1	18396 has no mate -- skipped
+2024-09-10 06:43:30,954 WARNING Alignment ERR5069949.2328704	83	MT192765.1	18411 has no mate -- skipped
+2024-09-10 06:43:30,954 WARNING Alignment ERR5069949.2361683	99	MT192765.1	18425 has no mate -- skipped
+2024-09-10 06:43:30,954 WARNING Alignment ERR5069949.2342766	147	MT192765.1	18468 has no mate -- skipped
+2024-09-10 06:43:30,955 WARNING Alignment ERR5069949.2361683	147	MT192765.1	18512 has no mate -- skipped
+2024-09-10 06:43:30,955 WARNING Alignment ERR5069949.2415814	99	MT192765.1	18597 has no mate -- skipped
+2024-09-10 06:43:30,955 WARNING Alignment ERR5069949.2385514	99	MT192765.1	18602 has no mate -- skipped
+2024-09-10 06:43:30,956 WARNING Alignment ERR5069949.2417063	99	MT192765.1	18648 has no mate -- skipped
+2024-09-10 06:43:30,956 WARNING Alignment ERR5069949.2388984	99	MT192765.1	18653 has no mate -- skipped
+2024-09-10 06:43:30,956 WARNING Alignment ERR5069949.2385514	147	MT192765.1	18684 has no mate -- skipped
+2024-09-10 06:43:30,956 WARNING Alignment ERR5069949.2388984	147	MT192765.1	18693 has no mate -- skipped
+2024-09-10 06:43:30,957 WARNING Alignment ERR5069949.2431709	99	MT192765.1	18748 has no mate -- skipped
+2024-09-10 06:43:30,957 WARNING Alignment ERR5069949.2415814	147	MT192765.1	18764 has no mate -- skipped
+2024-09-10 06:43:30,957 WARNING Alignment ERR5069949.2417063	147	MT192765.1	18765 has no mate -- skipped
+2024-09-10 06:43:30,958 WARNING Alignment ERR5069949.2431709	147	MT192765.1	18776 has no mate -- skipped
+2024-09-10 06:43:30,959 WARNING Alignment ERR5069949.2668880	99	MT192765.1	23124 has no mate -- skipped
+2024-09-10 06:43:30,960 WARNING Alignment ERR5069949.2674295	163	MT192765.1	23133 has no mate -- skipped
+2024-09-10 06:43:30,960 WARNING Alignment ERR5069949.2668880	147	MT192765.1	23145 has no mate -- skipped
+2024-09-10 06:43:30,960 WARNING Alignment ERR5069949.2674295	83	MT192765.1	23203 has no mate -- skipped
+2024-09-10 06:43:30,963 WARNING Alignment ERR5069949.2953930	99	MT192765.1	25344 has no mate -- skipped
+2024-09-10 06:43:30,963 WARNING Alignment ERR5069949.2972968	163	MT192765.1	25425 has no mate -- skipped
+2024-09-10 06:43:30,963 WARNING Alignment ERR5069949.2953930	147	MT192765.1	25464 has no mate -- skipped
+2024-09-10 06:43:30,964 WARNING Alignment ERR5069949.2972968	83	MT192765.1	25518 has no mate -- skipped
+2024-09-10 06:43:30,966 WARNING Alignment ERR5069949.3273002	163	MT192765.1	28442 has no mate -- skipped
+2024-09-10 06:43:30,966 WARNING Alignment ERR5069949.3277445	99	MT192765.1	28508 has no mate -- skipped
+2024-09-10 06:43:30,966 WARNING Alignment ERR5069949.3273002	83	MT192765.1	28543 has no mate -- skipped
+2024-09-10 06:43:30,966 WARNING Alignment ERR5069949.3277445	147	MT192765.1	28573 has no mate -- skipped
+2024-09-10 06:43:30,968 INFO Total pairs output: 56, Pairs skipped - no mates: 82, Pairs skipped - not read1 or 2: 0
+# job finished in 0 seconds at Tue Sep 10 06:43:30 2024 --  4.44  0.25  0.00  0.00 -- 07ae7548-56e8-4772-9b48-7406710fd838
diff --git a/src/umi_tools/umi_tools_prepareforrsem/test_data/test.bam b/src/umi_tools/umi_tools_prepareforrsem/test_data/test.bam
new file mode 100644
index 0000000000000000000000000000000000000000..7793c7e3635e9c6a428647be7d067194f1d9d522
GIT binary patch
literal 11123
zcmV-(D~!}1iwFb&00000{{{d;LjnM70fmy^PQox4#fx{vm)Hx?@l^({R~wKZ$u_4G
zZkO&4xP+~^Ry5xGYCe}?$QGBOcc=Zn{!Y$Gr?%Vsx<bgho8z^|fQ)f)&tf6UBrCW|
z`&RIP1CO~+VA9FhAb=f@QS+9Xed}?7mvaW#nX+9L0rnlXbexsD^lTisOr=s`f5jqR
z#v)0fZ~4`OJS(!C?<w}ZMz1|a2}}m1IRZR$>VY=k@0u*NBTVWES6tt4skknADPwV<
z`eJ5>LjYrB7$bu~Xb0}k=>Xw2EkvHhWK-}q;zdu21``6QF3I-epG8_Po&!mqD<O9B
z7^u<yuW&%1%5|P=hb(>520^6NV7@De<awJ)pv|py7gvf>X#>$<_dF}p<U|*xN)r<C
zymK6NnA1z<1WxvJ*b(cw>M*P-JgGE!-FF(?Tr{A67(_wKp=z#&5V}SPHNO>oDuliP
znC$J!<pKZzABzYC000000RIL6LPG)o=_$2+d#o*2dEY*M#WpaxYdqU57)oX)gS&7M
zoc99^y)$<l1_}waQR1{vFf@tk1d1?(7LnpLT`H?FO46D@#UJ!mp_SWGnyL+jDrrG=
z^FUDn1<?wEf)EN-Xe$!sfq>Zged{rgwP){h&b`-j?>%RqHM93U>o?!?_kC;4Hx~A|
zb!)L;cfBm$THL$%Q8)JY@NfCV6HnZI&pij*hbJfdC(qwKIXpaG{KAckMWNe5G&*J?
zVR4?9T^@IdR)x|!PTH=nN+n8Js#>e+Vj1CBMrBl*ziCQ8mZcev(k#nT{8?r=8D-@%
zi^>cq{XcxgXjZ1~=g7V}!)W&4&ZU3%;O?cxJum$mJBuHD<)!PZdoR83Z~xejUw_N3
zFMjJ!U0S^HO_y$Z^P4X%-u9O2Rd0Im()-%2VQO}|zqEM&2!e0m`^t1*f6HPaC6}BD
zreoE$Do(1rOk~`(f{RY&ji`0p=uQ`HtMa<!d6(pQCmRivnzpTqiZwFfO_CHE(=|xA
zBkNy9DUIVyN?$U}*&-Iv3y(wT-b$yk$o@pvN4PA-=`=N;L|11R1Ex#!onF!ZmC{$d
z#t^+7BY07+BDq!FdvCD?C_nP>mtN$ceA)md&b`cG`S|sVMeIb!WBECZy?81*I|m0R
z`}+n=#kgd;$yvp;ieu3+)l|GK7}rYbF4l~9x#W2gOP$Ag(&&mMj3q+xMrhu&3DYGP
z0KX`+6fmS2&5{6HM$088l$8!CI$;i)Iej(28gPs#Ib;nwB}C8u5V^Y&c`2mNUM`1Q
zm1dJWw;b-J`Og66e?AS&TW7)iTo2|Tm;~^Dp9<y(%r=f2R%!u=62U93@+vPh5Ce*;
z25wx_K+c<nsV-*_Nv;cl8&Gi8u}bic6|89$WAv*>p;Yz~=Uz_E3~Cy=@yvg>7t%1K
zk>E`+n5HRRK&S8`HJ@j$OO4!GMK70WMAw?D(lP{d0WhCOU=lVCOf~}Zd0XFfz`g7S
z59S*euNjTKCIIvB<X~rS@kv;<0Fa8YT34`PQZ=&TUDc^hc6qDHF3y>9f^|)g$Vn4s
z{hDCbmp(q7^ixKb?c<X<Z_}I~naugafi_<0oY()_SA~REn+$V)&4xK&Kc4f6QyTUZ
z%z4LJpoiU@FJSC#f%7^zI^Nw`{PNB4UQ%L4Oq6Y$u$J)(9!tP>t2*Je>NsPuRxmv+
z7ZsO6=0&WUTvu9FDp6GnI~4N^B(5o$-^^Oo%*bqJEeIqRqa?{B;@+zFC`DtqjDn@^
zl~u9jnF00C{e3{UJyw$210~rSDaq|yFWee{@)qa4iviI02CmS8_ep?P3mzxBin~VU
zRU(;6bXh2++a|A+XmiP!&^0R(*(o?o9mlz7YOc6W0CbZ`Rm)tSS$VQ+bj`>(h3|9d
zO-u~mU}E^zD;#=%_o;)4`G7<3wx=x?gzgBvz3r3zo$bY2V6Ur$Rk|T46i+&qGtsIx
zk6BgLi9%e#VZUQZ3K=uqD#TK(=$K9rJuni)M7JF|#wt-442F@Bo@{BRW8}?82)$n#
zO?L!=d)2G%;52&=N1;R1UvJ%IWJ41p`V@Ci+(K`)UXuPjxN~p372^5)0MFY;cs_sY
z<y!-G7SBg||3?`A`r{K-c6gHi(_BN|qyTa)j}cff@_-OVRwV*KL(5><^0v{f<aJUo
zrDdU6j99E{l&+JihA)(5Ed`^=&_vSIsH_oC`vo(M(j`q5FKw`p4H2pHfCHjd)r|^A
zgc8|-G%DTO41&>c2cjXuGe4EfrH$_h&1@CTxo9GuL=!hUH1ECuG+$&5;{%If?EO>2
zxOaGXa&)k`eA8k<h}8<3EMi1ywX7?61BKwLt@Aj6L~_mws5GM5ylD|)AO(o$RBG;6
z&R&&fWp)M?_Ga$&p;!F;RIY9qd5F(XfN(}`48#LdBX_kxoYkgfS8!%}oB@bi#z1&`
z^~3S1z4{`?o;kHw`-cbn2fM~z#e$5w%<Gmj)yfVAU341o<b`5Y%r#6AE4avoLOv*@
zB=gq6eko|Ah$%sAIRf@hvy0aZAFpfC`uC(}&}9PV<<kA%S!yqcyxKy;Zd}o@k~HkI
z&HwfqcH2n9%HCdmF<!M-gz@iB?Ulom>{X9O1^dGxPc7#%7m4JBEMg=+1!9c0N;F5*
zu3|=4=0LoI^dVNM8rXgY%O&$T024-qLXj++2I*_S>;sXqoDKS~<ZwC~pg9+?kI{Ty
zkLGpBYAX2?w=EW*3D7(`Io>`pXbPq{;1Mx`x1`p(YGYZD&{dsCv=+z}+nBd`4ad;M
zF}wpD1CNWC!5uVOLYQ#|v0Wv-O?Dz<z|CxOY?5T>3)TUf&5oW0v~_RtX!29heEKY!
zgj|?Q62d!yt`BIo;B)de!5597BhSkm(MCh~0b~h<M%<^6rwa*IM^&}rE!-xPEEdSu
zyO>wxZgZ`9`()%s!;y~thx>U-saQ40($Q@>E1Jm!%}Wl==Pw7@{HG4hKLDC{PJQXY
z_RjwC;UXi>s6k|$w3R~ehF}gq#JUBc!Rwqt;SMn}t6HI0soEB}31twb_zw{ZXrqMH
z3Q)!_2%%{#M^jJT%!au~=>Wi4j%@bjDerN9_%k7l<<{Et6z4p_@dwGOP!hCXn0jId
zqY3chBmoB`8U;86QVXSW0Y8)!QnXAaadKv$uZd93e84Fm>&T|mhTkTfwBfdW-yE2s
zOvV$L^k81ReemWU%nxAf{{~<l?Hz4fFx$8Sy;gIiZdyX%1ab~|p9b`1ktif0f+IHO
zU^U^#fprT$NmgZB<7%0+w&)Uu@8sakZ@(<B)e+4PpCY1JXgvYu7sg;-2QZ&WV7}!4
z{ODaX&gYq1o)i8+59T9_UtA0TCxL=nFkeNkoLB;>Yo5f75|rt(q!W3?@x3i1C}G~}
z4scZrh7(>1i~?+B4x=fF0OfGwttJZM0+C^9KPgWyVl*9DJD-h=l8(*jT)9j4TK<Jl
zFz*ZmbIb|fx%EYd+rK0%cdfB}<!J0bhFBi%9Um_~3x4WsAU@<d@O1LKrt@>?zY?by
zP0*lVkmmQ2Fo^!(TF&Is2Pd<0Hf%h*SB7eL-r0GT>VNZWNHCYf(2z=5LYv6^G!5om
zo?&*kj%QamJ$ThKJcD@e;%64a*gu@!yThHGo!z~~XTN8$;1U82z9(mrCmJypOg5_;
z-5?T6I6^eoZK)*kfljta)wzV`U>Hf1NN`W5Vw$DH#KJkPvlH2sNV=EtU7+;#F-qT_
zqIB~JrEgDAaxySR>1U=gaP3eUeAm?IZkl(xerT=ji2RrfeT~5ZN^cKkDL^S2qf~hZ
z^=ymMb&#c>b141xUqTof$P&2dz2ilLvzKqiJHQ)Oh|UX+yb?O2G-^Lh2PY1ISb`*p
z4o>3&cp*crYE<WP0iEi&f?9#);fbw#Ep0I@vh+)8t%pgGQet(Cv_%e<)H-Hgf3u~Z
zDSlCDNig!}&ZHQ3ZuwmMbxg5kBSHJUFvq-=jOn-k-#6w8Y`6A~i2(DW);lHu`*<K~
z7DRGPNWPJbOJqKpOT<O2Qe~1LC<2>7qL;)uXUI22RV2veWsaIAPZR_)0G?m4VrMhT
zsAmX!bb5)I@L;Yi8M(Ja7En)dd+IywFE=D2-JH|~?Z(Oz--rP!4llE1<cdt>9Otmi
z0!Hk@2+ORGm@a?cYKD0S#=betFb|Ipw@)m;h%x{O#Y7-(qH3B_9myPrhIUeb{6M9*
z?iwmmD%1zSfJ?+pEa5GFq^lTEhu7~^%zJ?dH7_%0jB}VQu<`4TYy$szZJ-y-@1904
z-Zr5Z|Jv37oEJ<_CK13FrU|A4(~#d?f=s+^!71s6M6LwUQ72$<6@(2%bY$f*93szS
ztyw8S*P<RJa)#8so!*HHqhS7=^InN5`+;cgLi2T;VXp81qw7}Soa4UJ3HR-}x%Yhk
zAe$b}-zXM~e?67WqvOMaL(9!EB<yv85;|~$#^*@N>rQIa4s<6#(1GbjM&40gq*_Lp
zRwxv=v64+**02W^!yQ3d)wU0r77NRITlG%;jNFus+Gb-3&Q6!QE4x8G%uCAHFI-_Q
zfv^ixO~Uv`rsC=FBwepLj3M~9R_jEeKnTnL9^lR4d=Rq(x>%XYAkS;vwwio4v>bFW
ztSWfY4sWURHV(>L+1Vi2z-IcNQ~z=r?)CY?GIPAA{pwAv|KwRTH}fA}*>8C6s&sx;
zEEX@Fy7H5qlf$DOBb_|qQpGTsorY~hHV#-6%BkRwnndUDP<alIPMi@8sj3RCskNp?
zB%J54s;Z3}qVhnGUL0hibOO2@qida1HbrCg81Be?#!#M>9T+uweEc-wyk#Ps-VqUW
zp>Pt!pO`vz2PX+9I3d+`yoFzfWP-Z9a(vIggJgZ=Gm+x(bvFGnbi!m~k+r9M=ZC#_
z?sv$t-h$478OkP~$mW-QNO|XwZvK!P`?nbTg+Mls_D=S;7vH273EhFss1ph*!Li3p
zOIj|dN+@!$M$qycpj}K&0W=yaosoHku+n#5+4Fv~Ygmktul=!row)R~nBgB)pSf!$
z<NVoZYnY6embZ*1{gyyl>;xCzL(-z*V_OPU>Mm(9EnP4&YSc=FB69);h`JgOro<iD
zuS#_A4~f4m<<%YQaL8<&V_(wW5E05OzOt5Lxb&50_^z`akvA^h;ShP<Us){vEI{P=
zaDN}>2#8GYEF!bEND_?kri1zj4JAT$!4!~__*YZ^i{?roh;xOj@lB5W9A=KxmB@!l
z2iis=WSXLuro3(?%(n2E>r=|-aS6cr%^)<p^fiF<moMLP*G!e~Yu>mJ5Mfn{0J|m7
z2MZ1v7v8j7(wsZ66bf`d;;{})AgDreiUKqhtOsxrLP2U|jk3GsfKEzPfJM{|(u@*#
zRXO<}@1s&X_sn&D**+PYU`0M~wPmZJWv>S|>YEBiC9UBu_*c<1$XGR3Uh{tQG~;sn
zY#HT$0?gY2l9k5ZA9Rs{u|7Q6vn;FNycRV=Rs^{<kVWv1M4#hJB>@&th5@Jcgi1g)
zA$NI~h?X)i$qNFphFMY=w=Anu$farA^qq<p6iaPIk<V0pM%AM+7F7R1t{F6tUNcHY
zE;YS*sqGdughT2KHd#Iz0-E}Kl=TG7?Wt(e*aJa-+6c|&8lvesW6car#EYq!*=A(v
zZ6cOAY?r=|#^<rwxbn12vmp}m66miQ>wC2ND(i2Mm(3h#>Tw1pf(-2X5t?G_*$&Om
z5}MBOy>l^){i6`g-R+~Jqs3*I!K(t(N`qx7qs+WAseRhF#HKDq$eiDgvME_GZciE=
z%Lzgb(xn~t7`+)7J$LLU{&+oC`^OW6oaY=P^v+OwuMtA$D=}r~JUy~>VI!AEXt4pI
z?;RuLoyCtlf3R<!UV0D4emz9!`1t7f!0=0`5qAh1I#k$eq?H}~XvOmmBqmp?flF=S
zTdCjy=n$O(tAxj3o1l$~QBM|JFe0!<>x(0PDYaB~>imH1{~g?DRQAY{|9}NdEw2>W
z#`)ea8xv*=_-X%Edzoo^H{R9JROY`mz8ItAjg7bh%8v!h3hsR>DdeXf{qS9@SmK;F
zZ|~{B9LyU5{K8NnEts~2VKZon1x-276fj5ds`93-i7l2aPf(|;62@z9kW3)qx)!og
zuX>u5V?Ua@ws%`i$yUT}o@&pc-p#DYXPCiy)&1CM&LWz17I(hDqdBm3ImYe}(A?SE
z-8-<(f)yyS)Kb<cva~wUh3Fudyaq-}#NbF0hS;U56O`6L8bifFHX{fY)EWZz6N0Tk
z^3`DMR_2#x)3TEz?XsyS&4TYUr_Oo=)m4w?`%Xji#tE99U&?zl2fB_B{^x<BJ2WYU
z&*P*4zb+wSJSm{(2%NhD{4g?2Fd#rs)EzZiN|oe|Xi=Y6EpbfLrK+Ln8gkhRVnT=J
zuc*ATW7e*w<FmmPL}+C_H4vP0jXkq?hAI4VmaqG@k#u@IU;BcUD0^q;VCQJ>&{86>
zfx0RXE9W&-IBp?xNXU32st!`C;)3xO75Jo<RY%>x3GfGsU_Ws-r({mU4B)HdKQmw1
z=<^ecLpoZIF>QO>L=_K%X*o*&C+k0gtE15edinh7X?U_(JYPE_pM>$Bh4MMzc`gZT
zU}@Lm|Ik{`y~4$<mBTgu+Dm5YL&&KA2=kw&y{pP5KHjubp9RmSPw@P<!|s1<503o}
zqp>%HcpjhZ9_%hM>Uu>LEtfEbvQ&ziL{LGBHB!L5CMSiO5zE1mB(SGEk#JmB*-_&&
zxDBT<|NFwQ=CSn)VIG^ze&2fH@$5eoYUfqV{t8U`*$=NxH2ZYJ?5`Wo{-sWrj(3K8
z|2{YSdocDNf_uNae|UhR;YI3Zp$0$Rq2#P(&6}bF`5|NYP1e*Hr<EP1WG&%6b<#oA
z5RX(Scy$_)NYO}Y_JxO4`8DnRoBO0}!oEb)%jfAU0QqRBMq)!#s_4zqJw23<^k#|R
z{h>gR9F)e)Uk!SwjUfgs5ddP0Wtm5_U%hB&nD!H0@p@0nF$xczU_N@9S-NS~%>T39
zRkvATEDrG8+dbYnwq~9+9APdblP5_F#T2SVBm&stM3x;zCynIDM5+X2RKYX~g;Jw}
zEn13E5G5<*bBOz5lnqU|i_|_fHrLiuOe!Bap-Qu58VwQiw2Cu0#8En%&7oADGB`SY
zR=0WEES}Zws@o)t|9_zBBRty}#cP>^6sh98fJFnnhaW)*OKR-{Pt01bd5rQlODfG`
zh7+=CBuX+G^#{mioukaviul4<**$FQm^k;-Y-BNjF{FIi$UkJG5mTge(YUK6q^aX{
z?C)>|c)q|d!R_&U<3cTlF%{yuw{v`avbcCN@qIBBQ5hl%1RJ#=jb{SXL5F;e2vtE&
zLjhQ7qO7Q)M<l9gfGcVb63!)y=1u05Ihm0q3spKML$fkiUbqVV>Y3D-$ajX2-MpbT
zc=Hr8N9v6sdsnE^*A}uVAlYOxn~rF(nNfKbi#ey@UTdET$lkdLvg;?1d9D50j1~Od
z-PP1~Z-4v1t;9K#eFK0J+*Q+5d5kg-oO#<y>PAth_fVs+Xu71n;1@HP{wD**ICalf
zU-dNSsbH-ut)4RK@%$&}^C~{z`Q4Ef^Z^fH945C8PYQU>>@!ZqbaTKHZJc(w@fgK2
z6P6(jE*q_MaK{{!n**L7a8P!8gJ?eDpu8VruLwkQe`oi2?_{xUdi3KGL={5_iDZmu
zzP14^ib8d#kf4<s?gAMt0%%Sy230te;TY|SA%P*21!8T)L6Tpzk&TrGW8;aMw+mYx
zq3q`CU!Jz|yf+5tsO|SJyioqD5Xn2X)^^h3oae9Z-y4L|^H+rKe+q=sA?dbrh*6cr
z-@M6@kH9Y%4TE*$4T`p`6S`4t%}dmF1+FSu1VKCj`I_S#IcrdX;IdHjH%oAq=bG|z
z;}ugOcNINn7tHNQU3a>#*4ex_at_^IKa1w()nVRbUf3TTvp1P{W9+v=G^y{}20Dng
z3(+><>vJOb5VHWZvQuRXIshO^1x<$@bCB3&$%<TwE&-blejw*1HT9H@#??U6@>iR=
zt1!^XHeEJ0rqRd(N1j`a9RHp636GQgNty#ZfAutznarBZyL&tz=}jhKd=%=r#gj~?
z0$b8(&?Sh!Td2Ay>MAZkN9Ao^qgoQP4s~J{1M#|1G6Bs}w6toyt(zPIEl_Yr%G2`L
z&LYi9ANfRX;!9f%h^Cb_N4KtN^pD&Xwt{!=_64(>(9($JubyTyZ=IM-ujt=8@Bz?3
zRR2E4?hE(8**iXQI|>nrCleIcm02|hIP)APv0}(>x|o~A|2U;kmhTv3w}DCEq}~aG
zoK{lmxCV|j3}<GVYpA3_Tj7{sFkSlfu4DwFrQ-uVe%5mOG$!&G4&A#Bz;uW|*GluK
zWItiGfaY_7<_9+^`Yp4HPRNBB4I%t!xCxF$(-74qZ}O_34VkVM);^=j1|7BkQdfh|
z(lYe-S(L6Z4K6NUlMMBy)kg5C^?7|PzqBUD+#BHe!LU#Fj$n~;JnM#TK3T2sJ&Liz
zttp;I+xrK{c4<=u%LjsjVj&d!L>+^I1}mZIXGUB`Q371e5SoilQcommp@Kp0iSfW6
z>N>5Ewc7-7Y3DQNV0x7v*1g!FrM)=n#k61W7>$K4Ez_(gLu@_I7WDf8+n;5_T{1WH
zfe-H7yWBc$ZPRkHw#hd_K6<iR;v<AV{rFVZ9hyeh^M+HjsRXD*<T8~&0^Ul=MVtdw
z2&CWj19GT|x>lSa)m9uu<qEMBB}TlKah{mp-WE1UWDK&Y@w`jZT2B!Zf#SRD9j$Dy
zPKPFu%&YXs^JHOIIkPT28h>+pHkoy`%X6;zey)S_IWHVq802CYd-oF|oF{vSM~e%{
z9UJo6njtwXSc@nV7#666GB|FOlnbt**s5t+BNC!8px!yJ;Oi3#P7w(_WVb%++Y-!<
zb8fNAxZfsnJ<+xgsn^%Y>8tfo&K!hI4l74CilQ>Np61SRW7oHUR{4P^UNqCq@U010
z&N7a{l8aE2uO(P3$<3-7pB87%8^>(&Eg0>OaPiR$Sl>Ga>(xGwxyw4`>qjZ}yD;{K
zP_mAW!AzJX+cB-;q79G)io{4q5>(q|46Ovk*ybg|aN0#C)<_g%su0AC>AHlj^0uhK
zo9A&%D5^wtAlTq^5`MAwi&nJKY%LC}KDPI~lX2&j*CsTl&>y`B9s$*dH|=>b4t9n;
zF9^extOjEr3Wd<(Y8KHH@wI|<MjX!<@C7+cId1B5#A4r<ocix0zQ{Y0emNd}BG7#J
z%Cef@sR+%NPl%w)7&{E4^yGN^V8;?cC?058R8*j%Eum<qzm|%glvax>r!Xf0rigpm
zIu^8a2I+T_pl;ks+Rm&IQWn4!0nkG$xcFIR&S&<HVwA89)-GEGb$z6Gw!=&3r%uai
zo;thg<8qJZAf1HqrvvG9c$%D>B}76r3BH+Tn*=#DwT9HRK07XxmX>aV38Jk+Xx)<3
zRJ4gXNR1p6X2B`b0f!-y%}mghBC_tsNlD+g)29x@)UAt-%zC*`U0J*4Z`Wi)Up`4}
z{!6`**z9ia?Hz90ekPRUbyLV1>MG(wi@K|!^e8vf8e_aJA-+&}sZ_!mip`}~1vDGj
zO4`|z$D&5y3G#^0Ox^Nsn|!Bcr>vz*RHC7?*m&iM-Al7RtsY17Zod!<TM(2_?8xd}
zj0V{(fac{*Xx=nI)3@3Ya$YulAWjGeiH%3|RfZ1&NnW+NzG_%qUNf23bdu@Z>+mfe
z&Y75T@JpjMe`>qSB4<mYwBI__y$xVqJ}nOClQ{hM9B?mdJeY4ZKHHDo6M(sMNTly#
zL0erG31I0!_O_5Nl=dqYOWAY?zLhSzh9O66O5IXS>-K_+T<W&;Y4T_n*rhji{XVj`
zJ=>%ht_#t$CSsCzhdm6J*9y#~%U>N&@rb7Qh?$lj`)HtjZi?ACrdXKYF||{4Q%l_n
zKu-olo+McMZ2;ZY!S=3}k?kN{8jiu}#mX!{yng%65Br4b*X|xXptmvG82j?$(^>BB
z?``kecmiVHao)j$#i*CFyv9onON=UUr6s(fp#4me9EQEgrO-8JV4_hyjs^9DC)z|7
zNvA93oA}H&o@La#FSm|v6s|c<nD6unV%e+k0nWzU%g(t_@0D8vbl%AHZB!Wh{%N-F
z;iQ0qOiaZ)#1f^93nf%lq3+fKD-DOpAxwD%XGl$zD8=cfqP!K@n-$MpoHVqf1+rS+
z=4;!iu5_oEb;L=}g=VV-KWFIEo&z%pKifJVPwFGI7yZCsXgr`ljj_)KhGzHR=-}9{
za~=zrF5)(!_A`VK)T&S+T4Q+wfu*hT+C+k;>SERC7_|{1N$G3RB>)$tCjQV>gcRB6
zYq>@^=gzuqFeenX?a%IYfVBOw)9{REBeb9H@f-*g!uSP&RdaZnu3qY;sRX#Cgu|Vo
zaD*~k)5^R<c$=6YHx{sXine$ILlrB~)NqA>P!_62NCBb`wE+0Tev+B@R%2Aw??rjd
zqq_csT}<)VY3lxz*$C~uuNb7$qxpvz`%buj@9vK8krpI}u{I6T)O?P_3~?>^Jk~(u
zbY-?7YZMr$9DE)}Fd<{mc@Rr#r&NHMQ$*2flZ{yy;J2r7&cybly1sie&E=RyiL4)q
zCVRZtGuD^(Xs3NYL*#Z)Ht%s;+b><nmRiS4n?=wN(ERsfG;aW!HxQbC@`*3rHIv)k
z(4+Z>uUM5$LiomTE9n7ENMfryq^*FZfyAM74BAeaZSh#HV}bm(rS@}$+FdP^GN)Z?
zi-a6h3u8cO7}rhPHnG|9!tNAnw~}@%0;yXG(S%tYv2@r_s!EX+MH!i22H|!OO6|{?
zi@v-s&s>;dv9g<zWqm;GNq^7Pj%9NN&07OB$Nl@aZZ!_g&%b(*O}}T}mtVb7l;1_L
zyK``07?`#LVjz(_;su0iJCx*j%v&Lvs$eLLN}#4Ia#PU#9L}Bul2TvMArz>R4&(yE
zMb{=18$KbBQ@0R*Za;*yHzQ_Kugu(!Xufm5+`thGjLVl_y(*Q2?frpNIvh<F)`8rE
zR>zvQbE%OhfIm@9oKyQ8Z94>p8Ku!e5(8)i7fEdmUxoAmM1C$gE)uBJ>n0oO?2|mH
zuacC#>W}frR{hT!XwQ3$*Fik{>{*yc^H=U$jW@gdR_dO+hX>ooN46>tr^0vz00k}b
zq9RkmJ5DB(bQqR0Cq}jI@}`FGso>X94W(y>;Ei#PY7>lSBWh6obvep^X884iY6_3>
z(%1lx@@fu`<|j|{KTn(Gue$qI!%af?w?ZAaXqxs|M3hAhzXEYX4T%=fRj3{n5KLlN
zxByj%*Q%o>4?5QhK+<NpQ0^pVyc7kkiV!H^^RsvD55uR?qwC&R&V^2?-*994Tg>4Z
zMn@MW(a}pBo?ri)1LNYi%%ZWk2GV)7x4n0~yZDOPFAKqU+fuBPgPzJ8DOD+G2jNQK
zFVg~vU)dnIM@lG?j?t1RMSK+kcG%b&wNK=WhB5MFi*Z1iFLtK3%;9u?9>9>|Pd!Fw
z)c>si950dEK=9|s2tI2=`S@8=1f4{V5&XG8BA)~V*9hYoRy<{+B{yq&nrwh-?u6Or
z*`=noJ!-k^H+!+=K@*|>d?<s@+K}-)tLLe{O$crgf`7f|slJ1;pIVW@{r#hT!&Ait
zCm34J>sq!Fib_u@;9V!mx&faDstV&dFQE1{anh*96VeEPI%G>a?&$eS0-#>7)w#%%
znC0ME&Dt5O!`T!SUxD+jou0%PKWS;Y{CdM44xa09SNkVTy>Prndyc1i%7mx#Z8qQO
z#qnSCZ8ik(`&Pu!fJtpOS0#i>NFgX<m%7fgNzYQ$*Wf4SY`Ul`>hZB*<FH|e-=2al
zNqZoB;x%>Kq@*_SHoLi5KFLG|OM1i?g`;zu2Z#-FDf$&ne?GE}Zy1ezXjLwckM_)(
zo^vBvp6$p^8(*_{)>^h_gpaui*7V#kx1#CKM|N#I`!}o#(iPACNvU5<j=2XLJ+{`Y
zUwX_}b1%Kt&VF3YU101v;k`fFJw7>FEL=60cF17}&RAaLh-?w##)#HfPV@|m<Gf(R
zDNw&EA`wk_94G3K>=d}g1Z6XD@AR}4PK&SEP%&IxZt9L89XKbNS@3;wEN9>0Wa>9g
zYYrO6Z&@D=FJ8M^FC(nK6iSuF)$#`=_${gGg!atIb4q7vhiF8;9JHZgU>6v`Em%cY
zp!P@)v*8+pc@fj)L?Y5I3j9p+cJ-S~&&15mta&>&`7P$q+`J{?O%Bbc|E(e8^$!}M
zu?IplkB<%x4$YHBY9)x5E@@wLB3NYH6gsA5{#1+<pk5Vn4-|PAG_+G-8)V&Fk)EDE
z@WuZDEPgQTzW6ktc>8tFoS{-KeR_h#uFYA;NW4GP*Q-QgQpPhgG`&&w=W_gQ<@I@R
zusU`c5?f;={1dkBd)*)t{_$iq_HZB*$9ubnM~CM5WNjYjjM{0TRV*)sqBU3o#lM_P
z6}aUNSrbkv&I{PAR;ZY$xkgkdL^IGMjjp<(^KLSg(=bTOsL$}%r0}Db7B}D?7d2u)
zJT-5PAh1)R&%b>GTRc3U%%Hbz<!FrF+XFee_UKt*G92d8Zp!7M$ow+1H1i7tr)y~Z
z-21f$>8$9TVX2I-tUU+lsZHqJI6=>6=gIx6-6g;B`o-eb5WQn0=lk}7KeYY;)CN?r
zp?xhCD`31qWYP-3m^yb_(ZJS&1V&<x%7*F~HR<3=TKGZRF(X|Uxz3Y|ezod!dIMrv
z7COmjGaRQMr(Eqd`RnJ)&U^0Y^{d{JF#gj}mo1*;ExUp|Anh}VGFq%ep$h7L0_Vn9
z9V4g^C8`yQHlgbnZ_J7!xYZakS%H4zRg62)aa8?hIL#~BOXCc9rr{e9(=XjKRENCC
zE*+aiGvtn*G2xDGc4+?9A0I5^PdYR&V(ce_$8v4&93Af-FMb(50(37csxqb)+PbX;
z1eFvW<dAu^1#gkccdaffc#@>0{e%<E980t;uh;?fKy3H`dt&a%TwhqfU%?l=zodD4
zJLml+;q!~Zv)UXM<S#Ml7)X?{6iKO|dy+Oz6_rY4345m`l(5vO`BxQ^fD&aKz>$sn
zVbXHcw%)gLI4)w^PBWSKCJ?icmXXzZ{f1|hjr##U@7JF_IFbs#{*1;R38i9pcXw~c
ztUudu?;#~I?d~YR-Il7Qr@;V*TqFgj;Gz^Aim)v%QUWpUxvQ<`Vq@0@VAKC`{2S1V
zx!%G%HnghWu{A-`$<!FhH-$3wWFT2iE2}=zc1B|br8$E2^kYUNm-82nI=3F)gyc;V
zBz^Av-Jh^Bc^zc(b)&H#4rTJ-XnPkm>l-b<gMgz(u>j>%WJa`uCQ;w?=rLG*(I9?R
zbxynUCZKPW^s64%Ky$T+iRE7!+NO`Jf7r!GCbQq8**`Lw{l5w2=;~%)K52))`!xG}
z!|bmc&;AatMSo{to!-3|#$+)2o!#A&qwU2v=~*9nQo+yDGn6_cFtyA(=v9rP9H)NW
zrmA(B*PvhWMu8}8k=@ihPC8uJa8+MG@pkEYEr!j4GT)IkYIku_r(dyPwsPx3_Hn3h
z^Icb-3$lW2Bx7zl$EAHB^^P%8kFV!w9xum8x#;L0jF5VFpd}8euUtJ+PGOd#nUpiH
zM`Ukbo$Wy_%kicOhPJdHOJ(=*Q{T<oQ@S(s!1>5TKJN(i>AP#Uy1MjTA4UGJ2UaVP
zAH`S|O4RZGp{wVj_?6J=^Csb_EV7d23T!WIFwAisqnMkE5)4%fykr+kP%ka&YNkg#
zq32YvgcZn~HL7Md6r#k2C|T+WSlTC&xeXYp`5^5V45niU({kud7_Es3V$w77UO|_^
zCr*nZZ=S8^e)NIWn4A#)Z=uu;XaXad04Az-OD44dDMgP-sTn<SqSCbINfmPxCvXn3
z*yN%@L=LGFN+kkRssK5dD5dB73^pg(lb}5v4Gp_^xZ+}eNj6?Z;yOYy`vVnzQlE&L
zO<D2vENCt#Xnx5d_d9O}n$LOp<k4Lx7<)z_o4flbhr7Gx=~g8@FP$}AQxw#8RZzna
zirv(`5tl;bB9D>RTrkA}>Hs#Is@oKLvYae(jk6<|{PWPAiLx6PJ8`)`*rM~&yD-0Z
z{MYY$@l0^}uI1J+mD}#H!t%D!l)oMbi<{m>W1>{t(q0BygDK9tinI@ZOiL-m1+zwe
zRubb-l*p4CaGO;jib}igC^r#vV@#2RnuwRS&ScKpCkokHow>&gndc0-&B4tpb9HK;
zSQOb!Ps@Hgdfzz}Ekj^#tnR(^F#vP<A8*eA^D)m9JmtYvrr=NCx>&qAxbJ(YnI72H
z^#yD(5NW{4bsZ4lNd8ewP)Sa!BRVGX28LbasPfS>S?Tfcw3joZC%n`s{<k`=q?E0Q
zyN4!LcDzsNu*TtZi?Zg5ek0gv26NhCekW$XbQ)7|<1AA^*o9V!F#bp&Xbw-ZN;x%;
z05~l=QB&)KqRtUf(39Y6z+2G1rAk!Lr!J`lLy%r4U<dL9x1p&4B`dWqVX(?;T*uj1
zl^7q>{dC-lgVcYWLvu4z@E19ArW&A#001A02m}BC000301^_}s0stET0{{R300000
F000^eh;IM@

literal 0
HcmV?d00001

diff --git a/src/umi_tools/umi_tools_prepareforrsem/test_data/test.sam b/src/umi_tools/umi_tools_prepareforrsem/test_data/test.sam
new file mode 100644
index 00000000..6465827d
--- /dev/null
+++ b/src/umi_tools/umi_tools_prepareforrsem/test_data/test.sam
@@ -0,0 +1,119 @@
+@HD	VN:1.6	SO:coordinate
+@SQ	SN:MT192765.1	LN:29829
+@RG	ID:1	LB:lib1	PL:ILLUMINA	SM:test	PU:barcode1
+@PG	ID:minimap2	PN:minimap2	VN:2.17-r941	CL:minimap2 -ax sr tests/data/fasta/sarscov2/GCA_011545545.1_ASM1154554v1_genomic.fna tests/data/fastq/dna/sarscov2_1.fastq.gz tests/data/fastq/dna/sarscov2_2.fastq.gz
+@PG	ID:samtools	PN:samtools	PP:minimap2	VN:1.11	CL:samtools view -Sb sarscov2_aln.sam
+@PG	ID:samtools.1	PN:samtools	PP:samtools	VN:1.11	CL:samtools sort -o sarscov2_paired_aln.sorted.bam sarscov2_paired_aln.bam
+@PG	ID:samtools.2	PN:samtools	PP:samtools.1	VN:1.20	CL:samtools view -h test_data/test_dedup.bam
+ERR5069949.29668	83	MT192765.1	267	60	89M	=	121	-235	CCTTGTCCCTGGTTACAACTAGAAACCACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGACGTGCTCGTACGTGGCTTTGGAG	E////6/E/EE/EE/<<///6EEE/////<AAA<A<A6AE/E/AE6A/EAEEEAEEEAEEEEEA/AEAE<EEEAEEE////6EEAA/AA	s1:i:173	s2:i:0	RG:Z:1	NM:i:3	AS:i:148	de:f:0.0337	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:148
+ERR5069949.29668	163	MT192765.1	121	60	150M	=	267	235	TATAATTAATAACTAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCTTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTTTTGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTT	AAA/E/EEEEEEEEEAEEEEEEEEE/</E/E/EE<E/EEAAEA/E/EE//EA/EEEEEA/AEEE/EEEEE/E/EA/EE/EEE<E/E///E<AEE<<EEE/<EEEAA///AE/6A///A/AE/EAEE</EAEAE///AA/EEAEE/AAEAA	s1:i:173	s2:i:0	RG:Z:1	NM:i:1	AS:i:290	de:f:0.0067	rl:i:0	cm:i:13	nn:i:0	tp:A:P	ms:i:290
+ERR5069949.155944	83	MT192765.1	1023	60	150M	=	978	-195	TGAAATTAAATTGGCAAAGAAATTTGACACCTTCAATGGGGAATGTCCAAATTTTGTATTTCACTTAAATTCCATAATCAAGACTATTCAACCAAGGGTTGAAAAGAAAAAGCTTGATGGCTTTATGGGTAGAATTCGATCTGTCTATCC	EA<EA/<A/6A/AEA/6/66/AAAEAEEE/EEA/6AAAAAAEE</AAEEEEAAEEEAA/EEE//A/EEEEE/AE/EEE6AEEEE/A/EAEEEEE/EEAEEEAE/AEA66AEEEEEEEEE<AEEEAEEEEEEEEE6EEEEEEEAEEAAAAA	s1:i:183	s2:i:0	RG:Z:1	NM:i:1	AS:i:290	de:f:0.0067	rl:i:0	cm:i:10	nn:i:0	tp:A:P	ms:i:290
+ERR5069949.155944	163	MT192765.1	978	60	150M	=	1023	195	GTACACGGAACGTTCTGAAAAGAGCTATGAATTGCAGACACCTTTTGAAATTAAATTGGCAAAGAAATTTGACACCTTCAATGGGGAATGTCCAAATTTTGTATTTCCCTTAAATTCCATAATCAAGACTATTCAACCAAGGGTTGAAAA	AAAA/EEEEEEAEEEEEEEEEEEE/EEEEEEEEEE/EAEEEEEEEEEEEEEEEAEEEAEE/AEEEEEEAAEEEEEEAEAEEEE/AEE/<EAE/E<EEA<<<AAEEAEEE<AA<EE/EAAEEEE<<<EEEA/AEAEE6</EEA<AEEE<<E	s1:i:183	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:17	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.169513	99	MT192765.1	1098	60	92M	=	1098	92	AATCAAGACTATTCAACCAAGGGTTGAAAAGAAAAAGCTTGATGGCTTTATGGGTAGAATTCGATCTGTCTATCCAGTTGCGTCACCAAATG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EE6EEEE	s1:i:92	s2:i:0	RG:Z:1	NM:i:0	AS:i:184	de:f:0	rl:i:0	cm:i:11	nn:i:0	tp:A:P	ms:i:184
+ERR5069949.169513	147	MT192765.1	1098	48	92M	=	1098	-92	AATCAAGACTATTCAACCAAGGGTTGAAAAGAAAAAGCTTGATGGCTTTATGGGTAGAATTCGATCTGTCTATCCAGTTGCGTCACCAAATG	EEEEEEEEEEEEEEEEEEEEEEE/EEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:32	s2:i:92	RG:Z:1	NM:i:0	AS:i:184	de:f:0	rl:i:0	cm:i:3	nn:i:0	tp:A:P	ms:i:184
+ERR5069949.257821	83	MT192765.1	2834	49	139M	=	2833	-140	CCTATACAGTTGAACTCGGTACAGAAGTAAATGAGTTCGCCTGTGTTGTGGCAGATGCTGTCATAAAAACTTTGCAACCAGTATCTGAATTACTTACACCACTGGGCATTGATTTAGATGAGTGGAGTATGGCTACATA	A/AE<EE<EA</EAEAAA<AEEAEE/A/E<<E</E</EEEAAE/EE<E/EEEAEEEEEEE/AEEEEEEEEEEE/EEEE<EEEE/EE/EAEEE6EEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:121	s2:i:48	RG:Z:1	NM:i:0	AS:i:278	de:f:0	rl:i:0	cm:i:1	nn:i:0	tp:A:P	ms:i:278
+ERR5069949.257821	163	MT192765.1	2833	60	140M	=	2834	140	GCCTATACAGTTGAACTCGGTACAGAAGTAAATGAGTTCGCCTGTGTTGTGGCAGATGCTGTCATAAAAACTTTGCAACCAGTATCTGAATTACTTACACCACTGGGCATTGATTTAGATGAGTGGAGTATGGCTACATA	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEEEEEEEEEAEEEEEE	s1:i:121	s2:i:0	RG:Z:1	NM:i:0	AS:i:280	de:f:0	rl:i:0	cm:i:17	nn:i:0	tp:A:P	ms:i:280
+ERR5069949.309410	99	MT192765.1	3184	60	151M	=	3348	314	GAAGAAGATTGGTTAGATGATGATAGTCAACAAACTGTTGGTCAACAAGACGGCAGTGAGGACAATCAGACAACTACTATTCAAACAATTGTTGAGGTTCAACCTCAATTAGAGATGGAACTTACACCAGTTGTTCAGACTATTGAAGTGA	AAAAA//EEEEA6EEEAE</EEE/EEEEE/EE6EEEEEEEEEEEEEEEAEEAAEEEEEEEEEAEEEEEE/EEAEEEEEAEAEEE/EEAEEE<AEEEAA////EEEEEEEEA//A/EE/EAAEA/AE<EE/E//E/</AEAEAE/AEA/AEA	s1:i:274	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:22	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.309410	147	MT192765.1	3348	60	150M	=	3184	-314	TTATTTAAAACTTACTGACAATGTATACATTAAAAATGCAGACATTGTGGAAGAAGCTAAAAAGGTAAAACCAACAGTGGTTGTTAATGCAGCCAATGTTTACCTTAAACATGGAGGAGGTGTTGCAGGAGCCTTAAATACGGCTACTAA	E//EEAEA<<EAAE/AAAAEAAAAEA</A/<6/E/<A<//AE/EEAAE<EEEAEEEEEEAEE/EEAEEEEEE/<E/EEE6EEAE/<EE//E</</EE/EEAAEE/EAA/EEEEAEEEEE///EA/EEEEEEEE//E66EE/E/EEA/AAA	s1:i:274	s2:i:0	RG:Z:1	NM:i:1	AS:i:290	de:f:0.0067	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:290
+ERR5069949.366975	83	MT192765.1	4166	59	106M	=	4166	-106	CTAAAAAGGCTGGTGGCACTACTGAAATGCTAGCGAAAGCTTTGAGAAAAGTGCCAACAGACAATTATATAACCACTTACCCGGGTCAGGGTTTAAATGGTTACAC	EEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEE6EEEEEEEEEEAEEEEEEEEEE<AEAEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:96	s2:i:0	RG:Z:1	NM:i:0	AS:i:212	de:f:0	rl:i:0	cm:i:4	nn:i:0	tp:A:P	ms:i:212
+ERR5069949.366975	163	MT192765.1	4166	60	106M	=	4166	106	CTAAAAAGGCTGGTGGCACTACTGAAATGCTAGCGAAAGCTTTGAGAAAAGTGCCAACAGACAATTATATAACCACTTACCCGGGTCAGGGTTTAAATGGTTACAC	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE	s1:i:96	s2:i:0	RG:Z:1	NM:i:0	AS:i:212	de:f:0	rl:i:0	cm:i:9	nn:i:0	tp:A:P	ms:i:212
+ERR5069949.465452	99	MT192765.1	4695	60	151M	=	4827	282	ACCTGATGCTGTTACAGCGTATAATGGTTATCTTACTTCTTCTTCTAAAACACCTGAAGAACATTTTATTGAAACCATCTCACTTGCTGGTTCTTATAAAGATTGGTCCTATTCTGGACAATCTACACAACTAGGTATAGAATTTCTTAAG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEE/EEAEEEE/EEEEEEEEEEAEEEEEEEEEEEEEEE<EEEEAAAEAEEEEEEAA6AAEEEEEA<EEEEE</EEAEE/EE	s1:i:261	s2:i:0	RG:Z:1	NM:i:1	AS:i:292	de:f:0.0066	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:292
+ERR5069949.465452	147	MT192765.1	4827	60	150M	=	4695	-282	AGGTATAGAATTTCTTAAGAGAGGTGATAAAAGTGTATATTACACTAGTAATCCTACCACATTCCACCTAGATGGTGAAGTTATCACCTTTGACAATCTTAAGACACTTCTTTCTTTGAGAGAAGTGAGGACTATTAAGGTGTTTACAAC	AAAEEEEEEEEEEEEAA/<EA<AA/EAEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEAEEEEEEE/EEA/EEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:261	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.479807	83	MT192765.1	5123	60	150M	=	4968	-305	CTAATGATGACACTCTACGTGTTGAGGCTTTTGAGTACTACCACACAACTGATCCTAGTTTTCTGGGTAGGTACATGTCAGCATTAAATCACACTAAAAAGTGGAAATACCCACAAGTTAATGGTTTAACTTCTATTAAATGGGCAGATA	AA/EEEEAAAEAEEEAAAEEA/AAEAAEE/AAAEAAAAEEEEEEEEEEEAEEEEEEEEEEEEEAEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEAEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:280	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:23	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.479807	163	MT192765.1	4968	60	150M	=	5123	305	GTTTACAACAGTAGACAACATTAACCTCCACACGCAAGTTGTGGACATGTCAATGACATATGGACAACAGTTTGGTCCAACTTATTTGGATGGAGCTGATGTTACTAAAATAAAACCTCATAATTCACATGAAGGTAAAACATTTTATGT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<AEEEEEEEEEE<EEE<AEE/EEEEEEEEEAEEE<AA/EAA<AEEEEEEEAEEAAA	s1:i:280	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.501486	83	MT192765.1	5423	60	146M	=	5355	-214	TAGGTGAGTTAGGTGATGTTAGAGAAACAATGAGTTACTTGTTTCAACATGCCAATTTAGATTCTTGCAAAAGAGTCTTGAACGTGGTGTGTAAAACTTGTGGACAACAGCAGACAACCCTTAAGGGTGTAGAAGCTGTTATGTAC	EAAAAEAEEEE6E<AEEEEEEEE<EEEEEEAAEE/EEEEEE/<EEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:207	s2:i:0	RG:Z:1	NM:i:0	AS:i:292	de:f:0	rl:i:0	cm:i:11	nn:i:0	tp:A:P	ms:i:292
+ERR5069949.501486	163	MT192765.1	5355	60	150M	=	5423	214	TTACAGAGCAAGGGCTGGTGAAGCTGCTAACTTTTGTGCACTTATCTTAGCCTACTGTAATAAGACAGTAGGTGAGTTAGGTGATGTTAGAGAAACAATGAGTTACTTGTTTCAACATGCCAATTTAGATTCTTGCAAAAGAGTCTTGAA	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEAA/EEE/<AEEAAEAEA</EEEAEAAAAAEE	s1:i:207	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.573706	99	MT192765.1	5697	60	150M	=	5784	236	GTACGAACTTAAGCATGGTACATTTACTTGTGCTAGTGAGTACACTGGTAATTACCAGTGTGGTCACTATAAACATATATCTTCTAAAGAAACTTTGTATTGCATAGACGGTGCTTTACTTACAAAGTCCTCAGAATACAAAGGTCCTAT	AAAAA6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEAAEEEAEEEEEEEEEEE	s1:i:214	s2:i:0	RG:Z:1	NM:i:2	AS:i:282	de:f:0.0133	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:282
+ERR5069949.573706	147	MT192765.1	5784	60	149M	=	5697	-236	AGAAACTTTGTATTGCATAGACGGTGCTTTACTTACAAAGTCCTCAGAATACAAAGGTCCTATTACGGATGTTTTCTACAAAGAAAACAGTTACACAACAACCATAAAACCAGTTACTTATAAATTGGATGGTGTTGTTTGTACAGAAA	AA<E<EEEEEEEEA<AEEEAEEAA<<EEE<AEEEEEEAEAAAAEAEAEEEEEEAEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:214	s2:i:0	RG:Z:1	NM:i:0	AS:i:298	de:f:0	rl:i:0	cm:i:13	nn:i:0	tp:A:P	ms:i:298
+ERR5069949.576388	83	MT192765.1	5798	50	77M	=	5798	-77	GCATAGACGGTGCTTTACTTACAAAGTCCTCAGAATACAAAGGTCCTATTACGGATGTTTTCTACAAAGAAAACAGT	EA/AEEE/<EEEEEEEEEEEAA<EEEEEEEEEEEEEEEEEEEEEAEEEEEAEEEAEE6/EEEAEEEEEEEEEA6AAA	s1:i:62	s2:i:0	RG:Z:1	NM:i:0	AS:i:154	de:f:0	rl:i:0	cm:i:1	nn:i:0	tp:A:P	ms:i:154
+ERR5069949.576388	163	MT192765.1	5798	60	77M	=	5798	77	GCATAGACGGTGCTTTACTTACAAAGTCCTCAGAATACAAAGGTCCTATTACGGATGTTTTCTACAAAGAAAACAGT	AAAAA6EEAEEEEEAEEAEEAEEEEEEA6EEEEAEEAEEEEE6EEEEEEAEEEEA///A<<EEEEEEEEEAEEEEEE	s1:i:62	s2:i:0	RG:Z:1	NM:i:0	AS:i:154	de:f:0	rl:i:0	cm:i:10	nn:i:0	tp:A:P	ms:i:154
+ERR5069949.611123	83	MT192765.1	6481	48	125M	=	6481	-125	ATTATACTTAAACCAGCAAATAATAGTTTAAAAATTACAGAAGAGGTTGGCCACACAGATCTAATGGCTGCTTATGTAGACAATTCTAGTCTTACTATTAAGAAACCTAATGAATTATCTAGAGT	EEEAEEEEEEEEEEEA<EEEAEEEEA/EEEEEEEEEAEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:50	s2:i:117	RG:Z:1	NM:i:0	AS:i:250	de:f:0	rl:i:0	cm:i:8	nn:i:0	tp:A:P	ms:i:250
+ERR5069949.611123	163	MT192765.1	6481	60	125M	=	6481	125	ATTATACTTAAACCAGCAAATAATAGTTTAAAAATTACAGAAGAGGTTGGCCACACAGATCTAATGGCTGCTTATGTAGACAATTCTAGTCTTACTATTAAGAAACCTAATGAATTATCTAGAGT	AAAAAEEEEEA6EEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEE<EEEEEEEAEEEEEEEEEEEEAEEEEEEEEEEE/EEEEEEEA/AAEAAEAAEAE	s1:i:117	s2:i:0	RG:Z:1	NM:i:0	AS:i:250	de:f:0	rl:i:0	cm:i:14	nn:i:0	tp:A:P	ms:i:250
+ERR5069949.651338	83	MT192765.1	7745	60	4S138M	=	7629	-254	ACTCTGAAGAATGGTTCCATCCATCTTTACTTTGTTAAAGCTGGTCAAAAGACTTATGAAAGACATTCTCTCTCTCATTTTGTTAACTTAGACAACCTGAGAGCTAATAACACTAAAGGTTCATTGCCTATTAATGTTATAG	A///A/6/<EEEA//EE/EE<AEEE/<A/EAE<</A/A<EEE/E<EEEEE<</EEEA<E/EEAAEEEEAE/EEEEEEEEEEEEEE/E/A/EE//<AE/EEEAEEA</EE/AEEEE/AEEEEAEEEEEEEEEAEAEEEAAAAA	s1:i:223	s2:i:0	RG:Z:1	NM:i:1	AS:i:266	de:f:0.0072	rl:i:0	cm:i:13	nn:i:0	tp:A:P	ms:i:266
+ERR5069949.651338	163	MT192765.1	7629	60	149M	=	7745	254	ATTCTGTGCTGGTAGTACATTTATTAGTGATGAAGATGCGAGAGACTTGTCACTACAGTTTAAAAGACCAATAAATCCTACTGACCAGTCTTCTTACATCGTTGATAGTGTTACAGTGAAGAATGGTTCCATCCATCTTTACTTTGATA	AAAAAE/EAEEE/AEAEEE/EEEAAEEEEAEEEEE/EEEEAEEEEEEAEE/EEEE/EEE</EE/AEAE/<E/EEAEE<EEEE//AEEEEEE<EEAEE/EEE//E/<EE<A<A/EAA<AA/AEEA//A<A/A<A<6A6/AEE/AEEA<AE	s1:i:223	s2:i:0	RG:Z:1	NM:i:1	AS:i:288	de:f:0.0067	rl:i:0	cm:i:16	nn:i:0	tp:A:P	ms:i:288
+ERR5069949.686090	83	MT192765.1	8097	60	150M	=	7975	-272	TGCAACTGCAGAAGCTGAACTTGCAAAGAATGTGTCCTTAGACAATGTCTTATCTACTTTTATTTCAGCAGCTCGGCAAGGGTTTGTTGATTCAGATGTAGAAACTAAAGATGTTGTTGAATGTCTTAAATTGTCACATAAATCTGACAT	EEEAEAEEEEEEEEEEEEEAEEEEEAEEE<EAEE/EEEEEEEEAEEE6EEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEE/EEEEAEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:252	s2:i:0	RG:Z:1	NM:i:1	AS:i:290	de:f:0.0067	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:290
+ERR5069949.686090	163	MT192765.1	7975	60	151M	=	8097	272	GATCAGGCATTAGTGTCTGATGTTGGTGATAGTGCGGAAGTTGCAGTTAAAATGTTTGATGCTTACGTTAATACGTTTTCATCAACTTTTAACGTACCAATGGAAAAACTCAAAACACTAGTTGCAACTGCAGAAGCTGAACTTGCAAAGA	AAAAAEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEAEEEEEAEEEEEEEE/EEEEEEEEEEEEEEAEEEEEAEEEE<AEE/EEEEEEEAAAAEEEEEEEEEEEAEAEEEEEAEEEEA	s1:i:252	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:27	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.786562	83	MT192765.1	9096	60	151M	=	8904	-343	AAGTTTACGCCCTGACACACGTTATGTGCTCATGGATGGCTCTATTATTCAATTTCCTAACACCTACCTTGAAGGTTCTGTTAGAGTGGTAACAACTTTTGATTCTGAGTACTGTAGGCACGGCACTTGTGAAAGATCAGAAGCTGGTGTT	AEAE<AE/AAAEEAAEE<EEAEEEEAEEEEAAA/AEEEAEAEAEEEEEEEEAAEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAA6A	s1:i:272	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:22	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.786562	163	MT192765.1	8904	60	150M	=	9096	343	GCATTTCTTACCTAGAGTTTTTAGTGCAGTTGGTAACATCTGTTACACACCATCAAAACTTATAGAGTACACTGACTTTGCAACATCAGCTTGTGTTTTGGCTGCTGAATGTACAATTTTTAAAGATGCTTCTGGTAAGCCAGTACCATA	AAAAAEEEEEEEEAEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEAEEEEEEEEEEEEEEEAEEEEEEE<EEAEEEEE<EEEEEEEEEAEEEAEEEAEEAA6A<EEEEAAEEEEAA/AEEEEEE/EEEEEEE	s1:i:272	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.919671	83	MT192765.1	10501	60	151M	=	10467	-185	ATAGATTATGACTGTGTCTCTTTTTGTTACATGCACCATATGGAATTACCAACTGGAGTTCATGCTGGCACAGACTTAGAAGGTAACTTTTATGGACCTTTTGTTGACAGGCAAACAGCACAAGCAGCTGGTACGGACACAACTATTACAG	EEEEEEEEAAEAAAEEAA6AEEEEEEEEAEEAAAAE/AEEEAEEEAEEEAEEEEEEEEEEEEEEEEEEEEAAEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAAAAA	s1:i:184	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:9	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.919671	163	MT192765.1	10467	60	150M	=	10501	185	CCTTAATGGTTCATGTGGTAGTGTTGGTTTTAACATAGATTATGACTGTGTCTCTTTTTGTTACATGCACCATATGGAATTACCAACTGGAGTTCATGCTGGCACAGACTTAGAAGGTAACTTTTATGGACCTTTTGTTGACAGGCAAAC	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAE/EEEEEEAEEEEEEEEAEEEEEEEEEEEAEEEEEEEAEEEEAEEEEAEEEEE6EEEEEEEAAEAEEEEEEE<EEEEEEE6AAEEAEEEAA6AEEAAAAAEEAAEEEAEAEEE	s1:i:184	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:24	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.973930	83	MT192765.1	10957	50	79M	=	10924	-112	ACTTTCCAAAGTGCAGTCAAAAGAACAATCACGGGTACACACCACTGGTTGTTACTCACAATTTTGACTTCACTTTTAG	<////E/EE/E//E/<//E/E//A/6EA/EE/EE///E/EAEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:101	s2:i:0	RG:Z:1	NM:i:2	AS:i:138	de:f:0.0253	rl:i:0	cm:i:1	nn:i:0	tp:A:P	ms:i:138
+ERR5069949.973930	163	MT192765.1	10924	60	112M	=	10957	112	CCTTTTGATGTTGTTAGACAATGCTCAGGTGTTACTTTCCAAAGTGCAGTGAAAAGAACAATCAAGGGTACACACCACTGGTTGTTACTCACAATTTTGACTTCACTTTTAG	AAAAAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEAE<EAEEEEEEAEEAEE	s1:i:101	s2:i:0	RG:Z:1	NM:i:0	AS:i:224	de:f:0	rl:i:0	cm:i:13	nn:i:0	tp:A:P	ms:i:224
+ERR5069949.986441	99	MT192765.1	11007	60	119M	=	11104	247	GTTACTCACAATTTTGACTTCACTTTTAGTCTTAGTCCAGAGTACTCAATGGTCTTTGTTCTTTTTTTTGTATGAAAATGCCTTTTTACCTTTTGCTATGGGTATTATTGCTATGTCTG	AAAAAEAEEEEEEE/EEE/EEEEEAEEEEEEEEEEEEEEEEEEEEE</EAAEA/EEEEEEEEAEAAEEEEEEEEEEEEE/E//<EAE/6///EE//E/EEE///E<EEEEA</A<<//<	s1:i:200	s2:i:0	RG:Z:1	NM:i:1	AS:i:228	de:f:0.0084	rl:i:0	cm:i:9	nn:i:0	tp:A:P	ms:i:228
+ERR5069949.986441	147	MT192765.1	11104	60	150M	=	11007	-247	ATGGGTATTATTGCTATGTCTGCTTTTGCAATGATGTTTGTCAAACATAAGCATGCATTTCTCTGTTTGTTTTTGTTACCTTCTCTTGCCACTGTAGCTTATTTTAATATGGTCTATATGCCTGCTAGTTGGGTGATGCGTATTATGACA	A6A<AEEEEE<E<EAEAAEA<AAEEAEA</EAEEA<E/E/E/EEEEAEAA/<EAAAEAEEE/EEEEEEEAEEE/EAEAE/AEAAA/EAEEEEEEAEAEEEEEEEAEAEEEEE/EAEEEEEEAEEEEEAEEEEEEEEAEEEEEEEEAAAAA	s1:i:200	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:22	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1014693	99	MT192765.1	11215	60	150M	=	11215	150	GTCTATATGCCTGCTAGTTGGGTGATGCGTATTATGACATGGTTGGATATGGTTGATACTAGTTTGTCTGGTTTTAAGCTAAAAGACTGTGTTATGTATGCATCAGCTGTAGTGTTACTAATCCTTATGACAGCAAGAACTGTGTATGAT	AAAAAAEEAEEE6EAE//E/EEE6AEAA/EAEAEE6/E//EAE/EEEEAEE/EEE/EAEEEEEAE/EEEEEAEEEEEAAEEAEEE/AE/EAEAEEEEEEEEEEEEEE/AE/E/E/<<<AA<E<AEE</EEEEA6<AEEAAAA//A//EEE	s1:i:136	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1014693	147	MT192765.1	11215	48	150M	=	11215	-150	GGCTATATGCCTGCTAGTTGGGTGATGCGTATTATGACATGGTTGGATATGGTTGATACTAGTTTGTCTGGTTTTAAGCTAAAAGACTGTGTTATGTATGCATCAGCTGTAGTGTTACTAATCCTTATGACAGCAAGAACTGTGTATGAT	A/<EEEAA<<AA<AEAE<6<A<AA<EA<///EAEEE<AAEAA/EA6/EEEEE/E/EE/AEAEAEEE<AEEEEEEE6<AAEEEEE<EEEAEEEEEEAAEAEAEEEAAEEEEEEEEEE/EEEEEEEEE/EEEEEEEEAEE/EAEEEEAAAAA	s1:i:33	s2:i:136	RG:Z:1	NM:i:1	AS:i:296	de:f:0.0067	rl:i:0	cm:i:3	nn:i:0	tp:A:P	ms:i:296
+ERR5069949.1020777	83	MT192765.1	11217	50	122M	=	11217	-122	CTATATGCCTGCTAGTTGGGTGATGCGTATTATGACATGGTTGGATATGGTTGATACTAGTTTGTCTGGTTTTAAGCTAAAAGACTGTGTTATGTATGCATCAGCTGTAGTGTTACTAATCC	EEEEA6AAAA6E/AA6AAAE/EEA<EE<AEEEAE<EAEAEAEAE<EEEEE/AEEAAEEEEAEEEEEEEE/EEEEE/EEEEEEEEEEEEE6EEEEEE/EEEEEE<EEEAEE6E6EEEEAAAAA	s1:i:110	s2:i:41	RG:Z:1	NM:i:0	AS:i:244	de:f:0	rl:i:0	cm:i:1	nn:i:0	tp:A:P	ms:i:244
+ERR5069949.1020777	163	MT192765.1	11217	60	122M	=	11217	122	CTATATGCCTGCTAGTTGGGTGATGCGTATTATGACATGGTTGGATATGGTTGATACTAGTTTGTCTGGTTTTAAGCTAAAAGACTGTGTTATGTATGCATCAGCTGTAGTGTTACTAATCC	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEAEEEEEEAEEEAAEEEEEEEEEAEEEEA	s1:i:110	s2:i:0	RG:Z:1	NM:i:0	AS:i:244	de:f:0	rl:i:0	cm:i:15	nn:i:0	tp:A:P	ms:i:244
+ERR5069949.1088785	99	MT192765.1	11864	60	149M	=	11912	198	CAGTAGTCTTACTCTCAGTTTTGCAACAACTCAGAGTAGAATCATCATCTAAATTGTGGGCTCAATGTGTCCAGTTACACAATGACATTCTCTTAGCTAAAGATACTACTGAATCCTTTGAAAAAAAGGTTTCACTACTTTCGGTTTTG	AAAAAE/EAEE<EEA///<AEEE/EE<AEEE<EA/EEEEEEE/EAAAEEEEEE<E/E6AE<<E/EEA//</E/EEE/EEE/EE/E/<<EEAAAE<EEEEEE/EAEA//<//AA/E</A<<E/EEEE/AEE<E/<EAE</A6///AEEAA	s1:i:182	s2:i:0	RG:Z:1	NM:i:3	AS:i:268	de:f:0.0201	rl:i:0	cm:i:15	nn:i:0	tp:A:P	ms:i:268
+ERR5069949.1088785	147	MT192765.1	11912	60	150M	=	11864	-198	CTAAATTGTGGGCTCAATGTGTCCAGTTACACAATGACATTCTCTTAGCTAAAGATACTACTGAAGCCTTTGAAAAAATGGTTTCACTACTTTCTGTTTTGCTTTCCATGCAGGGTGCTGTAGACATAAACAAGCTTTGTGAAGAAATGC	AEEEEE<E//E<EAEE/AAAA<AEEEAEEEEE<AEEAEEEEEEAEAE</AE/EEE/<EEEEAEEEEEEEEEAEEEEEEE/EEEEEEEEEEEEEEEEEEEEAA/EEEAEE/EEAEEEEEEEEEEEEEEEE/EEEEEEEEEEAEEEEAAAAA	s1:i:182	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:14	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1132353	83	MT192765.1	12075	60	150M	=	12066	-159	AACCTTACAAGCTATAGCCTCAGAGTTTAGTTCCCTTCCATCATATGCAGCTTTTGCTACTGCTCAAGAAGCTTATGAGCAGGCTGTTGCTAATGGTGATTCTGAAGTTGTTCTTAAAAAGTTGAAGAAGTCTTTGAATGTGGCTAAATC	EEAEEEEEEEEEEEEEE<A<EEEEEEEEEEEAAEE<EAEEAAEAEEEEEEEEEEEAEEEEEEEAEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE6EEEEEEEEEEEEEEAAAAA	s1:i:148	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:5	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1132353	163	MT192765.1	12066	60	151M	=	12075	159	CAACAGGGCAACCTTACAAGCTATAGCCTCAGAGTTTAGTTCCCTTCCATCATATGCAGCTTTTGCTACTGCTCAAGAAGCTTATGAGCAGGCTGTTGCTAATGGTGATTCTGAAGTTGTTCTTAAAAAGTTGAAGAAGTCTTTGAATGTG	AAAAAEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEEEEEEEEEEEEE<EEEEEEEEAAEEAAEEEEEEEEEEEEEAE<AAAAAAE/AEEAEEEEEEEEEEEEEAAAEAAEEA	s1:i:148	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:21	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1151736	83	MT192765.1	12222	60	151M	=	12126	-247	ATCTGAATTTGACCGTGATGCAGCCATGCAACGTAAGTTGGAAAAGATGGCTGATCAAGCTATGACCCAAATGTATAAACAGGCTAGATCTGAGGACAAGAGGGCAAAAGTTACTAGTGCTATGCAGACAATGCTTTTCACTATGCTTAGA	AAAAAAEA//EE/EAAAEAEEEEAAEEAA</AEEEEEEAAEAAEEEEEA<EEEEEEAEEEEEAEEEEEEEEEEEEEEEEEEEEAEEEEEEEEE<EE/EEEEEEAEE/EEEEEEEEEE/EEEEEEEEEEEAEEEE/EEAEEEEEAEEAAAAA	s1:i:226	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:17	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1151736	163	MT192765.1	12126	60	151M	=	12222	247	TTTTGCTACTGCTCAAGAAGCTTATGAGCAGGCTGTTGCTAATGGTGATTCTGAAGTTGTTCTTAAAAAGTTGAAGAAGTCTTTGAATGTGGCTAAATCTGAATTTGACCGTGATGCAGCCATGCAACGTAAGTTGGAAAAGATGGCTGAT	AAAAAEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEAAEEEEEEEEEEEEEEEEEEE<EEEEAEEEEEEEEEEEEEEEEAEEEEEAEEAAEEE<AEAEEE<A/AAEEEEEEEAAAAA<AAAE<EEEEAEEEAEEEEEEAEEAEA/A	s1:i:226	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:23	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1189252	99	MT192765.1	12486	60	98M	=	12486	98	CTATAACACATATAAAAATACGTGTGATGGTACAACATTTACTTATGCATCAGCATTGTGGGAAATCCAACAGGTTGTAGATGCAGATAGTAAAATTG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEAEEE	s1:i:88	s2:i:0	RG:Z:1	NM:i:0	AS:i:196	de:f:0	rl:i:0	cm:i:11	nn:i:0	tp:A:P	ms:i:196
+ERR5069949.1189252	147	MT192765.1	12486	52	98M	=	12486	-98	CTATAACACATATAAAAATACGTGTGATGGTACAACATTTACTTATGCATCAGCATTGTGGGAAATCCAACAGGTTGTAGATGCAGATAGTAAAATTG	EEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:88	s2:i:27	RG:Z:1	NM:i:0	AS:i:196	de:f:0	rl:i:0	cm:i:2	nn:i:0	tp:A:P	ms:i:196
+ERR5069949.1246538	99	MT192765.1	12601	60	148M	=	12627	177	AGTATGGACAATTCACCTAATTTAGCATGGCCTCTTATTGTAACAGCTTTAAGGGCCAATTCTGCTGTCAAATTACAGAATAATGAGCTTAGTCCTGTTGCACTACGACAGATGTCTTGTGCTGCCGGTACTACACAAACTGCTTGCA	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEEEEEEEEEEEEA/EEEEEAEEEEEE/EEEEEEEEEEEEAAAAEEAEEEEEEEEEEEEEEEEE	s1:i:168	s2:i:0	RG:Z:1	NM:i:0	AS:i:296	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:296
+ERR5069949.1246538	147	MT192765.1	12627	60	151M	=	12601	-177	ATGGCCTCTTATTGTAACAGCTTTAAGGGCCAATTCTGCTGTCAAATTACAGAATAATGAGCTTAGTCCTGTTGCACTACGACAGATGTCTTGTGCTGCCGGTACTACACAAACTGCTTGCACTGATGACAATGCGTTAGCTTACTACAAC	AAAAAAEEEAAEEEEAAEAAAEEA<AAAEEAEEEAAEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAAAAA	s1:i:168	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1328186	83	MT192765.1	12953	60	151M	=	12866	-238	AAGGATTAAACAACCTAAATAGAGGTATGGTACTTGGTAGTTTAGCTGCCACAGTACGTCTACAAGCTGGTAATGCAACAGAAGTGCCTGCCAATTCAACTGTATTATCTTTCTGTGCTTTTGCTGTAGATGCTGCTAAAGCTTACAAAGA	EE/<E6/E<AAE<E<EEAEE<AAEE//EEEEEA<A6</EEAEEEEE<AAAEEEEEEEEAEEEEEEAEE/EEEAEEEEEEEE/EAEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:226	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1328186	163	MT192765.1	12866	60	151M	=	12953	238	GTACTATCTATACAGAACTGGAACCACCTTGTAGGTTTGTTACAGACACACCTAAAGGTCCTAAAGTGAAGTATTTATACTTTATTAAAGGATTAAACAACCTAAATAGAGGTATGGTACTTGGTAGTTTAGCTGCCACAGTACGTCTACA	AAAAAEEEEEEE/EEAEEEEAEEEEEAEEEEEEEEAEEEEEEEEEEEEEAE/EEEEEEAEE/EEEEEEEEEEEEEEEEEAAAEA/EEEEEEEAAEEEEE/EEEEAEEEEEAAEEEE/AAAE<A<EEEE6AEEAAA<<<<AA<AE/EEAEEA	s1:i:226	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1331889	99	MT192765.1	13010	60	132M	=	13010	132	GTCTACAAGCTGGTAATGCAACAGAAGTGCCTGCCAATTCAACTGTATTATCTTTCTGTGCTTTTGCTGTAGATGCTGCTAAAGCTTACAAAGATTATCTAGCTAGTGGGGGACAACCAATCACTAATTGTG	A/AAAEEEEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEEEEAEEEEEEEEEEEEEEE/EEEEE<AEAEEEEE/EAEAEEE/AEEEEEEEEEEEEEEEEEEEEAE/EEEEEEEEEEEEEEEEEEEEEEEA<EE	s1:i:122	s2:i:0	RG:Z:1	NM:i:0	AS:i:264	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:264
+ERR5069949.1331889	147	MT192765.1	13010	48	132M	=	13010	-132	GTCTACAAGCTGGTAATGCAACAGAAGTGCCTGCCAATTCAACTGTATTATCTTTCTGTGCTTTTGCTGTAGATGCTGCTAAAGCTTACAAAGATTATCTAGCTAGTGGGGGACAACCAATCACTAATTGTG	A/EEEEEAEEEEEEEEAEEEEEEEEEA<EEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEAAEEEEEE/EEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEAEEEEEEEEAAAAA	s1:i:26	s2:i:122	RG:Z:1	NM:i:0	AS:i:264	de:f:0	rl:i:0	cm:i:3	nn:i:0	tp:A:P	ms:i:264
+ERR5069949.1340552	83	MT192765.1	13029	60	151M	=	13021	-159	AACAGAAGTGCCTGCCAATTCAACTGTATTATCTTTCTGTGCTTTTGCTGTAGATGCTGCTAAAGCTTACAAAGATTATCTAGCTAGTGGGGGACAACCAATCACTAATTGTGTTAAGATGTTGTGTACACACACTGGTACTGGTCAGGCA	AEAAAAEE/A<EEAAEEE/EEEEEEEEEEAAEEEEEEEEAAEEEEEEEEE<EEEAEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAAAAA	s1:i:145	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1340552	163	MT192765.1	13021	60	148M	=	13029	159	GGTAATGCAACAGAAGTGCCTGCCAATTCAACTGTATTATCTTTCTGTGCTTTTGCTGTAGATGCTGCTAAAGCTTACAAAGATTATCTAGCTAGTGGGGGACAACCAATCACTAATTGTGTTAAGATGTTGTGTACACACACTGGTA	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAEEEEEEEAEAEEEEEEEEEE<EEEEEEEEEEEEEEAAA<AEEEEEEEEEEEEEE	s1:i:145	s2:i:0	RG:Z:1	NM:i:0	AS:i:296	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:296
+ERR5069949.1412839	83	MT192765.1	13187	60	147M	=	13154	-180	TTACACCGGAAGCCAATATGGATCAAGAATCCTTTGGTGGTGCATCGTGTTGTCTGTACTGCCGTTGCCACATAGATCATCCAAATCCTAAAGGATTTTGTGACTTAAAAGGTAAGTATGTACAAATACCTACAACTTGTGCTAATG	EEA<AAEAAAAAAE<A<<EA<EAE</E<EEEEE/EEEEAAAEEE/EEEE/EEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEA<EEEEEEEAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEAEEAAAAA	s1:i:166	s2:i:0	RG:Z:1	NM:i:0	AS:i:294	de:f:0	rl:i:0	cm:i:10	nn:i:0	tp:A:P	ms:i:294
+ERR5069949.1412839	163	MT192765.1	13154	60	150M	=	13187	180	GTACACACACTGGTACTGGTCAGGCAATAACAGTTACACCGGAAGCCAATATGGATCAAGAATCCTTTGGTGGTGCATCGTGTTGTCTGTACTGCCGTTGCCACATAGATCATCCAAATCCTAAAGGATTTTGTGACTTAAAAGGTAAGT	AAAA6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEE6EEEEEEEEEEEEEEEEEEEEEEEEEAEEEAAAAEEEEEEEEEEAAEEAEAE<EEEAEAEEE/<AAAEAEAA/EAEEEEAEEAAE/AEA/EEEAEEAEAA	s1:i:166	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1476386	99	MT192765.1	13329	60	151M	=	13382	201	TAATGACCCTGTGGGTTTTACACTTAAAAACACAGTCTGTACCGTCTGCGGTATGTGGAAAGGTTATGGCTGTAGTTGTGATCAACTCCGCGAACCCATGCTTCAGTCAGCTGATGCACAATCGTTTTTAAACGGGTTTGCGGTGTAAGTG	AAAAA/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEAEAEEEEEEEEEEEEEEEEEEEEEEAEEEEEAEEEAEEEEEEE/AEEE/EEEEEE/AEE/EEAE/EEE<EA/<EEA/EEEEE/EEEEAAEEEAAAAEEAEEE	s1:i:188	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1476386	147	MT192765.1	13382	60	148M	=	13329	-201	TGTGGAAAGGTTATGGCTGTAGTTGTGATCAACTCCGCGAACCCATGCTTCAGTCAGCTGATGCACAATCGTTTTTAAACGGGTTTGCGGTGTAAGTGCAGCCCGTCTTACACCGTGCGGCACAGGCACTAGTACTGATGTCGTATAC	AAEEEA<AEA/AAAEEE/E/AEE/E6AE/EAE/EEE<EEEAEEEEEEEEAAEE<<EEEEEEEEEEEEEEEEEEEEEA/EEEEEAA//EAEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE6EEAA6AA	s1:i:188	s2:i:0	RG:Z:1	NM:i:0	AS:i:296	de:f:0	rl:i:0	cm:i:10	nn:i:0	tp:A:P	ms:i:296
+ERR5069949.1538968	83	MT192765.1	13817	48	150M	=	13799	-168	CTATGCTTTAAGGCATTTTGATGAAGGTAATTGTGACACATTAAAAGAAATACTTGTCACATACAATTGTTGTGATGATGATTATTTCAATAAAAAGGACTGGTATGATTTTGTAGAAAACCCAGATATATTACGCGTATACGCCAACTT	AEE6AA<E/EA/<AE<AEA<6AA6AAEEEAAA6/6</AEEEE<EEEEEEE/EEEE//EEEAEEE/EEEA/EEEAEE/EEEE/EAEEEEEE<AEEEEAEEEEAEAEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEAEEAEEEEEEEAAAAA	s1:i:41	s2:i:154	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:9	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1538968	163	MT192765.1	13799	60	151M	=	13817	168	CACGATGGCAGACCTCGTCTATGCTTTAAGGCATTTTGATGAAGGTAATTGTGACACATTAAAAGAAATACTTGTCACATACAATTGTTGTGATGATGATTATTTCAATAAAAAGGACTGGTATGATTTTGTAGAAAACCCAGATATATTA	AAAAAEEEAEEAEEEAEEEAEEEAE<EEE6EAEA<EAAAEEEEEEEEEEEEEA/</EEEEEEEEEEEEEEEEEEEEEEAEEEEE/AEEEEEEEEAEEEEEEEEEEEEEEEEAEEEAA<AEAEE<AAE<A<AEEEEE/EA6AAA/EE/EEEA	s1:i:154	s2:i:0	RG:Z:1	NM:i:1	AS:i:294	de:f:0.0066	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:294
+ERR5069949.1704586	99	MT192765.1	14601	60	149M	=	14761	310	GATAAACGCACTACGTGCTTTTCAGTAGCTGCACTTACTAACAATGTTGCTTTTCAAACTGTCAAACCCGGTAATTTTAACAAAGACTTCTATGACTTTGCTGTGTCTAAGGGTTTCTTTAAGGAAGGAAGTTCTGTTGAATTAAAACA	AAAA6EEEE/EE6EEEEEEEEEEEEEEEEEE<EEEEEEEE6EEAEEEEEA<EEEEE66EEEEE///EEEAEEEE<EEEEEEA/EE/EEEEEEEAE<E<AA<AAAEEAE/AEE<E<AA<EAAEEAE/AEE/E/EAEAAAEE/EA/A//EE	s1:i:277	s2:i:0	RG:Z:1	NM:i:0	AS:i:298	de:f:0	rl:i:0	cm:i:21	nn:i:0	tp:A:P	ms:i:298
+ERR5069949.1704586	147	MT192765.1	14761	60	150M	=	14601	-310	CTCAGGATGGTAATGCTGCTATCAGCGATTATGACTACTATCGTTATAATCTACCAACAATGTGTGATATCAGACAACTACTATTTGTAGTTGAAGTTGTTGATAAGTACTTTGATTGTTACGATGGTGGCTGTATTAATGCTAACCAAG	A//EEAE<AAA<AEAA6EEE</<AAA6EE//A<A<<AE<E//AEEEEE<EEEEAEAA<AEA<AE/EEEEAEEAEEEAEAEEAEEE/EEEEEEAE<EEEEEEEEEEEEEE/EEEEEEAEEEEEEEEEEE/EEEEEE/EEEEE<EEEA/AAA	s1:i:277	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1709367	83	MT192765.1	14886	50	129M	=	14886	-129	GGTGGCTGTATTAATGCTAACCAAGTCATCGTCAACAACCTAGACAAATCAGCTGGTTTTCCATTTAATAAATGGGGTAAGGCTAGACTTTATTATGATTCAATGAGTTATGAGGATCAAGATACACTT	AA/EEAAAEEEEAEE6A/EAAEAAEAAAAAAAAEEAEEE/AEAE<AEEAEAE/EEEEEEEEA/EEAA<AEE/EEE<AEA<EAAEAAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEAEEEAEEEAAAAA	s1:i:117	s2:i:42	RG:Z:1	NM:i:1	AS:i:248	de:f:0.0078	rl:i:0	cm:i:1	nn:i:0	tp:A:P	ms:i:248
+ERR5069949.1709367	163	MT192765.1	14886	60	129M	=	14886	129	GGTGGCTGTATTAATGCTAACCAAGTCATCGTCAACAACCTAGACAAATCAGCTGGTTTTCCATTTAATAAATGGGGTAAGGCTAGACTTTATTATGATTCAATGAGTTATGAGGATCAAGATACACTT	AAAAAEEEEEEEEEEEEEE6EEEEEEEEEEEEEEEEE6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEAEEEEEEAEEEEEEEEEEEEEEAEEEEEEEEEEEEEEE	s1:i:117	s2:i:0	RG:Z:1	NM:i:1	AS:i:248	de:f:0.0078	rl:i:0	cm:i:15	nn:i:0	tp:A:P	ms:i:248
+ERR5069949.1778133	83	MT192765.1	15491	48	146M1D5M	=	15485	-158	AACTGCTTATGCTAATAGTGTTTTTAACATTTGTCAAGCTGTCACGGCCAATGTTAATGCACTTTTATCTACTGATGGTAACAAAATTGCCGATAAGTATGTCCGCAATTTACAACACAGACTTTATGAGTGTCTCTATAGAAATAAGATG	AEEAEEEEEAAAAAA<AEEEEEEEEEEEEEEEEEEAEEEEEEEAEAEEEEEEEEEEEEEEAEEEEAEAEAEEEEEEEEEEEE<AEEEEAAAEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:55	s2:i:139	RG:Z:1	NM:i:1	AS:i:292	de:f:0.0066	rl:i:0	cm:i:10	nn:i:0	tp:A:P	ms:i:292
+ERR5069949.1778133	163	MT192765.1	15485	60	150M	=	15491	158	TGCCACAACTGCTTATGCTAATAGTGTTTTTAACATTTGTCAAGCTGTCACGGCCAATGTTAATGCACTTTTATCTACTGATGGTAACAAAATTGCCGATAAGTATGTCCGCAATTTACAACACAGACTTTATGAGTGTCTCTATAGAAA	AAAAAEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEAAEEEEEE<AEEEEEEE/AAAE<AAEEAAEEEA<EAAEEEA<AAEEEEEE/EEAAAEE/EAAAAEEEEEAEAEE	s1:i:139	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1980512	83	MT192765.1	16852	60	151M	=	16801	-202	CTGTTGTTTACCGAGGTACAACAACTTACAAATTAAATGTTGGTGATTATTTTGTGCTGACATCACATACAGTAATGCCATTAAGTGCACCTACACTAGTGCCACAAGAGCACTATGTTAGAATTACTGGCTTATACCCAACACTCAATAT	EEEEEEEEEEAEEEEEEAAEEEEEAAEAEAAAEAEEEAEAEEEAEEEAEEEEEAEAAEEEEAEAEEEEEEEAEEEEEEEEEEEAEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:193	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:12	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1980512	163	MT192765.1	16801	60	150M	=	16852	202	GTAAAGTACAAATAGGAGAGTACACCTTTGAAAAAGGTGACTATGGTGATGCTGTTGTTTACCGAGGTACAACAACTTACAAATTAAATGTTGGTGATTATTTTGTGCTGACATCACATACAGTAATGCCATTAAGTGCACCTACACTAG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEAEEEEEEEEEA<EEEEEEAEEEEAAAEEAAE<EEEAAAA<AA<EEEE/AE	s1:i:193	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2033605	83	MT192765.1	17101	48	150M	=	17083	-168	TTGCTATTGGCCTAGCTCTCTACTACCCTTCTGCTCGCATAGTGTATACAGCTTGCTCTCATGCCGCTGTTGATGCACTATGTGAGAAGGCATTAAAATATTTGCCTATAGATAAATGTAGTAGAATTATACCTGCACGTGCTCGTGTAG	AAA<EAA<EEEAAAA/E</EA/E6EAEE/EE/AEA<AAEEAEEA/EE<EEEEEEEEEEEE<AEE/AEEE/EAAEEEAEEAEEEEE<EEE<EEEEEAAEEEEEEEEAEEEEEAEEAEEEEEEEEEAEEEEEEEEEEEEEEEEE/EEAAAAA	s1:i:34	s2:i:160	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2033605	163	MT192765.1	17083	60	149M	=	17101	168	GTACTGGTAAGAGTCATTTTGCTATTGGCCTAGCTCTCTACTACCCTTCTGCTCGCATAGTGTATACAGCTTGCTCTCATGCCGCTGTTGATGCACTATGTGAGAAGGCATTAAAATATTTGCCTATAGATAAATGTAGTAGAATTATA	AAAAAEAEEEEEEEEEEEEAEEEEEEEEEEEEE<EEEEEEEEEEE<EEEEEEEEAAEAEEEEEEEEEEEE/EAAEEEA/EEEEEEAE<EEEEEEEEE<AEEEEAAAEAE<EAEEEEEE//</A/AEAAAEA/<E<AEEEAEE<EEEEEE	s1:i:160	s2:i:0	RG:Z:1	NM:i:0	AS:i:298	de:f:0	rl:i:0	cm:i:24	nn:i:0	tp:A:P	ms:i:298
+ERR5069949.2161340	99	MT192765.1	17482	60	80M	=	17482	82	AACCAGAATATTTCAATTCAGTGTGTAGACTTATGAAAACTATAGGTCCAGACATGTTCCTCGGAACTTGTCGGCGTTGT	A/AA//EEAEA/E/AEEEE6EE/EEEA/6AEEEEEEEEE6EEEAEAEE//A/EEEEEE//E/E/A//E/E/<<EE</E/E	s1:i:69	s2:i:0	RG:Z:1	NM:i:0	AS:i:160	de:f:0	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:160
+ERR5069949.2161340	147	MT192765.1	17482	55	82M	=	17482	-82	AACCAGAATATTTCAATTCAGTGTGTAGACTTATGAAAACTATAGGTCCAGACATGTTCCTCGGAACTTGTCGGCGTTGTCC	A//E/<EAEA/EE/EEEA/<AE<AE/AEA/EEEAE/EEE//EEE6////EEEEAEAE///EE//</E/E</AE/6EAAA6AA	s1:i:69	s2:i:0	RG:Z:1	NM:i:0	AS:i:164	de:f:0	rl:i:0	cm:i:3	nn:i:0	tp:A:P	ms:i:164
+ERR5069949.2243023	83	MT192765.1	17854	60	150M	=	17713	-291	ACTATGTCATATTCACTCAAACCACTGAAACAGCTCACTCTTGTAATGTAAACAGATTTAATGTTGCTATTACCAGAGCAAAAGTAGGCATACTTTGCATAATGTCTGATAGAGACCTTTATGACAAGTTGCAATTTACAAGTCTTGAAA	EE<EAEEAE<EA<E/EEAEAEEEE<EEEEAA<AEEEEEEEEEAAEAEEE<EEEEAEEEEEEEEEEEEEEEEEAEEEEEEAEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:273	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2243023	163	MT192765.1	17713	60	151M	=	17854	291	TGGTAAGAGAATTCCTTACACGTAACCCTGCTTGGAGAAAAGCTGTCTTTATTTCACCTTATAATTCACAGAATGCTGTAGCCTCAAAGATTTTGGGACTACCAACTCAAACTGTTGATTCATCACAGGGCTCAGAATATGACTATGTCAT	AAAAAEEEEEEEEEEEEEEEEEEE6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEAEEEEEEEEEEEEEE/EEEEAAEEAEEA<EEEEEEEEEEEEEEAEEEEE<<AA6AAEEEAEE	s1:i:273	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.2257580	99	MT192765.1	17980	60	151M	=	18039	209	AGTTGCAATTTACAAGTCTTGAAATTCCACGTAGGAATGTGGCAACTTTACAAGCTGAAAATGTAACAGGACTCTTTAAAGATTGTAGTAAGGTAATCACTGGGTTACATCCTACACAGGCACCTACACACCTCAGTGTTGACACTAAATT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEAEEEEEEEEEAEEEEEEAEEEEEEEEEEEEAEEEEEEEEEAEEEEEEAEE/EEAEA<EAEEEAEEEEEEEEE<EEAAAEAEEE<EA	s1:i:196	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.2257580	147	MT192765.1	18039	60	150M	=	17980	-209	AATGTAACAGGACTCTTTAAAGATTGTAGTAAGGTAATCACTGGGTTACATCCTACACAGGCACCTACACACCTCAGTGTTGACACTAAATTCAAAACTGAAGGTTTATGTGTTGACATACCTGGCATACCTAAGGACATGACCTATAGA	EEEEEEAEEAAEEEEAAAAEEEEEEAEEEEAEAEEEAEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:196	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:11	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2521353	99	MT192765.1	19597	60	150M	=	19698	251	CTTTTACAAGACTTCAGAGTTTAGAAATTGTGGCTTATAATGTTGTAATTAAGGGACACTTTGATGGACAACAGGGTGAAGTACCAGTTTCTATCATTAATAACTCTGTTTACACAAAAGTTGATGGTGTTGATGTAGAATTGTTTGAAA	AAA/AE/6E6EEEEAEE/EE/EEE/EE/EA/EAEA//EEEEE6EAEAE/EEEEEE/EAE////EEA/EEEEEEEEEEEEEE///A/EEAEEEEEEEE<AEAEEE/AE/E<E/EEEEEA/E///AE/66AEEAEEE<E//E/EA/A<6AEE	s1:i:175	s2:i:0	RG:Z:1	NM:i:4	AS:i:260	de:f:0.0267	rl:i:0	cm:i:10	nn:i:0	tp:A:P	ms:i:260
+ERR5069949.2521353	147	MT192765.1	19698	60	150M	=	19597	-251	ATCACTGTTTTCACAAAAGTTGATGGTGTTGATGTAGAATTGTTTGAAAATAAAACAACATTACCTGTTAATGTAGCTTTTGTGCTTTGGGCTAAGCGCAACATTAAACCAGTACCAGAGGTGAAAATACTCAATAATTTGGGTGTGGAC	A//A</</EE/A<AEEA//E<EEE/E<A/<<A///<6EAEEEEE/AAA</A//<<EA/EEA//</AA6EEAE</EEA//AEE//</AEEAE/EEEA/A/EEEE//E/EAA/EEE/AEE<EEE<EE/EAEEEEE6EEEE/EEEEEEAAAAA	s1:i:175	s2:i:0	RG:Z:1	NM:i:4	AS:i:266	de:f:0.0267	rl:i:0	cm:i:14	nn:i:0	tp:A:P	ms:i:266
+ERR5069949.2605155	99	MT192765.1	21717	60	146M	=	21726	159	GTTCTTACCTTTCTTTTCCAATGTTACTTGGTTCCATGCTATACATGTCTCTGGGACCAATGGTACTAAGAGGTTTGATAACCCTGTCCTACCATTTAATGATGGTGTTTATTTTGCTTCCACTGAGAAGTCTAACATAATAAGAG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEE/EEEEEEEEEEEEEE<EEAEEEAEAEAEEEEEEEEAAEEEEE<EEAEAEEEAA<E<EAAE</E/AA	s1:i:148	s2:i:0	RG:Z:1	NM:i:0	AS:i:292	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:292
+ERR5069949.2605155	147	MT192765.1	21726	60	150M	=	21717	-159	TTTCTTTTCCAATGTTACTTGGTTCCATGCTATACATGTCTCTGGGACCAATGGTACTAAGAGGTTTGATAACCCTGTCCTACCATTTAATGATGGTGTTTATTTTGCTTCCACTGAGAAGTCTAACATAATAAGAGGCTGGATTTTTGG	A/EEEE/EEAEAEEEEEAEEAEEEAAAEEEAEEEEEEAEE/EEEAEAEAEEEEEEAEEAEEEEEAEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:148	s2:i:30	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2650879	83	MT192765.1	22710	60	150M	=	22659	-201	TAAATTAAATGATCTCTGCTTTACTAATGTCTATGCAGATTCATTTGTAATTAGAGGTGATGAAGTCAGACAAATCGCTCCAGGGCAAACTGGAAAGATTGCTGATTATAATTATAAATTACCAGATGATTTTACAGGCTGCGTTATAGC	EAEEEAEE<EEE/EEEEEEAEEEEEEEEEEEA<AAEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEAEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:192	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:13	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2650879	163	MT192765.1	22659	60	151M	=	22710	201	ATATAATTCCGCATCATTTTCCACTTTTAAGTGTTATGGAGTGTCTCCTACTAAATTAAATGATCTCTGCTTTACTAATGTCTATGCAGATTCATTTGTAATTAGAGGTGATGAAGTCAGACAAATCGCTCCAGGGCAAACTGGAAAGATT	AAAAAEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEAEE<EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEAEEEEEEEEEEEEAEEE<A<EEEEAAAAEEEEEEEEEEEE	s1:i:192	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:16	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.2730382	83	MT192765.1	23528	48	142M	=	23528	-142	ACTCATATGAGTGTGACATACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTGGTGCAGAAAATTCAGT	A<AA<A<EEEAAA/A<AEAEAEA<EAA<<AEA<EEEAAAEE<EEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEE<EEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEAEEEEEEE/EAAAAA	s1:i:48	s2:i:143	RG:Z:1	NM:i:0	AS:i:284	de:f:0	rl:i:0	cm:i:9	nn:i:0	tp:A:P	ms:i:284
+ERR5069949.2730382	163	MT192765.1	23528	60	142M	=	23528	142	ACTCATATGAGTGTGACATACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTGGTGCAGAAAATTCAGT	AAAAAEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAE<EE/EEEEEEE/EEAEEEEEEEEEEEEEEEEEEAEEEEA<AEA<<EA<A<AEEEEEA<EAE<66A/AEEEEEEEAE<AAEA	s1:i:143	s2:i:0	RG:Z:1	NM:i:0	AS:i:284	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:284
+ERR5069949.2734474	81	MT192765.1	23547	1	149M	=	23548	-148	ACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTGGTGCAGAAAATTCAGTTGCTTACTCTAATAACTCTATTGCCA	AA/EEA/EAAAA<AAEEEEAAEEEEEEE<A/EEAEE<AEEEEEEEEAEEEEAEAAEAAEE/EEAAEEE/AEA/EEE/E/EEEEEEEEE/EEEEEEEEAEE/EEEE/EEEEEAEEEEEEEEEEEEEEEEE//EEEEAEEEEEEEAAA/AA	s1:i:58	s2:i:136	RG:Z:1	NM:i:0	AS:i:298	de:f:0	rl:i:0	cm:i:11	nn:i:0	tp:A:P	ms:i:298
+ERR5069949.2734474	161	MT192765.1	23548	60	148M	=	23547	148	CCCATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTGGTGCAGAAAATTCAGTTGCTTACTCTAATAACTCTATTGCCA	AAAA/EEEEEEEEE/E/EE6EEEEAEEEEEEAEEEEE/EEEEEEEEEEEAE/EAEE/EEEEEAE/EE<EAEEEEEEA/E<EEEEAE/EA<EEEEAEE/E/EE<EEEEE</EE/E//<<<AA6A<A<A/<AE/AE/EEEA6<A6A/</A	s1:i:136	s2:i:0	RG:Z:1	NM:i:0	AS:i:296	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:296
+ERR5069949.2734873	83	MT192765.1	23550	48	98M	=	23550	-98	CATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGT	EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEE/EEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:25	s2:i:92	RG:Z:1	NM:i:0	AS:i:196	de:f:0	rl:i:0	cm:i:4	nn:i:0	tp:A:P	ms:i:196
+ERR5069949.2734873	163	MT192765.1	23550	60	98M	=	23550	98	CATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE	s1:i:92	s2:i:0	RG:Z:1	NM:i:0	AS:i:196	de:f:0	rl:i:0	cm:i:9	nn:i:0	tp:A:P	ms:i:196
+ERR5069949.2772897	83	MT192765.1	23876	60	144M1D7M	=	23809	-219	AAGACAAAAACACCCAAGAAGTTTTTGCACAAGTCAAACAAATTTACAAAACACCACCAATTAAAGATTTTGGTGGTTTTAATTTTTCACAAATATTACCAGATCCATCAAAACCAAGCAAGAGGTCATTTATTGAAGATCTACTTTCAAC	AEEEEEE<AAEEAAEEEEEEEEEEEEEEEEEEEEAEEEEEEEEAAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEAEEEEEEEEEEEEEEEEEEEE6EEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:199	s2:i:0	RG:Z:1	NM:i:1	AS:i:294	de:f:0.0066	rl:i:0	cm:i:13	nn:i:0	tp:A:P	ms:i:288
+ERR5069949.2772897	163	MT192765.1	23809	60	150M	=	23876	219	CTTTCGTTGCAATATGGCAGTTTTTGTACACAATTAAACCGTGCTTTAACTGGAATAGCTGTTGAACAAGACAAAAACACCCAAGAAGTTTTTGCACAAGTCAAACAAATTTACAAAACACCACCAATTAAAGATTTTGGTGGTTTTAAT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEA<EAEEEE<EEEEAEEAAEEEEEEEEEEE	s1:i:199	s2:i:0	RG:Z:1	NM:i:1	AS:i:290	de:f:0.0067	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:290
+ERR5069949.2787556	99	MT192765.1	24088	60	106M	=	24088	106	GCTGCTAGAGACCTCGTTTGTGCACAAAAGTTTAACGGCCTTACTGTTTTGCCACCTTTGCTCACAGATGAAATGATTGCTCAATACACTTCTGCACTGTTAGCGG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAE	s1:i:78	s2:i:0	RG:Z:1	NM:i:1	AS:i:202	de:f:0.0094	rl:i:0	cm:i:10	nn:i:0	tp:A:P	ms:i:202
+ERR5069949.2787556	147	MT192765.1	24088	50	106M	=	24088	-106	GCTGCTAGAGACCTCGTTTGTGCACAAAAGTTTAACGGCCTTACTGTTTTGCCACCTTTGCTCACAGATGAAATGATTGCTCAATACACTTCTGCACTGTTAGCGG	EEAAEEEEEEEEA<EEEE<AAA<EEEEEAEEEEEEAEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:78	s2:i:0	RG:Z:1	NM:i:1	AS:i:202	de:f:0.0094	rl:i:0	cm:i:1	nn:i:0	tp:A:P	ms:i:202
+ERR5069949.2832676	99	MT192765.1	24409	60	139M	=	24409	139	GTCAACCAAAATGCACAAGCTTTAAACACGCTTGTTAAACAACTTAGCTCCAATTTTGGTGCAATTTCAAGTGTTTTAAATGATATCCTTTCACGTCTTGACAAAGTTGAGGCTGAAGTGCAAATTGATAGGTTGATCA	AAAA6EEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<E/EAEEAEEEAEEAEEEEEAEEEEEEEEEEEEEEAEEAEEEEEAAEEEEEEA<AEEEAAAAEEEEE<EEAAAEEAEEAAEEEEA	s1:i:132	s2:i:0	RG:Z:1	NM:i:0	AS:i:278	de:f:0	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:278
+ERR5069949.2832676	147	MT192765.1	24409	48	139M	=	24409	-139	GTCAACCAAAATGCACAAGCTTTAAACACGCTTGTTAAACAACTTAGCTCCAATTTTGGTGCAATTTCAAGTGTTTTAAATGATATCCTTTCACGTCTTGACAAAGTTGAGGCTGAAGTGCAAATTGATAGGTTGATCA	A<EEEE</EAEA6EEA</AEEEEAEEEAAE/EEAEE<A<AAAEEEEAAEEE/EEEEEEEEAEEAEEAA<EEEEEEEA<EEEAEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEAA6AA	s1:i:37	s2:i:132	RG:Z:1	NM:i:0	AS:i:278	de:f:0	rl:i:0	cm:i:5	nn:i:0	tp:A:P	ms:i:278
+ERR5069949.2888794	83	MT192765.1	24853	60	151M	=	24758	-246	ACACACTGGTTTGTAACACAAAGGAATTTTTATGAACCACAAATCATTACTACAGACAACACATTTGTGTCTGGTAACTGTGATGTTGTAATAGGAATTGTCAACAACACAGTTTATGATCCTTTGCAACCTGAATTAGACTCATTCAAGG	AAEAAAEEEEEEEEEEEEEEEAAAEEEEAAEEAAAAEEEEAEEEEEEEEE/EEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:231	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:16	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.2888794	163	MT192765.1	24758	60	150M	=	24853	246	TCCCTGCACAAGAAAAGAACTTCACAACTGCTCCTGCCATTTGTCATGATGGAAAAGCACACTTTCCTCGTGAAGGTGTCTTTGTTTCAAATGGCACACACTGGTTTGTAACACAAAGGAATTTTTATGAACCACAAATCATTACTACAG	AAAAAEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEAEE<<<6AEE</AAAEEEEEEEAA<EEAAEA	s1:i:231	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:25	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.3017828	99	MT192765.1	26176	60	107M	=	26177	107	ATGATGAACCGACGACGACTACTAGCGTGCCTTTGTAAGCACAAGCTGATGAGTACGAACTTATGTACTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAAT	AAAAAE6EEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEAEEEEE	s1:i:96	s2:i:0	RG:Z:1	NM:i:0	AS:i:214	de:f:0	rl:i:0	cm:i:11	nn:i:0	tp:A:P	ms:i:214
+ERR5069949.3017828	147	MT192765.1	26177	48	106M	=	26176	-107	TGATGAACCGACGACGACTACTAGCGTGCCTTTGTAAGCACAAGCTGATGAGTACGAACTTATGTACTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAAT	A/EAAEEEAEEAE<E</EEEEEEEEEAE<EEEEEEAE<EE/E<EEEEEEEEEEEE<EEAEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEAAAAA	s1:i:37	s2:i:96	RG:Z:1	NM:i:0	AS:i:212	de:f:0	rl:i:0	cm:i:3	nn:i:0	tp:A:P	ms:i:212
+ERR5069949.3022231	99	MT192765.1	26228	60	147M	=	26228	147	GTACGAACTTATGTACTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAATAGCGTACTTCTTTTTCTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATCCTTACTGCGCTTCGATTGTGTGCGTACTGCTGCAATAT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEEEEEEAAAAEEEEEEEEAEEE	s1:i:139	s2:i:0	RG:Z:1	NM:i:0	AS:i:294	de:f:0	rl:i:0	cm:i:21	nn:i:0	tp:A:P	ms:i:294
+ERR5069949.3022231	147	MT192765.1	26228	48	147M	=	26228	-147	GTACGAACTTATGTACTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAATAGCGTACTTCTTTTTCTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATCCTTACTGCGCTTCGATTGTGTGCGTACTGCTGCAATAT	EAAAEEEEEEAEEEEE<EEEEAE<EEAAEAAEEEEEEEEEEEEEEEEAEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEEEEEE6EEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:34	s2:i:139	RG:Z:1	NM:i:0	AS:i:294	de:f:0	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:294
+ERR5069949.3057020	99	MT192765.1	26621	60	86M9S	=	26621	86	CAATTTGCCTATGCCAACAGGAATAGGTTTTTGTATATAATTAAGTTAATTTTCCTCTGGCTGTTATGGCCAGTAACTTTAGCTTGGTTGTACGC	AAAAAEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAE	s1:i:71	s2:i:0	RG:Z:1	NM:i:0	AS:i:172	de:f:0	rl:i:0	cm:i:8	nn:i:0	tp:A:P	ms:i:172
+ERR5069949.3057020	147	MT192765.1	26621	51	86M9S	=	26621	-86	CAATTTGCCTATGCCAACAGGAATAGGTTTTTGTATATAATTAAGTTAATTTTCCTCTGGCTGTTATGGCCAGTAACTTTAGCTTGGTTGTACGC	EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:71	s2:i:33	RG:Z:1	NM:i:0	AS:i:172	de:f:0	rl:i:0	cm:i:2	nn:i:0	tp:A:P	ms:i:172
+ERR5069949.3122970	83	MT192765.1	26996	48	127M	=	26996	-127	ATCAAGGACCTGCCTAAAGAAATCACTGTTGCTACATCACGAACGCTTTCTTATTACAAATTGGGAGCTTCGCAGCGTGTAGCAGGTGACTCAGGTTTTGCTGCATACAGTCGCTACAGGATTGGCA	A//6AAAEAEEA/AAEEEEEEAAE/EEE//A<EEEEEEEEEAEEE/EEAAEEAEEEE/<EEAEEEEEAEEAEEAEEEEEEEEA<EAEEAEAEAEEA6EEEEEEEEEEEEEAEEEAEEEEEEEA/AAA	s1:i:52	s2:i:119	RG:Z:1	NM:i:0	AS:i:254	de:f:0	rl:i:0	cm:i:9	nn:i:0	tp:A:P	ms:i:254
+ERR5069949.3122970	163	MT192765.1	26996	60	126M	=	26996	127	ATCAAGGACCTGCCTAAAGAAATCACTGTTGCTACATCACGAACGCTTTCTTATTACAAATTGGGAGCTTCGCAGCGTGTAGCAGGTGACTCAGGTTTTGCTGCATACAGTCGCTACAGGATTGGC	AAAAAEE6EEEEEEEAEEEEEEEEEEEAEEEEEEEEEEEEEE/EEEEEAE<EEAEAEEEEEEEEEAAEEEEAEAEEE/AEEE<A<A/AAAAAE/E<A66AEEEEEEEEEEEAE<</6AA<A/6/EA	s1:i:119	s2:i:0	RG:Z:1	NM:i:0	AS:i:252	de:f:0	rl:i:0	cm:i:17	nn:i:0	tp:A:P	ms:i:252
+ERR5069949.3184655	83	MT192765.1	27352	60	150M	=	27311	-191	ATGAAGAGCAACCAATGGAGATTGATTAAACGAACATGAAAATTATTCTTTTCTTGGCACTGATAACACTCGCTACTTGTGAGCTTTATCACTACCAAGAGTGTGTTAGAGGTACAACAGTACTTTTAAAAGAACCTTGCTCTTCTGGAA	AAAE6E</EA6<A6/A/E6A</EEE<EEA///E/A<<</AEEEE<E<EEEEEEEEEE/E<E/EE/A<AEEAEAE/EEEEEEEAEEEEEEEEEEEEE/AEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEAAAAA	s1:i:185	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:8	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.3184655	163	MT192765.1	27311	60	150M	=	27352	191	TTTATCTAAGTCACTAACTGAGAATAAATATTCTCAATTAGATGAAGAGCAACCAATGGAGATTGATTAAACGAACATGAAAATTATTCTTTTCTTGGCACTGATAACACTCGCTACTTGTGAGCTTTATCACTACCAAGAGTGTGTTAG	AAAAAEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEE<EEEEE/EEEEEAEAEEEEE/EEEAEEE<EEEEEE<EEAAEEAEEEEEAAAEEE/E<AAEEAAAE6A/A<<A<AAAEE/AA6AE/A	s1:i:185	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:21	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.3249622	83	MT192765.1	28372	37	77M	=	28218	-231	CGATAAAAACAAGGTCGGCCCCAAGGTTTACCCATTAATACTGCGTCTTGGTTCACCGCTCTCACTCAACATGGCAA	E/E///<E<<////AE/EEA/EEEEEE/EEEEE//A//E/EEEEEEEE/EEE/EE/EAEEAEEEAEE/AE/EAAAAA	s1:i:97	s2:i:0	RG:Z:1	NM:i:3	AS:i:124	de:f:0.039	rl:i:0	cm:i:3	nn:i:0	tp:A:P	ms:i:124
+ERR5069949.3249622	163	MT192765.1	28218	38	116M	=	28372	231	ATCATGACGTTCGTGTTGTTTTAGATTTCATCGAAACGAACAAACAAAAATGTCTGATAATGGACCCCAAAATCATCGAAATGCACCCCGCATTACGGTTGGTGGACCCTCCGATT	AAA/AE//EEE/EE6AE/A</EE//6AE6EE//EE/AE//A/EE//EEEE<EAA/EE//<E/A/E/EE//E/A/E/E//EE/<A/A<EE/A//</EE//E/E//A/EEE///A//6	s1:i:97	s2:i:0	RG:Z:1	NM:i:5	AS:i:182	de:f:0.0431	rl:i:0	cm:i:3	nn:i:0	tp:A:P	ms:i:182
+ERR5069949.3338256	83	MT192765.1	29452	60	151M	=	29431	-172	CCTGCTGCAGATTTGGATGATTTCTCCAAACAATTGCAACAATCCATGAGCAGTGCTGACTCAACTCAGGCCTAAACTCATGCAGACCACACAAGGCAGATGGGCTATATAAACGTTTTCGCTTTTCCGTTTACGATATATAGTCTACTCT	AEEEEEEEEEEEA<AEEAEEEEEEAA<EEEEEEAEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEAAE<EEEEEAEEEEEEEEEEAEEA/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:163	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:5	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.3338256	163	MT192765.1	29431	60	150M	=	29452	172	CAGCAAACTGTGACTCTTCTTCCTGCTGCAGATTTGGATGATTTCTCCAAACAATTGCAACAATCCATGAGCAGTGCTGACTCAACTCAGGCCTAAACTCATGCAGACCACACAAGGCAGATGGGCTATATAAACGTTTTCGCTTTTCCG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEAEEEEEEEEAAAAAEA<AAAEA<AA	s1:i:163	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:25	nn:i:0	tp:A:P	ms:i:300
diff --git a/src/umi_tools/umi_tools_prepareforrsem/test_data/test_dedup.bam b/src/umi_tools/umi_tools_prepareforrsem/test_data/test_dedup.bam
new file mode 100644
index 0000000000000000000000000000000000000000..0694dec737c76228e68a5e50bb5bbffca8429716
GIT binary patch
literal 18822
zcmV)DK*7HsiwFb&00000{{{d;LjnM70fmy^PQox4#fx{vm)Hxe<Eso@uQnh-l5I{W
z+%DZ8a0y#+t!TXW?R+T1kS&s-cc=Zn{!Y$GCyv+qx<bglo8gtufQ)c>&k`Z3G%vZz
z`gZt$L!Wt1VA1i&B!FF?QR|jkefx2a7c&Ofov>{l0}foaaNVW1@Esg_Ol4BxV98=$
zB_hqBZwK{*A}{lT?<fxZR<C?U35@m18UmiW%|O%nyCy5`0Fy?_6*tygDsD(#$wcg_
zF`qio0KgoY=FlV%IidG++C#L-N>StqS(p4A@nWQ9i-~~Lmt^zt&mwIWPk|&aln`5a
z2-ND}S2!S0=Q_=`eV)B(dKhaBm~TrWd2vo9aL#RX7gvf(X}V~!JDyirdZdd=Whsey
z(K(Jg%;_a_0!Mo~=!o@PcNo<b9#xjU?m7)_E*j834C1ic;m}_XA#{xpYX2<yR0w?o
zuo~OM>jD4(ABzYC000000RIL6LPG)o{71EY3$!d*c~+l!F*6L6TQ$|ghC)(RR>sZ9
z8qWIx1Kz4z14V>5nZY<gAUq@*41^Uz&@7a@%QD#sBoaa5ii(315lxKo5t1nAM2Qbp
zmLY^S#EF<$goKz;7b-+T$l&+yN7b%f)!pZud+C{bPM_V?=id7Ie}4bJ|J`gY)Z;JU
zPk7f0qpih*i}x(_<7T|}7{1?q@4W}xhbJfdCwK0i93CDoUU}1EAp{dF<e`kZw&|j{
z%!*h<ZNnLFyR7DwjB44+ylJ|uDp=OWS=NeL$|kRyrp(K*7BQ>iIG1Uf21QWVN7_FN
z5?aSA34KY^LSGjI^h8}6(8c;Wy+q6O6I~zRvIH+DiTX`+b&9pL2$t$QJ)^%Xp|5z3
zC3-tn@C4VsY3I^Ee$(!y#l0_h>CWOCUVLe*yzjnz>(9LQTYuv1zxJH({Ps(W*S!AH
zliu)#ON;M%V|mZ(--Op5b8ypG-naPX#enbfD_{A_1j~0`zgXk|h1W6)c^pPrR<v2v
z#<I&hDWkY)tFq{LA&Rb&U3roIrNL4IK{uArk93RlNPuO6|CV%fxI^5k!qfiIkflXp
zeh&1TQ>;#I_}cQmOYg@0UVGEc&o%dZ!*IWs-u*%&I_))!MWk-`HNEIOhPAJZMQ3Mc
z|M+NI$xaxTRoB!xiy<UMUbZ4^GTB8kuF4|jT@`gKm#mDLY{N2)x-yDHn29RxnnvVB
z7xJttx|oY7rAJ_8#u1&-PrYQ(f3xdpLXwcC{mRT`_LVfTe_fKK8cSNI=PCUu3PAZe
z7s>@>=b0osgpI*+>B!DAxBj>R^L;&-4=#Shfcc(l7mNG0#$fIrotzvm-Uu)m595}}
zl8ZWz%1YK5FIz6Uu#_^3!;sfu#>0rkSyOj1%9}E3S=Hdmuqf*~6B#c!YYG--7gF^o
zmIgp2Wa&~rR}!UEO{=Q`-Rf<kE{7Z?&^7!?@g=}dDHa0rZhLvER@6mGY6}CSQtV_&
zFQr<RQ(7g-{G_jn1;P2T`d6(g-s-?vY+2cStcUYmy=;=aZjDvZmCaKj6-f^H<z-ii
zobxX3G8x7ZZ$%X~5bZGLcqfTUnvjfWZKUVw5s<Lpe|_V0=USxb=R^Z!Oi^k<?u9WK
z1T$48(aPei$OiqdP?yl-G_o0vWb^+za;U+)Z}E!7Ag}-U@iCZMHXnsylR}mdK7iK=
z!K;oSWo5<6P(n~eR+LfGc41gnkXKo$yDI{9^GZMnCS$CUrIE>0+v`NjB7Ip}Q8faV
zdN8Dgz41?aH-70u<NJQ1W8XexW$2gt+a6rw<5>H5o_#wwIN9G<GSoy-9Tt+8VH|T-
zvMwvLTuR(^$hs<wL#QhjGFI1N*JdG{B9l26q0Ctbi^o|T=3(6c8ExKro8*45!&Cob
zH=+=@qEQUOL@y=e^0eeBBc@(Tw7!`);Dz2_6w4&AQi^3j`7>veAq(c?jtp5a3E<~?
z3ORt;g$%&d85{vz1avIy>XJ1%K<tEUBVgKQf@N_eWEN#{4V+>2;6_-@C99h_lm+7m
zOmf@+8JLtJ0176S!4k$sn`|XGctKmh-WO=K(gGQnPeK_pKw{9e_y-nv`rXUr@S!AQ
z)m#prl9kJ!*%WBpJPI_dvv~S-gXgu*;<v8T0f-Cc@&3X7u5uP0c=;H-a2<HDtQw9t
zmaGc#t+{BB(h}HP_teovg3SD=U55_nAVGn?!n)4D3Z42W$baZxORhl$IHc~W@+lfj
z<v0p;or>E6<#gx@g7a^=a9#sApQ@e3{dXBSZ*ky!>elBBbT4$^e3`TMGEdPDPY!nW
z77xL`dCFRbVI|=cr08l<vbJoyR<v2u6>XG--2m9tdjv+FaMv&MjoZ(zHf}#Vy60`}
zAl-ZZF;5@Qd(W%?*;j;wRh$&}{IWIoe64%W|H)Ai8x&NFL6v^j^J7|VQo6dS5KT2A
ztTNseWUnd8Dk<uMy3}2+;ut9Y(%4ZIS(6uEYD=La9`o;BDx<Na=h{6}@I4V*Sb<A#
z#m#*CO@Gl!?TyaOy!F|}23&aJ;GN!KZYIRqJG{j0;OKaFXYtUDi$%p*6idW5aKL3O
z!Y-CY-gR_OWyhOLAZC=%@mRn$%Sc91#_Ni8OvbEi>R2G?$;7FJIlW5P6jR4;-v{je
zqVJB_8h6CDPPPW5q8;sy-o6+R{Vp$6)9C#a(94B@aEH8TqBv||4w0X6B)MG`v#M)Z
z7)G*#A8HscncyPJqpr?mDMg7ev20*hBX)uO4`m!hs?*O@xz<#>lSCQ7Brt}~q`y5#
zfs2ANl!<w!J<;+Ecxu5T@U~s>Zu7v~a=^Q7>qQ2<@3>{^er|ozV)5*;``Oz*+27e#
z?x&2AmDD6h9gEvA3whHu@H=IJ)D9E^L!cB1vMy05n+~~en75&fK@@}pWh|SPTvZu&
zI#)3X$x<g^fkOuLZq|_%M}Ew+-VO~hK<}_qrPKC$?fsRTG9R9DPeBsstTNE+fb+g+
z%NNEwJv?u7@Vs;D>$e8%{A7hN{`X_MWbmZeNHPg!6&%PlEJB<VQm6?{4$GLsjW;38
zG_o0(E>^{P*hvAGAAxo$k>ONG!x40fq7pn*4jTV{2p<WZ(r*2QlCfk-w@Rs8Y9j1p
z05Ns#d63wc99Qa`E(wyL56lA-P1&GnAait(uw2@_Q(Hw3&AH$xnhlO_yfBr{mtUAL
zT6;%uZ6``PD}+KqG<8$L7{beer;;LqNvtJ{cm#B7UW1$f-eJe%0%4LY5ifPX7@q*%
z3rh)lqHg1hSJ&64n_n9Lg+9xnUem$Rqh}M&>kXWD_i#SAc->-Hdz}a8;o-^A!Qz1%
z77G$UqIohNA%?F+RU%U85M4G^7R69-#=;!#1$<yuH;6>RfJLXpEj`C8;|HL4r5`bU
zVvNL1j>G-L2#Dusm*E2Op0UfgQXo!yPs`b>lC(%qfrWJdn}E2^1;VCsFTXG~SEO`H
z4+yPbid#AD&3B@#V<gH^+`&MFO+%M+DKg0;u*i~eEz-(w8i_H?Nudds)DWQ>jnhH7
z2B*z0SSTOelox-^C@;3Y^qZoowR!|=PaXTx{lkO(1D(o6oT4?6RSgTfMzo03cq<`m
zSq|<mV#K|KIpc^}!7Xzk$o01HtsQI?kBDq;7}C2|h8M4z)>%IgtbULFn9`n>LO^W(
zZv6==H?Ca@tpD{}XCwZsFMTAM3MXOwTVr2p@Fd~v(J0{?83`R@VEkgia*;<!lXFCH
zO&LoD>ZXfAx-tXeEu<NdU|AzBK`H<tjXW^n@F>u8c<Oav_F-O8%rZacc^KjJ(43Rb
zTSwAq(fqTwPARZ!;>BXelg^`)<Lx6&fuWj_H8H+Ogm76_WQg7Bh&352GXW7q3etgx
z-~u&`4&?yWAP5OX7;*3sEm%SdUNR|J^VLY3hezUCO`E$7z-d;Df|mY5&D#J?wh_&E
z7EMCV=fDWzU-D#gK$C|ZVB<VOq7he8mQ5sbQsJ`Yv6LJ<O+(mL2mxEFm4pZZS(Ha1
z0>2tTjO2I)c}OX2^ko!K2L^#5l7l|#D@G7Ze>(FAveF${+1!lglV=&l--)NPdFOJm
zcz7I`9BhN*Kh#lk6vyBR5zB>D$C2!I9pbiD#<}24DByKOjiflHI1s8GQHC7J1_T+a
zGRlHmq{m^|@GiV6vN`h$HvC#cNG3X`Q8D_!bF0wimo|YJjljILH2`Nt^RF)kfbSUl
zbOUBeb>P!NRiJhUmFAsNeoFL^iXg;}@gYFVBa>~=?JXUxOza)^Y8lw2v%`DpkIH)U
z35><oN>UB4H3r6i)5G|h#osV6ejaPT=*i^X?#a=~;vT@5K{+!HH{FH64XheG3A7cJ
zVd#7<!Yt+y%wsDWnAAFMBS<YOq#erYdBm$W#v8eav#z<YN+yG=;Yl>(t;9`POY=dI
z1<|RY$?Y_+F<byFKjLF~#~R;rN3Y*Mzntp#Q*T=={+Ea4{`SGH4l}PBlo7j1zQ?9x
z=|~u71T!%;mfl*XW6P=kH30L|XLIB>DgJy9=DYeZlZ18O({C-96lNA3%9&9J$}eOL
z(GYaEY^a*mh>ko?gm9CC?J7}-sTe3lDWKvUbPj^2S}>8v_zf9nVRS(e{Rw%QK$T(=
zgNe>o0v*q5^-j!<2YNk8hakQWiUJ2=k5JaKzH}ndw3rirtLGotloY#qerK=eB!|{q
zT0K|t>cOdHll1(&*da?obs56UQ7h~+Q5F|dTP@L@2e!>+X{vO_bgJdTEUB`el{70^
z=}f`Y0TRF1?XOrm*O*=zTLm*%etMIhUzpYNi?>bNNPYxs|HISsqrIbT?Z=zwRKPJ0
zxTeC@5a%Xu;}G8|DN_ITWpTA0!Tj<DPc$jv1I&kKOIgqR?QgzoM$ezx3+9j9Ht4w(
zOaeIe1XF`af(hqMCBcftQQdK({lmEBS;_Fd$%Tw+*2orcl_5fR#Gc@|5v^wpu%!Ad
zgOzPyV&f1FC>gZ1w`bJ;;KbXOU35fiJv!%iURE$4-Xxe$8VTkj2DiW8WBK6X#m?G`
zeZf53J3d}~4CSKJfq2;pon@dM)=tY}|MRl2mnbGW&`^$0!TmkYXK{XXe0xq2`O=S$
z?rv9`F!%2Mxi2Yay*t~SmyAA%Wd0NCzmkO+2P#Alq^G#ME%)wnqX+jqW%%s=(qdTq
zOXJV(aA#*{cW?2rCoC4AaX~rdaTW>|OJq{W7C;5d8u?btkcc5=5a6dWh5!Q8I}^|x
zEQ846vH47$kI?&+W*^DwrIlLIyET)#DOCbWZ}w68(io+{LFr2)l(w}DxG4RVu?$>2
zl)RfslR>y=?snFirsuH-))MRjr8loZ=|&eNYg2#m4-K-^+tjBo7LR+fghKh=v9hV)
z+Q7Xb&Sa7aq)1`e6>*IG2uv`c7DpKiK`-$#k3rputZh&i>wqOLMc$q>@ORL&j+vld
zJK~@(RvFK&fpDrSh0T@%!>6gY=+aDT>xH(CnDm#cdKQ`J#4sL8g;UjfL4T5Ex(rNX
z5!rPh`RhJCmwLq+*V(6A<<@{3VR;QC?uQlwzSnpXr?IT@>H|089j%9pL<y#Dkzi7=
zAA_W$2cvZk*#kpbR)SIcBPxlkgNZ@~jK#YBcd2`VZ6}JhRf?jc2E8CPfs=+v=hwCk
zc@ll<W))Q+R85UWJ`M7H&DJq$mV8m-I05ly0`W`#^Rc-;<D2_n{`Rk(_KH4$wF_Hg
zfjmCkJ~8#zHm)Fv4X9&Tlc*KoGpiUy#12AM$p~~&1RKt>NXoDfs2G+=*|QLwQiG~C
zs*5p4&;@GVO*7A^h?6j=UkK(dR9{_MZbDEw*Abjw-W1FyqhQ_!pbvcQl$IliKlk`p
zCN-QY(HtQXc3n$i$6)YUfog?BMp?BrwWM{jL=Fmr7(y0VgCFTC)VMPQQEeA7KRF!T
zeNwojWa<S3YhVtFDaNOKSlRsYrt0ghqw1^WIR8UF&Dy{34=)yfYAl;a$A<@pnt!ht
zEFd`cM#{L$8f2YW3sV@O;6pWTynwop>aAc|B~h9s9|IUsLV_tpEu_U;s;r5;-jwuo
zH@~K0;iOEded7J{RM#;~Z9{*x_SODoaGV#;$Qaw|5Fuwp(`q^){13;XY0xB14?)0J
zc?bBAYa+=6$zD<6tVM}~qeKKc1PnU>j$=YPk`Pc1h&xedMFkgChA<YO=9=cRYPZ!r
z0s8JN5avepou;k;r<;ZDe73{>dVoEP=lWvyeFo3J`bRXL*E-d<hq3nG$FBVN;P7Dg
zXmQ~NT#Z*=c`bOl2pOh`>K5WB%Pwl*$2sa4W!80!R}tqW{7@ysoQG-$gW#DgMBEC7
zx`yN+m8cdBdz)Q254q%au^7Xcx*So^zoI#nK(}m=BU8@{<=IpZnrcZ`n_-kwSJVMH
zDk%Lg%zE_|MYpuKpcYn__PmZS7PpO+{`hG7WLIaMXNFToEo+;w;bF_DT8G@fri{O_
zk+<t(-ETe>WL<RX_ERPH%GAnsF(b^Wv!t1L(<MxrpnGBMX8{?#VQW*`>B{IGy^KEG
z%jiJW5x`F#he$?77h76J-$-Fn4hDqwf2_i)g%@fXA!|fTq64wlbqMEU{EzrEit`XA
zqJ)iz#bt_>s9SoO^o=@+3J?)KB|0pm=5UQ(YJ;6rg-VHwJ>FpGyg+#`-PlI(>&x5^
zkHRgV<Tr?#(!MXF*8<jC2-e?x)%)g}d~ezMl!5RIj*R{ZvsOSxzuQy&<AamK!^P*2
z@A~-3rbSh~_VK%%nl4YZJ*NL(#o2*gN_FzCUkYrWl#(aiK5AO{)Wx{PZ}P~WOFw`+
z{6FO{**m=6NmG8n`U+AwpW-FC`JJ(^FiJ?i!Zk`!i)e+RM1zVUDmU=6M6sbzBTJE?
zqywRdkjOO*4yI`<>I06;tmIN=kf=&&p=ySujxiFFz9gV~993Qs=>K$XVuIyFZ`2zy
z-#~}lnknwP#-<M41m0s6SC6GyS(^Id4QKQf4?DtP_3&;-h<?D+LoG0m!bjw!hxoGz
zM;=~WYWdLn98(RC?tdfn%&i9f&P5Z_EiDeJu{fQRl*_oacYOWpW?P5vHInkveCh>X
z@eCwqV{?77ceKB|ySV3iq-WPGR6n#t;@VUKwdfp*p&B2c6o`J2oWXK8$YE<F5MWy+
zt3zBJbtNvSx++5Y8;3P2(s+d`FPG2i1*aElx-ls14mDR$bT?vPIEEyk_Cm5#nsQLP
zU<g7TEmQJl?ariIA}#N`^t)p(_+$swPg^JWX+Dj7uLG(-H#RUQJ12)nJ1X);9kc^F
zs7;}~Pc4TGVM7-oooFR82lzk4v|Y)Nj`2DMz#IiGq`cH3fi$BlsiM$9g~QXd&!uVM
zx0Q^uDd)SBwJwLZJMpl7sgFQt->EG^5BoXDl>O|cz?aR|s|Ywh@Fjp>FjhVTrrHj9
zZtUVLL%s%ESW{;O6WK)vGILAIaS)K2=pilE<Qc;UGEaUMrxkx^!_d!evSl}oY?<{9
zuXx7PH;@&0XzUw|#uJ?D5dvKRd99%AO@wT$K_#Fo%dCbJ^NbaU)-re$h<7B_R$Ub>
zn1CtKDbg~vk0kB4Vks=u7D*FqjigdPg1~C{5b6!mPiA1b21qNgtNNnW`-Z^a`QvvC
zy8cFk=P%x&t-p)s(ca14_G0k_(shKnsQ-Z`K&2Tjlgm7$e3$lJWei+_$Y{$z1KDjU
zwJmuITNzKH+w(nw73=8`tm(f6GWJ@ZN`7=@wdB&HpK|WrmJ@7vMJT<~Q&^1%g;$Ym
zQH!f$qVb}zjp~MSeoj5>JVOB=vXenmFD8MBnQY&^39e`-joAwy1kPs2U?m7viC()Q
zymBiFKkyX0fwBL{U$c0NxxsJwql?A=^FH$9!~K0&Bz)xK`vC3J<Z+BOMp=(U<b=9g
z2L=wJA@B!o6)`63I2a<+nCafqkl;-piP@eoe5#&lA;6T+TLys+#Wn3~!fk5xr|otL
z!1<5<W~9#moImowO?S=oMt{coJ_5q;x+K7E@;-eH4*5P-H%!nyx2Py}8G~|kEvmLC
z>k=Wv5oJ=jJjc@xY5`lrq+ozfhz`Z`SW>rs0ldmAT@L-eiP;=&wzisX6qS_(eT#}}
z%N%xp_P_@HhK|u_fx!mpXWrC-sW-K9_8&L-zT0No1phO@ywxj+)7pEy?TFhu`-dlc
zx+2aQt9S*j7VK^XWWg+=D8ulj6aWh-!(P+YkdnjpK<=_O<`k|)LQP!ADkx!xbzKoR
z7KZ{+o3H4USMLcnNsVP00|%6JB&hZmnVB1&^qS!Sf$H2&^w18ouO~4+kMx|DfTlcK
zXWIzP?XhUm+UvZHjt-jjRYcPaAW?Uy_%z&_y4zH3Z&i*gHP|i#BVV>~le&GX%Otf@
z?l``_qTnj~<XPH(Kvp(0pebFZhkNwywGNtm>*)r~kM*+o;Np80!`k=wvbno`bab?M
z0A}!tz&u-$ke=~nN00q2VuaBfd`I~Cl^y+;K0iXpI6@bp@Ac*FDj}r)u_~BaQYJ~y
zY}#WwFcL%2+E%I{9oG-t&2!FU>AqxzZmE~86cIwtSc8!3Nk9JrgHZF#!M?rAAoNMB
z-M;0?(Ba|k!O3D<WlFA6l^t@>8U!}dY$^bDElry!B%;tp0zMELT*ip0a)y#&Q&+8I
zVM|2?nf8u#UKnOmBMXELF~?hPj28(~*CC{VQO=|%Y5T9KTNTth2d8}Y?Q7}-xA$@V
z-+ShujyA6UFxLK=ua1YiDB`QYejPeI<Pim@F^wc@vJMeSm6gPNQ9nNQpu=UOnkqSn
zF<xIz&zi%Z_hJ8@#3}D+53!v7^(2`8U{hQl&F;kfhYp_BS^O&2eqn;=(e|DxUrAB6
zs6Qc^?Yf$yV8sx<2;Pbav0mMYMt}uDmM>+E_^)9=6_puO0`WBG4E$myB77<$8{k|n
zmu7EoFa#b0*=17Bz)X?R0+$Xln)G64MlR@n07KQ961NJW6Fpo(H|v1wP0YUm*U^M2
zs*3~7AK&D9FU(f2KHQ@@B=ID%-{70VK{~G(L*B+6-VSSex#uFRqKz%SBQP`hdFox3
zx{W<H)=rDF3BNU)8>RF5k#xRzYcP_xdRaUf(?=fn@l=LWQM^|Kp~O|Kj4Nn-ClpcU
z)}WM(E$s&OH0_HbCcaS&o7>r=1vXe?V@DrAy85Lto~I}W#<S}FtB!zLbte<>oQZ(;
z>V8%m8L}uMik#8TVM$kUQ3GxeCrWj-vsUxmw5%dPRFtqCX=f29jkHa-dg@kc!gDi`
zCLJ{krhRxG1jC>J^D&G){m@=y#Pi2DmHuxSmHurK_^0k1{JO0IejnEUy>C2^kB^QI
zbQ!RTTN$-&jPy688bmBgmZ8i-rIi|~a)X?inobcCwU9(=qM&Wqh_bto_PHaq29E&F
z*<{RmvY)Q2B*rtF9_U^WgQ9m@lotZLL^t>ZW=xI!qf!%%WzxS@U#42wRn~0!t<t{D
z8mr4dQ_BSA2WQJwKmUOryK81&=q<fye&3ygu3OP0fS>J)rUp}Q7F!QmtU>DxXpr2B
zoE9ot6?N>X4HBsT$Dx{FrqMCQJS7^=IHM$WOsTqMjaf2M&-IjyF&6VPVf?NPE1DnN
z<cWe=Pn11t8YkU>wZH3+D%{)MJJ{2nD2nSW;sT(sIEQ3{!fA6vOrVvsis%|%wKQ%4
z)ew{in~non>ZPDr6%w*3fn>v};b+)pyPBI?|CkN368BZ_3N@9m=c&Ed%%(rW^WB>a
z=W^C?-tnxd;UtV-<&W3Xc&a_%ISghcM8&hRk+IBq3q@xYWUk{;$Fmq^GFDb7NCI1i
zRH*Gju%T)ncn6?cNN?9wB5Wt}l8iif9~7lyACh|CXI?t(!(*7ObiR9&bY4FzoiCq}
z&JRCpLM`s>9PAwJ>C)|)<gbQhfX$R;4$cfk$S#W-U<&4$)x2C?i1Qa$OW}&l+R9~O
zsSGE6u+0DP3x?QCjZ~T2QF7;+ouNau<HOII#+`)mO}=yvc&@<*!96l^ntaAk38@h6
z)5Mw>)&Uqtg36&Z1q3D`<M<k1usNnKN%lOh*3!fo&Ntrv?$e3$pF~nPRXjJe@~;j?
zAHs?qI(M!amNdkja(gPDFW*@#zTHdMcMpyaw)Jq*hHPaCI#=Xy@L@(nXd<c-H<`%e
zNQj7bHX%4_06Fc)qpdl_YtRszGHhXe8#pB(E6y@fWgN(_4j}a_Gd<^e>fyk>8+T`a
zZ(?4hpAmQ7vN7(wbp+^_40ylL1Nxf9tDUu1`wD-2vU{+*NNJV`3$skX7K)<lXy;Re
zBAWzPo>k=aLzagb(2ZeGSu79<Tw!YNBQ*0e1nPdDA^XVc(Q_Xe-Tl4}ZQQ$mudkk0
zaQEl1>8F2Kb)vgZ*4+Iy?%mrcd3$FXCBGhP@Ap1?q{PSDdOHDXgfgbt38<3s5Or5p
zL#tRPA~?^mt}?hv-ee(u4h3AOj9d5v1bJmvf*V74m)FQzqA*gn-Xs?)Xw`$7EN8E&
zgym8%1$w+zpu=M`2TNT*om{3*KClBE&5&T#XsBlVs{NjcsmdtbQT&1$kCz(GKe3Y=
zzZXzGYh9Fl{U}PdRN}D%=DQ7;_hRjz`e5!K9_$@2F4BHC0+Ou<Qmz3~C?GrDWmlE_
zn)^-ZsOA@pFQ2ap1IiEhHi@sXNxXNO_wG!|1%mfacv@+oR8I3s(Box?45dJNg0(R1
zxv#$UQs1G|=ITedUzLZGycjM2z$P1i!>rSM`OY*5AxHY|$2~j;rzsmaN-+I(9!g%J
z9Lhvif;a#~O^&EI5=9iFI^1=<Dw;@SS;#9a(~J-l4p1Y-8tu>w>x(+v2u!%|6pSo~
zV4Cafji$30tiz``#XCm7)sQu}oRO@t^S55_2<W#h24nhCUqH3td?(z-d4Rl-1iD30
z<^E*E60?VDspkUfUIUx=_Z`D#a<`pTp>5-ka+nuW!_@?O?*8}s{`2E2dDTlFx0+h+
zPQ$UkxjPw*fZXeN=U9igxJMMuGt^JYD9hm}QC`Q7S(Xdh1|7yowU}fP2&ynHWf+Bc
zL4eglY$j_EfibVLtce;Pjk?!ukm-f4zNXGbtb<q#)QlULrOryAIcd#O>wq@V9>nrz
zrwVUxolO`1=I&He3FF8UEC<i9W}r*pLl{x1uxcGOu*1Mbu_#*N3~CUlVOGXn-a!aK
zz0&Y9$s1x0D6mDn5BY8s#uo}LTV@u%-eRGM{lI$VbeZ(geB$kJ8DQ;~N7JTGmDo)`
zlJp#SKFiKc@9})iqFW4WoiCqzJIBW-i;Fi>|3Vb1Mr6eOH1Umm3D{L8T1i_e!4Obv
zhZGkYoP{NACgpKg*N{`hL;^9OifUHf>N?}5K~CJ}j<oQGtvYXB*+`Wr=leZAWH+v{
zZ8we~GffIEWN-CF@ajS~1|%Kb%({m&Yd<JX<BR76(n{G=0onJggY4Q7WVS=}<Ga&R
z-5=kZY=Yd`JK6<mih??uVggBl8c1E2S(LF1@nX{m+QZkuU#y7z&KHzSnzi&J88kfT
zgLBQen&PL$(HI5tu3H;-nmuL2LD@+9zdCT*NcuCoQ-*YJfBOJQ8AZ}p2T7XSu|ATX
zQMq*H**vr^?dM#Zc$Yb2-pm7%`NQXnq(3v`PYB*u`O2)JRQ}|uz{u-1t}fU?ZmP_y
zU=7?Goj!9$!fbe~>SNxpYBl|b&lgFPR{MC8_Ivwtt){CPnPlH&NFBG-Rf@VM{?F<R
zGy>utWW%tQtY+YxTQ2LasaSzaIj+hZIB^!E2!(7L#<RTa&_BJ(O>ffpnN1MRyN`*J
zTK}+-^w!q-*!x%QO-*NuwSVF1`u@)D@!p9&+2o2)M7^^FlOKauO;0^>$0D*^d&Ssx
zn7&UV)2S&sr-f}_w1lXh+ifj`63_RaZM4?&w+7^_c-nR*Lik0Vx*IgN)gZ#CKshc1
z6#)vLikVFf@)O15wgatSNCey9nZar?3ecJCx(JkhEkIttnTs509`b3yF6s=~rf`Xx
z;;oC2w%M!b<u)ja`C{7$(Zm7rG!It;tfIevQ#aGiv)xP=_J_dHD*D}6`%k`H(o7m1
zh=XUyc~du%wv&b+2}%5lwksNxJs@tPgAIj+WGF5dMVMzDZ{s!w`H*F-pbg|jEpavE
zQuEBg>2B8il&-tXHK~DPhXY&l3`|`*>8T#`X^jXT;Q67mWz5#6-`(T+uHL5;#z($n
zYCOrOcc^;Q5@iqMGYy23=T#ZyC~IU*R&}h3!nTpf4}o}9cOpi)ByVU!e^b>NOaqSv
zux%R_>AOf&DaVMPzVW5bgM)D=t0~*97!wqjD|E-~nfF7elkS#EG(WV-r{6s5(|_w=
znsa{<Yaj5>g4#PiIn-=UR<JrNb2=H3NSz8=tcKeH*DQlJmm$)|He%|;18QnyLbf3d
zn}?NUWHI9q0$U0p`<&a^@vZ+Prb|0b3O|H0(;DaC)UviBUGL4R&WRFjSNiN|>9nQ*
z&mZ-BMQ-<Yj7K9?w=RgV(+ST|Kqidu_m6|pc&he>D~2(uAeR+Pr`|!S=Gz5kw^6dR
z^Qg%^1xr(>>o<n0FE-5%tV|B$b_|Be;L2B82WQJcADbG(`3#`>qw8GIO|w1jZ#kUq
z82BjGerIx0+3wDXMw2m76hag<iMggBt@N~_2|6fp3ZinM?O9iZ$RVj&49d>SOvdV*
zvbII7IYh+SF<Fe~G+(`0dUKD&hjnElKxqEv*@kY|3eEcu2VJ)nn#B>~O&`s}9mJfv
zLPNfXCX#TJGhr_2oCs<O>kxi+5fddHwZ&-bkI+)t)_^X9FV7*6AlT|yf|x|cewJ;q
z)9aGiIDY=u4TXCrMsS;abh9!QaVE2h=B63AVBXOi&W|2W%L*i}w@uD{Q=+Lll+Otr
zY7s(a>c|rg@s4;G=@U(qf~!r4`lQsQZaYv!u9`2D^h}7^d7<ZF>8F7{6fQ}xpnu2x
zFpR4I<|d#1HM6lh38LS5Ljrrj$5RWZirvo$T2{l*(tx8T%R1_;N92z5GnWz5yEn}l
zm3j<BYI_p=<Y<D`O3&>#d-?A=1)9IUN!6b;YdCK_nHFO|fVD%9L^|5uKRDKtOG<={
zsA*Apo;9GwBNSaw$ddFkHH3zF0p}!9geiF|XmAm48Xh8Gq9Pgoqv1Ox=wr>KFAV@Q
zcC(L#hEb7vX{j#`dVkap%@}QhPt>`NCXUn%in=qmU(lrt%@(K%hMruq+&H;pIns4Y
z$$j8t+OtRqf7DCV4VucSXEhX?cuJJ=K~k0+m5PS?zoQJO!U@@<p&+c1dYd{Hq9)P7
zB}FCDCSp8UC88`=|9ai`ru3aa(hcou=uJw=ky3jT_}C%D>gr_J&6f72ZxGGYr#}3h
z^iFi&|8jef(ejLkFFeD*`E}18{E%1-YhvLG=gHpT(c%I~;hNGN3FiZnw?RS)40Az!
z1K>TS%$Y=B)YVN`^O$NqFbo+hktD^Gha!cV^FC9_+~niekQ`G#?Oo-MvZNiTUj$mk
zk9_4hGuy7dI0DOTyK=#L+gE(qyP9APW>1BbL1Bk-+L~%=yol+Tb?OKDy>`4^VnTR-
zW7aH+sblyEtUK0#wdI0k&E8w@njwI&_FiAIj*d}s&?E2ih-#!zH5yb&vnpf*HoeIT
zkU-=;Xs`#5I&#w?9q9`-Wldg%bS_Fn0CsWLq6k^eot<#DjLxHb3U`;JsTB5mKN9=h
zDOgy8H+U?)@%`;*)6<u14aiY0bFITq62h<cMNgwihbPd)rhDiZ!m9)+XyA*({xj2C
zxf-~1Rtr;q-6S(B!_f^@qFzRifa}}W9p&TZ+`~~m50oUOUq6!4U%}c(Jt;jo-agpT
zyEB?FCVs4r@lAER$Dkdlm$;%K)lm^st0B!Vrc+U95~e_YSRu>=-<F}4mNTL>K(F!m
z3~~7BA1jUDs0Y-JQArGji5Z`)XSDtN+4e!(K>SyFGzTdqgbzI_HE62MLk*o9SW>$k
zox(_`SR*Wlmj)k+WRQ~X7-EVnys08iqe8)3#&O0Wy>td<&4tJzsz}E#Pj0%eUOD}i
z>0sW>rS9G>AHcKjzVaScI)8psAb!gz5Vu+4zpMu7w4viKRue{OcYAN|a9i(oA^JQ-
zR87s>vd%>X<>gT>dDT`h0YYSz#2Tv#AP@ByT^ENnksLyHIrJPtE9hiqI~I6!|6?$c
zx##{{hTc8%8-{lcy6=n#q0X2zFrQbD^aos<w*kxtHbstX6ggT|C*W?~{`yf?LjZep
zkOfn1#JW-pBTi@bgLEfVPt1sNO#vrr+uG7e2<=hnY2cZ?B<;yvm!UJeEiLM!k7#aK
z;pju@MKcGQ53E!58%C;b3uce>qB-#BZ@#oxY;Sp@N#{&$n<Fr<97b|tSRciZx&^*J
z(u@uiHTRv^ZD)FHpq}NG8Y(;;aOzK4r&HVQaAo~^qM)Y}*Hit$CAuxBABzy!O>uNR
zBJn3}ooz!wKibOy)arTyK!4KL^~3#>oo%J-rv@`y3np?(I}64V5GKdd16+#n220(q
zX0_C)xRy_~^-{I#((8d9+Nd>KouW=+Z(vLZLWG{D`rH;Z3RX{l#2TdY2RDrXcO6mM
z>pB5v!L+(AvGxTI%$?nXgA?u6SyPrYz&d11f~Eio$(op4jL5sJ=s4KRk_H1~y`^3@
zI)+M=a86xI8!bEFo8gURSwv@a>%*nV#w$iY_7du@j$NPL#A%82|9LjD^2sim%5uJI
z@#e)KuJ`(u(}?C16q?N$(2NE&FRL^816|Qc^ia#xZerCBC0V+>l@23cGzV8K)yTud
zUY+Q4eQ70CpWCC4II|{Ik2$;@;QXLFvFTx8^Wx<@=EfX8Yz6Jjy`T+ep^$0(*yA2F
z+6=10P}a&$OjU#i>nJmXC^B>icS6bw5r-pmM<i5AsIQ<1=ptA~H7J5cVJ>yWIY^Si
z+19Hz;s?5*qFqa(wN@$n(&W)5anUCy4Kk#A^3y?_T$4NVu$y~lY~cE_v<F**idr<K
zV*{V)37S#c%Sf;@q$^yaaL}NNz`DGPqcW>v#z5n6S#}{PNml0+jf0(p%yqytkC}iF
zhRRHeOz*bUo0!e`P?~b0Bay)DQFR{4#u~S;up6y`^~Mtoy|6|@FZBHZf6xQ>VBa5b
zxn3-u;(6np<Aa@@9n~M8MF<2q>nhZ@t2UJ68L}dSD2hnLbkJEGW^pJ4UdtLdj0l1l
zMncpr2(wP+Z5?tmWYY{AwT^Vy0Z!kRW(=4*03y)yifB}?9$chnLj-!}iAtFhJwPlm
zN~&l3nUOQL8(PhcNc8rP)c9lgW&O7T$uIhc8%2cV_ka8qcg^hLje5`VgnF6~J`-#I
z%J&=xJI8z0b8G@J2Z(Ylh;3cSCigc~1slcY4A)vg8@<)}+su*2iF1hxbyPz5Gw$A*
zGwg}3r%~b?u=X21nn#Cw+j>v*l>x|MG@CApI+)85)3}NmOuc3Ig86Ny&UJ|`Wd-(P
zOUb1}4|U?aMPCFw|I7J?Nj$U1bI4&xV(<11<RG0<1>xi&ssk0W=Q7GlMkcY1Vfob2
zCL!E&lgANjT3(cqY>Em-l2<UDsxkvlj+$))(9<+^IRx)CSM}V_M$YsY`<PQ)Z>{Ff
zoUJlrXB1yDXkP0?i9dz4Ydx#EyM3^`d$M@vjha;Z7zy^dg;Lhw)L5t{qSFB#r7NU?
zi-dW$l7jFzh>yN43p@3C^6s9tOJ|F_dTs7>j;F6_yLkH6uNwqk=;-`C&f2}c&L2{D
z>SA#{?b|;clqyTGai0!MObC+(&XA_1)$E|3-mM3ErutiwnBm~K$-DjCF%PWFelI=n
zY4;Y7bewXpv-Xp|8tPk2&v1*m`Zp$K{Exo1#2k@f`$0@vR@mJ_hV?g2lWt56%UG&5
z6Qv-NyZo`0?eCX<%%0c#EsiVvVRM%cVeRwY2Z`I<+eW!VZOPuK2<40-AP{3=j2JD0
z2gNJl1+{spKVq{FD%3<)l_#2gqCsEsP-y}+*d%@Ss&YwHp;JK!#T|h{ELFOpd&cy@
z8#T#&sUS9e@2Sc{=<lQ7N}}?JuMdp$;k8fs^mAr>AnR@R8+sUriY58QKjaIG_JL}S
z#i>oQU^Mt$8xmDjq9Fmux*93qJ3Gmop>TtlG4qfULxWH{KOXQg>n-QZkTFl`G@MxV
z9&5Az%lWG5muEa6MZceY+(T1Gx{9zr6^LRDhzLhI{FJq+CFc~Ap;E123Hb=z+tYKD
zOul6}qQQ#mV^WMRCuXRynhmdbkW>qD-wu{ow~!>xhtgBCOM~VYH;q$rr>Esx-?%k+
zJM967^p(8gA9!dQ5BQKd2;`l}@I4XqH5jE2$><1f(MA<b-fbC2RVmC=E*4i!En=`y
zxs0NVse;lnPIflO(2Z!un^HoMm<=oa=>3tTrTqxrbi6!V8u(FJF`8bJx*|GpfKOX<
zJk`_sWb7eFL~Sxgw)7TH;f>v-WNfn@_afM3Otdkv0-`R6>o4hSgscz|>}z+?_QDc{
zw$u?CrA*6oX*cSs9)&<{Y*f;f=$_bg*k`Iqxn5v(il#47BAsZjyHrbV1E}7)ujSRq
zxJL4kStP&z<Ns}@-j3HS$-I5{;DP#~HCX$@et@>Wx4pY(M^BCsDa#P`PL6^Ss&x$h
z7XdJ8@)f5IYqBn>38W1pb%<IU#Wkk@83ny2TiGUe-u}aM@}x)DZYr2;i!(%j(szz3
zk~uu%{$|@f^WnRv){{W~l&|j^P?bM`q++Zkj57Fd=y%jglrcpJCQ6`K$LU1mI73ym
z&IFeg3%j;saU22jGK;0ECC9BSLpw65NbGTkCXLefpSZb%(eim0GWYE~56f4DXFNL1
z%ID7Wr}FthtUb+_&!g?_Bi(p70Y&Cf6Vm`C)QoBGU<sv%l?2Ez1rnrb+=Ok^)iPqJ
z8E17yUvs2y@+?DZ&Wh3md2gQYFfYOWY&gY20eb7!`L;<v=lN5<;JdN*%YMjtaCC62
zt4db~OT+^qJM(cdV|?a#gmik%_|?F|HhMiN`g%29I?WmK^Rs5Z`}xz@lQ4dk?}{{@
zD)yvV8zqMej~OaPVJ%q3>qcZPs_(I?!ABfP48nL-I*PiG0(pWE+<t}fQdmc<r23sN
zIb$*=_3n8Nr>p$1j{PH>)cuLG>i)s!Pum;F1boJ~o{qYo9<4Y+h{Rb5@uctxq(AIL
z*fk>OArdKZc>pgBoU}+iOY}f{Lr7_t?ryzyXq(kU;ig`jRd}94-Jd+`j$Z$QDcSlM
z)?&|o?(H5QZSO1b%(_z6ye+%7tZR6tN^r!HRVQna!w%%o`$_=8R%8g9!W^{}-au$+
zn-(39l_Sy;@LaOJ;@LVbvFBhiT}uaPAAFoo_1I4oaNf9ezV@5{`huzTESkmQzj@Yk
zZ-4jbQ15NaY6%z4=-4&EqL@huUoMfI!`tT-uVa=m8IlZ(Ji?_>2;0yiszN!hD}*wu
zi02y-cjq~BGEL8^?c)<O(xJI$5OI4v2lV}uXY)o@JRf_(G+H8&{U4sbyW;7{SJ}|u
zzA%r998qWs=?vj)7{VyE!?KQ${nS*IuNtO|ZHA;Nq%B9Z&9F_KRvI@k*Qt8aPyHJL
z_0&d>snhvnUb_n03aqx@xOKigFTeMKX@|o-5YNYa@jTo)Jl@ld@K*uHw(TN&c)sZ}
z7(yA-o1P3%xv9Lnsd6|+UblabE^}koL>H&*$EDIQ^ZS?j@9t0gBH8?nv-QDS&rW*n
zdv?<Q-1F>4*Ht&u>GT*L3;qYLh?juA5E8{`hT;LMQBQ><WAIS}Ev-%+Nz<7KXv8m{
zS;;pON#mD|XaD3T(Hu{ax4T&HanQ88SV>%$KAKuIX&39caEqxI6j5nhqEKCAv^|D1
zB<h8p*{Rx#oKli*p((WFXyFZ#z|nWxd}El>gn|IIEj!SglKLI6_ApHQ7zTX|y9^BX
zVxjzvO}kjb(JoeNH2;1xZM}KX3m1#$c|LuA=iqop8%+>NF#@7is0mRWN>ZYZcv&IX
z1Y?J2iNz7MeS%kNGhWF|9mxUk>a1n0kTpZ{9sx-h&2#vmm9HclRy|kmD}iQQ_3**_
zuGwfbm$Q85KbnE_%UJtu-)J5m92hv)3R_bmE~mi>6<mEpJJlh!tOC##qw_=&UUiJ-
zO9_@YQgcbm(QU4L^lgiZ`&{aIjry(qKHy~RlpKmlTU&?aRNgE|%_B$#auCA#AI^8|
z)vwN=`C<s?6DPuXw0)@gkMqFBN;-&-HW#U>0kuE?3UV(US<2C4*4#NkzRkxI<MY#E
z<tH*6>D2d36eH>E_Z90iK7i&I&bBGRqWP&AG_MDmAD+nO@yW54%`>WVK_w#ao^6=1
z8c{vMc!cy>ScN$otNPjjKEqPF{`Da@oenq)oS)K>rmHW?P$wjt@7siCIE&_&XFNJ-
zKl6Ripy}t9#0X?EZ&=rcO|JIFMW)9t>0_b57KD<-2(6Q_rw*8X2c8RV>aV2U_x`bp
zOjXgEujzA@jhjZ54I6Bd9QuuUB(IkyXe!xM2SuDu4$HO^w5g`e)TFcw-124RMVE%i
z>8X#vOw6~`*$5A{OEBs0-M*&I*s^EVt&WOL86LfFQ#<YTqjp-`C~?;dr;QT#zi=|l
z2dpveeOp|<PM3)ur321SA%X*NYK%%1wjki+rr{v&^GFq%WLPy<lCNJOrDXbg%&-dc
z3sXB-_}zZ1S*#l+?w>n;Bkt*>7dawo({+lT{;ii@nn+0>7_m{25zup*tAaXF)<psw
ziKx0ZE2!d*Ftq>+!s|%Li&nbEj;6;eJ%4gOF`cJ<n@KRovC`aTt_!o9xnBIjsRbpE
z|Ah&ly?3A0=fc<LwPJ~At3$o0O9wc0|AsY<6vQu$?3yF^oeX`1(hyN@#BKh9<tTsE
zO8*64JXPPq^M3oA@0vlA^;-__@0ETKPcj5A@+_!P`iHL3O8-+gAk(}OHMe+%qJ`<b
zY~sJdrkVX{!g)wGJNV%dY`AvSXq$BABo4y$Hc#zcV|rUJrSEvrG-~?aSo?j?m>%rz
z*mF%dRWl-~4mF_zcq$Y!5tgEyUc+a>yq0v546>&vYY+ltG89qU;Z1aWMA-pSScOnF
zQf-XYy+KQJ*TYtiz;x4_X6L05UIT?{FbayopI#LNLFzU`n<=M(lKVdXSC`?hU7AlW
zvHMC?m71caf8wHf9nidv(EP&BJ#yE~2F~kxH2?RDraDguU++n&gC^=3jcg-2@`ZsT
zE;~`Ojy7_NkPeAJ(kaM@(gs9XS1_5>tsmwwl{qP&qP>Pp)=g7K)aQ`|x)~`k#Juj^
z&?R=-^|CbBs4YlA+IQ%yu5%S7=ps7(F0=iuFu<p#e@$c6Z(1<hzHErPbY#`a<^;`~
zJv7~3rJJ{EgXSk*GRWp%w^7qeCKEb$54R6?4i3~d<E8~-l0tc~AiQf}HaUYA;=C?%
zlrJIy)WE!>7>^24hLSM}B<=6yi2lnmZUr6T5nrXvuq!hH8;QC7l;-d2C;q4tf>ju-
zjz^n2ZKqh184aH3Q!klHC1Lw2Pbv+Lik)kbSwgELS<4({4Uu(VF6)Sf((tgXX|GHU
zLn&xzfD&A!_Z20hbOcYBfp;!zj+j-}Mra0VxUbfxBOeot@ko7dv=Z90K3t3W^waM(
zv~`X4{%M*0r|+N4?Atv&*giheXA&}oVlX8{00`wYI?}d;r^x{&ZOaBrS4ljlbQENL
zCOg4IS<}=n<WnIlcur^U@H$*KEon7QXRf3bqYq}hmZUu)=)MFzAKFycxG-B4{^|Rt
zbq&J!$d^uyCrPKEjoBpxlGA~}G;S_tEh861HW8i&$y3-?ZB|#Pah9M(D`cSp|A#Wl
z%8pS;z^g`x%DU;qv0CmOC#$T$v^lz1xpJ0=mClFG){$u2y<YOt!DiZauW!TJkNeVj
zczlc+`!?B33E#oP5VntpvQv}2!VWQ#Y(q|S&!|r>qXH%uyd>?0vy!~xun?%n)md3)
z062;+UlpTy`d^?22Iw#>8Siqi!$n9g-+Q)`hOJz_{H4=!@Pk-;#7FXIfA>UN&74kd
z;}Q8|nkok-z5zfPVkL$;JM}BX4SA@J+zcI9)^<5uRRq+lIHsKu2%sXG-QZKlBT$xW
zK^grtQ)y4ewl%dYJ-umm#rChK10@PHfB9_XHjC!By>#k>NLK%(ujN`Ub+6Y|z_I99
z9>w-BerwbcJCD~6OBsRZ+dP*yy7HJn|01L1@7;vv4YL)Z2YWQ{>nlVguebYXYS~m3
zqB8*o7OyNNr*U*y&hS>6cHCs}Tq5!jvJG?kt4%C*jx=--CZlgx5l{=|FK?<41)~bl
z^9+=q{nLY*|IlJs`yo%wkM_3rj&~Q2(zctHG5Bw4n#q6*vP;nwpgCApa{M1kp0{;T
zXN72p{%Auw_pzfHE3C;7lvZ&_J1<sQ%it9E?3{|XV)Lgli>cm@*v~W7vyzedI489z
z8XpM$H6OvJtx@f#jS)1XEL;S?(-X$80)mO2qhR>iB(;Sab<kE}Us)9a`9LQYxKoCl
z_UE9N!EGS;*VZ9;(+ELJ4}Xae+#&?Ov8RWBA8X$}L2!TnNYlen&IpEvu?hr+K%9vG
z1#4SgR4DVJ9!$hW#&SfR6-}z?+K7-w#@NP))T0(5I_CiOLTaW}=uH@7syyi@ACBe9
z4(<wu#sr-ZI-t5GieOlnf;;XX(~S`T^VtAXuIoj3;;0wF+Q#4S75aU>Z6tu-JyB={
zCfUX-62cBq5hsd6$VwH}Tm<Z#!%;LnW_p4iAG>waUL4p$o3j%>7z&$-IdUn{{n%<a
zrQVsPQmG~Je3ZWfO1a54#v|Kk&n^4|$2Pv&S$p*q%HyLwb#CFAk-R#I_PPh||79Ca
zZFJOfIJNN)9J}V;{d=ccdG5PE<t?O})`Qtsja7OI?*3(K?*2OW?)$N;FQ10Lm$3G0
z{%3!*dwg<ah`~Zup+HDMr{`xG@l=hPtXeX#s$B@$5aC>uhb3K+WxxW#oy206M|63q
z;ta4dr;>^*pS(PiyczYg`ZoGkoTdPH>!<)|>FY}dO?P4*A^df|ENL_sW@6qOX<{C7
zS>Tg_SOLWnA-P2Yh6ENsW;qh!801w$UAmHJL`M4`$m)kt7H3F@!EwnlW_12}IHX6@
z^Rs<K))n<P;yL@a<?K6*j81U%r>Z|`R13Vp;QHiO4C(Ic7Q@=>d?`IXIyg97JOCZ7
zI!@a`3NWgQkrq*%BQgfIG(eAM;0Hi1$O^n_#@i0`tf=a9S*f%0E_!R{H~8Z~KME+`
ze9cp5xaUhB9U-x+6D${r_xKWag-E!9qrKA5fH=b$ozJvesn~?XmW#x*ZK>#62AO#K
zVpw~-Clkke$d3*eKSf82p)wTGSr4?$Eh{)mKQsUk3=%0<MNCN3;H8c+<V~o;cM;Vx
zYF@@2l(7`GEZbqa@#t1IVUQGoy=WEZ=?uCw3wqK+>QKgZ$Cg%GVAuIR|F$)4+qZ3v
z&@<zeT=d@T$<ftE&vpi;YKECTH(cLnF$~~#B2>Q#W|%&e%!=MAPIIz*)V~hsb?eZZ
zFQ@(VE2myFe$`^}yT0rlgUi%yGXY2mxAtvLo-hnc&>LCQ@w@}4j`Ad{!>~dgj)=Ys
zQ8-6EvJF|S4!(kWji_lB{5mTmC~wOIk7kM%=jpRPg~}%W8xVsoQ$03*h~std4x#zk
zvmMc9sdhrn-{L|D-|WkxMpK=GMTa)gv9eiM!y}3=q#eGfqtW4?5e=MeIL*~Z4H7In
z5CqRjvt_BJ21YN7cnR*EXEKW~rfD&?`i}gisqZ9D*Y~yNvEgcK7e9M8s@?W@z#vYx
zWc&rJ-QqPIo*e9*?CA|{4Xl4$sxyv31!W>EpxM;bMYDLr79z{KGJ-gwWLZH>fpuJV
zJZgDMfr%^%@Jd-!ioCpXdN`e5oBy45<eF--Pwb2VyN^Mg@tqj?yl=5_%g}8jKy8Kc
z8(%Z1yJg#d@->rVNOn(-cIi-;Z51S!h|5s9ZNr-O3+0mbXYjVd^C~P@g#W{eXFTVk
zD5<#Ki5hWrB&jn&;_3=@wlEwWOPdBR8{}YRb8-}Zm&wwUZ}kR#wJjhW-m)|_K%fsv
zhb}MmTQzDys$S4ZX%H+^wN3*rkbdHq+ZyHQxS{LKc<%RHdg60u2G@PD*&2k<?|3I+
z{k+#JV+50Oc2BrO7Uyl$!oSq4=5(4yMJI8$C`=$Oj^hILUC1U1zb*LB7<M!gRn)dw
zT#s=ZLf--2V6;9FMhjx<|BAxJcZGaT9=7jjDwBTF{xzed{kdC%{8^*<gNs2<Z}W_%
zk;#v#lT~K`V=DpcWX5HKRF8KwFj<+~Mn;#Hff5okv6F^T^kTVOs#y}6!qCs`@QJ|8
zThqjbo|lrC7DJb#42A=O17`%@(sYU}i-AcpR`NrBON=^V|LOnW&&{<R+_H6}0q~D~
z>tH4SyaDhc*8cohsrR;bj*fSaRf`crq@0&UM29(4O$Bs93Wrd^ZK6``aA{fF$f88N
z9XC+}&m-;F*ky6<Fxi5$?ic76fW|9*4&>pLY}%!VZB{||ZKFY(RS-UZV=PP-3(6{R
z(-Dv;BEjPVWrmJsnb7%ju~5E4z_}Js(-39DS`;!AfFpH>2`BCthjgf6Db9O5i+HSb
z8$hyhL9yK{T_irSdj8NSY+?0H-#RU<(%QRxso34!-P1!2*L?Pn5*T#YeUz|^u3=C+
zz>x7cXOx1$T9t@Ya8U=RAkOSL(^IVUkCk&?)A3K^{Zn(N2i(496UpscBP5MXxk$d=
zm#MD?lFr<r;pTN6Ih~<IW-{7SJZ3a<Ie%d~jJ<yQIwWrxA!!-PFaKFBlh;5dzr|Vm
zCSN8Gj<$DERJ~ra;F+3&TV)Z#-xwi9L;Wcs;FK-rHM0D!%9@bN7$kQm`Z0p@&|K+5
zF&;1QZPQPTZ{H0Xeqwa@dvy2j8r}Ur^5y8t?!Ne{J?y7Xcb~1f`)l00FRT`Q+YoyG
z<;Ae}Ht)0F*@dUyUOY-;kP(_Ay9CWjb0?rDSqr_YB%O$b0KYCPS!5MR>a6ZiPHH4n
zhDC9U3v1R@MhvVZuVHy?HN!i+uJ4$&+Xc;eTJ&pGTk>-JI}?PUYAzV!2*T+tzLb1<
z<xZAMU%uVN$*}lub8r%#lo*^ocjY);pvcE=hYoskV<+XM<1Krpf~Et--k*6VhiO*Y
zjyD_IlGKfp0xePdK<X_nQjf2$cRX&ni=TYml)LyO*8YtzQOEm-woOp9B5ot(({U(5
zAvs8VghFK%=3!TsvIG-UMhrzqyaxR0jF(_8fE4e#n4?6(K~u+FC+8b3*OO<X2s&|=
zQYZUKN4*EBL#1i-4HupkDx<LkQ=qm)tJ>Mmoo}n;Cto*hYa@h9U+xAp85|oObY91-
zW0YxU9V(3QXmD;-)Uhz*NO)ue`4a#gWt66AggUJx790dy9OhtaC9gvrSQ42=Wk8C_
zm#GS>)U>!jWpzn^;%DMwsw42!KfcN1l}=q9@LGX4qIu&8&80!}<PAXcb}gIVWzhT;
ztbNv#&E5Tz!`<D*MWw}E)X)jjQi6%d+L8dJ-ZdHs1?kL@M7NBN`;77e1W}FBQ<?L;
zl%^})G@>gxP_<T?>bcZOhI%~>Y{rr3erT1+=v-Vu9GFZ+ohXp#oG39b+Hs(&V?9^`
znpgRm^9O;<1K+>>+?lZSgTLm0aO>A;F?;c10P889m>FO`cHLr8FzTtQ+ZqVek~%O#
zP!A1tOGgC<^N>Zz_AaP@q8f4+=yv&`+umehW`e2nLF>N#-X^Jc*-w6Sd)JotU3x2S
z?=AoOYv*q7tyYVk`W;i7^fs*hp8vTY?QL)C5yM>t=C90U$lAOuYusresZNGsnW*9x
zMU9AzI5M>y`Ud+`79mBBu;sNty@L*W#bwBN#fq@oo6zR~vm7afqa%s^qCS0lTtK1m
zA(O6tkpf#Qp+1N+)e5ZOB$ZPqNdwEr&bCF?@~JQTjw%25cd+*U37SXy`v-c0-KoKn
zbe;;2K?c@2&A=&nn+wqYbdqL7n;0*ogE#c?oB-y%Vni(MGdXM6gYkQNYn9rKBjy|t
zvtKS9{T=#5I{!@PIr`7F@xyWBZ~KmE8y-pQ6MW^<(n+*c$!dVtltn{RmFSq)26h~I
zizJL`JZ)WNWz1=g2%Y0a-L_>>fr#m9Vq8i@AG`}imZybbTg>!Xb9xqyS1gu_jY||S
zuU=D4Q2H!!HSf+0ATCTBi8;E$%LV<JSgG5_<9~)$%^&mBTu-{+e$o5yn%Qq0ZGFWN
z!&^^uQux}%Ag(W+K&eFY9wJ@F*oitU6n4c*>@MnqV)T(Pbh3idKgD<IQN`3R6PUId
zW4tv^rqwtw=dmvXAB5$Af-g_k7*qbY_x|2pbEe$dy5IfIsjYkBn-+_o@U(Mp?|An>
z4^hdeza23s_+wtt-hqyq+j$QAjvJ=#ud<3ej(P^-)QX2lpsN`5qAaeUvXu~sGNddJ
zniW?}&U^CUeP8()@AlOH3LS*eS77<z`D#amoL@U4gkRzbltGhhU5+>&p+Zfk2ZQ`X
zWh0BiHtXVyPRwgVp4CVScm|+zl;3K&lA0rXj2hX}BnEJxB_M53{kS}`V#(k!oSF5T
z(Oz*ynih%wdOF<;H+fe&kLR6hDw}t1{eRFwe8D&I001A02m}BC000301^_}s0stET
N0{{R300000007o0qmlps

literal 0
HcmV?d00001

diff --git a/src/umi_tools/umi_tools_prepareforrsem/test_data/test_dedup.sam b/src/umi_tools/umi_tools_prepareforrsem/test_data/test_dedup.sam
new file mode 100644
index 00000000..e9487b8b
--- /dev/null
+++ b/src/umi_tools/umi_tools_prepareforrsem/test_data/test_dedup.sam
@@ -0,0 +1,201 @@
+@HD	VN:1.6	SO:coordinate
+@SQ	SN:MT192765.1	LN:29829
+@RG	ID:1	LB:lib1	PL:ILLUMINA	SM:test	PU:barcode1
+@PG	ID:minimap2	PN:minimap2	VN:2.17-r941	CL:minimap2 -ax sr tests/data/fasta/sarscov2/GCA_011545545.1_ASM1154554v1_genomic.fna tests/data/fastq/dna/sarscov2_1.fastq.gz tests/data/fastq/dna/sarscov2_2.fastq.gz
+@PG	ID:samtools	PN:samtools	PP:minimap2	VN:1.11	CL:samtools view -Sb sarscov2_aln.sam
+@PG	ID:samtools.1	PN:samtools	PP:samtools	VN:1.11	CL:samtools sort -o sarscov2_paired_aln.sorted.bam sarscov2_paired_aln.bam
+@PG	ID:samtools.2	PN:samtools	PP:samtools.1	VN:1.20	CL:samtools view -h test_data/test_dedup.bam
+ERR5069949.29668	163	MT192765.1	121	60	150M	=	267	235	TATAATTAATAACTAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCTTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTTTTGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTT	AAA/E/EEEEEEEEEAEEEEEEEEE/</E/E/EE<E/EEAAEA/E/EE//EA/EEEEEA/AEEE/EEEEE/E/EA/EE/EEE<E/E///E<AEE<<EEE/<EEEAA///AE/6A///A/AE/EAEE</EAEAE///AA/EEAEE/AAEAA	s1:i:173	s2:i:0	RG:Z:1	NM:i:1	AS:i:290	de:f:0.0067	rl:i:0	cm:i:13	nn:i:0	tp:A:P	ms:i:290
+ERR5069949.29668	83	MT192765.1	267	60	89M	=	121	-235	CCTTGTCCCTGGTTACAACTAGAAACCACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGACGTGCTCGTACGTGGCTTTGGAG	E////6/E/EE/EE/<<///6EEE/////<AAA<A<A6AE/E/AE6A/EAEEEAEEEAEEEEEA/AEAE<EEEAEEE////6EEAA/AA	s1:i:173	s2:i:0	RG:Z:1	NM:i:3	AS:i:148	de:f:0.0337	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:148
+ERR5069949.114870	99	MT192765.1	643	60	150M	=	748	255	AAAGGAGCTGGTGGCCATAGTTACGGCGCCGATCTAAAGTCATTTGACTTAGGCGACGAGCTTGGCACTGATCCTTATGAAGATTTTCAAGAAAACTGGAACACTAAACATAGCAGTGGTGTTACCCGTGAACTCATGCGTGAGCTTAAC	AAAAA/EAEEEEEEAEEEEEEEEE/EEEAEEEEEEEEEEEEEEEEAEAEAEEAEEEEEEEEEEEEEEEEAEAA<EE</AAEEEEEEEAEEEAEEEEEEEEEEEEEEEEEEEEEAEEA<AEEEEEEA<<AAE<EEEEEAE<AAEAAAEAEE	s1:i:240	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:21	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.147998	163	MT192765.1	674	60	151M	=	919	339	ATCTAAAGTCATTTGACTTAGGCCACGAGCTTGGCACTGATCCTTATGAAGATTTTCAAGAAAACTGGAACACTAAACATAGCAGTGGTGTTACCCGTGAACTCATGCGTGAGCTTAACGGAGGGGCATACACTCGCTATGTCGATAACAA	AAAAAAE6EEE/EEE/E/EEA6E/EEE/AE/E//EEEEAA/E/E/EAEEEE/EEEEEEE<EE/E/A/A</<E</<AE<///A<AA<//E/AE/E/EEE/EEEEA//E/A</<AE/////E<AAE<EE//EA/<6/A</A//<AAAA<EE/A	s1:i:215	s2:i:0	RG:Z:1	NM:i:1	AS:i:292	de:f:0.0066	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:292
+ERR5069949.114870	147	MT192765.1	748	60	150M	=	643	-255	AAACATAGCAGTGGTGTTACCCGTGAACTCATGCGTGAGCTTAACGGAGGGGCATACACTCGCTATGTCGATAACAACTTCTGTGGCCCTGATGGCTACCCTCTTGAGTGCATTAAAGACCTTCTAGCACGTGCTGGTAAAGCTTCATGC	AEEAA<<<AAAE/E/AA<<<<<<<</AE<AE<A/E<AAEE<EEA<AEEE/E<A<EEEEA/A/EEAEEAEAEEEE/EAEEEEEEEE<EEAEEEEEEEAEAAEAAEEEEEEAAEEEAEEEEEEEEEEEEEEEE/EEAEAEEEEEEEAAAAAA	s1:i:240	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:17	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.147998	83	MT192765.1	919	60	94M	=	674	-339	TTTATTGACACTAAGAGGGGTGTATACTGCTGCCGTGAACATGAGCATGAAATTGCTTGGTACACGGAACGTTCTGAAAAGAGCTATGAATTGC	EE<EEE//A/EEE/A</E<AEAE<EEEA<6EE/EEE/A/EAEAE<//EEEE/EEE6EEE/E/EE/EEE/EEAAEEEEEEEEEEEEEEAEAAAAA	s1:i:215	s2:i:0	RG:Z:1	NM:i:0	AS:i:188	de:f:0	rl:i:0	cm:i:11	nn:i:0	tp:A:P	ms:i:188
+ERR5069949.155944	163	MT192765.1	978	60	150M	=	1023	195	GTACACGGAACGTTCTGAAAAGAGCTATGAATTGCAGACACCTTTTGAAATTAAATTGGCAAAGAAATTTGACACCTTCAATGGGGAATGTCCAAATTTTGTATTTCCCTTAAATTCCATAATCAAGACTATTCAACCAAGGGTTGAAAA	AAAA/EEEEEEAEEEEEEEEEEEE/EEEEEEEEEE/EAEEEEEEEEEEEEEEEAEEEAEE/AEEEEEEAAEEEEEEAEAEEEE/AEE/<EAE/E<EEA<<<AAEEAEEE<AA<EE/EAAEEEE<<<EEEA/AEAEE6</EEA<AEEE<<E	s1:i:183	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:17	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.155944	83	MT192765.1	1023	60	150M	=	978	-195	TGAAATTAAATTGGCAAAGAAATTTGACACCTTCAATGGGGAATGTCCAAATTTTGTATTTCACTTAAATTCCATAATCAAGACTATTCAACCAAGGGTTGAAAAGAAAAAGCTTGATGGCTTTATGGGTAGAATTCGATCTGTCTATCC	EA<EA/<A/6A/AEA/6/66/AAAEAEEE/EEA/6AAAAAAEE</AAEEEEAAEEEAA/EEE//A/EEEEE/AE/EEE6AEEEE/A/EAEEEEE/EEAEEEAE/AEA66AEEEEEEEEE<AEEEAEEEEEEEEE6EEEEEEEAEEAAAAA	s1:i:183	s2:i:0	RG:Z:1	NM:i:1	AS:i:290	de:f:0.0067	rl:i:0	cm:i:10	nn:i:0	tp:A:P	ms:i:290
+ERR5069949.184542	99	MT192765.1	1055	60	151M	=	1255	266	TCAATGGGGAATGTCCAAATTTTGTATTTCCCTTAAATTCCATAATCAAGACTATTCAACAAAGGGTTGAAAAGAAAAAGCTTGATGGCTTTATGGGTAGAATTCGATCTGTCTATCCAGTTGCGTCTCCAAATGAATGCAACCAAATGTG	AAAAAEEEEE/EA/E/EEE/EEAEE/E/EE/EEEE//AEE/E/EEE//EEE/</E/EE<<//EE/EE<EEEEEAE/E/EAAEEEAEAEE<E</EEEE/E//E<<<///E<//A<AE</<AEEEAAE///EE</EE//AA///<E</A<AEA	s1:i:155	s2:i:0	RG:Z:1	NM:i:2	AS:i:282	de:f:0.0132	rl:i:0	cm:i:12	nn:i:0	tp:A:P	ms:i:282
+ERR5069949.169513	99	MT192765.1	1098	60	92M	=	1098	92	AATCAAGACTATTCAACCAAGGGTTGAAAAGAAAAAGCTTGATGGCTTTATGGGTAGAATTCGATCTGTCTATCCAGTTGCGTCACCAAATG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EE6EEEE	s1:i:92	s2:i:0	RG:Z:1	NM:i:0	AS:i:184	de:f:0	rl:i:0	cm:i:11	nn:i:0	tp:A:P	ms:i:184
+ERR5069949.169513	147	MT192765.1	1098	48	92M	=	1098	-92	AATCAAGACTATTCAACCAAGGGTTGAAAAGAAAAAGCTTGATGGCTTTATGGGTAGAATTCGATCTGTCTATCCAGTTGCGTCACCAAATG	EEEEEEEEEEEEEEEEEEEEEEE/EEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:32	s2:i:92	RG:Z:1	NM:i:0	AS:i:184	de:f:0	rl:i:0	cm:i:3	nn:i:0	tp:A:P	ms:i:184
+ERR5069949.184542	147	MT192765.1	1255	60	66M	=	1055	-266	ACGTGCGATTTTGTTAAAGCCACTTGCGAATTTTGTGGCACTGAGAATTTGACTAAAGAAGGTGCC	E////E/A6EAEEE<AEE///A/A/6/EEE6AA//E/EEEAAAA6EE/A/6EAAA/EEAEAA/AAA	s1:i:155	s2:i:0	RG:Z:1	NM:i:1	AS:i:124	de:f:0.0152	rl:i:0	cm:i:8	nn:i:0	tp:A:P	ms:i:124
+ERR5069949.257821	163	MT192765.1	2833	60	140M	=	2834	140	GCCTATACAGTTGAACTCGGTACAGAAGTAAATGAGTTCGCCTGTGTTGTGGCAGATGCTGTCATAAAAACTTTGCAACCAGTATCTGAATTACTTACACCACTGGGCATTGATTTAGATGAGTGGAGTATGGCTACATA	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEEEEEEEEEAEEEEEE	s1:i:121	s2:i:0	RG:Z:1	NM:i:0	AS:i:280	de:f:0	rl:i:0	cm:i:17	nn:i:0	tp:A:P	ms:i:280
+ERR5069949.257821	83	MT192765.1	2834	49	139M	=	2833	-140	CCTATACAGTTGAACTCGGTACAGAAGTAAATGAGTTCGCCTGTGTTGTGGCAGATGCTGTCATAAAAACTTTGCAACCAGTATCTGAATTACTTACACCACTGGGCATTGATTTAGATGAGTGGAGTATGGCTACATA	A/AE<EE<EA</EAEAAA<AEEAEE/A/E<<E</E</EEEAAE/EE<E/EEEAEEEEEEE/AEEEEEEEEEEE/EEEE<EEEE/EE/EAEEE6EEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:121	s2:i:48	RG:Z:1	NM:i:0	AS:i:278	de:f:0	rl:i:0	cm:i:1	nn:i:0	tp:A:P	ms:i:278
+ERR5069949.309410	99	MT192765.1	3184	60	151M	=	3348	314	GAAGAAGATTGGTTAGATGATGATAGTCAACAAACTGTTGGTCAACAAGACGGCAGTGAGGACAATCAGACAACTACTATTCAAACAATTGTTGAGGTTCAACCTCAATTAGAGATGGAACTTACACCAGTTGTTCAGACTATTGAAGTGA	AAAAA//EEEEA6EEEAE</EEE/EEEEE/EE6EEEEEEEEEEEEEEEAEEAAEEEEEEEEEAEEEEEE/EEAEEEEEAEAEEE/EEAEEE<AEEEAA////EEEEEEEEA//A/EE/EAAEA/AE<EE/E//E/</AEAEAE/AEA/AEA	s1:i:274	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:22	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.309410	147	MT192765.1	3348	60	150M	=	3184	-314	TTATTTAAAACTTACTGACAATGTATACATTAAAAATGCAGACATTGTGGAAGAAGCTAAAAAGGTAAAACCAACAGTGGTTGTTAATGCAGCCAATGTTTACCTTAAACATGGAGGAGGTGTTGCAGGAGCCTTAAATACGGCTACTAA	E//EEAEA<<EAAE/AAAAEAAAAEA</A/<6/E/<A<//AE/EEAAE<EEEAEEEEEEAEE/EEAEEEEEE/<E/EEE6EEAE/<EE//E</</EE/EEAAEE/EAA/EEEEAEEEEE///EA/EEEEEEEE//E66EE/E/EEA/AAA	s1:i:274	s2:i:0	RG:Z:1	NM:i:1	AS:i:290	de:f:0.0067	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:290
+ERR5069949.376959	99	MT192765.1	4105	60	151M	=	4190	235	GCTCCATATATAGTGGGTGATGTTGTTCAAGAGGGTGTTTTAACTGCTGTGGTTATACCTACTAAAAAGGCTGGTGGCACTACTGAAATGCTAGCGAAAGCTTTGAGAAAAGTGCCAACAGACAATTATATAACCACTTACCCGGGTCAGG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEAAAAAEEEEEEEEEAEEAEEEEEEEEEEEEAAAEEAEA	s1:i:224	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:22	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.366975	163	MT192765.1	4166	60	106M	=	4166	106	CTAAAAAGGCTGGTGGCACTACTGAAATGCTAGCGAAAGCTTTGAGAAAAGTGCCAACAGACAATTATATAACCACTTACCCGGGTCAGGGTTTAAATGGTTACAC	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE	s1:i:96	s2:i:0	RG:Z:1	NM:i:0	AS:i:212	de:f:0	rl:i:0	cm:i:9	nn:i:0	tp:A:P	ms:i:212
+ERR5069949.366975	83	MT192765.1	4166	59	106M	=	4166	-106	CTAAAAAGGCTGGTGGCACTACTGAAATGCTAGCGAAAGCTTTGAGAAAAGTGCCAACAGACAATTATATAACCACTTACCCGGGTCAGGGTTTAAATGGTTACAC	EEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEE6EEEEEEEEEEAEEEEEEEEEE<AEAEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:96	s2:i:0	RG:Z:1	NM:i:0	AS:i:212	de:f:0	rl:i:0	cm:i:4	nn:i:0	tp:A:P	ms:i:212
+ERR5069949.376959	147	MT192765.1	4190	60	150M	=	4105	-235	AAATGCTAGCGAAAGCTTTGAGAAAAGTGCCAACAGACAATTATATAACCACTTACCCGGGTCAGGGTTTAAATGGTTACACTGTAGAGGAGGCAAAGACAGTGCTTAAAAAGTGTAAAAGTGCCTTTTACATTCTACCATCTATTATCT	EEAAAEAAA<AAAAAEEEA<EEEEAE<<<AAAAAAA<A<AEEEEE<EEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAAAAA	s1:i:224	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:15	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.465452	99	MT192765.1	4695	60	151M	=	4827	282	ACCTGATGCTGTTACAGCGTATAATGGTTATCTTACTTCTTCTTCTAAAACACCTGAAGAACATTTTATTGAAACCATCTCACTTGCTGGTTCTTATAAAGATTGGTCCTATTCTGGACAATCTACACAACTAGGTATAGAATTTCTTAAG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEE/EEAEEEE/EEEEEEEEEEAEEEEEEEEEEEEEEE<EEEEAAAEAEEEEEEAA6AAEEEEEA<EEEEE</EEAEE/EE	s1:i:261	s2:i:0	RG:Z:1	NM:i:1	AS:i:292	de:f:0.0066	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:292
+ERR5069949.465452	147	MT192765.1	4827	60	150M	=	4695	-282	AGGTATAGAATTTCTTAAGAGAGGTGATAAAAGTGTATATTACACTAGTAATCCTACCACATTCCACCTAGATGGTGAAGTTATCACCTTTGACAATCTTAAGACACTTCTTTCTTTGAGAGAAGTGAGGACTATTAAGGTGTTTACAAC	AAAEEEEEEEEEEEEAA/<EA<AA/EAEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEAEEEEEEE/EEA/EEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:261	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.479807	163	MT192765.1	4968	60	150M	=	5123	305	GTTTACAACAGTAGACAACATTAACCTCCACACGCAAGTTGTGGACATGTCAATGACATATGGACAACAGTTTGGTCCAACTTATTTGGATGGAGCTGATGTTACTAAAATAAAACCTCATAATTCACATGAAGGTAAAACATTTTATGT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<AEEEEEEEEEE<EEE<AEE/EEEEEEEEEAEEE<AA/EAA<AEEEEEEEAEEAAA	s1:i:280	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.479807	83	MT192765.1	5123	60	150M	=	4968	-305	CTAATGATGACACTCTACGTGTTGAGGCTTTTGAGTACTACCACACAACTGATCCTAGTTTTCTGGGTAGGTACATGTCAGCATTAAATCACACTAAAAAGTGGAAATACCCACAAGTTAATGGTTTAACTTCTATTAAATGGGCAGATA	AA/EEEEAAAEAEEEAAAEEA/AAEAAEE/AAAEAAAAEEEEEEEEEEEAEEEEEEEEEEEEEAEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEAEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:280	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:23	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.501486	163	MT192765.1	5355	60	150M	=	5423	214	TTACAGAGCAAGGGCTGGTGAAGCTGCTAACTTTTGTGCACTTATCTTAGCCTACTGTAATAAGACAGTAGGTGAGTTAGGTGATGTTAGAGAAACAATGAGTTACTTGTTTCAACATGCCAATTTAGATTCTTGCAAAAGAGTCTTGAA	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEAA/EEE/<AEEAAEAEA</EEEAEAAAAAEE	s1:i:207	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.501486	83	MT192765.1	5423	60	146M	=	5355	-214	TAGGTGAGTTAGGTGATGTTAGAGAAACAATGAGTTACTTGTTTCAACATGCCAATTTAGATTCTTGCAAAAGAGTCTTGAACGTGGTGTGTAAAACTTGTGGACAACAGCAGACAACCCTTAAGGGTGTAGAAGCTGTTATGTAC	EAAAAEAEEEE6E<AEEEEEEEE<EEEEEEAAEE/EEEEEE/<EEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:207	s2:i:0	RG:Z:1	NM:i:0	AS:i:292	de:f:0	rl:i:0	cm:i:11	nn:i:0	tp:A:P	ms:i:292
+ERR5069949.532979	99	MT192765.1	5568	60	149M	=	5621	204	CATGGGCACACTTTCTTATGAACAATTTAAGAAAGGTGTTCAGATACCTTGTACGTGTGGTAAACAAGCTACAAAATATCTAGTACAACAGGAGTCACCTTTTGTTATGATGTCAGCACCACCTGCTCAGTATGAACTTAAGCATGGTA	AAAAAEEEEEEEEEEEEEEEEEEE/EEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/A/</EEE<EEE<<AEEEEEA/AA/AE/EE/EEEEAEA</EAEEE<<AEEEEE	s1:i:196	s2:i:0	RG:Z:1	NM:i:0	AS:i:298	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:298
+ERR5069949.540529	163	MT192765.1	5570	60	150M	=	5659	238	TGGGCACACTTTCTTATGAACAATTTAAGAAAGGTGTTCAGATACCTTGTACGTGTGGTAAACAAGCTACAAAATATCTAGTACAACAGGAGTCACCTTTTGTTATGATGTCAGCACCACCTGCTCAGTATGAACTTAAGCATGGTACAT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAEEEEAEAEEEEEEEEEEAAEEEEEEEEEEAEEEEEEE<EA<AEEEAEEEAEEEEAEEAEEEAEEEEEEEAAAEEE	s1:i:226	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.532979	147	MT192765.1	5621	60	151M	=	5568	-204	CGTGTGGTAAACAAGCTACAAAATATCTAGTACAACAGGAGTCACCTTTTGTTATGATGTCAGCACCACCTGCTCAGTATGAACTTAAGCATGGTACATTTACTTGTGCTAGTGAGTACACTGGTAATTACCAGTGTGGTCACTATAAACA	AE/E/<E/AA/EA<EA<EEEEAEA<AEAEE/AEAAA</<EEEE<AEEAEEE/EE/AEA<AE<EEEEEEEE/EEEEEEAE/EEEEEEEEEEE/EEAEEEEEEEEEEE/EEEAE6EEE/EEEEEEEAEEEEAEEEEEEEEEEEEEEEEAAAAA	s1:i:196	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:11	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.540529	83	MT192765.1	5659	60	149M	=	5570	-238	GAGTCACCTTTTGTTATGATGTCAGCACCACCTGCTCAGTATGAACTTAAGCATGGTACATTTACTTGTGCTAGTGAGTACACTGGTAATTACCAGTGTGGTCACTATAAACATATAACTTCTAAAGAAACTTTGTATTGCATAGACGG	AEAEAEE<EE<AAEA<EEE/<EE6A<AEEE<EE<EEEE<EEE/E<AEEE<E/<EAEEEEEEE<EEEAEEEAEAEAAEEEEEEEEEEEEEEAEEEEEEEEEE/EA<E/EAEEEEEEAEE6EEEEEEEAEEEAEE/AAEEEE/A/EAAAAA	s1:i:226	s2:i:0	RG:Z:1	NM:i:0	AS:i:298	de:f:0	rl:i:0	cm:i:16	nn:i:0	tp:A:P	ms:i:298
+ERR5069949.573706	99	MT192765.1	5697	60	150M	=	5784	236	GTACGAACTTAAGCATGGTACATTTACTTGTGCTAGTGAGTACACTGGTAATTACCAGTGTGGTCACTATAAACATATATCTTCTAAAGAAACTTTGTATTGCATAGACGGTGCTTTACTTACAAAGTCCTCAGAATACAAAGGTCCTAT	AAAAA6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEAAEEEAEEEEEEEEEEE	s1:i:214	s2:i:0	RG:Z:1	NM:i:2	AS:i:282	de:f:0.0133	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:282
+ERR5069949.573706	147	MT192765.1	5784	60	149M	=	5697	-236	AGAAACTTTGTATTGCATAGACGGTGCTTTACTTACAAAGTCCTCAGAATACAAAGGTCCTATTACGGATGTTTTCTACAAAGAAAACAGTTACACAACAACCATAAAACCAGTTACTTATAAATTGGATGGTGTTGTTTGTACAGAAA	AA<E<EEEEEEEEA<AEEEAEEAA<<EEE<AEEEEEEAEAAAAEAEAEEEEEEAEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:214	s2:i:0	RG:Z:1	NM:i:0	AS:i:298	de:f:0	rl:i:0	cm:i:13	nn:i:0	tp:A:P	ms:i:298
+ERR5069949.576388	163	MT192765.1	5798	60	77M	=	5798	77	GCATAGACGGTGCTTTACTTACAAAGTCCTCAGAATACAAAGGTCCTATTACGGATGTTTTCTACAAAGAAAACAGT	AAAAA6EEAEEEEEAEEAEEAEEEEEEA6EEEEAEEAEEEEE6EEEEEEAEEEEA///A<<EEEEEEEEEAEEEEEE	s1:i:62	s2:i:0	RG:Z:1	NM:i:0	AS:i:154	de:f:0	rl:i:0	cm:i:10	nn:i:0	tp:A:P	ms:i:154
+ERR5069949.576388	83	MT192765.1	5798	50	77M	=	5798	-77	GCATAGACGGTGCTTTACTTACAAAGTCCTCAGAATACAAAGGTCCTATTACGGATGTTTTCTACAAAGAAAACAGT	EA/AEEE/<EEEEEEEEEEEAA<EEEEEEEEEEEEEEEEEEEEEAEEEEEAEEEAEE6/EEEAEEEEEEEEEA6AAA	s1:i:62	s2:i:0	RG:Z:1	NM:i:0	AS:i:154	de:f:0	rl:i:0	cm:i:1	nn:i:0	tp:A:P	ms:i:154
+ERR5069949.611123	163	MT192765.1	6481	60	125M	=	6481	125	ATTATACTTAAACCAGCAAATAATAGTTTAAAAATTACAGAAGAGGTTGGCCACACAGATCTAATGGCTGCTTATGTAGACAATTCTAGTCTTACTATTAAGAAACCTAATGAATTATCTAGAGT	AAAAAEEEEEA6EEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEE<EEEEEEEAEEEEEEEEEEEEAEEEEEEEEEEE/EEEEEEEA/AAEAAEAAEAE	s1:i:117	s2:i:0	RG:Z:1	NM:i:0	AS:i:250	de:f:0	rl:i:0	cm:i:14	nn:i:0	tp:A:P	ms:i:250
+ERR5069949.611123	83	MT192765.1	6481	48	125M	=	6481	-125	ATTATACTTAAACCAGCAAATAATAGTTTAAAAATTACAGAAGAGGTTGGCCACACAGATCTAATGGCTGCTTATGTAGACAATTCTAGTCTTACTATTAAGAAACCTAATGAATTATCTAGAGT	EEEAEEEEEEEEEEEA<EEEAEEEEA/EEEEEEEEEAEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:50	s2:i:117	RG:Z:1	NM:i:0	AS:i:250	de:f:0	rl:i:0	cm:i:8	nn:i:0	tp:A:P	ms:i:250
+ERR5069949.651338	163	MT192765.1	7629	60	149M	=	7745	254	ATTCTGTGCTGGTAGTACATTTATTAGTGATGAAGATGCGAGAGACTTGTCACTACAGTTTAAAAGACCAATAAATCCTACTGACCAGTCTTCTTACATCGTTGATAGTGTTACAGTGAAGAATGGTTCCATCCATCTTTACTTTGATA	AAAAAE/EAEEE/AEAEEE/EEEAAEEEEAEEEEE/EEEEAEEEEEEAEE/EEEE/EEE</EE/AEAE/<E/EEAEE<EEEE//AEEEEEE<EEAEE/EEE//E/<EE<A<A/EAA<AA/AEEA//A<A/A<A<6A6/AEE/AEEA<AE	s1:i:223	s2:i:0	RG:Z:1	NM:i:1	AS:i:288	de:f:0.0067	rl:i:0	cm:i:16	nn:i:0	tp:A:P	ms:i:288
+ERR5069949.651338	83	MT192765.1	7745	60	4S138M	=	7629	-254	ACTCTGAAGAATGGTTCCATCCATCTTTACTTTGTTAAAGCTGGTCAAAAGACTTATGAAAGACATTCTCTCTCTCATTTTGTTAACTTAGACAACCTGAGAGCTAATAACACTAAAGGTTCATTGCCTATTAATGTTATAG	A///A/6/<EEEA//EE/EE<AEEE/<A/EAE<</A/A<EEE/E<EEEEE<</EEEA<E/EEAAEEEEAE/EEEEEEEEEEEEEE/E/A/EE//<AE/EEEAEEA</EE/AEEEE/AEEEEAEEEEEEEEEAEAEEEAAAAA	s1:i:223	s2:i:0	RG:Z:1	NM:i:1	AS:i:266	de:f:0.0072	rl:i:0	cm:i:13	nn:i:0	tp:A:P	ms:i:266
+ERR5069949.686090	163	MT192765.1	7975	60	151M	=	8097	272	GATCAGGCATTAGTGTCTGATGTTGGTGATAGTGCGGAAGTTGCAGTTAAAATGTTTGATGCTTACGTTAATACGTTTTCATCAACTTTTAACGTACCAATGGAAAAACTCAAAACACTAGTTGCAACTGCAGAAGCTGAACTTGCAAAGA	AAAAAEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEAEEEEEAEEEEEEEE/EEEEEEEEEEEEEEAEEEEEAEEEE<AEE/EEEEEEEAAAAEEEEEEEEEEEAEAEEEEEAEEEEA	s1:i:252	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:27	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.686090	83	MT192765.1	8097	60	150M	=	7975	-272	TGCAACTGCAGAAGCTGAACTTGCAAAGAATGTGTCCTTAGACAATGTCTTATCTACTTTTATTTCAGCAGCTCGGCAAGGGTTTGTTGATTCAGATGTAGAAACTAAAGATGTTGTTGAATGTCTTAAATTGTCACATAAATCTGACAT	EEEAEAEEEEEEEEEEEEEAEEEEEAEEE<EAEE/EEEEEEEEAEEE6EEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEE/EEEEAEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:252	s2:i:0	RG:Z:1	NM:i:1	AS:i:290	de:f:0.0067	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:290
+ERR5069949.786562	163	MT192765.1	8904	60	150M	=	9096	343	GCATTTCTTACCTAGAGTTTTTAGTGCAGTTGGTAACATCTGTTACACACCATCAAAACTTATAGAGTACACTGACTTTGCAACATCAGCTTGTGTTTTGGCTGCTGAATGTACAATTTTTAAAGATGCTTCTGGTAAGCCAGTACCATA	AAAAAEEEEEEEEAEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEAEEEEEEEEEEEEEEEAEEEEEEE<EEAEEEEE<EEEEEEEEEAEEEAEEEAEEAA6A<EEEEAAEEEEAA/AEEEEEE/EEEEEEE	s1:i:272	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.786562	83	MT192765.1	9096	60	151M	=	8904	-343	AAGTTTACGCCCTGACACACGTTATGTGCTCATGGATGGCTCTATTATTCAATTTCCTAACACCTACCTTGAAGGTTCTGTTAGAGTGGTAACAACTTTTGATTCTGAGTACTGTAGGCACGGCACTTGTGAAAGATCAGAAGCTGGTGTT	AEAE<AE/AAAEEAAEE<EEAEEEEAEEEEAAA/AEEEAEAEAEEEEEEEEAAEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAA6A	s1:i:272	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:22	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.856527	99	MT192765.1	10118	60	97M1D54M	=	10199	233	CAACTACACTTAACGGTCTTTGGCTTGATGACGTAGTTTACTGTCCAAGACATGTGATCTGCACCTCTGAAGACATGCTTAACCCTAATTATGAAGATTACTCATTCGTAAGTCTAATCATAATTTCTTGGTACAGGCTGGTAATGTTCAA	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAAEEEEEEEAEEEE<AAAEEAEEEAEEEE/EEEEEAEEEAEAEE//<//AE//E<E//<//EA/A<EA6EE//E/AA/6A//6/AEE<EE6AE/AE	s1:i:197	s2:i:0	RG:Z:1	NM:i:1	AS:i:288	de:f:0.0066	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:288
+ERR5069949.870926	99	MT192765.1	10118	60	149M	=	10245	278	CAACTACACTTAACGGTCTTTGGCTTGATGACGTAGTTTACTGTCCAAGACATGTGATCTGCACCTCTGAAGACATGCTTAACCCTAATTATGAAGATTTACTCATTCGTAAGTCTAATCATAATTTCTTGGTACAGGCTGGTAATGTT	AAAAAEEAEEEEEEEEEAEEEEEEAEEEEEEEEE/EEEEEEEAEEEEEEEAEEA<AE/AEEEEEEEEE<EEEEEEE/AEAEEEEAAEEEEAAEEEEEAEEEEEEEEAEEAAEEEEEAEAA/EEEAAEEEAAAA/A/EA/E/AEEEE/EE	s1:i:262	s2:i:0	RG:Z:1	NM:i:0	AS:i:298	de:f:0	rl:i:0	cm:i:22	nn:i:0	tp:A:P	ms:i:298
+ERR5069949.856527	147	MT192765.1	10199	60	16M1D135M	=	10118	-233	ACCCTAATTATGAAGATTACTCATTCGTAAGTCTAATCATAATTTCTTGGTACAGGCTGGTAATGTTCAACTAAGTGTTATTGGACATTCTATGCAAAATTGTGTACTTAAGCTTAAGGTTGATACAGCCAATCCTAAGACACCTAAGTAT	A///6A/A6<//EA/EEEE<E<A<A6/<<A<A/A<EE6<EEEEE<A/AEEEE/AAA<E/E<EAEEEEEEEEE/A6/EEE/EE//E6EEEAE/EE<<AE/<EAEEEAAAEEAEEEEEE<</E<EEEEEEAEEEEEEEEAEEEE/EAAAAAAA	s1:i:197	s2:i:0	RG:Z:1	NM:i:3	AS:i:268	de:f:0.0197	rl:i:0	cm:i:12	nn:i:0	tp:A:P	ms:i:268
+ERR5069949.885966	99	MT192765.1	10230	60	79M	=	10277	118	GTCTAATCATAATTTCTTGGTACAGGCTGGTATTGTTCATCTCAGGGTTATTGGACATTCTATGCAAAATTGTGTACTT	AAA//E/EAA/E//E//E//E/E//AE/A/E//EAEA///AE//E///E/EEE6EEEAEEA///E/AEE/EAEE/E//E	s1:i:86	s2:i:0	RG:Z:1	NM:i:2	AS:i:138	de:f:0.0253	rl:i:0	cm:i:5	nn:i:0	tp:A:P	ms:i:138
+ERR5069949.870926	147	MT192765.1	10245	60	151M	=	10118	-278	CTTGGTACAGGCTGGTAATGTTCAACTCAGGGTTATTGGACATTCTATGCAAAATTGTGTACTTAAGCTTAAGGTTGATACAGCCAATCCTAAGACACCTAAGTATAAGTTTGTTCGCATTCAACCAGGACAGACTTTTTCAGTGTTAGCT	AAAAE////<6/EA6/</EE/EEEEAAAA<AE/EAA</</</</EEE</E/EEEAEE<E<<EEEE<EEEEEEEA//EEEA</EA<EA/EAA/EEEEEEEAEEEAEEEEEEEEEE/E/A/EEEEEEEE/EEEEEEEEEEEE6EEEE/6AAAA	s1:i:262	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:22	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.885966	147	MT192765.1	10277	60	62M2D7M	=	10230	-118	TTATTGGACATTCTATGCAAAATTGTGTACTTAAGCTTAAGGTTGATACAGCCAATCCTAAGACCTAAG	6/E//AE</E/E/AAE/EAAEE/E/E/EA6EAEEEAE/EAA///AE/AEE/EEE6EEEEEAEE6AA//A	s1:i:86	s2:i:0	RG:Z:1	NM:i:2	AS:i:128	de:f:0.0143	rl:i:0	cm:i:8	nn:i:0	tp:A:P	ms:i:124
+ERR5069949.937422	99	MT192765.1	10422	60	85M32D66M	=	10591	320	TTACCAATGTGCTATGAGGCCCAATTTCACTATTAAGGGTTCATTCCTTAATGGTTCATGTGGTAGTGTTGGTTTTAACATAGATTATGGAATTACCAACTGGAGTTCATGCTGGCACAGACTTAGAAGGTAACTTTTATGGACCTTTTGT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEEEEEEEEEEA<E<EE/EAE/E//EE/EEEEEEEAEEE<E/EEE/EEA<EEE</66EE<A/AA<EEEA<EAE/A//AEEAEE///<A<EEEEEEA	s1:i:261	s2:i:0	RG:Z:1	NM:i:32	AS:i:246	de:f:0.0066	rl:i:0	cm:i:23	nn:i:0	tp:A:P	ms:i:226
+ERR5069949.919671	163	MT192765.1	10467	60	150M	=	10501	185	CCTTAATGGTTCATGTGGTAGTGTTGGTTTTAACATAGATTATGACTGTGTCTCTTTTTGTTACATGCACCATATGGAATTACCAACTGGAGTTCATGCTGGCACAGACTTAGAAGGTAACTTTTATGGACCTTTTGTTGACAGGCAAAC	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAE/EEEEEEAEEEEEEEEAEEEEEEEEEEEAEEEEEEEAEEEEAEEEEAEEEEE6EEEEEEEAAEAEEEEEEE<EEEEEEE6AAEEAEEEAA6AEEAAAAAEEAAEEEAEAEEE	s1:i:184	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:24	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.919671	83	MT192765.1	10501	60	151M	=	10467	-185	ATAGATTATGACTGTGTCTCTTTTTGTTACATGCACCATATGGAATTACCAACTGGAGTTCATGCTGGCACAGACTTAGAAGGTAACTTTTATGGACCTTTTGTTGACAGGCAAACAGCACAAGCAGCTGGTACGGACACAACTATTACAG	EEEEEEEEAAEAAAEEAA6AEEEEEEEEAEEAAAAE/AEEEAEEEAEEEAEEEEEEEEEEEEEEEEEEEEAAEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAAAAA	s1:i:184	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:9	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.937422	147	MT192765.1	10591	60	151M	=	10422	-320	TATGGACCTTTTGTTGACAGGCAAACAGCACAAGCAGCTGGTACGGACACAACTATTACAGTTAATGTTTTAGCTTGGTTGTACGCTGCTGTTATAAATGGAGACAGGTGGTTTCTCAATCGATTTACCACAACTCTTAATGACTTTAACC	AA/A<EE/EEEAAE<EAA/AE6AAAAAAAAA<AAAAEAEEE/AEEEA<AE<AEAEEAAEAEEEEEEEAEEEEEEEEEE//EEAEEE<EEEAEEEEEEAAAAAEEEEEE/EE/6EEEEEEEEEEEEEEEEEEEEEEEEAEEEEAEEEAAAAA	s1:i:261	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:22	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.973930	163	MT192765.1	10924	60	112M	=	10957	112	CCTTTTGATGTTGTTAGACAATGCTCAGGTGTTACTTTCCAAAGTGCAGTGAAAAGAACAATCAAGGGTACACACCACTGGTTGTTACTCACAATTTTGACTTCACTTTTAG	AAAAAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEAE<EAEEEEEEAEEAEE	s1:i:101	s2:i:0	RG:Z:1	NM:i:0	AS:i:224	de:f:0	rl:i:0	cm:i:13	nn:i:0	tp:A:P	ms:i:224
+ERR5069949.973930	83	MT192765.1	10957	50	79M	=	10924	-112	ACTTTCCAAAGTGCAGTCAAAAGAACAATCACGGGTACACACCACTGGTTGTTACTCACAATTTTGACTTCACTTTTAG	<////E/EE/E//E/<//E/E//A/6EA/EE/EE///E/EAEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:101	s2:i:0	RG:Z:1	NM:i:2	AS:i:138	de:f:0.0253	rl:i:0	cm:i:1	nn:i:0	tp:A:P	ms:i:138
+ERR5069949.986441	99	MT192765.1	11007	60	119M	=	11104	247	GTTACTCACAATTTTGACTTCACTTTTAGTCTTAGTCCAGAGTACTCAATGGTCTTTGTTCTTTTTTTTGTATGAAAATGCCTTTTTACCTTTTGCTATGGGTATTATTGCTATGTCTG	AAAAAEAEEEEEEE/EEE/EEEEEAEEEEEEEEEEEEEEEEEEEEE</EAAEA/EEEEEEEEAEAAEEEEEEEEEEEEE/E//<EAE/6///EE//E/EEE///E<EEEEA</A<<//<	s1:i:200	s2:i:0	RG:Z:1	NM:i:1	AS:i:228	de:f:0.0084	rl:i:0	cm:i:9	nn:i:0	tp:A:P	ms:i:228
+ERR5069949.986441	147	MT192765.1	11104	60	150M	=	11007	-247	ATGGGTATTATTGCTATGTCTGCTTTTGCAATGATGTTTGTCAAACATAAGCATGCATTTCTCTGTTTGTTTTTGTTACCTTCTCTTGCCACTGTAGCTTATTTTAATATGGTCTATATGCCTGCTAGTTGGGTGATGCGTATTATGACA	A6A<AEEEEE<E<EAEAAEA<AAEEAEA</EAEEA<E/E/E/EEEEAEAA/<EAAAEAEEE/EEEEEEEAEEE/EAEAE/AEAAA/EAEEEEEEAEAEEEEEEEAEAEEEEE/EAEEEEEEAEEEEEAEEEEEEEEAEEEEEEEEAAAAA	s1:i:200	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:22	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1014693	99	MT192765.1	11215	60	150M	=	11215	150	GTCTATATGCCTGCTAGTTGGGTGATGCGTATTATGACATGGTTGGATATGGTTGATACTAGTTTGTCTGGTTTTAAGCTAAAAGACTGTGTTATGTATGCATCAGCTGTAGTGTTACTAATCCTTATGACAGCAAGAACTGTGTATGAT	AAAAAAEEAEEE6EAE//E/EEE6AEAA/EAEAEE6/E//EAE/EEEEAEE/EEE/EAEEEEEAE/EEEEEAEEEEEAAEEAEEE/AE/EAEAEEEEEEEEEEEEEE/AE/E/E/<<<AA<E<AEE</EEEEA6<AEEAAAA//A//EEE	s1:i:136	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1014693	147	MT192765.1	11215	48	150M	=	11215	-150	GGCTATATGCCTGCTAGTTGGGTGATGCGTATTATGACATGGTTGGATATGGTTGATACTAGTTTGTCTGGTTTTAAGCTAAAAGACTGTGTTATGTATGCATCAGCTGTAGTGTTACTAATCCTTATGACAGCAAGAACTGTGTATGAT	A/<EEEAA<<AA<AEAE<6<A<AA<EA<///EAEEE<AAEAA/EA6/EEEEE/E/EE/AEAEAEEE<AEEEEEEE6<AAEEEEE<EEEAEEEEEEAAEAEAEEEAAEEEEEEEEEE/EEEEEEEEE/EEEEEEEEAEE/EAEEEEAAAAA	s1:i:33	s2:i:136	RG:Z:1	NM:i:1	AS:i:296	de:f:0.0067	rl:i:0	cm:i:3	nn:i:0	tp:A:P	ms:i:296
+ERR5069949.1020777	163	MT192765.1	11217	60	122M	=	11217	122	CTATATGCCTGCTAGTTGGGTGATGCGTATTATGACATGGTTGGATATGGTTGATACTAGTTTGTCTGGTTTTAAGCTAAAAGACTGTGTTATGTATGCATCAGCTGTAGTGTTACTAATCC	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEAEEEEEEAEEEAAEEEEEEEEEAEEEEA	s1:i:110	s2:i:0	RG:Z:1	NM:i:0	AS:i:244	de:f:0	rl:i:0	cm:i:15	nn:i:0	tp:A:P	ms:i:244
+ERR5069949.1020777	83	MT192765.1	11217	50	122M	=	11217	-122	CTATATGCCTGCTAGTTGGGTGATGCGTATTATGACATGGTTGGATATGGTTGATACTAGTTTGTCTGGTTTTAAGCTAAAAGACTGTGTTATGTATGCATCAGCTGTAGTGTTACTAATCC	EEEEA6AAAA6E/AA6AAAE/EEA<EE<AEEEAE<EAEAEAEAE<EEEEE/AEEAAEEEEAEEEEEEEE/EEEEE/EEEEEEEEEEEEE6EEEEEE/EEEEEE<EEEAEE6E6EEEEAAAAA	s1:i:110	s2:i:41	RG:Z:1	NM:i:0	AS:i:244	de:f:0	rl:i:0	cm:i:1	nn:i:0	tp:A:P	ms:i:244
+ERR5069949.1066259	99	MT192765.1	11337	60	147M	=	11480	294	CCTTATGACAGCAAGAACTGTGTATGATGATGGTGCTAGGAGAGTGTGGACACTTATGAATGTCTTGACACTCGTTTATAAAGTTTATTATGGTAATGCTTTAGATCAAGCCATTTCCATGTGGGCTCTTATAATCTCTGTTACTTC	AAAAAEAEEAEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEAEEEEEEEEEEEEEEEEE/EAEEEEEE/6EEEEEEEEEEAEEAEEE/EE/AEEAEEEEEAEEEA/EEAAEAE<AEEAEEEAEAEEEAEAEEAE/AEEEEAEEEEAEA	s1:i:272	s2:i:0	RG:Z:1	NM:i:0	AS:i:294	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:294
+ERR5069949.1062611	163	MT192765.1	11427	60	151M	=	11454	178	TGGTAATGCTTTAGATCAAGCCATTTCCATGTGGGCTCTTATAATCTCTGTTACTTCTAACTACTCAGGTGTAGTTACAACTGTCATGTTTTTGGCCAGAGGTATTGTTTTTATGTGTGTTGAGTATTGCCCTATTTTCTTCATAACTGGT	AAAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAAEEE<AAEEEEEEEEAAEEAAA6AEEEEEEEEEEEEEEEEEA	s1:i:167	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1067032	163	MT192765.1	11434	60	150M	=	11481	197	GCTTTAGATCAAGCCATTTCCATGTGGGCTCTTATAATCTCTGTTACTTCTAACTACTCAGGTGTAGTTACAACTGTCATGTTTTTGGCCAGAGGTATTGTTTTTATGTGTGTTGAGTATTGCCCTATTTTCTTCATAACTGGTAATACA	AAAAAAEEEE666<EEEE/E/EEEAEAEEEE/EEEEEEEE/AEEEAEE/<AAEE<EAEEEA/AEEE/EAEEEE<E<AEEAEEEE<<//EEE/EEE<EEE<A/A/A<EEE///A/6<A<AE<//<EEEE6</<AAAAAAE<A//<</<A/E	s1:i:171	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:16	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1062611	83	MT192765.1	11454	60	151M	=	11427	-178	CATGTGGGCTCTTATAATCTCTGTTACTTCTAACTACTCAGGTGTAGTTACAACTGTCATGTTTTTGGCCAGAGGTATTGTTTTTATGTGTGTTGAGTATTGCCCTATTTTCTTCATAACTGGTAATACACTTCAGTGTATAATGCTAGTT	AEEAAEEEAAEEEEAEAEEEAEEEEEEAEEEEEEEEEEEEEA<EEEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEEEAEEA6EEEEEEEEAEA/EEEEAEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:167	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:5	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1066259	147	MT192765.1	11480	60	151M	=	11337	-294	CTTCTAACTACTCAGGTGTAGTTACAACTGTCATGTTTTTGGCCAGAGGTATTGTTTTTATGTGTGTTGAGTATTGCCCTATTTTCTTCATAACTGGTAATACACTTCAGTGTATAATGCTAGTTTATTGTTTCTTAGGCTATTTTTGTAC	EE//AAEEEEEEEAEE<AEEA6A<AEEAEEAAAEAA/EEEA<AA<EA<E/A//EA/EEEEEEEEEE/<AAE/AEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEAEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEAAAAA	s1:i:272	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1067032	83	MT192765.1	11481	60	150M	=	11434	-197	TTCTAACTACTCAGGTGTAGTTACAACTGTCATGTTTTTGGCCAGAGGTATTGTTTTTATGTGTGTTGAGTATTGCCCTATTTTCTTCATAACTGGAAATACACTTCAGTGTATAATGCTAGTTTATTGTTTCTTAGGCTATTTTTGTAC	EE<EAAE/EEAEE/<<AEEEAA<6</E/AAA/EE/EEEAAEEEEEE<A///AAEEEE/EE<EE/A6EAE//EEE6AEEEA6EEEEEEAEEEEAAAE/EEEEAE//EE<EEEE/EEEEEEEE/E/EEEEEEEEEEEEAE/EAEEE/AAAAA	s1:i:171	s2:i:0	RG:Z:1	NM:i:1	AS:i:290	de:f:0.0067	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:290
+ERR5069949.1088785	99	MT192765.1	11864	60	149M	=	11912	198	CAGTAGTCTTACTCTCAGTTTTGCAACAACTCAGAGTAGAATCATCATCTAAATTGTGGGCTCAATGTGTCCAGTTACACAATGACATTCTCTTAGCTAAAGATACTACTGAATCCTTTGAAAAAAAGGTTTCACTACTTTCGGTTTTG	AAAAAE/EAEE<EEA///<AEEE/EE<AEEE<EA/EEEEEEE/EAAAEEEEEE<E/E6AE<<E/EEA//</E/EEE/EEE/EE/E/<<EEAAAE<EEEEEE/EAEA//<//AA/E</A<<E/EEEE/AEE<E/<EAE</A6///AEEAA	s1:i:182	s2:i:0	RG:Z:1	NM:i:3	AS:i:268	de:f:0.0201	rl:i:0	cm:i:15	nn:i:0	tp:A:P	ms:i:268
+ERR5069949.1088785	147	MT192765.1	11912	60	150M	=	11864	-198	CTAAATTGTGGGCTCAATGTGTCCAGTTACACAATGACATTCTCTTAGCTAAAGATACTACTGAAGCCTTTGAAAAAATGGTTTCACTACTTTCTGTTTTGCTTTCCATGCAGGGTGCTGTAGACATAAACAAGCTTTGTGAAGAAATGC	AEEEEE<E//E<EAEE/AAAA<AEEEAEEEEE<AEEAEEEEEEAEAE</AE/EEE/<EEEEAEEEEEEEEEAEEEEEEE/EEEEEEEEEEEEEEEEEEEEAA/EEEAEE/EEAEEEEEEEEEEEEEEEE/EEEEEEEEEEAEEEEAAAAA	s1:i:182	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:14	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1132353	163	MT192765.1	12066	60	151M	=	12075	159	CAACAGGGCAACCTTACAAGCTATAGCCTCAGAGTTTAGTTCCCTTCCATCATATGCAGCTTTTGCTACTGCTCAAGAAGCTTATGAGCAGGCTGTTGCTAATGGTGATTCTGAAGTTGTTCTTAAAAAGTTGAAGAAGTCTTTGAATGTG	AAAAAEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEEEEEEEEEEEEE<EEEEEEEEAAEEAAEEEEEEEEEEEEEAE<AAAAAAE/AEEAEEEEEEEEEEEEEAAAEAAEEA	s1:i:148	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:21	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1132353	83	MT192765.1	12075	60	150M	=	12066	-159	AACCTTACAAGCTATAGCCTCAGAGTTTAGTTCCCTTCCATCATATGCAGCTTTTGCTACTGCTCAAGAAGCTTATGAGCAGGCTGTTGCTAATGGTGATTCTGAAGTTGTTCTTAAAAAGTTGAAGAAGTCTTTGAATGTGGCTAAATC	EEAEEEEEEEEEEEEEE<A<EEEEEEEEEEEAAEE<EAEEAAEAEEEEEEEEEEEAEEEEEEEAEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE6EEEEEEEEEEEEEEAAAAA	s1:i:148	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:5	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1151736	163	MT192765.1	12126	60	151M	=	12222	247	TTTTGCTACTGCTCAAGAAGCTTATGAGCAGGCTGTTGCTAATGGTGATTCTGAAGTTGTTCTTAAAAAGTTGAAGAAGTCTTTGAATGTGGCTAAATCTGAATTTGACCGTGATGCAGCCATGCAACGTAAGTTGGAAAAGATGGCTGAT	AAAAAEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEAAEEEEEEEEEEEEEEEEEEE<EEEEAEEEEEEEEEEEEEEEEAEEEEEAEEAAEEE<AEAEEE<A/AAEEEEEEEAAAAA<AAAE<EEEEAEEEAEEEEEEAEEAEA/A	s1:i:226	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:23	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1151736	83	MT192765.1	12222	60	151M	=	12126	-247	ATCTGAATTTGACCGTGATGCAGCCATGCAACGTAAGTTGGAAAAGATGGCTGATCAAGCTATGACCCAAATGTATAAACAGGCTAGATCTGAGGACAAGAGGGCAAAAGTTACTAGTGCTATGCAGACAATGCTTTTCACTATGCTTAGA	AAAAAAEA//EE/EAAAEAEEEEAAEEAA</AEEEEEEAAEAAEEEEEA<EEEEEEAEEEEEAEEEEEEEEEEEEEEEEEEEEAEEEEEEEEE<EE/EEEEEEAEE/EEEEEEEEEE/EEEEEEEEEEEAEEEE/EEAEEEEEAEEAAAAA	s1:i:226	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:17	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1258508	163	MT192765.1	12425	60	151M	=	12638	364	GTGTTCCCTTGAACATAATACCTCTTACAACAGCAGCCAAACTAATGGTTGTCATACCAGACTATAACACATATAAAAATACGTGTGATGGTACAACATTTACTTATGCATCAGCATTGTGGGAAATCCAACAGGTTGTAGATGCAGATAG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEE/EEEEEEEEEAEEEEEEEEA/AEAEAAAEAEEEEEAEEEEE<<EEEEEEE<AEAEAAA<EEE	s1:i:268	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1189252	99	MT192765.1	12486	60	98M	=	12486	98	CTATAACACATATAAAAATACGTGTGATGGTACAACATTTACTTATGCATCAGCATTGTGGGAAATCCAACAGGTTGTAGATGCAGATAGTAAAATTG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEAEEE	s1:i:88	s2:i:0	RG:Z:1	NM:i:0	AS:i:196	de:f:0	rl:i:0	cm:i:11	nn:i:0	tp:A:P	ms:i:196
+ERR5069949.1189252	147	MT192765.1	12486	52	98M	=	12486	-98	CTATAACACATATAAAAATACGTGTGATGGTACAACATTTACTTATGCATCAGCATTGTGGGAAATCCAACAGGTTGTAGATGCAGATAGTAAAATTG	EEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:88	s2:i:27	RG:Z:1	NM:i:0	AS:i:196	de:f:0	rl:i:0	cm:i:2	nn:i:0	tp:A:P	ms:i:196
+ERR5069949.1261808	99	MT192765.1	12593	60	149M	=	12654	166	GTGAAATTAGTATGGACAAATCACCTAATTTAGCATGGCCTCTTATTGTAACAGCTTTAAGGGCCAATTCTGCTGTCAAATAACAGAATAATGAGCTTAGTCCTGTTGCACTACGACAGATGTCTTGTGCTGCCGGAACTACACAAACT	AAAA/AAA/EEEE/AE<EE//EE/EEEEEA<EAEEE66//EE/E/A6EEEA/E<<E/E//A/EE/AEEE6/E6EA<EEE/E//EAAEEEA/E//EEE/EA/A/A</<<E/AA<EEE<E<EEEE<A6</<A//A/AA/<<<AA<<A<AEA	s1:i:134	s2:i:47	RG:Z:1	NM:i:3	AS:i:268	de:f:0.0201	rl:i:0	cm:i:5	nn:i:0	tp:A:P	ms:i:268
+ERR5069949.1246538	99	MT192765.1	12601	60	148M	=	12627	177	AGTATGGACAATTCACCTAATTTAGCATGGCCTCTTATTGTAACAGCTTTAAGGGCCAATTCTGCTGTCAAATTACAGAATAATGAGCTTAGTCCTGTTGCACTACGACAGATGTCTTGTGCTGCCGGTACTACACAAACTGCTTGCA	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEEEEEEEEEEEEA/EEEEEAEEEEEE/EEEEEEEEEEEEAAAAEEAEEEEEEEEEEEEEEEEE	s1:i:168	s2:i:0	RG:Z:1	NM:i:0	AS:i:296	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:296
+ERR5069949.1246538	147	MT192765.1	12627	60	151M	=	12601	-177	ATGGCCTCTTATTGTAACAGCTTTAAGGGCCAATTCTGCTGTCAAATTACAGAATAATGAGCTTAGTCCTGTTGCACTACGACAGATGTCTTGTGCTGCCGGTACTACACAAACTGCTTGCACTGATGACAATGCGTTAGCTTACTACAAC	AAAAAAEEEAAEEEEAAEAAAEEA<AAAEEAEEEAAEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAAAAA	s1:i:168	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1258508	83	MT192765.1	12638	60	151M	=	12425	-364	TTGTAACAGCTTTAAGGGCCAATTCTGCTGTCAAATTACAGAATAATGAGCTTAGTCCTGTTGCACTACGACAGATGTCTTGTGCTGCCGGTACTACACAAACTGCTTGCACTGATGACAATGCGTTAGCTTACTACAACACAACAAAGGG	EAAEEA/EEEAAAEEEAEEEEEEAA/AAEEEE/<EA/EAAEEEAAAAAAEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEAAAAA	s1:i:268	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1261808	147	MT192765.1	12654	60	105M	=	12593	-166	GGCCAATTCTGCTGTCAAATTACAGAATAATGAGCTTAGTCCTGTTGCACTACGACAGATGTCTTGTGCTGCCGGTACTACACAAACTGCTTGCACTGATGACAA	<//</EEE/<<EE<E/A<AEAE/<E6////EE/E/EAE<E/EEEE//AEAEEEEA/A/EEAE/EEEEEE/EE/E//AEEEEEEEAE/EEAE/EAE6AEEEA//AA	s1:i:134	s2:i:0	RG:Z:1	NM:i:0	AS:i:210	de:f:0	rl:i:0	cm:i:16	nn:i:0	tp:A:P	ms:i:210
+ERR5069949.1328186	163	MT192765.1	12866	60	151M	=	12953	238	GTACTATCTATACAGAACTGGAACCACCTTGTAGGTTTGTTACAGACACACCTAAAGGTCCTAAAGTGAAGTATTTATACTTTATTAAAGGATTAAACAACCTAAATAGAGGTATGGTACTTGGTAGTTTAGCTGCCACAGTACGTCTACA	AAAAAEEEEEEE/EEAEEEEAEEEEEAEEEEEEEEAEEEEEEEEEEEEEAE/EEEEEEAEE/EEEEEEEEEEEEEEEEEAAAEA/EEEEEEEAAEEEEE/EEEEAEEEEEAAEEEE/AAAE<A<EEEE6AEEAAA<<<<AA<AE/EEAEEA	s1:i:226	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1328186	83	MT192765.1	12953	60	151M	=	12866	-238	AAGGATTAAACAACCTAAATAGAGGTATGGTACTTGGTAGTTTAGCTGCCACAGTACGTCTACAAGCTGGTAATGCAACAGAAGTGCCTGCCAATTCAACTGTATTATCTTTCTGTGCTTTTGCTGTAGATGCTGCTAAAGCTTACAAAGA	EE/<E6/E<AAE<E<EEAEE<AAEE//EEEEEA<A6</EEAEEEEE<AAAEEEEEEEEAEEEEEEAEE/EEEAEEEEEEEE/EAEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:226	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1331889	99	MT192765.1	13010	60	132M	=	13010	132	GTCTACAAGCTGGTAATGCAACAGAAGTGCCTGCCAATTCAACTGTATTATCTTTCTGTGCTTTTGCTGTAGATGCTGCTAAAGCTTACAAAGATTATCTAGCTAGTGGGGGACAACCAATCACTAATTGTG	A/AAAEEEEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEEEEAEEEEEEEEEEEEEEE/EEEEE<AEAEEEEE/EAEAEEE/AEEEEEEEEEEEEEEEEEEEEAE/EEEEEEEEEEEEEEEEEEEEEEEA<EE	s1:i:122	s2:i:0	RG:Z:1	NM:i:0	AS:i:264	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:264
+ERR5069949.1331889	147	MT192765.1	13010	48	132M	=	13010	-132	GTCTACAAGCTGGTAATGCAACAGAAGTGCCTGCCAATTCAACTGTATTATCTTTCTGTGCTTTTGCTGTAGATGCTGCTAAAGCTTACAAAGATTATCTAGCTAGTGGGGGACAACCAATCACTAATTGTG	A/EEEEEAEEEEEEEEAEEEEEEEEEA<EEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEAAEEEEEE/EEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEAEEEEEEEEAAAAA	s1:i:26	s2:i:122	RG:Z:1	NM:i:0	AS:i:264	de:f:0	rl:i:0	cm:i:3	nn:i:0	tp:A:P	ms:i:264
+ERR5069949.1372331	163	MT192765.1	13011	60	150M	=	13132	272	TCTACAAGCTGGTAATGCAACAGAAGTGCCTGCCAATTCAACTGTATTATCTTTCTGTGCTTTTGCTGTAGATGCTGCTAAAGCTTACAAAGATTATCTAGCTAGTGGGGGACAACCAATCACTAATTGTGTTAAGATGTTGTGTACACA	AAAAAEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEEEEEEE/EEEEEEE<EEAEEEEEEA/EEA<EAEEEEEEAEAAAEEAAAEEEEEAAA<AAAEEEA	s1:i:257	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:25	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1340552	163	MT192765.1	13021	60	148M	=	13029	159	GGTAATGCAACAGAAGTGCCTGCCAATTCAACTGTATTATCTTTCTGTGCTTTTGCTGTAGATGCTGCTAAAGCTTACAAAGATTATCTAGCTAGTGGGGGACAACCAATCACTAATTGTGTTAAGATGTTGTGTACACACACTGGTA	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAEEEEEEEAEAEEEEEEEEEE<EEEEEEEEEEEEEEAAA<AEEEEEEEEEEEEEE	s1:i:145	s2:i:0	RG:Z:1	NM:i:0	AS:i:296	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:296
+ERR5069949.1340552	83	MT192765.1	13029	60	151M	=	13021	-159	AACAGAAGTGCCTGCCAATTCAACTGTATTATCTTTCTGTGCTTTTGCTGTAGATGCTGCTAAAGCTTACAAAGATTATCTAGCTAGTGGGGGACAACCAATCACTAATTGTGTTAAGATGTTGTGTACACACACTGGTACTGGTCAGGCA	AEAAAAEE/A<EEAAEEE/EEEEEEEEEEAAEEEEEEEEAAEEEEEEEEE<EEEAEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAAAAA	s1:i:145	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1372331	83	MT192765.1	13132	60	151M	=	13011	-272	ACTAATTGTGTTAAGATGTTGTGTACACACACTGGTACTGGTCAGGCAATAACAGTTACACCGGAAGCCAATATGGATCAAGAATCCTTTGGTGGTGCATCGTGTTGTCTGTACTGCCGTTGCCACATAGATCATCCAAATCCTAAAGGAT	EE<<EEEEEEEAAAEEEEEAEEEEEEEEEAEEEEEEEEAEEEEAEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:257	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:21	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1412839	163	MT192765.1	13154	60	150M	=	13187	180	GTACACACACTGGTACTGGTCAGGCAATAACAGTTACACCGGAAGCCAATATGGATCAAGAATCCTTTGGTGGTGCATCGTGTTGTCTGTACTGCCGTTGCCACATAGATCATCCAAATCCTAAAGGATTTTGTGACTTAAAAGGTAAGT	AAAA6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEE6EEEEEEEEEEEEEEEEEEEEEEEEEAEEEAAAAEEEEEEEEEEAAEEAEAE<EEEAEAEEE/<AAAEAEAA/EAEEEEAEEAAE/AEA/EEEAEEAEAA	s1:i:166	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1412839	83	MT192765.1	13187	60	147M	=	13154	-180	TTACACCGGAAGCCAATATGGATCAAGAATCCTTTGGTGGTGCATCGTGTTGTCTGTACTGCCGTTGCCACATAGATCATCCAAATCCTAAAGGATTTTGTGACTTAAAAGGTAAGTATGTACAAATACCTACAACTTGTGCTAATG	EEA<AAEAAAAAAE<A<<EA<EAE</E<EEEEE/EEEEAAAEEE/EEEE/EEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEA<EEEEEEEAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEAEEAAAAA	s1:i:166	s2:i:0	RG:Z:1	NM:i:0	AS:i:294	de:f:0	rl:i:0	cm:i:10	nn:i:0	tp:A:P	ms:i:294
+ERR5069949.1476386	99	MT192765.1	13329	60	151M	=	13382	201	TAATGACCCTGTGGGTTTTACACTTAAAAACACAGTCTGTACCGTCTGCGGTATGTGGAAAGGTTATGGCTGTAGTTGTGATCAACTCCGCGAACCCATGCTTCAGTCAGCTGATGCACAATCGTTTTTAAACGGGTTTGCGGTGTAAGTG	AAAAA/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEAEAEEEEEEEEEEEEEEEEEEEEEEAEEEEEAEEEAEEEEEEE/AEEE/EEEEEE/AEE/EEAE/EEE<EA/<EEA/EEEEE/EEEEAAEEEAAAAEEAEEE	s1:i:188	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1476386	147	MT192765.1	13382	60	148M	=	13329	-201	TGTGGAAAGGTTATGGCTGTAGTTGTGATCAACTCCGCGAACCCATGCTTCAGTCAGCTGATGCACAATCGTTTTTAAACGGGTTTGCGGTGTAAGTGCAGCCCGTCTTACACCGTGCGGCACAGGCACTAGTACTGATGTCGTATAC	AAEEEA<AEA/AAAEEE/E/AEE/E6AE/EAE/EEE<EEEAEEEEEEEEAAEE<<EEEEEEEEEEEEEEEEEEEEEA/EEEEEAA//EAEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE6EEAA6AA	s1:i:188	s2:i:0	RG:Z:1	NM:i:0	AS:i:296	de:f:0	rl:i:0	cm:i:10	nn:i:0	tp:A:P	ms:i:296
+ERR5069949.1538968	163	MT192765.1	13799	60	151M	=	13817	168	CACGATGGCAGACCTCGTCTATGCTTTAAGGCATTTTGATGAAGGTAATTGTGACACATTAAAAGAAATACTTGTCACATACAATTGTTGTGATGATGATTATTTCAATAAAAAGGACTGGTATGATTTTGTAGAAAACCCAGATATATTA	AAAAAEEEAEEAEEEAEEEAEEEAE<EEE6EAEA<EAAAEEEEEEEEEEEEEA/</EEEEEEEEEEEEEEEEEEEEEEAEEEEE/AEEEEEEEEAEEEEEEEEEEEEEEEEAEEEAA<AEAEE<AAE<A<AEEEEE/EA6AAA/EE/EEEA	s1:i:154	s2:i:0	RG:Z:1	NM:i:1	AS:i:294	de:f:0.0066	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:294
+ERR5069949.1538968	83	MT192765.1	13817	48	150M	=	13799	-168	CTATGCTTTAAGGCATTTTGATGAAGGTAATTGTGACACATTAAAAGAAATACTTGTCACATACAATTGTTGTGATGATGATTATTTCAATAAAAAGGACTGGTATGATTTTGTAGAAAACCCAGATATATTACGCGTATACGCCAACTT	AEE6AA<E/EA/<AE<AEA<6AA6AAEEEAAA6/6</AEEEE<EEEEEEE/EEEE//EEEAEEE/EEEA/EEEAEE/EEEE/EAEEEEEE<AEEEEAEEEEAEAEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEAEEAEEEEEEEAAAAA	s1:i:41	s2:i:154	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:9	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1552198	99	MT192765.1	13944	60	150M	=	14027	234	ATATTACGCGTATACGCCAACTTAGGTGAACGTGTACGCCAAGCTTTGTTAAAAACAGTACAATTCTGTGATGCCATGCGAAATGCTGGTATTGTTGGTGTACTGACATTAGATAATCAAGATCTCAATGGTAACTGGTATGATTTCGGT	AAAAAEEEEEEEE6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEAEAEEEEEEEEEEEAEEEEEEEEEAEEEEEEEEEEEEEEA	s1:i:229	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1561137	163	MT192765.1	13991	60	149M	=	14081	240	GTTAAAAACAGTACAATTCTGTGATGCCATGCGAAATGCTGGTATTGTTGGTGTACTGACATTAGATAATCAAGATCTCAATGGTAACTGGTATGATTTCGGTGATTTCATACAAACCACGCCAGGTAGTGGAGTTCCTGTTGTAGATT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEEEEEEEEEE<EAEEEAEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEE<EEEEAEEEE/EEEEAAAEEEA<AEEEAEEEEEAEAEEA/AA<A	s1:i:223	s2:i:0	RG:Z:1	NM:i:0	AS:i:298	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:298
+ERR5069949.1552198	147	MT192765.1	14027	60	151M	=	13944	-234	TGCTGGTATTGTTGGTGTACTGACATTAGATAATCAAGATCTCAATGGTAACTGGTATGATTTCGGTGATTTCATACAAACCACGCCAGGTAGTGGAGTTCCTGTTGTAGATTCTTATTATTCATTGTTAATGCCTATATTAACCTTGACC	AE/AAE<AEAAA/A<///<AA6AAAE<E/E/EA<AEE/</EEEEEAEE/EE/AEA/E/<EEE/AEA//EE</AA<AEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:229	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:15	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.1561137	83	MT192765.1	14081	60	150M	=	13991	-240	GTATGATTTCGGTGATTTCATACAAACCACGCCAGGTAGTGGAGTTCCTGTTGTAGATTCTTATTATTCATTGTTAATGCCTATATTAACCTTGACCAGGGCTTTAACTGCAGAGTCACATGTTGACACTGACTTAACAAAGCCTTACAT	AE<EAAA/EEAEE<A<A/AAEE<EE/EEAAAEEEEEAEEEEEEE/EEEEEEEEEEEEEEEEAEEAEEAEEEEEEEEEEEEEEEEEE/EEEEEEEEEEAEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:223	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:13	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1704586	99	MT192765.1	14601	60	149M	=	14761	310	GATAAACGCACTACGTGCTTTTCAGTAGCTGCACTTACTAACAATGTTGCTTTTCAAACTGTCAAACCCGGTAATTTTAACAAAGACTTCTATGACTTTGCTGTGTCTAAGGGTTTCTTTAAGGAAGGAAGTTCTGTTGAATTAAAACA	AAAA6EEEE/EE6EEEEEEEEEEEEEEEEEE<EEEEEEEE6EEAEEEEEA<EEEEE66EEEEE///EEEAEEEE<EEEEEEA/EE/EEEEEEEAE<E<AA<AAAEEAE/AEE<E<AA<EAAEEAE/AEE/E/EAEAAAEE/EA/A//EE	s1:i:277	s2:i:0	RG:Z:1	NM:i:0	AS:i:298	de:f:0	rl:i:0	cm:i:21	nn:i:0	tp:A:P	ms:i:298
+ERR5069949.1704586	147	MT192765.1	14761	60	150M	=	14601	-310	CTCAGGATGGTAATGCTGCTATCAGCGATTATGACTACTATCGTTATAATCTACCAACAATGTGTGATATCAGACAACTACTATTTGTAGTTGAAGTTGTTGATAAGTACTTTGATTGTTACGATGGTGGCTGTATTAATGCTAACCAAG	A//EEAE<AAA<AEAA6EEE</<AAA6EE//A<A<<AE<E//AEEEEE<EEEEAEAA<AEA<AE/EEEEAEEAEEEAEAEEAEEE/EEEEEEAE<EEEEEEEEEEEEEE/EEEEEEAEEEEEEEEEEE/EEEEEE/EEEEE<EEEA/AAA	s1:i:277	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1709367	163	MT192765.1	14886	60	129M	=	14886	129	GGTGGCTGTATTAATGCTAACCAAGTCATCGTCAACAACCTAGACAAATCAGCTGGTTTTCCATTTAATAAATGGGGTAAGGCTAGACTTTATTATGATTCAATGAGTTATGAGGATCAAGATACACTT	AAAAAEEEEEEEEEEEEEE6EEEEEEEEEEEEEEEEE6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEAEEEEEEAEEEEEEEEEEEEEEAEEEEEEEEEEEEEEE	s1:i:117	s2:i:0	RG:Z:1	NM:i:1	AS:i:248	de:f:0.0078	rl:i:0	cm:i:15	nn:i:0	tp:A:P	ms:i:248
+ERR5069949.1709367	83	MT192765.1	14886	50	129M	=	14886	-129	GGTGGCTGTATTAATGCTAACCAAGTCATCGTCAACAACCTAGACAAATCAGCTGGTTTTCCATTTAATAAATGGGGTAAGGCTAGACTTTATTATGATTCAATGAGTTATGAGGATCAAGATACACTT	AA/EEAAAEEEEAEE6A/EAAEAAEAAAAAAAAEEAEEE/AEAE<AEEAEAE/EEEEEEEEA/EEAA<AEE/EEE<AEA<EAAEAAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEAEEEAEEEAAAAA	s1:i:117	s2:i:42	RG:Z:1	NM:i:1	AS:i:248	de:f:0.0078	rl:i:0	cm:i:1	nn:i:0	tp:A:P	ms:i:248
+ERR5069949.1778133	163	MT192765.1	15485	60	150M	=	15491	158	TGCCACAACTGCTTATGCTAATAGTGTTTTTAACATTTGTCAAGCTGTCACGGCCAATGTTAATGCACTTTTATCTACTGATGGTAACAAAATTGCCGATAAGTATGTCCGCAATTTACAACACAGACTTTATGAGTGTCTCTATAGAAA	AAAAAEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEAAEEEEEE<AEEEEEEE/AAAE<AAEEAAEEEA<EAAEEEA<AAEEEEEE/EEAAAEE/EAAAAEEEEEAEAEE	s1:i:139	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1778133	83	MT192765.1	15491	48	146M1D5M	=	15485	-158	AACTGCTTATGCTAATAGTGTTTTTAACATTTGTCAAGCTGTCACGGCCAATGTTAATGCACTTTTATCTACTGATGGTAACAAAATTGCCGATAAGTATGTCCGCAATTTACAACACAGACTTTATGAGTGTCTCTATAGAAATAAGATG	AEEAEEEEEAAAAAA<AEEEEEEEEEEEEEEEEEEAEEEEEEEAEAEEEEEEEEEEEEEEAEEEEAEAEAEEEEEEEEEEEE<AEEEEAAAEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:55	s2:i:139	RG:Z:1	NM:i:1	AS:i:292	de:f:0.0066	rl:i:0	cm:i:10	nn:i:0	tp:A:P	ms:i:292
+ERR5069949.1980512	163	MT192765.1	16801	60	150M	=	16852	202	GTAAAGTACAAATAGGAGAGTACACCTTTGAAAAAGGTGACTATGGTGATGCTGTTGTTTACCGAGGTACAACAACTTACAAATTAAATGTTGGTGATTATTTTGTGCTGACATCACATACAGTAATGCCATTAAGTGCACCTACACTAG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEAEEEEEEEEEA<EEEEEEAEEEEAAAEEAAE<EEEAAAA<AA<EEEE/AE	s1:i:193	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.1980512	83	MT192765.1	16852	60	151M	=	16801	-202	CTGTTGTTTACCGAGGTACAACAACTTACAAATTAAATGTTGGTGATTATTTTGTGCTGACATCACATACAGTAATGCCATTAAGTGCACCTACACTAGTGCCACAAGAGCACTATGTTAGAATTACTGGCTTATACCCAACACTCAATAT	EEEEEEEEEEAEEEEEEAAEEEEEAAEAEAAAEAEEEAEAEEEAEEEAEEEEEAEAAEEEEAEAEEEEEEEAEEEEEEEEEEEAEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:193	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:12	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.2033605	163	MT192765.1	17083	60	149M	=	17101	168	GTACTGGTAAGAGTCATTTTGCTATTGGCCTAGCTCTCTACTACCCTTCTGCTCGCATAGTGTATACAGCTTGCTCTCATGCCGCTGTTGATGCACTATGTGAGAAGGCATTAAAATATTTGCCTATAGATAAATGTAGTAGAATTATA	AAAAAEAEEEEEEEEEEEEAEEEEEEEEEEEEE<EEEEEEEEEEE<EEEEEEEEAAEAEEEEEEEEEEEE/EAAEEEA/EEEEEEAE<EEEEEEEEE<AEEEEAAAEAE<EAEEEEEE//</A/AEAAAEA/<E<AEEEAEE<EEEEEE	s1:i:160	s2:i:0	RG:Z:1	NM:i:0	AS:i:298	de:f:0	rl:i:0	cm:i:24	nn:i:0	tp:A:P	ms:i:298
+ERR5069949.2033605	83	MT192765.1	17101	48	150M	=	17083	-168	TTGCTATTGGCCTAGCTCTCTACTACCCTTCTGCTCGCATAGTGTATACAGCTTGCTCTCATGCCGCTGTTGATGCACTATGTGAGAAGGCATTAAAATATTTGCCTATAGATAAATGTAGTAGAATTATACCTGCACGTGCTCGTGTAG	AAA<EAA<EEEAAAA/E</EA/E6EAEE/EE/AEA<AAEEAEEA/EE<EEEEEEEEEEEE<AEE/AEEE/EAAEEEAEEAEEEEE<EEE<EEEEEAAEEEEEEEEAEEEEEAEEAEEEEEEEEEAEEEEEEEEEEEEEEEEE/EEAAAAA	s1:i:34	s2:i:160	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2098070	99	MT192765.1	17115	60	151M	=	17270	304	GCTCTCTACTACCCTTCTGCTCGCATAGTGTATACAGCTTGCTCTCATGCCGCTGTTGATGCACTATGTGAGAAGGCATTAAAATATTTGCCTATAGATAAATGTAGTAGAATTATACCTGCACGTGCTCGTGTAGAGTGTTTTGATAAAT	AAAAAEEEEEEE/EE/EA/EEA/EEEEA/<6EEEEEEA<AE/EAEEEE/AAA<A<E<EEA<EEA/EEAEEEEAEEAEE/EAEEEEE//6/6AEEA</EEAEEEEA<EE/AE66AEAEAAAA/<AA<<<<EAAEAEAEE/EA6EEEEEAEEE	s1:i:269	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:23	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.2064910	99	MT192765.1	17123	60	149M	=	17180	174	CTACCCTTCTGCTCGCATAGTGTATACAGCTTGCTCTCATGCCGCTGTTGATGCACTATGTGAGAAGGCATTAAAATATTTGCATATAGATAAATGTAGTAGAATTATACCTGCACGTGCTCGTGTAGAGTGTTTTGATAAATTCAAAG	AAAAAEA<AEEE/EE/EE/EEE/EEE//E/EA/EEEEEEEEEEE<E/A</</A/AEAEEEE/EAE/AEEE<AAE/E//EA</<//E<A/AE<EAEEE///<EA/E<AEAAAA/A/</EEEEEAE/A/<A/AA/EAE<AE/6/<<A</<A	s1:i:141	s2:i:0	RG:Z:1	NM:i:1	AS:i:288	de:f:0.0067	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:288
+ERR5069949.2125592	99	MT192765.1	17180	60	150M	=	17289	245	ATGTGAGAAGGCATTAAAATATTTGCCTATAGATAAATGTAGTAGAATTATACCTGCACGTGCTCGTGTAGAGTGTTTTGATAAATTCAAAGTGAATTCAACATTAGAACAGTATGTCTTTTGTACTGTAAATGCATTGCCTGAGACGAC	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEAEEEEEEEAEEEEA<EEEEEEEAEEEEEEAEEEEEEAAEEEEEE<EE/A/EEEA	s1:i:237	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2064910	147	MT192765.1	17180	48	117M	=	17123	-174	ATGTGAGTAGGCATTAAAATATTTGCCTATAGATAAATGTAGTAGAATTATACCTGCACGTGCTCGTGTAGAGTGTTTTGATAAATTCAAAGTGACTTCAACATTAGAACAGTATGT	<<////A/AAAE<EA///AA<EAAE/EEEEEEEE//E<AE/EEAEA/EAEAEEEEEAAE<AEE6/AE</<E/EEEEEE<EAE<AA6EAEEEE/EE6EEE/A/EEE/AEEE/EAA//A	s1:i:38	s2:i:141	RG:Z:1	NM:i:2	AS:i:214	de:f:0.0171	rl:i:0	cm:i:8	nn:i:0	tp:A:P	ms:i:214
+ERR5069949.2098070	147	MT192765.1	17270	60	149M	=	17115	-304	AGTGAATTCAACATTAGAACAGTATGTCTTTTGTACTGTAAATGCATTGCCTGAGACGACAGCAGATATAGTTGTCTTTGATGAAATTTCAATGGCCACAAATTATGATTTGAGTGTTGTCAATGCCAGATTACGTGCTAAGCACTATG	A6A/6EEE6/</6EEAAE//<<EA/EE/EEEAAEA/A<E/A6AAA/AEEA/E</AEAEEAEA<A6EE/EEEE<AAEEEEA/EA<EEEEAEEEEEE<<E/E/AEEEEAAEEEEAAEEEEEEE/EEEEEEEEEEEEEEEEEEEEE/AAAAA	s1:i:269	s2:i:0	RG:Z:1	NM:i:0	AS:i:298	de:f:0	rl:i:0	cm:i:22	nn:i:0	tp:A:P	ms:i:298
+ERR5069949.2125592	147	MT192765.1	17289	60	136M	=	17180	-245	CAGTATGTCTTTTGTACTGTAAATGCATTGCCTGAGACGACAGCAGATATAGTTGTCTTTGATGAAATTTCAATGGCCACAAATTATGATTTGAGTGTTGTCAATGCCAGATTACGTGCTAAGCACTATGTGTACA	A//<//E//EE</</</<//AA6//</A//EE<E//EEEEEEEEEEEE<EEE/EEAEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:237	s2:i:0	RG:Z:1	NM:i:0	AS:i:272	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:272
+ERR5069949.2185111	163	MT192765.1	17406	60	147M	=	17537	281	GCTAAGCACTATGTGTACATTGGCGACCCTGCTCAATTACCTGCACCACGCACATTGCTAACTAAGGGCACACTAGAACAAGAATATATCAATTCAGTGTGTAGACTTATGAAAACTATAGGTCCAGACATGTTCCTCGGAACTTGT	AAAAA6/E//AEEAEEEEE/EEE/E/EEEE/E/E6E///E<E/<EE</A/<EEE//E6/<EEEAE<//<E/E/6EEE/EAAA<//EE//<EEE/AEEAA/A<EAE/EEA<//AEAEEE/<A/E</AA<EEA<<EAAEA<<</E/EEE	s1:i:248	s2:i:0	RG:Z:1	NM:i:2	AS:i:274	de:f:0.0136	rl:i:0	cm:i:14	nn:i:0	tp:A:P	ms:i:274
+ERR5069949.2151832	163	MT192765.1	17416	60	150M	=	17453	187	ATGTGTACATTGGCGACCCTGCTCAATTACCTGCACCACGCACATTGCTAACTAAGGGCACACTAGAACCAGAATATTTCAATTCAGTGTGTAGACTTATGAAAACTATAGGTCCAGACATGTTCCTCGGAACTTGTCGGCGTTGTCCTG	AAAAAEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEAEEEEEAAEEEEEEEEEAAEAAA<<EAAEEEEEEEAAA<<<AE	s1:i:183	s2:i:47	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:14	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2176303	99	MT192765.1	17442	60	151M	=	17519	227	TTACCTGCACCACGCACATTGCTAACTAAGGGCACACTAGAACCAGAATATTTCAATTCAGTGTGTAGACTTATGAAAACTATAGGTCCAGACATGTTCCTCGGAACTTGTCGGCGTTGTCCTGCTGAAATTGTTGACACTGTGAGTGCTT	AAAAAEEEEEEAEEEEEEAEEEEEEEEEEEEEEAEAEEEEEAEEEEEEEEEAEEEEEEEEEEE<EEEEAEEEEEEEEEEEEEEEEEEEEEEAEEEE<A<AEEEAE/EE<EEEEAAAAAAEEAA<AAAEEEEEE<EEEEEEEEAEE<EEEEA	s1:i:217	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.2151832	83	MT192765.1	17453	60	150M	=	17416	-187	ACGCACATTGCTAACTAAGGGCACACTAGAACCAGAATATTTCAATTCAGTGTGTAGACTTATGAAAACTATAGGTCCAGACATGTTCCTCGGAACTTGTCGGCGTTGTCCTGCTGAAATTGTTGACACTGTGAGTGCTTTGGTTTATGA	AAAA<EEEEEEAEEEAEAAAAEEEEEEEEEAAAEE<EEEEEAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAAA	s1:i:183	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:13	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2205229	99	MT192765.1	17476	60	137M1D13M	=	17585	259	CACTAGAACCAGAATATTTCAATTCAGTGTGTAGACTTATGAAAACTATAGGTCCAGACATGTTCCTCGGAACTTGTCGGCGTTGTCCTGCTGAAATTGTTGACACTGTGAGTGCTTTGGTTTATGATAATAAGCTTAAGCACATAAAGA	AAAAAEEEEEEA/EEEEEEEAEEEEEEEEEAEAEAEEEEEEEEEEEEEEEEEEEEEE6EEEEEEEEEEEEEEAEEEEEEAEEEEEE6EE<E<<EEEAEEEEEEEEEEEEEEEEA<AEEEEEEAAAEEEEEEEEEEEEEEEE<EAAEEEEE	s1:i:252	s2:i:0	RG:Z:1	NM:i:1	AS:i:286	de:f:0.0066	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:286
+ERR5069949.2161340	99	MT192765.1	17482	60	80M	=	17482	82	AACCAGAATATTTCAATTCAGTGTGTAGACTTATGAAAACTATAGGTCCAGACATGTTCCTCGGAACTTGTCGGCGTTGT	A/AA//EEAEA/E/AEEEE6EE/EEEA/6AEEEEEEEEE6EEEAEAEE//A/EEEEEE//E/E/A//E/E/<<EE</E/E	s1:i:69	s2:i:0	RG:Z:1	NM:i:0	AS:i:160	de:f:0	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:160
+ERR5069949.2161340	147	MT192765.1	17482	55	82M	=	17482	-82	AACCAGAATATTTCAATTCAGTGTGTAGACTTATGAAAACTATAGGTCCAGACATGTTCCTCGGAACTTGTCGGCGTTGTCC	A//E/<EAEA/EE/EEEA/<AE<AE/AEA/EEEAE/EEE//EEE6////EEEEAEAE///EE//</E/E</AE/6EAAA6AA	s1:i:69	s2:i:0	RG:Z:1	NM:i:0	AS:i:164	de:f:0	rl:i:0	cm:i:3	nn:i:0	tp:A:P	ms:i:164
+ERR5069949.2216307	163	MT192765.1	17504	60	145M	=	17601	244	GTGTAGACTTATGAAAACTATAGGTCCAGACATGTTCCTCGGAACTTGTCGGCGTTGTCCTGCTGAAATTGATGACACTGTGAGTGCTTTGGTTTATGATAATAAGCTTAAAGCACATAAAGACAAATCAGCTCAATGCTTTAAA	A//AA6EEE/EEEEAEEEEE/EE/A/6E/EAE</EE/AEE/AEEAEAE/E//AA<6AA6<E/EE/EEE/AE/6/EEE<//E/E6A/<E/////EAE<E<</AAAEE//AAEE/6AAA6/AEA/AAEEAAAA////AA/6E<6AAE	s1:i:218	s2:i:0	RG:Z:1	NM:i:1	AS:i:280	de:f:0.0069	rl:i:0	cm:i:14	nn:i:0	tp:A:P	ms:i:280
+ERR5069949.2176303	147	MT192765.1	17519	60	150M	=	17442	-227	AACTATAGGTCCAGACATGTTCCTCGGAACTTGTCGGCGTTGTCCTGCTGAAATTGTTGACACTGTGAGTGCTTTGGTTTATGATAATAAGCTTAAAGCACATAAAGACAAATCAGCTCAATGCTTTAAAATGTTTTATAAGGGTGTTAT	EAEAEEEAEAEAAAEEEEEEAAA<AAAA<A/EEAEE/A</EAAEEAAEEEEEEEAEEEEEEAEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEAEAEAEEAEAEEEEEAAEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEAAAAA	s1:i:217	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:13	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2185111	83	MT192765.1	17537	60	150M	=	17406	-281	GTTCCTCGGAACTTGTCGGCGTTGTCCTGCTGAAATTGTTGACACTGTGAGTGCTTTGGTTTATGATAATAAGCTTTAAGCACATAAAGACAAATCAGCTCAATGCTTTAAAATGTTTTATAAGGGTGTTATCACGCATGATGTTTCATC	AEA/EE<EEA/E<A<<EA//6<6A/EEEEEEEEEAAAEE<A//A/AEEA/AEAEE/EEAEAEEEEEE//EA/AAEA/EEE</E/EAEE/<A<E/A/<AAE//AA<EEE/EEEE/<EEEA/EEE<<AEEEEEEAEEEAAAAAEAEE6AAAA	s1:i:248	s2:i:0	RG:Z:1	NM:i:1	AS:i:290	de:f:0.0067	rl:i:0	cm:i:16	nn:i:0	tp:A:P	ms:i:290
+ERR5069949.2205229	147	MT192765.1	17585	60	28M1D121M	=	17476	-259	GAGTGCTTTGGTTTATGATAATAAGCTTAAGCACATAAAGACAAATCAGCTCAATGCTTTAAAATGTTTTATAAGGGTGTTATCACGCATGATGTTTCATCTGCAATTAACAGGCCACAAATAGGCGTGGTAAGAGAATTCCTTACACG	A<A<<EE/EEE/EEAEEAEA/AAAAEEEAEEA<EEEAA<//<E<<EEAEEE<E/EEE/EEEEEEEEEEEAEEEAE/E/E/EEEAEEEEEEEEAEEEAEEE/EEEEEEE/AEEEEEE<EEEEEEEEEE/AEEAEEEEEEEAEEEEAAAAA	s1:i:252	s2:i:0	RG:Z:1	NM:i:1	AS:i:284	de:f:0.0067	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:284
+ERR5069949.2216307	83	MT192765.1	17601	60	147M	=	17504	-244	GATAATAAGCTTAAAGCACATAAAGACAAATCAGCTCAATGCTTTAAAATGTTTTATAAGGGTGTTATCACGCATGATGTTTCATCTGCAATTATCAGGCCACAAATAGGCGTGGTAAGAGAATTCCTTACACGTAACCCTGCTTGG	AEA/AAAE<<A<6AEAAAA<//A6A6A/EEEAAA<A<<A/<<AA</<EEE<AEE<<EAE//AEAEEEEEEAE/AEAEEEEEE/EEAE</<<EAE/EEA/A<E6EE/E6<<EAE</EEEEEEEEEE/AEEE6EA/EEE/AAEA/EAAA	s1:i:218	s2:i:0	RG:Z:1	NM:i:1	AS:i:284	de:f:0.0068	rl:i:0	cm:i:15	nn:i:0	tp:A:P	ms:i:284
+ERR5069949.2243023	163	MT192765.1	17713	60	151M	=	17854	291	TGGTAAGAGAATTCCTTACACGTAACCCTGCTTGGAGAAAAGCTGTCTTTATTTCACCTTATAATTCACAGAATGCTGTAGCCTCAAAGATTTTGGGACTACCAACTCAAACTGTTGATTCATCACAGGGCTCAGAATATGACTATGTCAT	AAAAAEEEEEEEEEEEEEEEEEEE6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEAEEEEEEEEEEEEEE/EEEEAAEEAEEA<EEEEEEEEEEEEEEAEEEEE<<AA6AAEEEAEE	s1:i:273	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.2243023	83	MT192765.1	17854	60	150M	=	17713	-291	ACTATGTCATATTCACTCAAACCACTGAAACAGCTCACTCTTGTAATGTAAACAGATTTAATGTTGCTATTACCAGAGCAAAAGTAGGCATACTTTGCATAATGTCTGATAGAGACCTTTATGACAAGTTGCAATTTACAAGTCTTGAAA	EE<EAEEAE<EA<E/EEAEAEEEE<EEEEAA<AEEEEEEEEEAAEAEEE<EEEEAEEEEEEEEEEEEEEEEEAEEEEEEAEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:273	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2270078	163	MT192765.1	17970	60	151M	=	18103	284	CTTTATGACAAGTTGCAATTTACAAGTCTTGAAATTCCACGTAGGAATGTGGCAACTTTACAAGCTGAAAATGTAACAGGACTCTTTAAAGATTGTAGTAAGGTAATCACTGGGTTACATCCTACACAGGCACCTACACACCTCAGTGTTG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEAEEEEEEEEEEEAEEAAEEEEAEEEEEAEEAEEEAAEAEEEEEAEAEEAEA<EAA	s1:i:275	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.2257580	99	MT192765.1	17980	60	151M	=	18039	209	AGTTGCAATTTACAAGTCTTGAAATTCCACGTAGGAATGTGGCAACTTTACAAGCTGAAAATGTAACAGGACTCTTTAAAGATTGTAGTAAGGTAATCACTGGGTTACATCCTACACAGGCACCTACACACCTCAGTGTTGACACTAAATT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEAEEEEEEEEEAEEEEEEAEEEEEEEEEEEEAEEEEEEEEEAEEEEEEAEE/EEAEA<EAEEEAEEEEEEEEE<EEAAAEAEEE<EA	s1:i:196	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.2257580	147	MT192765.1	18039	60	150M	=	17980	-209	AATGTAACAGGACTCTTTAAAGATTGTAGTAAGGTAATCACTGGGTTACATCCTACACAGGCACCTACACACCTCAGTGTTGACACTAAATTCAAAACTGAAGGTTTATGTGTTGACATACCTGGCATACCTAAGGACATGACCTATAGA	EEEEEEAEEAAEEEEAAAAEEEEEEAEEEEAEAEEEAEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:196	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:11	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2270078	83	MT192765.1	18103	60	151M	=	17970	-284	CTACACACCTCAGTGTTGACACTAAATTCAAAACTGAAGGTTTATGTGTTGACATACCTGGCATACCTAAGGACATGACCTATAGAAGACTCATCTCTATGATGGGTTTTAAAATGAATTATCAAGTTAATGGTTACCCTAACATGTTTAT	AEEEEEEAEEA//EEEEAEEEEEEEEEA<EEA6<A<A<AEEE<AEE<</EEEEEAEEEEE<EEEEEEAEEE/EEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:275	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:23	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.2328704	163	MT192765.1	18286	60	150M	=	18412	276	CATGGATTGGCTTCGATGTCGAGGGGTGTCATGCTACTAGAGAAGCTGTTGGTACCAATTTACCTTTACAGCTAGGTTTTTCTACAGGTGTTAACCTAGTTGCTGTACCTACAGGTTATGTTGATACACCTAATAATACAGATTTTTCCA	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEAEEEEEEEEEEEEAEEEEEEAEEEEEEEEEEEAEEEAEAEEEA<AEEEEEEEE/EEEEEEEAEEEEEEE/EEEEAEE	s1:i:264	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2342766	99	MT192765.1	18397	60	151M	=	18469	222	CAGGTTATGTTGATACACCTAATAATACAGATTTTTCCAGAGTTAGTGCTAAACCACCGCCTGGAGATCAATTTAAACACCTCATACCACTTATGTACAAAGGACTTCCTTGGAATGTAGTGCGTATAAAGATTGTACAAATGTTAAGTGA	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEAEEEEEEEAEAEAAAAEAEEEAEEAEEE/EEEE<AAEEAEA	s1:i:215	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.2328704	83	MT192765.1	18412	60	150M	=	18286	-276	CACCTAATAATACAGATTTTTCCAGAGTTAGTGCTAAACCACCGCCTGGAGATCAATTTAAACACCTCATACCACTTATGTACAAAGGACTTCCTTGGAATGTAGTGCGTATAAAGATTGTACAAATGTTAAGTGACACACTTAAAAATC	AEAAEAEEEEEEAAEEEEEEEAEE/EAAEE</EAAAEAEEAEAEEEEEE/AEA<EEEEEAEAAEEEEEAAEEEEEEE/E/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:264	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2361683	99	MT192765.1	18426	60	149M	=	18513	235	GATTTTTCCAGAGTTAGTGCTAAACCACCGCCTGGAGATCAATTTAAACACCTCATACCACTTATGTACAAAGGACTTCCTTGGAATGTAGTGCGTATAAAGATTGTACAAATGTTAAGTGACACACTTAAAAATCTCTCTGACAGAGT	AAAAA/AA/EAEEAAEEEEAEE/EAEAAA<EEAAEEEEEE/EEEEEEEEEAAEEE/AEEE/EEEA<EEEAEEEEE<AAA<EEEEEEEEEAAEEA<AEEEEEEEEE<EEE/<AEEEEEAEEA<//A<EEAAEE6EAEEAAAEEAA6AEEA	s1:i:227	s2:i:0	RG:Z:1	NM:i:0	AS:i:298	de:f:0	rl:i:0	cm:i:21	nn:i:0	tp:A:P	ms:i:298
+ERR5069949.2342766	147	MT192765.1	18469	60	150M	=	18397	-222	TTAAACACCTCATACCACTTATGTACAAAGGACTTCCTTGGAATGTAGTGCGTATAAAGATTGTACAAATGTTAAGTGACACACTTAAAAATCTCTCTGACAGAGTCGTATTTGTCTTATGGGCACATGGCTTTGAGTTGACATCTATGA	EAEAEAEEEEA<<AAAAEEEAEEEEEAEEEEEEEEEEEAAAEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:215	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:16	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2361683	147	MT192765.1	18513	60	148M	=	18426	-235	GTAGTGCGTATAAAGATTGTACAAATGTTAAGTGACACACTTAAAAATCTCTCTGACAGAGTCGTATTTGTCTTATGGGCACATGGCTTTGAGTTGACATCTATGAAGTATTTTGTGAAAATAGGACCTGAGCGCACCTGTTGTCTAT	A<6EA</AEEAA/<AEE<EEEEAAAA//E<A/EAEE6EAA/AA/E/</EAEEEEEE/AEEEE/EAE/6EEE/EE<EA6<</E<E/AE/AAEE/EEE<EEEE</E/EEEEEEEEEEE<EAEEEEEEA/EEEEEEAEE/EEAEEAAAAAA	s1:i:227	s2:i:0	RG:Z:1	NM:i:0	AS:i:296	de:f:0	rl:i:0	cm:i:17	nn:i:0	tp:A:P	ms:i:296
+ERR5069949.2415814	99	MT192765.1	18598	60	150M	=	18765	318	GCTTTGAGTTGACATCTATGAAGTATTTTGTGAAAATAGGACCTGAGCGCACCTGTTGTCTATGTGATAGACGTGCCACATGCTTTTCCACTGCTTCAGACACTTATGCCTGTTGGCATCATTCTATTGGATTTGATTACGTCTATAATC	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEAEEEEEEAEEEEEEEAEEEAEEEEEAEEEEEAEAAAE<AEA	s1:i:258	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:21	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2385514	99	MT192765.1	18603	60	150M	=	18685	232	GAGTTGACATCTATGAAGTATTTTGTGAAAATAGGACCTGAGCGCACCTGTTGTCTATGTGATAGACGTGCCACATGCTTTTCCACTGCTTCAGACACTTATGCCTGTTGGCATCATTCTATTGGATTTGATTACGTCTATAATCCGTTT	A/AAAE6EEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EE<EEEEE/A/AE6EEA/E/EEEEEA/AEEEEEEE<EEEAAEEEEEEE/EAEEEEEE/EEAAAEEE/<<AEEEEEEAEAEA<E/AAAAAAAA/<EE/EEAEEAE	s1:i:199	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:22	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2417063	99	MT192765.1	18649	60	150M	=	18766	267	CCTGTTGTCTATGTGATAGACGTGCCACATGCTTTTCCACTGCTTCAGACACTTATGCCTGTTGGCATCATTCTATTGGATTTGATTACGTCTATAATCCGTTTATGATTGATGTTCAACAATGGGGTTTTACAGGTAACCTACAAAGCA	AAAAAEEEAEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEAEEEE/EEEEAEEEEE<EEEEAEE<AAEAEEAAEEEEAEEEEEEEEEEEEEEEEEEEEEAEAEEE<AEE<EAE/EEA<EEEEAEEEAEEEEEEEA/EEEAE/AEEEA	s1:i:245	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2388984	99	MT192765.1	18654	60	150M	=	18694	189	TGTCTATGTGATAGACGTGCCACATGCTTTTCCACTGCTTCAGACACTTATGCCTGTTGGCATCATTCTATTGGATTTGATTACGTCTATAATCCGTTTATGATTGATGTTCAACAATGGGGTTTTACAGGTAACCTACAAAGCAACCAT	AAAAAEEEEEEEEE/EEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEAAEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEAEEEEAE/EAEEEEEEEEEEAEEEE/EEEEEEEEAEEAEEAEEEEEAEEEEAEEEE<A/EA<E	s1:i:161	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:17	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2385514	147	MT192765.1	18685	60	150M	=	18603	-232	CCACTGCTTCAGACACTTATGCCTGTTGGCATCATTCTATTGGATTTGATTACGTCTATAATCCGTTTATGATTGATGTTCAACAATGGGGTTTTACAGGTAACCTACAAAGCAACCATGATCTGTATTGTCAAGTCCATGGTAATGCAC	AA<EAAE<AA<EEAAEE/AAEEAAEE<<AAEEEAEAEAEEEAEEEEAEEEEAAEEEEE/EAEAEEEEAAEEEEEEEEEEAAEEEEAAEEEEEEEEEAEEEEEEEAEEEEEEEEEEEEEEEE6EEEEEEEEEEEEEEEEEEEEEEEAAA/A	s1:i:199	s2:i:0	RG:Z:1	NM:i:1	AS:i:290	de:f:0.0067	rl:i:0	cm:i:9	nn:i:0	tp:A:P	ms:i:290
+ERR5069949.2388984	147	MT192765.1	18694	60	149M	=	18654	-189	CAGACACTTATGCCTGTTGGCATCATTCTATTGGATTTGATTACGTCTATAATCCGTTTATGATTGATGTTCAACAATGGGGTTTTACAGGTAACCTACAAAGCAACCATGATCTGTATTGTCAAGTCCATGGTAATGCACATGTAGCT	EAAAA6EEE<EEAEEEAAAAEE/AE<AAAAEAEEEAAEEEEEAEEEEEEEEAEAA<<EEEEEEAAAEEEEAEEEA<EE<EE</EEEEEAEEEEEEEEEEEEEEEAEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:161	s2:i:0	RG:Z:1	NM:i:1	AS:i:288	de:f:0.0067	rl:i:0	cm:i:5	nn:i:0	tp:A:P	ms:i:288
+ERR5069949.2431709	99	MT192765.1	18749	60	72M1D78M	=	18777	180	GTTTATGATTGATGTTCAACAATGGGGTTTTACAGGTAACCTACAAAGCAACCATGATCTGTATTGTCAAGTCATGGTAATGCACATGTAGCTAGTTGTGATGCAATCATGACTAGGTGTCTAGCTGTCCACGAGTGCTTTGTTAAGCGT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAAAEAEAEE<AEEEEAEEAA<AEA	s1:i:148	s2:i:0	RG:Z:1	NM:i:2	AS:i:276	de:f:0.0132	rl:i:0	cm:i:15	nn:i:0	tp:A:P	ms:i:276
+ERR5069949.2415814	147	MT192765.1	18765	60	151M	=	18598	-318	CAACAATGGGGTTTTACAGGTAACCTACAAAGCAACCATGATCTGTATTGTCAAGTCCATGGTAATGCACATGTAGCTAGTTGTGATGCAATCATGACTAGGTGTCTAGCTGTCCACGAGTGCTTTGTTAAGCGTGTTGACTGGACTATTG	AEEEA<EEEEE/EEEEAEEEEEEEEEEEEAEAEAAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:258	s2:i:0	RG:Z:1	NM:i:1	AS:i:292	de:f:0.0066	rl:i:0	cm:i:16	nn:i:0	tp:A:P	ms:i:292
+ERR5069949.2417063	147	MT192765.1	18766	60	150M	=	18649	-267	AACAATGGGGTTTTACAGGTAACCTACAAAGCAACCATGATCCGTATTGTCAAGTCCATGGTAATGCACATGTAGCTAGTTGTGATGCAATCATGACTAGGTGTCTAGCTGTCCACGAGTGCTTTGTTAAGCGTGTTGACTGGACTATTG	AEEA<EEEEE//<EAEAEE/EAEAE/E<A6AAAAEEE/EAEE/EEAEAAEEEEEAEEEEEAEEEEEEEAEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEE6E/EEEEEEEEEAAAAA	s1:i:245	s2:i:0	RG:Z:1	NM:i:2	AS:i:280	de:f:0.0133	rl:i:0	cm:i:16	nn:i:0	tp:A:P	ms:i:280
+ERR5069949.2431709	147	MT192765.1	18777	60	44M1D107M	=	18749	-180	TTTACAGGTAACCTACAAAGCAACCATGATCTGTATTGTCAAGTCATGGTAATGCACATGTAGCTAGTTGTGATGCAATCATGACTAGGTGTCTAGCTGTCCACGAGTGCTTTGTTAAGCGTGTTGACTGGACTATTGAATATCCTATAAT	AAAAAAAE<AAAAEEEEAEAAAEEAEEEEEEEEEEEEEEEEEEAEEEAEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEAEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:148	s2:i:0	RG:Z:1	NM:i:2	AS:i:278	de:f:0.0132	rl:i:0	cm:i:5	nn:i:0	tp:A:P	ms:i:278
+ERR5069949.2521353	99	MT192765.1	19597	60	150M	=	19698	251	CTTTTACAAGACTTCAGAGTTTAGAAATTGTGGCTTATAATGTTGTAATTAAGGGACACTTTGATGGACAACAGGGTGAAGTACCAGTTTCTATCATTAATAACTCTGTTTACACAAAAGTTGATGGTGTTGATGTAGAATTGTTTGAAA	AAA/AE/6E6EEEEAEE/EE/EEE/EE/EA/EAEA//EEEEE6EAEAE/EEEEEE/EAE////EEA/EEEEEEEEEEEEEE///A/EEAEEEEEEEE<AEAEEE/AE/E<E/EEEEEA/E///AE/66AEEAEEE<E//E/EA/A<6AEE	s1:i:175	s2:i:0	RG:Z:1	NM:i:4	AS:i:260	de:f:0.0267	rl:i:0	cm:i:10	nn:i:0	tp:A:P	ms:i:260
+ERR5069949.2521353	147	MT192765.1	19698	60	150M	=	19597	-251	ATCACTGTTTTCACAAAAGTTGATGGTGTTGATGTAGAATTGTTTGAAAATAAAACAACATTACCTGTTAATGTAGCTTTTGTGCTTTGGGCTAAGCGCAACATTAAACCAGTACCAGAGGTGAAAATACTCAATAATTTGGGTGTGGAC	A//A</</EE/A<AEEA//E<EEE/E<A/<<A///<6EAEEEEE/AAA</A//<<EA/EEA//</AA6EEAE</EEA//AEE//</AEEAE/EEEA/A/EEEE//E/EAA/EEE/AEE<EEE<EE/EAEEEEE6EEEE/EEEEEEAAAAA	s1:i:175	s2:i:0	RG:Z:1	NM:i:4	AS:i:266	de:f:0.0267	rl:i:0	cm:i:14	nn:i:0	tp:A:P	ms:i:266
+ERR5069949.2605155	99	MT192765.1	21717	60	146M	=	21726	159	GTTCTTACCTTTCTTTTCCAATGTTACTTGGTTCCATGCTATACATGTCTCTGGGACCAATGGTACTAAGAGGTTTGATAACCCTGTCCTACCATTTAATGATGGTGTTTATTTTGCTTCCACTGAGAAGTCTAACATAATAAGAG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEE/EEEEEEEEEEEEEE<EEAEEEAEAEAEEEEEEEEAAEEEEE<EEAEAEEEAA<E<EAAE</E/AA	s1:i:148	s2:i:0	RG:Z:1	NM:i:0	AS:i:292	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:292
+ERR5069949.2605155	147	MT192765.1	21726	60	150M	=	21717	-159	TTTCTTTTCCAATGTTACTTGGTTCCATGCTATACATGTCTCTGGGACCAATGGTACTAAGAGGTTTGATAACCCTGTCCTACCATTTAATGATGGTGTTTATTTTGCTTCCACTGAGAAGTCTAACATAATAAGAGGCTGGATTTTTGG	A/EEEE/EEAEAEEEEEAEEAEEEAAAEEEAEEEEEEAEE/EEEAEAEAEEEEEEAEEAEEEEEAEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:148	s2:i:30	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2650879	163	MT192765.1	22659	60	151M	=	22710	201	ATATAATTCCGCATCATTTTCCACTTTTAAGTGTTATGGAGTGTCTCCTACTAAATTAAATGATCTCTGCTTTACTAATGTCTATGCAGATTCATTTGTAATTAGAGGTGATGAAGTCAGACAAATCGCTCCAGGGCAAACTGGAAAGATT	AAAAAEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEAEE<EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEAEEEEEEEEEEEEAEEE<A<EEEEAAAAEEEEEEEEEEEE	s1:i:192	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:16	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.2650879	83	MT192765.1	22710	60	150M	=	22659	-201	TAAATTAAATGATCTCTGCTTTACTAATGTCTATGCAGATTCATTTGTAATTAGAGGTGATGAAGTCAGACAAATCGCTCCAGGGCAAACTGGAAAGATTGCTGATTATAATTATAAATTACCAGATGATTTTACAGGCTGCGTTATAGC	EAEEEAEE<EEE/EEEEEEAEEEEEEEEEEEA<AAEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEAEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:192	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:13	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2668880	99	MT192765.1	23125	60	147M	=	23146	171	GTTTGTGGACCTAAAAAGTCTACTAATTTGGTTAAAAACAAATGTGTCAATTTCAACTTCAATGGTTTAACAGGCACAGGTGTTCTTACTGAGTCTAACAAAAAGTTTCTGCCTTTCCAACAATTTGGCAGAGACATTGCTGACACT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAE<EEAEEEEA<EAA<AAAEEEEEE<AEA<EAAEE	s1:i:158	s2:i:0	RG:Z:1	NM:i:0	AS:i:294	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:294
+ERR5069949.2674295	163	MT192765.1	23134	60	150M	=	23204	218	CCTAAAAAGTCTACTAATTTGGTTAAAAACAAATGTGTCAATTTCAACTTCAATGGTTTAACAGGCACAGGTGTTCTTACTGAGTCTAACAAAAAGTTTCTGCCTTTCCAACAATTTGGCAGAGACATTGCTGACACTACTGATGCTGTC	AAAAAEEEEEEEEEEAEEEEEEEEEEEE6EEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEE<EEEEEEAEEE/EEEEEEE<EEAEEAEEEEEAEEAEE<EEAEE<E/AAAAAAE<AEAAAEEEEAEEAEAEE/EEAAAEE	s1:i:209	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2668880	147	MT192765.1	23146	60	150M	=	23125	-171	ACTAATTTGGTTAAAAACAAATGTGTCAATTTCAACTTCAATGGTTTAACAGGCACAGGTGTTCTTACTGAGTCTAACAAAAAGTTTCTGCCTTTCCAACAATTTGGCAGAGACATTGCTGACACTACTGATGCTGTCCGTGATCCACAG	AE/EAEAEAEEEEEEA<A6EEEEEE<EAAEEEAEEEEEEEAEEEEEAEEEEEEEEEEEEAEEAEEEEEEEEEAEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:158	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2674295	83	MT192765.1	23204	60	148M	=	23134	-218	GTGTTCTTACTGAGTCTAACAAAAAGTTTCTGCCTTTCCAACAATTTGGCAGAGACATTGCTGACACTACTGATGCTGTCCGTGATCCACAGACACTTGAGATTCTTGACATTACACCATGTTCTTTTGGTGGTGTCAGTGTTATAAC	EEEAEEEEEEEEEEEE<EEEEAA<EEEAA<AAEEE/EAAE<AAAEA<EEEEEEE<EEEEEE<EEEEEEEEEEEEEEAEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:209	s2:i:0	RG:Z:1	NM:i:0	AS:i:296	de:f:0	rl:i:0	cm:i:14	nn:i:0	tp:A:P	ms:i:296
+ERR5069949.2730382	163	MT192765.1	23528	60	142M	=	23528	142	ACTCATATGAGTGTGACATACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTGGTGCAGAAAATTCAGT	AAAAAEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAE<EE/EEEEEEE/EEAEEEEEEEEEEEEEEEEEEAEEEEA<AEA<<EA<A<AEEEEEA<EAE<66A/AEEEEEEEAE<AAEA	s1:i:143	s2:i:0	RG:Z:1	NM:i:0	AS:i:284	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:284
+ERR5069949.2730382	83	MT192765.1	23528	48	142M	=	23528	-142	ACTCATATGAGTGTGACATACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTGGTGCAGAAAATTCAGT	A<AA<A<EEEAAA/A<AEAEAEA<EAA<<AEA<EEEAAAEE<EEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEE<EEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEAEEEEEEE/EAAAAA	s1:i:48	s2:i:143	RG:Z:1	NM:i:0	AS:i:284	de:f:0	rl:i:0	cm:i:9	nn:i:0	tp:A:P	ms:i:284
+ERR5069949.2734474	81	MT192765.1	23547	1	149M	=	23548	-148	ACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTGGTGCAGAAAATTCAGTTGCTTACTCTAATAACTCTATTGCCA	AA/EEA/EAAAA<AAEEEEAAEEEEEEE<A/EEAEE<AEEEEEEEEAEEEEAEAAEAAEE/EEAAEEE/AEA/EEE/E/EEEEEEEEE/EEEEEEEEAEE/EEEE/EEEEEAEEEEEEEEEEEEEEEEE//EEEEAEEEEEEEAAA/AA	s1:i:58	s2:i:136	RG:Z:1	NM:i:0	AS:i:298	de:f:0	rl:i:0	cm:i:11	nn:i:0	tp:A:P	ms:i:298
+ERR5069949.2734474	161	MT192765.1	23548	60	148M	=	23547	148	CCCATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTGGTGCAGAAAATTCAGTTGCTTACTCTAATAACTCTATTGCCA	AAAA/EEEEEEEEE/E/EE6EEEEAEEEEEEAEEEEE/EEEEEEEEEEEAE/EAEE/EEEEEAE/EE<EAEEEEEEA/E<EEEEAE/EA<EEEEAEE/E/EE<EEEEE</EE/E//<<<AA6A<A<A/<AE/AE/EEEA6<A6A/</A	s1:i:136	s2:i:0	RG:Z:1	NM:i:0	AS:i:296	de:f:0	rl:i:0	cm:i:20	nn:i:0	tp:A:P	ms:i:296
+ERR5069949.2734873	163	MT192765.1	23550	60	98M	=	23550	98	CATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE	s1:i:92	s2:i:0	RG:Z:1	NM:i:0	AS:i:196	de:f:0	rl:i:0	cm:i:9	nn:i:0	tp:A:P	ms:i:196
+ERR5069949.2734873	83	MT192765.1	23550	48	98M	=	23550	-98	CATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGT	EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEE/EEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:25	s2:i:92	RG:Z:1	NM:i:0	AS:i:196	de:f:0	rl:i:0	cm:i:4	nn:i:0	tp:A:P	ms:i:196
+ERR5069949.2772897	163	MT192765.1	23809	60	150M	=	23876	219	CTTTCGTTGCAATATGGCAGTTTTTGTACACAATTAAACCGTGCTTTAACTGGAATAGCTGTTGAACAAGACAAAAACACCCAAGAAGTTTTTGCACAAGTCAAACAAATTTACAAAACACCACCAATTAAAGATTTTGGTGGTTTTAAT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEA<EAEEEE<EEEEAEEAAEEEEEEEEEEE	s1:i:199	s2:i:0	RG:Z:1	NM:i:1	AS:i:290	de:f:0.0067	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:290
+ERR5069949.2772897	83	MT192765.1	23876	60	144M1D7M	=	23809	-219	AAGACAAAAACACCCAAGAAGTTTTTGCACAAGTCAAACAAATTTACAAAACACCACCAATTAAAGATTTTGGTGGTTTTAATTTTTCACAAATATTACCAGATCCATCAAAACCAAGCAAGAGGTCATTTATTGAAGATCTACTTTCAAC	AEEEEEE<AAEEAAEEEEEEEEEEEEEEEEEEEEAEEEEEEEEAAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEAEEEEEEEEEEEEEEEEEEEE6EEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:199	s2:i:0	RG:Z:1	NM:i:1	AS:i:294	de:f:0.0066	rl:i:0	cm:i:13	nn:i:0	tp:A:P	ms:i:288
+ERR5069949.2787556	99	MT192765.1	24088	60	106M	=	24088	106	GCTGCTAGAGACCTCGTTTGTGCACAAAAGTTTAACGGCCTTACTGTTTTGCCACCTTTGCTCACAGATGAAATGATTGCTCAATACACTTCTGCACTGTTAGCGG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAE	s1:i:78	s2:i:0	RG:Z:1	NM:i:1	AS:i:202	de:f:0.0094	rl:i:0	cm:i:10	nn:i:0	tp:A:P	ms:i:202
+ERR5069949.2787556	147	MT192765.1	24088	50	106M	=	24088	-106	GCTGCTAGAGACCTCGTTTGTGCACAAAAGTTTAACGGCCTTACTGTTTTGCCACCTTTGCTCACAGATGAAATGATTGCTCAATACACTTCTGCACTGTTAGCGG	EEAAEEEEEEEEA<EEEE<AAA<EEEEEAEEEEEEAEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:78	s2:i:0	RG:Z:1	NM:i:1	AS:i:202	de:f:0.0094	rl:i:0	cm:i:1	nn:i:0	tp:A:P	ms:i:202
+ERR5069949.2832676	99	MT192765.1	24409	60	139M	=	24409	139	GTCAACCAAAATGCACAAGCTTTAAACACGCTTGTTAAACAACTTAGCTCCAATTTTGGTGCAATTTCAAGTGTTTTAAATGATATCCTTTCACGTCTTGACAAAGTTGAGGCTGAAGTGCAAATTGATAGGTTGATCA	AAAA6EEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<E/EAEEAEEEAEEAEEEEEAEEEEEEEEEEEEEEAEEAEEEEEAAEEEEEEA<AEEEAAAAEEEEE<EEAAAEEAEEAAEEEEA	s1:i:132	s2:i:0	RG:Z:1	NM:i:0	AS:i:278	de:f:0	rl:i:0	cm:i:18	nn:i:0	tp:A:P	ms:i:278
+ERR5069949.2832676	147	MT192765.1	24409	48	139M	=	24409	-139	GTCAACCAAAATGCACAAGCTTTAAACACGCTTGTTAAACAACTTAGCTCCAATTTTGGTGCAATTTCAAGTGTTTTAAATGATATCCTTTCACGTCTTGACAAAGTTGAGGCTGAAGTGCAAATTGATAGGTTGATCA	A<EEEE</EAEA6EEA</AEEEEAEEEAAE/EEAEE<A<AAAEEEEAAEEE/EEEEEEEEAEEAEEAA<EEEEEEEA<EEEAEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEAA6AA	s1:i:37	s2:i:132	RG:Z:1	NM:i:0	AS:i:278	de:f:0	rl:i:0	cm:i:5	nn:i:0	tp:A:P	ms:i:278
+ERR5069949.2888794	163	MT192765.1	24758	60	150M	=	24853	246	TCCCTGCACAAGAAAAGAACTTCACAACTGCTCCTGCCATTTGTCATGATGGAAAAGCACACTTTCCTCGTGAAGGTGTCTTTGTTTCAAATGGCACACACTGGTTTGTAACACAAAGGAATTTTTATGAACCACAAATCATTACTACAG	AAAAAEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEAEE<<<6AEE</AAAEEEEEEEAA<EEAAEA	s1:i:231	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:25	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.2888794	83	MT192765.1	24853	60	151M	=	24758	-246	ACACACTGGTTTGTAACACAAAGGAATTTTTATGAACCACAAATCATTACTACAGACAACACATTTGTGTCTGGTAACTGTGATGTTGTAATAGGAATTGTCAACAACACAGTTTATGATCCTTTGCAACCTGAATTAGACTCATTCAAGG	AAEAAAEEEEEEEEEEEEEEEAAAEEEEAAEEAAAAEEEEAEEEEEEEEE/EEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:231	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:16	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.2953930	99	MT192765.1	25345	60	151M	=	25465	268	GTGCTCAAAGGAGTCAAATTACATTACACATAAACGAACTTATGGATTTGTTTATGAGAATCTTCACAATTGGAACTGTAACTTTGAAGCAAGGTGAAATCAAGGATGCTACTCCTTCAGATTTTGTTCGCGCTACTGCAACGATACCGAT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEE/EEEEEE<AA/EA<A<AAAA<AAEAAEAEE/E<<E<EAAEA	s1:i:256	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:22	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.2972968	163	MT192765.1	25426	60	147M	=	25519	234	CTTTGAAGCAAGGTGAAATCAAGGATGCTACTCCTTCAGATTTTCTTCGCGCTACTGCAACGATACCGATACAAGCCTCACTCCCTTACGGATGGCTTATTGTAGGCGTTGCACTTCTAGCTGTTTTTCAGAGCGCTTCCAAAAACA	AAAAAEEEEEEA/E/EEEEEEE/EEEEEEEAEEEEE/EEE/E/E/<66<6/EEEE///EEE<<E/<AEAEEEE/EEEEE6A/EEA/E//AE<EEE</6E/E/E/A<AAEA/A//AE<A/E<EE//6A/A<AAA6<E<<A66</E/AA	s1:i:188	s2:i:0	RG:Z:1	NM:i:5	AS:i:248	de:f:0.034	rl:i:0	cm:i:8	nn:i:0	tp:A:P	ms:i:248
+ERR5069949.2953930	147	MT192765.1	25465	60	148M	=	25345	-268	ATTTTGTTCGCGCTACTGCAACGATACCGATACAAGCCTCACTCCCTTTCGGATGGCTTATTGTTGGCGTTGCACTTCTTGCTGTTTTTCAGAGCGCTTCCAAAATCATAACCCTCAAAAAGAGATGGCAACTAGCACTCTCCAAGGG	EEEEEE/EEEEEEEEEEAAEEEEAAAAAEEEEEE/EEEEEEAAEEEEAEEEEEEEAEEEEEEEEEAEEEE/EEAEEEEEEEEEEAAEEEEEEAEEEE/EEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:256	s2:i:0	RG:Z:1	NM:i:0	AS:i:296	de:f:0	rl:i:0	cm:i:15	nn:i:0	tp:A:P	ms:i:296
+ERR5069949.2972968	83	MT192765.1	25519	60	141M	=	25426	-234	GTCTTATTGTTGGCGTTGCACTTCTTGCTGTTTTTCAGAGCGATTCCAAAATCATAACCCTCAAAAAGAGATGGCAACTAGCACTCTCCAAGGGTGTTCACTTTGTTTGCAACTTGCTGTTGTTGTTTGTAACAGTTTACT	6//A//A/EE/EE/A//6/E/EEEEEE66AAE//EEAEE/AE//EEEE//EA/A///<E/E/E/EE<E<E/EE/<E</EEEEAEEEEEEEAE<A/AEE//EEEEE/E//EEEEEEEE6E/EA/EA6EEEE//EEEEAAAAA	s1:i:188	s2:i:0	RG:Z:1	NM:i:2	AS:i:268	de:f:0.0142	rl:i:0	cm:i:14	nn:i:0	tp:A:P	ms:i:268
+ERR5069949.3017828	99	MT192765.1	26176	60	107M	=	26177	107	ATGATGAACCGACGACGACTACTAGCGTGCCTTTGTAAGCACAAGCTGATGAGTACGAACTTATGTACTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAAT	AAAAAE6EEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEAEEEEE	s1:i:96	s2:i:0	RG:Z:1	NM:i:0	AS:i:214	de:f:0	rl:i:0	cm:i:11	nn:i:0	tp:A:P	ms:i:214
+ERR5069949.3017828	147	MT192765.1	26177	48	106M	=	26176	-107	TGATGAACCGACGACGACTACTAGCGTGCCTTTGTAAGCACAAGCTGATGAGTACGAACTTATGTACTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAAT	A/EAAEEEAEEAE<E</EEEEEEEEEAE<EEEEEEAE<EE/E<EEEEEEEEEEEE<EEAEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEAAAAA	s1:i:37	s2:i:96	RG:Z:1	NM:i:0	AS:i:212	de:f:0	rl:i:0	cm:i:3	nn:i:0	tp:A:P	ms:i:212
+ERR5069949.3022231	99	MT192765.1	26228	60	147M	=	26228	147	GTACGAACTTATGTACTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAATAGCGTACTTCTTTTTCTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATCCTTACTGCGCTTCGATTGTGTGCGTACTGCTGCAATAT	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEEEEEEAAAAEEEEEEEEAEEE	s1:i:139	s2:i:0	RG:Z:1	NM:i:0	AS:i:294	de:f:0	rl:i:0	cm:i:21	nn:i:0	tp:A:P	ms:i:294
+ERR5069949.3022231	147	MT192765.1	26228	48	147M	=	26228	-147	GTACGAACTTATGTACTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAATAGCGTACTTCTTTTTCTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATCCTTACTGCGCTTCGATTGTGTGCGTACTGCTGCAATAT	EAAAEEEEEEAEEEEE<EEEEAE<EEAAEAAEEEEEEEEEEEEEEEEAEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEEEEEE6EEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:34	s2:i:139	RG:Z:1	NM:i:0	AS:i:294	de:f:0	rl:i:0	cm:i:6	nn:i:0	tp:A:P	ms:i:294
+ERR5069949.3057020	99	MT192765.1	26621	60	86M9S	=	26621	86	CAATTTGCCTATGCCAACAGGAATAGGTTTTTGTATATAATTAAGTTAATTTTCCTCTGGCTGTTATGGCCAGTAACTTTAGCTTGGTTGTACGC	AAAAAEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAE	s1:i:71	s2:i:0	RG:Z:1	NM:i:0	AS:i:172	de:f:0	rl:i:0	cm:i:8	nn:i:0	tp:A:P	ms:i:172
+ERR5069949.3057020	147	MT192765.1	26621	51	86M9S	=	26621	-86	CAATTTGCCTATGCCAACAGGAATAGGTTTTTGTATATAATTAAGTTAATTTTCCTCTGGCTGTTATGGCCAGTAACTTTAGCTTGGTTGTACGC	EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:71	s2:i:33	RG:Z:1	NM:i:0	AS:i:172	de:f:0	rl:i:0	cm:i:2	nn:i:0	tp:A:P	ms:i:172
+ERR5069949.3122970	163	MT192765.1	26996	60	126M	=	26996	127	ATCAAGGACCTGCCTAAAGAAATCACTGTTGCTACATCACGAACGCTTTCTTATTACAAATTGGGAGCTTCGCAGCGTGTAGCAGGTGACTCAGGTTTTGCTGCATACAGTCGCTACAGGATTGGC	AAAAAEE6EEEEEEEAEEEEEEEEEEEAEEEEEEEEEEEEEE/EEEEEAE<EEAEAEEEEEEEEEAAEEEEAEAEEE/AEEE<A<A/AAAAAE/E<A66AEEEEEEEEEEEAE<</6AA<A/6/EA	s1:i:119	s2:i:0	RG:Z:1	NM:i:0	AS:i:252	de:f:0	rl:i:0	cm:i:17	nn:i:0	tp:A:P	ms:i:252
+ERR5069949.3122970	83	MT192765.1	26996	48	127M	=	26996	-127	ATCAAGGACCTGCCTAAAGAAATCACTGTTGCTACATCACGAACGCTTTCTTATTACAAATTGGGAGCTTCGCAGCGTGTAGCAGGTGACTCAGGTTTTGCTGCATACAGTCGCTACAGGATTGGCA	A//6AAAEAEEA/AAEEEEEEAAE/EEE//A<EEEEEEEEEAEEE/EEAAEEAEEEE/<EEAEEEEEAEEAEEAEEEEEEEEA<EAEEAEAEAEEA6EEEEEEEEEEEEEAEEEAEEEEEEEA/AAA	s1:i:52	s2:i:119	RG:Z:1	NM:i:0	AS:i:254	de:f:0	rl:i:0	cm:i:9	nn:i:0	tp:A:P	ms:i:254
+ERR5069949.3184655	163	MT192765.1	27311	60	150M	=	27352	191	TTTATCTAAGTCACTAACTGAGAATAAATATTCTCAATTAGATGAAGAGCAACCAATGGAGATTGATTAAACGAACATGAAAATTATTCTTTTCTTGGCACTGATAACACTCGCTACTTGTGAGCTTTATCACTACCAAGAGTGTGTTAG	AAAAAEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEE<EEEEE/EEEEEAEAEEEEE/EEEAEEE<EEEEEE<EEAAEEAEEEEEAAAEEE/E<AAEEAAAE6A/A<<A<AAAEE/AA6AE/A	s1:i:185	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:21	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.3184655	83	MT192765.1	27352	60	150M	=	27311	-191	ATGAAGAGCAACCAATGGAGATTGATTAAACGAACATGAAAATTATTCTTTTCTTGGCACTGATAACACTCGCTACTTGTGAGCTTTATCACTACCAAGAGTGTGTTAGAGGTACAACAGTACTTTTAAAAGAACCTTGCTCTTCTGGAA	AAAE6E</EA6<A6/A/E6A</EEE<EEA///E/A<<</AEEEE<E<EEEEEEEEEE/E<E/EE/A<AEEAEAE/EEEEEEEAEEEEEEEEEEEEE/AEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEAAAAA	s1:i:185	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:8	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.3249622	163	MT192765.1	28218	38	116M	=	28372	231	ATCATGACGTTCGTGTTGTTTTAGATTTCATCGAAACGAACAAACAAAAATGTCTGATAATGGACCCCAAAATCATCGAAATGCACCCCGCATTACGGTTGGTGGACCCTCCGATT	AAA/AE//EEE/EE6AE/A</EE//6AE6EE//EE/AE//A/EE//EEEE<EAA/EE//<E/A/E/EE//E/A/E/E//EE/<A/A<EE/A//</EE//E/E//A/EEE///A//6	s1:i:97	s2:i:0	RG:Z:1	NM:i:5	AS:i:182	de:f:0.0431	rl:i:0	cm:i:3	nn:i:0	tp:A:P	ms:i:182
+ERR5069949.3249622	83	MT192765.1	28372	37	77M	=	28218	-231	CGATAAAAACAAGGTCGGCCCCAAGGTTTACCCATTAATACTGCGTCTTGGTTCACCGCTCTCACTCAACATGGCAA	E/E///<E<<////AE/EEA/EEEEEE/EEEEE//A//E/EEEEEEEE/EEE/EE/EAEEAEEEAEE/AE/EAAAAA	s1:i:97	s2:i:0	RG:Z:1	NM:i:3	AS:i:124	de:f:0.039	rl:i:0	cm:i:3	nn:i:0	tp:A:P	ms:i:124
+ERR5069949.3273002	163	MT192765.1	28443	60	150M	=	28544	249	TGGCAAGGAAGACCTTAAATTCCCTCGAGGACAAGGCGTTCCAATTAACACCAATAGCAGTCCAGATGACCAAATTGGCTACTACCGAAGAGCTACCAGACGAATTCGTGGTGGTGACGGTAAAATGAAAGATCTCAGTCCAAGATGGTA	AAAAAEEEEEEEEEEEEEEEEEEAEEAEEEEEEEAEEE/EEA/EEEEEE6EEEAEEEEEEEEEAEEEEAEEEEE<EEEE<EEE/EE//AE/EEAEAEAE/EAAEEA6AEEE/<E<</EEE/E</AEA//A/EA6<AAEEEA/AEE6AE/E	s1:i:235	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:19	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.3277445	99	MT192765.1	28509	60	151M	=	28574	166	TGACCAAATTGGCTACTACCGAAGAGCTACCAGACGAATTCGTGGTGGTGACGGTAAAATGAAAGATCTCAGTCCAAGATGGTATTTCTACTACCTAGGAACTGGGCCAGAAGCTGGACTTCCCTATGGTGCTAACAAAGACGGCATCATA	AA/AAEEEEEEEEEEEEEEEEEEEAEEEEEEEEAEEEEAEEEEEEEEEEEAEEEEEEEEEEEEEE/AEEEAEEEAAEEAEEE<E/EEEEEEEEAEEEEA/EEEEEEEAAA//EEEEEAEA<AEEAAEAEAEAEAEEEEAEEEEEEA/EEAA	s1:i:154	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:21	nn:i:0	tp:A:P	ms:i:302
+ERR5069949.3273002	83	MT192765.1	28544	60	148M	=	28443	-249	GAATGGGTGGTGGTGACGGTAAAATGAAAGATCTCAGTCCAAGATGGTATTTCTACTACCTCGGAACTGGGCCAGAAGCTGGACTTCCCTATGGTGCTAACAAAGACGGCATCATATGGGTTGCAACTGAGGGAGCCTTGAATACACC	AE/A//A/E</EE/AEAE<EEEA//A6E6/E/AA<EE<<<E/AEAAAEE<//EAEEE<EA</E/E/A/EEEE</EEAEA/EAEEE<EEEEEEEEEEEEAEAEEEAEEEEEEEEEAE/EEEE/EEEEEEEEEEEEEEEEEEEEAAAAAA	s1:i:235	s2:i:0	RG:Z:1	NM:i:3	AS:i:274	de:f:0.0203	rl:i:0	cm:i:17	nn:i:0	tp:A:P	ms:i:274
+ERR5069949.3277445	147	MT192765.1	28574	57	101M	=	28509	-166	ATCTCAGTCCAAGATGGTATTTCTACTACCTAGGAACGGGGCCAGAAGCGGGACTTCCCTATGGTGCTAACAAAGACGGCATCATATGGGTTGCAACTGAG	</</A///EA<//</<AE/EA/AE<E//</E/A/<///EE/E///E/E//E/E/A/EAAE<EEA/A//EAEE6//EEEEEEEEEEEEEEE6EEEEEAAAAA	s1:i:154	s2:i:0	RG:Z:1	NM:i:2	AS:i:182	de:f:0.0198	rl:i:0	cm:i:3	nn:i:0	tp:A:P	ms:i:182
+ERR5069949.3338256	163	MT192765.1	29431	60	150M	=	29452	172	CAGCAAACTGTGACTCTTCTTCCTGCTGCAGATTTGGATGATTTCTCCAAACAATTGCAACAATCCATGAGCAGTGCTGACTCAACTCAGGCCTAAACTCATGCAGACCACACAAGGCAGATGGGCTATATAAACGTTTTCGCTTTTCCG	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEAEEEEEEEEAAAAAEA<AAAEA<AA	s1:i:163	s2:i:0	RG:Z:1	NM:i:0	AS:i:300	de:f:0	rl:i:0	cm:i:25	nn:i:0	tp:A:P	ms:i:300
+ERR5069949.3338256	83	MT192765.1	29452	60	151M	=	29431	-172	CCTGCTGCAGATTTGGATGATTTCTCCAAACAATTGCAACAATCCATGAGCAGTGCTGACTCAACTCAGGCCTAAACTCATGCAGACCACACAAGGCAGATGGGCTATATAAACGTTTTCGCTTTTCCGTTTACGATATATAGTCTACTCT	AEEEEEEEEEEEA<AEEAEEEEEEAA<EEEEEEAEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEAAE<EEEEEAEEEEEEEEEEAEEA/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	s1:i:163	s2:i:0	RG:Z:1	NM:i:0	AS:i:302	de:f:0	rl:i:0	cm:i:5	nn:i:0	tp:A:P	ms:i:302

From 80aaf333dc1245b8bbec9aa7b628d85c5e413f4b Mon Sep 17 00:00:00 2001
From: Emma Rousseau <emmarou1@icloud.com>
Date: Fri, 13 Sep 2024 09:10:30 +0200
Subject: [PATCH 17/42] Add Kallisto index (#149)

---
 CHANGELOG.md                                  |   6 +-
 src/kallisto/kallisto_index/Kallisto          | Bin 0 -> 2439 bytes
 src/kallisto/kallisto_index/config.vsh.yaml   |  94 ++++++++++++++++++
 src/kallisto/kallisto_index/help.txt          |  21 ++++
 src/kallisto/kallisto_index/script.sh         |  34 +++++++
 src/kallisto/kallisto_index/test.sh           |  35 +++++++
 .../kallisto_index/test_data/d_list.fasta     |   5 +
 .../test_data/transcriptome.fasta             |  23 +++++
 8 files changed, 217 insertions(+), 1 deletion(-)
 create mode 100644 src/kallisto/kallisto_index/Kallisto
 create mode 100644 src/kallisto/kallisto_index/config.vsh.yaml
 create mode 100644 src/kallisto/kallisto_index/help.txt
 create mode 100644 src/kallisto/kallisto_index/script.sh
 create mode 100644 src/kallisto/kallisto_index/test.sh
 create mode 100644 src/kallisto/kallisto_index/test_data/d_list.fasta
 create mode 100644 src/kallisto/kallisto_index/test_data/transcriptome.fasta
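
Note on the flag handling in script.sh: the sketch below isolates the two
idioms the script relies on, namely the odd k-mer size check and the
`${var:+...}` expansion that drops options which were not set. It is a
minimal illustration under stated assumptions, not the component itself:
the function names `valid_kmer_size` and `build_args` are hypothetical,
only two of the options are shown, and nothing here calls kallisto.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; they are not part of the component.

# Succeeds when $1 is an odd decimal integer between 1 and 31.
# Empty values, non-digits and leading zeros are rejected (the latter
# avoids octal interpretation inside $(( ... ))).
valid_kmer_size() {
    case "$1" in
        ''|0*|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -le 31 ] && [ $(( $1 % 2 )) -eq 1 ]
}

# Prints the optional flags, one token per line.
# $1: k-mer size (may be empty), $2: "true" or "false" for --make-unique.
build_args() {
    kmer_size="$1"
    make_unique="$2"
    # A boolean flag that arrives as "false" is emptied so that
    # the :+ expansion below produces nothing for it.
    [ "$make_unique" = "false" ] && make_unique=""
    printf '%s\n' \
        ${kmer_size:+--kmer-size "$kmer_size"} \
        ${make_unique:+--make-unique}
}
```

With these definitions, `build_args 21 false` prints `--kmer-size` and `21`
on separate lines and omits `--make-unique`, while `valid_kmer_size 20`
fails because 20 is even.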

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d88d0996..846007d8 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -149,7 +149,11 @@
 * `sortmerna`: Local sequence alignment tool for mapping, clustering, and filtering rRNA from metatranscriptomic 
                data. (PR #146)
 
-*  `fq_subsample`: Sample a subset of records from single or paired FASTQ files (PR #147).
+* `fq_subsample`: Sample a subset of records from single or paired FASTQ files (PR #147).
+
+* `kallisto`:
+    - `kallisto_index`: Create a kallisto index (PR #149).
+
 
 ## MINOR CHANGES
 
diff --git a/src/kallisto/kallisto_index/Kallisto b/src/kallisto/kallisto_index/Kallisto
new file mode 100644
index 0000000000000000000000000000000000000000..3c7b5b2bff962965d99ca3f9a4a6b6af6da1f3f0
GIT binary patch
literal 2439
zcmeHJTTBx{6rFCj(iW7(R4PH@f{iwcTCEBOksY_VO)N&jMkQb@MkU0z5<nES+iDtq
ztcrfC@k>b%pZJItq>6y3i5QJ<W1@)>C02jb;7mKSNlWm{Po|xmnK|d)x%cj5cE^Hf
zo5Ms=gP>q-=Dx`Y&8Tam%b*(*sAYc)aw*sH`%Qmb#irI!wJd#g^%N{jJ942Y*B;7v
zR8myiWp1c$UHLL&=A`AC^o$*g(yPKxE9&k2TL(+#tPc&Aci!%O@N%%KUe}u^mgT!#
zi%Y`WO!oXjbAkQw#f0xZP+kZo&F8eXH{r?boldZcY~7r&DBNCpW#D;g<%>%V>88S*
zn(fP&_poKg&FeY4ul8ovgjcuEYhCLuEry9t<T_<r;)x-6Y*vFQbgQZF#X(b>sbgo;
zfcbNEMVB@;wf0@jF3-`vqIs3Vln>G5z~*-Q*|O7R4?+L_8<&^;{;|Iv=yw^0x?1Fq
zKKH{t-pk+KZ7B%++|bB;H`mVIyPQk@FmSG}@6usnuL2HvP5$1gzh1Rfd<xy4PRpnY
z$lwb>lA=oM8O*p9pU%UxMoXtZa4MiI;e6lN9)Zj9SsczoL4F-nxW<!$M?XHx=`41$
zNSX;JB>7gjS;lL#ITXLZTe7Tm{6y<^@7*BFcmyOu`fe-N>GMM_MNm8n3s15~-A-du
zs>S(M$mE>7gVQsBMnsW@M&}f>b(D#qkcNO}MG=t0Wgt?+7*11VW2Q|{Q<m5`y=8^|
z9+;~?zuCji5Og|U(_<Ox<qoL${owUfYj%<xv>bnyveDt-4O(8?_>}HQ+|nsHK}JD>
zEI}_&#snCR!wU`=ImWmY2!IuUp@YzBuGIbjA?Q;}NCbS6Ej{38{Q(Ed(9~7CHlh~@
z(zu?LfHTHk>o~Hk>ib8~11>`F@%qmr>8X$)4UE=ZAnP=qIJp|ns6M_j(fMdSqjeZP
zKmX@E#Gf*HzUVyzWeLinBtr;M7o$H(mPA>GqA1d9hM{`wuLpH{uWG16io*!3&Oq!~
zY>JwOK3Z%+$HT0KEnpYTsK*f41@5@T5O@KlCdndB3}+_eA<9%le@sX@NTS*o#e2qq
z{mZk6y~(I%5^AViqJ%~m(IWP&+TTT!n9y(~NA!$eA9;u!LWn;?@K-_t>bR9cmu<nq
hRn%Ezn!9QyjNx;|7(NH_$x(BVXNzpCjqI)ke*rf>5}*J8

literal 0
HcmV?d00001

diff --git a/src/kallisto/kallisto_index/config.vsh.yaml b/src/kallisto/kallisto_index/config.vsh.yaml
new file mode 100644
index 00000000..2c4f65c7
--- /dev/null
+++ b/src/kallisto/kallisto_index/config.vsh.yaml
@@ -0,0 +1,94 @@
+name: kallisto_index
+namespace: kallisto
+description: |
+  Build a Kallisto index for the transcriptome to use Kallisto in the mapping-based mode.
+keywords: [kallisto, index]
+links:
+  homepage: https://pachterlab.github.io/kallisto/about
+  documentation: https://pachterlab.github.io/kallisto/manual
+  repository: https://github.com/pachterlab/kallisto
+  issue_tracker: https://github.com/pachterlab/kallisto/issues
+references: 
+  doi: https://doi.org/10.1038/nbt.3519
+license: BSD 2-Clause License
+
+argument_groups:
+- name: "Input"
+  arguments: 
+  - name: "--input"
+    type: file
+    description: |
+      Path to a FASTA-file containing the transcriptome sequences, either in plain text or 
+      compressed (.gz) format.
+    required: true
+  - name: "--d_list"
+    type: file
+    description: |
+      Path to a FASTA-file containing sequences to mask from quantification.
+
+- name: "Output"
+  arguments:
+  - name: "--index"
+    type: file
+    direction: output
+    example: Kallisto_index
+
+- name: "Options"
+  arguments:
+  - name: "--kmer_size"
+    type: integer
+    description: |
+      K-mer length used to build the index; must be odd, with a maximum of 31 (default: 31).
+    example: 31
+  - name: "--make_unique"
+    type: boolean_true
+    description: |
+      Replace repeated target names with unique names.
+  - name: "--aa"
+    type: boolean_true
+    description: |
+      Generate index from a FASTA-file containing amino acid sequences.
+  - name: "--distinguish"
+    type: boolean_true
+    description: |
+      Generate index where sequences are distinguished by the sequence names.
+  - name: "--min_size"
+    alternatives: ["-m"]
+    type: integer
+    description: |
+      Length of minimizers (default: automatically chosen).
+  - name: "--ec_max_size"
+    alternatives: ["-e"]
+    type: integer
+    description: |
+      Maximum number of targets in an equivalence class (default: no maximum).
+  - name: "--tmp"
+    alternatives: ["-T"]
+    type: string
+    description: |
+      Path to a directory for temporary files.
+    example: "tmp"
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - path: test_data
+
+engines:
+  - type: docker
+    image: ubuntu:22.04
+    setup:
+      - type: docker
+        run: |
+          apt-get update && \
+          apt-get install -y --no-install-recommends wget && \
+          wget --no-check-certificate https://github.com/pachterlab/kallisto/releases/download/v0.50.1/kallisto_linux-v0.50.1.tar.gz && \
+          tar -xzf kallisto_linux-v0.50.1.tar.gz && \
+          mv kallisto/kallisto /usr/local/bin/
+runners:
+  - type: executable
+  - type: nextflow  
diff --git a/src/kallisto/kallisto_index/help.txt b/src/kallisto/kallisto_index/help.txt
new file mode 100644
index 00000000..28778ac0
--- /dev/null
+++ b/src/kallisto/kallisto_index/help.txt
@@ -0,0 +1,21 @@
+```
+kallisto index
+```
+kallisto 0.50.1
+Builds a kallisto index
+
+Usage: kallisto index [arguments] FASTA-files
+
+Required argument:
+-i, --index=STRING          Filename for the kallisto index to be constructed 
+
+Optional argument:
+-k, --kmer-size=INT         k-mer (odd) length (default: 31, max value: 31)
+-t, --threads=INT           Number of threads to use (default: 1)
+-d, --d-list=STRING         Path to a FASTA-file containing sequences to mask from quantification
+    --make-unique           Replace repeated target names with unique names
+    --aa                    Generate index from a FASTA-file containing amino acid sequences
+    --distinguish           Generate index where sequences are distinguished by the sequence name
+-T, --tmp=STRING            Temporary directory (default: tmp)
+-m, --min-size=INT          Length of minimizers (default: automatically chosen)
+-e, --ec-max-size=INT       Maximum number of targets in an equivalence class (default: no maximum)
diff --git a/src/kallisto/kallisto_index/script.sh b/src/kallisto/kallisto_index/script.sh
new file mode 100644
index 00000000..59a5d3de
--- /dev/null
+++ b/src/kallisto/kallisto_index/script.sh
@@ -0,0 +1,34 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+set -eo pipefail
+
+unset_if_false=( par_make_unique par_aa par_distinguish )
+
+for var in "${unset_if_false[@]}"; do
+    temp_var="${!var}"
+    [[ "$temp_var" == "false" ]] && unset $var
+done
+
+if [ -n "$par_kmer_size" ]; then
+    if [[ "$par_kmer_size" -lt 1 || "$par_kmer_size" -gt 31 || $(( par_kmer_size % 2 )) -eq 0 ]]; then
+        echo "Error: k-mer size must be an odd number between 1 and 31." >&2
+        exit 1
+    fi
+fi
+
+kallisto index \
+    -i "${par_index}" \
+    ${par_kmer_size:+--kmer-size "${par_kmer_size}"} \
+    ${par_make_unique:+--make-unique} \
+    ${par_aa:+--aa} \
+    ${par_distinguish:+--distinguish} \
+    ${par_min_size:+--min-size "${par_min_size}"} \
+    ${par_ec_max_size:+--ec-max-size "${par_ec_max_size}"} \
+    ${par_d_list:+--d-list "${par_d_list}"} \
+    ${meta_cpus:+--threads "${meta_cpus}"} \
+    ${par_tmp:+--tmp "${par_tmp}"} \
+    "${par_input}"
+
diff --git a/src/kallisto/kallisto_index/test.sh b/src/kallisto/kallisto_index/test.sh
new file mode 100644
index 00000000..2646dcd8
--- /dev/null
+++ b/src/kallisto/kallisto_index/test.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+
+echo ">>> Test 1: Testing $meta_functionality_name with non-default k-mer size"
+
+"$meta_executable" \
+  --input "$meta_resources_dir/test_data/transcriptome.fasta" \
+  --index Kallisto \
+  --kmer_size 21
+
+
+echo ">>> Checking whether output exists and is correct"
+[ ! -f "Kallisto" ] && echo "Kallisto index does not exist!" && exit 1
+[ ! -s "Kallisto" ] && echo "Kallisto index is empty!" && exit 1
+
+kallisto inspect Kallisto 2> test.txt
+grep "number of k-mers: 989" test.txt || { echo "The content of the index seems to be incorrect." && exit 1; }
+
+################################################################################
+
+echo ">>> Test 2: Testing $meta_functionality_name with d_list argument"
+
+"$meta_executable" \
+  --input "$meta_resources_dir/test_data/transcriptome.fasta" \
+  --index Kallisto \
+  --d_list "$meta_resources_dir/test_data/d_list.fasta"
+
+echo ">>> Checking whether output exists and is correct"
+[ ! -f "Kallisto" ] && echo "Kallisto index does not exist!" && exit 1
+[ ! -s "Kallisto" ] && echo "Kallisto index is empty!" && exit 1
+
+kallisto inspect Kallisto 2> test.txt
+grep "number of k-mers: 959" test.txt || { echo "The content of the index seems to be incorrect." && exit 1; }
+
+echo "All tests succeeded!"
+exit 0
diff --git a/src/kallisto/kallisto_index/test_data/d_list.fasta b/src/kallisto/kallisto_index/test_data/d_list.fasta
new file mode 100644
index 00000000..ad5e05bf
--- /dev/null
+++ b/src/kallisto/kallisto_index/test_data/d_list.fasta
@@ -0,0 +1,5 @@
+>YAL067W-A CDS=1-228
+ATGCCAATTATAGGGGTGCCGAGGTGCCTTATAAAACCCTTTTCTGTGCCTGTGACATTTCCTTTTTCGG
+TCAAAAAGAATATCCGAATTTTAGATTTGGACCCTCGTACAGAAGCTTATTGTCTAAGCCTGAATTCAGT
+CTGCTTTAAACGGCTTCCGCGGAGGAAATATTTCCATCTCTTGAATTCGTACAACATTAAACGTGTGTTG
+GGAGTCGTATACTGTTAG
diff --git a/src/kallisto/kallisto_index/test_data/transcriptome.fasta b/src/kallisto/kallisto_index/test_data/transcriptome.fasta
new file mode 100644
index 00000000..94c06163
--- /dev/null
+++ b/src/kallisto/kallisto_index/test_data/transcriptome.fasta
@@ -0,0 +1,23 @@
+>YAL069W CDS=1-315
+ATGATCGTAAATAACACACACGTGCTTACCCTACCACTTTATACCACCACCACATGCCATACTCACCCTC
+ACTTGTATACTGATTTTACGTACGCACACGGATGCTACAGTATATACCATCTCAAACTTACCCTACTCTC
+AGATTCCACTTCACTCCATGGCCCATCTCTCACTGAATCAGTACCAAATGCACTCACATCATTATGCACG
+GCACTTGCCTCAGCGGTCTATACCCTGTGCCATTTACCCATAACGCCCATCATTATCCACATTTTGATAT
+CTATATCTCATTCGGCGGTCCCAAATATTGTATAA
+>YAL068W-A CDS=1-255
+ATGCACGGCACTTGCCTCAGCGGTCTATACCCTGTGCCATTTACCCATAACGCCCATCATTATCCACATT
+TTGATATCTATATCTCATTCGGCGGTCCCAAATATTGTATAACTGCCCTTAATACATACGTTATACCACT
+TTTGCACCATATACTTACCACTCCATTTATATACACTTATGTCAATATTACAGAAAAATCCCCACAAAAA
+TCACCTAAACATAAAAATATTCTACTTTTCAACAATAATACATAA
+>YAL068C CDS=1-363
+ATGGTCAAATTAACTTCAATCGCCGCTGGTGTCGCTGCCATCGCTGCTACTGCTTCTGCAACCACCACTC
+TAGCTCAATCTGACGAAAGAGTCAACTTGGTGGAATTGGGTGTCTACGTCTCTGATATCAGAGCTCACTT
+AGCCCAATACTACATGTTCCAAGCCGCCCACCCAACTGAAACCTACCCAGTCGAAGTTGCTGAAGCCGTT
+TTCAACTACGGTGACTTCACCACCATGTTGACCGGTATTGCTCCAGACCAAGTGACCAGAATGATCACCG
+GTGTTCCATGGTACTCCAGCAGATTAAAGCCAGCCATCTCCAGTGCTCTATCCAAGGACGGTATCTACAC
+TATCGCAAACTAG
+>YAL067W-A CDS=1-228
+ATGCCAATTATAGGGGTGCCGAGGTGCCTTATAAAACCCTTTTCTGTGCCTGTGACATTTCCTTTTTCGG
+TCAAAAAGAATATCCGAATTTTAGATTTGGACCCTCGTACAGAAGCTTATTGTCTAAGCCTGAATTCAGT
+CTGCTTTAAACGGCTTCCGCGGAGGAAATATTTCCATCTCTTGAATTCGTACAACATTAAACGTGTGTTG
+GGAGTCGTATACTGTTAG
\ No newline at end of file

From fe56ee7c53ca30f25aa31cb9a025e17cd75b636e Mon Sep 17 00:00:00 2001
From: Sai Nirmayi Yasa <92786623+sainirmayi@users.noreply.github.com>
Date: Fri, 13 Sep 2024 09:15:19 +0200
Subject: [PATCH 18/42] change output quant file to an optional argument (#151)

---
 src/salmon/salmon_quant/config.vsh.yaml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/salmon/salmon_quant/config.vsh.yaml b/src/salmon/salmon_quant/config.vsh.yaml
index 1f96f0c9..5fa3d48f 100644
--- a/src/salmon/salmon_quant/config.vsh.yaml
+++ b/src/salmon/salmon_quant/config.vsh.yaml
@@ -24,7 +24,7 @@ argument_groups:
         description: |
           Format string describing the library.
           The library type string consists of three parts: 
-          1. Relative orientation of the reads: This part is only provided if the library is paired-end, THe possible options are
+          1. Relative orientation of the reads: This part is only provided if the library is paired-end. The possible options are
             I = inward
             O = outward
             M = matching
@@ -118,7 +118,7 @@ argument_groups:
         direction: output
         description: |
           Salmon quantification file.
-        required: true
+        required: false
         example: quant.sf
 
   - name: Basic options
@@ -327,7 +327,7 @@ argument_groups:
           If this option is provided, then the selective-alignment results will be written out in SAM-compatible format. By default, output will be directed to stdout, but an alternative file name can be provided instead.
       - name: --mapping_sam
         type: file
-        description: Path to file that should output the selective-alignment results in SAM-compatible format. THis option must be provided while using --write_mappings
+        description: Path to file that should output the selective-alignment results in SAM-compatible format. This option must be provided while using --write_mappings
         required: false
         direction: output
         example: mappings.sam

From 124d50ce5318b612e4a1e4da1be705523cd6eab7 Mon Sep 17 00:00:00 2001
From: Robrecht Cannoodt <rcannood@gmail.com>
Date: Mon, 16 Sep 2024 09:50:24 +0200
Subject: [PATCH 19/42] update changelog for viash 0.2.0

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 846007d8..1f733203 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,4 @@
-# biobox x.x.x
+# biobox 0.2.0
 
 ## BREAKING CHANGES
 

From c3b40a15350235b00144f9f6735090d45bc24963 Mon Sep 17 00:00:00 2001
From: Robrecht Cannoodt <rcannood@gmail.com>
Date: Mon, 16 Sep 2024 09:53:23 +0200
Subject: [PATCH 20/42] update to viash 0.9.0

---
 CHANGELOG.md | 6 ++++++
 _viash.yaml  | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1f733203..0370a216 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,9 @@
+# biobox x.x.x
+
+## MINOR CHANGES
+
+* Upgrade to Viash 0.9.0.
+
 # biobox 0.2.0
 
 ## BREAKING CHANGES
diff --git a/_viash.yaml b/_viash.yaml
index ab4f3828..d08f2fb2 100644
--- a/_viash.yaml
+++ b/_viash.yaml
@@ -7,7 +7,7 @@ links:
   issue_tracker: https://github.com/viash-hub/biobox/issues
   repository: https://github.com/viash-hub/biobox
 
-viash_version: 0.9.0-RC7
+viash_version: 0.9.0
 
 config_mods: |
   .requirements.commands := ['ps']

From 38f635ad57ef05550bba3a0864c81627f84f5ad2 Mon Sep 17 00:00:00 2001
From: Leila011 <leilapaquay@gmail.com>
Date: Mon, 16 Sep 2024 10:44:10 +0200
Subject: [PATCH 21/42] Add agat convert genscan2gff (#100)

* add config

* add help

* add test data and expected output and the script to obtain them

* add running script

* add test script

* update changelog

* cleanup

* fix tests

* format description

* remove unused argument --inflate-off

* update --config description

* add requirements

* create temporary directory and clean up on exit

* add GENSCAN in keywords

* add set -e to test

* fix create temporary directory

* add set -eo pipefail to test

* add set -eo pipefail to script

* fix create temporary directory

* update --config description

* cleanup changelog

* cleanup changelog

* Update deprecated variable

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
---
 CHANGELOG.md                                  |   5 +
 .../agat_convert_genscan2gff/config.vsh.yaml  |  95 +++++++++++++
 src/agat/agat_convert_genscan2gff/help.txt    |  94 +++++++++++++
 src/agat/agat_convert_genscan2gff/script.sh   |  21 +++
 src/agat/agat_convert_genscan2gff/test.sh     |  35 +++++
 .../test_data/agat_convert_genscan2gff_1.gff  |  25 ++++
 .../test_data/script.sh                       |  11 ++
 .../test_data/test.genscan                    | 127 ++++++++++++++++++
 8 files changed, 413 insertions(+)
 create mode 100644 src/agat/agat_convert_genscan2gff/config.vsh.yaml
 create mode 100644 src/agat/agat_convert_genscan2gff/help.txt
 create mode 100644 src/agat/agat_convert_genscan2gff/script.sh
 create mode 100644 src/agat/agat_convert_genscan2gff/test.sh
 create mode 100644 src/agat/agat_convert_genscan2gff/test_data/agat_convert_genscan2gff_1.gff
 create mode 100755 src/agat/agat_convert_genscan2gff/test_data/script.sh
 create mode 100644 src/agat/agat_convert_genscan2gff/test_data/test.genscan
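
Note on the "Known problem" in the component description: GENSCAN reads a
FASTA header such as ">seq" as nucleotides, which shifts every predicted
coordinate. A minimal sketch of removing header lines before a sequence is
submitted, assuming a plain FASTA input. `strip_fasta_headers` is a
hypothetical helper for illustration only; it is not part of this component
or of AGAT.

```bash
#!/bin/bash
set -eo pipefail

# Hypothetical helper (not part of AGAT or this component):
# print a FASTA file without its ">" header lines, leaving only sequence lines.
strip_fasta_headers() {
  sed '/^>/d' "$1"
}

# Example usage (file names are placeholders):
#   strip_fasta_headers input.fa > sequence_only.txt
```

The sequence lines are passed through unchanged, so line wrapping in the
input is preserved.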

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0370a216..b31f43d9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,10 @@
 # biobox x.x.x
 
+## NEW FUNCTIONALITY
+
+* `agat`:
+  - `agat/agat_convert_genscan2gff`: convert a genscan file into a GFF file (PR #100).
+
 ## MINOR CHANGES
 
 * Upgrade to Viash 0.9.0.
diff --git a/src/agat/agat_convert_genscan2gff/config.vsh.yaml b/src/agat/agat_convert_genscan2gff/config.vsh.yaml
new file mode 100644
index 00000000..2adce1da
--- /dev/null
+++ b/src/agat/agat_convert_genscan2gff/config.vsh.yaml
@@ -0,0 +1,95 @@
+name: agat_convert_genscan2gff
+namespace: agat
+description: |
+  The script takes a GENSCAN file as input, and will translate it in gff
+  format. The GENSCAN format is described [here](http://genome.crg.es/courses/Bioinformatics2003_genefinding/results/genscan.html).
+  
+  **Known problem** 
+
+  You must submit only the DNA sequence, without any header! The tool expects only DNA
+  sequences and does not crash or warn if a header is submitted along with the
+  sequence. E.g., if you have a header ">seq", s-e-q are seen as the first 3
+  nucleotides of the sequence, and all prediction locations are then shifted
+  accordingly. (Checked only on the [online version](http://argonaute.mit.edu/GENSCAN.html);
+  I don't know if the same problem exists elsewhere.)
+keywords: [gene annotations, GFF conversion, GENSCAN]
+links:
+  homepage: https://github.com/NBISweden/AGAT
+  documentation: https://agat.readthedocs.io/en/latest/tools/agat_convert_genscan2gff.html
+  issue_tracker: https://github.com/NBISweden/AGAT/issues
+  repository: https://github.com/NBISweden/AGAT
+references: 
+  doi: 10.5281/zenodo.3552717
+license: GPL-3.0
+requirements:
+  - commands: [agat]
+authors:
+  - __merge__: /src/_authors/leila_paquay.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --genscan
+        alternatives: [-g]
+        description: Input genscan bed file that will be converted.
+        type: file
+        required: true
+        direction: input
+  - name: Outputs
+    arguments:       
+      - name: --output
+        alternatives: [-o, --out, --outfile, --gff]
+        description: Output GFF file. If no output file is specified, the output will be written to STDOUT.
+        type: file
+        direction: output
+        required: true
+        example: output.gff
+  - name: Arguments
+    arguments:
+      - name: --source
+        description: |
+          The source informs about the tool used to produce the data and is stored in 2nd field of a gff file. Example: Stringtie, Maker, Augustus, etc. [default: data]
+        type: string
+        required: false
+        example: Stringtie
+      - name: --primary_tag
+        description: |
+          The primary_tag corresponds to the data type and is stored in 3rd field of a gff file. Example: gene, mRNA, CDS, etc. [default: gene]
+        type: string
+        required: false
+        example: gene
+      - name: --inflate_type
+        description: |
+          Feature type (3rd column in gff) created when inflate parameter activated [default: exon].
+        type: string
+        required: false
+        example: exon
+      - name: --verbose
+        description: add verbosity
+        type: boolean_true
+      - name: --config
+        alternatives: [-c]
+        description: |
+          AGAT config file. By default AGAT takes the original agat_config.yaml shipped with AGAT. The `--config` option gives you the possibility to use your own AGAT config file (located elsewhere or named differently).
+        type: file
+        required: false
+        example: custom_agat_config.yaml
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: test_data
+engines:
+  - type: docker
+    image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
+    setup:
+      - type: docker
+        run: |
+          agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
+runners:
+  - type: executable
+  - type: nextflow
\ No newline at end of file
diff --git a/src/agat/agat_convert_genscan2gff/help.txt b/src/agat/agat_convert_genscan2gff/help.txt
new file mode 100644
index 00000000..8a9e9f52
--- /dev/null
+++ b/src/agat/agat_convert_genscan2gff/help.txt
@@ -0,0 +1,94 @@
+```sh
+agat_convert_genscan2gff.pl --help
+```
+ ------------------------------------------------------------------------------
+|   Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0                      |
+|   https://github.com/NBISweden/AGAT                                          |
+|   National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se         |
+ ------------------------------------------------------------------------------
+
+Name:
+    agat_convert_genscan2gff.pl
+
+Description:
+    The script takes a genscan file as input, and will translate it in gff
+    format. The genscan format is described here:
+    http://genome.crg.es/courses/Bioinformatics2003_genefinding/results/gens
+    can.html /!\ vvv Known problem vvv /!\ You must have submited only DNA
+    sequence, wihtout any header!! Indeed the tool expects only DNA
+    sequences and does not crash/warn if an header is submited along the
+    sequence. e.g If you have an header ">seq" s-e-q are seen as the 3 first
+    nucleotides of the sequence. Then all prediction location are shifted
+    accordingly. (checked only on the online version
+    http://argonaute.mit.edu/GENSCAN.html. I don't know if there is the same
+    pronlem elsewhere.) /!\ ^^^ Known problem ^^^^ /!\
+
+Usage:
+        agat_convert_genscan2gff.pl --genscan infile.bed [ -o outfile ]
+        agat_convert_genscan2gff.pl -h
+
+Options:
+    --genscan or -g
+            Input genscan bed file that will be convert.
+
+    --source
+            The source informs about the tool used to produce the data and
+            is stored in 2nd field of a gff file. Example:
+            Stringtie,Maker,Augustus,etc. [default: data]
+
+    --primary_tag
+            The primary_tag corresponf to the data type and is stored in 3rd
+            field of a gff file. Example: gene,mRNA,CDS,etc. [default: gene]
+
+    --inflate_off
+            By default we inflate the block fields (blockCount, blockSizes,
+            blockStarts) to create subfeatures of the main feature
+            (primary_tag). Type of subfeature created based on the
+            inflate_type parameter. If you don't want this inflating
+            behaviour you can deactivate it by using the option
+            --inflate_off.
+
+    --inflate_type
+            Feature type (3rd column in gff) created when inflate parameter
+            activated [default: exon].
+
+    --verbose
+            add verbosity
+
+    -o , --output , --out , --outfile or --gff
+            Output GFF file. If no output file is specified, the output will
+            be written to STDOUT.
+
+    -c or --config
+            String - Input agat config file. By default AGAT takes as input
+            agat_config.yaml file from the working directory if any,
+            otherwise it takes the orignal agat_config.yaml shipped with
+            AGAT. To get the agat_config.yaml locally type: "agat config
+            --expose". The --config option gives you the possibility to use
+            your own AGAT config file (located elsewhere or named
+            differently).
+
+    -h or --help
+            Display this helpful text.
+
+Feedback:
+  Did you find a bug?:
+    Do not hesitate to report bugs to help us keep track of the bugs and
+    their resolution. Please use the GitHub issue tracking system available
+    at this address:
+
+                https://github.com/NBISweden/AGAT/issues
+
+     Ensure that the bug was not already reported by searching under Issues.
+     If you're unable to find an (open) issue addressing the problem, open a new one.
+     Try as much as possible to include in the issue when relevant:
+     - a clear description,
+     - as much relevant information as possible,
+     - the command used,
+     - a data sample,
+     - an explanation of the expected behaviour that is not occurring.
+
+  Do you want to contribute?:
+    You are very welcome, visit this address for the Contributing
+    guidelines:
+    https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
diff --git a/src/agat/agat_convert_genscan2gff/script.sh b/src/agat/agat_convert_genscan2gff/script.sh
new file mode 100644
index 00000000..38afb084
--- /dev/null
+++ b/src/agat/agat_convert_genscan2gff/script.sh
@@ -0,0 +1,21 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+# unset flags
+[[ "$par_inflate_off" == "true" ]] && unset par_inflate_off
+[[ "$par_verbose" == "false" ]] && unset par_verbose
+
+# run agat_convert_genscan2gff
+agat_convert_genscan2gff.pl \
+  --genscan "$par_genscan" \
+  --output "$par_output" \
+  ${par_source:+--source "${par_source}"} \
+  ${par_primary_tag:+--primary_tag "${par_primary_tag}"} \
+  ${par_inflate_off:+--inflate_off} \
+  ${par_inflate_type:+--inflate_type "${par_inflate_type}"} \
+  ${par_verbose:+--verbose} \
+  ${par_config:+--config "${par_config}"}
\ No newline at end of file
diff --git a/src/agat/agat_convert_genscan2gff/test.sh b/src/agat/agat_convert_genscan2gff/test.sh
new file mode 100644
index 00000000..b666dacf
--- /dev/null
+++ b/src/agat/agat_convert_genscan2gff/test.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+test_dir="${meta_resources_dir}/test_data"
+
+# create temporary directory and clean up on exit
+TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_name-XXXXXX")
+function clean_up {
+ [[ -d "$TMPDIR" ]] && rm -rf "$TMPDIR"
+}
+trap clean_up EXIT
+
+echo "> Run $meta_name with test data"
+"$meta_executable" \
+  --genscan "$test_dir/test.genscan" \
+  --output "$TMPDIR/output.gff" 
+
+echo ">> Checking output"
+[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1
+
+echo ">> Check if output is empty"
+[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1
+
+echo ">> Check if output matches expected output"
+# test diff directly: with `set -e`, a bare failing diff would abort before the message is printed
+if ! diff "$TMPDIR/output.gff" "$test_dir/agat_convert_genscan2gff_1.gff"; then
+  echo "Output file output.gff does not match expected output"
+  exit 1
+fi
+
+echo "> Test successful"
diff --git a/src/agat/agat_convert_genscan2gff/test_data/agat_convert_genscan2gff_1.gff b/src/agat/agat_convert_genscan2gff/test_data/agat_convert_genscan2gff_1.gff
new file mode 100644
index 00000000..695fb46c
--- /dev/null
+++ b/src/agat/agat_convert_genscan2gff/test_data/agat_convert_genscan2gff_1.gff
@@ -0,0 +1,25 @@
+##gff-version 3
+unknown	genscan	gene	2223	4605	75.25	+	.	ID=gene_1
+unknown	genscan	mRNA	2223	4605	75.25	+	.	ID=mrna_1;Parent=gene_1
+unknown	genscan	exon	2223	3020	75.25	+	.	ID=exon_1;Parent=mrna_1
+unknown	genscan	exon	4249	4605	13.03	+	.	ID=exon_2;Parent=mrna_1
+unknown	genscan	CDS	2223	3020	75.25	+	0	ID=cds_1;Parent=mrna_1
+unknown	genscan	CDS	4249	4605	13.03	+	0	ID=cds_2;Parent=mrna_1
+unknown	genscan	gene	6829	8789	20.06	-	.	ID=gene_2
+unknown	genscan	mRNA	6829	8789	20.06	-	.	ID=mrna_2;Parent=gene_2
+unknown	genscan	exon	6829	7297	20.06	-	.	ID=exon_3;Parent=mrna_2
+unknown	genscan	exon	7730	7888	12.78	-	.	ID=exon_4;Parent=mrna_2
+unknown	genscan	exon	8029	8185	7.45	-	.	ID=exon_5;Parent=mrna_2
+unknown	genscan	exon	8278	8546	17.45	-	.	ID=exon_6;Parent=mrna_2
+unknown	genscan	exon	8647	8789	18.65	-	.	ID=exon_7;Parent=mrna_2
+unknown	genscan	CDS	6829	7297	20.06	-	1	ID=cds_3;Parent=mrna_2
+unknown	genscan	CDS	7730	7888	12.78	-	1	ID=cds_4;Parent=mrna_2
+unknown	genscan	CDS	8029	8185	7.45	-	2	ID=cds_5;Parent=mrna_2
+unknown	genscan	CDS	8278	8546	17.45	-	1	ID=cds_6;Parent=mrna_2
+unknown	genscan	CDS	8647	8789	18.65	-	0	ID=cds_7;Parent=mrna_2
+unknown	genscan	gene	10209	11924	16.18	+	.	ID=gene_3
+unknown	genscan	mRNA	10209	11924	16.18	+	.	ID=mrna_3;Parent=gene_3
+unknown	genscan	exon	10209	11313	16.18	+	.	ID=exon_8;Parent=mrna_3
+unknown	genscan	exon	11850	11924	3.27	+	.	ID=exon_9;Parent=mrna_3
+unknown	genscan	CDS	10209	11313	16.18	+	0	ID=cds_8;Parent=mrna_3
+unknown	genscan	CDS	11850	11924	3.27	+	2	ID=cds_9;Parent=mrna_3
diff --git a/src/agat/agat_convert_genscan2gff/test_data/script.sh b/src/agat/agat_convert_genscan2gff/test_data/script.sh
new file mode 100755
index 00000000..c1693653
--- /dev/null
+++ b/src/agat/agat_convert_genscan2gff/test_data/script.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+# clone repo
+if [ ! -d /tmp/agat_source ]; then
+  git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source
+fi
+
+# copy test data
+cp -r /tmp/agat_source/t/scripts_output/in/test.genscan src/agat/agat_convert_genscan2gff/test_data/test.genscan
+cp -r /tmp/agat_source/t/scripts_output/out/agat_convert_genscan2gff_1.gff src/agat/agat_convert_genscan2gff/test_data/agat_convert_genscan2gff_1.gff
+
diff --git a/src/agat/agat_convert_genscan2gff/test_data/test.genscan b/src/agat/agat_convert_genscan2gff/test_data/test.genscan
new file mode 100644
index 00000000..a88037db
--- /dev/null
+++ b/src/agat/agat_convert_genscan2gff/test_data/test.genscan
@@ -0,0 +1,127 @@
+GENSCAN 1.0	Date run:  7-Mar-120	Time: 14:46:49
+
+
+
+Sequence /tmp/03_07_20-14:46:49.fasta : 12217 bp : 42.83% C+G : Isochore 1 ( 0 - 43 C+G%)
+
+
+
+Parameter matrix: HumanIso.smat
+
+
+
+Predicted genes/exons:
+
+
+
+Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
+
+----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------
+
+
+
+ 1.01 Init +   2223   3020  798  2  0   55    2   924 0.940  75.25
+
+ 1.02 Term +   4249   4605  357  0  0   26   38   307 0.976  13.03
+
+ 1.03 PlyA +   4711   4716    6                              -0.45
+
+
+
+ 2.06 PlyA -   4852   4847    6                              -0.45
+
+ 2.05 Term -   7297   6829  469  0  1   13   42   387 0.281  20.06
+
+ 2.04 Intr -   7888   7730  159  0  0   85   93   144 0.998  12.78
+
+ 2.03 Intr -   8185   8029  157  2  1   65   60   144 0.787   7.45
+
+ 2.02 Intr -   8546   8278  269  1  2   36   65   287 0.946  17.45
+
+ 2.01 Init -   8789   8647  143  2  2   94   96   176 0.550  18.65
+
+ 2.00 Prom -   9720   9681   40                              -6.55
+
+
+
+ 3.00 Prom +  10160  10199   40                             -11.84
+
+ 3.01 Init +  10209  11313 1105  2  1   66   57   269 0.512  16.18
+
+ 3.02 Intr +  11850  11924   75  1  0   80   86    57 0.507   3.27
+
+
+
+Suboptimal exons with probability > 1.000
+
+
+
+Exnum Type S .Begin ...End .Len Fr Ph B/Ac Do/T CodRg P.... Tscr..
+
+----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------
+
+
+
+NO EXONS FOUND AT GIVEN PROBABILITY CUTOFF
+
+
+
+
+
+Predicted peptide sequence(s):
+
+
+
+
+
+>/tmp/03_03_20-07:33:11.fasta|GENSCAN_predicted_peptide_1|384_aa
+
+MSSKNKVSKQDIDSIVESLMKKQKSYFEPRLAQIQQVGMENVQKLSAIHAELALLTASIS
+
+TVKSDVDKLKCKVENNFSAIDGHDQAFGELELKMADMEDRSRRCNIRVIGLKERLEGFNA
+
+IQYLTHSLPKWFPALADVPVEVMSAHRIYSDAKRGDNRTLIFNVLRYTTRQAILRAAKKD
+
+PLSVDDRKVRFSPDYSNFTVKRCQAFHQAKDAARNKCLDFFLLYPATLKIKEGAQYRSFT
+
+SPKEAEDYVNSAASNHAATPASPRQHGTILTIYRRIHSLYDGERARKIQLLEQAASVALT
+
+GDNWTSVRNDNYLGVTAHFIDNVWKLRCFALEVKKKKKHSRHTAEDCAEEFIDVSNRWEI
+
+NGKLTTLGTDSALIMLAAARLLPF
+
+
+
+>/tmp/03_03_20-07:33:11.fasta|GENSCAN_predicted_peptide_2|398_aa
+
+MASTMPSSSSTEDEENTPECLNKDHYHFHHYTMEYIQDKPTNVARVGGFTDKKSIAKVER
+
+CLARERQEATEDHEAIPSTSGATSLTKKLRSRSGLPIAGSGLVLPALCIICQKKEKFINR
+
+AGKRQRDPLSKAETLTVGQLQKAAELKDDQSILLHIKDKDCVALEVQYHKGCYNQYTRFM
+
+TRPEKPEKEQNEPTFDVGYKILCERIIRQRLLVNQEVLRMGQLRMAFIELVKANEGLDAS
+
+NYSIKNLERSRRADAGSQRIQIFDPDQRTPTQWKKFLSEGTKKEALAEFLYVAWKNADLT
+
+IVGKNLCLYIAHTNQCHCVTVKEGVQSVRVVEDLLLFLHAQHAAREHKAVIIKSSDTDVA
+
+VIAVSVQTDLPCSLYVFTGTGNRTRIIDITKVSSANKI
+
+
+
+>/tmp/03_03_20-07:33:11.fasta|GENSCAN_predicted_peptide_3|394_aa
+
+MQRGRAAGINGIPPEFYVAFWEQLSPFFLHMINFSIEKGGFLRDVNTALISLLMKKDKNP
+
+TDCSSYRPLSLLNSDVKIFAKLLPLRLEPHMPELVSSDQTGFIKSRTAADNIRRLLHIIA
+
+AAPGCETPMSVLSLDAMKAFDRLEWSFLWSVLEAMGFISTFIGMVKVLYSNPSARVLTGQ
+
+TFSSLFPVSRSSRQGCPLSPALFVLSLEPLAQAVRLSNLVLPICICDTQHKLSLFADDVI
+
+VFLEHPTQSLPHFLSICEEFRKLSGFKMNWSKSALMHLNDNARKSVTPVNIPLVGQLKYL
+
+GIEVFPSLNQIVKHNYSLAFTNVLKDMDRWISLPMSIQARISIIKMNGLPRIHFVSSMVP
+
+LPPPSDYWIKISAQGVRCPLAKPFTHSPYSKTKX

From 7f8bcc2b3e1ffaac9778b6acb42420b19660d1a1 Mon Sep 17 00:00:00 2001
From: Robrecht Cannoodt <rcannood@gmail.com>
Date: Tue, 17 Sep 2024 11:47:31 +0200
Subject: [PATCH 22/42] BD rhapsody sequence analysis (#96)

* wip

* fix test

* add help

* update 2.2 args

* fix bug

* extend test data

* output separate files

* analyse missing args

* tweaks to test

* fix script

* fix test

* fix test

* move small reference

* wip generate wta test data

* don't forget about umi in r1

* remove unneeded pkg

* load reference in memory just once

* fix random choices

* extend test

* add abc immunediscoverypanel

* wip abc testing code

* fix abc test; need unique instrument, run and flowcell ids for each sample

* add smk data

* add entry to changelog

* remove old test file

* adapt test for missing read

* update description

* add comment

* ensure cwl files are absolute

* Apply suggestions from code review

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>

* fix suggestion

* newer pipelines have docker requirements as a hint instead of a strict requirement

* rename str to content

* remove deleted resources

* fix containers

* fix script

* fix suggestion

* fix suggestion...

* fix test

* fix component name

* fix test

* apply suggestions

* fix test

* added note

* fix changelog

* fix changelog again

* splitting hairs here

---------

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>
---
 CHANGELOG.md                                  |   2 +
 .../config.vsh.yaml                           |  14 +-
 .../make_rhap_reference_2.2.1_nodocker.cwl    | 115 ---
 .../bd_rhapsody_make_reference/script.py      |  12 +-
 .../test_data/script.sh                       |  47 --
 .../_process_cwl.R                            | 116 +++
 .../config.vsh.yaml                           | 661 ++++++++++++++++++
 .../bd_rhapsody_sequence_analysis/help.txt    | 167 +++++
 .../pipeline_inputs_template_2.2.1.yaml       | 203 ++++++
 .../bd_rhapsody_sequence_analysis/script.py   | 243 +++++++
 .../bd_rhapsody_sequence_analysis/test.py     | 494 +++++++++++++
 .../helpers/rhapsody_cell_label.py            | 405 +++++++++++
 .../BDAbSeq_ImmuneDiscoveryPanel.fasta        |  60 ++
 .../SampleTagSequences_HomoSapiens_ver1.fasta |  24 +
 .../test_data/reference_small.fa              |   0
 .../test_data/reference_small.gtf             |   0
 src/bd_rhapsody/test_data/script.sh           | 141 ++++
 17 files changed, 2532 insertions(+), 172 deletions(-)
 delete mode 100644 src/bd_rhapsody/bd_rhapsody_make_reference/make_rhap_reference_2.2.1_nodocker.cwl
 delete mode 100644 src/bd_rhapsody/bd_rhapsody_make_reference/test_data/script.sh
 create mode 100644 src/bd_rhapsody/bd_rhapsody_sequence_analysis/_process_cwl.R
 create mode 100644 src/bd_rhapsody/bd_rhapsody_sequence_analysis/config.vsh.yaml
 create mode 100644 src/bd_rhapsody/bd_rhapsody_sequence_analysis/help.txt
 create mode 100644 src/bd_rhapsody/bd_rhapsody_sequence_analysis/pipeline_inputs_template_2.2.1.yaml
 create mode 100644 src/bd_rhapsody/bd_rhapsody_sequence_analysis/script.py
 create mode 100644 src/bd_rhapsody/bd_rhapsody_sequence_analysis/test.py
 create mode 100644 src/bd_rhapsody/helpers/rhapsody_cell_label.py
 create mode 100644 src/bd_rhapsody/test_data/BDAbSeq_ImmuneDiscoveryPanel.fasta
 create mode 100644 src/bd_rhapsody/test_data/SampleTagSequences_HomoSapiens_ver1.fasta
 rename src/bd_rhapsody/{bd_rhapsody_make_reference => }/test_data/reference_small.fa (100%)
 rename src/bd_rhapsody/{bd_rhapsody_make_reference => }/test_data/reference_small.gtf (100%)
 create mode 100644 src/bd_rhapsody/test_data/script.sh
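
Note on the Docker setup change in bd_rhapsody_make_reference: the version
written to /var/software_versions.txt is now derived from the `v*`
directories of the cloned CWL repository. The sketch below wraps that same
pipeline in a function so it can be tried on its own. It assumes GNU `ls`,
where `-v` sorts version numbers naturally; `latest_cwl_version` is a
hypothetical name used only for illustration.

```bash
#!/bin/bash
set -eo pipefail

# Hypothetical wrapper around the pipeline used in the Docker setup step:
# list entries in natural version order, keep those starting with "v",
# strip the first "v" on each line, and print the last (highest) one.
latest_cwl_version() {
  ls -v "$1" | grep '^v' | sed 's#v##' | tail -1
}

# Example usage (path taken from the setup step):
#   latest_cwl_version /var/bd_rhapsody_cwl
```

Entries that do not start with "v" are ignored by the `grep` step. With
`pipefail` set, the function fails if the directory holds no `v*` entry.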

diff --git a/CHANGELOG.md b/CHANGELOG.md
index b31f43d9..07a83c15 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,8 @@
 * `agat`:
   - `agat/agat_convert_genscan2gff`: convert a genscan file into a GFF file (PR #100).
 
+* `bd_rhapsody/bd_rhapsody_sequence_analysis`: BD Rhapsody Sequence Analysis CWL pipeline (PR #96).
+
 ## MINOR CHANGES
 
 * Upgrade to Viash 0.9.0.
diff --git a/src/bd_rhapsody/bd_rhapsody_make_reference/config.vsh.yaml b/src/bd_rhapsody/bd_rhapsody_make_reference/config.vsh.yaml
index e596bf06..dc71262b 100644
--- a/src/bd_rhapsody/bd_rhapsody_make_reference/config.vsh.yaml
+++ b/src/bd_rhapsody/bd_rhapsody_make_reference/config.vsh.yaml
@@ -116,12 +116,11 @@ argument_groups:
 resources:
   - type: python_script
     path: script.py
-  - path: make_rhap_reference_2.2.1_nodocker.cwl
 
 test_resources:
   - type: bash_script
     path: test.sh
-  - path: test_data
+  - path: ../test_data
 
 requirements:
   commands: [ "cwl-runner" ]
@@ -131,12 +130,19 @@ engines:
     image: bdgenomics/rhapsody:2.2.1
     setup:
       - type: apt
-        packages: [procps]
+        packages: [procps, git]
       - type: python
         packages: [cwlref-runner, cwl-runner]
       - type: docker
         run: |
-          echo "bdgenomics/rhapsody: 2.2.1" > /var/software_versions.txt
+          mkdir /var/bd_rhapsody_cwl && \
+            cd /var/bd_rhapsody_cwl && \
+            git clone https://bitbucket.org/CRSwDev/cwl.git . && \
+            git checkout 8feeace1141b24749ea6003f8e6ad6d3ad5232de
+      - type: docker
+        run:
+          - VERSION=$(ls -v /var/bd_rhapsody_cwl | grep '^v' | sed 's#v##' | tail -1)
+          - 'echo "bdgenomics/rhapsody: \"$VERSION\"" > /var/software_versions.txt'
 
 runners:
   - type: executable
diff --git a/src/bd_rhapsody/bd_rhapsody_make_reference/make_rhap_reference_2.2.1_nodocker.cwl b/src/bd_rhapsody/bd_rhapsody_make_reference/make_rhap_reference_2.2.1_nodocker.cwl
deleted file mode 100644
index fead2c02..00000000
--- a/src/bd_rhapsody/bd_rhapsody_make_reference/make_rhap_reference_2.2.1_nodocker.cwl
+++ /dev/null
@@ -1,115 +0,0 @@
-requirements:
-  InlineJavascriptRequirement: {}
-class: CommandLineTool
-label: Reference Files Generator for BD Rhapsody™ Sequencing Analysis Pipeline
-cwlVersion: v1.2
-doc: >- 
-    The Reference Files Generator creates an archive containing Genome Index and Transcriptome annotation files needed for the BD Rhapsody™ Sequencing Analysis Pipeline. The app takes as input one or more FASTA and GTF files and produces a compressed archive in the form of a tar.gz file. The archive contains:\n  - STAR index\n  - Filtered GTF file
-
-
-baseCommand: run_reference_generator.sh 
-inputs: 
-    Genome_fasta:
-        type: File[]
-        label: Reference Genome
-        doc: |-
-            Reference genome file in FASTA format. The BD Rhapsody™ Sequencing Analysis Pipeline uses GRCh38 for Human and GRCm39 for Mouse.
-        inputBinding:
-            prefix: --reference-genome
-            shellQuote: false
-    Gtf:
-        type: File[]
-        label: Transcript Annotations
-        doc: |-
-            Transcript annotation files in GTF format. The BD Rhapsody™ Sequencing Analysis Pipeline uses Gencode v42 for Human and M31 for Mouse.
-        inputBinding:
-            prefix: --gtf
-            shellQuote: false
-    Extra_sequences:
-        type: File[]?
-        label: Extra Sequences
-        doc: |-
-            Additional sequences in FASTA format to use when building the STAR index. (E.g. phiX genome)
-        inputBinding:
-            prefix: --extra-sequences
-            shellQuote: false
-    Mitochondrial_Contigs:
-        type: string[]?
-        default: ["chrM", "chrMT", "M", "MT"]
-        label: Mitochondrial Contig Names
-        doc: |-
-            Names of the Mitochondrial contigs in the provided Reference Genome. Fragments originating from contigs other than these are identified as 'nuclear fragments' in the ATACseq analysis pipeline.
-        inputBinding:
-            prefix: --mitochondrial-contigs
-            shellQuote: false
-    Filtering_off:
-        type: boolean?
-        label: Turn off filtering
-        doc: |-
-            By default the input Transcript Annotation files are filtered based on the gene_type/gene_biotype attribute. Only features having the following attribute values are are kept:
-            - protein_coding
-            - lncRNA (lincRNA and antisense for Gencode < v31/M22/Ensembl97)
-            - IG_LV_gene
-            - IG_V_gene
-            - IG_V_pseudogene
-            - IG_D_gene
-            - IG_J_gene
-            - IG_J_pseudogene
-            - IG_C_gene
-            - IG_C_pseudogene
-            - TR_V_gene
-            - TR_V_pseudogene
-            - TR_D_gene
-            - TR_J_gene
-            - TR_J_pseudogene
-            - TR_C_gene
-            If you have already pre-filtered the input Annotation files and/or wish to turn-off the filtering, please set this option to True.
-        inputBinding: 
-            prefix: --filtering-off
-            shellQuote: false
-    WTA_Only:
-        type: boolean?
-        label: WTA only index
-        doc: Build a WTA only index, otherwise builds a WTA + ATAC index.
-        inputBinding:
-            prefix: --wta-only-index
-            shellQuote: false
-    Archive_prefix:
-        type: string?
-        label: Archive Prefix
-        doc: |-
-            A prefix for naming the compressed archive file containing the Reference genome index and annotation files. The default value is constructed based on the input Reference files.
-        inputBinding:
-            prefix: --archive-prefix
-            shellQuote: false
-    Extra_STAR_params:
-        type: string?
-        label: Extra STAR Params
-        doc: |-
-            Additional parameters to pass to STAR when building the genome index. Specify exactly like how you would on the command line.
-            Example:
-              --limitGenomeGenerateRAM 48000 --genomeSAindexNbases 11
-        inputBinding:
-            prefix: --extra-star-params 
-            shellQuote: true
-  
-    Maximum_threads:
-        type: int?
-        label: Maximum Number of Threads
-        doc: |-
-            The maximum number of threads to use in the pipeline. By default, all available cores are used.
-        inputBinding:
-            prefix: --maximum-threads
-            shellQuote: false
-
-outputs:
-
-    Archive:
-        type: File
-        doc: |- 
-            A Compressed archive containing the Reference Genome Index and annotation GTF files. This archive is meant to be used as an input in the BD Rhapsody™ Sequencing Analysis Pipeline.
-        id: Reference_Archive
-        label: Reference Files Archive
-        outputBinding:
-            glob: '*.tar.gz'
-
diff --git a/src/bd_rhapsody/bd_rhapsody_make_reference/script.py b/src/bd_rhapsody/bd_rhapsody_make_reference/script.py
index ca635508..dcbfe933 100644
--- a/src/bd_rhapsody/bd_rhapsody_make_reference/script.py
+++ b/src/bd_rhapsody/bd_rhapsody_make_reference/script.py
@@ -83,21 +83,21 @@ def generate_config(par: dict[str, Any], meta, config) -> str:
 
     for config_key, arg_type, par_value in config_key_value_pairs:
         if arg_type == "file":
-            str = strip_margin(f"""\
+            content = strip_margin(f"""\
                 |{config_key}:
                 |""")
             if isinstance(par_value, list):
                 for file in par_value:
-                    str += strip_margin(f"""\
+                    content += strip_margin(f"""\
                         | - class: File
                         |   location: "{file}"
                         |""")
             else:
-                str += strip_margin(f"""\
+                content += strip_margin(f"""\
                     |   class: File
                     |   location: "{par_value}"
                     |""")
-            content_list.append(str)
+            content_list.append(content)
         else:
             content_list.append(strip_margin(f"""\
                 |{config_key}: {par_value}
@@ -108,9 +108,9 @@ def generate_config(par: dict[str, Any], meta, config) -> str:
 
 def get_cwl_file(meta: dict[str, Any]) -> str:
     # create cwl file (if need be)
-    cwl_file=os.path.join(meta["resources_dir"], "make_rhap_reference_2.2.1_nodocker.cwl")
+    cwl_file="/var/bd_rhapsody_cwl/v2.2.1/Extra_Utilities/make_rhap_reference_2.2.1.cwl"
 
-    return cwl_file
+    return os.path.abspath(cwl_file)
 
 def main(par: dict[str, Any], meta: dict[str, Any]):
     config = read_config(meta["config"])
diff --git a/src/bd_rhapsody/bd_rhapsody_make_reference/test_data/script.sh b/src/bd_rhapsody/bd_rhapsody_make_reference/test_data/script.sh
deleted file mode 100644
index 8d468064..00000000
--- a/src/bd_rhapsody/bd_rhapsody_make_reference/test_data/script.sh
+++ /dev/null
@@ -1,47 +0,0 @@
-#!/bin/bash
-
-TMP_DIR=/tmp/bd_rhapsody_make_reference
-OUT_DIR=src/bd_rhapsody/bd_rhapsody_make_reference/test_data
-
-# check if seqkit is installed
-if ! command -v seqkit &> /dev/null; then
-  echo "seqkit could not be found"
-  exit 1
-fi
-
-# create temporary directory and clean up on exit
-mkdir -p $TMP_DIR
-function clean_up {
-    rm -rf "$TMP_DIR"
-}
-trap clean_up EXIT
-
-# fetch reference
-ORIG_FA=$TMP_DIR/reference.fa.gz
-if [ ! -f $ORIG_FA ]; then
-  wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz \
-    -O $ORIG_FA
-fi
-
-ORIG_GTF=$TMP_DIR/reference.gtf.gz
-if [ ! -f $ORIG_GTF ]; then
-  wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz \
-    -O $ORIG_GTF
-fi
-
-# create small reference
-START=30000
-END=31500
-CHR=chr1
-
-# subset to small region
-seqkit grep -r -p "^$CHR\$" "$ORIG_FA" | \
-  seqkit subseq -r "$START:$END" > $OUT_DIR/reference_small.fa
-
-zcat "$ORIG_GTF" | \
-  awk -v FS='\t' -v OFS='\t' "
-    \$1 == \"$CHR\" && \$4 >= $START && \$5 <= $END {
-      \$4 = \$4 - $START + 1;
-      \$5 = \$5 - $START + 1;
-      print;
-    }" > $OUT_DIR/reference_small.gtf
diff --git a/src/bd_rhapsody/bd_rhapsody_sequence_analysis/_process_cwl.R b/src/bd_rhapsody/bd_rhapsody_sequence_analysis/_process_cwl.R
new file mode 100644
index 00000000..e33b8ea7
--- /dev/null
+++ b/src/bd_rhapsody/bd_rhapsody_sequence_analysis/_process_cwl.R
@@ -0,0 +1,116 @@
+# Extract arguments from CWL file and write them to arguments.yaml
+#
+# This script:
+#  - reads the CWL file
+#  - extracts the main workflow arguments
+#  - compares cwl arguments to viash config arguments
+#  - writes the arguments to arguments.yaml
+#
+# It can be used to update the arguments in the viash config after an
+# update to the CWL file has been made.
+#
+# Dependencies: tidyverse, jsonlite, yaml, dynutils
+#
+# Install dependencies:
+# ```R
+# install.packages(c("tidyverse", "jsonlite", "yaml", "dynutils"))
+# ```
+#
+# Usage:
+# ```bash
+# Rscript src/bd_rhapsody/bd_rhapsody_sequence_analysis/_process_cwl.R
+# ```
+
+library(tidyverse)
+
+# fetch and read cwl file
+lines <- read_lines("https://bitbucket.org/CRSwDev/cwl/raw/8feeace1141b24749ea6003f8e6ad6d3ad5232de/v2.2.1/rhapsody_pipeline_2.2.1.cwl")
+cwl_header <- lines[[1]]
+cwl_obj <- jsonlite::fromJSON(lines[-1], simplifyVector = FALSE)
+
+# detect main workflow arguments
+gr <- dynutils::list_as_tibble(cwl_obj$`$graph`)
+
+gr %>% print(n = 100)
+
+main <- gr %>% filter(id == "#main")
+
+main_inputs <- main$inputs[[1]]
+
+input_ids <- main_inputs %>% map_chr("id") %>% gsub("^#main/", "", .)
+
+# check whether in config
+config <- yaml::read_yaml("src/bd_rhapsody/bd_rhapsody_sequence_analysis/config.vsh.yaml")
+config$all_arguments <- config$argument_groups %>% map("arguments") %>% list_flatten()
+arg_names <- config$all_arguments %>% map_chr("name") %>% gsub("^--", "", .)
+
+# arguments in cwl but not in config
+setdiff(tolower(input_ids), arg_names)
+
+# arguments in config but not in cwl
+setdiff(arg_names, tolower(input_ids))
+
+# create arguments from main_inputs
+arguments <- map(main_inputs, function(main_input) {
+  input_id <- main_input$id %>% gsub("^#main/", "", .)
+  input_type <- main_input$type[[2]]
+
+  if (is.list(input_type) && input_type$type == "array") {
+    multiple <- TRUE
+    input_type <- input_type$items
+  } else {
+    multiple <- FALSE
+  }
+
+  if (is.list(input_type) && input_type$type == "enum") {
+    choices <- input_type$symbols %>%
+      gsub(paste0(input_type$name, "/"), "", .)
+    input_type <- "enum"
+  } else {
+    choices <- NULL
+  }
+
+  description <-
+    if (is.null(main_input$label)) {
+      main_input$doc
+    } else if (is.null(main_input$doc)) {
+      main_input$label
+    } else {
+      paste0(main_input$label, ". ", main_input$doc)
+    }
+
+  type_map <- c(
+    "float" = "double",
+    "int" = "integer",
+    "string" = "string",
+    "boolean" = "boolean",
+    "File" = "file",
+    "enum" = "string"
+  )
+
+  out <- list(
+    name = paste0("--", tolower(input_id)),
+    type = unname(type_map[input_type]),
+    # TODO: use summary when viash 0.9 is released
+    # summary = main_input$doc,
+    # description = main_input$doc,
+    description = description,
+    multiple = multiple,
+    choices = choices,
+    info = list(
+      config_key = input_id
+    )
+  )
+
+  out[!sapply(out, is.null)]
+})
+
+
+
+yaml::write_yaml(
+  arguments,
+  "src/bd_rhapsody/bd_rhapsody_sequence_analysis/arguments.yaml",
+  handlers = list(
+    logical = yaml::verbatim_logical
+  )
+)
diff --git a/src/bd_rhapsody/bd_rhapsody_sequence_analysis/config.vsh.yaml b/src/bd_rhapsody/bd_rhapsody_sequence_analysis/config.vsh.yaml
new file mode 100644
index 00000000..eb3eaf38
--- /dev/null
+++ b/src/bd_rhapsody/bd_rhapsody_sequence_analysis/config.vsh.yaml
@@ -0,0 +1,661 @@
+name: bd_rhapsody_sequence_analysis
+namespace: bd_rhapsody
+description: |
+  BD Rhapsody Sequence Analysis CWL pipeline v2.2.1.
+
+  This pipeline performs analysis of single-cell multiomic sequence read (FASTQ) data. The supported
+  sequencing libraries are those generated by the BD Rhapsody™ assay kits, including: Whole Transcriptome
+  mRNA (WTA), Targeted mRNA, AbSeq Antibody-Oligonucleotides (ABC), Single-Cell Multiplexing (SMK),
+  TCR/BCR (VDJ), and ATAC-Seq.
+keywords: [rna-seq, single-cell, multiomic, atac-seq, targeted, abseq, tcr, bcr]
+links:
+  repository: https://bitbucket.org/CRSwDev/cwl/src/master/v2.2.1
+  documentation: https://bd-rhapsody-bioinfo-docs.genomics.bd.com
+license: Unknown
+authors:
+  - __merge__: /src/_authors/robrecht_cannoodt.yaml
+    roles: [ author, maintainer ]
+  - __merge__: /src/_authors/weiwei_schultz.yaml
+    roles: [ contributor ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: "--reads"
+        type: file
+        description: |
+          Reads (optional) - Path to your FASTQ.GZ formatted read files from libraries that may include:
+          
+          - WTA mRNA
+          - Targeted mRNA
+          - AbSeq
+          - Sample Multiplexing
+          - VDJ
+          
+          You may specify as many R1/R2 read pairs as you want.
+        required: false
+        multiple: true
+        example:
+          - WTALibrary_S1_L001_R1_001.fastq.gz
+          - WTALibrary_S1_L001_R2_001.fastq.gz
+        info:
+          config_key: Reads
+      - name: "--reads_atac"
+        type: file
+        description: |
+          Path to your FASTQ.GZ formatted read files from ATAC-Seq libraries.
+          You may specify as many R1/R2/I2 files as you want.
+        required: false
+        multiple: true
+        example:
+          - ATACLibrary_S2_L001_R1_001.fastq.gz
+          - ATACLibrary_S2_L001_R2_001.fastq.gz
+          - ATACLibrary_S2_L001_I2_001.fastq.gz
+        info:
+          config_key: Reads_ATAC
+  - name: References
+    description: |
+      Assay type will be inferred from the provided reference(s).
+      Do not provide both reference_archive and targeted_reference at the same time.
+      
+      Valid reference input combinations:
+        - reference_archive: WTA only
+        - reference_archive & abseq_reference: WTA + AbSeq
+        - reference_archive & supplemental_reference: WTA + extra transgenes
+        - reference_archive & abseq_reference & supplemental_reference: WTA + AbSeq + extra transgenes
+        - reference_archive: WTA + ATAC or ATAC only
+        - reference_archive & supplemental_reference: WTA + ATAC + extra transgenes
+        - targeted_reference: Targeted only
+        - targeted_reference & abseq_reference: Targeted + AbSeq
+        - abseq_reference: AbSeq only
+
+      The reference_archive can be generated with the bd_rhapsody_make_reference component.
+      Alternatively, BD also provides standard references which can be downloaded from these locations:
+
+        - Human: https://bd-rhapsody-public.s3.amazonaws.com/Rhapsody-WTA/Pipeline-version2.x_WTA_references/RhapRef_Human_WTA_2023-02.tar.gz
+        - Mouse: https://bd-rhapsody-public.s3.amazonaws.com/Rhapsody-WTA/Pipeline-version2.x_WTA_references/RhapRef_Mouse_WTA_2023-02.tar.gz
+    arguments:
+      - name: "--reference_archive"
+        type: file
+        description: |
+          Path to Rhapsody WTA Reference in the tar.gz format.
+
+          Structure of the reference archive:
+          
+          - `BD_Rhapsody_Reference_Files/`: top level folder
+            - `star_index/`: sub-folder containing STAR index, that is files created with `STAR --runMode genomeGenerate`
+            - GTF for gene-transcript-annotation e.g. "gencode.v43.primary_assembly.annotation.gtf"
+        example: "RhapRef_Human_WTA_2023-02.tar.gz"
+        required: false
+        info:
+          config_key: Reference_Archive
+      - name: "--targeted_reference"
+        type: file
+        description: |
+          Path to the targeted reference file in FASTA format.
+        example: "BD_Rhapsody_Immune_Response_Panel_Hs.fasta"
+        multiple: true
+        info:
+          config_key: Targeted_Reference
+      - name: "--abseq_reference"
+        type: file
+        description: Path to the AbSeq reference file in FASTA format. Only needed if BD AbSeq Ab-Oligos are used.
+        example: "AbSeq_reference.fasta"
+        multiple: true
+        info:
+          config_key: AbSeq_Reference
+      - name: "--supplemental_reference"
+        type: file
+        alternatives: [-s]
+        description: Path to the supplemental reference file in FASTA format. Only needed if there are additional transgene sequences to be aligned against in a WTA assay experiment.
+        example: "supplemental_reference.fasta"
+        multiple: true
+        info:
+          config_key: Supplemental_Reference
+  - name: Outputs
+    description: Outputs for all pipeline runs
+    # based on https://bd-rhapsody-bioinfo-docs.genomics.bd.com/outputs/top_outputs.html
+    arguments:
+      - name: "--output_dir"
+        type: file
+        direction: output
+        alternatives: [-o]
+        description: "The unprocessed output directory containing all the outputs from the pipeline."
+        required: true
+        example: output_dir/
+      - name: "--output_seurat"
+        type: file
+        direction: output
+        description: "Single-cell analysis tool inputs. Seurat (.rds) input file containing RSEC molecules data table and all cell annotation metadata."
+        example: output_seurat.rds
+        required: false
+        info:
+          template: "[sample_name]_Seurat.rds"
+      - name: "--output_mudata"
+        type: file
+        direction: output
+        description: "Single-cell analysis tool inputs. Scanpy / Muon input file containing RSEC molecules data table and all cell annotation metadata."
+        example: output_mudata.h5mu
+        required: false
+        info:
+          template: "[sample_name].h5mu"
+      - name: "--metrics_summary"
+        type: file
+        direction: output
+        description: "Metrics Summary. Report containing sequencing, molecules, and cell metrics."
+        example: metrics_summary.csv
+        required: false
+        info:
+          template: "[sample_name]_Metrics_Summary.csv"
+      - name: "--pipeline_report"
+        type: file
+        direction: output
+        description: "Pipeline Report. Summary report containing the results from the sequencing analysis pipeline run."
+        example: pipeline_report.html
+        required: false
+        info:
+          template: "[sample_name]_Pipeline_Report.html"
+      - name: "--rsec_mols_per_cell"
+        type: file
+        direction: output
+        description: "Molecules per bioproduct per cell based on RSEC."
+        example: RSEC_MolsPerCell_MEX.zip
+        required: false
+        info:
+          template: "[sample_name]_RSEC_MolsPerCell_MEX.zip"
+      - name: "--dbec_mols_per_cell"
+        type: file
+        direction: output
+        description: "Molecules per bioproduct per cell based on DBEC. DBEC data table is only output if the experiment includes targeted mRNA or AbSeq bioproducts."
+        example: DBEC_MolsPerCell_MEX.zip
+        required: false
+        info:
+          template: "[sample_name]_DBEC_MolsPerCell_MEX.zip"
+      - name: "--rsec_mols_per_cell_unfiltered"
+        type: file
+        direction: output
+        description: "Unfiltered tables containing all cell labels with ≥10 reads."
+        example: RSEC_MolsPerCell_Unfiltered_MEX.zip
+        required: false
+        info:
+          template: "[sample_name]_RSEC_MolsPerCell_Unfiltered_MEX.zip"
+      - name: "--bam"
+        type: file
+        direction: output
+        description: "Alignment file of R2 with associated R1 annotations for Bioproduct."
+        example: BioProduct.bam
+        required: false
+        info:
+          template: "[sample_name]_Bioproduct.bam"
+      - name: "--bam_index"
+        type: file
+        direction: output
+        description: "Index file for the alignment file."
+        example: BioProduct.bam.bai
+        required: false
+        info:
+          template: "[sample_name]_Bioproduct.bam.bai"
+      - name: "--bioproduct_stats"
+        type: file
+        direction: output
+        description: "Bioproduct Stats. Metrics from RSEC and DBEC Unique Molecular Identifier adjustment algorithms on a per-bioproduct basis."
+        example: Bioproduct_Stats.csv
+        required: false
+        info:
+          template: "[sample_name]_Bioproduct_Stats.csv"
+      - name: "--dimred_tsne"
+        type: file
+        direction: output
+        description: "t-SNE dimensionality reduction coordinates per cell index"
+        example: tSNE_coordinates.csv
+        required: false
+        info:
+          template: "[sample_name]_(assay)_tSNE_coordinates.csv"
+      - name: "--dimred_umap"
+        type: file
+        direction: output
+        description: "UMAP dimensionality reduction coordinates per cell index"
+        example: UMAP_coordinates.csv
+        required: false
+        info:
+          template: "[sample_name]_(assay)_UMAP_coordinates.csv"
+      - name: "--immune_cell_classification"
+        type: file
+        direction: output
+        description: "Immune Cell Classification. Cell type classification based on the expression of immune cell markers."
+        example: Immune_Cell_Classification.csv
+        required: false
+        info:
+          template: "[sample_name]_(assay)_cell_type_experimental.csv"
+  - name: Multiplex outputs
+    description: Outputs when the multiplex option is selected
+    arguments:
+      - name: "--sample_tag_metrics"
+        type: file
+        direction: output
+        description: "Sample Tag Metrics. Metrics from the sample determination algorithm."
+        example: Sample_Tag_Metrics.csv
+        required: false
+        info:
+          template: "[sample_name]_Sample_Tag_Metrics.csv"
+      - name: "--sample_tag_calls"
+        type: file
+        direction: output
+        description: "Sample Tag Calls. Assigned Sample Tag for each putative cell"
+        example: Sample_Tag_Calls.csv
+        required: false
+        info:
+          template: "[sample_name]_Sample_Tag_Calls.csv"
+      - name: "--sample_tag_counts"
+        type: file
+        direction: output
+        description: "Sample Tag Counts. Separate data tables and metric summary for cells assigned to each sample tag. Note: For putative cells that could not be assigned a specific Sample Tag, a Multiplet_and_Undetermined.zip file is also output."
+        example: Sample_Tag1.zip
+        required: false
+        multiple: true
+        info:
+          template: "[sample_name]_Sample_Tag[number].zip"
+      - name: "--sample_tag_counts_unassigned"
+        type: file
+        direction: output
+        description: "Sample Tag Counts Unassigned. Data table and metric summary for cells that could not be assigned a specific Sample Tag."
+        example: Multiplet_and_Undetermined.zip
+        required: false
+        info:
+          template: "[sample_name]_Multiplet_and_Undetermined.zip"
+  - name: VDJ Outputs
+    description: Outputs when the VDJ option is selected
+    arguments:
+      - name: "--vdj_metrics"
+        type: file
+        direction: output
+        description: "VDJ Metrics. Overall metrics from the VDJ analysis."
+        example: VDJ_Metrics.csv
+        required: false
+        info:
+          template: "[sample_name]_VDJ_Metrics.csv"
+      - name: "--vdj_per_cell"
+        type: file
+        direction: output
+        description: "VDJ Per Cell. Cell specific read and molecule counts, VDJ gene segments, CDR3 sequences, paired chains, and cell type."
+        example: VDJ_perCell.csv
+        required: false
+        info:
+          template: "[sample_name]_VDJ_perCell.csv"
+      - name: "--vdj_per_cell_uncorrected"
+        type: file
+        direction: output
+        description: "VDJ Per Cell Uncorrected. Cell specific read and molecule counts, VDJ gene segments, CDR3 sequences, paired chains, and cell type."
+        example: VDJ_perCell_uncorrected.csv
+        required: false
+        info:
+          template: "[sample_name]_VDJ_perCell_uncorrected.csv"
+      - name: "--vdj_dominant_contigs"
+        type: file
+        direction: output
+        description: "VDJ Dominant Contigs. Dominant contig for each cell label chain type combination (putative cells only)."
+        example: VDJ_Dominant_Contigs_AIRR.csv
+        required: false
+        info:
+          template: "[sample_name]_VDJ_Dominant_Contigs_AIRR.csv"
+      - name: "--vdj_unfiltered_contigs"
+        type: file
+        direction: output
+        description: "VDJ Unfiltered Contigs. All contigs that were assembled and annotated successfully (all cells)."
+        example: VDJ_Unfiltered_Contigs_AIRR.csv
+        required: false
+        info:
+          template: "[sample_name]_VDJ_Unfiltered_Contigs_AIRR.csv"
+  - name: "ATAC-Seq outputs"
+    description: Outputs when the ATAC-Seq option is selected
+    arguments:
+      - name: "--atac_metrics"
+        type: file
+        direction: output
+        description: "ATAC Metrics. Overall metrics from the ATAC-Seq analysis."
+        example: ATAC_Metrics.csv
+        required: false
+        info:
+          template: "[sample_name]_ATAC_Metrics.csv"
+      - name: "--atac_metrics_json"
+        type: file
+        direction: output
+        description: "ATAC Metrics JSON. Overall metrics from the ATAC-Seq analysis in JSON format."
+        example: ATAC_Metrics.json
+        required: false
+        info:
+          template: "[sample_name]_ATAC_Metrics.json"
+      - name: "--atac_fragments"
+        type: file
+        direction: output
+        description: "ATAC Fragments. Chromosomal location, cell index, and read support for each fragment detected"
+        example: ATAC_Fragments.bed.gz
+        required: false
+        info:
+          template: "[sample_name]_ATAC_Fragments.bed.gz"
+      - name: "--atac_fragments_index"
+        type: file
+        direction: output
+        description: "Index of ATAC Fragments."
+        example: ATAC_Fragments.bed.gz.tbi
+        required: false
+        info:
+          template: "[sample_name]_ATAC_Fragments.bed.gz.tbi"
+      - name: "--atac_transposase_sites"
+        type: file
+        direction: output
+        description: "ATAC Transposase Sites. Chromosomal location, cell index, and read support for each transposase site detected"
+        example: ATAC_Transposase_Sites.bed.gz
+        required: false
+        info:
+          template: "[sample_name]_ATAC_Transposase_Sites.bed.gz"
+      - name: "--atac_transposase_sites_index"
+        type: file
+        direction: output
+        description: "Index of ATAC Transposase Sites."
+        example: ATAC_Transposase_Sites.bed.gz.tbi
+        required: false
+        info:
+          template: "[sample_name]_ATAC_Transposase_Sites.bed.gz.tbi"
+      - name: "--atac_peaks"
+        type: file
+        direction: output
+        description: "ATAC Peaks. Peak regions of transposase activity"
+        example: ATAC_Peaks.bed.gz
+        required: false
+        info:
+          template: "[sample_name]_ATAC_Peaks.bed.gz"
+      - name: "--atac_peaks_index"
+        type: file
+        direction: output
+        description: "Index of ATAC Peaks."
+        example: ATAC_Peaks.bed.gz.tbi
+        required: false
+        info:
+          template: "[sample_name]_ATAC_Peaks.bed.gz.tbi"
+      - name: "--atac_peak_annotation"
+        type: file
+        direction: output
+        description: "ATAC Peak Annotation. Estimated annotation of peak-to-gene connections"
+        example: peak_annotation.tsv.gz
+        required: false
+        info:
+          template: "[sample_name]_peak_annotation.tsv.gz"
+      - name: "--atac_cell_by_peak"
+        type: file
+        direction: output
+        description: "ATAC Cell by Peak. Peak regions of transposase activity per cell"
+        example: ATAC_Cell_by_Peak_MEX.zip
+        required: false
+        info:
+          template: "[sample_name]_ATAC_Cell_by_Peak_MEX.zip"
+      - name: "--atac_cell_by_peak_unfiltered"
+        type: file
+        direction: output
+        description: "ATAC Cell by Peak Unfiltered. Unfiltered file containing all cell labels with >=1 transposase sites in peaks."
+        example: ATAC_Cell_by_Peak_Unfiltered_MEX.zip
+        required: false
+        info:
+          template: "[sample_name]_ATAC_Cell_by_Peak_Unfiltered_MEX.zip"
+      - name: "--atac_bam"
+        type: file
+        direction: output
+        description: "ATAC BAM. Alignment file for R1 and R2 with associated I2 annotations for ATAC-Seq. Only output if the BAM generation flag is set to true."
+        example: ATAC.bam
+        required: false
+        info:
+          template: "[sample_name]_ATAC.bam"
+      - name: "--atac_bam_index"
+        type: file
+        direction: output
+        description: "Index of ATAC BAM."
+        example: ATAC.bam.bai
+        required: false
+        info:
+          template: "[sample_name]_ATAC.bam.bai"
+  - name: AbSeq Cell Calling outputs
+    description: Outputs when AbSeq cell calling is selected
+    arguments:
+      - name: "--protein_aggregates_experimental"
+        type: file
+        direction: output
+        description: "Protein Aggregates Experimental"
+        example: Protein_Aggregates_Experimental.csv
+        required: false
+        info:
+          template: "[sample_name]_Protein_Aggregates_Experimental.csv"
+  - name: Putative Cell Calling Settings
+    arguments:
+      - name: "--cell_calling_data"
+        type: string
+        description: |
+          Specify the dataset to be used for putative cell calling: mRNA, AbSeq, ATAC, mRNA_and_ATAC
+          
+          For putative cell calling using an AbSeq dataset, please provide an AbSeq_Reference fasta file above.
+          
+          For putative cell calling using an ATAC dataset, please provide a WTA+ATAC-Seq Reference_Archive file above.
+          
+          The default data for putative cell calling is determined as follows:
+          
+          - If mRNA Reads and ATAC Reads exist: mRNA_and_ATAC
+          - If only ATAC Reads exist: ATAC
+          - Otherwise: mRNA
+        choices: [mRNA, AbSeq, ATAC, mRNA_and_ATAC]
+        example: mRNA
+        info:
+          config_key: Cell_Calling_Data
+      - name: "--cell_calling_bioproduct_algorithm"
+        type: string
+        description: |
+          Specify the bioproduct algorithm to be used for putative cell calling: Basic or Refined
+          
+          By default, the Basic algorithm will be used for putative cell calling.
+        choices: [Basic, Refined]
+        example: Basic
+        info:
+          config_key: Cell_Calling_Bioproduct_Algorithm
+      - name: "--cell_calling_atac_algorithm"
+        type: string
+        description: |
+          Specify the ATAC-seq algorithm to be used for putative cell calling: Basic or Refined
+          
+          By default, the Basic algorithm will be used for putative cell calling.
+        choices: [Basic, Refined]
+        example: Basic
+        info:
+          config_key: Cell_Calling_ATAC_Algorithm
+      - name: "--exact_cell_count"
+        type: integer
+        description: |
+          Set a specific number (>=1) of cells as putative, based on those with the highest error-corrected read count
+        example: 10000
+        min: 1
+        info:
+          config_key: Exact_Cell_Count
+      - name: "--expected_cell_count"
+        type: integer
+        description: |
+          Guide the basic putative cell calling algorithm by providing an estimate of the number of cells expected. Usually this can be the number of cells loaded into the Rhapsody cartridge. If there are multiple inflection points on the second derivative cumulative curve, this will ensure the one selected is near the expected count.
+        example: 20000
+        min: 1
+        info:
+          config_key: Expected_Cell_Count
+  - name: Intronic Reads Settings
+    arguments:
+      - name: --exclude_intronic_reads
+        type: boolean
+        description: |
+          By default, the flag is false, and reads aligned to exons and introns are considered and represented in molecule counts. When the flag is set to true, intronic reads will be excluded.
+          The value can be true or false.
+        example: false
+        info:
+          config_key: Exclude_Intronic_Reads
+  - name: Multiplex Settings
+    arguments:
+      - name: "--sample_tags_version"
+        type: string
+        description: |
+          Specify the version of the Sample Tags used in the run:
+
+          * If Sample Tag Multiplexing was done, specify the appropriate version: human, mouse, flex, nuclei_includes_mrna, nuclei_atac_only
+          * If this is an SMK + Nuclei mRNA run or an SMK + Multiomic ATAC-Seq (WTA+ATAC-Seq) run (and not an SMK + ATAC-Seq only run), choose the "nuclei_includes_mrna" option.
+          * If this is an SMK + ATAC-Seq only run (and not SMK + Multiomic ATAC-Seq (WTA+ATAC-Seq)), choose the "nuclei_atac_only" option.
+        choices: [human, mouse, flex, nuclei_includes_mrna, nuclei_atac_only]
+        example: human
+        info:
+          config_key: Sample_Tags_Version
+      - name: "--tag_names"
+        type: string
+        description: |
+          Specify the tag number followed by '-' and the desired sample name to appear in Sample_Tag_Metrics.csv.
+          Do not use the special characters: &, (), [], {}, <>, ?, |
+        multiple: true
+        example: [4-mySample, 9-myOtherSample, 6-alsoThisSample]
+        info:
+          config_key: Tag_Names
+  - name: VDJ arguments
+    arguments:
+      - name: "--vdj_version"
+        type: string
+        description: |
+          If VDJ was done, specify the appropriate option: human, mouse, humanBCR, humanTCR, mouseBCR, mouseTCR
+        choices: [human, mouse, humanBCR, humanTCR, mouseBCR, mouseTCR]
+        example: human
+        info:
+          config_key: VDJ_Version
+  - name: ATAC options
+    arguments:
+      - name: "--predefined_atac_peaks"
+        type: file
+        description: An optional BED file containing pre-established chromatin accessibility peak regions for generating the ATAC cell-by-peak matrix.
+        example: predefined_peaks.bed
+        info:
+          config_key: Predefined_ATAC_Peaks
+  - name: Additional options
+    arguments:
+      - name: "--run_name"
+        type: string
+        description: |
+          Specify a run name to use as the output file base name. Use only letters, numbers, or hyphens. Do not use special characters or spaces.
+        default: sample
+        info:
+          config_key: Run_Name
+      - name: "--generate_bam"
+        type: boolean
+        description: |
+          Specify whether to create the BAM file output
+        default: false
+        info:
+          config_key: Generate_Bam
+      - name: "--long_reads"
+        type: boolean
+        description: |
+          Use STARlong (default: undefined - i.e. autodetects based on read lengths) - Specify if the STARlong aligner should be used instead of STAR. Set to true if the reads are longer than 650bp.
+        info:
+          config_key: Long_Reads
+  - name: Advanced options
+    description: |
+      NOTE: Only change these if you are really sure about what you are doing
+    arguments:
+      - name: "--custom_star_params"
+        type: string
+        description: |
+          Modify STAR alignment parameters - Set this parameter to fully override default STAR mapping parameters used in the pipeline.
+          For reference, these are the defaults used by the pipeline:
+
+            Short Reads: `--outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMultimapScoreRange 0 --clip3pAdapterSeq AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA --seedSearchStartLmax 50 --outFilterMatchNmin 25 --limitOutSJcollapsed 2000000`
+            Long Reads: Same as Short Reads + `--seedPerReadNmax 10000`
+
+          This applies to FASTQ files provided in the Reads user input.
+          Do NOT set any non-mapping related params like `--genomeDir`, `--outSAMtype`, `--outSAMunmapped`, `--readFilesIn`, `--runThreadN`, etc.
+          The pipeline uses STAR version 2.7.10b.
+        example: "--alignIntronMax 6000 --outFilterScoreMinOverLread 0.1 --limitOutSJcollapsed 2000000"
+        info:
+          config_key: Custom_STAR_Params
+      - name: "--custom_bwa_mem2_params"
+        type: string
+        description: |
+          Modify bwa-mem2 alignment parameters - Set this parameter to fully override the bwa-mem2 mapping parameters used in the pipeline.
+          The pipeline does not pass any custom mapping parameters to bwa-mem2, so the program's default values are used.
+          This applies to FASTQ files provided in the Reads_ATAC user input.
+          Do NOT set any non-mapping related params like `-C`, `-t`, etc.
+          The pipeline uses bwa-mem2 version 2.2.1.
+        example: "-k 16 -w 200 -r"
+        info:
+          config_key: Custom_bwa_mem2_Params
+  - name: CWL-runner arguments
+    arguments:
+      - name: "--parallel"
+        type: boolean
+        description: "Run jobs in parallel."
+        default: true
+      - name: "--timestamps"
+        type: boolean_true
+        description: "Add timestamps to the errors, warnings, and notifications."
+  - name: Undocumented arguments
+    arguments:
+      - name: --abseq_umi
+        type: integer
+        multiple: false
+        info:
+          config_key: AbSeq_UMI
+      - name: --target_analysis
+        type: boolean
+        multiple: false
+        info:
+          config_key: Target_analysis
+      - name: --vdj_jgene_evalue
+        type: double
+        description: |
+          E-value threshold for the J gene call by IgBlast/PyIR. The pipeline default is 0.001.
+        multiple: false
+        info:
+          config_key: VDJ_JGene_Evalue
+      - name: --vdj_vgene_evalue
+        type: double
+        description: |
+          E-value threshold for the V gene call by IgBlast/PyIR. The pipeline default is 0.001.
+        multiple: false
+        info:
+          config_key: VDJ_VGene_Evalue
+      - name: --write_filtered_reads
+        type: boolean
+        multiple: false
+        info:
+          config_key: Write_Filtered_Reads
+resources:
+  - type: python_script
+    path: script.py
+test_resources:
+  - type: python_script
+    path: test.py
+  - path: ../test_data
+  - path: ../helpers
+
+requirements:
+  commands: [ "cwl-runner" ]
+
+engines:
+  - type: docker
+    image: bdgenomics/rhapsody:2.2.1
+    setup:
+      - type: apt
+        packages: [procps, git]
+      - type: python
+        packages: [cwlref-runner, cwl-runner]
+      - type: docker
+        run: |
+          mkdir /var/bd_rhapsody_cwl && \
+            cd /var/bd_rhapsody_cwl && \
+            git clone https://bitbucket.org/CRSwDev/cwl.git . && \
+            git checkout 8feeace1141b24749ea6003f8e6ad6d3ad5232de
+      - type: docker
+        run:
+          - VERSION=$(ls -v /var/bd_rhapsody_cwl | grep '^v' | sed 's#v##' | tail -1)
+          - 'echo "bdgenomics/rhapsody: \"$VERSION\"" > /var/software_versions.txt'
+    test_setup:
+      - type: python
+        packages: [biopython, gffutils]
+runners:
+  - type: executable
+  - type: nextflow
diff --git a/src/bd_rhapsody/bd_rhapsody_sequence_analysis/help.txt b/src/bd_rhapsody/bd_rhapsody_sequence_analysis/help.txt
new file mode 100644
index 00000000..618faa3e
--- /dev/null
+++ b/src/bd_rhapsody/bd_rhapsody_sequence_analysis/help.txt
@@ -0,0 +1,167 @@
+```bash
+cwl-runner src/bd_rhapsody/bd_rhapsody_sequence_analysis/rhapsody_pipeline_2.2.1_nodocker.cwl --help
+```
+
+usage: src/bd_rhapsody/bd_rhapsody_sequence_analysis/rhapsody_pipeline_2.2.1_nodocker.cwl
+       [-h] [--AbSeq_Reference ABSEQ_REFERENCE] [--AbSeq_UMI ABSEQ_UMI]
+       [--Cell_Calling_ATAC_Algorithm CELL_CALLING_ATAC_ALGORITHM]
+       [--Cell_Calling_Bioproduct_Algorithm CELL_CALLING_BIOPRODUCT_ALGORITHM]
+       [--Cell_Calling_Data CELL_CALLING_DATA]
+       [--Custom_STAR_Params CUSTOM_STAR_PARAMS]
+       [--Custom_bwa_mem2_Params CUSTOM_BWA_MEM2_PARAMS]
+       [--Exact_Cell_Count EXACT_CELL_COUNT] [--Exclude_Intronic_Reads]
+       [--Expected_Cell_Count EXPECTED_CELL_COUNT] [--Generate_Bam]
+       [--Long_Reads] [--Maximum_Threads MAXIMUM_THREADS]
+       [--Predefined_ATAC_Peaks PREDEFINED_ATAC_PEAKS] [--Reads READS]
+       [--Reads_ATAC READS_ATAC] [--Reference_Archive REFERENCE_ARCHIVE]
+       [--Run_Name RUN_NAME] [--Sample_Tags_Version SAMPLE_TAGS_VERSION]
+       [--Supplemental_Reference SUPPLEMENTAL_REFERENCE]
+       [--Tag_Names TAG_NAMES] [--Target_analysis]
+       [--Targeted_Reference TARGETED_REFERENCE]
+       [--VDJ_JGene_Evalue VDJ_JGENE_EVALUE]
+       [--VDJ_VGene_Evalue VDJ_VGENE_EVALUE] [--VDJ_Version VDJ_VERSION]
+       [--Write_Filtered_Reads]
+       [job_order]
+
+The BD Rhapsody™ assays are used to create sequencing libraries from single
+cell transcriptomes. After sequencing, the analysis pipeline takes the FASTQ
+files and a reference file for gene alignment. The pipeline generates
+molecular counts per cell, read counts per cell, metrics, and an alignment
+file.
+
+positional arguments:
+  job_order             Job input json file
+
+options:
+  -h, --help            show this help message and exit
+  --AbSeq_Reference ABSEQ_REFERENCE
+                        AbSeq Reference
+  --AbSeq_UMI ABSEQ_UMI
+  --Cell_Calling_ATAC_Algorithm CELL_CALLING_ATAC_ALGORITHM
+                        Specify the ATAC algorithm to be used for ATAC
+                        putative cell calling. The Basic algorithm is the
+                        default.
+  --Cell_Calling_Bioproduct_Algorithm CELL_CALLING_BIOPRODUCT_ALGORITHM
+                        Specify the bioproduct algorithm to be used for
+                        mRNA/AbSeq putative cell calling. The Basic algorithm
+                        is the default.
+  --Cell_Calling_Data CELL_CALLING_DATA
+                        Specify the data to be used for putative cell calling.
+                        The default data for putative cell calling will be
+                        determined the following way: - If mRNA and ATAC Reads
+                        exist, mRNA_and_ATAC is the default. - If only ATAC
+                        Reads exist, ATAC is the default. - Otherwise, mRNA is
+                        the default.
+  --Custom_STAR_Params CUSTOM_STAR_PARAMS
+                        Allows you to specify custom STAR aligner mapping
+                        parameters. Only the mapping parameters you provide
+                        here will be used with STAR, meaning that you must
+                        provide the complete list of parameters that you want
+                        to take effect. For reference, the parameters used by
+                        default in the pipeline are: 1. Short Reads:
+                        --outFilterScoreMinOverLread 0
+                        --outFilterMatchNminOverLread 0
+                        --outFilterMultimapScoreRange 0 --clip3pAdapterSeq
+                        AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+                        --seedSearchStartLmax 50 --outFilterMatchNmin 25
+                        --limitOutSJcollapsed 2000000 2. Long Reads: Same
+                        options as short reads + --seedPerReadNmax 10000
+                        Example input: --alignIntronMax 500000
+                        --outFilterScoreMinOverLread 0 --limitOutSJcollapsed
+                        2000000 Important: 1. This applies to fastqs provided
+                        in the Reads user input 2. Please do not specify any
+                        non-mapping related params like: --runThreadN,
+                        --genomeDir --outSAMtype, etc. 3. Please only use
+                        params supported by STAR version 2.7.10b
+  --Custom_bwa_mem2_Params CUSTOM_BWA_MEM2_PARAMS
+                        Allows you to specify custom bwa-mem2 mapping
+                        parameters. Only the mapping parameters you provide
+                        here will be used with bwa-mem2, meaning that you must
+                        provide the complete list of parameters that you want
+                        to take effect. The pipeline uses program default
+                        mapping parameters. Example input: -k 15 -w 200 -r 2
+                        Important: 1. This applies to fastqs provided in the
+                        Reads_ATAC user input 2. Please do not specify any
+                        non-mapping related params like: -C, -t, etc. 3.
+                        Please only use params supported by bwa-mem2 version
+                        2.2.1
+  --Exact_Cell_Count EXACT_CELL_COUNT
+                        Set a specific number (>=1) of cells as putative,
+                        based on those with the highest error-corrected read
+                        count
+  --Exclude_Intronic_Reads
+                        By default, reads aligned to exons and introns are
+                        considered and represented in molecule counts.
+                        Including intronic reads may increase sensitivity,
+                        resulting in an increase in molecule counts and the
+                        number of genes per cell for both cellular and nuclei
+                        samples. Intronic reads may indicate unspliced mRNAs
+                        and are also useful, for example, in the study of
+                        nuclei and RNA velocity. When set to true, intronic
+                        reads will be excluded.
+  --Expected_Cell_Count EXPECTED_CELL_COUNT
+                        Optional. Guide the basic putative cell calling
+                        algorithm by providing an estimate of the number of
+                        cells expected. Usually this can be the number of
+                        cells loaded into the Rhapsody cartridge. If there are
+                        multiple inflection points on the second derivative
+                        cumulative curve, this will ensure the one selected is
+                        near the expected.
+  --Generate_Bam        Default: false. A Bam read alignment file contains
+                        reads from all the input libraries, but creating it
+                        can consume a lot of compute and disk resources. By
+                        setting this field to true, the Bam file will be
+                        created. This option is shared for both Bioproduct and
+                        ATAC libraries.
+  --Long_Reads          By default, we detect if there are any reads longer
+                        than 650bp and then flag QualCLAlign to use STARlong
+                        instead of STAR. This flag can be explicitly set if it
+                        is known in advance that there are reads longer than
+                        650bp.
+  --Maximum_Threads MAXIMUM_THREADS
+                        The maximum number of threads to use in the pipeline.
+                        By default, all available cores are used.
+  --Predefined_ATAC_Peaks PREDEFINED_ATAC_PEAKS
+                        An optional BED file containing pre-established
+                        chromatin accessibility peak regions for generating
+                        the ATAC cell-by-peak matrix. Only applies to ATAC
+                        assays.
+  --Reads READS         FASTQ files from libraries that may include WTA mRNA,
+                        Targeted mRNA, AbSeq, Sample Multiplexing, and related
+                        technologies
+  --Reads_ATAC READS_ATAC
+                        FASTQ files from libraries generated using the ATAC
+                        assay protocol. Each lane of a library is expected to
+                        have 3 FASTQs - R1, R2 and I1/I2, where the index read
+                        contains the Cell Barcode and UMI sequence. Only
+                        applies to ATAC assays.
+  --Reference_Archive REFERENCE_ARCHIVE
+                        Reference Files Archive
+  --Run_Name RUN_NAME   This is a name for output files, for example
+                        Experiment1_Metrics_Summary.csv. Default if left empty
+                        is to name run based on a library. Any non-alpha
+                        numeric characters will be changed to a hyphen.
+  --Sample_Tags_Version SAMPLE_TAGS_VERSION
+                        The sample multiplexing kit version. This option
+                        should only be set for a multiplexed experiment.
+  --Supplemental_Reference SUPPLEMENTAL_REFERENCE
+                        Supplemental Reference
+  --Tag_Names TAG_NAMES
+                        Specify the Sample Tag number followed by - (hyphen)
+                        and a sample name to appear in the output files. For
+                        example: 4-Ramos. Should be alpha numeric, with + -
+                        and _ allowed. Any special characters: &, (), [], {},
+                        <>, ?, | will be corrected to underscores.
+  --Target_analysis
+  --Targeted_Reference TARGETED_REFERENCE
+                        Targeted Reference
+  --VDJ_JGene_Evalue VDJ_JGENE_EVALUE
+                        The e-value threshold for J gene call by IgBlast/PyIR,
+                        default is set as 0.001
+  --VDJ_VGene_Evalue VDJ_VGENE_EVALUE
+                        The e-value threshold for V gene call by IgBlast/PyIR,
+                        default is set as 0.001
+  --VDJ_Version VDJ_VERSION
+                        The VDJ species and chain types. This option should
+                        only be set for VDJ experiment.
+  --Write_Filtered_Reads
diff --git a/src/bd_rhapsody/bd_rhapsody_sequence_analysis/pipeline_inputs_template_2.2.1.yaml b/src/bd_rhapsody/bd_rhapsody_sequence_analysis/pipeline_inputs_template_2.2.1.yaml
new file mode 100644
index 00000000..19728a57
--- /dev/null
+++ b/src/bd_rhapsody/bd_rhapsody_sequence_analysis/pipeline_inputs_template_2.2.1.yaml
@@ -0,0 +1,203 @@
+#!/usr/bin/env cwl-runner
+
+cwl:tool: rhapsody
+
+# This is a template YML file used to specify the inputs for a BD Rhapsody Sequence Analysis pipeline run.
+# See the BD Rhapsody Sequence Analysis Pipeline User Guide for more details. Enter the following information:
+
+
+## Reads (optional) - Path to your FASTQ.GZ formatted read files from libraries that may include:
+#   - WTA mRNA
+#   - Targeted mRNA
+#   - AbSeq
+#   - Sample Multiplexing
+#   - VDJ
+# You may specify as many R1/R2 read pairs as you want.
+Reads:
+
+ - class: File
+   location: "test/WTALibrary_S1_L001_R1_001.fastq.gz"
+
+ - class: File
+   location: "test/WTALibrary_S1_L001_R2_001.fastq.gz"
+
+## Reads_ATAC (optional) - Path to your FASTQ.GZ formatted read files from ATAC-Seq libraries.
+## You may specify as many R1/R2/I2 files as you want.
+Reads_ATAC:
+
+ - class: File
+   location: "test/ATACLibrary_S2_L001_R1_001.fastq.gz"
+
+ - class: File
+   location: "test/ATACLibrary_S2_L001_R2_001.fastq.gz"
+
+ - class: File
+   location: "test/ATACLibrary_S2_L001_I2_001.fastq.gz"
+
+
+## Assay type will be inferred from the provided reference(s)
+## Do not provide both Reference_Archive and Targeted_Reference at the same time
+##
+## Valid reference input combinations:
+##   WTA Reference_Archive                                              (WTA only)
+##   WTA Reference_Archive + AbSeq_Reference                            (WTA + AbSeq)
+##   WTA Reference_Archive + Supplemental_Reference                     (WTA + extra transgenes)
+##   WTA Reference_Archive + AbSeq_Reference + Supplemental_Reference   (WTA + AbSeq + extra transgenes)
+##   WTA+ATAC-Seq Reference_Archive                                     (WTA + ATAC, ATAC only)
+##   WTA+ATAC-Seq Reference_Archive + Supplemental_Reference            (WTA + ATAC + extra transgenes)
+##   Targeted_Reference                                                 (Targeted only)
+##   Targeted_Reference + AbSeq_Reference                               (Targeted + AbSeq)
+##   AbSeq_Reference                                                    (AbSeq only)
+
+## See the BD Rhapsody Sequence Analysis Pipeline User Guide for instructions on how to:
+##    - Obtain a pre-built Rhapsody Reference file
+##    - Create a custom Rhapsody Reference file
+
+## WTA Reference_Archive (required for WTA mRNA assay) - Path to Rhapsody WTA Reference in the tar.gz format.
+##
+##   --Structure of reference archive--
+##   BD_Rhapsody_Reference_Files/ # top level folder
+##       star_index/ # sub-folder containing STAR index
+##           [files created with STAR --runMode genomeGenerate]
+##       [GTF for gene-transcript-annotation e.g. "gencode.v43.primary_assembly.annotation.gtf"]
+##
+## WTA+ATAC-Seq Reference_Archive (required for ATAC-Seq or Multiomic ATAC-Seq (WTA+ATAC-Seq) assays) - Path to Rhapsody WTA+ATAC-Seq Reference in the tar.gz format.
+##
+##   --Structure of reference archive--
+##   BD_Rhapsody_Reference_Files/ # top level folder
+##       star_index/ # sub-folder containing STAR index
+##           [files created with STAR --runMode genomeGenerate]
+##       [GTF for gene-transcript-annotation e.g. "gencode.v43.primary_assembly.annotation.gtf"]
+##
+##       mitochondrial_contigs.txt # mitochondrial contigs in the reference genome - one contig name per line. e.g. chrMT or chrM, etc.
+##
+##       bwa-mem2_index/ # sub-folder containing bwa-mem2 index
+##          [files created with bwa-mem2 index]
+##
+Reference_Archive:
+  class: File
+  location: "test/RhapRef_Human_WTA_2023-02.tar.gz"
+# location: "test/RhapRef_Human_WTA-ATAC_2023-08.tar.gz"
+
+## Targeted_Reference (required for Targeted mRNA assay) - Path to the targeted reference file in FASTA format.
+#Targeted_Reference:
+# - class: File
+#   location: "test/BD_Rhapsody_Immune_Response_Panel_Hs.fasta"
+
+## AbSeq_Reference (optional) - Path to the AbSeq reference file in FASTA format.  Only needed if BD AbSeq Ab-Oligos are used.
+## For putative cell calling using an AbSeq dataset, please provide an AbSeq reference fasta file as the AbSeq_Reference.
+#AbSeq_Reference:
+# - class: File
+#   location: "test/AbSeq_reference.fasta"
+
+## Supplemental_Reference (optional) - Path to the supplemental reference file in FASTA format.  Only needed if there are additional transgene sequences to be aligned against in a WTA assay experiment
+#Supplemental_Reference:
+# - class: File
+#   location: "test/supplemental_reference.fasta"
+
+####################################
+## Putative Cell Calling Settings ##
+####################################
+
+## Putative cell calling dataset (optional) - Specify the dataset to be used for putative cell calling: mRNA, AbSeq, ATAC, mRNA_and_ATAC
+## For putative cell calling using an AbSeq dataset, please provide an AbSeq_Reference fasta file above.
+## For putative cell calling using an ATAC dataset, please provide a WTA+ATAC-Seq Reference_Archive file above.
+## The default data for putative cell calling, will be determined the following way:
+## If mRNA Reads and ATAC Reads exist:
+##    Cell_Calling_Data: mRNA_and_ATAC
+## If only ATAC Reads exist:
+##    Cell_Calling_Data: ATAC
+## Otherwise:
+##    Cell_Calling_Data: mRNA
+#Cell_Calling_Data: mRNA
+
+## Putative cell calling bioproduct algorithm (optional) - Specify the bioproduct algorithm to be used for putative cell calling: Basic or Refined
+## By default, the Basic algorithm will be used for putative cell calling.
+#Cell_Calling_Bioproduct_Algorithm: Basic
+
+## Putative cell calling ATAC algorithm (optional) - Specify the ATAC-seq algorithm to be used for putative cell calling: Basic or Refined
+## By default, the Basic algorithm will be used for putative cell calling.
+#Cell_Calling_ATAC_Algorithm: Basic
+
+## Exact cell count (optional) - Set a specific number (>=1) of cells as putative, based on those with the highest error-corrected read count
+#Exact_Cell_Count: 10000
+
+## Expected Cell Count (optional) - Guide the basic putative cell calling algorithm by providing an estimate of the number of cells expected.  Usually this can be the number of cells loaded into the Rhapsody cartridge.  If there are multiple inflection points on the second derivative cumulative curve, this will ensure the one selected is near the expected. 
+#Expected_Cell_Count: 20000
+
+
+####################################
+## Intronic Reads Settings ##
+####################################
+
+## Exclude_Intronic_Reads (optional)
+## By default, the flag is false, and reads aligned to exons and introns are considered and represented in molecule counts. When the flag is set to true, intronic reads will be excluded.
+## The value can be true or false.
+#Exclude_Intronic_Reads: true
+
+#######################
+## Multiplex options ##
+#######################
+
+## Sample Tags Version (optional) - If Sample Tag Multiplexing was done, specify the appropriate version: human, mouse, flex, nuclei_includes_mrna, nuclei_atac_only
+## If this is an SMK + Nuclei mRNA run or an SMK + Multiomic ATAC-Seq (WTA+ATAC-Seq) run (and not an SMK + ATAC-Seq only run), choose the "nuclei_includes_mrna" option.
+## If this is an SMK + ATAC-Seq only run (and not SMK + Multiomic ATAC-Seq (WTA+ATAC-Seq)), choose the "nuclei_atac_only" option.
+#Sample_Tags_Version: human
+
+## Tag_Names (optional) - Specify the tag number followed by '-' and the desired sample name to appear in Sample_Tag_Metrics.csv
+# Do not use the special characters: &, (), [], {},  <>, ?, |
+#Tag_Names: [4-mySample, 9-myOtherSample, 6-alsoThisSample]
+
+################
+## VDJ option ##
+################
+
+## VDJ Version (optional) - If VDJ was done, specify the appropriate option: human, mouse, humanBCR, humanTCR, mouseBCR, mouseTCR
+#VDJ_Version: human
+
+##################
+## ATAC options ##
+##################
+
+## Predefined ATAC Peaks - An optional BED file containing pre-established chromatin accessibility peak regions for generating the ATAC cell-by-peak matrix.
+#Predefined_ATAC_Peaks:
+#  class: File
+#  location: "path/predefined_peaks.bed"
+
+########################
+## Additional Options ##
+########################
+
+## Run Name (optional)-  Specify a run name to use as the output file base name. Use only letters, numbers, or hyphens. Do not use special characters or spaces.
+#Run_Name: my-experiment
+
+## Generate Bam (optional, default: false) - Specify whether to create the BAM file output
+#Generate_Bam: true
+
+## Maximum_Threads (integer, optional, default: [use all cores of CPU]) - Set the maximum number of threads to use in the read processing steps of the pipeline:  QualCLAlign, AlignmentAnalysis, VDJ assembly
+#Maximum_Threads: 16
+
+## Use STARlong (optional, default: "auto" - i.e. autodetects based on read lengths) - Specify if the STARlong aligner should be used instead of STAR. Set to true if the reads are longer than 650bp.
+## The value can be true or false.
+#Long_Reads: true
+
+########################
+## Advanced Options   ##
+########################
+## NOTE: Only change these if you are really sure about what you are doing
+
+## Modify STAR alignment parameters - Set this parameter to fully override default STAR mapping parameters used in the pipeline.
+## For reference this is the default that is used:
+##   Short Reads: --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMultimapScoreRange 0 --clip3pAdapterSeq AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA --seedSearchStartLmax 50 --outFilterMatchNmin 25 --limitOutSJcollapsed 2000000
+##   Long Reads: Same as Short Reads + --seedPerReadNmax 10000
+## This applies to fastqs provided in the Reads user input 
+## Do NOT set any non-mapping related params like --genomeDir, --outSAMtype, --outSAMunmapped, --readFilesIn, --runThreadN, etc.
+## We use STAR version 2.7.10b
+#Custom_STAR_Params: --alignIntronMax 6000 --outFilterScoreMinOverLread 0.1 --limitOutSJcollapsed 2000000
+
+## Modify bwa-mem2 alignment parameters - Set this parameter to fully override bwa-mem2 mapping parameters used in the pipeline
+## The pipeline does not specify any custom mapping params to bwa-mem2 so program default values are used
+## This applies to fastqs provided in the Reads_ATAC user input 
+## Do NOT set any non-mapping related params like -C, -t, etc.
+## We use bwa-mem2 version 2.2.1
+#Custom_bwa_mem2_Params: -k 16 -w 200 -r
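
Reviewer note: the template above lists every input file as a `class: File` entry with a `location`. The help text describes `job_order` as a "Job input json file", so the same structure can also be produced as JSON. The following is only a minimal sketch of that idea; `build_job_order` is a hypothetical helper that is not part of this patch or of the BD pipeline, and the file paths are the placeholder paths from the template.

```python
import json


def build_job_order(reads, reference_archive, **options):
    """Hypothetical helper: build a Rhapsody-style job order mapping."""
    job = {
        # Each FASTQ becomes a CWL File object, as in the template.
        "Reads": [{"class": "File", "location": path} for path in reads],
        "Reference_Archive": {"class": "File", "location": reference_archive},
    }
    # Only include optional settings that were explicitly set.
    job.update({key: value for key, value in options.items() if value is not None})
    return job


job = build_job_order(
    reads=[
        "test/WTALibrary_S1_L001_R1_001.fastq.gz",
        "test/WTALibrary_S1_L001_R2_001.fastq.gz",
    ],
    reference_archive="test/RhapRef_Human_WTA_2023-02.tar.gz",
    Exact_Cell_Count=10000,
    Run_Name=None,  # unset options are left out of the job order
)
text = json.dumps(job, indent=2)
print(text)
```

Leaving unset options out mirrors the template, where optional keys stay commented out so that the pipeline defaults apply.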
diff --git a/src/bd_rhapsody/bd_rhapsody_sequence_analysis/script.py b/src/bd_rhapsody/bd_rhapsody_sequence_analysis/script.py
new file mode 100644
index 00000000..cbddf6bf
--- /dev/null
+++ b/src/bd_rhapsody/bd_rhapsody_sequence_analysis/script.py
@@ -0,0 +1,243 @@
+import os
+import re
+import subprocess
+import tempfile
+from typing import Any
+import yaml
+import shutil
+import glob
+
+## VIASH START
+par = {
+    'reads': [
+        'resources_test/bdrhap_5kjrt/raw/12ABC_S1_L432_R1_001_subset.fastq.gz', 
+        'resources_test/bdrhap_5kjrt/raw/12ABC_S1_L432_R2_001_subset.fastq.gz'
+    ],
+    'reads_atac': None,
+    'reference_archive': "resources_test/reference_gencodev41_chr1/reference_bd_rhapsody.tar.gz",
+    'targeted_reference': [],
+    'abseq_reference': [],
+    'supplemental_reference': [],
+    'output_dir': 'output_dir',
+    'cell_calling_data': None,
+    'cell_calling_bioproduct_algorithm': None,
+    'cell_calling_atac_algorithm': None,
+    'exact_cell_count': None,
+    'expected_cell_count': None,
+    'exclude_intronic_reads': None,
+    'sample_tags_version': None,
+    'tag_names': [],
+    'vdj_version': None,
+    'predefined_atac_peaks': None,
+    'run_name': "sample",
+    'generate_bam': None,
+    'custom_star_params': None,
+    'custom_bwa_mem2_params': None,
+    'parallel': True,
+    'timestamps': False,
+    'dryrun': False
+}
+meta = {
+    'config': "target/nextflow/bd_rhapsody/bd_rhapsody_sequence_analysis/.config.vsh.yaml",
+    'resources_dir': os.path.abspath('src/bd_rhapsody/bd_rhapsody_sequence_analysis'),
+    'temp_dir': os.getenv("VIASH_TEMP"),
+    'memory_mb': None,
+    'cpus': None
+}
+## VIASH END
+
+def clean_arg(argument):
+    argument["clean_name"] = argument["name"].lstrip("-")
+    return argument
+
+def read_config(path: str) -> dict[str, Any]:
+    with open(path, 'r') as f:
+        config = yaml.safe_load(f)
+    
+    config["arguments"] = [
+        clean_arg(arg)
+        for grp in config["argument_groups"]
+        for arg in grp["arguments"]
+    ]
+    
+    return config
+
+def strip_margin(text: str) -> str:
+    return re.sub(r'(\n?)[ \t]*\|', r'\1', text)
+
+def process_params(par: dict[str, Any], config, temp_dir: str) -> dict[str, Any]:
+    # check input parameters
+    assert par["reads"] or par["reads_atac"], "Pass at least one set of inputs to --reads or --reads_atac."
+
+    # output to temp dir if output_dir was not passed
+    if not par["output_dir"]:
+        par["output_dir"] = os.path.join(temp_dir, "output")
+
+    # sanitize the run name: only letters, numbers and hyphens are allowed
+    if par["run_name"] and re.search("[^A-Za-z0-9-]", par["run_name"]):
+        print("--run_name should only consist of letters, numbers or hyphens. Replacing all '[^A-Za-z0-9-]' with '-'.", flush=True)
+        par["run_name"] = re.sub("[^A-Za-z0-9-]", "-", par["run_name"])
+
+    # make paths absolute
+    for argument in config["arguments"]:
+        arg_clean_name = argument["clean_name"]
+        if not par[arg_clean_name] or not argument["type"] == "file":
+            continue
+        par_value = par[arg_clean_name]
+        if isinstance(par_value, list):
+            par_value_absolute = list(map(os.path.abspath, par_value))
+        else:
+            par_value_absolute = os.path.abspath(par_value)
+        par[arg_clean_name] = par_value_absolute
+    
+    return par
+
+def generate_config(par: dict[str, Any], config) -> str:
+    content_list = [strip_margin(f"""\
+        |#!/usr/bin/env cwl-runner
+        |
+        |cwl:tool: rhapsody
+        |""")]
+
+    for argument in config["arguments"]:
+        arg_clean_name = argument["clean_name"]
+        arg_par_value = par[arg_clean_name]
+        arg_info = argument.get("info") or {} # Note: .info might be None
+        config_key = arg_info.get("config_key")
+        if arg_par_value and config_key:
+
+            if argument["type"] == "file":
+                content = strip_margin(f"""\
+                    |{config_key}:
+                    |""")
+                if isinstance(arg_par_value, list):
+                    for file in arg_par_value:
+                        content += strip_margin(f"""\
+                            | - class: File
+                            |   location: "{file}"
+                            |""")
+                else:
+                    content += strip_margin(f"""\
+                        |   class: File
+                        |   location: "{arg_par_value}"
+                        |""")
+                content_list.append(content)
+            else:
+                content_list.append(strip_margin(f"""\
+                    |{config_key}: {arg_par_value}
+                    |"""))
+
+    # Join the sections into the final config content
+    return ''.join(content_list)
+
+def generate_config_file(par: dict[str, Any], config: dict[str, Any], temp_dir: str) -> str:
+    config_file = os.path.join(temp_dir, "config.yml")
+    config_content = generate_config(par, config)
+    with open(config_file, "w") as f:
+        f.write(config_content)
+    return config_file
+
+def generate_cwl_file(meta: dict[str, Any], dir: str) -> str:
+    # create cwl file (if need be)
+    # orig_cwl_file=os.path.join(meta["resources_dir"], "rhapsody_pipeline_2.2.1_nodocker.cwl")
+    orig_cwl_file="/var/bd_rhapsody_cwl/v2.2.1/rhapsody_pipeline_2.2.1.cwl"
+
+    if not meta["memory_mb"] and not meta["cpus"]:
+        return os.path.abspath(orig_cwl_file)
+     
+    # Inject computational requirements into pipeline
+    cwl_file = os.path.join(dir, "pipeline.cwl")
+
+    # Read in the file
+    with open(orig_cwl_file, 'r') as file:
+        cwl_data = file.read()
+
+    # Inject computational requirements into pipeline
+    if meta["memory_mb"]:
+        memory = int(meta["memory_mb"]) - 2000 # keep 2gb for OS
+        cwl_data = re.sub('"ramMin": [^\n]*[^,](,?)\n', f'"ramMin": {memory}\\1\n', cwl_data)
+    if meta["cpus"]:
+        cwl_data = re.sub('"coresMin": [^\n]*[^,](,?)\n', f'"coresMin": {meta["cpus"]}\\1\n', cwl_data)
+
+    # Write the file out again
+    with open(cwl_file, 'w') as file:
+        file.write(cwl_data)
+        
+    return os.path.abspath(cwl_file)
+
+def copy_outputs(par: dict[str, Any], config: dict[str, Any]):
+    for arg in config["arguments"]:
+        par_value = par[arg["clean_name"]]
+        if par_value and arg["type"] == "file" and arg["direction"] == "output":
+            # example template: '[sample_name]_(assay)_cell_type_experimental.csv'
+            template = (arg.get("info") or {}).get("template") # Note: .info might be None
+            if template:
+                template_glob = template\
+                    .replace("[sample_name]", par["run_name"])\
+                    .replace("(assay)", "*")\
+                    .replace("[number]", "*")
+                files = glob.glob(os.path.join(par["output_dir"], template_glob))
+                if not files and arg["required"]:
+                    raise ValueError(f"Expected output file '{template_glob}' not found.")
+                elif len(files) > 1 and not arg["multiple"]:
+                    raise ValueError(f"Expected single output file '{template_glob}', but found multiple.")
+                if not files:
+                    continue  # optional output was not produced
+                if not arg["multiple"]:
+                    shutil.copy(files[0], par_value)
+                    continue
+                for i, file in enumerate(files):  # replace '*' in par_value with index
+                    shutil.copy(file, par_value.replace("*", str(i)))
+
+
+def main(par: dict[str, Any], meta: dict[str, Any], temp_dir: str):
+    config = read_config(meta["config"])
+    
+    # Preprocess params
+    par = process_params(par, config, temp_dir)
+
+    # Build the cwl-runner command
+    cmd = [
+        "cwl-runner",
+        "--no-container",
+        "--preserve-entire-environment",
+        "--outdir", par["output_dir"],
+    ]
+
+    if par["parallel"]:
+        cmd.append("--parallel")
+
+    if par["timestamps"]:
+        cmd.append("--timestamps")
+
+    # Create cwl file (if need be)
+    cwl_file = generate_cwl_file(meta, temp_dir)
+    cmd.append(cwl_file)
+
+    # Create params file
+    config_file = generate_config_file(par, config, temp_dir)
+    cmd.append(config_file)
+    
+    # keep environment variables but set TMPDIR to temp_dir
+    env = dict(os.environ)
+    env["TMPDIR"] = temp_dir
+
+    # Create output dir if not exists
+    if not os.path.exists(par["output_dir"]):
+        os.makedirs(par["output_dir"])
+
+    # Run command
+    print("> " + ' '.join(cmd), flush=True)
+    _ = subprocess.run(
+        cmd,
+        cwd=os.path.dirname(config_file),
+        env=env,
+        check=True
+    )
+
+    # Copy outputs
+    copy_outputs(par, config)
+
+if __name__ == "__main__":
+    with tempfile.TemporaryDirectory(prefix="cwl-bd_rhapsody-", dir=meta["temp_dir"]) as temp_dir:
+        main(par, meta, temp_dir)
diff --git a/src/bd_rhapsody/bd_rhapsody_sequence_analysis/test.py b/src/bd_rhapsody/bd_rhapsody_sequence_analysis/test.py
new file mode 100644
index 00000000..aed8e80b
--- /dev/null
+++ b/src/bd_rhapsody/bd_rhapsody_sequence_analysis/test.py
@@ -0,0 +1,494 @@
+import subprocess
+import gzip
+from pathlib import Path
+from typing import Tuple
+import numpy as np
+import random
+import mudata as md
+
+## VIASH START
+meta = {
+  "name": "bd_rhapsody_sequence_analysis",
+  "executable": "target/docker/bd_rhapsody/bd_rhapsody_sequence_analysis/bd_rhapsody_sequence_analysis",
+  "resources_dir": "src/bd_rhapsody",
+  "cpus": 8,
+  "memory_mb": 4096,
+}
+## VIASH END
+
+import sys
+sys.path.append(meta["resources_dir"])
+
+from helpers.rhapsody_cell_label import index_to_sequence
+
+meta["executable"] = Path(meta["executable"])
+meta["resources_dir"] = Path(meta["resources_dir"])
+
+#########################################################################################
+
+# Generate index
+print("> Generate index", flush=True)
+# cwl_file = meta["resources_dir"] / "bd_rhapsody_make_reference.cwl"
+cwl_file = "/var/bd_rhapsody_cwl/v2.2.1/Extra_Utilities/make_rhap_reference_2.2.1.cwl"
+reference_small_gtf = meta["resources_dir"] / "test_data" / "reference_small.gtf"
+reference_small_fa = meta["resources_dir"] / "test_data" / "reference_small.fa"
+bdabseq_panel_fa = meta["resources_dir"] / "test_data" / "BDAbSeq_ImmuneDiscoveryPanel.fasta"
+sampletagsequences_fa = meta["resources_dir"] / "test_data" / "SampleTagSequences_HomoSapiens_ver1.fasta"
+
+config_file = Path("reference_config.yml")
+reference_file = Path("Rhap_reference.tar.gz")
+
+subprocess.run([
+    "cwl-runner", 
+    "--no-container",
+    "--preserve-entire-environment",
+    "--outdir",
+    ".",
+    str(cwl_file),
+    "--Genome_fasta",
+    str(reference_small_fa),
+    "--Gtf", 
+    str(reference_small_gtf),
+    "--Extra_STAR_params",
+    "--genomeSAindexNbases 4"
+], check=True)
+
+#########################################################################################
+# Load reference in memory
+
+from Bio import SeqIO
+import gffutils
+
+# Load FASTA sequence
+with open(str(reference_small_fa), "r") as handle:
+  reference_fasta_dict = SeqIO.to_dict(SeqIO.parse(handle, "fasta"))
+with open(str(bdabseq_panel_fa), "r") as handle:
+  bdabseq_panel_fasta_dict = SeqIO.to_dict(SeqIO.parse(handle, "fasta"))
+with open(str(sampletagsequences_fa), "r") as handle:
+  sampletagsequences_fasta_dict = SeqIO.to_dict(SeqIO.parse(handle, "fasta"))
+
+# create in memory db
+reference_gtf_db = gffutils.create_db(
+  str(reference_small_gtf),
+  dbfn=":memory:",
+  force=True,
+  keep_order=True,
+  merge_strategy="merge",
+  sort_attribute_values=True,
+  disable_infer_transcripts=True,
+  disable_infer_genes=True
+)
+
+#############################################
+# TODO: move helper functions to separate helper file
+
+
+def generate_bd_read_metadata(
+  instrument_id: str = "A00226",
+  run_id: str = "970",
+  flowcell_id: str = "H5FGVDMXY",
+  lane: int = 1,
+  tile: int = 1101,
+  x: int = 1000,
+  y: int = 1000,
+  illumina_flag: str = "1:N:0",
+  sample_id: str = "CAGAGAGG",
+) -> str:
+  """
+  Generate a FASTQ metadata line for a BD Rhapsody FASTQ file.
+
+  Args:
+    instrument_id: The instrument ID.
+    run_id: The run ID.
+    flowcell_id: The flowcell ID.
+    lane: The lane number.
+    tile: The tile number. Between 1101 and 1112 in the used example data.
+    x: The x-coordinate. Between 1000 and 32967 in the used example data.
+    y: The y-coordinate. Between 1000 and 37059 in the used example data.
+    illumina_flag: The Illumina flag. Either 1:N:0 or 2:N:0 in the used example data.
+    sample_id: The sample ID.
+  """
+  # format: @A00226:970:H5FGVDMXY:1:1101:2645:1000 2:N:0:CAGAGAGG
+  return f"@{instrument_id}:{run_id}:{flowcell_id}:{lane}:{tile}:{x}:{y} {illumina_flag}:{sample_id}"
+
+
+def generate_bd_wta_transcript(
+  transcript_length: int = 42,
+) -> str:
+  """
+  Generate a WTA transcript from the module-level reference GTF database and FASTA dictionary.
+  """
+
+  # Randomly select a gene
+  gene = random.choice(list(reference_gtf_db.features_of_type("gene")))
+
+  # Find all exons within the gene
+  exons = list(reference_gtf_db.children(gene, featuretype="exon", order_by="start"))
+
+  # Calculate total exon length
+  total_exon_length = sum(exon.end - exon.start + 1 for exon in exons)
+
+  # If total exon length is less than desired transcript length, use it as is
+  max_transcript_length = min(total_exon_length, transcript_length)
+
+  # Build the WTA transcript sequence
+  sequence = ""
+  for exon in exons:
+    exon_seq = str(reference_fasta_dict[exon.seqid].seq[exon.start - 1 : exon.end])  
+    sequence += exon_seq
+
+    # Break if desired length is reached
+    if len(sequence) >= max_transcript_length:
+      sequence = sequence[:max_transcript_length]
+      break
+  
+  # add padding if need be
+  if len(sequence) < max_transcript_length:
+    sequence += "N" * (max_transcript_length - len(sequence))
+
+  return sequence
+
+
+def generate_bd_wta_read(
+  cell_index: int = 0,
+  bead_version: str = "EnhV2",
+  umi_length: int = 14,
+  transcript_length: int = 42,
+) -> Tuple[str, str]:
+  """
+  Generate a BD Rhapsody WTA read pair for a given cell index.
+
+  Args:
+    cell_index: The cell index to generate reads for.
+    bead_version: The bead version to use for generating the cell label.
+    umi_length: The length of the UMI to generate.
+    transcript_length: The length of the transcript to generate
+
+  Returns:
+    A tuple of two strings, the first string being the R1 read and the second string being the R2 read.
+
+  More info:
+    
+    See structure of reads:
+    - https://bd-rhapsody-bioinfo-docs.genomics.bd.com/steps/top_steps.html
+    - https://bd-rhapsody-bioinfo-docs.genomics.bd.com/steps/steps_cell_label.html
+    - https://scomix.bd.com/hc/en-us/articles/360057714812-All-FAQ
+    R1 is Cell Label + UMI + PolyT -> 60 bp
+      actually, CLS1 + "GTGA" + CLS2 + "GACA" + CLS3 + UMI
+    R2 is the actual read -> 42 bp
+
+    Example R1
+    CLS1       Link CLS2      Link CLS3       UMI
+    AAAATCCTGT GTGA AACCAAAGT GACA GATAGAGGAG CGCATGTTTATAAC
+  """
+  
+  # generate metadata
+  per_row = (32967 - 1000) // 9
+  per_col = (37059 - 1000) // 9
+  
+  assert cell_index >= 0 and cell_index < per_row * per_col, f"cell_index must be between 0 and {per_row} * {per_col}"
+  x = 1000 + (cell_index % per_row) * 9
+  y = 1000 + (cell_index // per_row) * 9
+  instrument_id = "A00226"
+  run_id = "970"
+  flowcell_id = "H5FGVDMXY"
+  meta_r1 = generate_bd_read_metadata(instrument_id=instrument_id, run_id=run_id, flowcell_id=flowcell_id, x=x, y=y, illumina_flag="1:N:0")
+  meta_r2 = generate_bd_read_metadata(instrument_id=instrument_id, run_id=run_id, flowcell_id=flowcell_id, x=x, y=y, illumina_flag="2:N:0")
+
+  # generate r1 (cls1 + link + cls2 + link + cls3 + umi)
+  assert cell_index >= 0 and cell_index < 384 * 384 * 384
+  cell_label = index_to_sequence(cell_index + 1, bead_version=bead_version)
+  # sample random umi
+  umi = "".join(random.choices("ACGT", k=umi_length))
+  quality_r1 = "I" * (len(cell_label) + len(umi))
+  r1 = f"{meta_r1}\n{cell_label}{umi}\n+\n{quality_r1}\n"
+
+  # generate r2 by extracting sequence from fasta and gtf
+  wta_transcript = generate_bd_wta_transcript(transcript_length=transcript_length)
+  quality_r2 = "I" * len(wta_transcript)
+  r2 = f"{meta_r2}\n{wta_transcript}\n+\n{quality_r2}\n"
+
+  return r1, r2
+
+def generate_bd_wta_fastq_files(
+  num_cells: int = 100,
+  num_reads_per_cell: int = 1000,
+) -> Tuple[str, str]:
+  """
+  Generate BD Rhapsody WTA FASTQ files for a given number of cells and reads per cell.
+
+  Args:
+    num_cells: The number of cells to generate
+    num_reads_per_cell: The number of reads to generate per cell
+
+  Returns:
+    A tuple of two strings, the first string being the R1 reads and the second string being the R2 reads.
+  """
+  r1_reads = ""
+  r2_reads = ""
+  for cell_index in range(num_cells):
+    for _ in range(num_reads_per_cell):
+      r1, r2 = generate_bd_wta_read(cell_index)
+      r1_reads += r1
+      r2_reads += r2
+
+  return r1_reads, r2_reads
+
+def generate_bd_abc_read(
+  cell_index: int = 0,
+  bead_version: str = "EnhV2",
+  umi_length: int = 14,
+  transcript_length: int = 72,
+) -> Tuple[str, str]:
+  """
+  Generate a BD Rhapsody ABC read pair for a given cell index.
+
+  Args:
+    cell_index: The cell index to generate reads for.
+    bead_version: The bead version to use for generating the cell label.
+    umi_length: The length of the UMI to generate.
+    transcript_length: The length of the transcript to generate
+
+  Returns:
+    A tuple of two strings, the first string being the R1 read and the second string being the R2 read.
+  """
+  # generate metadata
+  per_row = (32967 - 1000) // 9
+  per_col = (37059 - 1000) // 9
+  
+  assert cell_index >= 0 and cell_index < per_row * per_col, f"cell_index must be between 0 and {per_row} * {per_col}"
+  x = 1000 + (cell_index % per_row) * 9
+  y = 1000 + (cell_index // per_row) * 9
+  instrument_id = "A01604"
+  run_id = "19"
+  flowcell_id = "HMKLYDRXY"
+  meta_r1 = generate_bd_read_metadata(instrument_id=instrument_id, run_id=run_id, flowcell_id=flowcell_id, x=x, y=y, illumina_flag="1:N:0")
+  meta_r2 = generate_bd_read_metadata(instrument_id=instrument_id, run_id=run_id, flowcell_id=flowcell_id, x=x, y=y, illumina_flag="2:N:0")
+
+  # generate r1 (cls1 + link + cls2 + link + cls3 + umi)
+  assert cell_index >= 0 and cell_index < 384 * 384 * 384
+  cell_label = index_to_sequence(cell_index + 1, bead_version=bead_version)
+  # sample random umi
+  umi = "".join(random.choices("ACGT", k=umi_length))
+  quality_r1 = "I" * (len(cell_label) + len(umi))
+  r1 = f"{meta_r1}\n{cell_label}{umi}\n+\n{quality_r1}\n"
+
+  # generate r2 by sampling sequence from bdabseq_panel_fa
+  abseq_seq = str(random.choice(list(bdabseq_panel_fasta_dict.values())).seq)
+  abc_suffix = "AAAAAAAAAAAAAAAAAAAAAAA"
+  abc_data = abseq_seq[:transcript_length - len(abc_suffix) - 1]
+  abc_prefix = "N" + "".join(random.choices("ACGT", k=transcript_length - len(abc_data) - len(abc_suffix) - 1))
+
+  abc_transcript = f"{abc_prefix}{abc_data}{abc_suffix}"
+
+  quality_r2 = "#" + "I" * (len(abc_transcript) - 1)
+  r2 = f"{meta_r2}\n{abc_transcript}\n+\n{quality_r2}\n"
+
+  return r1, r2
+
+def generate_bd_abc_fastq_files(
+  num_cells: int = 100,
+  num_reads_per_cell: int = 1000,
+) -> Tuple[str, str]:
+  """
+  Generate BD Rhapsody ABC FASTQ files for a given number of cells and reads per cell.
+
+  Args:
+    num_cells: The number of cells to generate
+    num_reads_per_cell: The number of reads to generate per cell
+
+  Returns:
+    A tuple of two strings, the first string being the R1 reads and the second string being the R2 reads.
+  """
+  r1_reads = ""
+  r2_reads = ""
+  for cell_index in range(num_cells):
+    for _ in range(num_reads_per_cell):
+      r1, r2 = generate_bd_abc_read(cell_index)
+      r1_reads += r1
+      r2_reads += r2
+
+  return r1_reads, r2_reads
+
+def generate_bd_smk_read(
+  cell_index: int = 0,
+  bead_version: str = "EnhV2",
+  umi_length: int = 14,
+  transcript_length: int = 72,
+  num_sample_tags: int = 3,
+) -> Tuple[str, str]:
+  """
+  Generate a BD Rhapsody SMK read pair for a given cell index.
+
+  Args:
+    cell_index: The cell index to generate reads for.
+    bead_version: The bead version to use for generating the cell label.
+    umi_length: The length of the UMI to generate.
+    transcript_length: The length of the transcript to generate
+    num_sample_tags: The number of sample tags to use
+
+  Returns:
+    A tuple of two strings, the first string being the R1 read and the second string being the R2 read.
+  """
+  # generate metadata
+  per_row = (32967 - 1000) // 9
+  per_col = (37059 - 1000) // 9
+  
+  assert cell_index >= 0 and cell_index < per_row * per_col, f"cell_index must be between 0 and {per_row} * {per_col}"
+  x = 1000 + (cell_index % per_row) * 9
+  y = 1000 + (cell_index // per_row) * 9
+  instrument_id = "A00226"
+  run_id = "970"
+  flowcell_id = "H5FGVDMXY"
+  
+  meta_r1 = generate_bd_read_metadata(instrument_id=instrument_id, run_id=run_id, flowcell_id=flowcell_id, x=x, y=y, illumina_flag="1:N:0")
+  meta_r2 = generate_bd_read_metadata(instrument_id=instrument_id, run_id=run_id, flowcell_id=flowcell_id, x=x, y=y, illumina_flag="2:N:0")
+
+  # generate r1 (cls1 + link + cls2 + link + cls3 + umi)
+  assert cell_index >= 0 and cell_index < 384 * 384 * 384
+  cell_label = index_to_sequence(cell_index + 1, bead_version=bead_version)
+  # sample random umi
+  umi = "".join(random.choices("ACGT", k=umi_length))
+  quality_r1 = "I" * (len(cell_label) + len(umi))
+  r1 = f"{meta_r1}\n{cell_label}{umi}\n+\n{quality_r1}\n"
+
+  # generate r2 by selecting the sample tag at index (cell_index % num_sample_tags)
+  sampletag_index = cell_index % num_sample_tags
+  sampletag_seq = str(list(sampletagsequences_fasta_dict.values())[sampletag_index].seq)
+  smk_data = sampletag_seq[:transcript_length]
+  smk_suffix = "A" * (transcript_length - len(smk_data))
+  quality_r2 = "I" * len(smk_data) + "#" * len(smk_suffix)
+  r2 = f"{meta_r2}\n{smk_data}{smk_suffix}\n+\n{quality_r2}\n"
+
+  return r1, r2
+
+def generate_bd_smk_fastq_files(
+  num_cells: int = 100,
+  num_reads_per_cell: int = 1000,
+  num_sample_tags: int = 3,
+) -> Tuple[str, str]:
+  """
+  Generate BD Rhapsody SMK FASTQ files for a given number of cells and reads per cell.
+
+  Args:
+    num_cells: The number of cells to generate
+    num_reads_per_cell: The number of reads to generate per cell
+    num_sample_tags: The number of sample tags to use
+
+  Returns:
+    A tuple of two strings, the first string being the R1 reads and the second string being the R2 reads.
+  """
+  r1_reads = ""
+  r2_reads = ""
+  for cell_index in range(num_cells):
+    for _ in range(num_reads_per_cell):
+      r1, r2 = generate_bd_smk_read(cell_index, num_sample_tags=num_sample_tags)
+      r1_reads += r1
+      r2_reads += r2
+
+  return r1_reads, r2_reads
+
+#########################################################################################
+
+# Prepare WTA, ABC, and SMK test data
+print("> Prepare WTA test data", flush=True)
+wta_reads_r1_str, wta_reads_r2_str = generate_bd_wta_fastq_files(num_cells=100, num_reads_per_cell=1000)
+with gzip.open("WTAreads_R1.fq.gz", "wt") as f:
+  f.write(wta_reads_r1_str)
+with gzip.open("WTAreads_R2.fq.gz", "wt") as f:
+  f.write(wta_reads_r2_str)
+
+print("> Prepare ABC test data", flush=True)
+abc_reads_r1_str, abc_reads_r2_str = generate_bd_abc_fastq_files(num_cells=100, num_reads_per_cell=1000)
+with gzip.open("ABCreads_R1.fq.gz", "wt") as f:
+  f.write(abc_reads_r1_str)
+with gzip.open("ABCreads_R2.fq.gz", "wt") as f:
+  f.write(abc_reads_r2_str)
+
+print("> Prepare SMK test data", flush=True)
+smk_reads_r1_str, smk_reads_r2_str = generate_bd_smk_fastq_files(num_cells=100, num_reads_per_cell=1000, num_sample_tags=3)
+with gzip.open("SMKreads_R1.fq.gz", "wt") as f:
+  f.write(smk_reads_r1_str)
+with gzip.open("SMKreads_R2.fq.gz", "wt") as f:
+  f.write(smk_reads_r2_str)
+
+#########################################################################################
+
+# Run executable
+print(f">> Run {meta['name']}", flush=True)
+output_dir = Path("output")
+subprocess.run([
+  meta['executable'],
+  "--reads=WTAreads_R1.fq.gz;WTAreads_R2.fq.gz",
+  f"--reference_archive={reference_file}",
+  "--reads=ABCreads_R1.fq.gz;ABCreads_R2.fq.gz",
+  f"--abseq_reference={bdabseq_panel_fa}",
+  "--reads=SMKreads_R1.fq.gz;SMKreads_R2.fq.gz",
+  "--tag_names=1-Sample1;2-Sample2;3-Sample3",
+  "--sample_tags_version=human",
+  "--output_dir=output",
+  "--exact_cell_count=100",
+  f"---cpus={meta['cpus'] or 1}",
+  f"---memory={meta['memory_mb'] or 2048}mb",
+  # "--output_seurat=seurat.rds",
+  "--output_mudata=mudata.h5mu",
+  "--metrics_summary=metrics_summary.csv",
+  "--pipeline_report=pipeline_report.html",
+], check=True)
+
+
+# Check if output exists
+print(">> Check if output exists", flush=True)
+assert (output_dir / "sample_Bioproduct_Stats.csv").exists()
+assert (output_dir / "sample_Metrics_Summary.csv").exists()
+assert (output_dir / "sample_Pipeline_Report.html").exists()
+assert (output_dir / "sample_RSEC_MolsPerCell_MEX.zip").exists()
+assert (output_dir / "sample_RSEC_MolsPerCell_Unfiltered_MEX.zip").exists()
+# seurat object is not generated when abc data is added
+# assert (output_dir / "sample_Seurat.rds").exists()
+assert (output_dir / "sample.h5mu").exists()
+
+# check individual outputs
+# assert Path("seurat.rds").exists()
+assert Path("mudata.h5mu").exists()
+assert Path("metrics_summary.csv").exists()
+assert Path("pipeline_report.html").exists()
+
+print(">> Check contents of output", flush=True)
+data = md.read_h5mu("mudata.h5mu")
+
+assert data.n_obs == 100, "Number of cells is incorrect"
+assert "rna" in data.mod, "RNA data is missing"
+assert "prot" in data.mod, "Protein data is missing"
+
+# check rna data
+data_rna = data.mod["rna"]
+assert data_rna.n_vars == 1, "Number of genes is incorrect"
+assert data_rna.X.sum(axis=1).min() > 950, "Number of reads per cell is incorrect"
+# assert data_rna.var.Raw_Reads.sum() == 100000, "Number of reads is incorrect"
+assert data_rna.var.Raw_Reads.sum() >= 99990 and data_rna.var.Raw_Reads.sum() <= 100010, \
+  f"Expected 100000 RNA reads, got {data_rna.var.Raw_Reads.sum()}"
+
+# check prot data
+data_prot = data.mod["prot"]
+assert data_prot.n_vars == len(bdabseq_panel_fasta_dict), "Number of proteins is incorrect"
+assert data_prot.X.sum(axis=1).min() > 950, "Number of reads per cell is incorrect"
+assert data_prot.var.Raw_Reads.sum() >= 99990 and data_prot.var.Raw_Reads.sum() <= 100010, \
+  f"Expected 100000 Prot reads, got {data_prot.var.Raw_Reads.sum()}"
+
+
+# check smk data
+expected_sample_tags = (["SampleTag01_hs", "SampleTag02_hs", "SampleTag03_hs"] * 34)[:100]
+expected_sample_names = (["Sample1", "Sample2", "Sample3"] * 34)[:100]
+sample_tags = data_rna.obs["Sample_Tag"]
+assert sample_tags.nunique() == 3, "Number of sample tags is incorrect"
+assert sample_tags.tolist() == expected_sample_tags, "Sample tags are incorrect"
+sample_names = data_rna.obs["Sample_Name"]
+assert sample_names.nunique() == 3, "Number of sample names is incorrect"
+assert sample_names.tolist() == expected_sample_names, "Sample names are incorrect"
+
+# TODO: add VDJ, ATAC, and targeted RNA to test
+
+#########################################################################################
+
+print("> Test successful", flush=True)
diff --git a/src/bd_rhapsody/helpers/rhapsody_cell_label.py b/src/bd_rhapsody/helpers/rhapsody_cell_label.py
new file mode 100644
index 00000000..601ce7be
--- /dev/null
+++ b/src/bd_rhapsody/helpers/rhapsody_cell_label.py
@@ -0,0 +1,405 @@
+#!/usr/bin/env python
+
+# copied from https://bd-rhapsody-public.s3.amazonaws.com/CellLabel/rhapsody_cell_label.py.txt
+# documented at https://bd-rhapsody-bioinfo-docs.genomics.bd.com/steps/steps_cell_label.html
+
+"""
+Rhapsody cell label structure
+Information on the cell label is captured by the combination of bases in three cell label sections (CLS1, CLS2, CLS3).
+Two common linker sequences (L1, L2) separate the three CLS.
+
+--CLS1---|-L1-|--CLS2---|-L2-|--CLS3--|--UMI---|-CaptureSequence-
+
+
+Each cell label section has a whitelist of 96 or 384 possible 9 base sequences.
+All the capture oligos from a single bead will have the same cell label.
+
+----------------
+
+V1 beads:
+
+[A96_cell_key1] + [v1_linker1] + [A96_cell_key2] + [v1_linker2] + [A96_cell_key3] + [8 random base UMI] + [18 base polyT capture]
+
+
+----------------
+
+Enhanced beads:
+Enhanced beads contain two different capture oligo types, polyT and 5prime.  On any one bead, the two different capture oligo types have the same cell label sequences.
+Compared to the V1 bead, enhanced beads have shorter linker sequences, longer polyT, and 0-3 diversity insert bases at the beginning of the sequence.
+The cell label sections use the same 3 sequence whitelists as V1 beads.
+
+polyT capture oligo:
+[Enh_insert 0-3 bases] + [A96_cell_key1] + [Enh_linker1] + [A96_cell_key2] + [Enh_linker2] + [A96_cell_key3] + [8 random base UMI] + [25 base polyT capture]
+
+5prime capture oligo:
+[Enh_5p_primer] + [A96_cell_key1] + [Enh_5p_linker1] + [A96_cell_key2] + [Enh_5p_linker2] + [A96_cell_key3] + [8 random base UMI] + [Tso_capture_seq]
+
+
+----------------
+
+Enhanced V2/V3 beads:
+Enhanced V2/V3 beads have the same structure as Enhanced beads, but the cell label sections have been updated with increased diversity
+
+
+polyT capture oligo:
+[Enh_insert 0-3 bases] + [B384_cell_key1] + [Enh_linker1] + [B384_cell_key2] + [Enh_linker2] + [B384_cell_key3] + [8 random base UMI] + [25 base polyT capture]
+
+5prime capture oligo:
+[Enh_5p_primer] + [B384_cell_key1] + [Enh_5p_linker1] + [B384_cell_key2] + [Enh_5p_linker2] + [B384_cell_key3] + [8 random base UMI] + [Tso_capture_seq]
+
+
+The only difference between Enh V2 and Enh V3 beads is a different Tso_capture_seq.
+
+----------------
+
+The Rhapsody Sequence Analysis Pipeline will convert each cell label into a single integer representing a unique cell label sequence - which is used in the output files as the 'Cell_index'.
+This cell index integer is deterministic and derived from the 3 part cell label as follows: 
+
+- Get the 1-based index for each cell label section from the python sets of sequences below
+- Apply this equation:
+    (CLS1index - 1) * 384 * 384 + (CLS2index - 1) * 384 + CLS3index
+
+(See label_sections_to_index() function below)
+
+
+Example: Enhanced bead sequence:
+ACACATTGCAGTGAAGATAGTTCGACACTCAAGACA
+
+Each part identified:
+A                CACATTGCA          GTGA      AGATAGTTC          GACA      CTCAAGACA
+DiversityInsert  A96_cell_key1-33   Linker1   A96_cell_key2-78   Linker2   A96_cell_key3-21
+
+33-78-21
+(33 - 1) * 384 * 384 + (78 - 1) * 384 + 21
+=4748181
+
+
+The original sequences of cell label can be determined from the cell index integer by reversing this conversion.
+See index_to_label_sections() and index_to_sequence() functions below.
+
+"""
+
+v1_linker1 = 'ACTGGCCTGCGA'
+v1_linker2 = 'GGTAGCGGTGACA'
+
+Enh_linker1 = 'GTGA'
+Enh_linker2 = 'GACA'
+
+Enh_5p_primer = "ACAGGAAACTCATGGTGCGT"
+
+Enh_5p_linker1 = "AATG"
+Enh_5p_linker2 = "CCAC"
+
+Enh_inserts = ["", "A", "GT", "TCA"]
+
+Tso_capture_seq_Enh_EnhV2 = "TATGCGTAGTAGGTATG"
+Tso_capture_seq_EnhV3 = "GTGGAGTCGTGATTATA"
+
+A96_cell_key1 = ("GTCGCTATA","CTTGTACTA","CTTCACATA","ACACGCCGG","CGGTCCAGG","AATCGAATG","CCTAGTATA","ATTGGCTAA","AAGACATGC","AAGGCGATC",
+                 "GTGTCCTTA","GGATTAGGA","ATGGATCCA","ACATAAGCG","AACTGTATT","ACCTTGCGG","CAGGTGTAG","AGGAGATTA","GCGATTACA","ACCGGATAG",
+                 "CCACTTGGA","AGAGAAGTT","TAAGTTCGA","ACGGATATT","TGGCTCAGA","GAATCTGTA","ACCAAGGAC","AGTATCTGT","CACACACTA","ATTAAGTGC",
+                 "AAGTAACCC","AAATCCTGT","CACATTGCA","GCACTGTCA","ATACTTAGG","GCAATCCGA","ACGCAATCA","GAGTATTAG","GACGGATTA","CAGCTGACA",
+                 "CAACATATT","AACTTCTCC","CTATGAAAT","ATTATTACC","TACCGAGCA","TCTCTTCAA","TAAGCGTTA","GCCTTACAA","AGCACACAG","ACAGTTCCG",
+                 "AGTAAAGCC","CAGTTTCAC","CGTTACTAA","TTGTTCCAA","AGAAGCACT","CAGCAAGAT","CAAACCGCC","CTAACTCGC","AATATTGGG","AGAACTTCC",
+                 "CAAAGGCAC","AAGCTCAAC","TCCAGTCGA","AGCCATCAC","AACGAGAAG","CTACAGAAC","AGAGCTATG","GAGGATGGA","TGTACCTTA","ACACACAAA",
+                 "TCAGGAGGA","GAGGTGCTA","ACCCTGACC","ACAAGGATC","ATCCCGGAG","TATGTGGCA","GCTGCCAAT","ATCAGAGCT","TCGAAGTGA","ATAGACGAG",
+                 "AGCCCAATC","CAGAATCGT","ATCTCCACA","ACGAAAGGT","TAGCTTGTA","ACACGAGAT","AACCGCCTC","ATTTAGATG","CAAGCAAGC","CAAAGTGTG",
+                 "GGCAAGCAA","GAGCCAATA","ATGTAATGG","CCTGAGCAA","GAGTACATT","TGCGATCTA"
+                 )
+
+A96_cell_key2 = ("TACAGGATA","CACCAGGTA","TGTGAAGAA","GATTCATCA","CACCCAAAG","CACAAAGGC","GTGTGTCGA","CTAGGTCCT","ACAGTGGTA","TCGTTAGCA",
+                 "AGCGACACC","AAGCTACTT","TGTTCTCCA","ACGCGAAGC","CAGAAATCG","ACCAAAATG","AGTGTTGTC","TAGGGATAC","AGGGCTGGT","TCATCCTAA",
+                 "AATCCTGAA","ATCCTAGGA","ACGACCACC","TTCCATTGA","TAGTCTTGA","ACTGTTAGA","ATTCATCGT","ACTTCGAGC","TTGCGTACA","CAGTGCCCG",
+                 "GACACTTAA","AGGAGGCGC","GCCTGTTCA","GTACATCTA","AATCAGTTT","ACGATGAAT","TGACAGACA","ATTAGGCAT","GGAGTCTAA","TAGAACACA",
+                 "AAATAAATA","CCGACAAGA","CACCTACCC","AAGAGTAGA","TCATTGAGA","GACCTTAGA","CAAGACCTA","GGAATGATA","AAACGTACC","ACTATCCTC",
+                 "CCGTATCTA","ACACATGTC","TTGGTATGA","GTGCAGTAA","AGGATTCAA","AGAATGGAG","CTCTCTCAA","GCTAACTCA","ATCAACCGA","ATGAGTTAC",
+                 "ACTTGATGA","ACTTTAACT","TTGGAGGTA","GCCAATGTA","ATCCAACCG","GATGAACTG","CCATGCACA","TAGTGACTA","AAACTGCGC","ATTACCAAG",
+                 "CACTCGAGA","AACTCATTG","CTTGCTTCA","ACCTGAGTC","AGGTTCGCT","AAGGACTAT","CGTTCGGTA","AGATAGTTC","CAATTGATC","GCATGGCTA",
+                 "ACCAGGTGT","AGCTGCCGT","TATAGCCCT","AGAGGACCA","ACAATATGG","CAGCACTTC","CACTTATGT","AGTGAAAGG","AACCCTCGG","AGGCAGCTA",
+                 "AACCAAAGT","GAGTGCGAA","CGCTAAGCA","AATTATAAC","TACTAGTCA","CAACAACGG"
+                 )
+
+A96_cell_key3 = ("AAGCCTTCT","ATCATTCTG","CACAAGTAT","ACACCTTAG","GAACGACAA","AGTCTGTAC","AAATTACAG","GGCTACAGA","AATGTATCG","CAAGTAGAA",
+                 "GATCTCTTA","AACAACGCG","GGTGAGTTA","CAGGGAGGG","TCCGTCTTA","TGCATAGTA","ACTTACGAT","TGTATGCGA","GCTCCTTGA","GGCACAACA",
+                 "CTCAAGACA","ACGCTGTTG","ATATTGTAA","AAGTTTACG","CAGCCTGGC","CTATTAGCC","CAAACGTGG","AAAGTCATT","GTCTTGGCA","GATCAGCGA",
+                 "ACATTCGGC","AGTAATTAG","TGAAGCCAA","TCTACGACA","CATAACGTT","ATGGGACTC","GATAGAGGA","CTACATGCG","CAACGATCT","GTTAGCCTA",
+                 "AGTTGCATC","AAGGGAACT","ACTACATAT","CTAAGCTTC","ACGAACCAG","TACTTCGGA","AACATCCAT","AGCCTGGTT","CAAGTTTCC","CAGGCATTT",
+                 "ACGTGGGAG","TCTCACGGA","GCAACATTA","ATGGTCCGT","CTATCATGA","CAATACAAG","AAAGAGGCC","GTAGAAGCA","GCTATGGAA","ACTCCAGGG",
+                 "ACAAGTGCA","GATGGTCCA","TCCTCAATA","AATAAACAA","CTGTACGGA","CTAGATAGA","AGCTATGTG","AAATGGAGG","AGCCGCAAG","ACAGTAAAC",
+                 "AACGTGTGA","ACTGAATTC","AAGGGTCAG","TGTCTATCA","TCAGATTCA","CACGATCCG","AACAGAAAC","CATGAATGA","CGTACTACG","TTCAGCTCA",
+                 "AAGGCCGCA","GGTTGGACA","CGTCTAGGT","AATTCGGCG","CAACCTCCA","CAATAGGGT","ACAGGCTCC","ACAACTAGT","AGTTGTTCT","AATTACCGG",
+                 "ACAAACTTT","TCTCGGTTA","ACTAGACCG","ACTCATACG","ATCGAGTCT","CATAGGTCA"
+                 )
+
+B384_cell_key1 = ("TGTGTTCGC","TGTGGCGCC","TGTCTAGCG","TGGTTGTCC","TGGTTCCTC","TGGTGTGCT","TGGCGACCG","TGCTGTGGC","TGCTGGCAC","TGCTCTTCC",
+                  "TGCCTCACC","TGCCATTAT","TGATGTCTC","TGATGGCCT","TGATGCTTG","TGAAGGACC","TCTGTCTCC","TCTGATTAT","TCTGAGGTT","TCTCGTTCT",
+                  "TCTCATCCG","TCCTGGATT","TCAGCATTC","TCACGCCTT","TATGTGCAC","TATGCGGCC","TATGACGAG","TATCTCGTG","TATATGACC","TAGGCTGTG",
+                  "TACTGCGTT","TACGTGTCC","TAATCACAT","GTTGTGTTG","GTTGTGGCT","GTTGTCTGT","GTTGTCGAG","GTTGTCCTC","GTTGTATCC","GTTGGTTCT",
+                  "GTTGGCGTT","GTTGGAGCG","GTTGCTGCC","GTTGCGCAT","GTTGCAGGT","GTTGCACTG","GTTGATGAT","GTTGATACG","GTTGAAGTC","GTTCTGTGC",
+                  "GTTCTCTCG","GTTCTATAT","GTTCGTATG","GTTCGGCCT","GTTCGCGGC","GTTCGATTC","GTTCCGGTT","GTTCCGACG","GTTCACGCT","GTTATCACC",
+                  "GTTAGTCCG","GTTAGGTGT","GTTAGAGAC","GTTAGACTT","GTTACCTCT","GTTAATTCC","GTTAAGCGC","GTGTTGCTT","GTGTTCGGT","GTGTTCCAG",
+                  "GTGTTCATC","GTGTCACAC","GTGTCAAGT","GTGTACTGC","GTGGTTAGT","GTGGTACCG","GTGGCGATC","GTGCTTCTG","GTGCGTTCC","GTGCGGTAT",
+                  "GTGCGCCTT","GTGCGAACT","GTGCAGCCG","GTGCAATTG","GTGCAAGGC","GTCTTGCGC","GTCTGGCCG","GTCTGAGGC","GTCTCAGAT","GTCTCAACC",
+                  "GTCTATCGT","GTCGGTGTG","GTCGGAATC","GTCGCTCCG","GTCCTCGCC","GTCCTACCT","GTCCGCTTG","GTCCATTCT","GTCCAATAC","GTCATGTAT",
+
+                  "GTCAGTGGT","GTCAGATAG","GTATTAACT","GTATCAGTC","GTATAGCCT","GTATACTTG","GTATAAGGT","GTAGCATCG","GTACCGTCC","GTACACCTC",
+                  "GTAAGTGCC","GTAACAGAG","GGTTGTGTC","GGTTGGCTG","GGTTGACGC","GGTTCGTCG","GGTTCAGTT","GGTTATATT","GGTTAATAC","GGTGTACGT",
+                  "GGTGCCGCT","GGTGCATGC","GGTCGTTGC","GGTCGAGGT","GGTAGGCAC","GGTAGCTTG","GGTACATAG","GGTAATCTG","GGCTTGGCC","GGCTTCACG",
+                  "GGCTTATGT","GGCTTACTC","GGCTGTCTT","GGCTCTGTG","GGCTCCGGT","GGCTCACCT","GGCGTTGAG","GGCGTGTAC","GGCGTGCTG","GGCGTATCG",
+                  "GGCGCTCGT","GGCGCTACC","GGCGAGCCT","GGCGAGATC","GGCGACTTG","GGCCTCTTC","GGCCTACAG","GGCCAGCGC","GGCCAACTT","GGCATTCCT",
+                  "GGCATCCGC","GGCATAACC","GGCAACGAT","GGATGTCCG","GGATGAGAG","GGATCTGGC","GGATCCATG","GGATAGGTT","GGAGTCGTG","GGAGAAGGC",
+                  "GGACTCCTT","GGACTAGTC","GGACCGTTG","GGAATTAGT","GGAATCTCT","GGAATCGAC","GGAAGCCTC","GCTTGTAGC","GCTTGACCG","GCTTCGGAC",
+                  "GCTTCACAT","GCTTAGTCT","GCTGGATAT","GCTGGAACC","GCTGCGATG","GCTGATCAG","GCTGAGCGT","GCTCTTGTC","GCTCTCCTG","GCTCGGTCC",
+                  "GCTCCAATT","GCTATTCGC","GCTATGAGT","GCTAGTGTT","GCTAGGATC","GCTAGCACT","GCTACGTAT","GCTAACCTT","GCGTTCCGC","GCGTGTGCC",
+                  "GCGTGCATT","GCGTCGGTT","GCGTATGTG","GCGTATACT","GCGGTTCAC","GCGGTCTTG","GCGGCGTCG","GCGGCACCT","GCGCTGGAC","GCGCTCTCC",
+
+                  "GCGCGGCAG","GCGCGATAC","GCGCCGACC","GCGAGCGAG","GCGAGAGGT","GCGAATTAC","GCCTTGCAT","GCCTGCGCT","GCCTAACTG","GCCGTCCGT",
+                  "GCCGCTGTC","GCCATGCCG","GCCAGCTAT","GCCAACCAG","GCATGGTTG","GCATCGACG","GCAGGCTAG","GCAGGACGC","GCAGCCATC","GCAGATACC",
+                  "GCAGACGTT","GCACTATGT","GCACACGAG","GATTGTCAT","GATTGGTAG","GATTGCACC","GATTCTACT","GATTCGCTT","GATTAGGCC","GATTACGGT",
+                  "GATGTTGGC","GATGTTATG","GATGGCCAG","GATCGTTCG","GATCGGAGC","GATCGCCTC","GATCCTCTG","GATCCAGCG","GATACACGC","GAGTTACCT",
+                  "GAGTCGTAT","GAGTCGCCG","GAGGTGTAG","GAGGCATTG","GAGCGGACG","GAGCCTGAG","GAGATCTGT","GAGATAATT","GAGACGGCT","GACTTCGTG",
+                  "GACTGTTCT","GACTCTTAG","GACCGCATT","GAATTGAGC","GAATATTGC","GAAGGCTCT","GAAGAGACT","GAACTGCCG","GAACGCGTG","CTTGTGTAT",
+                  "CTTGTGCGC","CTTGTCATG","CTTGGTCTT","CTTGGTACC","CTTGGATGT","CTTGCTCAC","CTTGCAATC","CTTGAGGCC","CTTGACGGT","CTTCTGATC",
+                  "CTTCTCGTT","CTTCTAGGC","CTTCGTTAG","CTTATGTCC","CTTATGCTT","CTTATATAG","CTTAGGTTG","CTTAGGAGC","CTTACTTAT","CTGTTCTCG",
+                  "CTGTGCCTC","CTGTCGCAT","CTGTCGAGC","CTGTAGCTG","CTGTACGTT","CTGCTTGCC","CTGCGTAGT","CTGCACACC","CTGATGGAT","CTGAGTCAT",
+                  "CTGACGCCG","CTGAACGAG","CTCTTGTAG","CTCTTAGTT","CTCTTACCG","CTCTGCACC","CTCTCGTCC","CTCGTATTG","CTCGACTAT","CTCCTGACG",
+
+                  "CTCACTAGC","CTATACGGC","CGTTCGCTC","CGTTCACCG","CGTATAGTT","CGGTGTTCC","CGGTGTCAG","CGGTCCTGC","CGGCGACTC","CGGCACGGT",
+                  "CGGATAGCC","CGGAGAGAT","CGCTAATAG","CGCGTTGGC","CGCGCAGAG","CGCACTGCC","CCTTGTCTC","CCTTGGCGT","CCTTCTGAG","CCTTCTCCT",
+                  "CCTTCGACC","CCTTACTTG","CCTGTTCGT","CCTGTATGC","CCTCGGCCG","CCGTTAATT","CCATGTGCG","CCAGTGGTT","CCAGGCATT","CCAGGATCC",
+                  "CCAGCGTTG","CATTCCGAT","CATTATACC","CATGTTGAG","ATTGCGTGT","ATTGCGGAC","ATTGCGCCG","ATTGACTTG","ATTCGGCTG","ATTCGCGAG",
+                  "ATTCCAAGT","ATTATCTTC","ATTACTGTT","ATTACACTC","ATGTTCTAT","ATGTTACGC","ATGTGTATC","ATGTGGCAG","ATGTCTGTG","ATGGTGCAT",
+                  "ATGCTTACT","ATGCTGTCC","ATGCTCGGC","ATGAGGTTC","ATGAGAGTG","ATCTTGGCT","ATCTGTGCG","ATCGGTTCC","ATCATGCTC","ATCATCACT",
+                  "ATATCTTAT","ATAGGCGCC","AGTTGGTAT","AGTTGAGCC","AGTGCGACC","AGGTGCTAC","AGGCTTGCG","AGGCCTTCC","AGGCACCTT","AGGAATATG",
+                  "AGCGGCCAG","AGCCTGGTC","AGCCTGACT","AGCAATCCG","AGAGATGTT","AGAGAATTC","ACTCGCTTG","ACTCGACCT","ACGTACACC","ACGGATGGT",
+                  "ACCAGTCTG","ACATTCGGC","ACATGAGGT","ACACTAATT"
+                  )
+
+B384_cell_key2 = ("TTGTGTTGT","TTGTGGTAG","TTGTGCGGA","TTGTCTGTT","TTGTCTAAG","TTGTCATAT","TTGTCACGA","TTGTATGAA","TTGTACAGT","TTGGTTAAT",
+                  "TTGGTGCAA","TTGGTCGAG","TTGGTATTA","TTGGCACAG","TTGGATACA","TTGGAAGTG","TTGCGGTTA","TTGCCATTG","TTGCACGCG","TTGCAAGGT",
+                  "TTGATGTAT","TTGATAATT","TTGAGACGT","TTGACTACT","TTGACCGAA","TTCTGGTCT","TTCTGCACA","TTCTCCTTA","TTCTCCGCT","TTCTAGGTA",
+                  "TTCTAATCG","TTCGTCGTA","TTCGTAGAT","TTCGGCTTG","TTCGGAATA","TTCGCCAGA","TTCGATTGT","TTCGATCAG","TTCCTCGGT","TTCCGGCAG",
+                  "TTCCGCATT","TTCCAATTA","TTCATTGAA","TTCATGCTG","TTCAGGAGT","TTCACTATA","TTCAACTCT","TTCAACGTG","TTATGCGTT","TTATGATTG",
+                  "TTATCCTGT","TTATCCGAG","TTATATTAT","TTAGGCGCG","TTACTGGAA","TTACTAGTT","TTACGTGGT","TTACGATAT","TTACCTAGA","TTACATGAG",
+                  "TTACAGCGT","TTACACGGA","TTACACACT","TTAATCAGT","TTAATAGGA","TTAAGTGTG","TTAACCTTG","TTAACACAA","TGTTCACTT","TGTTCAAGA",
+                  "TGTTAAGTG","TGTGTTATG","TGTGTCCAA","TGTGGAGCG","TGTCAGTTA","TGTCAGAAG","TGGTTAGTT","TGGTTACAA","TGGCGTTAT","TGGCGCCAA",
+                  "TGGAGTCTT","TGCGTATTG","TGATAGAGA","TGAGGTATT","TGAGAATCT","TCTTGGTAA","TCTTCATAG","TCTGTCCTT","TCTGGAATT","TCTACCGCG",
+                  "TCGTTCGAA","TCGTCAGTG","TCGACGAGA","TCATGGCTT","TCACACTTA","TATTCCGAA","TATTATGGT","TATGCTATT","TATCAAGGA","TAGTTCAAT",
+
+                  "TAGCTGCTT","TAGAGGAAG","TACCTGTTA","TACACCTGT","GTTGTGCGT","GTTGGCTAT","GTTGCCAAG","GTTGACCTT","GTTCTGCTA","GTTCTGAAT",
+                  "GTTCTATCA","GTTCGCGTG","GTTCCTTAT","GTTAGCAGT","GTTACTGTG","GTTACTCAA","GTTAAGAGA","GTTAACTTA","GTGTCGGCA","GTGTCCATT",
+                  "GTGCTTGAG","GTGCTCGTT","GTGCTCACA","GTGCCTGGA","GTCTTGTCG","GTCTTGATT","GTCTTCCGT","GTCTTAAGA","GTCTCATCT","GTCTACGAG",
+                  "GTCGTTGCT","GTCGTGTTA","GTCGGTAAT","GTCGGATGT","GTCGAGCTG","GTCCGGACT","GTCCAACAT","GTCAGACGA","GTCAGAATT","GTCACTCTT",
+                  "GTCAAGGAA","GTATGTCTT","GTATGTACA","GTATCGGTT","GTATATGTA","GTATACAAT","GTAGTTAAG","GTAGTCGAT","GTAGCCTTA","GTAGATACT",
+                  "GTACGATTA","GTACAGTCT","GTAATTCGT","GCTTGGCAG","GCTTGCTTG","GCTTGAGGA","GCTTCATTA","GCTTATGCG","GCTGTGTAG","GCTGTCATG",
+                  "GCTGGTTGT","GCTGGACTG","GCTGCCTAA","GCTGATATT","GCTCTTAGT","GCTCTATTG","GCTCGCCGT","GCTCCGCTG","GCTATTCTG","GCTATACGA",
+                  "GCTACTAAG","GCTACATGT","GCTAACTCT","GCGTTGTAA","GCGTTCTCT","GCGTGCGTA","GCGTCTTGA","GCGTCCGAT","GCGTAAGAG","GCGCTTACG",
+                  "GCGCGGATT","GCGCCATAT","GCGCATGAA","GCGATCAAT","GCGAGCCTT","GCGAGATTG","GCGAGAACA","GCCTTGGTA","GCCTTCTAG","GCCTTCACA",
+                  "GCCTGAGTG","GCCTCACGT","GCCGGCGAA","GCCGCACAA","GCCATGCTT","GCCATATAT","GCCAATTCG","GCATTCGTT","GCATGATGT","GCAGTTGGA",
+
+                  "GCAGTGTCT","GCACTTGTG","GCAATCTGT","GCAACACTT","GATTGTATT","GATTGCGAG","GATTCCAGT","GATTCATAT","GATTATCAG","GATTAGGTT",
+                  "GATGTTGCG","GATGGATCT","GATGCTGAT","GATGCCTTG","GATCTCCTT","GATCGCTTA","GATATTGAA","GATATTACT","GAGTGTTAT","GAGCTCAGT",
+                  "GAGCGTGCT","GAGCGTCGA","GAGCGGTTG","GAGCGACTT","GAGCCGAAT","GAGATAGAT","GAGACCTAT","GACGGTCGT","GACGCAGGT","GACGATATG",
+                  "GACCTATCT","GAATTAGGA","GAATCAGCT","GAAGTTCAT","GAAGTGGTT","GAAGTATTG","GAAGGCATT","GAACGCTGT","CTTGTCCAG","CTTGGATTG",
+                  "CTTGCTGAA","CTTGCCGTG","CTTGATTCT","CTTCTGTCG","CTTCGGCGT","CTTATGAGT","CTTACCGAT","CTGTTAGGT","CTGTCGTCT","CTGTATAAT",
+                  "CTGGCTCAT","CTGGATGCG","CTGCGTGTG","CTGCGCGGT","CTGCCGATT","CTGCATTGT","CTGATTAAG","CTGAGATAT","CTGACCTGT","CTCGTATCT",
+                  "CTCGGCAAG","CTCGCAATT","CTCCTGCTT","CTCCTAAGT","CTCCGGATG","CTCCGAGCG","CTCACAGGT","CTATTCTAT","CTATTAGTG","CTATGAATT",
+                  "CTACATATT","CGTGGCATT","CGTCTTAAT","CGTCTGGTT","CGTCACTGT","CGTAGGTCT","CGGTTCGAG","CGGTTCATT","CGGTGCTCT","CGGTAATTG",
+                  "CGGCCTGAT","CGGATATAG","CGGAATATT","CGCTCCAAT","CGCGTTCGT","CGCAGGTTG","CGAGGATGT","CGAGCTGTT","CGACGGCTT","CCTTGTGTG",
+                  "CCTGTCTCA","CCTGACTAT","CCTACCTTG","CCGTAGATT","CCGGCTGGT","CATCGGACG","CATCGATAA","CATCCTTCT","CAGTTCTGT","CAGTGCCAG",
+
+                  "CAGGCACTG","CAGCCTCTT","CACTTATAT","CACTGGTCG","CACTGCATG","CACGCGTTG","CACGATGTT","CACCATCTG","CACAGGCGT","ATTGTACAA",
+                  "ATTGGTATG","ATTGCTAAT","ATTGCATAG","ATTGCAGTT","ATTCTGCAG","ATTCTACGT","ATTCGGATT","ATTCCGTTG","ATTCATCAA","ATTCAAGAG",
+                  "ATTAGCCTT","ATTAATATT","ATGTTAGAG","ATGTTAACT","ATGTAGTCG","ATGGTGTAG","ATGGATTAT","ATCTTGAAG","ATCTGATAT","ATCTCAGAA",
+                  "ATCGCTCAA","ATCGCGTCG","ATCCATGGT","ATCATGAGA","ATCATAGTT","ATCAGCGAG","ATCACCATT","ATAGTAATT","ATAGCTGTG","ATACTCTCG",
+                  "ATACCTCAT","AGTTGCGCG","AGTTGAATT","AGTTATGAT","AGTGTCCGT","AGTGGCTTG","AGTGCTTCT","AGTATCATT","AGTACACAA","AGGTATGCG",
+                  "AGGTATAGT","AGGCTACTT","AGGCCAGGT","AGGAGCGAT","AGCTTATAG","AGCTCTAGA","AGCGTGTAT","AGCGTCACA","AGCCTTCAT","AGCCTGTCG",
+                  "AGCCTCGAG","AGCACTGAA","AGATGTACG","AGAGTTAAT","AGACCTCTG","ACTTCTATA","ACTGTCGAG","ACTGTATGT","ACTCTGTAA","ACTCGCGAA",
+                  "ACTAGATCT","ACTAACGTT","ACGTTACTG","ACGTGGAAT","ACGGACTCT","ACGCCTAAT","ACGCCGTTA","ACGACGTGT","ACCTCGCAT","ACCATCATA",
+                  "ACATATATT","ACAGGCACA","ACACCTGAG","ACACATTCT"
+                  )
+
+B384_cell_key3 = ("TTGTGGCTG","TTGTGGAGT","TTGTGCGAC","TTGTCTTCA","TTGTAAGAT","TTGGTTCTG","TTGGTGCGT","TTGGTCTAC","TTGGTAACT","TTGGCGTGC",
+                  "TTGGATTAG","TTGGAGACG","TTGGAATCA","TTGCGGCGA","TTGCGCTCG","TTGCCTTAC","TTGCCGGAT","TTGCATGCT","TTGCACGTC","TTGCACCAT",
+                  "TTGAACCTG","TTCTCGCGT","TTCTCAACT","TTCTACTCA","TTCGTCCAT","TTCGGATAC","TTCGGACGT","TTCGCAATC","TTCCGGTGC","TTCCGACTG",
+                  "TTCATTATG","TTCATGGAT","TTCAGCGCA","TTCACCTCG","TTCAAGCAG","TTCAACTAC","TTATGCCAG","TTATGCATC","TTATCGTAC","TTATACCTA",
+                  "TTATAATAG","TTATAAGTC","TTAGTTAGC","TTAGCTCAT","TTAGCACTA","TTAGATATG","TTACTACGA","TTACCGTCA","TTACAGAGC","TTAATTGCA",
+                  "TTAACAGAT","TGTTGGCTA","TGTTGATGA","TGTTAAGCT","TGTGGCCGA","TGTGCTAGC","TGTGCGTCA","TGTCGCAGT","TGTCGAGCA","TGTACAACG",
+                  "TGGTTCCGA","TGGTTCACT","TGGTCAAGT","TGGCTTGTA","TGGCTGTCG","TGGCGTATG","TGGCGCGCT","TGGATGTAC","TGGACTTGC","TGGAATACT",
+                  "TGCTAGCGA","TGCGTTGCT","TGCGGTCTG","TGCGCTTAG","TGCGCGACG","TGCCTGCAT","TGCCTAGAC","TGCACGAGT","TGAGTGTGC","TGAGGCTCG",
+                  "TCTTCCGTC","TCTTATAGT","TCTTACCAT","TCTGTTGTC","TCTGTTACT","TCTGGCTAG","TCTCAGATC","TCTAGTTGA","TCTAGTACG","TCGTACTAC",
+                  "TCGGTGTAG","TCGGCTGCT","TCGCTACTG","TCGATCACG","TCGAGGCAT","TCCGGCGTC","TCCGGAGCT","TCCGCTCGT","TCCGAGTAC","TCCATTCAT",
+
+                  "TCCATGGTC","TCCAAGTCG","TCATTACGT","TCATGCACT","TCAGGTTGC","TCAGACCGT","TCACTCAGT","TCAAGCTCA","TATTGCGCA","TATTCGGCT",
+                  "TATTCCAGC","TATTCATCA","TATGTTCAG","TATGGTATG","TATGCAAGT","TATCTGGTC","TATCTGACT","TATCCAGAT","TATCAGTCG","TATCACGCT",
+                  "TAGGCGCGA","TAGGCACAT","TAGGATCGT","TAGCATTGC","TAGAGTTAC","TAGACTGAT","TACTTGTCG","TACGTCCGA","TACCGTACT","TACCGCGAT",
+                  "TACCAGGAC","TACAGAAGT","TAAGTGCAT","TAAGCTACT","GTTGACCGA","GTTCTCGAC","GTTCCTGCT","GTTATGATG","GTGCTTGCA","GTGCCGCGT",
+                  "GTATTGCTG","GTATTCCGA","GTATTAAGC","GTATGACGT","GTAGTTGTC","GTAGTACAT","GTAGCTCGA","GGTTGCTCA","GGTTGAGTA","GGTTAACGT",
+                  "GGTGTGGCA","GGTCTTCAG","GGTCGTCTA","GGTCGGCGT","GGTCCGACT","GGTCATGTC","GGTCACATG","GGTAGTGCT","GGTAGCGTC","GGTACCAGT",
+                  "GGTAAGGAT","GGCTTGTGC","GGCTTGACT","GGCTTACGA","GGCTGTAGT","GGCTGGCAG","GGCTCCATC","GGCGTGGAT","GGCGTAATC","GGCGCAAGT",
+                  "GGCGAGTAG","GGCGACCGT","GGCCTGTCA","GGCCATTGC","GGCACTCTG","GGATGTCAT","GGAGTAACT","GGAGAACGA","GGACTGGCT","GGACGTTCA",
+                  "GGAACGTGC","GCTGTCCAT","GCTGGTTCA","GCTGCAACT","GCTCGTTAC","GCTATAGAT","GCTAGTCGT","GCTACCATG","GCGTTCTGA","GCGTGTTAG",
+                  "GCGGTATCG","GCGGAGCAT","GCGCGGTGC","GCGCCTAGT","GCGCCGGCT","GCCTTCATG","GCCATACTG","GCATGTTGA","GCATGCTAC","GCAGTATAC",
+
+                  "GCAGGTACT","GCAGCGCGT","GCACCTCAT","GCAATTCGA","GATTGCCGT","GATGAACAT","GATCTTCGA","GATCTGCAT","GAGTGGCAT","GAGTCGGAC",
+                  "GAGTATGAT","GAGGCGAGT","GAGGCAACG","GAGCGCACT","GAATAGGCT","ATTGTCACT","ATTGTATCA","ATTGGTCAG","ATTGGCGAT","ATTGATCGT",
+                  "ATTCGTAGT","ATTCATACG","ATTCAGGAC","ATTACTTCA","ATTAATTAG","ATTAAGCAT","ATGTCTCTA","ATGTAGCGT","ATGGCATAC","ATGGAGATC",
+                  "ATGGACTCG","ATGGAACGA","ATGCTTCAT","ATGCTCGCT","ATGCGACGT","ATGCCGTAG","ATGAGTTCG","ATGACTATC","ATGACCGAC","ATCTTATGC",
+                  "ATCTTACTA","ATCTATCAG","ATCGTGTAC","ATCGTCTGA","ATCGGCATG","ATCGCGAGC","ATCGCAACG","ATCGATGCT","ATCGAATAG","ATCCTTCTG",
+                  "ATCCTGCGT","ATCCGCACT","ATCCATTAC","ATCCAAGCA","ATCAGATCA","ATCACACAT","ATCAACGTC","ATCAACCGA","ATATTGAGT","ATATTCGTC",
+                  "ATATTACAG","ATATCTTGA","ATATCGCAT","ATATCAATC","ATAGTCCTG","ATAGGTCTA","ATAGCTGAC","ATAGCGGTA","AGTTCGCTG","AGTTACAGC",
+                  "AGTTAACTA","AGTGCAATC","AGTCTGGTA","AGTCTGAGC","AGTCTACAT","AGTCGAACT","AGTCCATCG","AGTCATTCA","AGTATCCAG","AGTAGACTG",
+                  "AGTAATCGA","AGTAAGTGC","AGGTTGGCT","AGGTTCTAG","AGGTGTTCA","AGGTGCCAT","AGGTCTGAT","AGGTCGTAC","AGGTCAGCA","AGGCTTATC",
+                  "AGGCTATGA","AGGCCGACG","AGGCCAAGC","AGGCAGGTC","AGGCAAGAT","AGGAGCAGT","AGGACCGCT","AGGAATTAC","AGCTTGGAC","AGCTTAAGT",
+
+                  "AGCTACACG","AGCGTTACG","AGCGGTGCA","AGCGGAGTC","AGCGGACGA","AGCGCGCTA","AGCGATAGC","AGCGACTCA","AGCCTCTAC","AGCCGTCGT",
+                  "AGCATGATC","AGCACTTCG","AGCACGGCA","AGATTCTGA","AGATTAGAT","AGATGATAG","AGATATGTA","AGATACCGT","AGAGTGCGT","AGAGCCGAT",
+                  "AGACTCACT","ACTTGCCTA","ACTTGAGCA","ACTTCTAGC","ACTTCGACT","ACTTAGTAC","ACTGTTGAT","ACTGTAACG","ACTGGTATC","ACTGACGTC",
+                  "ACTGAAGCT","ACTCTGATG","ACTCCTGAC","ACTCCGCTA","ACTCAACTG","ACTATTGCA","ACTAGGCAG","ACTACGCGT","ACTAATACT","ACGTTCGTA",
+                  "ACGTGTGCT","ACGTGTATG","ACGTGGAGC","ACGTCTTCG","ACGTCAGTC","ACGGTCTCA","ACGGTCCGT","ACGGTACAG","ACGGCGCTG","ACGCTGCGA",
+                  "ACGCGTGTA","ACGCGCCAG","ACGATGTCG","ACGATGGAT","ACGATCTAC","ACGAGCTGA","ACGAGCATC","ACGAATCGT","ACGAACGCA","ACCTTGTAG",
+                  "ACCTGTTGC","ACCTGTCAT","ACCTCGATC","ACCTAGGTA","ACCTACTGA","ACCTAATCG","ACCGTAGCA","ACCGGTAGT","ACCGGCTAC","ACCGCTTCA",
+                  "ACATTGTGC","ACATTCTCG","ACATGGCTG","ACATGACGA","ACATATGAT","ACATATACG","ACAGCGTAC","ACACTTGCT","ACACTATCA","ACACGCATG",
+                  "ACACCAGTA","ACACCAACT","ACACATAGT","ACACACCTA"
+                  )
+
+
+def label_sections_to_index(label):
+    """
+    Return the 1-based cell_index integer for a three-part cell label
+    string such as '33-78-21' (sections are separated by '-').
+    """
+
+    cl1, cl2, cl3 = [int(n) for n in label.split('-')]
+    return (cl1 - 1) * 384 * 384 + (cl2 - 1) * 384 + (cl3 - 1) + 1
+
+
+# print(label_sections_to_index('1-1-1'))
+# print(label_sections_to_index('33-78-21'))
+# print(label_sections_to_index('43-12-77'))
+# print(label_sections_to_index('96-96-96'))
+# print(label_sections_to_index('135-43-344'))
+# print(label_sections_to_index('384-384-384'))
+# print('-')
+
+#----------------------------------
+
+
+def index_to_label_sections(index):
+    """Return the 'cl1-cl2-cl3' cell label string for a 1-based cell index."""
+    zerobased = int(index) - 1
+
+    cl1 = ((zerobased // (384 * 384)) % 384) + 1
+    cl2 = ((zerobased // 384) % 384) + 1
+    cl3 = (zerobased % 384) + 1
+
+    return f'{cl1}-{cl2}-{cl3}'
+
+
+# print(index_to_label_sections(1))
+# print(index_to_label_sections(4748181))
+# print(index_to_label_sections(6197453))
+# print(index_to_label_sections(14044896))
+# print(index_to_label_sections(19775576))
+# print(index_to_label_sections(56623104))
+# print('-')
+#----------------------------------
+
+
+def index_to_sequence(index, bead_version):
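+    """Return the cell label sequence for a 1-based cell index and a bead version ('v1', 'Enh' or 'EnhV2')."""
+    zerobased = int(index) - 1
+
+    cl1 = ((zerobased // (384 * 384)) % 384) + 1
+    cl2 = ((zerobased // 384) % 384) + 1
+    cl3 = (zerobased % 384) + 1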
+
+    if bead_version == 'v1':
+        cls1_sequence = A96_cell_key1[cl1-1]
+        cls2_sequence = A96_cell_key2[cl2-1]
+        cls3_sequence = A96_cell_key3[cl3-1]
+
+        return f'{cls1_sequence}{v1_linker1}{cls2_sequence}{v1_linker2}{cls3_sequence}'
+
+    elif bead_version == 'Enh':
+
+        diversityInsert = ''
+
+        if 1 <= cl1 <= 24:
+            diversityInsert = ''
+        elif 25 <= cl1 <= 48:
+            diversityInsert = 'A'
+        elif 49 <= cl1 <= 72:
+            diversityInsert = 'GT'
+        else: # 73 <= cl1 <= 96:
+            diversityInsert = 'TCA'
+
+        cls1_sequence = A96_cell_key1[cl1-1]
+        cls2_sequence = A96_cell_key2[cl2-1]
+        cls3_sequence = A96_cell_key3[cl3-1]
+
+        return f'{diversityInsert}{cls1_sequence}{Enh_linker1}{cls2_sequence}{Enh_linker2}{cls3_sequence}'
+
+    elif bead_version == 'EnhV2':
+
+        diversityInsert = ''
+        subIndex = ((cl1-1) % 96) + 1
+
+        if 1 <= subIndex <= 24:
+            diversityInsert = ''
+        elif 25 <= subIndex <= 48:
+            diversityInsert = 'A'
+        elif 49 <= subIndex <= 72:
+            diversityInsert = 'GT'
+        else: # 73 <= subIndex <= 96:
+            diversityInsert = 'TCA'
+
+        cls1_sequence = B384_cell_key1[cl1-1]
+        cls2_sequence = B384_cell_key2[cl2-1]
+        cls3_sequence = B384_cell_key3[cl3-1]
+
+        return f'{diversityInsert}{cls1_sequence}{Enh_linker1}{cls2_sequence}{Enh_linker2}{cls3_sequence}'
+
+
+# print(index_to_sequence(4748181, 'Enh'))
+# print(index_to_sequence(52923177, 'EnhV2'))
+
+#----------------------------------
+
+
+def create_cell_index_fasta_V1():
+    with open('Rhapsody_cellBarcodeV1_IndexToSequence.fasta', 'w') as f:
+        for cl1 in range(1, 96+1):
+            for cl2 in range(1, 96+1):
+                for cl3 in range(1, 96+1):
+                    index = label_sections_to_index(f'{cl1}-{cl2}-{cl3}')
+                    sequence = index_to_sequence(index, 'v1')
+                    f.write(f'>{index}\n')
+                    f.write(f'{sequence}\n')
+
+#create_cell_index_fasta_V1()
+
+
+def create_cell_index_fasta_Enh():
+    with open('Rhapsody_cellBarcodeEnh_IndexToSequence.fasta', 'w') as f:
+        for cl1 in range(1, 96+1):
+            for cl2 in range(1, 96+1):
+                for cl3 in range(1, 96+1):
+                    index = label_sections_to_index(f'{cl1}-{cl2}-{cl3}')
+                    sequence = index_to_sequence(index, 'Enh')
+                    f.write(f'>{index}\n')
+                    f.write(f'{sequence}\n')
+
+#create_cell_index_fasta_Enh()
+
+def create_cell_index_fasta_EnhV2():
+    with open('Rhapsody_cellBarcodeEnhV2_IndexToSequence.fasta', 'w') as f:
+        for cl1 in range(1, 384+1):
+            for cl2 in range(1, 384+1):
+                for cl3 in range(1, 384+1):
+                    index = label_sections_to_index(f'{cl1}-{cl2}-{cl3}')
+                    sequence = index_to_sequence(index, 'EnhV2')
+                    f.write(f'>{index}\n')
+                    f.write(f'{sequence}\n')
+
+#create_cell_index_fasta_EnhV2()
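
Review note: the label/index mapping above is a base-384 positional encoding with 1-based digits, and the EnhV2 diversity insert depends only on the position of the first section within its block of 96. The sketch below is a standalone illustration, not part of the patch; the names `label_to_index`, `index_to_label` and `diversity_insert` are hypothetical, and the values checked are the ones listed in the commented-out print statements.

```python
# Illustrative re-implementation; names are hypothetical, not from the patch.
BASE = 384  # size of each cell label section, as used in the formulas above


def label_to_index(label: str) -> int:
    """Map a 'cl1-cl2-cl3' label (1-based sections) to a 1-based index."""
    cl1, cl2, cl3 = (int(n) for n in label.split("-"))
    return (cl1 - 1) * BASE * BASE + (cl2 - 1) * BASE + (cl3 - 1) + 1


def index_to_label(index: int) -> str:
    """Inverse of label_to_index, using integer division only."""
    cl1, rest = divmod(int(index) - 1, BASE * BASE)
    cl2, cl3 = divmod(rest, BASE)
    return f"{cl1 + 1}-{cl2 + 1}-{cl3 + 1}"


def diversity_insert(cl1: int) -> str:
    """Diversity insert for the first section, mirroring the EnhV2 branch."""
    return ("", "A", "GT", "TCA")[((cl1 - 1) % 96) // 24]


if __name__ == "__main__":
    for label in ("1-1-1", "33-78-21", "384-384-384"):
        index = label_to_index(label)
        print(label, index, index_to_label(index))
```

Using `divmod` keeps the arithmetic in integers, so the round trip does not depend on floating-point division.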
diff --git a/src/bd_rhapsody/test_data/BDAbSeq_ImmuneDiscoveryPanel.fasta b/src/bd_rhapsody/test_data/BDAbSeq_ImmuneDiscoveryPanel.fasta
new file mode 100644
index 00000000..930add4a
--- /dev/null
+++ b/src/bd_rhapsody/test_data/BDAbSeq_ImmuneDiscoveryPanel.fasta
@@ -0,0 +1,60 @@
+>CD11c:B-LY6|ITGAX|AHS0056|pAbO Catalog_940024
+ATGCGTTGCGAGAGATATGCGTAGGTTGCTGATTGG
+>CD14:MPHIP9|CD14|AHS0037|pAbO Catalog_940005
+TGGCCCGTGGTAGCGCAATGTGAGATCGTAATAAGT
+>CXCR5|CXCR5|AHS0039|pAbO Catalog_940042
+AGGAAGGTCGATTGTATAACGCGGCATTGTAACGGC
+>CD19:SJ25C1|CD19|AHS0030|pAbO Catalog_940004
+TAGTAATGTGTTCGTAGCCGGTAATAATCTTCGTGG
+>CD25:2A3|IL2RA|AHS0026|pAbO Catalog_940009
+AGTTGTATGGGTTAGCCGAGAGTAGTGCGTATGATT
+>CD27:M-T271|CD27|AHS0025|pAbO Catalog_940018
+TGTCCGGTTTAGCGAATTGGGTTGAGTCACGTAGGT
+>CD278|ICOS|AHS0012|pAbO Catalog_940043
+ATAGTCCGCCGTAATCGTTGTGTCGCTGAAAGGGTT
+>CD279:EH12-1|PDCD1|AHS0014|pAbO Catalog_940015
+ATGGTAGTATCACGACGTAGTAGGGTAATTGGCAGT
+>CD3:UCHT1|CD3E|AHS0231|pAbO Catalog_940307
+AGCTAGGTGTTATCGGCAAGTTGTACGGTGAAGTCG
+>GITR|TNFRSF18|AHS0104|pAbO Catalog_940096
+TCTGTGTGTCGGGTTGAATCGTAGTGAGTTAGCGTG
+>Tim3|HAVCR2|AHS0016|pAbO Catalog_940066
+TAGGTAGTAGTCCCGTATATCCGATCCGTGTTGTTT
+>CD4:SK3|CD4|AHS0032|pAbO Catalog_940001
+TCGGTGTTATGAGTAGGTCGTCGTGCGGTTTGATGT
+>CD45RA:HI100|PTPRC|AHS0009|pAbO Catalog_940011
+AAGCGATTGCGAAGGGTTAGTCAGTACGTTATGTTG
+>CD56:NCAM16.2|NCAM1|AHS0019|pAbO Catalog_940007
+AGAGGTTGAGTCGTAATAATAATCGGAAGGCGTTGG
+>CD62L:DREG-56|SELL|AHS0049|pAbO Catalog_940041
+ATGGTAAATATGGGCGAATGCGGGTTGTGCTAAAGT
+>CCR7|CCR7|AHS0273|pAbO Catalog_940394
+AATGTGTGATCGGCAAAGGGTTCTCGGGTTAATATG
+>CXCR6|CXCR6|AHS0148|pAbO Catalog_940234
+GTGGTTGGTTATTCGGACGGTTCTATTGTGAGCGCT
+>CD127|IL7R|AHS0028|pAbO Catalog_940012
+AGTTATTAGGCTCGTAGGTATGTTTAGGTTATCGCG
+>CD134:ACT35|TNFRSF4|AHS0013|pAbO Catalog_940060
+GGTGTTGGTAAGACGGACGGAGTAGATATTCGAGGT
+>CD28:L293|CD28|AHS0138|pAbO Catalog_940226
+TTGTTGAGGATACGATGAAGCGGTTTAAGGGTGTGG
+>CD272|BTLA|AHS0052|pAbO Catalog_940105
+GTAGGTTGATAGTCGGCGATAGTGCGGTTGAAAGCT
+>CD8:SK1|CD8A|AHS0228|pAbO Catalog_940305
+AGGACATAGAGTAGGACGAGGTAGGCTTAAATTGCT
+>HLA-DR|CD74|AHS0035|pAbO Catalog_940010
+TGTTGGTTATTCGTTAGTGCATCCGTTTGGGCGTGG
+>CD16:3G8|FCGR3A|AHS0053|pAbO Catalog_940006
+TAAATCTAATCGCGGTAACATAACGGTGGGTAAGGT
+>CD183|CXCR3|AHS0031|pAbO Catalog_940030
+AAAGTGTTGGCGTTATGTGTTCGTTAGCGGTGTGGG
+>CD196|CCR6|AHS0034|pAbO Catalog_940033
+ACGTGTTATGGTGTTGTTCGAATTGTGGTAGTCAGT
+>CD137|TNFRSF9|AHS0003|pAbO Catalog_940055
+TGACAAGCAACGAGCGATACGAAAGGCGAAATTAGT
+>CD161:HP-3G10|KLRB1|AHS0205|pAbO Catalog_940283
+TTTAGGACGATTAGTTGTGCGGCATAGGAGGTGTTC
+>IgM|IGHM|AHS0198|pAbO Catalog_940276
+TTTGGAGGGTAGCTAGTTGCAGTTCGTGGTCGTTTC
+>IgD|IGHD|AHS0058|pAbO Catalog_940026
+TGAGGGATGTATAGCGAGAATTGCGACCGTAGACTT
diff --git a/src/bd_rhapsody/test_data/SampleTagSequences_HomoSapiens_ver1.fasta b/src/bd_rhapsody/test_data/SampleTagSequences_HomoSapiens_ver1.fasta
new file mode 100644
index 00000000..3d5a42fa
--- /dev/null
+++ b/src/bd_rhapsody/test_data/SampleTagSequences_HomoSapiens_ver1.fasta
@@ -0,0 +1,24 @@
+>SampleTag01_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGATTCAAGGGCAGCCGCGTCACGATTGGATACGACTGTTGGACCGG
+>SampleTag02_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGTGGATGGGATAAGTGCGTGATGGACCGAAGGGACCTCGTGGCCGG
+>SampleTag03_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGCGGCTCGTGCTGCGTCGTCTCAAGTCCAGAAACTCCGTGTATCCT
+>SampleTag04_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGATTGGGAGGCTTTCGTACCGCTGCCGCCACCAGGTGATACCCGCT
+>SampleTag05_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGCTCCCTGGTGTTCAATACCCGATGTGGTGGGCAGAATGTGGCTGG
+>SampleTag06_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGTTACCCGCAGGAAGACGTATACCCCTCGTGCCAGGCGACCAATGC
+>SampleTag07_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGTGTCTACGTCGGACCGCAAGAAGTGAGTCAGAGGCTGCACGCTGT
+>SampleTag08_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGCCCCACCAGGTTGCTTTGTCGGACGAGCCCGCACAGCGCTAGGAT
+>SampleTag09_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGGTGATCCGCGCAGGCACACATACCGACTCAGATGGGTTGTCCAGG
+>SampleTag10_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGGCAGCCGGCGTCGTACGAGGCACAGCGGAGACTAGATGAGGCCCC
+>SampleTag11_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGCGCGTCCAATTTCCGAAGCCCCGCCCTAGGAGTTCCCCTGCGTGC
+>SampleTag12_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGGCCCATTCATTGCACCCGCCAGTGATCGACCCTAGTGGAGCTAAG
diff --git a/src/bd_rhapsody/bd_rhapsody_make_reference/test_data/reference_small.fa b/src/bd_rhapsody/test_data/reference_small.fa
similarity index 100%
rename from src/bd_rhapsody/bd_rhapsody_make_reference/test_data/reference_small.fa
rename to src/bd_rhapsody/test_data/reference_small.fa
diff --git a/src/bd_rhapsody/bd_rhapsody_make_reference/test_data/reference_small.gtf b/src/bd_rhapsody/test_data/reference_small.gtf
similarity index 100%
rename from src/bd_rhapsody/bd_rhapsody_make_reference/test_data/reference_small.gtf
rename to src/bd_rhapsody/test_data/reference_small.gtf
diff --git a/src/bd_rhapsody/test_data/script.sh b/src/bd_rhapsody/test_data/script.sh
new file mode 100644
index 00000000..f8db0313
--- /dev/null
+++ b/src/bd_rhapsody/test_data/script.sh
@@ -0,0 +1,141 @@
+#!/bin/bash
+
+TMP_DIR=/tmp/bd_rhapsody_make_reference
+OUT_DIR=src/bd_rhapsody/test_data
+
+# check if seqkit is installed
+if ! command -v seqkit &> /dev/null; then
+  echo "seqkit could not be found"
+  exit 1
+fi
+
+# create temporary directory and clean up on exit
+mkdir -p $TMP_DIR
+function clean_up {
+    rm -rf "$TMP_DIR"
+}
+trap clean_up EXIT
+
+# fetch reference
+ORIG_FA=$TMP_DIR/reference.fa.gz
+if [ ! -f $ORIG_FA ]; then
+  wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz \
+    -O $ORIG_FA
+fi
+
+ORIG_GTF=$TMP_DIR/reference.gtf.gz
+if [ ! -f $ORIG_GTF ]; then
+  wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz \
+    -O $ORIG_GTF
+fi
+
+# create small reference
+START=30000
+END=31500
+CHR=chr1
+
+# subset to small region
+seqkit grep -r -p "^$CHR\$" "$ORIG_FA" | \
+  seqkit subseq -r "$START:$END" > $OUT_DIR/reference_small.fa
+
+zcat "$ORIG_GTF" | \
+  awk -v FS='\t' -v OFS='\t' "
+    \$1 == \"$CHR\" && \$4 >= $START && \$5 <= $END {
+      \$4 = \$4 - $START + 1;
+      \$5 = \$5 - $START + 1;
+      print;
+    }" > $OUT_DIR/reference_small.gtf
+
+# download bdabseq immunediscoverypanel fasta
+# note: was contained in http://bd-rhapsody-public.s3.amazonaws.com/Rhapsody-Demo-Data-Inputs/12WTA-ABC-SMK-EB-5kJRT.tar
+cat > $OUT_DIR/BDAbSeq_ImmuneDiscoveryPanel.fasta <<EOF
+>CD11c:B-LY6|ITGAX|AHS0056|pAbO Catalog_940024
+ATGCGTTGCGAGAGATATGCGTAGGTTGCTGATTGG
+>CD14:MPHIP9|CD14|AHS0037|pAbO Catalog_940005
+TGGCCCGTGGTAGCGCAATGTGAGATCGTAATAAGT
+>CXCR5|CXCR5|AHS0039|pAbO Catalog_940042
+AGGAAGGTCGATTGTATAACGCGGCATTGTAACGGC
+>CD19:SJ25C1|CD19|AHS0030|pAbO Catalog_940004
+TAGTAATGTGTTCGTAGCCGGTAATAATCTTCGTGG
+>CD25:2A3|IL2RA|AHS0026|pAbO Catalog_940009
+AGTTGTATGGGTTAGCCGAGAGTAGTGCGTATGATT
+>CD27:M-T271|CD27|AHS0025|pAbO Catalog_940018
+TGTCCGGTTTAGCGAATTGGGTTGAGTCACGTAGGT
+>CD278|ICOS|AHS0012|pAbO Catalog_940043
+ATAGTCCGCCGTAATCGTTGTGTCGCTGAAAGGGTT
+>CD279:EH12-1|PDCD1|AHS0014|pAbO Catalog_940015
+ATGGTAGTATCACGACGTAGTAGGGTAATTGGCAGT
+>CD3:UCHT1|CD3E|AHS0231|pAbO Catalog_940307
+AGCTAGGTGTTATCGGCAAGTTGTACGGTGAAGTCG
+>GITR|TNFRSF18|AHS0104|pAbO Catalog_940096
+TCTGTGTGTCGGGTTGAATCGTAGTGAGTTAGCGTG
+>Tim3|HAVCR2|AHS0016|pAbO Catalog_940066
+TAGGTAGTAGTCCCGTATATCCGATCCGTGTTGTTT
+>CD4:SK3|CD4|AHS0032|pAbO Catalog_940001
+TCGGTGTTATGAGTAGGTCGTCGTGCGGTTTGATGT
+>CD45RA:HI100|PTPRC|AHS0009|pAbO Catalog_940011
+AAGCGATTGCGAAGGGTTAGTCAGTACGTTATGTTG
+>CD56:NCAM16.2|NCAM1|AHS0019|pAbO Catalog_940007
+AGAGGTTGAGTCGTAATAATAATCGGAAGGCGTTGG
+>CD62L:DREG-56|SELL|AHS0049|pAbO Catalog_940041
+ATGGTAAATATGGGCGAATGCGGGTTGTGCTAAAGT
+>CCR7|CCR7|AHS0273|pAbO Catalog_940394
+AATGTGTGATCGGCAAAGGGTTCTCGGGTTAATATG
+>CXCR6|CXCR6|AHS0148|pAbO Catalog_940234
+GTGGTTGGTTATTCGGACGGTTCTATTGTGAGCGCT
+>CD127|IL7R|AHS0028|pAbO Catalog_940012
+AGTTATTAGGCTCGTAGGTATGTTTAGGTTATCGCG
+>CD134:ACT35|TNFRSF4|AHS0013|pAbO Catalog_940060
+GGTGTTGGTAAGACGGACGGAGTAGATATTCGAGGT
+>CD28:L293|CD28|AHS0138|pAbO Catalog_940226
+TTGTTGAGGATACGATGAAGCGGTTTAAGGGTGTGG
+>CD272|BTLA|AHS0052|pAbO Catalog_940105
+GTAGGTTGATAGTCGGCGATAGTGCGGTTGAAAGCT
+>CD8:SK1|CD8A|AHS0228|pAbO Catalog_940305
+AGGACATAGAGTAGGACGAGGTAGGCTTAAATTGCT
+>HLA-DR|CD74|AHS0035|pAbO Catalog_940010
+TGTTGGTTATTCGTTAGTGCATCCGTTTGGGCGTGG
+>CD16:3G8|FCGR3A|AHS0053|pAbO Catalog_940006
+TAAATCTAATCGCGGTAACATAACGGTGGGTAAGGT
+>CD183|CXCR3|AHS0031|pAbO Catalog_940030
+AAAGTGTTGGCGTTATGTGTTCGTTAGCGGTGTGGG
+>CD196|CCR6|AHS0034|pAbO Catalog_940033
+ACGTGTTATGGTGTTGTTCGAATTGTGGTAGTCAGT
+>CD137|TNFRSF9|AHS0003|pAbO Catalog_940055
+TGACAAGCAACGAGCGATACGAAAGGCGAAATTAGT
+>CD161:HP-3G10|KLRB1|AHS0205|pAbO Catalog_940283
+TTTAGGACGATTAGTTGTGCGGCATAGGAGGTGTTC
+>IgM|IGHM|AHS0198|pAbO Catalog_940276
+TTTGGAGGGTAGCTAGTTGCAGTTCGTGGTCGTTTC
+>IgD|IGHD|AHS0058|pAbO Catalog_940026
+TGAGGGATGTATAGCGAGAATTGCGACCGTAGACTT
+EOF
+
+# this was obtained by running the command:
+# docker run bdgenomics/rhapsody:2.2.1 cat /rhapsody/control_files/SampleTagSequences_HomoSapiens_ver1.fasta 
+cat > $OUT_DIR/SampleTagSequences_HomoSapiens_ver1.fasta <<EOF
+>SampleTag01_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGATTCAAGGGCAGCCGCGTCACGATTGGATACGACTGTTGGACCGG
+>SampleTag02_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGTGGATGGGATAAGTGCGTGATGGACCGAAGGGACCTCGTGGCCGG
+>SampleTag03_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGCGGCTCGTGCTGCGTCGTCTCAAGTCCAGAAACTCCGTGTATCCT
+>SampleTag04_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGATTGGGAGGCTTTCGTACCGCTGCCGCCACCAGGTGATACCCGCT
+>SampleTag05_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGCTCCCTGGTGTTCAATACCCGATGTGGTGGGCAGAATGTGGCTGG
+>SampleTag06_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGTTACCCGCAGGAAGACGTATACCCCTCGTGCCAGGCGACCAATGC
+>SampleTag07_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGTGTCTACGTCGGACCGCAAGAAGTGAGTCAGAGGCTGCACGCTGT
+>SampleTag08_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGCCCCACCAGGTTGCTTTGTCGGACGAGCCCGCACAGCGCTAGGAT
+>SampleTag09_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGGTGATCCGCGCAGGCACACATACCGACTCAGATGGGTTGTCCAGG
+>SampleTag10_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGGCAGCCGGCGTCGTACGAGGCACAGCGGAGACTAGATGAGGCCCC
+>SampleTag11_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGCGCGTCCAATTTCCGAAGCCCCGCCCTAGGAGTTCCCCTGCGTGC
+>SampleTag12_hs|stAbO
+GTTGTCAAGATGCTACCGTTCAGAGGCCCATTCATTGCACCCGCCAGTGATCGACCCTAGTGGAGCTAAG
+EOF
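
Review note: the awk step in script.sh keeps only GTF records that lie entirely inside the chosen window and rebases their coordinates so that the window starts at position 1. A minimal Python sketch of the same filter follows, assuming tab-separated GTF lines; the function name `shift_gtf_line` and the sample record are hypothetical and only illustrate the arithmetic.

```python
# Illustrative sketch of the awk filter in script.sh; the function name and
# the sample record below are hypothetical.
from typing import Optional


def shift_gtf_line(line: str, chrom: str = "chr1",
                   start: int = 30000, end: int = 31500) -> Optional[str]:
    """Return the rebased GTF line, or None if it falls outside the window."""
    fields = line.rstrip("\n").split("\t")
    # Header/comment lines and other chromosomes never match the chromosome name.
    if fields[0] != chrom:
        return None
    feat_start, feat_end = int(fields[3]), int(fields[4])
    # Keep only features fully contained in [start, end].
    if feat_start < start or feat_end > end:
        return None
    # Rebase so that `start` becomes position 1 in the small reference.
    fields[3] = str(feat_start - start + 1)
    fields[4] = str(feat_end - start + 1)
    return "\t".join(fields)


if __name__ == "__main__":
    record = "chr1\tHAVANA\texon\t30267\t30667\t.\t+\t.\tgene_id \"example\";"
    print(shift_gtf_line(record))
```

The `+ 1` keeps the coordinates 1-based, matching the `seqkit subseq -r "$START:$END"` call that produces the FASTA for the same window.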

From 619f1bbb6d040e650233d3b0380f5298e624ecef Mon Sep 17 00:00:00 2001
From: Emma Rousseau <emmarou1@icloud.com>
Date: Wed, 18 Sep 2024 15:45:08 +0200
Subject: [PATCH 23/42] Rsem-calculate-expression (#93)

* initial commit dedup

* Revert "initial commit dedup"

This reverts commit 38f586bec0ac9e4312b016e29c3aa0bd53f292b2.

* three rsem components initial commit

* update container setup

* Simplified container configuration

* temporarily remove version recording from config

* Complete config file

* add tests and complete config file

* change test dataset

* functional test, adjustments to scripts

* Update changelog

* Simplified test data and help.txt contents

* suggested changes, typos

* simplify, get rid of test_data folder

* Update CHANGELOG.md

* Update CHANGELOG.md

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
---
 CHANGELOG.md                                  |    2 +
 .../rsem_calculate_expression/config.vsh.yaml |  479 ++++++++
 src/rsem/rsem_calculate_expression/help.txt   | 1002 +++++++++++++++++
 src/rsem/rsem_calculate_expression/script.sh  |  103 ++
 src/rsem/rsem_calculate_expression/test.sh    |  116 ++
 5 files changed, 1702 insertions(+)
 create mode 100644 src/rsem/rsem_calculate_expression/config.vsh.yaml
 create mode 100644 src/rsem/rsem_calculate_expression/help.txt
 create mode 100644 src/rsem/rsem_calculate_expression/script.sh
 create mode 100644 src/rsem/rsem_calculate_expression/test.sh

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 07a83c15..9bfb5606 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,8 @@
 
 * `bd_rhapsody/bd_rhapsody_sequence_analysis`: BD Rhapsody Sequence Analysis CWL pipeline (PR #96).
 
+* `rsem/rsem_calculate_expression`: Calculate expression levels (PR #93).
+
 ## MINOR CHANGES
 
 * Upgrade to Viash 0.9.0.
diff --git a/src/rsem/rsem_calculate_expression/config.vsh.yaml b/src/rsem/rsem_calculate_expression/config.vsh.yaml
new file mode 100644
index 00000000..2cd950cb
--- /dev/null
+++ b/src/rsem/rsem_calculate_expression/config.vsh.yaml
@@ -0,0 +1,479 @@
+name: "rsem_calculate_expression"
+namespace: "rsem"
+description: | 
+  Calculate expression with RSEM. 
+keywords: [Transcriptome, Index, Alignment, RSEM]
+links:
+  homepage: https://deweylab.github.io/RSEM/
+  documentation: https://deweylab.github.io/RSEM/rsem-calculate-expression.html
+  repository: https://github.com/deweylab/RSEM
+references: 
+  doi: https://doi.org/10.1186/1471-2105-12-323
+license: GPL-3.0
+  
+
+argument_groups:
+- name: "Input"
+  arguments:
+  - name: "--id"
+    type: string
+    description: Sample ID.
+  - name: "--strandedness"
+    type: string
+    description: Sample strand-specificity. Must be one of unstranded, forward, or reverse.
+    choices: [forward, reverse, unstranded]
+  - name: "--paired"
+    type: boolean_true
+    description: Whether the input reads are paired-end.
+  - name: "--input"
+    type: file
+    description: Input reads for quantification.
+    multiple: true
+  - name: "--index"
+    type: file
+    must_exist: false
+    description: RSEM index.  
+  - name: "--extra_args"
+    type: string
+    description: Extra rsem-calculate-expression arguments in addition to the examples.
+
+- name: "Output"
+  arguments:
+  - name: "--counts_gene"
+    type: file
+    description: Expression counts on gene level
+    example: $id.genes.results
+    direction: output
+  - name: "--counts_transcripts"
+    type: file
+    description: Expression counts on transcript level
+    example: $id.isoforms.results
+    direction: output
+  - name: "--stat"
+    type: file
+    description: RSEM statistics
+    example: $id.stat
+    direction: output
+  - name: "--logs"
+    type: file
+    description: RSEM logs
+    example: $id.log
+    direction: output
+  - name: "--bam_star"
+    type: file
+    description: BAM file generated by STAR (optional)
+    example: $id.STAR.genome.bam
+    direction: output
+  - name: "--bam_genome"
+    type: file
+    description: Genome BAM file (optional)
+    example: $id.genome.bam
+    direction: output
+  - name: "--bam_transcript"
+    type: file
+    description: Transcript BAM file (optional)
+    example: $id.transcript.bam
+    direction: output
+  - name: "--sort_bam_by_read_name"
+    type: boolean_true
+    description: |
+      Sort the BAM file aligned under transcript coordinates by read name. Turning this option on produces
+      deterministic maximum likelihood estimations from independent runs. Note that sorting takes a long
+      time and a lot of memory.
+  - name: "--no_bam_output"
+    type: boolean_true
+    description: Do not output any BAM file.
+  - name: "--sampling_for_bam"
+    type: boolean_true
+    description: |
+      When RSEM generates a BAM file, instead of outputting all alignments a read has with their posterior 
+      probabilities, one alignment is sampled according to the posterior probabilities. The sampling procedure 
+      includes the alignment to the "noise" transcript, which does not appear in the BAM file. Only the 
+      sampled alignment has a weight of 1. All other alignments have weight 0. If the "noise" transcript is 
+      sampled, all alignments that appear in the BAM file have weight 0.
+  - name: "--output_genome_bam"
+    type: boolean_true
+    description: |
+      Generate a BAM file, 'sample_name.genome.bam', with alignments mapped to genomic coordinates and 
+      annotated with their posterior probabilities. In addition, RSEM will call samtools (included in RSEM 
+      package) to sort and index the bam file. 'sample_name.genome.sorted.bam' and 'sample_name.genome.sorted.bam.bai' 
+      will be generated.
+  - name: "--sort_bam_by_coordinate"
+    type: boolean_true
+    description: |
+      Sort RSEM generated transcript and genome BAM files by coordinates and build associated indices.
+
+- name: "Basic Options"
+  arguments:
+    - name: "--no_qualities"
+      type: boolean_true
+      description: Input reads do not contain quality scores.
+    - name: "--alignments"
+      type: boolean_true
+      description: |
+        Input file contains alignments in SAM/BAM/CRAM format. The exact file format will be determined 
+        automatically.
+    - name: "--fai"
+      type: file
+      description: |
+        If the header section of input alignment file does not contain reference sequence information, 
+        this option should be turned on. <file> is a FAI format file containing each reference sequence's 
+        name and length. Please refer to the SAM official website for the details of FAI format.
+    - name: "--bowtie2"
+      type: boolean_true
+      description: |
+        Use Bowtie 2 instead of Bowtie to align reads. Since RSEM currently does not handle indel, local,
+        or discordant alignments, the Bowtie 2 parameters are set in a way to avoid those alignments. In
+        particular, we use options '--sensitive --dpad 0 --gbar 99999999 --mp 1,1 --np 1 --score-min L,0,-0.1'
+        by default. The last parameter of '--score-min', '-0.1', is the negative of maximum mismatch rate.
+        This rate can be set by option '--bowtie2_mismatch_rate'. If reads are paired-end, we additionally
+        use options '--no-mixed' and '--no-discordant'.
+    - name: "--star"
+      type: boolean_true
+      description: |
+        Use STAR to align reads. Alignment parameters are from ENCODE3's STAR-RSEM pipeline. To save 
+        computational time and memory resources, STAR's Output BAM file is unsorted. It is stored in RSEM's 
+        temporary directory with name as 'sample_name.bam'. Each STAR job will have its own private copy of 
+        the genome in memory.
+    - name: "--hisat2_hca"
+      type: boolean_true
+      description: |
+        Use HISAT2 to align reads to the transcriptome according to the Human Cell Atlas SMART-Seq2 pipeline.
+    - name: "--append_names"
+      type: boolean_true
+      description: |
+        If gene_name/transcript_name is available, append it to the end of gene_id/transcript_id (separated 
+        by '_') in files 'sample_name.isoforms.results' and 'sample_name.genes.results'.
+    - name: "--seed"
+      type: integer
+      description: |
+        Set the seed for the random number generators used in calculating posterior mean estimates and 
+        credibility intervals. The seed must be a non-negative 32 bit integer.
+    - name: "--single_cell_prior"
+      type: boolean_true
+      description: |
+        By default, RSEM uses Dirichlet(1) as the prior to calculate posterior mean estimates and credibility
+        intervals. However, far fewer genes are expressed in single-cell RNA-Seq data. Thus, if you want to
+        compute posterior mean estimates and/or credibility intervals and you have single-cell RNA-Seq data,
+        it is recommended to turn on this option. RSEM will then use Dirichlet(0.1) as the prior, which
+        encourages sparsity of the expression levels.
+    - name: "--calc_pme"
+      type: boolean_true
+      description: Run RSEM's collapsed Gibbs sampler to calculate posterior mean estimates.
+    - name: "--calc_ci"
+      type: boolean_true
+      description: |
+        Calculate 95% credibility intervals and posterior mean estimates. The credibility level can be 
+        changed by setting '--ci_credibility_level'.
+    - name: "--quiet"
+      alternatives: "-q"
+      type: boolean_true
+      description: Suppress the output of logging information.
+
+- name: "Aligner Options"
+  arguments:
+    - name: "--seed_length"
+      type: integer
+      description: |
+        Seed length used by the read aligner. Providing the correct value is important for RSEM. If RSEM
+        runs Bowtie, it uses this value for Bowtie's seed length parameter. Any read whose length, or the
+        length of at least one of its mates (for paired-end reads), is less than this value will be ignored.
+        If poly(A) tails are not added to the references, the minimum allowed value is 5; otherwise, the
+        minimum allowed value is 25. Note that this script will only check if the value >= 5 and give a
+        warning message if the value < 25 but >= 5. (Default: 25)
+      example: 25
+    - name: "--phred64_quals"
+      type: boolean_true
+      description: |
+        Input quality scores are encoded as Phred+64 (default for GA Pipeline ver. >= 1.3). This option is 
+        used by Bowtie, Bowtie 2 and HISAT2. Otherwise, quality scores are assumed to be encoded as Phred+33. (Default: false)
+    - name: "--solexa_quals"
+      type: boolean_true
+      description: |
+        Input quality scores are solexa encoded (from GA Pipeline ver. < 1.3). This option is used by 
+        Bowtie, Bowtie 2 and HISAT2. Otherwise, quality scores are assumed to be encoded as Phred+33. (Default: false)
+    - name: "--bowtie_n"
+      type: integer
+      description: |
+        (Bowtie parameter) max # of mismatches in the seed. (Range: 0-3, Default: 2)
+      choices: [0, 1, 2, 3]
+      example: 2
+    - name: "--bowtie_e"
+      type: integer
+      description: |
+        (Bowtie parameter) max sum of mismatch quality scores across the alignment. (Default: 99999999)
+      example: 99999999
+    - name: "--bowtie_m"
+      type: integer
+      description: |
+        (Bowtie parameter) suppress all alignments for a read if > <int> valid alignments exist. (Default: 200)
+      example: 200
+    - name: "--bowtie_chunkmbs"
+      type: integer
+      description: |
+        (Bowtie parameter) memory allocated for best first alignment calculation (Default: 0 - use Bowtie's default)
+      example: 0
+    - name: "--bowtie2_mismatch_rate"
+      type: double
+      description: |
+        (Bowtie 2 parameter) The maximum mismatch rate allowed. (Default: 0.1)
+      example: 0.1
+    - name: "--bowtie2_k"
+      type: integer
+      description: |
+        (Bowtie 2 parameter) Find up to <int> alignments per read. (Default: 200)
+      example: 200
+    - name: "--bowtie2_sensitivity_level"
+      type: string
+      description: |
+        (Bowtie 2 parameter) Set Bowtie 2's preset options in --end-to-end mode. This option controls how 
+        hard Bowtie 2 tries to find alignments. <string> must be one of "very_fast", "fast", "sensitive" 
+        and "very_sensitive". The four candidates correspond to Bowtie 2's "--very-fast", "--fast", 
+        "--sensitive" and "--very-sensitive" options. (Default: "sensitive" - use Bowtie 2's default)
+      choices: ["very_fast", "fast", "sensitive", "very_sensitive"]
+      example: sensitive
+    - name: "--star_gzipped_read_file"
+      type: boolean_true
+      description: |
+        Input read file(s) is compressed by gzip. (Default: false)
+    - name: "--star_bzipped_read_file"
+      type: boolean_true
+      description: |
+        Input read file(s) is compressed by bzip2. (Default: false)
+    - name: "--star_output_genome_bam"
+      type: boolean_true
+      description: |
+        Save the BAM file from STAR alignment under genomic coordinate to 'sample_name.STAR.genome.bam'. 
+        This file is NOT sorted by genomic coordinate. In this file, according to STAR's manual, 'paired 
+        ends of an alignment are always adjacent, and multiple alignments of a read are adjacent as well'. 
+        (Default: false)
+
+- name: "Advanced Options"
+  arguments:
+    - name: "--tag"
+      type: string
+      description: |
+        The name of the optional field used in the SAM input for identifying a read with too many valid 
+        alignments. The field should have the format <tagName>:i:<value>, where a <value> bigger than 0 
+        indicates a read with too many alignments. (Default: "")
+      example: ""
+    - name: "--fragment_length_min"
+      type: integer
+      description: |
+        Minimum read/insert length allowed. This is also the value for the Bowtie/Bowtie2 -I option. 
+        (Default: 1)
+      example: 1
+    - name: "--fragment_length_max"
+      type: integer
+      description: |
+        Maximum read/insert length allowed. This is also the value for the Bowtie/Bowtie 2 -X option. 
+        (Default: 1000)
+      example: 1000
+    - name: "--fragment_length_mean"
+      type: double
+      description: |
+        (single-end data only) The mean of the fragment length distribution, which is assumed to be a 
+        Gaussian. (Default: -1, which disables use of the fragment length distribution)
+      example: -1
+    - name: "--fragment_length_sd"
+      type: double
+      description: |
+        (single-end data only) The standard deviation of the fragment length distribution, which is 
+        assumed to be a Gaussian. (Default: 0, which assumes that all fragments are of the same length, 
+        given by the rounded value of --fragment_length_mean).
+      example: 0
+    - name: "--estimate_rspd"
+      type: boolean_true
+      description: |
+        Set this option if you want to estimate the read start position distribution (RSPD) from data.
+        Otherwise, RSEM will use a uniform RSPD.
+    - name: "--num_rspd_bins"
+      type: integer
+      description: |
+        Number of bins in the RSPD. Only relevant when '--estimate_rspd' is specified. Use of the default 
+        setting is recommended. (Default: 20)
+      example: 20
+    - name: "--gibbs_burnin"
+      type: integer
+      description: |
+        The number of burn-in rounds for RSEM's Gibbs sampler. Each round passes over the entire data set 
+        once. If RSEM can use multiple threads, multiple Gibbs samplers will start at the same time and all 
+        samplers share the same burn-in number. (Default: 200)
+      example: 200
+    - name: "--gibbs_number_of_samples"
+      type: integer
+      description: |
+        The total number of count vectors RSEM will collect from its Gibbs samplers. (Default: 1000)
+      example: 1000
+    - name: "--gibbs_sampling_gap"
+      type: integer
+      description: |
+        The number of rounds between two successive count vectors RSEM collects. If the count vector after
+        round N is collected, the count vector after round N + <int> will also be collected. (Default: 1)
+      example: 1
+    - name: "--ci_credibility_level"
+      type: double
+      description: |
+        The credibility level for credibility intervals. (Default: 0.95)
+      example: 0.95
+    - name: "--ci_number_of_samples_per_count_vector"
+      type: integer
+      description: |
+        The number of read generating probability vectors sampled per sampled count vector. The credibility
+        intervals are calculated by first sampling P(C | D) and then sampling P(Theta | C) for each sampled 
+        count vector. This option controls how many Theta vectors are sampled per sampled count vector. 
+        (Default: 50)
+      example: 50
+    - name: "--keep_intermediate_files"
+      type: boolean_true
+      description: |
+        Keep temporary files generated by RSEM. RSEM creates a temporary directory, 'sample_name.temp', 
+        into which it puts all intermediate output files. If this directory already exists, RSEM overwrites 
+        all files generated by previous RSEM runs inside of it. By default, after RSEM finishes, the 
+        temporary directory is deleted. Set this option to prevent the deletion of this directory and the 
+        intermediate files inside of it.
+    - name: "--temporary_folder"
+      type: string
+      description: |
+        Set where to put the temporary files generated by RSEM. If the folder specified does not exist, 
+        RSEM will try to create it. (Default: sample_name.temp)
+      example: sample_name.temp
+    - name: "--time"
+      type: boolean_true
+      description: |
+        Output time consumed by each step of RSEM to 'sample_name.time'.
+
+- name: "Prior-Enhanced RSEM Options"
+  arguments:
+    - name: "--run_pRSEM"
+      type: boolean_true
+      description: |
+        Run prior-enhanced RSEM (pRSEM). Prior parameters, i.e. each isoform's initial pseudo-count for
+        RSEM's Gibbs sampling, will be learned from input RNA-seq data and an external data set. When pRSEM
+        needs and only needs ChIP-seq peak information to partition isoforms (e.g. in pRSEM's default
+        partition model), either a ChIP-seq peak file (with the '--chipseq_peak_file' option) or ChIP-seq
+        FASTQ files for target and input and the path to the Bowtie executables are required (with the
+        '--chipseq_target_read_files <string>', '--chipseq_control_read_files <string>', and '--bowtie_path
+        <path>' options); otherwise, ChIP-seq FASTQ files for target and control and the path to the Bowtie
+        executables are required.
+    - name: "--chipseq_peak_file"
+      type: file
+      must_exist: true
+      description: |
+        Full path to a ChIP-seq peak file in ENCODE's narrowPeak, i.e. BED6+4, format. This file is used 
+        when running prior-enhanced RSEM in the default two-partition model. It partitions isoforms by 
+        whether they have ChIP-seq peaks overlapping with their transcription start site region or not. Each
+        partition will have its own prior parameter learned from a training set. This file can be either 
+        gzipped or ungzipped.
+    - name: "--chipseq_target_read_files"
+      type: file
+      must_exist: true
+      description: |
+        Comma-separated full path of FASTQ read file(s) for ChIP-seq target. This option is used when running 
+        prior-enhanced RSEM. It provides information to calculate ChIP-seq peaks and signals. The file(s) 
+        can be either ungzipped or gzipped with a suffix '.gz' or '.gzip'. The options '--bowtie_path <path>' 
+        and '--chipseq_control_read_files <string>' must be defined when this option is specified.
+    - name: "--chipseq_control_read_files"
+      type: file
+      must_exist: true
+      description: |
+        Comma-separated full path of FASTQ read file(s) for ChIP-seq control. This option is used when running
+        prior-enhanced RSEM. It provides information to call ChIP-seq peaks. The file(s) can be either 
+        ungzipped or gzipped with a suffix '.gz' or '.gzip'. The options '--bowtie_path <path>' and 
+        '--chipseq_target_read_files <string>' must be defined when this option is specified.
+    - name: "--chipseq_read_files_multi_targets"
+      type: file
+      must_exist: true
+      description: |
+        Comma-separated full path of FASTQ read files for multiple ChIP-seq targets. This option is used when 
+        running prior-enhanced RSEM, where prior is learned from multiple complementary data sets. It provides 
+        information to calculate ChIP-seq signals. All files can be either ungzipped or gzipped with a suffix 
+        '.gz' or '.gzip'. When this option is specified, the option '--bowtie_path <path>' must be defined and 
+        the option '--partition_model <string>' will be set to 'cmb_lgt' automatically.
+    - name: "--chipseq_bed_files_multi_targets"
+      type: file
+      must_exist: true
+      description: |
+        Comma-separated full path of BED files for multiple ChIP-seq targets. This option is used when running 
+        prior-enhanced RSEM, where prior is learned from multiple complementary data sets. It provides information 
+        of ChIP-seq signals and must have at least the first six BED columns. All files can be either ungzipped 
+        or gzipped with a suffix '.gz' or '.gzip'. When this option is specified, the option '--partition_model 
+        <string>' will be set to 'cmb_lgt' automatically.
+    - name: "--cap_stacked_chipseq_reads"
+      type: boolean_true
+      description: |
+        Keep a maximum number of ChIP-seq reads that aligned to the same genomic interval. This option is used 
+        when running prior-enhanced RSEM, where prior is learned from multiple complementary data sets. This 
+        option is only in use when either '--chipseq_read_files_multi_targets <string>' or 
+        '--chipseq_bed_files_multi_targets <string>' is specified.
+    - name: "--n_max_stacked_chipseq_reads"
+      type: integer
+      description: |
+        The maximum number of stacked ChIP-seq reads to keep. This option is used when running prior-enhanced 
+        RSEM, where prior is learned from multiple complementary data sets. This option is only in use when the 
+        option '--cap_stacked_chipseq_reads' is set.
+    - name: "--partition_model"
+      type: string
+      description: |
+        A keyword to specify the partition model used by prior-enhanced RSEM. It must be one of the following 
+        keywords:
+        * pk
+        * pk_lgtnopk
+        * lm3, lm4, lm5, or lm6
+        * nopk_lm2pk, nopk_lm3pk, nopk_lm4pk, or nopk_lm5pk
+        * pk_lm2nopk, pk_lm3nopk, pk_lm4nopk, or pk_lm5nopk
+        * cmb_lgt
+        Parameters for all the above models are learned from a training set. For detailed explanations, please 
+        see prior-enhanced RSEM's paper. (Default: 'pk')
+      example: "pk"
+    
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+ 
+engines:
+  - type: docker
+    image: ubuntu:22.04
+    setup:
+    - type: apt
+      packages: 
+        - build-essential 
+        - gcc 
+        - g++ 
+        - make 
+        - wget 
+        - zlib1g-dev 
+        - unzip
+    - type: docker
+      env: 
+        - STAR_VERSION=2.7.11a
+        - RSEM_VERSION=1.3.3
+      run: |
+          apt-get update && \
+          apt-get clean && \
+          wget --no-check-certificate https://github.com/alexdobin/STAR/archive/refs/tags/2.7.11a.zip && \
+          unzip 2.7.11a.zip && \
+          cp STAR-2.7.11a/bin/Linux_x86_64_static/STAR /usr/local/bin  && \
+          cd && \
+          wget --no-check-certificate https://github.com/deweylab/RSEM/archive/refs/tags/v1.3.3.zip && \
+          unzip v1.3.3.zip && \
+          cd RSEM-1.3.3 && \
+          make && \
+          make install
+    - type: docker
+      run: |
+          echo "RSEM: `rsem-calculate-expression --version | sed -e 's/Current version: RSEM v//g'`" > /var/software_versions.txt && \
+          echo "STAR: `STAR --version`" >> /var/software_versions.txt && \
+          echo "bowtie2: `bowtie2 --version | grep -oP '\d+\.\d+\.\d+'`" >> /var/software_versions.txt && \
+          echo "bowtie: `bowtie --version | grep -oP 'bowtie-align-s version \K\d+\.\d+\.\d+'`" >> /var/software_versions.txt && \
+          echo "HISAT2: `hisat2 --version | grep -oP 'hisat2-align-s version \K\d+\.\d+\.\d+'`" >> /var/software_versions.txt
+runners:
+  - type: executable
+  - type: nextflow
+
+
diff --git a/src/rsem/rsem_calculate_expression/help.txt b/src/rsem/rsem_calculate_expression/help.txt
new file mode 100644
index 00000000..edfa3333
--- /dev/null
+++ b/src/rsem/rsem_calculate_expression/help.txt
@@ -0,0 +1,1002 @@
+NAME
+    rsem-calculate-expression - Estimate gene and isoform expression from
+    RNA-Seq data.
+
+SYNOPSIS
+     rsem-calculate-expression [options] upstream_read_file(s) reference_name sample_name 
+     rsem-calculate-expression [options] --paired-end upstream_read_file(s) downstream_read_file(s) reference_name sample_name 
+     rsem-calculate-expression [options] --alignments [--paired-end] input reference_name sample_name
+
+ARGUMENTS
+    upstream_read_files(s)
+        Comma-separated list of files containing single-end reads or
+        upstream reads for paired-end data. By default, these files are
+        assumed to be in FASTQ format. If the --no-qualities option is
+        specified, then FASTA format is expected.
+
+    downstream_read_file(s)
+        Comma-separated list of files containing downstream reads which are
+        paired with the upstream reads. By default, these files are assumed
+        to be in FASTQ format. If the --no-qualities option is specified,
+        then FASTA format is expected.
+
+    input
+        SAM/BAM/CRAM formatted input file. If "-" is specified for the
+        filename, the input is instead assumed to come from standard input.
+        RSEM requires all alignments of the same read group together. For
+        paired-end reads, RSEM also requires the two mates of any alignment
+        be adjacent. In addition, RSEM does not allow the SEQ and QUAL
+        fields to be empty. See Description section for how to make input
+        file obey RSEM's requirements.
+
+    reference_name
+        The name of the reference used. The user must have run
+        'rsem-prepare-reference' with this reference_name before running
+        this program.
+
+    sample_name
+        The name of the sample analyzed. All output files are prefixed by
+        this name (e.g., sample_name.genes.results)
+
+BASIC OPTIONS
+    --paired-end
+        Input reads are paired-end reads. (Default: off)
+
+    --no-qualities
+        Input reads do not contain quality scores. (Default: off)
+
+    --strandedness <none|forward|reverse>
+        This option defines the strandedness of the RNA-Seq reads. It
+        recognizes three values: 'none', 'forward', and 'reverse'. 'none'
+        refers to non-strand-specific protocols. 'forward' means all
+        (upstream) reads are derived from the forward strand. 'reverse'
+        means all (upstream) reads are derived from the reverse strand. If
+        'forward'/'reverse' is set, the '--norc'/'--nofw' Bowtie/Bowtie 2
+        option will also be enabled to avoid aligning reads to the opposite
+        strand. For Illumina TruSeq Stranded protocols, please use
+        'reverse'. (Default: 'none')
+
+    -p/--num-threads <int>
+        Number of threads to use. Both Bowtie/Bowtie2, expression estimation
+        and 'samtools sort' will use this many threads. (Default: 1)
+
+    --alignments
+        Input file contains alignments in SAM/BAM/CRAM format. The exact
+        file format will be determined automatically. (Default: off)
+
+    --fai <file>
+        If the header section of input alignment file does not contain
+        reference sequence information, this option should be turned on.
+        <file> is a FAI format file containing each reference sequence's
+        name and length. Please refer to the SAM official website for the
+        details of FAI format. (Default: off)
+
+    --bowtie2
+        Use Bowtie 2 instead of Bowtie to align reads. Since currently RSEM
+        does not handle indel, local and discordant alignments, the Bowtie2
+        parameters are set in a way to avoid those alignments. In
+        particular, we use options '--sensitive --dpad 0 --gbar 99999999
+        --mp 1,1 --np 1 --score-min L,0,-0.1' by default. The last parameter
+        of '--score-min', '-0.1', is the negative of maximum mismatch rate.
+        This rate can be set by option '--bowtie2-mismatch-rate'. If reads
+        are paired-end, we additionally use options '--no-mixed' and
+        '--no-discordant'. (Default: off)
+
+    --star
+        Use STAR to align reads. Alignment parameters are from ENCODE3's
+        STAR-RSEM pipeline. To save computational time and memory resources,
+        STAR's Output BAM file is unsorted. It is stored in RSEM's temporary
+        directory with name as 'sample_name.bam'. Each STAR job will have
+        its own private copy of the genome in memory. (Default: off)
+
+    --hisat2-hca
+        Use HISAT2 to align reads to the transcriptome according to Human
+        Cell Atlast SMART-Seq2 pipeline. In particular, we use HISAT
+        parameters "-k 10 --secondary --rg-id=$sampleToken --rg
+        SM:$sampleToken --rg LB:$sampleToken --rg PL:ILLUMINA --rg
+        PU:$sampleToken --new-summary --summary-file $sampleName.log
+        --met-file $sampleName.hisat2.met.txt --met 5 --mp 1,1 --np 1
+        --score-min L,0,-0.1 --rdg 99999999,99999999 --rfg 99999999,99999999
+        --no-spliced-alignment --no-softclip --seed 12345". If inputs are
+        paired-end reads, we additionally use parameters "--no-mixed
+        --no-discordant". (Default: off)
+
+    --append-names
+        If gene_name/transcript_name is available, append it to the end of
+        gene_id/transcript_id (separated by '_') in files
+        'sample_name.isoforms.results' and 'sample_name.genes.results'.
+        (Default: off)
+
+    --seed <uint32>
+        Set the seed for the random number generators used in calculating
+        posterior mean estimates and credibility intervals. The seed must be
+        a non-negative 32 bit integer. (Default: off)
+
+    --single-cell-prior
+        By default, RSEM uses Dirichlet(1) as the prior to calculate
+        posterior mean estimates and credibility intervals. However, much
+        less genes are expressed in single cell RNA-Seq data. Thus, if you
+        want to compute posterior mean estimates and/or credibility
+        intervals and you have single-cell RNA-Seq data, you are recommended
+        to turn on this option. Then RSEM will use Dirichlet(0.1) as the
+        prior which encourage the sparsity of the expression levels.
+        (Default: off)
+
+    --calc-pme
+        Run RSEM's collapsed Gibbs sampler to calculate posterior mean
+        estimates. (Default: off)
+
+    --calc-ci
+        Calculate 95% credibility intervals and posterior mean estimates.
+        The credibility level can be changed by setting
+        '--ci-credibility-level'. (Default: off)
+
+    -q/--quiet
+        Suppress the output of logging information. (Default: off)
+
+    -h/--help
+        Show help information.
+
+    --version
+        Show version information.
+
+OUTPUT OPTIONS
+    --sort-bam-by-read-name
+        Sort BAM file aligned under transcript coordidate by read name.
+        Setting this option on will produce deterministic maximum likelihood
+        estimations from independent runs. Note that sorting will take long
+        time and lots of memory. (Default: off)
+
+    --no-bam-output
+        Do not output any BAM file. (Default: off)
+
+    --sampling-for-bam
+        When RSEM generates a BAM file, instead of outputting all alignments
+        a read has with their posterior probabilities, one alignment is
+        sampled according to the posterior probabilities. The sampling
+        procedure includes the alignment to the "noise" transcript, which
+        does not appear in the BAM file. Only the sampled alignment has a
+        weight of 1. All other alignments have weight 0. If the "noise"
+        transcript is sampled, all alignments appeared in the BAM file
+        should have weight 0. (Default: off)
+
+    --output-genome-bam
+        Generate a BAM file, 'sample_name.genome.bam', with alignments
+        mapped to genomic coordinates and annotated with their posterior
+        probabilities. In addition, RSEM will call samtools (included in
+        RSEM package) to sort and index the bam file.
+        'sample_name.genome.sorted.bam' and
+        'sample_name.genome.sorted.bam.bai' will be generated. (Default:
+        off)
+
+    --sort-bam-by-coordinate
+        Sort RSEM generated transcript and genome BAM files by coordinates
+        and build associated indices. (Default: off)
+
+    --sort-bam-memory-per-thread <string>
+        Set the maximum memory per thread that can be used by 'samtools
+        sort'. <string> represents the memory and accepts suffices 'K/M/G'.
+        RSEM will pass <string> to the '-m' option of 'samtools sort'. Note
+        that the default used here is different from the default used by
+        samtools. (Default: 1G)
+
+ALIGNER OPTIONS
+    --seed-length <int>
+        Seed length used by the read aligner. Providing the correct value is
+        important for RSEM. If RSEM runs Bowtie, it uses this value for
+        Bowtie's seed length parameter. Any read with its or at least one of
+        its mates' (for paired-end reads) length less than this value will
+        be ignored. If the references are not added poly(A) tails, the
+        minimum allowed value is 5, otherwise, the minimum allowed value is
+        25. Note that this script will only check if the value >= 5 and give
+        a warning message if the value < 25 but >= 5. (Default: 25)
+
+    --phred33-quals
+        Input quality scores are encoded as Phred+33. This option is used by
+        Bowtie, Bowtie 2 and HISAT2. (Default: on)
+
+    --phred64-quals
+        Input quality scores are encoded as Phred+64 (default for GA
+        Pipeline ver. >= 1.3). This option is used by Bowtie, Bowtie 2 and
+        HISAT2. (Default: off)
+
+    --solexa-quals
+        Input quality scores are solexa encoded (from GA Pipeline ver. <
+        1.3). This option is used by Bowtie, Bowtie 2 and HISAT2. (Default:
+        off)
+
+    --bowtie-path <path>
+        The path to the Bowtie executables. (Default: the path to the Bowtie
+        executables is assumed to be in the user's PATH environment
+        variable)
+
+    --bowtie-n <int>
+        (Bowtie parameter) max # of mismatches in the seed. (Range: 0-3,
+        Default: 2)
+
+    --bowtie-e <int>
+        (Bowtie parameter) max sum of mismatch quality scores across the
+        alignment. (Default: 99999999)
+
+    --bowtie-m <int>
+        (Bowtie parameter) suppress all alignments for a read if > <int>
+        valid alignments exist. (Default: 200)
+
+    --bowtie-chunkmbs <int>
+        (Bowtie parameter) memory allocated for best first alignment
+        calculation (Default: 0 - use Bowtie's default)
+
+    --bowtie2-path <path>
+        (Bowtie 2 parameter) The path to the Bowtie 2 executables. (Default:
+        the path to the Bowtie 2 executables is assumed to be in the user's
+        PATH environment variable)
+
+    --bowtie2-mismatch-rate <double>
+        (Bowtie 2 parameter) The maximum mismatch rate allowed. (Default:
+        0.1)
+
+    --bowtie2-k <int>
+        (Bowtie 2 parameter) Find up to <int> alignments per read. (Default:
+        200)
+
+    --bowtie2-sensitivity-level <string>
+        (Bowtie 2 parameter) Set Bowtie 2's preset options in --end-to-end
+        mode. This option controls how hard Bowtie 2 tries to find
+        alignments. <string> must be one of "very_fast", "fast", "sensitive"
+        and "very_sensitive". The four candidates correspond to Bowtie 2's
+        "--very-fast", "--fast", "--sensitive" and "--very-sensitive"
+        options. (Default: "sensitive" - use Bowtie 2's default)
+
+    --star-path <path>
+        The path to STAR's executable. (Default: the path to STAR executable
+        is assumed to be in user's PATH environment variable)
+
+    --star-gzipped-read-file
+        (STAR parameter) Input read file(s) is compressed by gzip. (Default:
+        off)
+
+    --star-bzipped-read-file
+        (STAR parameter) Input read file(s) is compressed by bzip2.
+        (Default: off)
+
+    --star-output-genome-bam
+        (STAR parameter) Save the BAM file from STAR alignment under genomic
+        coordinate to 'sample_name.STAR.genome.bam'. This file is NOT sorted
+        by genomic coordinate. In this file, according to STAR's manual,
+        'paired ends of an alignment are always adjacent, and multiple
+        alignments of a read are adjacent as well'. (Default: off)
+
+    --hisat2-path <path>
+        The path to HISAT2's executable. (Default: the path to HISAT2
+        executable is assumed to be in user's PATH environment variable)
+
+ADVANCED OPTIONS
+    --tag <string>
+        The name of the optional field used in the SAM input for identifying
+        a read with too many valid alignments. The field should have the
+        format <tagName>:i:<value>, where a <value> bigger than 0 indicates
+        a read with too many alignments. (Default: "")
+
+    --fragment-length-min <int>
+        Minimum read/insert length allowed. This is also the value for the
+        Bowtie/Bowtie2 -I option. (Default: 1)
+
+    --fragment-length-max <int>
+        Maximum read/insert length allowed. This is also the value for the
+        Bowtie/Bowtie 2 -X option. (Default: 1000)
+
+    --fragment-length-mean <double>
+        (single-end data only) The mean of the fragment length distribution,
+        which is assumed to be a Gaussian. (Default: -1, which disables use
+        of the fragment length distribution)
+
+    --fragment-length-sd <double>
+        (single-end data only) The standard deviation of the fragment length
+        distribution, which is assumed to be a Gaussian. (Default: 0, which
+        assumes that all fragments are of the same length, given by the
+        rounded value of --fragment-length-mean)
+
+    --estimate-rspd
+        Set this option if you want to estimate the read start position
+        distribution (RSPD) from data. Otherwise, RSEM will use a uniform
+        RSPD. (Default: off)
+
+    --num-rspd-bins <int>
+        Number of bins in the RSPD. Only relevant when '--estimate-rspd' is
+        specified. Use of the default setting is recommended. (Default: 20)
+
+    --gibbs-burnin <int>
+        The number of burn-in rounds for RSEM's Gibbs sampler. Each round
+        passes over the entire data set once. If RSEM can use multiple
+        threads, multiple Gibbs samplers will start at the same time and all
+        samplers share the same burn-in number. (Default: 200)
+
+    --gibbs-number-of-samples <int>
+        The total number of count vectors RSEM will collect from its Gibbs
+        samplers. (Default: 1000)
+
+    --gibbs-sampling-gap <int>
+        The number of rounds between two successive count vectors RSEM
+        collects. If the count vector after round N is collected, the count
+        vector after round N + <int> will also be collected. (Default: 1)
+
+    --ci-credibility-level <double>
+        The credibility level for credibility intervals. (Default: 0.95)
+
+    --ci-memory <int>
+        Maximum size (in memory, MB) of the auxiliary buffer used for
+        computing credibility intervals (CI). (Default: 1024)
+
+    --ci-number-of-samples-per-count-vector <int>
+        The number of read generating probability vectors sampled per
+        sampled count vector. The credibility intervals are calculated by
+        first sampling P(C | D) and then sampling P(Theta | C) for each
+        sampled count vector. This option controls how many Theta vectors
+        are sampled per sampled count vector. (Default: 50)
+
+    --keep-intermediate-files
+        Keep temporary files generated by RSEM. RSEM creates a temporary
+        directory, 'sample_name.temp', into which it puts all intermediate
+        output files. If this directory already exists, RSEM overwrites all
+        files generated by previous RSEM runs inside of it. By default,
+        after RSEM finishes, the temporary directory is deleted. Set this
+        option to prevent the deletion of this directory and the
+        intermediate files inside of it. (Default: off)
+
+    --temporary-folder <string>
+        Set where to put the temporary files generated by RSEM. If the
+        folder specified does not exist, RSEM will try to create it.
+        (Default: sample_name.temp)
+
+    --time
+        Output time consumed by each step of RSEM to 'sample_name.time'.
+        (Default: off)
+
+PRIOR-ENHANCED RSEM OPTIONS
+    --run-pRSEM
+        Run prior-enhanced RSEM (pRSEM). Prior parameters, i.e. an
+        isoform's initial pseudo-count for RSEM's Gibbs sampling, will be
+        learned from input RNA-seq data and an external data set. When pRSEM
+        needs only ChIP-seq peak information to partition isoforms (e.g. in
+        pRSEM's default partition model), either a ChIP-seq peak file (with
+        the '--chipseq-peak-file' option) or ChIP-seq FASTQ files for target
+        and control plus the path to the Bowtie executables (with the
+        '--chipseq-target-read-files <string>',
+        '--chipseq-control-read-files <string>', and '--bowtie-path <path>'
+        options) are required. Otherwise, the ChIP-seq FASTQ files and the
+        path to the Bowtie executables are required. (Default: off)
+
+    --chipseq-peak-file <string>
+        Full path to a ChIP-seq peak file in ENCODE's narrowPeak, i.e.
+        BED6+4, format. This file is used when running prior-enhanced RSEM
+        in the default two-partition model. It partitions isoforms by
+        whether they have a ChIP-seq peak overlapping their transcription
+        start site region or not. Each partition will have its own prior
+        parameter learned from a training set. This file can be either
+        gzipped or ungzipped. (Default: "")
+
+    --chipseq-target-read-files <string>
+        Comma-separated full path of FASTQ read file(s) for ChIP-seq target.
+        This option is used when running prior-enhanced RSEM. It provides
+        information to calculate ChIP-seq peaks and signals. The file(s) can
+        be either ungzipped or gzipped with a suffix '.gz' or '.gzip'. The
+        options '--bowtie-path <path>' and '--chipseq-control-read-files
+        <string>' must be defined when this option is specified. (Default:
+        "")
+
+    --chipseq-control-read-files <string>
+        Comma-separated full path of FASTQ read file(s) for ChIP-seq control.
+        This option is used when running prior-enhanced RSEM. It provides
+        information to call ChIP-seq peaks. The file(s) can be either
+        ungzipped or gzipped with a suffix '.gz' or '.gzip'. The options
+        '--bowtie-path <path>' and '--chipseq-target-read-files <string>'
+        must be defined when this option is specified. (Default: "")
+
+    --chipseq-read-files-multi-targets <string>
+        Comma-separated full path of FASTQ read files for multiple ChIP-seq
+        targets. This option is used when running prior-enhanced RSEM, where
+        prior is learned from multiple complementary data sets. It provides
+        information to calculate ChIP-seq signals. All files can be either
+        ungzipped or gzipped with a suffix '.gz' or '.gzip'. When this
+        option is specified, the option '--bowtie-path <path>' must be
+        defined and the option '--partition-model <string>' will be set to
+        'cmb_lgt' automatically. (Default: "")
+
+    --chipseq-bed-files-multi-targets <string>
+        Comma-separated full path of BED files for multiple ChIP-seq
+        targets. This option is used when running prior-enhanced RSEM, where
+        prior is learned from multiple complementary data sets. It provides
+        information of ChIP-seq signals and must have at least the first six
+        BED columns. All files can be either ungzipped or gzipped with a
+        suffix '.gz' or '.gzip'. When this option is specified, the option
+        '--partition-model <string>' will be set to 'cmb_lgt' automatically.
+        (Default: "")
+
+    --cap-stacked-chipseq-reads
+        Keep a maximum number of ChIP-seq reads that aligned to the same
+        genomic interval. This option is used when running prior-enhanced
+        RSEM, where prior is learned from multiple complementary data sets.
+        This option is only in use when either
+        '--chipseq-read-files-multi-targets <string>' or
+        '--chipseq-bed-files-multi-targets <string>' is specified. (Default:
+        off)
+
+    --n-max-stacked-chipseq-reads <int>
+        The maximum number of stacked ChIP-seq reads to keep. This option is
+        used when running prior-enhanced RSEM, where prior is learned from
+        multiple complementary data sets. This option is only in use when
+        the option '--cap-stacked-chipseq-reads' is set. (Default: 5)
+
+    --partition-model <string>
+        A keyword to specify the partition model used by prior-enhanced
+        RSEM. It must be one of the following keywords:
+
+        - pk
+          Partitioned by whether an isoform has a ChIP-seq peak overlapping
+          with its transcription start site (TSS) region. The TSS region is
+          defined as [TSS-500bp, TSS+500bp]. For simplicity, we refer to
+          this type of peak as 'TSS peak' when explaining other keywords.
+
+        - pk_lgtnopk
+          First partitioned by TSS peak. Then, for isoforms in the 'no TSS
+          peak' set, a logistic model is employed to further classify them
+          into two partitions.
+
+        - lm3, lm4, lm5, or lm6
+          Based on their ChIP-seq signals, isoforms are classified into 3,
+          4, 5, or 6 partitions by a linear regression model.
+
+        - nopk_lm2pk, nopk_lm3pk, nopk_lm4pk, or nopk_lm5pk
+          First partitioned by TSS peak. Then, for isoforms in the 'with TSS
+          peak' set, a linear regression model is employed to further
+          classify them into 2, 3, 4, or 5 partitions.
+
+        - pk_lm2nopk, pk_lm3nopk, pk_lm4nopk, or pk_lm5nopk
+          First partitioned by TSS peak. Then, for isoforms in the 'no TSS
+          peak' set, a linear regression model is employed to further
+          classify them into 2, 3, 4, or 5 partitions.
+
+        - cmb_lgt
+          Uses a logistic regression to combine TSS signals from multiple
+          complementary data sets and partition training set isoforms into
+          'expressed' and 'not expressed'. This partition model is only in
+          use when either '--chipseq-read-files-multi-targets <string>' or
+          '--chipseq-bed-files-multi-targets <string>' is specified.
+
+        Parameters for all the above models are learned from a training set.
+        For detailed explanations, please see prior-enhanced RSEM's paper.
+        (Default: 'pk')
+
+DEPRECATED OPTIONS
+        The options in this section are deprecated. They are here only for
+        compatibility reasons and may be removed in future releases.
+
+    --sam
+        Inputs are alignments in SAM format. (Default: off)
+
+    --bam
+        Inputs are alignments in BAM format. (Default: off)
+
+    --strand-specific
+        Equivalent to '--strandedness forward'. (Default: off)
+
+    --forward-prob <double>
+        Probability of generating a read from the forward strand of a
+        transcript. Set to 1 for a strand-specific protocol where all
+        (upstream) reads are derived from the forward strand, 0 for a
+        strand-specific protocol where all (upstream) reads are derived from
+        the reverse strand, or 0.5 for a non-strand-specific protocol.
+        (Default: off)
+
+DESCRIPTION
+    In its default mode, this program aligns input reads against a reference
+    transcriptome with Bowtie and calculates expression values using the
+    alignments. RSEM assumes the data are single-end reads with quality
+    scores, unless the '--paired-end' or '--no-qualities' options are
+    specified. Alternatively, users can use STAR to align reads using the
+    '--star' option. RSEM has provided options in 'rsem-prepare-reference'
+    to prepare STAR's genome indices. Users may use an alternative aligner
+    by specifying '--alignments', and providing an alignment file in
+    SAM/BAM/CRAM format. However, users should make sure that they align
+    against the indices generated by 'rsem-prepare-reference' and the
+    alignment file satisfies the requirements mentioned in the ARGUMENTS
+    section.
+
+    One simple way to make the alignment file satisfy RSEM's requirements
+    is to use the 'convert-sam-for-rsem' script. This script accepts
+    SAM/BAM/CRAM files as input and outputs a BAM file. For example, type
+    the following command to convert a SAM file, 'input.sam', to a
+    ready-for-use BAM file, 'input_for_rsem.bam':
+
+      convert-sam-for-rsem input.sam input_for_rsem
+
+    For details, see the 'convert-sam-for-rsem' documentation page.
+
+NOTES
+    1. Users must run 'rsem-prepare-reference' with the appropriate
+    reference before using this program.
+
+    2. For single-end data, it is strongly recommended that the user provide
+    the fragment length distribution parameters (--fragment-length-mean and
+    --fragment-length-sd). For paired-end data, RSEM will automatically
+    learn a fragment length distribution from the data.
+
+    3. Some aligner parameters have default values different from their
+    original settings.
+
+    4. With the '--calc-pme' option, posterior mean estimates will be
+    calculated in addition to maximum likelihood estimates.
+
+    5. With the '--calc-ci' option, 95% credibility intervals and posterior
+    mean estimates will be calculated in addition to maximum likelihood
+    estimates.
+
+    6. The temporary directory and all intermediate files will be removed
+    when RSEM finishes unless '--keep-intermediate-files' is specified.
+
+    With the '--run-pRSEM' option and associated options (see section
+    'PRIOR-ENHANCED RSEM OPTIONS' above for details), prior-enhanced RSEM
+    will be run. Prior parameters will be learned from supplied external
+    data set(s) and assigned as initial pseudo-counts for isoforms in the
+    corresponding partition for Gibbs sampling.
+
+OUTPUT
+    sample_name.isoforms.results
+        File containing isoform level expression estimates. The first line
+        contains column names separated by the tab character. The format of
+        each line in the rest of this file is:
+
+        transcript_id gene_id length effective_length expected_count TPM
+        FPKM IsoPct [posterior_mean_count
+        posterior_standard_deviation_of_count pme_TPM pme_FPKM
+        IsoPct_from_pme_TPM TPM_ci_lower_bound TPM_ci_upper_bound
+        TPM_coefficient_of_quartile_variation FPKM_ci_lower_bound
+        FPKM_ci_upper_bound FPKM_coefficient_of_quartile_variation]
+
+        Fields are separated by the tab character. Fields within "[]" are
+        optional. They will not be present if neither '--calc-pme' nor
+        '--calc-ci' is set.
+
+        'transcript_id' is the transcript name of this transcript. 'gene_id'
+        is the gene name of the gene which this transcript belongs to
+        (denote this gene as its parent gene). If no gene information is
+        provided, 'gene_id' and 'transcript_id' are the same.
+
+        'length' is this transcript's sequence length (poly(A) tail is not
+        counted). 'effective_length' counts only the positions that can
+        generate a valid fragment. If no poly(A) tail is added,
+        'effective_length' is equal to transcript length - mean fragment
+        length + 1. If a transcript's effective length is less than 1,
+        both its effective length and its abundance estimates are set
+        to 0.
+
+        'expected_count' is the sum, over all reads, of the posterior
+        probability that each read comes from this transcript. Because 1)
+        each read aligning to this transcript has a probability of being
+        generated from background noise and 2) RSEM may filter out some
+        alignable low-quality reads, the expected counts of all transcripts
+        generally sum to less than the total number of reads aligned.
+
+        'TPM' stands for Transcripts Per Million. It is a relative measure
+        of transcript abundance. The sum of all transcripts' TPM is 1
+        million. 'FPKM' stands for Fragments Per Kilobase of transcript per
+        Million mapped reads. It is another relative measure of transcript
+        abundance. If we define l_bar to be the mean transcript length in
+        a sample, which can be calculated as
+
+        l_bar = \sum_i TPM_i / 10^6 * effective_length_i (i goes through
+        every transcript),
+
+        the following equation holds:
+
+        FPKM_i = 10^3 / l_bar * TPM_i.
+
+        We can see that the sum of FPKM is not a constant across samples.
+
+        'IsoPct' stands for isoform percentage. It is the percentage of this
+        transcript's abundance over its parent gene's abundance. If its
+        parent gene has only one isoform or the gene information is not
+        provided, this field will be set to 100.
+
+        'posterior_mean_count', 'pme_TPM', 'pme_FPKM' are posterior mean
+        estimates calculated by RSEM's Gibbs sampler.
+        'posterior_standard_deviation_of_count' is the posterior standard
+        deviation of counts. 'IsoPct_from_pme_TPM' is the isoform percentage
+        calculated from 'pme_TPM' values.
+
+        'TPM_ci_lower_bound', 'TPM_ci_upper_bound', 'FPKM_ci_lower_bound'
+        and 'FPKM_ci_upper_bound' are lower (l) and upper (u) bounds of 95%
+        credibility intervals for TPM and FPKM values. The bounds are
+        inclusive (i.e. [l, u]).
+
+        'TPM_coefficient_of_quartile_variation' and
+        'FPKM_coefficient_of_quartile_variation' are coefficients of
+        quartile variation (CQV) for TPM and FPKM values. CQV is a robust
+        way of measuring the ratio between the standard deviation and the
+        mean. It is defined as
+
+        CQV := (Q3 - Q1) / (Q3 + Q1),
+
+        where Q1 and Q3 are the first and third quartiles.
+
+    sample_name.genes.results
+        File containing gene level expression estimates. The first line
+        contains column names separated by the tab character. The format of
+        each line in the rest of this file is:
+
+        gene_id transcript_id(s) length effective_length expected_count TPM
+        FPKM [posterior_mean_count posterior_standard_deviation_of_count
+        pme_TPM pme_FPKM TPM_ci_lower_bound TPM_ci_upper_bound
+        TPM_coefficient_of_quartile_variation FPKM_ci_lower_bound
+        FPKM_ci_upper_bound FPKM_coefficient_of_quartile_variation]
+
+        Fields are separated by the tab character. Fields within "[]" are
+        optional. They will not be present if neither '--calc-pme' nor
+        '--calc-ci' is set.
+
+        'transcript_id(s)' is a comma-separated list of transcript_ids
+        belonging to this gene. If no gene information is provided,
+        'gene_id' and 'transcript_id(s)' are identical (the
+        'transcript_id').
+
+        A gene's 'length' and 'effective_length' are defined as the weighted
+        average of its transcripts' lengths and effective lengths (weighted
+        by 'IsoPct'). A gene's abundance estimates are just the sum of its
+        transcripts' abundance estimates.
+
+    sample_name.alleles.results
+        Only generated when the RSEM references are built with
+        allele-specific transcripts.
+
+        This file contains allele level expression estimates for
+        allele-specific expression calculation. The first line contains
+        column names separated by the tab character. The format of each line
+        in the rest of this file is:
+
+        allele_id transcript_id gene_id length effective_length
+        expected_count TPM FPKM AlleleIsoPct AlleleGenePct
+        [posterior_mean_count posterior_standard_deviation_of_count pme_TPM
+        pme_FPKM AlleleIsoPct_from_pme_TPM AlleleGenePct_from_pme_TPM
+        TPM_ci_lower_bound TPM_ci_upper_bound
+        TPM_coefficient_of_quartile_variation FPKM_ci_lower_bound
+        FPKM_ci_upper_bound FPKM_coefficient_of_quartile_variation]
+
+        Fields are separated by the tab character. Fields within "[]" are
+        optional. They will not be present if neither '--calc-pme' nor
+        '--calc-ci' is set.
+
+        'allele_id' is the allele-specific name of this allele-specific
+        transcript.
+
+        'AlleleIsoPct' stands for allele-specific percentage on isoform
+        level. It is the percentage of this allele-specific transcript's
+        abundance over its parent transcript's abundance. If its parent
+        transcript has only one allele variant form, this field will be set
+        to 100.
+
+        'AlleleGenePct' stands for allele-specific percentage on gene level.
+        It is the percentage of this allele-specific transcript's abundance
+        over its parent gene's abundance.
+
+        'AlleleIsoPct_from_pme_TPM' and 'AlleleGenePct_from_pme_TPM' have
+        similar meanings. They are calculated based on posterior mean
+        estimates.
+
+        Please note that if this file is present, the fields 'length' and
+        'effective_length' in 'sample_name.isoforms.results' should be
+        interpreted similarly to the corresponding definitions in
+        'sample_name.genes.results'.
+
+    sample_name.transcript.bam
+        Only generated when --no-bam-output is not specified.
+
+        'sample_name.transcript.bam' is a BAM-formatted file of read
+        alignments in transcript coordinates. The MAPQ field of each
+        alignment is set to min(100, floor(-10 * log10(1.0 - w) + 0.5)),
+        where w is the posterior probability of that alignment being the
+        true mapping of a read. In addition, RSEM adds a new tag ZW:f:value,
+        where value is a single-precision floating-point number giving the
+        posterior probability. Because this file contains all alignment
+        lines produced by Bowtie or user-specified aligners, it can also be
+        used as a replacement for the aligner-generated BAM/SAM file.
+
+    sample_name.transcript.sorted.bam and
+    sample_name.transcript.sorted.bam.bai
+        Only generated when --no-bam-output is not specified and
+        --sort-bam-by-coordinate is specified.
+
+        'sample_name.transcript.sorted.bam' and
+        'sample_name.transcript.sorted.bam.bai' are the sorted BAM file and
+        indices generated by samtools (included in RSEM package).
+
+    sample_name.genome.bam
+        Only generated when --no-bam-output is not specified and
+        --output-genome-bam is specified.
+
+        'sample_name.genome.bam' is a BAM-formatted file of read alignments
+        in genomic coordinates. Alignments of reads that have identical
+        genomic coordinates (i.e., alignments to different isoforms that
+        share the same genomic region) are collapsed into one alignment. The
+        MAPQ field of each alignment is set to min(100, floor(-10 *
+        log10(1.0 - w) + 0.5)), where w is the posterior probability of that
+        alignment being the true mapping of a read. In addition, RSEM adds a
+        new tag ZW:f:value, where value is a single-precision floating-point
+        number representing the posterior probability. If an alignment is
+        spliced, an XS:A:value tag is also added, where value is either '+'
+        or '-' indicating the strand of the transcript it aligns to.
+
+    sample_name.genome.sorted.bam and sample_name.genome.sorted.bam.bai
+        Only generated when --no-bam-output is not specified, and
+        --sort-bam-by-coordinate and --output-genome-bam are specified.
+
+        'sample_name.genome.sorted.bam' and
+        'sample_name.genome.sorted.bam.bai' are the sorted BAM file and
+        indices generated by samtools (included in RSEM package).
+
+    sample_name.time
+        Only generated when --time is specified.
+
+        It contains time (in seconds) consumed by aligning reads, estimating
+        expression levels and calculating credibility intervals.
+
+    sample_name.log
+        Only generated when --alignments is not specified.
+
+        It captures alignment statistics output by the user-specified
+        aligner.
+
+    sample_name.stat
+        This is a folder instead of a file. All model-related statistics are
+        stored in this folder. 'rsem-plot-model' can generate plots using
+        this folder.
+
+        'sample_name.stat/sample_name.cnt' contains alignment statistics.
+        The format and meanings of each field are described in
+        'cnt_file_description.txt' under RSEM directory.
+
+        'sample_name.stat/sample_name.model' stores RNA-Seq model parameters
+        learned from the data. The format and meanings of each field of this
+        file are described in 'model_file_description.txt' under RSEM
+        directory.
+
+        The following four output files will be generated only by
+        prior-enhanced RSEM:
+
+        - 'sample_name.stat/sample_name_prsem.all_tr_features'
+          It stores isoform features for deriving and assigning pRSEM prior.
+          The first line is a header and the rest is one isoform per line.
+          The description for each column is:
+
+          * trid: transcript ID from input annotation
+
+          * geneid: gene ID from input annotation
+
+          * chrom: isoform's chromosome name
+
+          * strand: isoform's strand name
+
+          * start: isoform's end with the lowest genomic loci
+
+          * end: isoform's end with the highest genomic loci
+
+          * tss_mpp: average mappability of [TSS-500bp, TSS+500bp], where
+            TSS is isoform's transcription start site, i.e. 5'-end
+
+          * body_mpp: average mappability of (TSS+500bp, TES-500bp), where
+            TES is isoform's transcription end site, i.e. 3'-end
+
+          * tes_mpp: average mappability of [TES-500bp, TES+500bp]
+
+          * pme_count: isoform's fragment or read count from RSEM's
+            posterior mean estimates
+
+          * tss: isoform's TSS loci
+
+          * tss_pk: equal to 1 if isoform's [TSS-500bp, TSS+500bp] region
+            overlaps with an RNA Pol II peak; 0 otherwise
+
+          * is_training: equal to 1 if isoform is in the training set where
+            Pol II prior is learned; 0 otherwise
+
+        - 'sample_name.stat/sample_name_prsem.all_tr_prior'
+          It stores prior parameters for every isoform. This file does not
+          have a header. Each line contains a prior parameter and an
+          isoform's transcript ID delimited by ` # `.
+
+        - 'sample_name.stat/sample_name_uniform_prior_1.isoforms.results'
+          RSEM's posterior mean estimates on the isoform level with an
+          initial pseudo-count of one for every isoform. It is in the same
+          format as the 'sample_name.isoforms.results'.
+
+        - 'sample_name.stat/sample_name_uniform_prior_1.genes.results'
+          RSEM's posterior mean estimates on the gene level with an initial
+          pseudo-count of one for every isoform. It is in the same format as
+          the 'sample_name.genes.results'.
+
+        When learning prior from multiple external data sets in
+        prior-enhanced RSEM, two additional output files will be generated.
+
+        - 'sample_name.stat/sample_name.pval_LL'
+          It stores a p-value and a log-likelihood. The p-value indicates
+          whether the combination of multiple complementary data sets is
+          informative for RNA-seq quantification. The log-likelihood shows
+          how well pRSEM's Dirichlet-multinomial model fits the read counts
+          of partitioned training set isoforms.
+
+        - 'sample_name.stat/sample_name.lgt_mdl.RData'
+          It stores an R object named 'glmmdl', which is a logistic
+          regression model on the training set isoforms and multiple
+          external data sets.
+
+        In addition, extra columns will be added to
+        'sample_name.stat/all_tr_features'
+
+        * is_expr: equal to 1 if isoform has an abundance >= 1 TPM and a
+          non-zero read count from RSEM's posterior mean estimates; 0
+          otherwise
+
+        * "$external_data_set_basename": log10 of external data's signal at
+          [TSS-500, TSS+500]. Signal is the number of reads aligned within
+          that interval and normalized to RPKM by read depth and interval
+          length. It will be set to -4 if no read aligned to that interval.
+
+          There are multiple columns like this one, where each represents an
+          external data set.
+
+        * prd_expr_prob: predicted probability from logistic regression
+          model on whether this isoform is expressed or not. A probability
+          higher than 0.5 is considered as expressed
+
+        * partition: group index, to which this isoform is partitioned
+
+        * prior: prior parameter for this isoform
+
+EXAMPLES
+    Assume the path to the bowtie executables is in the user's PATH
+    environment variable. Reference files are under '/ref' with name
+    'mouse_125'.
+
+    1) '/data/mmliver.fq', single-end reads with quality scores. Quality
+    scores are encoded as for 'GA pipeline version >= 1.3'. We want to use 8
+    threads and generate a genome BAM file. In addition, we want to append
+    gene/transcript names to the result files:
+
+     rsem-calculate-expression --phred64-quals \
+                               -p 8 \
+                               --append-names \
+                               --output-genome-bam \
+                               /data/mmliver.fq \
+                               /ref/mouse_125 \
+                               mmliver_single_quals
+
+    2) '/data/mmliver_1.fq' and '/data/mmliver_2.fq', stranded paired-end
+    reads with quality scores. Suppose the library is prepared using TruSeq
+    Stranded Kit, which means the first mate should map to the reverse
+    strand. Quality scores are in SANGER format. We want to use 8 threads
+    and do not generate a genome BAM file:
+
+     rsem-calculate-expression -p 8 \
+                               --paired-end \
+                               --strandedness reverse \
+                               /data/mmliver_1.fq \
+                               /data/mmliver_2.fq \
+                               /ref/mouse_125 \
+                               mmliver_paired_end_quals
+
+    3) '/data/mmliver.fa', single-end reads without quality scores. We want
+    to use 8 threads:
+
+     rsem-calculate-expression -p 8 \
+                               --no-qualities \
+                               /data/mmliver.fa \
+                               /ref/mouse_125 \
+                               mmliver_single_without_quals
+
+    4) Data are the same as 1). This time we assume the bowtie executables
+    are under '/sw/bowtie'. We want to take a fragment length distribution
+    into consideration. We set the fragment length mean to 150 and the
+    standard deviation to 35. In addition to a BAM file, we also want to
+    generate credibility intervals. We allow RSEM to use 1GB of memory for
+    CI calculation:
+
+     rsem-calculate-expression --bowtie-path /sw/bowtie \
+                               --phred64-quals \
+                               --fragment-length-mean 150.0 \
+                               --fragment-length-sd 35.0 \
+                               -p 8 \
+                               --output-genome-bam \
+                               --calc-ci \
+                               --ci-memory 1024 \
+                               /data/mmliver.fq \
+                               /ref/mouse_125 \
+                               mmliver_single_quals
+
+    5) '/data/mmliver_paired_end_quals.bam', BAM-formatted alignments for
+    paired-end reads with quality scores. We want to use 8 threads:
+
+     rsem-calculate-expression --paired-end \
+                               --alignments \
+                               -p 8 \
+                               /data/mmliver_paired_end_quals.bam \
+                               /ref/mouse_125 \
+                               mmliver_paired_end_quals
+
+    6) '/data/mmliver_1.fq.gz' and '/data/mmliver_2.fq.gz', paired-end reads
+    with quality scores and read files are compressed by gzip. We want to
+    use STAR to align reads and assume the STAR executable is '/sw/STAR'.
+    Suppose we want to use 8 threads and do not generate a genome BAM file:
+
+     rsem-calculate-expression --paired-end \
+                               --star \
+                               --star-path /sw/STAR \
+                               --star-gzipped-read-file \
+                               --paired-end \
+                               -p 8 \
+                               /data/mmliver_1.fq.gz \
+                               /data/mmliver_2.fq.gz \
+                               /ref/mouse_125 \
+                               mmliver_paired_end_quals
+
+    7) In the above example, suppose we want to run prior-enhanced RSEM
+    instead, and we want to learn priors from a ChIP-seq peak file
+    '/data/mmliver.narrowPeak.gz':
+
+     rsem-calculate-expression --star \
+                               --star-path /sw/STAR \
+                               --gzipped-read-file \
+                               --paired-end \
+                               --calc-pme \
+                               --run-pRSEM \
+                               --chipseq-peak-file /data/mmliver.narrowPeak.gz \
+                               -p 8 \
+                               /data/mmliver_1.fq.gz \
+                               /data/mmliver_2.fq.gz \
+                               /ref/mouse_125 \
+                               mmliver_paired_end_quals
+
+    8) Similar to the example in 7), suppose we want to use the partition
+    model 'pk_lm2nopk' (partitioning isoforms by Pol II TSS peak first and
+    then partitioning 'no TSS peak' isoforms into two bins by a linear
+    regression model), and we want to partition isoforms by RNA Pol II's
+    ChIP-seq read files '/data/mmliver_PolIIRep1.fq.gz' and
+    '/data/mmliver_PolIIRep2.fq.gz', and the control ChIP-seq read files
+    '/data/mmliver_ChIPseqCtrl.fq.gz'. Also, assuming Bowtie's executables
+    are under '/sw/bowtie/':
+
+     rsem-calculate-expression --star \
+                               --star-path /sw/STAR \
+                               --gzipped-read-file \
+                               --paired-end \
+                               --calc-pme \
+                               --run-pRSEM \
+                               --chipseq-target-read-files /data/mmliver_PolIIRep1.fq.gz,/data/mmliver_PolIIRep2.fq.gz \
+                               --chipseq-control-read-files /data/mmliver_ChIPseqCtrl.fq.gz \
+                               --partition-model pk_lm2nopk \
+                               --bowtie-path /sw/bowtie \
+                               -p 8 \
+                               /data/mmliver_1.fq.gz \
+                               /data/mmliver_2.fq.gz \
+                               /ref/mouse_125 \
+                               mmliver_paired_end_quals
+
+    9) Similar to the example in 8), suppose we want to derive priors from
+    four histone modification ChIP-seq read data sets:
+    '/data/H3K27Ac.fastq.gz', '/data/H3K4me1.fastq.gz',
+    '/data/H3K4me2.fastq.gz', and '/data/H3K4me3.fastq.gz'. Also, assuming
+    Bowtie's executables are under '/sw/bowtie/':
+
+     rsem-calculate-expression --star \
+                               --star-path /sw/STAR \
+                               --gzipped-read-file \
+                               --paired-end \
+                               --calc-pme \
+                               --run-pRSEM \
+                               --partition-model cmb_lgt \
+                               --chipseq-read-files-multi-targets /data/H3K27Ac.fastq.gz,/data/H3K4me1.fastq.gz,/data/H3K4me2.fastq.gz,/data/H3K4me3.fastq.gz \
+                               --bowtie-path /sw/bowtie \
+                               -p 8 \
+                               /data/mmliver_1.fq.gz \
+                               /data/mmliver_2.fq.gz \
+                               /ref/mouse_125 \
+                               mmliver_paired_end_quals
+
diff --git a/src/rsem/rsem_calculate_expression/script.sh b/src/rsem/rsem_calculate_expression/script.sh
new file mode 100644
index 00000000..e8c6ce5d
--- /dev/null
+++ b/src/rsem/rsem_calculate_expression/script.sh
@@ -0,0 +1,103 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+set -eo pipefail
+
+function clean_up {
+    rm -rf "$tmpdir"
+}
+trap clean_up EXIT
+
+tmpdir=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXXXX")
+
+if [ "$par_strandedness" == 'forward' ]; then
+    strandedness='--strandedness forward'
+elif [ "$par_strandedness" == 'reverse' ]; then
+    strandedness="--strandedness reverse"
+else
+    strandedness=''
+fi
+
+IFS=";" read -ra input <<< "$par_input"
+
+INDEX=$(find -L "$meta_resources_dir/$par_index" -name "*.grp" | sed 's/\.grp$//')
+
+unset_if_false=( par_paired par_quiet par_no_bam_output par_sampling_for_bam par_no_qualities 
+                 par_alignments par_bowtie2 par_star par_hisat2_hca par_append_names 
+                 par_single_cell_prior par_calc_pme par_calc_ci par_phred64_quals 
+                 par_solexa_quals par_star_gzipped_read_file par_star_bzipped_read_file 
+                 par_star_output_genome_bam par_estimate_rspd par_keep_intermediate_files 
+                 par_time par_run_pRSEM par_cap_stacked_chipseq_reads par_sort_bam_by_read_name )
+
+for par in "${unset_if_false[@]}"; do
+    test_val="${!par}"
+    [[ "$test_val" == "false" ]] && unset "$par"
+done
+
+rsem-calculate-expression \
+    ${par_quiet:+-q} \
+    ${par_no_bam_output:+--no-bam-output} \
+    ${par_sampling_for_bam:+--sampling-for-bam} \
+    ${par_no_qualities:+--no-qualities} \
+    ${par_alignments:+--alignments} \
+    ${par_bowtie2:+--bowtie2} \
+    ${par_star:+--star} \
+    ${par_hisat2_hca:+--hisat2-hca} \
+    ${par_append_names:+--append-names} \
+    ${par_single_cell_prior:+--single-cell-prior} \
+    ${par_calc_pme:+--calc-pme} \
+    ${par_calc_ci:+--calc-ci} \
+    ${par_phred64_quals:+--phred64-quals} \
+    ${par_solexa_quals:+--solexa-quals} \
+    ${par_star_gzipped_read_file:+--star-gzipped-read-file} \
+    ${par_star_bzipped_read_file:+--star-bzipped-read-file} \
+    ${par_star_output_genome_bam:+--star-output-genome-bam} \
+    ${par_estimate_rspd:+--estimate-rspd} \
+    ${par_keep_intermediate_files:+--keep-intermediate-files} \
+    ${par_time:+--time} \
+    ${par_run_pRSEM:+--run-pRSEM} \
+    ${par_cap_stacked_chipseq_reads:+--cap-stacked-chipseq-reads} \
+    ${par_sort_bam_by_read_name:+--sort-bam-by-read-name} \
+    ${par_counts_gene:+--counts-gene "$par_counts_gene"} \
+    ${par_counts_transcripts:+--counts-transcripts "$par_counts_transcripts"} \
+    ${par_stat:+--stat "$par_stat"} \
+    ${par_bam_star:+--bam-star "$par_bam_star"} \
+    ${par_bam_genome:+--bam-genome "$par_bam_genome"} \
+    ${par_bam_transcript:+--bam-transcript "$par_bam_transcript"} \
+    ${par_fai:+--fai "$par_fai"} \
+    ${par_seed:+--seed "$par_seed"} \
+    ${par_seed_length:+--seed-length "$par_seed_length"} \
+    ${par_bowtie_n:+--bowtie-n "$par_bowtie_n"} \
+    ${par_bowtie_e:+--bowtie-e "$par_bowtie_e"} \
+    ${par_bowtie_m:+--bowtie-m "$par_bowtie_m"} \
+    ${par_bowtie_chunkmbs:+--bowtie-chunkmbs "$par_bowtie_chunkmbs"} \
+    ${par_bowtie2_mismatch_rate:+--bowtie2-mismatch-rate "$par_bowtie2_mismatch_rate"} \
+    ${par_bowtie2_k:+--bowtie2-k "$par_bowtie2_k"} \
+    ${par_bowtie2_sensitivity_level:+--bowtie2-sensitivity-level "$par_bowtie2_sensitivity_level"} \
+    ${par_tag:+--tag "$par_tag"} \
+    ${par_fragment_length_min:+--fragment-length-min "$par_fragment_length_min"} \
+    ${par_fragment_length_max:+--fragment-length-max "$par_fragment_length_max"} \
+    ${par_fragment_length_mean:+--fragment-length-mean "$par_fragment_length_mean"} \
+    ${par_fragment_length_sd:+--fragment-length-sd "$par_fragment_length_sd"} \
+    ${par_num_rspd_bins:+--num-rspd-bins "$par_num_rspd_bins"} \
+    ${par_gibbs_burnin:+--gibbs-burnin "$par_gibbs_burnin"} \
+    ${par_gibbs_number_of_samples:+--gibbs-number-of-samples "$par_gibbs_number_of_samples"} \
+    ${par_gibbs_sampling_gap:+--gibbs-sampling-gap "$par_gibbs_sampling_gap"} \
+    ${par_ci_credibility_level:+--ci-credibility-level "$par_ci_credibility_level"} \
+    ${par_ci_number_of_samples_per_count_vector:+--ci-number-of-samples-per-count-vector "$par_ci_number_of_samples_per_count_vector"} \
+    ${par_temporary_folder:+--temporary-folder "$par_temporary_folder"} \
+    ${par_chipseq_peak_file:+--chipseq-peak-file "$par_chipseq_peak_file"} \
+    ${par_chipseq_target_read_files:+--chipseq-target-read-files "$par_chipseq_target_read_files"} \
+    ${par_chipseq_control_read_files:+--chipseq-control-read-files "$par_chipseq_control_read_files"} \
+    ${par_chipseq_read_files_multi_targets:+--chipseq-read-files-multi-targets "$par_chipseq_read_files_multi_targets"} \
+    ${par_chipseq_bed_files_multi_targets:+--chipseq-bed-files-multi-targets "$par_chipseq_bed_files_multi_targets"} \
+    ${par_n_max_stacked_chipseq_reads:+--n-max-stacked-chipseq-reads "$par_n_max_stacked_chipseq_reads"} \
+    ${par_partition_model:+--partition-model "$par_partition_model"} \
+    $strandedness \
+    ${par_paired:+--paired-end} \
+    "${input[@]}" \
+    "$INDEX" \
+    "$par_id"
+   
diff --git a/src/rsem/rsem_calculate_expression/test.sh b/src/rsem/rsem_calculate_expression/test.sh
new file mode 100644
index 00000000..c9ede884
--- /dev/null
+++ b/src/rsem/rsem_calculate_expression/test.sh
@@ -0,0 +1,116 @@
+#!/bin/bash
+
+echo ">>> Testing $meta_executable"
+
+test_dir="${meta_resources_dir}/test_data"
+
+# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/rsem.tar.gz
+# gunzip -k rsem.tar.gz
+# tar -xf rsem.tar
+# mv $test_dir/rsem $meta_resources_dir
+
+echo "> Prepare test data"
+
+cat > reads_R1.fastq <<'EOF'
+@SEQ_ID1
+ACGCTGCCTCATAAGCCTCACACAT
++
+IIIIIIIIIIIIIIIIIIIIIIIII
+@SEQ_ID2
+ACCCGCAAGATTAGGCTCCGTACAC
++
+!!!!!!!!!!!!!!!!!!!!!!!!!
+EOF
+
+cat > reads_R2.fastq <<'EOF'
+@SEQ_ID1
+ATGTGTGAGGCTTATGAGGCAGCGT
++
+IIIIIIIIIIIIIIIIIIIIIIIII
+@SEQ_ID2
+GTGTACGGAGCCTAATCTTGCAGGG
++
+!!!!!!!!!!!!!!!!!!!!!!!!!
+EOF
+
+cat > genome.fasta <<'EOF'
+>chr1
+TGGCATGAGCCAACGAACGCTGCCTCATAAGCCTCACACATCCGCGCCTATGTTGTGACTCTCTGTGAGCGTTCGTGGG
+GCTCGTCACCACTATGGTTGGCCGGTTAGTAGTGTGACTCCTGGTTTTCTGGAGCTTCTTTAAACCGTAGTCCAGTCAA
+TGCGAATGGCACTTCACGACGGACTGTCCTTAGGTGTGAGGCTTATGAGGCACTCAGGGGA
+EOF
+
+cat > genes.gtf <<'EOF'
+chr1	example_source	gene	0	50	.	+	.	gene_id "gene1"; transcript_id "transcript1";
+chr1	example_source	exon	20	40	.	+	.	gene_id "gene1"; transcript_id "transcript1";
+chr1	example_source	gene	100	219	.	+	.	gene_id "gene2"; transcript_id "transcript2";
+chr1	example_source	exon	191	210	.	+	.	gene_id "gene2"; transcript_id "transcript2";
+EOF
+
+cat > ref.cnt <<'EOF'
+1 0 0 1
+0 0 0
+0 3
+0	1
+Inf	0
+EOF
+
+cat > ref.genes.results <<'EOF'
+gene_id	transcript_id(s)	length	effective_length	expected_count	TPM	FPKM
+gene1	transcript1	21.00	21.00	0.00	0.00	0.00
+gene2	transcript2	20.00	20.00	0.00	0.00	0.00
+EOF
+
+cat > ref.isoforms.results <<'EOF'
+transcript_id	gene_id	length	effective_length	expected_count	TPM	FPKM	IsoPct
+transcript1	gene1	21	21.00	0.00	0.00	0.00	0.00
+transcript2	gene2	20	20.00	0.00	0.00	0.00	0.00
+EOF
+
+
+echo "> Generate index"
+
+rsem-prepare-reference \
+  --gtf "genes.gtf" \
+  "genome.fasta" \
+  "index"
+
+mkdir index
+mv index.* index/
+  
+STAR \
+  ${meta_cpus:+--runThreadN $meta_cpus} \
+  --runMode genomeGenerate \
+  --genomeDir "index/" \
+  --genomeFastaFiles "genome.fasta" \
+  --sjdbGTFfile "genes.gtf" \
+  --genomeSAindexNbases 2
+  
+#########################################################################################
+
+echo ">>> Test 1: Paired-end reads using STAR to align reads"
+"$meta_executable" \
+	--star \
+	--paired \
+	--input "reads_R1.fastq;reads_R2.fastq" \
+	--index index \
+	--id test \
+	--seed 1 \
+	--quiet
+
+echo ">>> Checking whether output exists"
+[ ! -f "test.genes.results" ] && echo "Gene level expression counts file does not exist!" && exit 1
+[ ! -s "test.genes.results" ] && echo "Gene level expression counts file is empty!" && exit 1
+[ ! -f "test.isoforms.results" ] && echo "Transcript level expression counts file does not exist!" && exit 1
+[ ! -s "test.isoforms.results" ] && echo "Transcript level expression counts file is empty!" && exit 1
+[ ! -d "test.stat" ] && echo "Stats directory does not exist!" && exit 1
+
+echo ">>> Check whether output is correct"
+diff ref.genes.results test.genes.results || { echo "Gene level expression counts file is incorrect!"; exit 1; }
+diff ref.isoforms.results test.isoforms.results || { echo "Transcript level expression counts file is incorrect!"; exit 1; }
+diff ref.cnt test.stat/test.cnt || { echo "Stats file is incorrect!"; exit 1; }
+
+#####################################################################################################
+
+echo "All tests succeeded!"
+exit 0

From bc9cc0a6ce4e0b87a4ce47561b4812b449e101ca Mon Sep 17 00:00:00 2001
From: Emma Rousseau <emmarou1@icloud.com>
Date: Thu, 19 Sep 2024 05:48:45 +0200
Subject: [PATCH 24/42] Kallisto quant (#152)

* initial commit dedup

* Revert "initial commit dedup"

This reverts commit 38f586bec0ac9e4312b016e29c3aa0bd53f292b2.

* complete component

* Update changelog

* add help.txt

* apply suggested changes (changelog, config)
---
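Note (not part of the commit): script.sh in this patch turns boolean
parameters into optional flags by first unsetting every variable whose value
is the string "false" and then expanding `${var:+--flag}`. The sketch below
illustrates that idiom under stated assumptions: the par_* values are
hypothetical stand-ins for what Viash would inject, and it uses `eval` and
positional parameters instead of bash's `${!var}` and arrays so that it also
runs under plain sh.

```shell
# Hypothetical stand-ins for the parameters the wrapper would receive.
par_single="false"
par_plaintext="true"
par_seed="42"

# "false" is a non-empty string, so ${par_single:+--single} would still emit
# the flag. Unset every boolean that is "false" before building the command.
for var in par_single par_plaintext; do
    eval "val=\${$var-}"
    if [ "$val" = "false" ]; then
        unset "$var"
    fi
done

# ${name:+word} expands to word only when name is set and non-empty.
set -- \
    ${par_single:+--single} \
    ${par_plaintext:+--plaintext} \
    ${par_seed:+--seed "$par_seed"}

cmdline="$*"
echo "kallisto quant $cmdline"
```

With the values above, only `--plaintext --seed 42` ends up on the command
line; `--single` is dropped because par_single was unset.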
 CHANGELOG.md                                  |  16 ++-
 src/kallisto/kallisto_quant/config.vsh.yaml   | 105 ++++++++++++++++++
 src/kallisto/kallisto_quant/help.txt          |  33 ++++++
 src/kallisto/kallisto_quant/script.sh         |  46 ++++++++
 src/kallisto/kallisto_quant/test.sh           |  53 +++++++++
 .../kallisto_quant/test_data/abundance_1.tsv  |   2 +
 .../kallisto_quant/test_data/abundance_2.tsv  |   2 +
 .../test_data/index/transcriptome.idx         | Bin 0 -> 1583 bytes
 .../kallisto_quant/test_data/reads/A_R1.fastq |   4 +
 .../kallisto_quant/test_data/reads/A_R2.fastq |   4 +
 .../kallisto_quant/test_data/script.sh        |  11 ++
 11 files changed, 270 insertions(+), 6 deletions(-)
 create mode 100644 src/kallisto/kallisto_quant/config.vsh.yaml
 create mode 100644 src/kallisto/kallisto_quant/help.txt
 create mode 100644 src/kallisto/kallisto_quant/script.sh
 create mode 100644 src/kallisto/kallisto_quant/test.sh
 create mode 100644 src/kallisto/kallisto_quant/test_data/abundance_1.tsv
 create mode 100644 src/kallisto/kallisto_quant/test_data/abundance_2.tsv
 create mode 100644 src/kallisto/kallisto_quant/test_data/index/transcriptome.idx
 create mode 100644 src/kallisto/kallisto_quant/test_data/reads/A_R1.fastq
 create mode 100644 src/kallisto/kallisto_quant/test_data/reads/A_R2.fastq
 create mode 100755 src/kallisto/kallisto_quant/test_data/script.sh

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9bfb5606..5d380e54 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -65,6 +65,16 @@
 
 * `fastqc`: High throughput sequence quality control analysis tool (PR #92).
 
+* `sortmerna`: Local sequence alignment tool for mapping, clustering, and filtering rRNA from
+  metatranscriptomic data (PR #146).
+
+* `fq_subsample`: Sample a subset of records from single or paired FASTQ files (PR #147).
+
+* `kallisto`:
+    - `kallisto_index`: Create a kallisto index (PR #149).
+    - `kallisto_quant`: Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads (PR #152).
+
+
 ## MINOR CHANGES
 
 * `busco` components: update BUSCO to `5.7.1` (PR #72).
@@ -161,13 +171,7 @@
     - `bedtools_getfasta`: extract sequences from a FASTA file for each of the
                            intervals defined in a BED/GFF/VCF file (PR #59).
 
-* `sortmerna`: Local sequence alignment tool for mapping, clustering, and filtering rRNA from metatranscriptomic 
-               data. (PR #146)
-
-* `fq_subsample`: Sample a subset of records from single or paired FASTQ files (PR #147).
 
-* `kallisto`:
-    - `kallisto_index`: Create a kallisto index (PR #149).
 
 
 ## MINOR CHANGES
diff --git a/src/kallisto/kallisto_quant/config.vsh.yaml b/src/kallisto/kallisto_quant/config.vsh.yaml
new file mode 100644
index 00000000..e92ac6b3
--- /dev/null
+++ b/src/kallisto/kallisto_quant/config.vsh.yaml
@@ -0,0 +1,105 @@
+name: kallisto_quant
+namespace: kallisto
+description: |
+  Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
+keywords: [kallisto, quant, pseudoalignment]
+links:
+  homepage: https://pachterlab.github.io/kallisto/about
+  documentation: https://pachterlab.github.io/kallisto/manual
+  repository: https://github.com/pachterlab/kallisto
+  issue_tracker: https://github.com/pachterlab/kallisto/issues
+references: 
+  doi: 10.1038/nbt.3519
+license: BSD 2-Clause License
+
+argument_groups:
+- name: "Input"
+  arguments:
+  - name: "--input"
+    type: file
+    description: List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively.
+    direction: "input"
+    multiple: true
+    required: true
+  - name: "--index"
+    alternatives: ["-i"]
+    type: file
+    description: Kallisto transcriptome index.
+    must_exist: true
+    required: true
+
+- name: "Output"
+  arguments:
+  - name: "--output_dir"
+    alternatives: ["-o"]
+    type: string
+    description: Directory to write output to.
+    required: true
+
+- name: "Options"
+  arguments:
+  - name: "--single"
+    type: boolean_true
+    description: Single end mode.
+  - name: "--single_overhang"
+    type: boolean_true
+    description: Include reads where unobserved rest of fragment is predicted to lie outside a transcript.
+  - name: "--fr_stranded"
+    type: boolean_true
+    description: Strand specific reads, first read forward.
+  - name: "--rf_stranded"
+    type: boolean_true
+    description: Strand specific reads, first read reverse.
+  - name: "--fragment_length"
+    alternatives: ["-l"]
+    type: double
+    description: The estimated average fragment length.
+  - name: "--sd"
+    alternatives: ["-s"]
+    type: double
+    description: |
+      The estimated standard deviation of the fragment length (default: -l, -s values are estimated 
+      from paired end data, but are required when using --single).
+  - name: "--plaintext"
+    type: boolean_true
+    description: Output plaintext instead of HDF5.
+  - name: "--bootstrap_samples"
+    alternatives: ["-b"]
+    type: integer
+    description: |
+      Number of bootstrap samples to draw. Default: '0'
+    example: 0
+  - name: "--seed"
+    type: integer
+    description: |
+      Random seed for bootstrap. Default: '42'
+    example: 42
+
+
+resources:
+- type: bash_script
+  path: script.sh
+
+test_resources:
+- type: bash_script
+  path: test.sh
+- type: file
+  path: test_data
+
+engines:
+  - type: docker
+    image: ubuntu:22.04
+    setup:
+      - type: docker
+        run: |
+          apt-get update && \
+          apt-get install -y --no-install-recommends wget && \
+          wget --no-check-certificate https://github.com/pachterlab/kallisto/releases/download/v0.50.1/kallisto_linux-v0.50.1.tar.gz && \
+          tar -xzf kallisto_linux-v0.50.1.tar.gz && \
+          mv kallisto/kallisto /usr/local/bin/
+      - type: docker
+        run: |
+          echo "kallisto: $(kallisto version | sed 's/kallisto, version //')" > /var/software_versions.txt
+runners:
+  - type: executable
+  - type: nextflow  
diff --git a/src/kallisto/kallisto_quant/help.txt b/src/kallisto/kallisto_quant/help.txt
new file mode 100644
index 00000000..7022571b
--- /dev/null
+++ b/src/kallisto/kallisto_quant/help.txt
@@ -0,0 +1,33 @@
+```
+kallisto quant
+```
+
+kallisto 0.50.1
+Computes equivalence classes for reads and quantifies abundances
+
+Usage: kallisto quant [arguments] FASTQ-files
+
+Required arguments:
+-i, --index=STRING            Filename for the kallisto index to be used for
+                              quantification
+-o, --output-dir=STRING       Directory to write output to
+
+Optional arguments:
+-b, --bootstrap-samples=INT   Number of bootstrap samples (default: 0)
+    --seed=INT                Seed for the bootstrap sampling (default: 42)
+    --plaintext               Output plaintext instead of HDF5
+    --single                  Quantify single-end reads
+    --single-overhang         Include reads where unobserved rest of fragment is
+                              predicted to lie outside a transcript
+    --fr-stranded             Strand specific reads, first read forward
+    --rf-stranded             Strand specific reads, first read reverse
+-l, --fragment-length=DOUBLE  Estimated average fragment length
+-s, --sd=DOUBLE               Estimated standard deviation of fragment length
+                              (default: -l, -s values are estimated from paired
+                               end data, but are required when using --single)
+-p, --priors                  Priors for the EM algorithm, either as raw counts or as
+                              probabilities. Pseudocounts are added to raw reads to
+                              prevent zero valued priors. Supplied in the same order
+                              as the transcripts in the transcriptome
+-t, --threads=INT             Number of threads to use (default: 1)
+    --verbose                 Print out progress information every 1M processed reads
\ No newline at end of file
diff --git a/src/kallisto/kallisto_quant/script.sh b/src/kallisto/kallisto_quant/script.sh
new file mode 100644
index 00000000..a7105cd1
--- /dev/null
+++ b/src/kallisto/kallisto_quant/script.sh
@@ -0,0 +1,46 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+set -eo pipefail
+
+unset_if_false=( par_single par_single_overhang par_rf_stranded par_fr_stranded par_plaintext )
+
+for var in "${unset_if_false[@]}"; do
+    temp_var="${!var}"
+    [[ "$temp_var" == "false" ]] && unset $var
+done
+
+IFS=";" read -ra input <<< "$par_input"
+
+# Check if par_single is not set and ensure even number of input files
+if [ -z "$par_single" ]; then
+    if [ $((${#input[@]} % 2)) -ne 0 ]; then
+        echo "Error: When running in paired-end mode, the number of input files must be even." >&2
+        echo "Number of input files provided: ${#input[@]}" >&2
+        exit 1
+    fi
+fi
+
+
+mkdir -p "$par_output_dir"
+
+
+kallisto quant \
+    ${meta_cpus:+--threads $meta_cpus} \
+    -i "$par_index" \
+    ${par_gtf:+--gtf "${par_gtf}"} \
+    ${par_single:+--single} \
+    ${par_single_overhang:+--single-overhang} \
+    ${par_fr_stranded:+--fr-stranded} \
+    ${par_rf_stranded:+--rf-stranded} \
+    ${par_plaintext:+--plaintext} \
+    ${par_bootstrap_samples:+--bootstrap-samples "${par_bootstrap_samples}"} \
+    ${par_fragment_length:+--fragment-length "${par_fragment_length}"} \
+    ${par_sd:+--sd "${par_sd}"} \
+    ${par_seed:+--seed "${par_seed}"} \
+    -o "$par_output_dir" \
+    "${input[@]}"
+
+
diff --git a/src/kallisto/kallisto_quant/test.sh b/src/kallisto/kallisto_quant/test.sh
new file mode 100644
index 00000000..28e2e3ad
--- /dev/null
+++ b/src/kallisto/kallisto_quant/test.sh
@@ -0,0 +1,53 @@
+#!/bin/bash
+
+echo ">>> Testing $meta_functionality_name"
+
+echo ">>> Test 1: Testing for paired-end reads"
+"$meta_executable" \
+  --index "$meta_resources_dir/test_data/index/transcriptome.idx" \
+  --rf_stranded \
+  --output_dir . \
+  --input "$meta_resources_dir/test_data/reads/A_R1.fastq;$meta_resources_dir/test_data/reads/A_R2.fastq"
+
+echo ">>> Checking whether output exists"
+[ ! -f "run_info.json" ] && echo "run_info.json does not exist!" && exit 1
+[ ! -s "run_info.json" ] && echo "run_info.json is empty!" && exit 1
+[ ! -f "abundance.tsv" ] && echo "abundance.tsv does not exist!" && exit 1
+[ ! -s "abundance.tsv" ] && echo "abundance.tsv is empty!" && exit 1
+[ ! -f "abundance.h5" ] && echo "abundance.h5 does not exist!" && exit 1
+[ ! -s "abundance.h5" ] && echo "abundance.h5 is empty!" && exit 1
+
+echo ">>> Checking if output is correct"
+diff "abundance.tsv" "$meta_resources_dir/test_data/abundance_1.tsv" || { echo "abundance.tsv is not correct"; exit 1; }
+
+rm -rf abundance.tsv abundance.h5 run_info.json
+
+################################################################################
+
+echo ">>> Test 2: Testing for single-end reads"
+"$meta_executable" \
+  --index "$meta_resources_dir/test_data/index/transcriptome.idx" \
+  --rf_stranded \
+  --output_dir . \
+  --single \
+  --input "$meta_resources_dir/test_data/reads/A_R1.fastq" \
+  --fragment_length 101 \
+  --sd 50
+
+echo ">>> Checking whether output exists"
+[ ! -f "run_info.json" ] && echo "run_info.json does not exist!" && exit 1
+[ ! -s "run_info.json" ] && echo "run_info.json is empty!" && exit 1
+[ ! -f "abundance.tsv" ] && echo "abundance.tsv does not exist!" && exit 1
+[ ! -s "abundance.tsv" ] && echo "abundance.tsv is empty!" && exit 1
+[ ! -f "abundance.h5" ] && echo "abundance.h5 does not exist!" && exit 1
+[ ! -s "abundance.h5" ] && echo "abundance.h5 is empty!" && exit 1
+
+echo ">>> Checking if output is correct"
+diff "abundance.tsv" "$meta_resources_dir/test_data/abundance_2.tsv" || { echo "abundance.tsv is not correct"; exit 1; }
+
+rm -rf abundance.tsv abundance.h5 run_info.json
+
+################################################################################
+
+echo "All tests succeeded!"
+exit 0
diff --git a/src/kallisto/kallisto_quant/test_data/abundance_1.tsv b/src/kallisto/kallisto_quant/test_data/abundance_1.tsv
new file mode 100644
index 00000000..1de99e54
--- /dev/null
+++ b/src/kallisto/kallisto_quant/test_data/abundance_1.tsv
@@ -0,0 +1,2 @@
+target_id	length	eff_length	est_counts	tpm
+Sheila	35	36	0	-nan
diff --git a/src/kallisto/kallisto_quant/test_data/abundance_2.tsv b/src/kallisto/kallisto_quant/test_data/abundance_2.tsv
new file mode 100644
index 00000000..6b3e9055
--- /dev/null
+++ b/src/kallisto/kallisto_quant/test_data/abundance_2.tsv
@@ -0,0 +1,2 @@
+target_id	length	eff_length	est_counts	tpm
+Sheila	35	15.0373	0	-nan
diff --git a/src/kallisto/kallisto_quant/test_data/index/transcriptome.idx b/src/kallisto/kallisto_quant/test_data/index/transcriptome.idx
new file mode 100644
index 0000000000000000000000000000000000000000..194fec14b9b858e324345e535e00764a36100db7
GIT binary patch
literal 1583
zcmd;OfPjTinh{9b$1B#!18H#}2Jt~a8A36bm2ogIJfAt+63T~DAce8EHEN$uybqZC
zfg=W{5v~BrV208#c|}Et0E`c#VftWv7=48WCIhA&B!LvnOc?C|Rl)?N*?=^%HkesZ
zX$A)<1EwA(4x?e}ahVTO2ct*TLqcLSJR#vQnjS{e1FUQS(dg*`Sq@nqrq10t#1V*{
z9o-$_AjH`nh=7E<vpG~hSUD?@!wD0B833cPN+<(WgGxvc2+l~&%t;i2a2Tu%5N;I!
Giva)u(jTM%

literal 0
HcmV?d00001

diff --git a/src/kallisto/kallisto_quant/test_data/reads/A_R1.fastq b/src/kallisto/kallisto_quant/test_data/reads/A_R1.fastq
new file mode 100644
index 00000000..999ed649
--- /dev/null
+++ b/src/kallisto/kallisto_quant/test_data/reads/A_R1.fastq
@@ -0,0 +1,4 @@
+@1
+GCTAGCTCAGAAAAAAAAAATCGTCGCGTGCGCGT
++
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
diff --git a/src/kallisto/kallisto_quant/test_data/reads/A_R2.fastq b/src/kallisto/kallisto_quant/test_data/reads/A_R2.fastq
new file mode 100644
index 00000000..999ed649
--- /dev/null
+++ b/src/kallisto/kallisto_quant/test_data/reads/A_R2.fastq
@@ -0,0 +1,4 @@
+@1
+GCTAGCTCAGAAAAAAAAAATCGTCGCGTGCGCGT
++
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
diff --git a/src/kallisto/kallisto_quant/test_data/script.sh b/src/kallisto/kallisto_quant/test_data/script.sh
new file mode 100755
index 00000000..6d684b29
--- /dev/null
+++ b/src/kallisto/kallisto_quant/test_data/script.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+# clone repo
+if [ ! -d /tmp/snakemake-wrappers ]; then
+  git clone --depth 1 --single-branch --branch master https://github.com/snakemake/snakemake-wrappers /tmp/snakemake-wrappers
+fi
+
+# copy test data
+cp -r /tmp/snakemake-wrappers/bio/kallisto/quant/test/* src/kallisto/kallisto_quant/test_data
+
+rm src/kallisto/kallisto_quant/test_data/Snakefile
\ No newline at end of file

From 63de1c1a145e4535d047cbc285724790546022a0 Mon Sep 17 00:00:00 2001
From: Sai Nirmayi Yasa <92786623+sainirmayi@users.noreply.github.com>
Date: Mon, 23 Sep 2024 14:46:08 +0200
Subject: [PATCH 25/42] Add trimgalore (#117)

* add trimgalore

* fix test

* make output arguments optional

* fix script

* fix script and update test

* update changelog

* apply code review suggestions

* separate input fastqc file arguments from other arguments

* apply suggested change

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
---
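Note (not part of the commit): the `--quality` description in config.vsh.yaml
summarises the BWA-style trimming rule in one sentence (subtract INT from all
qualities, compute partial sums from every index to the end, cut at the index
where the sum is minimal). The sketch below follows that sentence literally.
It is not Cutadapt's code: `trim_keep` is a hypothetical helper that takes
Phred scores as integers, and both the tie handling and the choice to keep the
whole read when no partial sum is negative are assumptions.

```shell
# trim_keep CUTOFF Q1 Q2 ...: print how many leading bases to keep.
# Hypothetical helper; takes Phred scores as integers, not a FASTQ string.
trim_keep() {
    cutoff=$1
    shift
    n=$#

    # Sum of (quality - cutoff) over the whole read.
    total=0
    for q in "$@"; do
        total=$((total + q - cutoff))
    done

    # The partial sum from index i to the end equals total minus the sum of
    # the shifted qualities before i. Keep the index with the smallest
    # partial sum. Assumption: if no partial sum is negative, trim nothing.
    prefix=0
    i=0
    best=0
    keep=$n
    for q in "$@"; do
        suffix=$((total - prefix))
        if [ "$suffix" -lt "$best" ]; then
            best=$suffix
            keep=$i
        fi
        prefix=$((prefix + q - cutoff))
        i=$((i + 1))
    done
    echo "$keep"
}

kept=$(trim_keep 20 40 40 30 10 5 2)
echo "keep the first $kept bases"
```

For the qualities 40 40 30 10 5 2 and a cutoff of 20, the shifted values are
20 20 10 -10 -15 -18, the smallest partial sum (-43) starts at index 3, and
the first 3 bases are kept.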
 CHANGELOG.md                   |   2 +
 src/trimgalore/config.vsh.yaml | 297 +++++++++++++++++++++++++++
 src/trimgalore/help.txt        | 355 +++++++++++++++++++++++++++++++++
 src/trimgalore/script.sh       | 126 ++++++++++++
 src/trimgalore/test.sh         | 125 ++++++++++++
 5 files changed, 905 insertions(+)
 create mode 100644 src/trimgalore/config.vsh.yaml
 create mode 100644 src/trimgalore/help.txt
 create mode 100755 src/trimgalore/script.sh
 create mode 100644 src/trimgalore/test.sh

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5d380e54..0613fa25 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -75,6 +75,8 @@
     - `kallisto_quant`: Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads (PR #152).
 
 
+* `trimgalore`: Quality and adapter trimming for fastq files (PR #117). 
+
 ## MINOR CHANGES
 
 * `busco` components: update BUSCO to `5.7.1` (PR #72).
diff --git a/src/trimgalore/config.vsh.yaml b/src/trimgalore/config.vsh.yaml
new file mode 100644
index 00000000..ae12fb10
--- /dev/null
+++ b/src/trimgalore/config.vsh.yaml
@@ -0,0 +1,297 @@
+name: trimgalore
+description: | 
+  A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files. 
+keywords: ["trimming", "adapters"]
+links:
+  homepage: https://github.com/FelixKrueger/TrimGalore
+  documentation: https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/Trim_Galore_User_Guide.md
+  repository: https://github.com/FelixKrueger/TrimGalore
+references: 
+  doi: 10.5281/zenodo.7598955
+license: GPL-3.0 
+requirements:
+  commands: [trim_galore]
+authors:
+  - __merge__: /src/_authors/sai_nirmayi_yasa.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Input
+    arguments:   
+      - name: "--input"
+        type: file
+        description: Input files. Note that paired-end files need to be supplied in a pairwise fashion, e.g. file1_1.fq file1_2.fq SRR2_1.fq.gz SRR2_2.fq.gz
+        required: true
+        multiple: true
+        example: sample1_r1.fq;sample1_r2.fq;sample2_r1.fq;sample2_r2.fq
+  - name: Trimming options
+    arguments: 
+      - name: --quality
+        alternatives: -q
+        type: integer
+        description: Trim low-quality ends (below the specified Phred score) from reads in addition to adapter removal. For RRBS samples, quality trimming will be performed first, and adapter trimming is carried out in a second round. Other files are quality and adapter trimmed in a single pass. The algorithm is the same as the one used by BWA (Subtract INT from all qualities; compute partial sums from all indices to the end of the sequence; cut sequence at the index at which the sum is minimal).
+        example: 20
+      - name: --phred33
+        type: boolean_true
+        description: Instructs Cutadapt to use ASCII+33 quality scores as Phred scores (Sanger/Illumina 1.9+ encoding) for quality trimming. 
+      - name: --phred64
+        type: boolean_true
+        description: Instructs Cutadapt to use ASCII+64 quality scores as Phred scores (Illumina 1.5 encoding) for quality trimming.
+      - name: --fastqc
+        type: boolean_true
+        description: Run FastQC in the default mode on the FastQ file once trimming is complete.
+      - name: --fastqc_args
+        type: string
+        description: Passes extra arguments (excluding files) to FastQC. If more than one argument is to be passed to FastQC they must be in the form "arg1 arg2 ...". Passing extra arguments will automatically invoke FastQC, so --fastqc does not have to be specified separately.
+        example: "--nogroup --noextract"
+      - name: --fastqc_contaminants
+        type: file
+        description: Specifies a non-default file which contains the list of contaminants for FastQC to screen overrepresented sequences against. The file must contain sets of named contaminants in the form name[tab]sequence. Lines prefixed with a hash will be ignored.
+        example: "contaminants.txt"
+      - name: --fastqc_adapters
+        type: file
+        description: Specifies a non-default file which contains the list of adapter sequences which FastQC will explicitly search against the library. The file must contain sets of named adapters in the form name[tab]sequence. Lines prefixed with a hash will be ignored.
+        example: "adapters.txt"
+      - name: --fastqc_limits
+        type: file
+        description: Specifies a non-default file which contains a set of criteria which FastQC will use to determine the warn/error limits for the various modules. This file can also be used to selectively remove some modules from the output altogether. The format needs to mirror the default limits.txt file found in the Configuration folder.
+        example: "limits.txt"
+      - name: --adapter
+        alternatives: -a
+        type: string
+        description: |
+          Adapter sequence to be trimmed. If not specified explicitly, Trim Galore will try to auto-detect whether the Illumina universal, Nextera transposase or Illumina small RNA adapter sequence was used. A single base may also be given as e.g. -a A{10}, to be expanded to -a AAAAAAAAAA. 
+          At a special request, multiple adapters can also be specified like so: 
+            -a  " AGCTCCCG -a TTTCATTATAT -a TTTATTCGGATTTAT" -a2 " AGCTAGCG -a TCTCTTATAT -a TTTCGGATTTAT", 
+          or so:
+            -a "file:../multiple_adapters.fa" -a2 "file:../different_adapters.fa"
+          Potentially in conjunction with the parameter "-n 3" to trim all adapters.
+        example: AGCTCCCG
+      - name: --adapter2 
+        alternatives: -a2
+        type: string
+        description: Optional adapter sequence to be trimmed off read 2 of paired-end files. This option requires '--paired' to be specified as well. If the libraries to be trimmed are smallRNA then a2 will be set to the Illumina small RNA 5' adapter automatically (GATCGTCGGACT). A single base may also be given as e.g. -a2 A{10}, to be expanded to -a2 AAAAAAAAAA.
+        required: false
+        example: AGCTCCCG
+      - name: --illumina
+        type: boolean_true
+        description: Adapter sequence to be trimmed is the first 13bp of the Illumina universal adapter 'AGATCGGAAGAGC' instead of the default auto-detection of adapter sequence.
+      - name: --stranded_illumina 
+        type: boolean_true
+        description: Adapter sequence to be trimmed is the first 13bp of the Illumina stranded mRNA or Total RNA adapter 'ACTGTCTCTTATA' instead of the default auto-detection of adapter sequence. 
+      - name: --nextera
+        type: boolean_true
+        description: Adapter sequence to be trimmed is the first 12bp of the Nextera adapter 'CTGTCTCTTATA' instead of the default auto-detection of adapter sequence.
+      - name: --small_rna 
+        type: boolean_true
+        description: Adapter sequence to be trimmed is the first 12bp of the Illumina Small RNA 3' Adapter 'TGGAATTCTCGG' instead of the default auto-detection of adapter sequence. Selecting to trim smallRNA adapters will also lower the --length value to 18bp. If the smallRNA libraries are paired-end then a2 will be set to the Illumina small RNA 5' adapter automatically (GATCGTCGGACT) unless -a2 has been defined explicitly.
+      - name: --consider_already_trimmed 
+        type: integer
+        description: During adapter auto-detection, the limit set by this argument allows the user to set a threshold up to which the file is considered already adapter-trimmed. If no adapter sequence exceeds this threshold, no additional adapter trimming will be performed (technically, the adapter is set to '-a X'). Quality trimming is still performed as usual.
+        required: false
+      - name: --max_length 
+        type: integer
+        description: Discard reads that are longer than the specified value after trimming. This is only advised for smallRNA sequencing to remove non-small RNA sequences.
+        required: false
+      - name: --stringency 
+        type: integer
+        description: Overlap with adapter sequence required to trim a sequence. Defaults to a very stringent setting of 1, i.e. even a single bp of overlapping sequence will be trimmed off from the 3' end of any read.
+        required: false
+        example: 1
+      - name: --error_rate
+        alternatives: -e 
+        type: double
+        description: Maximum allowed error rate (no. of errors divided by the length of the matching region)
+        required: false
+        example: 0.1
+      - name: --gzip
+        type: boolean_true
+        description: Compress the output file with GZIP. If the input files are GZIP-compressed the output files will automatically be GZIP compressed as well. As of v0.2.8 the compression will take place on the fly.
+      - name: --dont_gzip 
+        type: boolean_true
+        description: Output files won't be compressed with GZIP. This option overrides --gzip.
+      - name: --length 
+        type: integer 
+        description: Discard reads that became shorter than the specified length because of either quality or adapter trimming. A value of '0' effectively disables this behaviour. For paired-end files, both reads of a read-pair need to be longer than the specified length to be printed out to validated paired-end files. If only one read became too short there is the possibility of keeping such unpaired single-end reads using the --retain_unpaired option.
+        required: false
+        example: 20 
+      - name: --max_n 
+        type: integer
+        description: The total number of Ns a read may contain before it will be removed altogether. In a paired-end setting, either read exceeding this limit will result in the entire pair being removed from the trimmed output files. If the value is a number between 0 and 1, it is interpreted as a fraction of the read length.
+        required: false
+      - name: --trim_n
+        type: boolean_true
+        description: Removes Ns from either side of the read. This option currently does not work in RRBS mode.
+      - name: --no_report_file
+        type: boolean_true
+        description: If specified no report file will be generated.
+      - name: --suppress_warn
+        type: boolean_true
+        description: If specified any output to STDOUT or STDERR will be suppressed.
+      - name: --clip_R1
+        type: integer
+        description: Instructs Trim Galore to remove the given number of bp from the 5' end of read 1 (or single-end reads). This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5' end.
+        required: false
+      - name: --clip_R2 
+        type: integer
+        description: Instructs Trim Galore to remove the given number of bp from the 5' end of read 2 (paired-end reads only). This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5' end. For paired-end BS-Seq, it is recommended to remove the first few bp because the end-repair reaction may introduce a bias towards low methylation.
+        required: false
+      - name: --three_prime_clip_R1 
+        type: integer
+        description: Instructs Trim Galore to remove the specified number of bp from the 3' end of read 1 (or single-end reads) AFTER adapter/quality trimming has been performed. This may remove some unwanted bias from the 3' end that is not directly related to adapter sequence or basecall quality.
+        required: false
+      - name: --three_prime_clip_R2 
+        type: integer
+        description: Instructs Trim Galore to remove the specified number of bp from the 3' end of read 2 AFTER adapter/quality trimming has been performed. This may remove some unwanted bias from the 3' end that is not directly related to adapter sequence or basecall quality.
+        required: false
+      - name: --nextseq 
+        type: integer 
+        description: This enables the option '--nextseq-trim=3'CUTOFF' within Cutadapt, which will set a quality cutoff (that is normally given with -q instead), but qualities of G bases are ignored. This trimming is common for the NextSeq- and NovaSeq-platforms, where basecalls without any signal are called as high-quality G bases. This is mutually exclusive with '-q INT'.
+        required: false
+      - name: --basename
+        type: string
+        description: Use specified name (PREFERRED_NAME) as the basename for output files, instead of deriving the filenames from the input files. Single-end data would be called PREFERRED_NAME_trimmed.fq(.gz), or PREFERRED_NAME_val_1.fq(.gz) and PREFERRED_NAME_val_2.fq(.gz) for paired-end data. --basename only works when 1 file (single-end) or 2 files (paired-end) are specified, but not for longer lists.
+        required: false
+  - name: Specific trimming options without adapter/quality trimming
+    arguments: 
+      - name: --hardtrim5 
+        type: integer
+        description: Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to <int> bp at the 5'-end. Once hard-trimming of files is complete, Trim Galore will exit. Hard-trimmed output files will end in .<int>_5prime.fq(.gz). 
+        required: false
+      - name: --hardtrim3 
+        type: integer
+        description: Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to <int> bp at the 3'-end. Once hard-trimming of files is complete, Trim Galore will exit. Hard-trimmed output files will end in .<int>_3prime.fq(.gz). 
+        required: false
+      - name: --clock
+        type: boolean_true
+        description: In this mode, reads are trimmed in a specific way that is currently used for the Mouse Epigenetic Clock.
+      - name: --polyA
+        type: boolean_true
+        description: This is a new, still experimental, trimming mode to identify and remove poly-A tails from sequences. When --polyA is selected, Trim Galore attempts to identify from the first supplied sample whether sequences more often contain a stretch of either 'AAAAAAAAAA' or 'TTTTTTTTTT'. This determines if Read 1 of a paired-end file, or single-end files, are trimmed for PolyA or PolyT. In case of paired-end sequencing, Read2 is trimmed for the complementary base from the start of the reads. The auto-detection uses a default of A{20} for Read1 (3'-end trimming) and T{150} for Read2 (5'-end trimming). These values may be changed manually using the options -a and -a2. In addition to trimming the sequences, white spaces are replaced with _ and it records in the read ID how many bases were trimmed so it can later be used to identify PolyA trimmed sequences. This is currently done by writing tags to both the start ("32:A:") and end ("_PolyA:32") of the reads. The poly-A trimming mode expects that sequences were both adapter and quality trimmed before looking for Poly-A tails, and it is the user's responsibility to carry out an initial round of trimming.
+      - name: --implicon    
+        type: boolean_true
+        description: | 
+          This is a special mode of operation for paired-end data, such as required for the IMPLICON method, where a UMI sequence is transferred from the start of Read 2 to the readID of both reads. Following this, Trim Galore will exit. In its current implementation, the UMI carrying reads come in the following format:
+            Read 1  5' FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF 3'
+            Read 2  3' UUUUUUUUFFFFFFFFFFFFFFFFFFFFFFFFFFFF 5'
+          Where UUUUUUUU is a random 8-mer unique molecular identifier (UMI) and FFFFFFF... is the actual fragment to be sequenced. The UMI of Read 2 (R2) is written into the read ID of both reads and removed from the actual sequence.
+  - name: RRBS-specific options
+    arguments: 
+      - name: --rrbs 
+        type: boolean_true
+        description: Specifies that the input file was an MspI digested RRBS sample (recognition site is CCGG). Single-end or Read 1 sequences (paired-end) which were adapter-trimmed will have a further 2 bp removed from their 3' end. Sequences which were merely trimmed because of poor quality will not be shortened further. Read 2 of paired-end libraries will in addition have the first 2 bp removed from the 5' end (by setting '--clip_r2 2'). This is to avoid using artificial methylation calls from the filled-in cytosine positions close to the 3' MspI site in sequenced fragments. This option is not recommended for users of the Tecan Ovation RRBS Methyl-Seq with TrueMethyl oxBS 1-16 kit (see below).
+      - name: --non_directional
+        type: boolean_true
+        description: Selecting this option for non-directional RRBS libraries will screen quality-trimmed sequences for 'CAA' or 'CGA' at the start of the read and, if found, removes the first two basepairs. Like with the option '--rrbs' this avoids using cytosine positions that were filled-in during the end-repair step. '--non_directional' requires '--rrbs' to be specified as well. Note that this option does not set '--clip_r2 2' in paired-end mode.
+      - name: --keep 
+        type: boolean_true
+        description: Keep the quality trimmed intermediate file. 
+  - name: Paired-end specific options
+    arguments: 
+      - name: --paired 
+        type: boolean_true
+        description: This option performs length trimming of quality/adapter/RRBS trimmed reads for paired-end files. To pass the validation test, both sequences of a sequence pair are required to have a certain minimum length which is governed by the option --length (see above). If only one read passes this length threshold the other read can be rescued (see option --retain_unpaired). Using this option lets you discard too short read pairs without disturbing the sequence-by-sequence order of FastQ files which is required by many aligners. Trim Galore expects paired-end files to be supplied in a pairwise fashion, e.g. file1_1.fq file1_2.fq SRR2_1.fq.gz SRR2_2.fq.gz ... .
+      - name: --retain_unpaired  
+        type: boolean_true
+        description: If only one of the two paired-end reads became too short, the longer read will be written to either '.unpaired_1.fq' or '.unpaired_2.fq' output files. The length cutoff for unpaired single-end reads is governed by the parameters -r1/--length_1 and -r2/--length_2. 
+      - name: --length_1
+        alternatives: -r1
+        type: integer 
+        description: Unpaired single-end read length cutoff needed for read 1 to be written to '.unpaired_1.fq' output file. These reads may be mapped in single-end mode.
+        example: 35 
+        required: false
+      - name: --length_2
+        alternatives: -r2
+        type: integer 
+        description: Unpaired single-end read length cutoff needed for read 2 to be written to '.unpaired_2.fq' output file. These reads may be mapped in single-end mode.
+        required: false
+        example: 35   
+  - name: Output
+    arguments:
+      - name: --output_dir
+        alternatives: -o
+        type: file
+        description: If specified all output will be written to this directory instead of the current directory. 
+        direction: output
+        required: true
+        default: trimmed_output
+      - name: --trimmed_r1
+        type: file
+        required: false
+        description: Output file for read 1. Only works when 1 file (single-end) or 2 files (paired-end) are specified, but not for longer lists.
+        direction: output
+        example: read_1.fastq
+      - name: --trimmed_r2
+        type: file
+        required: false
+        description: Output file for read 2. Only works when 1 file (single-end) or 2 files (paired-end) are specified, but not for longer lists.
+        direction: output
+        example: read_2.fastq
+      - name: --trimming_report_r1
+        type: file
+        required: false
+        description: Trimming report for read 1. Only works when 1 file (single-end) or 2 files (paired-end) are specified, but not for longer lists.
+        direction: output
+        example: read_1.trimming_report.txt
+      - name: --trimming_report_r2
+        type: file
+        description: Trimming report for read 2. Only works when 1 file (single-end) or 2 files (paired-end) are specified, but not for longer lists.
+        direction: output
+        required: false
+        example: read_2.trimming_report.txt
+      - name: --trimmed_fastqc_html_1
+        type: file
+        required: false
+        description: FastQC report for trimmed (single-end) reads (or read 1 for paired-end). Only works when 1 file (single-end) or 2 files (paired-end) are specified, but not for longer lists.
+        direction: output
+        example: read_1.fastqc.html
+      - name: --trimmed_fastqc_html_2
+        type: file
+        description: FastQC report for trimmed reads (read2 for paired-end). Only works when 1 file (single-end) or 2 files (paired-end) are specified, but not for longer lists.
+        direction: output
+        required: false
+        example: read_2.fastqc.html
+      - name: --trimmed_fastqc_zip_1
+        type: file
+        required: false
+        description: FastQC results for trimmed (single-end) reads (or read 1 for paired-end). Only works when 1 file (single-end) or 2 files (paired-end) are specified, but not for longer lists.
+        direction: output
+        example: read_1.fastqc.zip
+      - name: --trimmed_fastqc_zip_2
+        type: file
+        description: FastQC results for trimmed reads (read2 for paired-end). Only works when 1 file (single-end) or 2 files (paired-end) are specified, but not for longer lists.
+        direction: output
+        required: false
+        example: read_2.fastqc.zip
+      - name: --unpaired_r1
+        type: file
+        required: false
+        description: Output file for unpaired read 1. Only works when 1 file (single-end) or 2 files (paired-end) are specified, but not for longer lists.
+        direction: output
+        example: unpaired_read_1.fastq
+      - name: --unpaired_r2
+        type: file
+        required: false
+        description: Output file for unpaired read 2. Only works when 1 file (single-end) or 2 files (paired-end) are specified, but not for longer lists.
+        direction: output
+        example: unpaired_read_2.fastq
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+    
+engines:
+- type: docker
+  image: quay.io/biocontainers/trim-galore:0.6.10--hdfd78af_0
+  setup:
+    - type: docker
+      run: |
+        echo "TrimGalore: `trim_galore --version | sed -n 's/.*version\s\+\([0-9]\+\.[0-9]\+\.[0-9]\+\).*/\1/p'`" > /var/software_versions.txt
+
+runners:
+  - type: executable
+  - type: nextflow
diff --git a/src/trimgalore/help.txt b/src/trimgalore/help.txt
new file mode 100644
index 00000000..4bf38e99
--- /dev/null
+++ b/src/trimgalore/help.txt
@@ -0,0 +1,355 @@
+
+ USAGE:
+
+trim_galore [options] <filename(s)>
+
+
+-h/--help               Print this help message and exits.
+
+-v/--version            Print the version information and exits.
+
+-q/--quality <INT>      Trim low-quality ends from reads in addition to adapter removal. For
+                        RRBS samples, quality trimming will be performed first, and adapter
+                        trimming is carried out in a second round. Other files are quality and adapter
+                        trimmed in a single pass. The algorithm is the same as the one used by BWA
+                        (Subtract INT from all qualities; compute partial sums from all indices
+                        to the end of the sequence; cut sequence at the index at which the sum is
+                        minimal). Default Phred score: 20.
+
+--phred33               Instructs Cutadapt to use ASCII+33 quality scores as Phred scores
+                        (Sanger/Illumina 1.9+ encoding) for quality trimming. Default: ON.
+
+--phred64               Instructs Cutadapt to use ASCII+64 quality scores as Phred scores
+                        (Illumina 1.5 encoding) for quality trimming.
+
+--fastqc                Run FastQC in the default mode on the FastQ file once trimming is complete.
+
+--fastqc_args "<ARGS>"  Passes extra arguments to FastQC. If more than one argument is to be passed
+                        to FastQC they must be in the form "arg1 arg2 etc.". An example would be:
+                        --fastqc_args "--nogroup --outdir /home/". Passing extra arguments will
+                        automatically invoke FastQC, so --fastqc does not have to be specified
+                        separately.
+
+-a/--adapter <STRING>   Adapter sequence to be trimmed. If not specified explicitly, Trim Galore will
+                        try to auto-detect whether the Illumina universal, Nextera transposase or Illumina
+                        small RNA adapter sequence was used. Also see '--illumina', '--nextera' and
+                        '--small_rna'. If no adapter can be detected within the first 1 million sequences
+                        of the first file specified or if there is a tie between several adapter sequences,
+                        Trim Galore defaults to '--illumina' (as long as the Illumina adapter was one of the
+                        options, else '--nextera' is the default). A single base
+                        may also be given as e.g. -a A{10}, to be expanded to -a AAAAAAAAAA.
+
+                        At a special request, multiple adapters can also be specified like so:
+                        -a  " AGCTCCCG -a TTTCATTATAT -a TTTATTCGGATTTAT"
+                        -a2 " AGCTAGCG -a TCTCTTATAT -a TTTCGGATTTAT", or so:
+                        -a "file:../multiple_adapters.fa"
+                        -a2 "file:../different_adapters.fa"
+                        Potentially in conjunction with the parameter "-n 3" to trim all adapters. Please note
+                        that this is NOT needed for standard trimming! 
+                        More Information here: https://github.com/FelixKrueger/TrimGalore/issues/86
+
+-a2/--adapter2 <STRING> Optional adapter sequence to be trimmed off read 2 of paired-end files. This
+                        option requires '--paired' to be specified as well. If the libraries to be trimmed
+                        are smallRNA then a2 will be set to the Illumina small RNA 5' adapter automatically
+                        (GATCGTCGGACT). A single base may also be given as e.g. -a2 A{10}, to be expanded
+                        to -a2 AAAAAAAAAA.
+
+--illumina              Adapter sequence to be trimmed is the first 13bp of the Illumina universal adapter
+                        'AGATCGGAAGAGC' instead of the default auto-detection of adapter sequence.
+
+--stranded_illumina     Adapter sequence to be trimmed is the first 13bp of the Illumina stranded mRNA or Total
+                        RNA adapter 'ACTGTCTCTTATA' instead of the default auto-detection of adapter sequence. 
+                        Note that this sequence resembles the Nextera sequence with an additional A from A-tailing.
+                        Please also see https://github.com/FelixKrueger/TrimGalore/issues/127 or 
+                        https://support.illumina.com/bulletins/2020/06/trimming-t-overhang-options-for-the-illumina-rna-library-prep-wo.html
+                        for further information. This sequence is currently NOT included in the adapter auto-detection.
+
+--nextera               Adapter sequence to be trimmed is the first 12bp of the Nextera adapter
+                        'CTGTCTCTTATA' instead of the default auto-detection of adapter sequence.
+
+--small_rna             Adapter sequence to be trimmed is the first 12bp of the Illumina Small RNA 3' Adapter
+                        'TGGAATTCTCGG' instead of the default auto-detection of adapter sequence. Selecting
+                        to trim smallRNA adapters will also lower the --length value to 18bp. If the smallRNA
+                        libraries are paired-end then a2 will be set to the Illumina small RNA 5' adapter
+                        automatically (GATCGTCGGACT) unless -a 2 had been defined explicitly.
+
+--consider_already_trimmed <INT>     During adapter auto-detection, the limit set by <INT> allows the user to 
+                        set a threshold up to which the file is considered already adapter-trimmed. If no adapter
+                        sequence exceeds this threshold, no additional adapter trimming will be performed (technically,
+                        the adapter is set to '-a X'). Quality trimming is still performed as usual.
+                        Default: NOT SELECTED (i.e. normal auto-detection precedence rules apply).                     
+
+--max_length <INT>      Discard reads that are longer than <INT> bp after trimming. This is only advised for
+                        smallRNA sequencing to remove non-small RNA sequences.
+
+
+--stringency <INT>      Overlap with adapter sequence required to trim a sequence. Defaults to a
+                        very stringent setting of 1, i.e. even a single bp of overlapping sequence
+                        will be trimmed off from the 3' end of any read.
+
+-e <ERROR RATE>         Maximum allowed error rate (no. of errors divided by the length of the matching
+                        region) (default: 0.1)
+
+--gzip                  Compress the output file with GZIP. If the input files are GZIP-compressed
+                        the output files will automatically be GZIP compressed as well. As of v0.2.8 the
+                        compression will take place on the fly.
+
+--dont_gzip             Output files won't be compressed with GZIP. This option overrides --gzip.
+
+--length <INT>          Discard reads that became shorter than length INT because of either
+                        quality or adapter trimming. A value of '0' effectively disables
+                        this behaviour. Default: 20 bp.
+
+                        For paired-end files, both reads of a read-pair need to be longer than
+                        <INT> bp to be printed out to validated paired-end files (see option --paired).
+                        If only one read became too short there is the possibility of keeping such
+                        unpaired single-end reads (see --retain_unpaired). Default pair-cutoff: 20 bp.
+
+--max_n COUNT           The total number of Ns a read may contain before it will be removed altogether.
+                        In a paired-end setting, either read exceeding this limit will result in the entire
+                        pair being removed from the trimmed output files. If COUNT is a number between 0 and 1,
+                        it is interpreted as a fraction of the read length.
+
+--trim-n                Removes Ns from either side of the read. This option does currently not work in RRBS mode.
+
+-o/--output_dir <DIR>   If specified all output will be written to this directory instead of the current
+                        directory. If the directory doesn't exist it will be created for you.
+
+--no_report_file        If specified no report file will be generated.
+
+--suppress_warn         If specified any output to STDOUT or STDERR will be suppressed.
+
+--clip_R1 <int>         Instructs Trim Galore to remove <int> bp from the 5' end of read 1 (or single-end
+                        reads). This may be useful if the qualities were very poor, or if there is some
+                        sort of unwanted bias at the 5' end. Default: OFF.
+
+--clip_R2 <int>         Instructs Trim Galore to remove <int> bp from the 5' end of read 2 (paired-end reads
+                        only). This may be useful if the qualities were very poor, or if there is some sort
+                        of unwanted bias at the 5' end. For paired-end BS-Seq, it is recommended to remove
+                        the first few bp because the end-repair reaction may introduce a bias towards low
+                        methylation. Please refer to the M-bias plot section in the Bismark User Guide for
+                        some examples. Default: OFF.
+
+--three_prime_clip_R1 <int>     Instructs Trim Galore to remove <int> bp from the 3' end of read 1 (or single-end
+                        reads) AFTER adapter/quality trimming has been performed. This may remove some unwanted
+                        bias from the 3' end that is not directly related to adapter sequence or basecall quality.
+                        Default: OFF.
+
+--three_prime_clip_R2 <int>     Instructs Trim Galore to remove <int> bp from the 3' end of read 2 AFTER
+                        adapter/quality trimming has been performed. This may remove some unwanted bias from
+                        the 3' end that is not directly related to adapter sequence or basecall quality.
+                        Default: OFF.
+
+--2colour/--nextseq INT This enables the option '--nextseq-trim=3'CUTOFF' within Cutadapt, which will set a quality
+                        cutoff (that is normally given with -q instead), but qualities of G bases are ignored.
+                        This trimming is in common for the NextSeq- and NovaSeq-platforms, where basecalls without
+                        any signal are called as high-quality G bases. This is mutually exclusive with '-q INT'.
+
+
+--path_to_cutadapt </path/to/cutadapt>     You may use this option to specify a path to the Cutadapt executable,
+                        e.g. /my/home/cutadapt-1.7.1/bin/cutadapt. Else it is assumed that Cutadapt is in
+                        the PATH.
+
+--basename <PREFERRED_NAME>	Use PREFERRED_NAME as the basename for output files, instead of deriving the filenames from
+                        the input files. Single-end data would be called PREFERRED_NAME_trimmed.fq(.gz), or
+                        PREFERRED_NAME_val_1.fq(.gz) and PREFERRED_NAME_val_2.fq(.gz) for paired-end data. --basename
+                        only works when 1 file (single-end) or 2 files (paired-end) are specified, but not for longer lists.
+
+-j/--cores INT          Number of cores to be used for trimming [default: 1]. For Cutadapt to work with multiple cores, it
+                        requires Python 3 as well as parallel gzip (pigz) installed on the system. Trim Galore attempts to detect
+                        the version of Python used by calling Cutadapt. If Python 2 is detected, --cores is set to 1. If the Python
+                        version cannot be detected, Python 3 is assumed and we let Cutadapt handle potential issues itself.
+                        
+                        If pigz cannot be detected on your system, Trim Galore reverts to using gzip compression. Please note
+                        that gzip compression will slow down multi-core processes so much that it is hardly worthwhile, please 
+                        see: https://github.com/FelixKrueger/TrimGalore/issues/16#issuecomment-458557103 for more info).
+						
+                        Actual core usage: It should be mentioned that the actual number of cores used is a little convoluted.
+                        Assuming that Python 3 is used and pigz is installed, --cores 2 would use 2 cores to read the input
+                        (probably not at a high usage though), 2 cores to write to the output (at moderately high usage), and 
+                        2 cores for Cutadapt itself + 2 additional cores for Cutadapt (not sure what they are used for) + 1 core
+                        for Trim Galore itself. So this can be up to 9 cores, even though most of them won't be used at 100% for
+                        most of the time. Paired-end processing uses twice as many cores for the validation (= writing out) step.
+                        --cores 4 would then be: 4 (read) + 4 (write) + 4 (Cutadapt) + 2 (extra Cutadapt) +	1 (Trim Galore) = 15.
+
+                        It seems that --cores 4 could be a sweet spot, anything above has diminishing returns.
+			
+
+
+SPECIFIC TRIMMING - without adapter/quality trimming
+
+--hardtrim5 <int>       Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences
+                        to <int> bp at the 5'-end. Once hard-trimming of files is complete, Trim Galore will exit.
+                        Hard-trimmed output files will end in .<int>_5prime.fq(.gz). Here is an example:
+
+                        before:         CCTAAGGAAACAAGTACACTCCACACATGCATAAAGGAAATCAAATGTTATTTTTAAGAAAATGGAAAAT
+                        --hardtrim5 20: CCTAAGGAAACAAGTACACT
+
+--hardtrim3 <int>       Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences
+                        to <int> bp at the 3'-end. Once hard-trimming of files is complete, Trim Galore will exit.
+                        Hard-trimmed output files will end in .<int>_3prime.fq(.gz). Here is an example:
+
+                        before:         CCTAAGGAAACAAGTACACTCCACACATGCATAAAGGAAATCAAATGTTATTTTTAAGAAAATGGAAAAT
+                        --hardtrim3 20:                                                   TTTTTAAGAAAATGGAAAAT
+
+--clock                 In this mode, reads are trimmed in a specific way that is currently used for the Mouse
+                        Epigenetic Clock (see here: Multi-tissue DNA methylation age predictor in mouse, Stubbs et al.,
+                        Genome Biology, 2017 18:68 https://doi.org/10.1186/s13059-017-1203-5). Following this, Trim Galore
+                        will exit.
+
+                        In its current implementation, the dual-UMI RRBS reads come in the following format:
+
+                        Read 1  5' UUUUUUUU CAGTA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF TACTG UUUUUUUU 3'
+                        Read 2  3' UUUUUUUU GTCAT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF ATGAC UUUUUUUU 5'
+
+                        Where UUUUUUUU is a random 8-mer unique molecular identifier (UMI), CAGTA is a constant region,
+                        and FFFFFFF... is the actual RRBS-Fragment to be sequenced. The UMIs for Read 1 (R1) and
+                        Read 2 (R2), as well as the fixed sequences (F1 or F2), are written into the read ID and
+                        removed from the actual sequence. Here is an example:
+
+                        R1: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 1:N:0: CGATGTTT
+                            ATCTAGTTCAGTACGGTGTTTTCGAATTAGAAAAATATGTATAGAGGAAATAGATATAAAGGCGTATTCGTTATTG
+                        R2: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 3:N:0: CGATGTTT
+                            CAATTTTGCAGTACAAAAATAATACCTCCTCTATTTATCCAAAATCACAAAAAACCACCCACTTAACTTTCCCTAA
+
+                        R1: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 1:N:0: CGATGTTT:R1:ATCTAGTT:R2:CAATTTTG:F1:CAGT:F2:CAGT
+                                         CGGTGTTTTCGAATTAGAAAAATATGTATAGAGGAAATAGATATAAAGGCGTATTCGTTATTG
+                        R2: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 3:N:0: CGATGTTT:R1:ATCTAGTT:R2:CAATTTTG:F1:CAGT:F2:CAGT
+                                         CAAAAATAATACCTCCTCTATTTATCCAAAATCACAAAAAACCACCCACTTAACTTTCCCTAA
+
+                        Following clock trimming, the resulting files (.clock_UMI.R1.fq(.gz) and .clock_UMI.R2.fq(.gz))
+                        should be adapter- and quality trimmed with Trim Galore as usual. In addition, reads need to be trimmed
+                        by 15bp from their 3' end to get rid of potential UMI and fixed sequences. The command is:
+
+                        trim_galore --paired --three_prime_clip_R1 15 --three_prime_clip_R2 15 *.clock_UMI.R1.fq.gz *.clock_UMI.R2.fq.gz
+
+                        Following this, reads should be aligned with Bismark and deduplicated with UmiBam
+                        in '--dual_index' mode (see here: https://github.com/FelixKrueger/Umi-Grinder). UmiBam recognises
+                        the UMIs within this pattern: R1:(ATCTAGTT):R2:(CAATTTTG): as (UMI R1) and (UMI R2).
+
+--polyA                 This is a new, still experimental, trimming mode to identify and remove poly-A tails from sequences.
+                        When --polyA is selected, Trim Galore attempts to identify from the first supplied sample whether
+                        sequences contain more often a stretch of either 'AAAAAAAAAA' or 'TTTTTTTTTT'. This determines
+                        if Read 1 of a paired-end file, or single-end files, are trimmed for PolyA or PolyT. In case of
+                        paired-end sequencing, Read2 is trimmed for the complementary base from the start of the reads. The
+                        auto-detection uses a default of A{20} for Read1 (3'-end trimming) and T{150} for Read2 (5'-end trimming).
+                        These values may be changed manually using the options -a and -a2.
+
+                        In addition to trimming the sequences, white spaces are replaced with _ and it records in the read ID
+                        how many bases were trimmed so it can later be used to identify PolyA trimmed sequences. This is currently done
+                        by writing tags to both the start ("32:A:") and end ("_PolyA:32") of the reads in the following example:
+
+                        @READ-ID:1:1102:22039:36996 1:N:0:CCTAATCC
+                        GCCTAAGGAAACAAGTACACTCCACACATGCATAAAGGAAATCAAATGTTATTTTTAAGAAAATGGAAAATAAAAACTTTATAAACACCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
+                        @32:A:READ-ID:1:1102:22039:36996_1:N:0:CCTAATCC_PolyA:32
+                        GCCTAAGGAAACAAGTACACTCCACACATGCATAAAGGAAATCAAATGTTATTTTTAAGAAAATGGAAAATAAAAACTTTATAAACACC
+
+                        PLEASE NOTE: The poly-A trimming mode expects that sequences were both adapter and quality trimmed
+                        before looking for Poly-A tails, and it is the user's responsibility to carry out an initial round of
+                        trimming. The following sequence:
+ 
+                        1) trim_galore file.fastq.gz
+                        2) trim_galore --polyA file_trimmed.fq.gz
+                        3) zcat file_trimmed_trimmed.fq.gz | grep -A 3 PolyA | grep -v ^-- > PolyA_trimmed.fastq
+
+                        Will 1) trim qualities and Illumina adapter contamination, 2) find and remove PolyA contamination.
+                        Finally, if desired, 3) will specifically write PolyA trimmed sequences to a FastQ file of your choice.
+
+--implicon              This is a special mode of operation for paired-end data, such as required for the IMPLICON method, where a UMI sequence
+                        is getting transferred from the start of Read 2 to the readID of both reads. Following this, Trim Galore will exit.
+
+                        In its current implementation, the UMI carrying reads come in the following format:
+
+                        Read 1  5' FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF 3'
+                        Read 2  3' UUUUUUUUFFFFFFFFFFFFFFFFFFFFFFFFFFFF 5'
+
+                        Where UUUUUUUU is a random 8-mer unique molecular identifier (UMI) and FFFFFFF... is the actual fragment to be
+                        sequenced. The UMI of Read 2 (R2) is written into the read ID of both reads and removed from the actual sequence.
+                        Here is an example:
+
+                        R1: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 1:N:0: CGATGTTT
+                            ATCTAGTTCAGTACGGTGTTTTCGAATTAGAAAAATATGTATAGAGGAAATAGATATAAAGGCGTATTCGTTATTG
+                        R2: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 3:N:0: CGATGTTT
+                            CAATTTTGCAGTACAAAAATAATACCTCCTCTATTTATCCAAAATCACAAAAAACCACCCACTTAACTTTCCCTAA
+                        
+                        After --implicon trimming:
+                        R1: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 1:N:0: CGATGTTT:CAATTTTG
+                            ATCTAGTTCAGTACGGTGTTTTCGAATTAGAAAAATATGTATAGAGGAAATAGATATAAAGGCGTATTCGTTATTG
+                        R2: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 3:N:0: CGATGTTT:CAATTTTG
+                                    CAGTACAAAAATAATACCTCCTCTATTTATCCAAAATCACAAAAAACCACCCACTTAACTTTCCCTAA
+
+RRBS-specific options (MspI digested material):
+
+--rrbs                  Specifies that the input file was an MspI digested RRBS sample (recognition
+                        site: CCGG). Single-end or Read 1 sequences (paired-end) which were adapter-trimmed
+                        will have a further 2 bp removed from their 3' end. Sequences which were merely
+                        trimmed because of poor quality will not be shortened further. Read 2 of paired-end
+                        libraries will in addition have the first 2 bp removed from the 5' end (by setting
+                        '--clip_r2 2'). This is to avoid using artificial methylation calls from the filled-in
+                        cytosine positions close to the 3' MspI site in sequenced fragments.
+                        This option is not recommended for users of the Tecan Ovation RRBS Methyl-Seq with TrueMethyl
+                        oxBS 1-16 kit (see below).
+
+--non_directional       Selecting this option for non-directional RRBS libraries will screen
+                        quality-trimmed sequences for 'CAA' or 'CGA' at the start of the read
+                        and, if found, removes the first two basepairs. Like with the option
+                        '--rrbs' this avoids using cytosine positions that were filled-in
+                        during the end-repair step. '--non_directional' requires '--rrbs' to
+                        be specified as well. Note that this option does not set '--clip_r2 2' in
+                        paired-end mode.
+
+--keep                  Keep the quality trimmed intermediate file. Default: off, which means
+                        the temporary file is being deleted after adapter trimming. Only has
+                        an effect for RRBS samples since other FastQ files are not trimmed
+                        for poor qualities separately.
+
+
+Note for RRBS using the Tecan Ovation RRBS Methyl-Seq with TrueMethyl oxBS 1-16 kit:
+
+Owing to the fact that the Tecan Ovation RRBS kit attaches a varying number of nucleotides (0-3) after each MspI
+site, Trim Galore should be run WITHOUT the option --rrbs. This trimming is accomplished in a subsequent
+diversity trimming step (see their manual).
+
+
+
+Note for RRBS using MseI:
+
+If your DNA material was digested with MseI (recognition motif: TTAA) instead of MspI it is NOT necessary
+to specify --rrbs or --non_directional since virtually all reads should start with the sequence
+'TAA', and this holds true for both directional and non-directional libraries. As the end-repair of 'TAA'
+restricted sites does not involve any cytosines it does not need to be treated specially. Instead, simply
+run Trim Galore! in the standard (i.e. non-RRBS) mode.
+
+
+
+
+Paired-end specific options:
+
+--paired                This option performs length trimming of quality/adapter/RRBS trimmed reads for
+                        paired-end files. To pass the validation test, both sequences of a sequence pair
+                        are required to have a certain minimum length which is governed by the option
+                        --length (see above). If only one read passes this length threshold the
+                        other read can be rescued (see option --retain_unpaired). Using this option lets
+                        you discard too short read pairs without disturbing the sequence-by-sequence order
+                        of FastQ files which is required by many aligners.
+
+                        Trim Galore! expects paired-end files to be supplied in a pairwise fashion, e.g.
+                        file1_1.fq file1_2.fq SRR2_1.fq.gz SRR2_2.fq.gz ... .
+
+
+--retain_unpaired       If only one of the two paired-end reads became too short, the longer
+                        read will be written to either '.unpaired_1.fq' or '.unpaired_2.fq'
+                        output files. The length cutoff for unpaired single-end reads is
+                        governed by the parameters -r1/--length_1 and -r2/--length_2. Default: OFF.
+
+-r1/--length_1 <INT>    Unpaired single-end read length cutoff needed for read 1 to be written to
+                        '.unpaired_1.fq' output file. These reads may be mapped in single-end mode.
+                        Default: 35 bp.
+
+-r2/--length_2 <INT>    Unpaired single-end read length cutoff needed for read 2 to be written to
+                        '.unpaired_2.fq' output file. These reads may be mapped in single-end mode.
+                        Default: 35 bp.
+
+Last modified on 02 02 2023.
+
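
The `--cores` note in the help text above gives a per-step breakdown that adds up to 9 cores for `--cores 2` and 15 for `--cores 4`. As a rough illustration only, that arithmetic can be sketched as a small bash function. The function name `estimate_trim_galore_cores` is hypothetical and not part of Trim Galore or this component, and the sketch ignores the extra cores the help text mentions for the paired-end validation step.

```bash
#!/bin/bash

# Hypothetical helper: upper bound on the cores touched by `trim_galore --cores N`,
# following the breakdown quoted in the help text:
#   N (read) + N (write) + N (Cutadapt) + 2 (extra Cutadapt) + 1 (Trim Galore)
estimate_trim_galore_cores() {
  local n="$1"
  echo $(( n + n + n + 2 + 1 ))
}

estimate_trim_galore_cores 2   # prints 9
estimate_trim_galore_cores 4   # prints 15
```

Most of these cores are not used at 100% most of the time, so the figure is best read as a ceiling when choosing a value for `meta_cpus`, not as a measured load.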
diff --git a/src/trimgalore/script.sh b/src/trimgalore/script.sh
new file mode 100755
index 00000000..1cceea4b
--- /dev/null
+++ b/src/trimgalore/script.sh
@@ -0,0 +1,126 @@
+#!/bin/bash
+
+set -eo pipefail
+
+[[ ! -d "$par_output_dir" ]] && mkdir -p "$par_output_dir"
+
+IFS=";" read -ra input <<< "$par_input"
+
+unset_if_false=( 
+    par_phred33 
+    par_phred64 
+    par_fastqc 
+    par_illumina 
+    par_stranded_illumina 
+    par_nextera 
+    par_small_rna 
+    par_gzip 
+    par_dont_gzip 
+    par_trim_n 
+    par_no_report_file 
+    par_suppress_warn 
+    par_clock 
+    par_polyA 
+    par_implicon 
+    par_rrbs 
+    par_non_directional 
+    par_keep 
+    par_paired 
+    par_retain_unpaired 
+)
+
+for par in "${unset_if_false[@]}"; do
+    test_val="${!par}"
+    [[ "$test_val" == "false" ]] && unset "$par"
+done
+
+# Add FastQC file arguments to fastqc_args
+fastqc_args="${par_fastqc_args}"
+if [ -f "$par_fastqc_contaminants" ]; then 
+    fastqc_args+=" --contaminants $par_fastqc_contaminants"
+fi
+if [ -f "$par_fastqc_adapters" ]; then 
+    fastqc_args+=" --adapters $par_fastqc_adapters"
+fi
+if [ -f "$par_fastqc_limits" ]; then 
+    fastqc_args+=" --limits $par_fastqc_limits"
+fi
+
+trim_galore \
+    ${par_quality:+-q "${par_quality}"} \
+    ${par_phred33:+--phred33} \
+    ${par_phred64:+--phred64 } \
+    ${par_fastqc:+--fastqc } \
+    ${fastqc_args:+--fastqc_args "${fastqc_args}"} \
+    ${par_adapter:+-a "${par_adapter}"} \
+    ${par_adapter2:+-a2 "${par_adapter2}"} \
+    ${par_illumina:+--illumina} \
+    ${par_stranded_illumina:+--stranded_illumina} \
+    ${par_nextera:+--nextera} \
+    ${par_small_rna:+--small_rna} \
+    ${par_consider_already_trimmed:+--consider_already_trimmed "${par_consider_already_trimmed}"} \
+    ${par_max_length:+--max_length "${par_max_length}"} \
+    ${par_stringency:+--stringency "${par_stringency}"} \
+    ${par_error_rate:+-e "${par_error_rate}"} \
+    ${par_gzip:+--gzip} \
+    ${par_dont_gzip:+--dont_gzip} \
+    ${par_length:+--length "${par_length}"} \
+    ${par_max_n:+--max_n "${par_max_n}"} \
+    ${par_trim_n:+--trim-n} \
+    ${par_no_report_file:+--no_report_file} \
+    ${par_suppress_warn:+--suppress_warn} \
+    ${par_clip_R1:+--clip_R1 "${par_clip_R1}"} \
+    ${par_clip_R2:+--clip_R2 "${par_clip_R2}"} \
+    ${par_three_prime_clip_R1:+--three_prime_clip_R1 "${par_three_prime_clip_R1}"} \
+    ${par_three_prime_clip_R2:+--three_prime_clip_R2 "${par_three_prime_clip_R2}"} \
+    ${par_nextseq:+--nextseq "${par_nextseq}"} \
+    ${par_basename:+--basename "${par_basename}"} \
+    ${par_hardtrim5:+--hardtrim5 "${par_hardtrim5}"} \
+    ${par_hardtrim3:+--hardtrim3 "${par_hardtrim3}"} \
+    ${par_clock:+--clock} \
+    ${par_polyA:+--polyA} \
+    ${par_implicon:+--implicon} \
+    ${par_rrbs:+--rrbs} \
+    ${par_non_directional:+--non_directional} \
+    ${par_keep:+--keep} \
+    ${par_paired:+--paired} \
+    ${par_retain_unpaired:+--retain_unpaired} \
+    ${par_length_1:+-r1 "${par_length_1}"} \
+    ${par_length_2:+-r2 "${par_length_2}"} \
+    ${meta_cpus:+-j "${meta_cpus}"} \
+    -o "$par_output_dir" \
+    "${input[@]}"
+
+if [ "$par_paired" == "true" ]; then
+
+    input_r1=$(basename -- "${input[0]}")
+    input_r2=$(basename -- "${input[1]}")
+    [[ -z "$par_trimmed_r1" ]] || mv "$par_output_dir"/*val_1.f*q* "$par_trimmed_r1"
+    [[ -z "$par_trimmed_r2" ]] || mv "$par_output_dir"/*val_2.f*q* "$par_trimmed_r2"
+    [[ -z "$par_trimming_report_r1" ]] || mv "$par_output_dir/${input_r1}_trimming_report.txt" "$par_trimming_report_r1"
+    [[ -z "$par_trimming_report_r2" ]] || mv "$par_output_dir/${input_r2}_trimming_report.txt" "$par_trimming_report_r2"
+
+    if [ "$par_fastqc" == "true" ]; then
+        [[ -z "$par_trimmed_fastqc_html_1" ]] || mv "$par_output_dir"/*val_1_fastqc.html "$par_trimmed_fastqc_html_1"
+        [[ -z "$par_trimmed_fastqc_html_2" ]] || mv "$par_output_dir"/*val_2_fastqc.html "$par_trimmed_fastqc_html_2"
+        [[ -z "$par_trimmed_fastqc_zip_1" ]] || mv "$par_output_dir"/*val_1_fastqc.zip "$par_trimmed_fastqc_zip_1"
+        [[ -z "$par_trimmed_fastqc_zip_2" ]] || mv "$par_output_dir"/*val_2_fastqc.zip "$par_trimmed_fastqc_zip_2"
+    fi
+
+    if [ "$par_retain_unpaired" == "true" ]; then
+        [[ -z "$par_unpaired_r1" ]] || mv "$par_output_dir"/*.unpaired_1.f*q* "$par_unpaired_r1"
+        [[ -z "$par_unpaired_r2" ]] || mv "$par_output_dir"/*.unpaired_2.f*q* "$par_unpaired_r2"
+    fi
+
+else
+
+    input_r1=$(basename -- "${input[0]}")
+    [[ -z "$par_trimmed_r1" ]] || mv "$par_output_dir"/*_trimmed.fq* "$par_trimmed_r1"
+    [[ -z "$par_trimming_report_r1" ]] || mv "$par_output_dir/${input_r1}_trimming_report.txt" "$par_trimming_report_r1"
+
+    if [ "$par_fastqc" == "true" ]; then
+        [[ -z "$par_trimmed_fastqc_html_1" ]] || mv "$par_output_dir"/*_trimmed_fastqc.html "$par_trimmed_fastqc_html_1"
+        [[ -z "$par_trimmed_fastqc_zip_1" ]] || mv "$par_output_dir"/*_trimmed_fastqc.zip "$par_trimmed_fastqc_zip_1"
+    fi
+
+fi
\ No newline at end of file
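
The script above receives multiple input files as a single `;`-separated string in `par_input` and splits it into a bash array before handing the files to `trim_galore`. A minimal sketch of that step follows, using made-up file names that contain a space to show why the array is best expanded as `"${input[@]}"`. The helper `count_args` and the file names are assumptions for illustration only.

```bash
#!/bin/bash

# Hypothetical helper: report how many arguments a command would receive.
count_args() { echo "$#"; }

# Hypothetical value, in the ';'-separated form used for multi-value inputs.
par_input="sample A_R1.fastq;sample A_R2.fastq"

# Split on ';' only; IFS is changed for this one `read` call.
IFS=";" read -ra input <<< "$par_input"

count_args "${input[@]}"   # prints 2: one argument per file
count_args ${input[*]}     # prints 4: unquoted expansion also splits on the spaces
```

With file names that contain no whitespace or glob characters both expansions behave the same, so this only matters for unusual paths.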
diff --git a/src/trimgalore/test.sh b/src/trimgalore/test.sh
new file mode 100644
index 00000000..8cb3ccdb
--- /dev/null
+++ b/src/trimgalore/test.sh
@@ -0,0 +1,125 @@
+#!/bin/bash
+
+set -eo pipefail
+
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_doesnt_exist() {
+  [ ! -f "$1" ] || { echo "File '$1' exists but shouldn't" && exit 1; }
+}
+assert_file_empty() {
+  [ ! -s "$1" ] || { echo "File '$1' is not empty but should be" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_file_not_contains() {
+  ! grep -q "$2" "$1" || { echo "File '$1' contains '$2' but shouldn't" && exit 1; }
+}
+
+#################################################################
+
+echo ">>> Prepare test data"
+
+cat > example_R1.fastq <<'EOF'
+@SRR6357071.22842410 22842410/1 kraken:taxid|4932
+CAAGTTTTCATCTTCAACAGCTGATTGACTTCTTTGTGGTATGCCTCGATATATTTTTCTTTTTCTTTAATATCTTTATTATAGGTGATTGCCTCATCGTA
++
+BBBBBFFFFFFFFFFFFFFF/BFFFFFFFFFFFFFFFFBFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFBF<
+@SRR6357071.52260105 52260105/1 kraken:taxid|4932
+TAGACTTACCAGTACCCTTTTCGACGGCGGAAACATTCAAAATACCGTTAGAGTCGACATCGAAAGTGACTTCAATTTGTGGGACACCTCTTGGAGCTGGT
++
+BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF/FFFFFFFFFFFFFFFF
+EOF
+
+cat > example_R2.fastq <<'EOF'
+@SRR6357071.22842410 22842410/2 kraken:taxid|4932
+CCGAGATCGAAGAAACGAATTCACCTGATTGCAGCTGTAAAAGCAGTAAAATCAATCAAACCAATACGGACAACCTTACGATACGATGAGGCAATCACCTA
++
+BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
+@SRR6357071.52260105 52260105/2 kraken:taxid|4932
+GTTGATTCCAAGAAACTCTACCATTCCAACTAAGAAATCCGAAGTTTTCTCTACTTATGCTGACAACCAACCAGGTGTCTTGATTCAAGTCTTTGAAGGTG
++
+BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
+EOF
+
+#################################################################
+
+echo ">>> Testing for single-end reads"
+"$meta_executable" \
+    --input "example_R1.fastq" \
+    --trimmed_fastqc_html_1 output_se_test/example.trimmed.html \
+    --trimmed_fastqc_zip_1 output_se_test/example.trimmed.zip \
+    --trimmed_r1 output_se_test/example.trimmed.fastq \
+    --trimming_report_r1 output_se_test/example.trimming_report.txt \
+    --fastqc \
+    --output_dir output_se_test
+
+echo ">> Checking output"
+assert_file_exists "output_se_test/example.trimmed.html"
+assert_file_exists "output_se_test/example.trimmed.zip"
+assert_file_exists "output_se_test/example.trimmed.fastq"
+assert_file_exists "output_se_test/example.trimming_report.txt"
+
+echo ">> Check if output is empty"
+assert_file_not_empty "output_se_test/example.trimmed.html"
+assert_file_not_empty "output_se_test/example.trimmed.zip"
+assert_file_not_empty "output_se_test/example.trimmed.fastq"
+assert_file_not_empty "output_se_test/example.trimming_report.txt"
+
+echo ">> Check contents"
+assert_file_contains "output_se_test/example.trimmed.fastq" "@SRR6357071.22842410 22842410/1"
+assert_file_contains "output_se_test/example.trimming_report.txt" "Sequences removed because they became shorter than the length cutoff"
+
+#################################################################
+
+echo ">>> Testing for paired-end reads"
+"$meta_executable" \
+    --paired \
+    --input "example_R1.fastq;example_R2.fastq" \
+    --trimmed_fastqc_html_1 output_pe_test/example_R1.trimmed.html \
+    --trimmed_fastqc_html_2 output_pe_test/example_R2.trimmed.html \
+    --trimmed_fastqc_zip_1 output_pe_test/example_R1.trimmed.zip \
+    --trimmed_fastqc_zip_2 output_pe_test/example_R2.trimmed.zip \
+    --trimmed_r1 output_pe_test/example_R1.trimmed.fastq \
+    --trimmed_r2 output_pe_test/example_R2.trimmed.fastq \
+    --trimming_report_r1 output_pe_test/example_R1.trimming_report.txt \
+    --trimming_report_r2 output_pe_test/example_R2.trimming_report.txt \
+    --fastqc \
+    --output_dir output_pe_test
+
+echo ">> Checking output"
+assert_file_exists "output_pe_test/example_R1.trimmed.html"
+assert_file_exists "output_pe_test/example_R2.trimmed.html"
+assert_file_exists "output_pe_test/example_R1.trimmed.zip"
+assert_file_exists "output_pe_test/example_R2.trimmed.zip"
+assert_file_exists "output_pe_test/example_R1.trimmed.fastq"
+assert_file_exists "output_pe_test/example_R2.trimmed.fastq"
+assert_file_exists "output_pe_test/example_R1.trimming_report.txt"
+assert_file_exists "output_pe_test/example_R2.trimming_report.txt"
+
+echo ">> Check if output is empty"
+assert_file_not_empty "output_pe_test/example_R1.trimmed.html"
+assert_file_not_empty "output_pe_test/example_R2.trimmed.html"
+assert_file_not_empty "output_pe_test/example_R1.trimmed.zip"
+assert_file_not_empty "output_pe_test/example_R2.trimmed.zip"
+assert_file_not_empty "output_pe_test/example_R1.trimmed.fastq"
+assert_file_not_empty "output_pe_test/example_R2.trimmed.fastq"
+assert_file_not_empty "output_pe_test/example_R1.trimming_report.txt"
+assert_file_not_empty "output_pe_test/example_R2.trimming_report.txt"
+
+echo ">> Check contents"
+assert_file_contains "output_pe_test/example_R1.trimmed.fastq" "@SRR6357071.22842410 22842410/1"
+assert_file_contains "output_pe_test/example_R2.trimmed.fastq" "@SRR6357071.22842410 22842410/2"
+assert_file_contains "output_pe_test/example_R1.trimming_report.txt" "sequences processed in total"
+assert_file_contains "output_pe_test/example_R2.trimming_report.txt" "Number of sequence pairs removed because at least one read was shorter than the length cutoff"
+
+#################################################################
+
+echo ">>> Test finished successfully"
+exit 0

From 237a2e3a229ee589d1ebbc282526f87398e26f58 Mon Sep 17 00:00:00 2001
From: Hendrik Cannoodt <hendrik.cannoodt@gmail.com>
Date: Fri, 27 Sep 2024 11:52:08 +0200
Subject: [PATCH 26/42] Fixes the typo raised in issue #132 (#157)

* Fixes the typo raised in issue #132

* Add changelog entry

* fix typo, modify script

---------

Co-authored-by: jakubmajercik <jakub.majercik@gmail.com>
---
 CHANGELOG.md              | 4 ++++
 src/falco/config.vsh.yaml | 2 +-
 src/falco/script.sh       | 4 ++--
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0613fa25..a2aa5387 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,10 @@
 
 * `rsem/rsem_calculate_expression`: Calculate expression levels (PR #93).
 
+## BREAKING CHANGES
+
+* `falco`: Fix a typo in the `--reverse_complement` argument (PR #157).
+
 ## MINOR CHANGES
 
 * Upgrade to Viash 0.9.0.
diff --git a/src/falco/config.vsh.yaml b/src/falco/config.vsh.yaml
index de9906ef..a161e252 100644
--- a/src/falco/config.vsh.yaml
+++ b/src/falco/config.vsh.yaml
@@ -86,7 +86,7 @@ argument_groups:
           bisulfite sequencing, and more Ts and fewer 
           Cs are therefore expected and will be 
           accounted for in base content.
-      - name: --reverse_complliment
+      - name: --reverse_complement
         alternatives: [-r]
         type: boolean_true
         description: |
diff --git a/src/falco/script.sh b/src/falco/script.sh
index 43f5efe5..13e2eab4 100644
--- a/src/falco/script.sh
+++ b/src/falco/script.sh
@@ -4,7 +4,7 @@ set -eo pipefail
 
 [[ "$par_nogroup" == "false" ]] && unset par_nogroup
 [[ "$par_bisulfite" == "false" ]] && unset par_bisulfite
-[[ "$par_reverse_compliment" == "false" ]] && unset par_reverse_compliment
+[[ "$par_reverse_complement" == "false" ]] && unset par_reverse_complement
 
 IFS=";" read -ra input <<< $par_input
 
@@ -15,7 +15,7 @@ $(which falco) \
   ${par_limits:+--limits "$par_limits"} \
   ${par_subsample:+-subsample $par_subsample} \
   ${par_bisulfite:+-bisulfite} \
-  ${par_reverse_compliment:+-reverse-compliment} \
+  ${par_reverse_complement:+-reverse-complement} \
   ${par_outdir:+--outdir "$par_outdir"} \
   ${par_format:+--format "$par_format"} \
   ${par_data_filename:+-data-filename "$par_data_filename"} \

From 0a0edcacb5368517d249210022363bd9265f1bf5 Mon Sep 17 00:00:00 2001
From: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>
Date: Thu, 3 Oct 2024 14:46:57 +0200
Subject: [PATCH 27/42] Cutadapt: fix non-functional action parameter (#161)

* Cutadapt: fix non-functional action parameter

* Add PR number
---
 CHANGELOG.md           | 4 ++++
 src/cutadapt/script.sh | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a2aa5387..d1654375 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -13,6 +13,10 @@
 
 * `falco`: Fix a typo in the `--reverse_complement` argument (PR #157).
 
+## BUG FIXES
+
+* `cutadapt`: fix the non-functional `action` parameter (PR #161).
+
 ## MINOR CHANGES
 
 * Upgrade to Viash 0.9.0.
diff --git a/src/cutadapt/script.sh b/src/cutadapt/script.sh
index 20c92724..d181e2b0 100644
--- a/src/cutadapt/script.sh
+++ b/src/cutadapt/script.sh
@@ -108,7 +108,7 @@ input_args=$(echo \
   ${par_overlap:+--overlap "${par_overlap}"} \
   ${par_match_read_wildcards:+--match-read-wildcards} \
   ${par_no_match_adapter_wildcards:+--no-match-adapter-wildcards} \
-  ${par_action:+--action "${par_action}"} \
+  ${par_action:+--action="${par_action}"} \
   ${par_revcomp:+--revcomp} \
 )
 debug "Arguments to cutadapt:"

From add125261c6fa0ed7c9906fc85e7368d2072c4a3 Mon Sep 17 00:00:00 2001
From: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>
Date: Mon, 7 Oct 2024 11:06:04 +0200
Subject: [PATCH 28/42] FEAT: avoid using boolean_false (#160)

---
 CHANGELOG.md                                  | 4 ++++
 CONTRIBUTING.md                               | 6 ++++++
 src/agat/agat_convert_bed2gff/config.vsh.yaml | 2 +-
 src/agat/agat_convert_bed2gff/script.sh       | 2 +-
 src/cutadapt/config.vsh.yaml                  | 4 ++--
 src/cutadapt/script.sh                        | 4 ++--
 6 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d1654375..47c786c6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -19,6 +19,10 @@
 
 ## MINOR CHANGES
 
+* `agat_convert_bed2gff`: change type of argument `inflate_off` from `boolean_false` to `boolean_true` (PR #160).
+
+* `cutadapt`: change type of arguments `no_indels` and `no_match_adapter_wildcards` from `boolean_false` to `boolean_true` (PR #160).
+
 * Upgrade to Viash 0.9.0.
 
 # biobox 0.2.0
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index a32b680c..1e4ef18c 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -231,6 +231,12 @@ Finally, add all other arguments to the config file. There are a few exceptions:
 
 * If the help lists defaults, do not add them as defaults but to the description. Example: `description: <Explanation of parameter>. Default: 10.`
 
+Note:
+  
+* Prefer using `boolean_true` over `boolean_false`. This avoids confusion when specifying values for this argument in a Nextflow workflow.
+  For example, consider the CLI option `--no-indels` for `cutadapt`. If the config for `cutadapt` specified an argument `no_indels` of type `boolean_false`,
+  the script of the component would have to pass a `--no-indels` argument to `cutadapt` when `par_no_indels` is set to `false`. This becomes problematic when setting a value for this argument using `fromState` in a Nextflow workflow: with `fromState: ["no_indels": true]`, the value that gets passed to the script is `true` and the `--no-indels` flag would *not* be added to the options for `cutadapt`. This is inconsistent with what one might expect when interpreting `["no_indels": true]`.
+  When using `boolean_true`, the reasoning becomes simpler because its value no longer represents the effect of the argument, but whether or not the flag is set.
 
 ### Step 10: Add a Docker engine
 
diff --git a/src/agat/agat_convert_bed2gff/config.vsh.yaml b/src/agat/agat_convert_bed2gff/config.vsh.yaml
index a0fafc44..4466b5f1 100644
--- a/src/agat/agat_convert_bed2gff/config.vsh.yaml
+++ b/src/agat/agat_convert_bed2gff/config.vsh.yaml
@@ -49,7 +49,7 @@ argument_groups:
       - name: --inflate_off
         description: |
           By default we inflate the block fields (blockCount, blockSizes, blockStarts) to create subfeatures of the main feature (primary_tag). The type of subfeature created is based on the inflate_type parameter. If you do not want this inflating behaviour you can deactivate it by using the --inflate_off option.
-        type: boolean_false
+        type: boolean_true
       - name: --inflate_type
         description: |
           Feature type (3rd column in gff) created when inflate parameter activated [default: exon].
diff --git a/src/agat/agat_convert_bed2gff/script.sh b/src/agat/agat_convert_bed2gff/script.sh
index fbeb9206..4d4b8209 100644
--- a/src/agat/agat_convert_bed2gff/script.sh
+++ b/src/agat/agat_convert_bed2gff/script.sh
@@ -4,7 +4,7 @@
 ## VIASH END
 
 # unset flags
-[[ "$par_inflate_off" == "true" ]] && unset par_inflate_off
+[[ "$par_inflate_off" == "false" ]] && unset par_inflate_off
 [[ "$par_verbose" == "false" ]] && unset par_verbose
 
 # run agat_convert_sp_bed2gff.pl
diff --git a/src/cutadapt/config.vsh.yaml b/src/cutadapt/config.vsh.yaml
index 7e36a8e0..e20fb7fb 100644
--- a/src/cutadapt/config.vsh.yaml
+++ b/src/cutadapt/config.vsh.yaml
@@ -196,7 +196,7 @@ argument_groups:
           length of matching region. Default: 0.1 (10%).
         example: 0.1
       - name: --no_indels
-        type: boolean_false
+        type: boolean_true
         description: |
           Allow only mismatches in alignments.
 
@@ -218,7 +218,7 @@ argument_groups:
         description: |
           Interpret IUPAC wildcards in reads.
       - name: --no_match_adapter_wildcards
-        type: boolean_false
+        type: boolean_true
         description: |
           Do not interpret IUPAC wildcards in adapters.
       - name: --action
diff --git a/src/cutadapt/script.sh b/src/cutadapt/script.sh
index d181e2b0..1986e162 100644
--- a/src/cutadapt/script.sh
+++ b/src/cutadapt/script.sh
@@ -96,9 +96,9 @@ debug
 # Input arguments 
 ###########################################################
 echo ">> Parsing input arguments"
-[[ "$par_no_indels" == "true" ]] && unset par_no_indels
+[[ "$par_no_indels" == "false" ]] && unset par_no_indels
 [[ "$par_match_read_wildcards" == "false" ]] && unset par_match_read_wildcards
-[[ "$par_no_match_adapter_wildcards" == "true" ]] && unset par_no_match_adapter_wildcards
+[[ "$par_no_match_adapter_wildcards" == "false" ]] && unset par_no_match_adapter_wildcards
 [[ "$par_revcomp" == "false" ]] && unset par_revcomp
 
 input_args=$(echo \

From 86333c1a465db45facd936695f1f33b186ccf0fc Mon Sep 17 00:00:00 2001
From: Suman Muralidharan <104161349+sumanm99@users.noreply.github.com>
Date: Tue, 15 Oct 2024 23:46:17 +0530
Subject: [PATCH 29/42] SnpEff (#153)

* Help file

* config file

* config file

* runners script

* config file

* test script

* test

* test

* runners script

* snake case

* snake case

* output parameters

* modify argument formatting, container setup

* fix bug with mv command

* avoid boolean_false and fix bug with output files

---------

Co-authored-by: Emma Rousseau <emmarou1@icloud.com>
---
 src/snpeff/config.vsh.yaml              | 297 ++++++++++++++++++++++++
 src/snpeff/help.txt                     |  79 +++++++
 src/snpeff/script.sh                    | 148 ++++++++++++
 src/snpeff/test.sh                      | 129 ++++++++++
 src/snpeff/test_data/cancer.vcf         |   2 +
 src/snpeff/test_data/my_annotations.bed |   1 +
 src/snpeff/test_data/script.sh          |  15 ++
 src/snpeff/test_data/test.vcf           |   1 +
 8 files changed, 672 insertions(+)
 create mode 100644 src/snpeff/config.vsh.yaml
 create mode 100644 src/snpeff/help.txt
 create mode 100644 src/snpeff/script.sh
 create mode 100644 src/snpeff/test.sh
 create mode 100644 src/snpeff/test_data/cancer.vcf
 create mode 100644 src/snpeff/test_data/my_annotations.bed
 create mode 100644 src/snpeff/test_data/script.sh
 create mode 100644 src/snpeff/test_data/test.vcf
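
Note on the flag handling used in script.sh: the sketch below illustrates the
"unset if false" pattern combined with `${var:+...}` expansion. It is a
minimal example, not the component's script. The variable names par_verbose
and par_no_stats and their values are assumptions that mimic what Viash would
set for boolean_true arguments.

```shell
#!/bin/bash
# Illustrative values only; in a real component Viash sets these variables.
par_verbose="true"
par_no_stats="false"

# Unset every flag whose value is 'false' so that ${var:+...} expands to nothing.
for par in par_verbose par_no_stats; do
    if [[ "${!par}" == "false" ]]; then
        unset "$par"
    fi
done

# Only flags that are still set contribute a word to the argument list.
args=( ${par_verbose:+-verbose} ${par_no_stats:+-noStats} )
echo "${args[@]}"
```

With the values above only `-verbose` ends up in the argument list. Declaring
negative flags as boolean_true keeps the rule uniform: a flag is passed to the
tool exactly when the user supplied it.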

diff --git a/src/snpeff/config.vsh.yaml b/src/snpeff/config.vsh.yaml
new file mode 100644
index 00000000..5fb8622d
--- /dev/null
+++ b/src/snpeff/config.vsh.yaml
@@ -0,0 +1,297 @@
+name: snpeff
+description: |
+  Genetic variant annotation and functional effect prediction toolbox.
+  It annotates and predicts the effects of genetic variants on genes and
+  proteins (such as amino acid changes).
+keywords: [ "annotation", "effect prediction", "snp", "variant", "vcf"]
+
+links:
+  repository: https://github.com/pcingola/SnpEff
+  homepage: https://pcingola.github.io/SnpEff/
+  documentation: https://pcingola.github.io/SnpEff/
+references:
+  doi: 10.3389/fgene.2012.00035
+license: MIT
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --input
+        type: file
+        description: Input variants file.
+        example: test.vcf
+        required: true
+      - name: --genome_version
+        type: string
+        description: Reference genome version.
+        example: GRCh37.75
+        required: true
+  - name: Outputs
+    arguments:
+      - name: --output
+        type: file
+        description: The output file.
+        example: out.vcf
+        direction: output
+        required: true
+      - name: --summary
+        type: file
+        description: Directory to which the HTML summary file (snpEff_summary.html) is moved. Defaults to the directory of the output file.
+        example: summary_dir
+        direction: output
+      - name: --genes
+        type: file
+        description: Directory to which the genes TXT file (snpEff_genes.txt) is moved. Defaults to the directory of the output file.
+        example: genes_dir
+        direction: output
+  - name: Options
+    arguments:
+      - name: --chr
+        type: string
+        description: |
+          Prepend 'string' to chromosome name (e.g. 'chr1' instead of '1'). Only on TXT output.
+      - name: --classic
+        type: boolean_true
+        description: Use old style annotations instead of Sequence Ontology and HGVS.
+      - name: --csv_stats
+        type: file
+        description: Create CSV summary file.
+      - name: --download
+        type: boolean_true
+        description: Download reference genome if not available.
+      - name: --input_format
+        alternatives: [-i]
+        type: string
+        description: |
+          Input format [ vcf, bed ]. Default: VCF.
+        example: "VCF"
+      - name: --file_list
+        type: boolean_true
+        description: Input actually contains a list of files to process.
+      - name: --output_format
+        alternatives: [-o]
+        type: string
+        description: |
+          Output format [ vcf, gatk, bed, bedAnn ]. Default: VCF.
+        example: "VCF"
+      - name: --stats
+        alternatives: [-s, --htmlStats]
+        type: boolean_true
+        description: Create HTML summary file.
+      - name: --no_stats
+        type: boolean_true
+        description: Do not create stats (summary) file.
+  - name: Results filter options
+    arguments:
+      - name: --fi
+        alternatives: [--filterInterval]
+        type: file
+        description: |
+          Only analyze changes that intersect with the intervals 
+          specified in this file. This option can be used several times.
+      - name: --no_downstream
+        type: boolean_true
+        description: Do not show DOWNSTREAM changes.
+      - name: --no_intergenic
+        type: boolean_true
+        description: Do not show INTERGENIC changes.
+      - name: --no_intron
+        type: boolean_true
+        description: Do not show INTRON changes.
+      - name: --no_upstream
+        type: boolean_true
+        description: Do not show UPSTREAM changes.
+      - name: --no_utr
+        type: boolean_true
+        description: Do not show 5_PRIME_UTR or 3_PRIME_UTR changes.
+      - name: --no
+        type: string
+        description: |
+          Do not show 'EffectType'. This option can be used several times.
+  - name: Annotations options
+    arguments:
+      - name: --cancer
+        type: boolean_true
+        description: Perform 'cancer' comparisons (Somatic vs Germline).
+      - name: --cancer_samples
+        type: file
+        description: Two column TXT file defining 'original \t derived' samples.
+      - name: --fastaprot
+        type: file
+        description: |
+          Create an output file containing the resulting protein sequences.
+      - name: --format_eff
+        type: boolean_true
+        description: |
+          Use 'EFF' field compatible with older versions (instead of 'ANN').
+      - name: --gene_id
+        type: boolean_true
+        description: Use gene ID instead of gene name (VCF output).
+      - name: --hgvs
+        type: boolean_true
+        description: Use HGVS annotations for amino acid sub-field.
+      - name: --hgvs_old
+        type: boolean_true
+        description: Use old HGVS notation.
+      - name: --hgvs1_letter_aa
+        type: boolean_true
+        description: Use one letter Amino acid codes in HGVS notation.
+      - name: --hgvs_tr_id
+        type: boolean_true
+        description: Use transcript ID in HGVS notation.
+      - name: --lof
+        type: boolean_true
+        description: |
+          Add loss of function (LOF) and Nonsense mediated decay (NMD) tags.
+      - name: --no_hgvs
+        type: boolean_true
+        description: Do not add HGVS annotations.
+      - name: --no_lof
+        type: boolean_true
+        description: Do not add LOF and NMD annotations.
+      - name: --no_shift_hgvs
+        type: boolean_true
+        description: |
+          Do not shift variants according to HGVS notation (most 3prime end).
+      - name: --oicr
+        type: boolean_true
+        description: Add OICR tag in VCF file.
+      - name: --sequence_ontology
+        type: boolean_true
+        description: Use Sequence Ontology terms.
+  - name: Generic options
+    arguments:
+      - name: --config
+        alternatives: [-c]
+        type: file
+        description: Specify config file.
+      - name: --config_option
+        type: string
+        description: Override a config file option (name=value).
+      - name: --debug
+        alternatives: [-d]
+        type: boolean_true
+        description: Debug mode (very verbose).
+      - name: --data_dir
+        type: file
+        description: Override data_dir parameter from config file.
+      - name: --no_download
+        type: boolean_true
+        description: Do not download a SnpEff database, if not available locally.
+      - name: --no_log
+        type: boolean_true
+        description: Do not report usage statistics to server.
+      - name: --quiet
+        alternatives: [-q]
+        type: boolean_true
+        description: Quiet mode (do not show any messages or errors).
+      - name: --verbose
+        alternatives: [-v]
+        type: boolean_true
+        description: Verbose mode.
+  - name: Database options
+    arguments:
+      - name: --canon
+        type: boolean_true
+        description: Only use canonical transcripts.
+      - name: --canon_list
+        type: file
+        description: |
+          Only use canonical transcripts, replace some transcripts using the
+          'gene_id transcript_id' entries in <file>.
+      - name: --tag
+        type: string
+        description: |
+          Only use transcript having a tag 'tagName'. This option can be used multiple times.
+      - name: --no_tag
+        type: boolean_true
+        description: |
+          Filter out transcript having a tag 'tagName'. This option can be used multiple times.
+      - name: --interaction
+        type: boolean_true
+        description: Annotate using interactions (requires interaction database).
+      - name: --interval
+        type: file
+        description: |
+          Use custom intervals in a TXT/BED/BigBed/VCF/GFF file (this option can be used several times).
+      - name: --max_tsl
+        type: integer
+        description: Only use transcripts having Transcript Support Level lower than <TSL_number>.
+      - name: --motif
+        type: boolean_true
+        description: Annotate using motifs (requires Motif database).
+      - name: --nextprot
+        type: boolean_true
+        description: Annotate using NextProt (requires NextProt database).
+      - name: --no_genome
+        type: boolean_true
+        description: Do not load any genomic database (e.g. annotate using custom files).
+      - name: --no_expand_iub
+        type: boolean_true
+        description: Disable IUB code expansion in input variants.
+      - name: --no_interaction
+        type: boolean_true
+        description: Disable interaction annotations.
+      - name: --no_motif
+        type: boolean_true
+        description: Disable motif annotations.
+      - name: --no_nextprot
+        type: boolean_true
+        description: Disable NextProt annotations.
+      - name: --only_reg
+        type: boolean_true
+        description: Only use regulation tracks.
+      - name: --only_protein
+        type: boolean_true
+        description: Only use protein coding transcripts.
+      - name: --only_tr
+        type: file
+        description: |
+          Only use the transcripts in this file. Format: One transcript ID per line.
+        example: file.txt
+      - name: --reg
+        type: string
+        description: Regulation track to use (this option can be used several times).
+      - name: --ss
+        alternatives: [--spliceSiteSize]
+        type: integer
+        description: |
+          Set size for splice sites (donor and acceptor) in bases. Default: 2.
+      - name: --splice_region_exon_size
+        type: integer
+        description: |
+          Set size for splice site region within exons. Default: 3 bases.
+      - name: --splice_region_intron_min
+        type: integer
+        description: |
+          Set minimum number of bases for splice site region within intron. Default: 3 bases.
+      - name: --splice_region_intron_max
+        type: integer
+        description: |
+          Set maximum number of bases for splice site region within intron. Default: 8 bases.
+      - name: --strict
+        type: boolean_true
+        description: Only use 'validated' transcripts (i.e. sequence has been checked).
+      - name: --ud
+        alternatives: [--upDownStreamLen]
+        type: integer
+        description: Set upstream downstream interval length (in bases).
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: test_data
+engines:
+  - type: docker
+    image: quay.io/staphb/snpeff:5.2a
+    setup:
+      - type: docker
+        run: |
+          version=$(snpEff -version) && \
+          version_trimmed=$(echo "$version" | awk '{print $1, $2}') && \
+          echo "$version_trimmed" > /var/software_versions.txt
+runners:
+  - type: executable
+  - type: nextflow
\ No newline at end of file
diff --git a/src/snpeff/help.txt b/src/snpeff/help.txt
new file mode 100644
index 00000000..d1950220
--- /dev/null
+++ b/src/snpeff/help.txt
@@ -0,0 +1,79 @@
+Usage: snpEff [eff] [options] genome_version [input_file]
+
+        variants_file                   : Default is STDIN
+
+Options:
+        -chr <string>                   : Prepend 'string' to chromosome name (e.g. 'chr1' instead of '1'). Only on TXT output.
+        -classic                        : Use old style annotations instead of Sequence Ontology and Hgvs.
+        -csvStats <file>                : Create CSV summary file.
+        -download                       : Download reference genome if not available. Default: true
+        -i <format>                     : Input format [ vcf, bed ]. Default: VCF.
+        -fileList                       : Input actually contains a list of files to process.
+        -o <format>                     : Ouput format [ vcf, gatk, bed, bedAnn ]. Default: VCF.
+        -s , -stats, -htmlStats         : Create HTML summary file.  Default is 'snpEff_summary.html'
+        -noStats                        : Do not create stats (summary) file      
+
+Results filter options:
+        -fi , -filterInterval  <file>   : Only analyze changes that intersect with the intervals specified in this file (you may use this option many times)        
+        -no-downstream                  : Do not show DOWNSTREAM changes
+        -no-intergenic                  : Do not show INTERGENIC changes
+        -no-intron                      : Do not show INTRON changes
+        -no-upstream                    : Do not show UPSTREAM changes
+        -no-utr                         : Do not show 5_PRIME_UTR or 3_PRIME_UTR changes
+        -no <effectType>                : Do not show 'EffectType'. This option can be used several times.
+
+Annotations options:
+        -cancer                         : Perform 'cancer' comparisons (Somatic vs Germline). Default: false
+        -cancerSamples <file>           : Two column TXT file defining 'oringinal \t derived' samples.
+        -fastaProt <file>               : Create an output file containing the resulting protein sequences.
+        -formatEff                      : Use 'EFF' field compatible with older versions (instead of 'ANN').
+        -geneId                         : Use gene ID instead of gene name (VCF output). Default: false
+        -hgvs                           : Use HGVS annotations for amino acid sub-field. Default: true
+        -hgvsOld                        : Use old HGVS notation. Default: false   
+        -hgvs1LetterAa                  : Use one letter Amino acid codes in HGVS notation. Default: false
+        -hgvsTrId                       : Use transcript ID in HGVS notation. Default: false
+        -lof                            : Add loss of function (LOF) and Nonsense mediated decay (NMD) tags.
+        -noHgvs                         : Do not add HGVS annotations.
+        -noLof                          : Do not add LOF and NMD annotations.     
+        -noShiftHgvs                    : Do not shift variants according to HGVS notation (most 3prime end).
+        -oicr                           : Add OICR tag in VCF file. Default: false
+        -sequenceOntology               : Use Sequence Ontology terms. Default: true
+
+Generic options:
+        -c , -config                 : Specify config file
+        -configOption name=value     : Override a config file option
+        -d , -debug                  : Debug mode (very verbose).
+        -dataDir <path>              : Override data_dir parameter from config file.
+        -download                    : Download a SnpEff database, if not available locally. Default: true
+        -nodownload                  : Do not download a SnpEff database, if not available locally.
+        -h , -help                   : Show this help and exit
+        -noLog                       : Do not report usage statistics to server   
+        -q , -quiet                  : Quiet mode (do not show any messages or errors)
+        -v , -verbose                : Verbose mode
+        -version                     : Show version number and exit
+
+Database options:
+        -canon                       : Only use canonical transcripts.
+        -canonList <file>            : Only use canonical transcripts, replace some transcripts using the 'gene_id         transcript_id' entries in <file>.        
+        -tag <tagName>               : Only use transcript having a tag 'tagName'. This option can be used multiple times.
+        -notag <tagName>             : Filter out transcript having a tag 'tagName'. This option can be used multiple times.
+        -interaction                 : Annotate using interactions (requires interaction database). Default: true
+        -interval <file>             : Use a custom intervals in TXT/BED/BigBed/VCF/GFF file (you may use this option many times)
+        -maxTSL <TSL_number>         : Only use transcripts having Transcript Support Level lower than <TSL_number>.
+        -motif                       : Annotate using motifs (requires Motif database). Default: true
+        -nextProt                    : Annotate using NextProt (requires NextProt database).
+        -noGenome                    : Do not load any genomic database (e.g. annotate using custom files).
+        -noExpandIUB                 : Disable IUB code expansion in input variants
+        -noInteraction               : Disable inteaction annotations
+        -noMotif                     : Disable motif annotations.
+        -noNextProt                  : Disable NextProt annotations.
+        -onlyReg                     : Only use regulation tracks.
+        -onlyProtein                 : Only use protein coding transcripts. Default: false
+        -onlyTr <file.txt>           : Only use the transcripts in this file. Format: One transcript ID per line.
+        -reg <name>                  : Regulation track to use (this option can be used add several times).
+        -ss , -spliceSiteSize <int>  : Set size for splice sites (donor and acceptor) in bases. Default: 2
+        -spliceRegionExonSize <int>  : Set size for splice site region within exons. Default: 3 bases
+        -spliceRegionIntronMin <int> : Set minimum number of bases for splice site region within intron. Default: 3 bases
+        -spliceRegionIntronMax <int> : Set maximum number of bases for splice site region within intron. Default: 8 bases
+        -strict                      : Only use 'validated' transcripts (i.e. sequence has been checked). Default: false
+        -ud , -upDownStreamLen <int> : Set upstream downstream interval length (in bases)
\ No newline at end of file
diff --git a/src/snpeff/script.sh b/src/snpeff/script.sh
new file mode 100644
index 00000000..bf3914bb
--- /dev/null
+++ b/src/snpeff/script.sh
@@ -0,0 +1,148 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+# Unset flags if 'false'
+unset_if_false=(
+    par_classic
+    par_download
+    par_file_list
+    par_stats
+    par_cancer
+    par_format_eff
+    par_gene_id
+    par_hgvs
+    par_hgvs_old
+    par_hgvs1_letter_aa
+    par_hgvs_tr_id
+    par_lof
+    par_oicr
+    par_sequence_ontology
+    par_debug
+    par_quiet
+    par_verbose
+    par_canon
+    par_interaction
+    par_motif
+    par_nextprot
+    par_only_reg
+    par_only_protein
+    par_strict
+    par_no_stats
+    par_no_downstream
+    par_no_intergenic
+    par_no_intron
+    par_no_upstream
+    par_no_utr
+    par_no_hgvs
+    par_no_lof
+    par_no_shift_hgvs
+    par_no_download
+    par_no_log
+    par_no_tag
+    par_no_genome
+    par_no_expand_iub
+    par_no_interaction
+    par_no_motif
+    par_no_nextprot
+)
+for par in ${unset_if_false[@]}; do
+    test_val="${!par}" # contains the value of the 'par'
+    [[ "$test_val" == "false" ]] && unset $par
+done
+
+
+# Run SnpEff
+snpEff \
+    ${par_chr:+-chr "$par_chr"} \
+    ${par_classic:+-classic} \
+    ${par_csv_stats:+-csvStats "$par_csv_stats"} \
+    ${par_download:+-download} \
+    ${par_input_format:+-i "$par_input_format"} \
+    ${par_file_list:+-fileList} \
+    ${par_output_format:+-o "$par_output_format"} \
+    ${par_stats:+-stats} \
+    ${par_no_stats:+-noStats} \
+    ${par_fi:+-fi "$par_fi"} \
+    ${par_no_downstream:+-no-downstream} \
+    ${par_no_intergenic:+-no-intergenic} \
+    ${par_no_intron:+-no-intron} \
+    ${par_no_upstream:+-no-upstream} \
+    ${par_no_utr:+-no-utr} \
+    ${par_no:+-no "$par_no"} \
+    ${par_cancer:+-cancer} \
+    ${par_cancer_samples:+-cancerSamples "$par_cancer_samples"} \
+    ${par_fastaprot:+-fastaProt "$par_fastaprot"} \
+    ${par_format_eff:+-formatEff} \
+    ${par_gene_id:+-geneId} \
+    ${par_hgvs:+-hgvs} \
+    ${par_hgvs_old:+-hgvsOld} \
+    ${par_hgvs1_letter_aa:+-hgvs1LetterAa} \
+    ${par_hgvs_tr_id:+-hgvsTrId} \
+    ${par_lof:+-lof} \
+    ${par_no_hgvs:+-noHgvs} \
+    ${par_no_lof:+-noLof} \
+    ${par_no_shift_hgvs:+-noShiftHgvs} \
+    ${par_oicr:+-oicr} \
+    ${par_sequence_ontology:+-sequenceOntology} \
+    ${par_config:+-config "$par_config"} \
+    ${par_config_option:+-configOption "$par_config_option"} \
+    ${par_debug:+-debug} \
+    ${par_data_dir:+-dataDir "$par_data_dir"} \
+    ${par_no_download:+-nodownload} \
+    ${par_no_log:+-noLog} \
+    ${par_quiet:+-quiet} \
+    ${par_verbose:+-verbose} \
+    ${par_canon:+-canon} \
+    ${par_canon_list:+-canonList "$par_canon_list"} \
+    ${par_tag:+-tag "$par_tag"} \
+    ${par_no_tag:+-notag} \
+    ${par_interaction:+-interaction} \
+    ${par_interval:+-interval "$par_interval"} \
+    ${par_max_tsl:+-maxTSL "$par_max_tsl"} \
+    ${par_motif:+-motif} \
+    ${par_nextprot:+-nextProt} \
+    ${par_no_genome:+-noGenome} \
+    ${par_no_expand_iub:+-noExpandIUB} \
+    ${par_no_interaction:+-noInteraction} \
+    ${par_no_motif:+-noMotif} \
+    ${par_no_nextprot:+-noNextProt} \
+    ${par_only_reg:+-onlyReg} \
+    ${par_only_protein:+-onlyProtein} \
+    ${par_only_tr:+-onlyTr "$par_only_tr"} \
+    ${par_reg:+-reg "$par_reg"} \
+    ${par_ss:+-ss "$par_ss"} \
+    ${par_splice_region_exon_size:+-spliceRegionExonSize "$par_splice_region_exon_size"} \
+    ${par_splice_region_intron_min:+-spliceRegionIntronMin "$par_splice_region_intron_min"} \
+    ${par_splice_region_intron_max:+-spliceRegionIntronMax "$par_splice_region_intron_max"} \
+    ${par_strict:+-strict} \
+    ${par_ud:+-ud "$par_ud"} \
+    "$par_genome_version" \
+    "$par_input" \
+    > "$par_output"
+
+# Path of the output file (par_output)
+absolute_path=$(realpath "$par_output")
+directory_path=$(dirname "$absolute_path")
+
+# Move the automatically generated outputs to their locations
+if [ -z "$par_no_stats" ]; then
+    if [ ! -z "$par_summary" ]; then
+        mv -n snpEff_summary.html "$par_summary"
+    else
+        mv -n snpEff_summary.html "$directory_path"
+    fi
+fi 
+
+if [ -z "$par_no_stats" ]; then
+    if [ ! -z "$par_genes" ]; then
+        mv -n snpEff_genes.txt "$par_genes"
+    else
+        mv -n snpEff_genes.txt "$directory_path"
+    fi
+fi
+
+exit 0
diff --git a/src/snpeff/test.sh b/src/snpeff/test.sh
new file mode 100644
index 00000000..d8c72c20
--- /dev/null
+++ b/src/snpeff/test.sh
@@ -0,0 +1,129 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+###########################################################################
+
+# Test 1: Run SnpEff with only required parameters
+
+mkdir test1
+pushd test1 > /dev/null # cd test1 (stack)
+
+echo "> Run Test 1: required parameters"
+"$meta_executable" \
+  --genome_version GRCh37.75 \
+  --input "$meta_resources_dir/test_data/cancer.vcf" \
+  --output out.vcf
+
+# Check if output files are generated
+output_files=("out.vcf" "snpEff_genes.txt" "snpEff_summary.html")
+
+# Check if any of the files do not exist
+for file in "${output_files[@]}"; do
+    if [ ! -e "$file" ]; then
+        echo "File $file does not exist." && exit 1
+    fi
+done
+
+# Check if files are empty
+for file in "${output_files[@]}"; do
+    if [ ! -s "$file" ]; then
+        echo "File $file is empty." && exit 1
+    fi
+done
+
+popd > /dev/null # Remove directory from stack (LIFO)
+
+echo "Test 1 succeeded."
+
+###########################################################################
+
+# Test 2: Run SnpEff with a different input + options
+
+mkdir test2
+pushd test2 > /dev/null
+
+echo "> Run Test 2: different input + options"
+"$meta_executable" \
+  --genome_version GRCh37.75 \
+  --input "$meta_resources_dir/test_data/test.vcf" \
+  --interval "$meta_resources_dir/test_data/my_annotations.bed" \
+  --no_stats \
+  --output output.vcf
+
+# Check if output.vcf exists
+if [ ! -e "output.vcf" ]; then
+    echo "File output.vcf does not exist." && exit 1
+fi
+
+# These files should not exist
+files=("snpEff_genes.txt" "snpEff_summary.html")
+for file in "${files[@]}"; do
+    if [ -e "$file" ]; then
+        echo "Error: File $file exists." && exit 1
+    fi
+done
+
+# Check if output.vcf is empty
+if [ ! -s "output.vcf" ]; then
+    echo "File output.vcf is empty." && exit 1
+fi
+
+popd > /dev/null
+
+echo "Test 2 succeeded."
+
+###########################################################################
+
+# Test 3: Move the output files to other locations
+
+mkdir test3
+pushd test3 > /dev/null
+
+mkdir temp
+
+echo "> Run Test 3: move output files"
+"$meta_executable" \
+  --genome_version GRCh37.75 \
+  --input "$meta_resources_dir/test_data/test.vcf" \
+  --output output.vcf \
+  --summary temp \
+  --genes temp
+
+# Check if output.vcf exists
+if [ ! -e "output.vcf" ]; then
+    echo "File output.vcf does not exist." && exit 1
+fi
+
+# Check if the other output files have been moved to temp folder
+output_files=("snpEff_genes.txt" "snpEff_summary.html")
+
+# Check if any of the files do not exist
+for file in "${output_files[@]}"; do
+    if [ ! -e "temp/$file" ]; then
+        echo "File $file does not exist in 'temp' folder." && exit 1
+    fi
+done
+
+# Check if output.vcf is empty
+if [ ! -s "output.vcf" ]; then
+    echo "File output.vcf is empty." && exit 1
+fi
+
+# Check if the other output files in temp folder are empty
+for file in "${output_files[@]}"; do
+    if [ ! -s "temp/$file" ]; then
+        echo "File $file is empty." && exit 1
+    fi
+done
+
+popd > /dev/null
+
+echo "Test 3 succeeded."
+
+###########################################################################
+
+echo "All tests successfully completed!"
\ No newline at end of file
diff --git a/src/snpeff/test_data/cancer.vcf b/src/snpeff/test_data/cancer.vcf
new file mode 100644
index 00000000..f37ad8c3
--- /dev/null
+++ b/src/snpeff/test_data/cancer.vcf
@@ -0,0 +1,2 @@
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Patient_01_Germline	Patient_01_Somatic
+1	69091	.	A	C,G	.	PASS	AC=1	GT	1/0	2/1
diff --git a/src/snpeff/test_data/my_annotations.bed b/src/snpeff/test_data/my_annotations.bed
new file mode 100644
index 00000000..a5247f97
--- /dev/null
+++ b/src/snpeff/test_data/my_annotations.bed
@@ -0,0 +1 @@
+1	10000	20000	MY_ANNOTATION
diff --git a/src/snpeff/test_data/script.sh b/src/snpeff/test_data/script.sh
new file mode 100644
index 00000000..a47ec136
--- /dev/null
+++ b/src/snpeff/test_data/script.sh
@@ -0,0 +1,15 @@
+# Test files from SnpEff examples
+if [ ! -f snpEff_latest_core.zip ]; then
+    wget https://snpeff.blob.core.windows.net/versions/snpEff_latest_core.zip
+fi
+
+if [ ! -d snpEff ]; then
+    unzip snpEff_latest_core.zip
+fi
+
+mv snpEff/examples/test.vcf src/snpeff/test_data/
+mv snpEff/examples/cancer.vcf src/snpeff/test_data/
+mv snpEff/examples/my_annotations.bed src/snpeff/test_data/
+
+rm -rf snpEff_latest_core.zip
+rm -rf snpEff
\ No newline at end of file
diff --git a/src/snpeff/test_data/test.vcf b/src/snpeff/test_data/test.vcf
new file mode 100644
index 00000000..d552ef18
--- /dev/null
+++ b/src/snpeff/test_data/test.vcf
@@ -0,0 +1 @@
+1	10469	.	C	G	365.78	PASS	AC=30;AF=0.0732

From 7fb67a98539868b9af788338fb5f46d34ab742f7 Mon Sep 17 00:00:00 2001
From: Emma Rousseau <emmarou1@icloud.com>
Date: Fri, 18 Oct 2024 11:15:20 +0200
Subject: [PATCH 30/42] Add bbmap_bbsplit (#138)

* initial commit dedup

* Revert "initial commit dedup"

This reverts commit 38f586bec0ac9e4312b016e29c3aa0bd53f292b2.

* initial commit, complete config file, add test data

* complete config file, adjusted script and tests, not functional

* update changelog, help.txt, functional test, large test data

* smaller test data

* remove test resource from config

* modify paths in test script

* Arguments closer to original tool's

* Extra arg to allow use of bbmap args
---
 CHANGELOG.md                      |   3 +
 src/bbmap_bbsplit/config.vsh.yaml | 162 ++++++++++++++++++++++++++++++
 src/bbmap_bbsplit/help.txt        |  83 +++++++++++++++
 src/bbmap_bbsplit/script.sh       |  91 +++++++++++++++++
 src/bbmap_bbsplit/test.sh         | 145 ++++++++++++++++++++++++++
 5 files changed, 484 insertions(+)
 create mode 100644 src/bbmap_bbsplit/config.vsh.yaml
 create mode 100644 src/bbmap_bbsplit/help.txt
 create mode 100755 src/bbmap_bbsplit/script.sh
 create mode 100644 src/bbmap_bbsplit/test.sh
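
Note on the `multiple: true` arguments (--input, --ref): the config describes
their values as separated by ";". The sketch below shows one way a script
could split such a value into a bash array. It assumes the values arrive as a
single ';'-separated string in par_input. The file names are hypothetical and
this is not taken from the component's script.sh.

```shell
#!/bin/bash
# Hypothetical value; in a real component Viash would set par_input.
par_input="reads_R1.fastq;reads_R2.fastq"

# Split on ';' into an array; the IFS change applies to this read only.
IFS=';' read -ra input_files <<< "$par_input"

echo "Number of input files: ${#input_files[@]}"
if [ "${#input_files[@]}" -eq 2 ]; then
    echo "Paired: ${input_files[0]} and ${input_files[1]}"
fi
```

Counting the array elements also gives a simple check that the number of
inputs matches the --paired setting before the tool is invoked.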

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 47c786c6..16e79693 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -184,6 +184,9 @@
 * `bedtools`:
     - `bedtools_getfasta`: extract sequences from a FASTA file for each of the
                            intervals defined in a BED/GFF/VCF file (PR #59).
+
+* `bbmap`:
+    - `bbmap_bbsplit`: Split sequencing reads by mapping them to multiple references simultaneously (PR #138).
 
 
diff --git a/src/bbmap_bbsplit/config.vsh.yaml b/src/bbmap_bbsplit/config.vsh.yaml
new file mode 100644
index 00000000..61336b35
--- /dev/null
+++ b/src/bbmap_bbsplit/config.vsh.yaml
@@ -0,0 +1,162 @@
+namespace: "bbmap"
+name: "bbmap_bbsplit"
+description: Split sequencing reads by mapping them to multiple references simultaneously.
+links:
+  homepage: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/
+  documentation: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbmap-guide/
+  repository: https://github.com/BioInfoTools/BBMap/blob/master/sh/bbsplit.sh
+
+license: BBTools Copyright (c) 2014
+
+argument_groups:
+- name: "Input"
+  arguments:
+  - name: "--id"
+    type: string
+    description: Sample ID
+  - name: "--paired"
+    type: boolean_true
+    description: Whether the input fastq files are paired.
+  - name: "--input"
+    type: file
+    multiple: true
+    description: Input fastq files, either one or two (paired), separated by ";".
+    example: reads.fastq
+  - name: "--ref"
+    type: file
+    multiple: true
+    description: Reference FASTA files, separated by ";". The primary reference should be specified first.
+  - name: "--only_build_index"
+    type: boolean_true
+    description: If set, only builds the index. Otherwise, mapping is performed.
+  - name: "--build"
+    type: string
+    description: |
+      Designate index to use. Corresponds to the number specified when building the index.
+      If building the index, this will be the build's id. If multiple references are indexed
+      in the same directory, each needs a unique build ID. Default: 1.
+    example: "1"
+  - name: "--qin"
+    type: string
+    description: |
+      Set to 33 or 64 to specify input quality value ASCII offset. Automatically detected if
+      not specified.
+  - name: "--interleaved"
+    type: boolean_true
+    description: |
+      True forces paired/interleaved input; false forces single-ended mapping.
+      If not specified, interleaved status will be autodetected from read names.
+  - name: "--maxindel"
+    type: integer
+    description: |
+      Don't look for indels longer than this. Lower is faster. Set to >=100k for RNA-seq.
+    example: 20
+  - name: "--minratio"
+    type: double
+    description: |
+      Fraction of max alignment score required to keep a site. Higher is faster.
+    example: 0.56
+  - name: "--minhits"
+    type: integer
+    description: |
+      Minimum number of seed hits required for candidate sites. Higher is faster.
+    example: 1
+  - name: "--ambiguous"
+    type: string
+    description: |
+      Set behavior on ambiguously-mapped reads (with multiple top-scoring mapping locations).
+        * best    Use the first best site (Default)
+        * toss    Consider unmapped
+        * random  Select one top-scoring site randomly
+        * all     Retain all top-scoring sites.  Does not work yet with SAM output
+    choices: [best, toss, random, all]
+    example: best
+  - name: "--ambiguous2"
+    type: string
+    description: |
+      Set behavior only for reads that map ambiguously to multiple different references.
+      Normal 'ambiguous=' controls behavior on all ambiguous reads;
+      Ambiguous2 excludes reads that map ambiguously within a single reference.
+        * best    Use the first best site (Default)
+        * toss    Consider unmapped
+        * all     Write a copy to the output for each reference to which it maps
+        * split   Write a copy to the AMBIGUOUS_ output for each reference to which it maps
+    choices: [best, toss, all, split]
+    example: best
+  - name: "--qtrim"
+    type: string
+    description: |
+      Quality-trim ends to Q5 before mapping. Options are 'l' (left), 'r' (right), and 'lr' (both).
+    choices: [l, r, lr]
+  - name: "--untrim"
+    type: boolean_true
+    description: Undo trimming after mapping. Untrimmed bases will be soft-clipped in cigar strings.
+
+
+- name: "Output"
+  arguments:
+  - name: "--fastq_1"
+    type: file
+    description: |
+      Output file for read 1.
+    direction: output
+    example: read_out1.fastq
+  - name: "--fastq_2"
+    type: file
+    description: |
+      Output file for read 2.
+    direction: output
+    example: read_out2.fastq
+  - name: "--sam2bam"
+    alternatives: ["--bs"]
+    type: file
+    description: |
+      Write a shell script to 'file' that will turn the sam output into a sorted, indexed bam file.
+    direction: output
+    example: script.sh
+  - name: "--scafstats"
+    type: file
+    description: |
+      Write statistics on how many reads mapped to which scaffold to this file.
+    direction: output
+    example: scaffold_stats.txt
+  - name: "--refstats"
+    type: file
+    description: |
+      Write statistics on how many reads were assigned to which reference to this file.
+      Unmapped reads whose mate mapped to a reference are considered assigned and will be counted.
+    direction: output
+    example: reference_stats.txt
+  - name: "--nzo"
+    type: boolean_true
+    description: Only print lines with nonzero coverage.
+  - name: "--bbmap_args"
+    type: string
+    description: |
+      Additional arguments from BBMap to pass to BBSplit.
+    
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+  
+engines:
+- type: docker
+  image: ubuntu:22.04
+  setup:
+    - type: docker
+      run: |
+        apt-get update && \
+        apt-get install -y build-essential openjdk-17-jdk wget tar && \
+        wget --no-check-certificate https://sourceforge.net/projects/bbmap/files/BBMap_39.01.tar.gz && \
+        tar xzf BBMap_39.01.tar.gz && \
+        cp -r bbmap/* /usr/local/bin
+    - type: docker
+      run: |
+        bbsplit.sh --version 2>&1 | awk '/BBMap version/{print "BBMAP:", $NF}' > /var/software_versions.txt
+runners:
+  - type: executable
+  - type: nextflow
diff --git a/src/bbmap_bbsplit/help.txt b/src/bbmap_bbsplit/help.txt
new file mode 100644
index 00000000..56544a34
--- /dev/null
+++ b/src/bbmap_bbsplit/help.txt
@@ -0,0 +1,83 @@
+```
+bbsplit.sh
+```
+
+BBSplit
+Written by Brian Bushnell, from Dec. 2010 - present
+Last modified June 11, 2018
+
+Description:  Maps reads to multiple references simultaneously.
+Outputs reads to a file for the reference they best match, with multiple options for dealing with ambiguous mappings.
+
+To index:     bbsplit.sh build=<1> ref_x=<reference fasta> ref_y=<another reference fasta>
+To map:       bbsplit.sh build=<1> in=<reads> out_x=<output file> out_y=<another output file>
+
+To be concise, and do everything in one command:
+bbsplit.sh ref=x.fa,y.fa in=reads.fq basename=o%.fq
+
+that is equivalent to
+bbsplit.sh build=1 in=reads.fq ref_x=x.fa ref_y=y.fa out_x=ox.fq out_y=oy.fq
+
+By default paired reads will yield interleaved output, but you can use the # symbol to produce twin output files.
+For example, basename=o%_#.fq will produce ox_1.fq, ox_2.fq, oy_1.fq, and oy_2.fq.
+
+     
+Indexing Parameters (required when building the index):
+ref=<file,file>     A list of references, or directories containing fasta files.
+ref_<name>=<ref.fa> Alternate, longer way to specify references. e.g., ref_ecoli=ecoli.fa
+                    These can also be comma-delimited lists of files; e.g., ref_a=a1.fa,a2.fa,a3.fa
+build=<1>           If multiple references are indexed in the same directory, each needs a unique build ID.
+path=<.>            Specify the location to write the index, if you don't want it in the current working directory.
+
+Input Parameters:
+build=<1>           Designate index to use.  Corresponds to the number specified when building the index.
+in=<reads.fq>       Primary reads input; required parameter.
+in2=<reads2.fq>     For paired reads in two files.
+qin=<auto>          Set to 33 or 64 to specify input quality value ASCII offset.
+interleaved=<auto>  True forces paired/interleaved input; false forces single-ended mapping.
+                    If not specified, interleaved status will be autodetected from read names.
+
+Mapping Parameters:
+maxindel=<20>       Don't look for indels longer than this.  Lower is faster.  Set to >=100k for RNA-seq.
+minratio=<0.56>     Fraction of max alignment score required to keep a site.  Higher is faster.
+minhits=<1>         Minimum number of seed hits required for candidate sites.  Higher is faster.
+ambiguous=<best>    Set behavior on ambiguously-mapped reads (with multiple top-scoring mapping locations).
+                       best   (use the first best site)
+                       toss   (consider unmapped)
+                       random   (select one top-scoring site randomly)
+                       all   (retain all top-scoring sites.  Does not work yet with SAM output)
+ambiguous2=<best>   Set behavior only for reads that map ambiguously to multiple different references.
+                    Normal 'ambiguous=' controls behavior on all ambiguous reads;
+                    Ambiguous2 excludes reads that map ambiguously within a single reference.
+                       best   (use the first best site)
+                       toss   (consider unmapped)
+                       all   (write a copy to the output for each reference to which it maps)
+                       split   (write a copy to the AMBIGUOUS_ output for each reference to which it maps)
+qtrim=<true>        Quality-trim ends to Q5 before mapping.  Options are 'l' (left), 'r' (right), and 'lr' (both).
+untrim=<true>       Undo trimming after mapping.  Untrimmed bases will be soft-clipped in cigar strings.
+
+Output Parameters:
+out_<name>=<file>   Output reads that map to the reference <name> to <file>.
+basename=prefix%suffix     Equivalent to multiple out_%=prefix%suffix expressions, in which each % is replaced by the name of a reference file.
+bs=<file>           Write a shell script to 'file' that will turn the sam output into a sorted, indexed bam file.
+scafstats=<file>    Write statistics on how many reads mapped to which scaffold to this file.
+refstats=<file>     Write statistics on how many reads were assigned to which reference to this file.
+                    Unmapped reads whose mate mapped to a reference are considered assigned and will be counted.
+nzo=t               Only print lines with nonzero coverage.
+
+***** Notes *****
+Almost all BBMap parameters can be used; run bbmap.sh for more details.
+Exceptions include the 'nodisk' flag, which BBSplit does not support.
+BBSplit is recommended for fastq and fasta output, not for sam/bam output.
+When the reference sequences are shorter than read length, use Seal instead of BBSplit.
+
+Java Parameters:
+-Xmx                This will set Java's memory usage, overriding autodetection.
+                    -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs.
+                    The max is typically 85% of physical memory.
+-eoom               This flag will cause the process to exit if an
+                    out-of-memory exception occurs.  Requires Java 8u92+.
+-da                 Disable assertions.
+
+This list is not complete.  For more information, please consult /readme.txt
+Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.
\ No newline at end of file
diff --git a/src/bbmap_bbsplit/script.sh b/src/bbmap_bbsplit/script.sh
new file mode 100755
index 00000000..ac8542c9
--- /dev/null
+++ b/src/bbmap_bbsplit/script.sh
@@ -0,0 +1,91 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+set -eo pipefail
+
+function clean_up {
+    rm -rf "$tmpdir"
+}
+trap clean_up EXIT 
+
+unset_if_false=( par_paired par_only_build_index par_interleaved par_untrim par_nzo)
+
+for var in "${unset_if_false[@]}"; do
+    if [ -z "${!var}" ]; then
+        unset $var
+    fi
+done
+
+if [ ! -d "$par_build" ]; then
+    IFS=";" read -ra ref_files <<< "$par_ref"
+    primary_ref="${ref_files[0]}"
+    refs=()
+    for file in "${ref_files[@]:1}"
+    do
+        name=$(basename "$file" | sed 's/\.[^.]*$//')
+        refs+=("ref_$name=$file")
+    done
+fi
+
+if $par_only_build_index; then
+    if [ ${#refs[@]} -gt 0 ]; then
+        bbsplit.sh \
+            --ref_primary="$primary_ref" \
+            "${refs[@]}" \
+            path="$par_build"
+    else
+        echo "ERROR: Please specify at least two reference fasta files." >&2; exit 1
+    fi
+else
+    IFS=";" read -ra input <<< "$par_input"
+    tmpdir=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXXXX")
+    index_files=''
+    if [ -d "$par_build" ]; then
+        index_files="path=$par_build"
+    elif [ ${#refs[@]} -gt 0 ]; then
+        index_files="--ref_primary=$primary_ref ${refs[*]}"
+    else
+        echo "ERROR: Please either specify a BBSplit index as input or at least two reference fasta files." >&2; exit 1
+    fi
+
+    extra_args=""
+    if [ -n "$par_refstats" ]; then extra_args+=" refstats=$par_refstats"; fi
+    if [ -n "$par_ambiguous" ]; then extra_args+=" ambiguous=$par_ambiguous"; fi
+    if [ -n "$par_ambiguous2" ]; then extra_args+=" ambiguous2=$par_ambiguous2"; fi
+    if [ -n "$par_minratio" ]; then extra_args+=" minratio=$par_minratio"; fi
+    if [ -n "$par_minhits" ]; then extra_args+=" minhits=$par_minhits"; fi
+    if [ -n "$par_maxindel" ]; then extra_args+=" maxindel=$par_maxindel"; fi
+    if [ -n "$par_qin" ]; then extra_args+=" qin=$par_qin"; fi
+    if [ -n "$par_qtrim" ]; then extra_args+=" qtrim=$par_qtrim"; fi
+    if [ "$par_interleaved" = true ]; then extra_args+=" interleaved=true"; fi
+    if [ "$par_untrim" = true ]; then extra_args+=" untrim=true"; fi
+    if [ "$par_nzo" = true ]; then extra_args+=" nzo=true"; fi
+
+    if [ -n "$par_bbmap_args" ]; then extra_args+=" $par_bbmap_args"; fi
+
+    
+    if $par_paired; then
+        bbsplit.sh \
+            $index_files \
+            in=${input[0]} \
+            in2=${input[1]} \
+            basename=${tmpdir}/%_#.fastq \
+            $extra_args
+        read1=$(find "$tmpdir/" -iname 'primary_1*')
+        read2=$(find "$tmpdir/" -iname 'primary_2*')
+        cp "$read1" "$par_fastq_1"
+        cp "$read2" "$par_fastq_2"
+    else
+        bbsplit.sh \
+            $index_files \
+            in=${input[0]} \
+            basename=${tmpdir}/%.fastq \
+            $extra_args
+        read1=$(find "$tmpdir/" -iname 'primary*')
+        cp "$read1" "$par_fastq_1"
+    fi
+fi
+
+exit 0
diff --git a/src/bbmap_bbsplit/test.sh b/src/bbmap_bbsplit/test.sh
new file mode 100644
index 00000000..1ad7aac2
--- /dev/null
+++ b/src/bbmap_bbsplit/test.sh
@@ -0,0 +1,145 @@
+#!/bin/bash
+
+echo ">>> Test $meta_functionality_name"
+
+echo "> Prepare test data"
+
+cat > reads_R1.fastq <<'EOF'
+@SEQ_ID1
+ACAGGGTTTCACCATGTTGGCCAGG
++
+IIIIIIIIIIIIIIIIIIIIIIIII
+@SEQ_ID2
+TCCCAGGTAACAAACCAACCAACTT
++
+!!!!!!!!!!!!!!!!!!!!!!!!!
+EOF
+
+cat > reads_R2.fastq <<'EOF'
+@SEQ_ID1
+TACCATTACCCTACCATCCACCATG
++
+IIIIIIIIIIIIIIIIIIIIIIIII
+@SEQ_ID2
+CACTCGGCTGCATGCTTAGTGCACT
++
+!!!!!!!!!!!!!!!!!!!!!!!!!
+EOF
+
+cat > genome.fasta <<'EOF'
+>I
+AGTATTTTTAGTAGAGACAGGGTTTCACCATGTTGGCCAGGCTGGTCTTGATCTCCTGACCTCAGGTGATCCATCCGCCT
+TGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCACCTGGCCTGGTTTCGAACTCTTGACCTCAGGTGGTCTG
+CCCATCTTGACCTTCCAAAGTGCTGGAGCTACAGGCATGAGCCACTGCACCTGGTGCTTTTGGTAAAAGCAACCTGGAAT
+CAAATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTT
+TAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGAC
+EOF
+
+cat > human.fa <<'EOF'
+>human
+AGTATTTTTAGTAGAGACAGGGTTTCACCATGTTGGCCAGGCTGGTCTTGATCTCCTGACCTCAGGTGATCCATCCGCCT
+TGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCACCTGGCCTGGTTTCGAACTCTTGACCTCAGGTGGTCTG
+CCCATCTTGACCTTCCAAAGTGCTGGAGCTACAGGCATGAGCCACTGCACCTGGTGCTTTTGGTAAAAGCAACCTGGAAT
+EOF
+
+cat > sarscov2.fa <<'EOF'
+>sarscov2
+ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAA
+AATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGG
+ACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTTT
+EOF
+
+####################################################################################################
+
+echo ">>> Building BBSplit index"
+"${meta_executable}" \
+  --ref "genome.fasta;human.fa;sarscov2.fa" \
+  --only_build_index \
+  --build "BBSplit_index" 
+
+echo ">>> Check whether output exists"
+[ ! -d "BBSplit_index" ] && echo "BBSplit index does not exist!" && exit 1
+[ -z "$(ls -A 'BBSplit_index')" ] && echo "BBSplit index is empty!" && exit 1
+
+####################################################################################################
+
+
+echo ">>> Testing with single-end reads and primary/non-primary FASTA files"
+"${meta_executable}" \
+  --input "reads_R1.fastq" \
+  --ref "genome.fasta;human.fa;sarscov2.fa" \
+  --fastq_1 "filtered_reads_R1.fastq"
+
+echo ">>> Check whether output exists"
+[ ! -f "filtered_reads_R1.fastq" ] && echo "Filtered reads file does not exist!" && exit 1
+[ ! -s "filtered_reads_R1.fastq" ] && echo "Filtered reads file is empty!" && exit 1
+
+echo ">>> Check whether output is correct"
+grep -q "ACAGGGTTTCACCATGTTGGCCAGG" filtered_reads_R1.fastq || { echo "Filtered reads file does not contain expected sequence!"; exit 1; }
+
+rm filtered_reads_R1.fastq
+
+####################################################################################################
+
+echo ">>> Testing with paired-end reads and primary/non-primary FASTA files"
+"${meta_executable}" \
+  --paired \
+  --input "reads_R1.fastq;reads_R2.fastq" \
+  --ref "genome.fasta;human.fa;sarscov2.fa" \
+  --fastq_1 "filtered_reads_R1.fastq" \
+  --fastq_2 "filtered_reads_R2.fastq"
+
+echo ">>> Check whether output exists"
+[ ! -f "filtered_reads_R1.fastq" ] && echo "Filtered read 1 file does not exist!" && exit 1
+[ ! -s "filtered_reads_R1.fastq" ] && echo "Filtered read 1 file is empty!" && exit 1
+[ ! -f "filtered_reads_R2.fastq" ] && echo "Filtered read 2 file does not exist!" && exit 1
+[ ! -s "filtered_reads_R2.fastq" ] && echo "Filtered read 2 file is empty!" && exit 1
+
+echo ">>> Check whether output is correct"
+grep -q "ACAGGGTTTCACCATGTTGGCCAGG" filtered_reads_R1.fastq || { echo "Filtered read 1 file does not contain expected sequence!"; exit 1; }
+grep -q "TACCATTACCCTACCATCCACCATG" filtered_reads_R2.fastq || { echo "Filtered read 2 file does not contain expected sequence!"; exit 1; }
+
+rm filtered_reads_R1.fastq filtered_reads_R2.fastq
+
+####################################################################################################
+
+echo ">>> Testing with single-end reads and BBSplit index"
+"${meta_executable}" \
+  --input "reads_R1.fastq" \
+  --build "BBSplit_index" \
+  --fastq_1 "filtered_reads_R1.fastq"
+
+echo ">>> Check whether output exists"
+[ ! -f "filtered_reads_R1.fastq" ] && echo "Filtered reads file does not exist!" && exit 1
+[ ! -s "filtered_reads_R1.fastq" ] && echo "Filtered reads file is empty!" && exit 1
+
+echo ">>> Check whether output is correct"
+grep -q "ACAGGGTTTCACCATGTTGGCCAGG" filtered_reads_R1.fastq || { echo "Filtered reads file does not contain expected sequence!"; exit 1; }
+
+rm filtered_reads_R1.fastq
+
+####################################################################################################
+
+echo ">>> Testing with paired-end reads and BBSplit index"
+"${meta_executable}" \
+  --paired \
+  --input "reads_R1.fastq;reads_R2.fastq" \
+  --build "BBSplit_index" \
+  --fastq_1 "filtered_reads_R1.fastq" \
+  --fastq_2 "filtered_reads_R2.fastq"
+
+echo ">>> Check whether output exists"
+[ ! -f "filtered_reads_R1.fastq" ] && echo "Filtered read 1 file does not exist!" && exit 1
+[ ! -s "filtered_reads_R1.fastq" ] && echo "Filtered read 1 file is empty!" && exit 1
+[ ! -f "filtered_reads_R2.fastq" ] && echo "Filtered read 2 file does not exist!" && exit 1
+[ ! -s "filtered_reads_R2.fastq" ] && echo "Filtered read 2 file is empty!" && exit 1
+
+
+echo ">>> Check whether output is correct"
+grep -q "ACAGGGTTTCACCATGTTGGCCAGG" filtered_reads_R1.fastq || { echo "Filtered read 1 file does not contain expected sequence!"; exit 1; }
+grep -q "TACCATTACCCTACCATCCACCATG" filtered_reads_R2.fastq || { echo "Filtered read 2 file does not contain expected sequence!"; exit 1; }
+
+rm filtered_reads_R1.fastq filtered_reads_R2.fastq
+
+echo "All tests succeeded!"
+exit 0
\ No newline at end of file

From 6e6b13939c9d719f1cd7ff5a91a6562e0a6e2e29 Mon Sep 17 00:00:00 2001
From: Suman Muralidharan <104161349+sumanm99@users.noreply.github.com>
Date: Sat, 26 Oct 2024 15:23:03 +0530
Subject: [PATCH 31/42] nanoplot (#95)

* nanoplot

* test_data

* reinitiate

* gitignore

* namespace

* Testing NanoPlot in CLI

* NanoPlot complete

* Updated docker engine

* Docker

* Delete target directory

* Deleted

* Input file

* fastq with more reads

* Delete config.vsh.yaml

* Pull request changes

* Delete var directory

* Config arguments complete

* Update help.txt

* Update config file

* Test files

* runners script

* gitignore default

* Move output

* Delete output directory

* Runners script complete

* Test script

* default output

* test data

* params passed correctly

* outdir

* test script

* input files

* all test files

* test data < 100 KB

* test script update

* Update CHANGELOG.md

* Update CHANGELOG.md

* Test cases in directories

* rm .gz .pickle .feather files

* reduce test input size

* Multiple separator ";" and check there is only one input file

---------

Co-authored-by: jakubmajercik <jakub.majercik@gmail.com>
Co-authored-by: Emma Rousseau <emmarou1@icloud.com>
---
 CHANGELOG.md                           |   4 +-
 src/nanoplot/config.vsh.yaml           | 230 +++++++++++
 src/nanoplot/help.txt                  |  96 +++++
 src/nanoplot/script.sh                 | 129 ++++++
 src/nanoplot/test.sh                   | 549 +++++++++++++++++++++++++
 src/nanoplot/test_data/script.sh       | 102 +++++
 src/nanoplot/test_data/summary.txt     |  51 +++
 src/nanoplot/test_data/test.bam        | Bin 0 -> 2752 bytes
 src/nanoplot/test_data/test.bam.bai    | Bin 0 -> 96 bytes
 src/nanoplot/test_data/test.fasta      |  35 ++
 src/nanoplot/test_data/test1.fastq     |  49 +++
 src/nanoplot/test_data/test2.fastq     |  34 ++
 src/nanoplot/test_data/test_rich.fastq |  40 ++
 13 files changed, 1317 insertions(+), 2 deletions(-)
 create mode 100644 src/nanoplot/config.vsh.yaml
 create mode 100644 src/nanoplot/help.txt
 create mode 100644 src/nanoplot/script.sh
 create mode 100644 src/nanoplot/test.sh
 create mode 100644 src/nanoplot/test_data/script.sh
 create mode 100644 src/nanoplot/test_data/summary.txt
 create mode 100644 src/nanoplot/test_data/test.bam
 create mode 100644 src/nanoplot/test_data/test.bam.bai
 create mode 100644 src/nanoplot/test_data/test.fasta
 create mode 100644 src/nanoplot/test_data/test1.fastq
 create mode 100644 src/nanoplot/test_data/test2.fastq
 create mode 100644 src/nanoplot/test_data/test_rich.fastq

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 16e79693..9e59f784 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,8 @@
 
 * `rsem/rsem_calculate_expression`: Calculate expression levels (PR #93).
 
+* `nanoplot`: Plotting tool for long read sequencing data and alignments (PR #95).
+
 ## BREAKING CHANGES
 
 * `falco`: Fix a typo in the `--reverse_complement` argument (PR #157).
@@ -189,8 +191,6 @@
     - `bbmap_bbsplit`: Split sequencing reads by mapping them to multiple references simultaneously (PR #138).
 
 
-
-
 ## MINOR CHANGES
 
 * Uniformize component metadata (PR #23).
diff --git a/src/nanoplot/config.vsh.yaml b/src/nanoplot/config.vsh.yaml
new file mode 100644
index 00000000..1c22775f
--- /dev/null
+++ b/src/nanoplot/config.vsh.yaml
@@ -0,0 +1,230 @@
+name: nanoplot
+description: |
+  Run NanoPlot on nanopore-sequenced reads.
+  NanoPlot is a plotting tool for long read sequencing data and alignments.
+keywords: ["fastq", "sequencing summary", "nanopore"]
+links:
+  repository: https://github.com/wdecoster/NanoPlot
+  homepage: http://nanoplot.bioinf.be/
+  documentation: https://github.com/wdecoster/NanoPlot
+references:
+  doi: 10.1093/bioinformatics/btad311
+license: MIT
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --fastq
+        type: file
+        description: Input fastq file(s), separated by ";".
+        example: read.fq
+        direction: input
+        multiple: true
+      - name: --fasta
+        type: file
+        description: Input fasta file(s), separated by ";".
+        example: read.fa
+        direction: input
+        multiple: true
+      - name: --fastq_rich
+        type: file
+        description: |
+          Input fastq file(s) generated by albacore or 
+          MinKNOW with additional information concerning channel and time, separated by ";".
+        example: read.fq
+        direction: input
+        multiple: true
+      - name: --fastq_minimal
+        type: file
+        description: |
+          Input fastq file(s) generated by albacore or MinKNOW with
+          additional information concerning channel and time. Minimal data is extracted
+          swiftly without elaborate checks. Separated by ";".
+        example: read.fq
+        direction: input
+        multiple: true
+      - name: --summary
+        type: file
+        description: |
+          Input summary file(s) generated by albacore or guppy, separated by ";".
+        example: read.txt
+        direction: input
+        multiple: true
+      - name: --bam
+        type: file
+        description: Input sorted bam file(s), separated by ";".
+        example: read.bam
+        direction: input
+        multiple: true
+      - name: --ubam
+        type: file
+        description: Input unmapped bam file(s), separated by ";".
+        example: read.ubam
+        direction: input
+        multiple: true
+      - name: --cram
+        type: file
+        description: Input sorted cram file(s), separated by ";".
+        example: read.cram
+        direction: input
+        multiple: true
+      - name: --pickle
+        type: file
+        description: Input pickle file(s) stored earlier, separated by ";".
+        example: read.pkl
+        direction: input
+        multiple: true
+      - name: --feather
+        alternatives: [--arrow]
+        type: file
+        description: Input feather file(s), separated by ";".
+        example: read.arrow
+        direction: input
+        multiple: true
+  - name: Outputs
+    arguments:
+      - name: --outdir
+        alternatives: [-o]
+        type: file
+        direction: output
+        description: Specify directory in which output has to be created.
+        required: true
+  - name: Options
+    arguments:
+      - name: --verbose
+        type: boolean_true
+        description: Write log messages also to terminal
+      - name: --store
+        type: boolean_true
+        description: Store the extracted data in a pickle file for future plotting.
+      - name: --raw
+        type: boolean_true
+        description: Store the extracted data in tab separated file.
+      - name: --huge
+        type: boolean_true
+        description: Input data is one very large file.
+      - name: --no_static
+        type: boolean_false
+        description: Do not make static (png) plots.
+      - name: --prefix
+        alternatives: [-p]
+        type: string
+        description: Specify an optional prefix to be used for the output files.
+      - name: --tsv_stats
+        type: boolean_true
+        description: Output the stats file as a properly formatted TSV.
+      - name: --only_report
+        type: boolean_true
+        description: Output only the report.
+      - name: --info_in_report
+        type: boolean_true
+        description: Add NanoPlot run info in the report.
+  - name: Filtering or transforming input
+    arguments:
+      - name: --maxlength
+        type: integer
+        description: Drop reads longer than length specified.
+      - name: --minlength
+        type: integer
+        description: Drop reads shorter than length specified.
+      - name: --drop_outliers
+        type: boolean_false
+        description: Drop outlier reads with extremely long length.
+      - name: --downsample
+        type: integer
+        description: Reduce dataset to N reads by random sampling.
+      - name: --loglength
+        type: boolean_true
+        description: Logarithmic scaling of lengths in plots.
+      - name: --percentqual
+        type: boolean_true
+        description: Use qualities as theoretical percent identities.
+      - name: --alength
+        type: boolean_true
+        description: Use aligned read lengths rather than sequenced length (bam mode).
+      - name: --minqual
+        type: integer
+        description: Drop reads with an average quality lower than specified.
+      - name: --runtime_until
+        type: integer
+        description: Only take the N first hours of a run.
+      - name: --readtype
+        type: string
+        description: |
+          Which read type to extract information about from summary.
+          Options are 1D, 2D, 1D2
+      - name: --barcoded
+        type: boolean_true
+        description: Use if you want to split the summary file by barcode.
+      - name: --no_supplementary
+        type: boolean_false
+        description: Use if you want to remove supplementary alignments.
+  - name: Customizing plots
+    arguments:
+      - name: --color
+        alternatives: [-c]
+        type: string
+        description: Specify a color for the plots, must be a valid matplotlib color.
+      - name: --colormap
+        alternatives: [-cm]
+        type: string
+        description: Specify a valid matplotlib colormap for the heatmap.
+      - name: --format
+        alternatives: [-f]
+        type: string
+        default: png
+        description: |
+          Specify the output format of the plots.
+          {eps,jpeg,jpg,pdf,pgf,png,ps,raw,rgba,svg,svgz,tif,tiff}
+      - name: --plots
+        type: string
+        description: |
+          Specify which bivariate plots have to be made.
+          [{kde,hex,dot} ...]
+      - name: --legacy
+        type: string
+        description: |
+          Specify which bivariate plots have to be made (legacy mode).
+          [{kde,dot,hex} ...]
+      - name: --listcolors
+        type: boolean_true
+        description: List the colors which are available for plotting and exit.
+      - name: --listcolormaps
+        type: boolean_true
+        description: List the colormaps which are available for plotting and exit.
+      - name: --no_N50
+        type: boolean_false
+        description: Hide the N50 mark in the read length histogram.
+      - name: --N50
+        type: boolean_true
+        description: Show the N50 mark in the read length histogram.
+      - name: --title
+        type: string
+        description: Add a title to all plots, requires quoting if using spaces.
+      - name: --font_scale
+        type: double
+        description: Scale the font of the plots by a factor.
+      - name: --dpi
+        type: integer
+        description: Set the dpi for saving images.
+      - name: --hide_stats
+        type: boolean_false
+        description: Do not add Pearson R stats in some bivariate plots.
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: test_data
+engines:
+  - type: docker
+    image: quay.io/biocontainers/nanoplot:1.43.0--pyhdfd78af_1
+    setup:
+      - type: docker
+        run: |
+          version=$(NanoPlot --version) && \
+          echo "$version" > /var/software_versions.txt
+runners:
+  - type: executable
+  - type: nextflow
\ No newline at end of file
diff --git a/src/nanoplot/help.txt b/src/nanoplot/help.txt
new file mode 100644
index 00000000..79869392
--- /dev/null
+++ b/src/nanoplot/help.txt
@@ -0,0 +1,96 @@
+usage: NanoPlot [-h] [-v] [-t THREADS] [--verbose] [--store] [--raw] [--huge]
+                [-o OUTDIR] [--no_static] [-p PREFIX] [--tsv_stats]
+                [--only-report] [--info_in_report] [--maxlength N]
+                [--minlength N] [--drop_outliers] [--downsample N]
+                [--loglength] [--percentqual] [--alength] [--minqual N]
+                [--runtime_until N] [--readtype {1D,2D,1D2}] [--barcoded]
+                [--no_supplementary] [-c COLOR] [-cm COLORMAP]
+                [-f [{png,jpg,jpeg,webp,svg,pdf,eps,json} ...]]
+                [--plots [{kde,hex,dot} ...]] [--legacy [{kde,dot,hex} ...]]      
+                [--listcolors] [--listcolormaps] [--no-N50] [--N50]
+                [--title TITLE] [--font_scale FONT_SCALE] [--dpi DPI]
+                [--hide_stats]
+                (--fastq file [file ...] | --fasta file [file ...] | --fastq_rich file [file ...] | --fastq_minimal file [file ...] | --summary file [file ...] | --bam file [file ...] | --ubam file [file ...] | --cram file [file ...] | --pickle pickle | --feather file [file ...])
+
+CREATES VARIOUS PLOTS FOR LONG READ SEQUENCING DATA.
+
+General options:
+  -h, --help            show the help and exit
+  -v, --version         Print version and exit.
+  -t, --threads THREADS
+                        Set the allowed number of threads to be used by the script
+  --verbose             Write log messages also to terminal.
+  --store               Store the extracted data in a pickle file for future plotting.
+  --raw                 Store the extracted data in tab separated file.
+  --huge                Input data is one very large file.
+  -o, --outdir OUTDIR   Specify directory in which output has to be created.      
+  --no_static           Do not make static (png) plots.
+  -p, --prefix PREFIX   Specify an optional prefix to be used for the output files.
+  --tsv_stats           Output the stats file as a properly formatted TSV.        
+  --only-report         Output only the report
+  --info_in_report      Add NanoPlot run info in the report.
+
+Options for filtering or transforming input prior to plotting:
+  --maxlength N         Hide reads longer than length specified.
+  --minlength N         Hide reads shorter than length specified.
+  --drop_outliers       Drop outlier reads with extreme long length.
+  --downsample N        Reduce dataset to N reads by random sampling.
+  --loglength           Additionally show logarithmic scaling of lengths in plots.
+  --percentqual         Use qualities as theoretical percent identities.
+  --alength             Use aligned read lengths rather than sequenced length (bam mode)
+  --minqual N           Drop reads with an average quality lower than specified.  
+  --runtime_until N     Only take the N first hours of a run
+  --readtype {1D,2D,1D2}
+                        Which read type to extract information about from summary. Options are 1D, 2D,
+                        1D2
+  --barcoded            Use if you want to split the summary file by barcode      
+  --no_supplementary    Use if you want to remove supplementary alignments        
+
+Options for customizing the plots created:
+  -c, --color COLOR     Specify a valid matplotlib color for the plots
+  -cm, --colormap COLORMAP
+                        Specify a valid matplotlib colormap for the heatmap       
+  -f, --format [{png,jpg,jpeg,webp,svg,pdf,eps,json} ...]
+                        Specify the output format of the plots, which are in addition to the html files
+  --plots [{kde,hex,dot} ...]
+                        Specify which bivariate plots have to be made.
+  --legacy [{kde,dot,hex} ...]
+                        Specify which bivariate plots have to be made (legacy mode).
+  --listcolors          List the colors which are available for plotting and exit.
+  --listcolormaps       List the colors which are available for plotting and exit.
+  --no-N50              Hide the N50 mark in the read length histogram
+  --N50                 Show the N50 mark in the read length histogram
+  --title TITLE         Add a title to all plots, requires quoting if using spaces
+  --font_scale FONT_SCALE
+                        Scale the font of the plots by a factor
+  --dpi DPI             Set the dpi for saving images
+  --hide_stats          Not adding Pearson R stats in some bivariate plots        
+
+Input data sources, one of these is required.:
+  --fastq file [file ...]
+                        Data is in one or more default fastq file(s).
+  --fasta file [file ...]
+                        Data is in one or more fasta file(s).
+  --fastq_rich file [file ...]
+                        Data is in one or more fastq file(s) generated by albacore, MinKNOW or guppy
+                        with additional information concerning channel and time.  
+  --fastq_minimal file [file ...]
+                        Data is in one or more fastq file(s) generated by albacore, MinKNOW or guppy
+                        with additional information concerning channel and time. Is extracted swiftly
+                        without elaborate checks.
+  --summary file [file ...]
+                        Data is in one or more summary file(s) generated by albacore or guppy.
+  --bam file [file ...]
+                        Data is in one or more sorted bam file(s).
+  --ubam file [file ...]
+                        Data is in one or more unmapped bam file(s).
+  --cram file [file ...]
+                        Data is in one or more sorted cram file(s).
+  --pickle pickle       Data is a pickle file stored earlier.
+  --feather, --arrow file [file ...]
+                        Data is in one or more feather file(s).
+
+EXAMPLES:
+    NanoPlot --summary sequencing_summary.txt --loglength -o summary-plots-log-transformed
+    NanoPlot -t 2 --fastq reads1.fastq.gz reads2.fastq.gz --maxlength 40000 --plots hex dot
+    NanoPlot --color yellow --bam alignment1.bam alignment2.bam alignment3.bam --downsample 10000
\ No newline at end of file
diff --git a/src/nanoplot/script.sh b/src/nanoplot/script.sh
new file mode 100644
index 00000000..fc198e89
--- /dev/null
+++ b/src/nanoplot/script.sh
@@ -0,0 +1,129 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+# Unset flags
+unset_if_false=( 
+    par_verbose
+    par_store
+    par_raw
+    par_huge
+    par_no_static
+    par_tsv_stats
+    par_only_report
+    par_info_in_report
+    par_drop_outliers
+    par_loglength
+    par_percentqual
+    par_alength
+    par_barcoded
+    par_no_supplementary
+    par_listcolors
+    par_listcolormaps
+    par_no_N50
+    par_N50
+    par_hide_stats
+)
+
+for var in "${unset_if_false[@]}"; do
+    test_val="${!var}"
+    [[ "$test_val" == "false" ]] && unset $var
+done
+
+par_fastq="${par_fastq//;/ }"
+par_fasta="${par_fasta//;/ }"
+par_fastq_rich="${par_fastq_rich//;/ }"
+par_fastq_minimal="${par_fastq_minimal//;/ }"
+par_summary="${par_summary//;/ }"
+par_bam="${par_bam//;/ }"
+par_ubam="${par_ubam//;/ }"
+par_cram="${par_cram//;/ }"
+par_pickle="${par_pickle//;/ }"
+par_feather="${par_feather//;/ }"
+
+
+inputs=( 
+    "$par_fastq" 
+    "$par_fasta"
+    "$par_fastq_rich"
+    "$par_fastq_minimal"
+    "$par_summary"
+    "$par_bam"
+    "$par_ubam"
+    "$par_cram"
+    "$par_pickle"
+    "$par_feather"
+)
+
+one_input=false
+for var in "${inputs[@]}"; do
+    if [ -n "$var" ]; then # if the parameter is not empty
+        if [ "$one_input" = "false" ]; then
+            one_input=true
+        else # Multiple input file types specified
+            echo "Error: Multiple input file types specified."
+            exit 1
+        fi
+    fi
+done
+
+if [ "$one_input" = "false" ]; then
+    echo "Error: No input file type specified."
+    exit 1
+fi
+
+
+
+# Run NanoPlot
+NanoPlot \
+    ${par_fastq:+--fastq $par_fastq} \
+    ${par_fasta:+--fasta $par_fasta} \
+    ${par_fastq_rich:+--fastq_rich $par_fastq_rich} \
+    ${par_fastq_minimal:+--fastq_minimal $par_fastq_minimal} \
+    ${par_summary:+--summary $par_summary} \
+    ${par_bam:+--bam $par_bam} \
+    ${par_ubam:+--ubam $par_ubam} \
+    ${par_cram:+--cram $par_cram} \
+    ${par_pickle:+--pickle $par_pickle} \
+    ${par_feather:+--feather $par_feather} \
+    ${par_verbose:+--verbose} \
+    ${par_store:+--store} \
+    ${par_raw:+--raw} \
+    ${par_huge:+--huge} \
+    ${par_no_static:+--no_static} \
+    ${par_prefix:+--prefix "$par_prefix"} \
+    ${par_tsv_stats:+--tsv_stats} \
+    ${par_only_report:+--only-report} \
+    ${par_info_in_report:+--info_in_report} \
+    ${par_maxlength:+--maxlength "$par_maxlength"} \
+    ${par_minlength:+--minlength "$par_minlength"} \
+    ${par_drop_outliers:+--drop_outliers} \
+    ${par_downsample:+--downsample "$par_downsample"} \
+    ${par_loglength:+--loglength} \
+    ${par_percentqual:+--percentqual} \
+    ${par_alength:+--alength} \
+    ${par_minqual:+--minqual "$par_minqual"} \
+    ${par_runtime_until:+--runtime_until "$par_runtime_until"} \
+    ${par_readtype:+--readtype "$par_readtype"} \
+    ${par_barcoded:+--barcoded} \
+    ${par_no_supplementary:+--no_supplementary} \
+    ${par_color:+--color "$par_color"} \
+    ${par_colormap:+--colormap "$par_colormap"} \
+    ${par_format:+--format "$par_format"} \
+    ${par_plots:+--plots "$par_plots"} \
+    ${par_legacy:+--legacy "$par_legacy"} \
+    ${par_listcolors:+--listcolors} \
+    ${par_listcolormaps:+--listcolormaps} \
+    ${par_no_N50:+--no-N50} \
+    ${par_N50:+--N50} \
+    ${par_title:+--title "$par_title"} \
+    ${par_font_scale:+--font_scale "$par_font_scale"} \
+    ${par_dpi:+--dpi "$par_dpi"} \
+    ${par_hide_stats:+--hide_stats} \
+    ${meta_cpus:+--threads "$meta_cpus"} \
+    --outdir "$par_outdir"
+
+exit 0
diff --git a/src/nanoplot/test.sh b/src/nanoplot/test.sh
new file mode 100644
index 00000000..cac10c17
--- /dev/null
+++ b/src/nanoplot/test.sh
@@ -0,0 +1,549 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+# Download test files at runtime (.gz, .pickle and .feather)
+wget https://github.com/wdecoster/nanotest/archive/refs/heads/master.zip
+unzip master.zip
+
+###########################################################################
+
+# Test 1: Run NanoPlot with only input parameter (Fastq)
+
+mkdir test1
+pushd test1 > /dev/null # cd test1 (stack)
+
+echo "> Run Test 1: one input (Fastq)"
+"$meta_executable" \
+  --fastq "$meta_resources_dir/test_data/test1.fastq" \
+  --outdir output
+
+# Check if output directory exists
+if [[ ! -d output ]]; then
+  echo "Output directory not found!"
+  exit 1
+fi
+
+# Check if output files are generated
+if [ "$(ls -1 "output" | wc -l)" -lt 1 ]; then # Apart from log file
+  echo "Output files are not found!"
+  exit 1
+fi
+
+# Check if files are empty
+if find output -name "*.html" -type f -size 0 | grep -q .; then 
+  echo "At least one HTML file is empty."
+  exit 1
+fi
+if find output -name "*.png" -type f -size 0 | grep -q .; then 
+  echo "At least one plot is empty."
+  exit 1
+fi
+if find output -name "*.txt" -type f -size 0 | grep -q .; then 
+  echo "NanoPlot summary file is empty."
+  exit 1
+fi
+
+popd > /dev/null # Remove directory from stack (LIFO)
+
+echo "Test 1 succeeded."
+
+###########################################################################
+
+# Test 2: Run NanoPlot with multiple inputs (Fastq)
+
+mkdir test2
+pushd test2 > /dev/null
+
+echo "> Run Test 2: multiple inputs (Fastq)"
+"$meta_executable" \
+  --fastq "$meta_resources_dir/test_data/test1.fastq;$meta_resources_dir/test_data/test2.fastq" \
+  --outdir output
+
+# Check if output directory exists
+if [[ ! -d output ]]; then
+  echo "Output directory not found!"
+  exit 1
+fi
+
+# Check if output files are generated
+if [ "$(ls -1 "output" | wc -l)" -lt 1 ]; then
+  echo "Output files are not found!"
+  exit 1
+fi
+
+# Check if files are empty
+if find output -name "*.html" -type f -size 0 | grep -q .; then 
+  echo "At least one HTML file is empty."
+  exit 1
+fi
+if find output -name "*.png" -type f -size 0 | grep -q .; then 
+  echo "At least one plot is empty."
+  exit 1
+fi
+if find output -name "*.txt" -type f -size 0 | grep -q .; then 
+  echo "NanoPlot summary file is empty."
+  exit 1
+fi
+
+popd > /dev/null
+
+echo "Test 2 succeeded."
+
+###########################################################################
+
+# Test 3: Run NanoPlot with multiple options-1
+
+mkdir test3
+pushd test3 > /dev/null
+
+echo "> Run Test 3: multiple options-1"
+"$meta_executable" \
+  --fastq "$meta_resources_dir/test_data/test1.fastq" \
+  --maxlength 40000 \
+  --format jpg \
+  --prefix biobox_ \
+  --store \
+  --color "yellow" \
+  --info_in_report \
+  --outdir output
+
+# Check if output directory exists
+if [[ ! -d output ]]; then
+  echo "Output directory not found!"
+  exit 1
+fi
+
+# Check if output files are generated
+if [ "$(ls -1 "output" | wc -l)" -lt 1 ]; then
+  echo "Output files are not found!"
+  exit 1
+fi
+
+# Check if the extracted data exists (--store)
+if ! ls output/*.pickle > /dev/null 2>&1; then
+  echo "Extracted data is not found!"
+  exit 1
+fi
+
+# Check if files are empty
+if find output -name "*.html" -type f -size 0 | grep -q .; then 
+  echo "At least one HTML file is empty."
+  exit 1
+fi
+if find output -name "*.png" -type f -size 0 | grep -q .; then 
+  echo "At least one plot is empty."
+  exit 1
+fi
+if find output -name "*.txt" -type f -size 0 | grep -q .; then 
+  echo "NanoPlot summary file is empty."
+  exit 1
+fi
+if find output -name "*.pickle" -type f -size 0 | grep -q .; then 
+  echo "Extracted data is empty."
+  exit 1
+fi
+
+# Check if the output file starts with "biobox" prefix
+if ! ls output/biobox* > /dev/null 2>&1; then
+    echo "The prefix is not added to the output files."
+    exit 1
+fi
+
+popd > /dev/null
+
+echo "Test 3 succeeded."
+
+###########################################################################
+
+# Test 4: Run NanoPlot with multiple options-2
+
+mkdir test4
+pushd test4 > /dev/null
+
+echo "> Run Test 4: multiple options-2"
+"$meta_executable" \
+  --fastq "$meta_resources_dir/test_data/test1.fastq" \
+  --maxlength 40000 \
+  --only_report \
+  --raw \
+  --outdir output
+
+# Check if output directory exists
+if [[ ! -d output ]]; then
+  echo "Output directory not found!"
+  exit 1
+fi
+
+# Check if output files are generated
+if [ "$(ls -1 "output" | wc -l)" -ne 4 ]; then # 4 output files
+  echo "Output files are not found!"
+  exit 1
+fi
+
+# Check if the extracted data exists (--raw)
+if ! ls output/*.tsv.gz > /dev/null 2>&1; then
+  echo "Extracted data is not found!"
+  exit 1
+fi
+
+# Check if files are empty
+if find output -name "NanoPlot-report.html" -type f -size 0 | grep -q .; then 
+  echo "NanoPlot report is empty."
+  exit 1
+fi
+if find output -name "*.txt" -type f -size 0 | grep -q .; then 
+  echo "NanoPlot summary file is empty."
+  exit 1
+fi
+if find output -name "*.tsv.gz" -type f -size 0 | grep -q .; then 
+  echo "Extracted data is empty."
+  exit 1
+fi
+
+popd > /dev/null
+
+echo "Test 4 succeeded."
+
+###########################################################################
+
+# Test 5: Run NanoPlot with different input (Fasta)
+
+mkdir test5
+pushd test5 > /dev/null
+
+echo "> Run Test 5: Input Fasta"
+"$meta_executable" \
+  --fasta "$meta_resources_dir/test_data/test.fasta" \
+  --outdir output
+
+# Check if output directory exists
+if [[ ! -d output ]]; then
+  echo "Output directory not found!"
+  exit 1
+fi
+
+# Check if output files are generated
+if [ "$(ls -1 "output" | wc -l)" -lt 1 ]; then # Apart from log file
+  echo "Output files are not found!"
+  exit 1
+fi
+
+# Check if files are empty
+if find output -name "*.html" -type f -size 0 | grep -q .; then 
+  echo "At least one HTML file is empty."
+  exit 1
+fi
+if find output -name "*.png" -type f -size 0 | grep -q .; then 
+  echo "At least one plot is empty."
+  exit 1
+fi
+if find output -name "*.txt" -type f -size 0 | grep -q .; then 
+  echo "NanoPlot summary file is empty."
+  exit 1
+fi
+
+popd > /dev/null
+
+echo "Test 5 succeeded."
+
+###########################################################################
+
+# Test 6: Run NanoPlot with different input (Fastq_rich)
+
+mkdir test6
+pushd test6 > /dev/null
+
+echo "> Run Test 6: Input Fastq_rich"
+"$meta_executable" \
+  --fastq_rich "$meta_resources_dir/test_data/test_rich.fastq" \
+  --outdir output
+
+# Check if output directory exists
+if [[ ! -d output ]]; then
+  echo "Output directory not found!"
+  exit 1
+fi
+
+# Check if output files are generated
+if [ "$(ls -1 "output" | wc -l)" -lt 1 ]; then # Apart from log file
+  echo "Output files are not found!"
+  exit 1
+fi
+
+# Check if files are empty
+if find output -name "*.html" -type f -size 0 | grep -q .; then 
+  echo "At least one HTML file is empty."
+  exit 1
+fi
+if find output -name "*.png" -type f -size 0 | grep -q .; then 
+  echo "At least one plot is empty."
+  exit 1
+fi
+if find output -name "*.txt" -type f -size 0 | grep -q .; then 
+  echo "NanoPlot summary file is empty."
+  exit 1
+fi
+
+popd > /dev/null
+
+echo "Test 6 succeeded."
+
+###########################################################################
+
+# Test 7: Run NanoPlot with different input (Fastq_minimal)
+
+mkdir test7
+pushd test7 > /dev/null
+
+echo "> Run Test 7: Input Fastq_minimal"
+"$meta_executable" \
+  --fastq_minimal "../nanotest-master/reads.fastq.gz" \
+  --outdir output
+
+# Check if output directory exists
+if [[ ! -d output ]]; then
+  echo "Output directory not found!"
+  exit 1
+fi
+
+# Check if output files are generated
+if [ "$(ls -1 "output" | wc -l)" -lt 1 ]; then # Apart from log file
+  echo "Output files are not found!"
+  exit 1
+fi
+
+# Check if files are empty
+if find output -name "*.html" -type f -size 0 | grep -q .; then 
+  echo "At least one HTML file is empty."
+  exit 1
+fi
+if find output -name "*.png" -type f -size 0 | grep -q .; then 
+  echo "At least one plot is empty."
+  exit 1
+fi
+if find output -name "*.txt" -type f -size 0 | grep -q .; then 
+  echo "NanoPlot summary file is empty."
+  exit 1
+fi
+
+popd > /dev/null
+
+echo "Test 7 succeeded."
+
+###########################################################################
+
+# Test 8: Run NanoPlot with different input (Summary)
+
+mkdir test8
+pushd test8 > /dev/null
+
+echo "> Run Test 8: Input Summary"
+"$meta_executable" \
+  --summary "$meta_resources_dir/test_data/summary.txt" \
+  --outdir output
+
+# Check if output directory exists
+if [[ ! -d output ]]; then
+  echo "Output directory not found!"
+  exit 1
+fi
+
+# Check if output files are generated
+if [ "$(ls -1 "output" | wc -l)" -lt 1 ]; then # Apart from log file
+  echo "Output files are not found!"
+  exit 1
+fi
+
+# Check if files are empty
+if find output -name "*.html" -type f -size 0 | grep -q .; then 
+  echo "At least one HTML file is empty."
+  exit 1
+fi
+if find output -name "*.png" -type f -size 0 | grep -q .; then 
+  echo "At least one plot is empty."
+  exit 1
+fi
+if find output -name "*.txt" -type f -size 0 | grep -q .; then 
+  echo "NanoPlot summary file is empty."
+  exit 1
+fi
+
+popd > /dev/null
+
+echo "Test 8 succeeded."
+
+###########################################################################
+
+# Test 9: Run NanoPlot with different input (BAM)
+
+mkdir test9
+pushd test9 > /dev/null
+
+echo "> Run Test 9: Input BAM"
+"$meta_executable" \
+  --bam "$meta_resources_dir/test_data/test.bam" \
+  --outdir output
+
+# Check if output directory exists
+if [[ ! -d output ]]; then
+  echo "Output directory not found!"
+  exit 1
+fi
+
+# Check if output files are generated
+if [ "$(ls -1 "output" | wc -l)" -lt 1 ]; then # Apart from log file
+  echo "Output files are not found!"
+  exit 1
+fi
+
+# Check if files are empty
+if find output -name "*.html" -type f -size 0 | grep -q .; then 
+  echo "At least one HTML file is empty."
+  exit 1
+fi
+if find output -name "*.png" -type f -size 0 | grep -q .; then 
+  echo "At least one plot is empty."
+  exit 1
+fi
+if find output -name "*.txt" -type f -size 0 | grep -q .; then 
+  echo "NanoPlot summary file is empty."
+  exit 1
+fi
+
+popd > /dev/null
+
+echo "Test 9 succeeded."
+
+###########################################################################
+
+# Test 10: Run NanoPlot with different input (pickle)
+
+mkdir test10
+pushd test10 > /dev/null
+
+echo "> Run Test 10: Input pickle"
+"$meta_executable" \
+  --pickle "../nanotest-master/alignment.pickle" \
+  --outdir output
+
+# Check if output directory exists
+if [[ ! -d output ]]; then
+  echo "Output directory not found!"
+  exit 1
+fi
+
+# Check if output files are generated
+if [ "$(ls -1 "output" | wc -l)" -lt 1 ]; then # Apart from log file
+  echo "Output files are not found!"
+  exit 1
+fi
+
+# Check if files are empty
+if find output -name "*.html" -type f -size 0 | grep -q .; then 
+  echo "At least one HTML file is empty."
+  exit 1
+fi
+if find output -name "*.png" -type f -size 0 | grep -q .; then 
+  echo "At least one plot is empty."
+  exit 1
+fi
+if find output -name "*.txt" -type f -size 0 | grep -q .; then 
+  echo "NanoPlot summary file is empty."
+  exit 1
+fi
+
+popd > /dev/null
+
+echo "Test 10 succeeded."
+
+###########################################################################
+
+# Test 11: Run NanoPlot with different input (feather)
+
+mkdir test11
+pushd test11 > /dev/null
+
+echo "> Run Test 11: Input feather"
+"$meta_executable" \
+  --arrow "../nanotest-master/summary1.feather" \
+  --outdir output
+
+# Check if output directory exists
+if [[ ! -d output ]]; then
+  echo "Output directory not found!"
+  exit 1
+fi
+
+# Check if output files are generated
+if [ "$(ls -1 "output" | wc -l)" -lt 1 ]; then # Apart from log file
+  echo "Output files are not found!"
+  exit 1
+fi
+
+# Check if files are empty
+if find output -name "*.html" -type f -size 0 | grep -q .; then 
+  echo "At least one HTML file is empty."
+  exit 1
+fi
+if find output -name "*.png" -type f -size 0 | grep -q .; then 
+  echo "At least one plot is empty."
+  exit 1
+fi
+if find output -name "*.txt" -type f -size 0 | grep -q .; then 
+  echo "NanoPlot summary file is empty."
+  exit 1
+fi
+
+popd > /dev/null
+
+echo "Test 11 succeeded."
+
+###########################################################################
+
+# Test 12: Run NanoPlot with different output directory
+
+mkdir test12
+pushd test12 > /dev/null
+
+echo "> Run Test 12: different output directory"
+"$meta_executable" \
+  --fastq "$meta_resources_dir/test_data/test1.fastq" \
+  --outdir out
+
+# Check if output directory exists
+if [[ ! -d out ]]; then
+  echo "Output directory not found!"
+  exit 1
+fi
+
+# Check if output files are generated
+if [ "$(ls -1 "out" | wc -l)" -lt 1 ]; then
+  echo "Output files are not found!"
+  exit 1
+fi
+
+# Check if files are empty
+if find out -name "*.html" -type f -size 0 | grep -q .; then 
+  echo "At least one HTML file is empty."
+  exit 1
+fi
+if find out -name "*.png" -type f -size 0 | grep -q .; then 
+  echo "At least one plot is empty."
+  exit 1
+fi
+if find out -name "*.txt" -type f -size 0 | grep -q .; then 
+  echo "NanoPlot summary file is empty."
+  exit 1
+fi
+
+popd > /dev/null
+
+echo "Test 12 succeeded."
+
+###########################################################################
+
+echo "All tests successfully completed!"
\ No newline at end of file
diff --git a/src/nanoplot/test_data/script.sh b/src/nanoplot/test_data/script.sh
new file mode 100644
index 00000000..9bb6ffd6
--- /dev/null
+++ b/src/nanoplot/test_data/script.sh
@@ -0,0 +1,102 @@
+#!/bin/bash
+
+## Fastq file ##
+# Define the number of reads
+NUM_READS=10
+OUTPUT_FILE="./src/nanoplot/test_data/test1.fastq"
+
+# Function to generate a random DNA sequence of given length
+generate_sequence() {
+    local length=$1 # first argument: desired sequence length
+    cat /dev/urandom | tr -dc 'ACGT' | fold -w $length | head -n 1
+}
+
+# Function to generate random quality scores of given length
+generate_quality() {
+    local length=$1
+    local average_quality=$2
+    local quality=""
+    for ((i=0; i<length; i++)); do
+        # Generate a quality score based on the average_quality
+        quality+=$(awk -v avg=$average_quality 'BEGIN {printf "%c", int(rand()*10 + avg) + 33}')
+    done
+    echo $quality
+}
+
+echo -n "" > $OUTPUT_FILE #Create the fastq file
+for i in $(seq 1 $NUM_READS); do
+    # Randomly determine the read length (between 20 and 100 bases)
+    read_length=$(shuf -i 20-100 -n 1)
+    # Randomly determine the average quality (between 0 and 40)
+    average_quality=$(shuf -i 0-40 -n 1)
+    sequence=$(generate_sequence $read_length)
+    quality=$(generate_quality $read_length $average_quality)
+    echo "@read_$i" >> $OUTPUT_FILE
+    echo $sequence >> $OUTPUT_FILE
+    echo "+" >> $OUTPUT_FILE
+    echo $quality >> $OUTPUT_FILE
+    echo >> $OUTPUT_FILE  # Add a blank line between reads
+done
+
+NUM_READS=7
+OUTPUT_FILE="./src/nanoplot/test_data/test2.fastq"
+echo -n "" > $OUTPUT_FILE #Create another fastq file
+for i in $(seq 1 $NUM_READS); do
+    # Randomly determine the read length (between 20 and 100 bases)
+    read_length=$(shuf -i 20-100 -n 1)
+    # Randomly determine the average quality (between 0 and 40)
+    average_quality=$(shuf -i 0-40 -n 1)
+    sequence=$(generate_sequence $read_length)
+    quality=$(generate_quality $read_length $average_quality)
+    echo "@read_$i" >> $OUTPUT_FILE
+    echo $sequence >> $OUTPUT_FILE
+    echo "+" >> $OUTPUT_FILE
+    echo $quality >> $OUTPUT_FILE
+    echo >> $OUTPUT_FILE  # Add a blank line between reads
+done
+
+#########################################################################################
+
+## Fasta file ##
+wget -O src/nanoplot/test_data/test.fasta https://raw.githubusercontent.com/merenlab/reads-for-assembly/master/examples/files/fasta_01.fa
+# reduced the size of each sequence to ~300 bp.
+
+#########################################################################################
+
+## Fastq_rich file ##
+wget -O src/nanoplot/test_data/test_rich.fastq.gz https://github.com/epi2me-labs/fastcat/raw/master/test/data/bc0.fastq.gz
+
+# Unzip file
+gunzip -c src/nanoplot/test_data/test_rich.fastq.gz > src/nanoplot/test_data/test_rich.fastq
+
+rm src/nanoplot/test_data/test_rich.fastq.gz 
+
+#########################################################################################
+
+## Summary file ##
+if [ ! -d nanotest ]; then
+  git clone --depth 1 --single-branch --branch master https://github.com/wdecoster/nanotest/
+fi
+
+mv nanotest/sequencing_summary.txt src/nanoplot/test_data/test_summary.txt
+# reduce to first 51 lines (header + 50 reads)
+head -n 51 src/nanoplot/test_data/test_summary.txt > src/nanoplot/test_data/summary.txt
+
+rm -rf nanotest
+
+#########################################################################################
+
+## BAM file ##
+if [ ! -d /tmp/snakemake-wrappers ]; then
+  git clone --depth 1 --single-branch --branch master https://github.com/snakemake/snakemake-wrappers /tmp/snakemake-wrappers
+fi
+
+cp /tmp/snakemake-wrappers/bio/biobambam2/bamsormadup/test/mapped/a.bam src/nanoplot/test_data/test.bam
+
+# samtools view -h test.bam | head -n 44 > test_sm.sam
+# samtools view -bS test_sm.sam > test_sm.bam
+# samtools index test_sm.bam
+# rm test.bam
+# mv test_sm.bam test.bam
+# mv test_sm.bam.bai test.bam.bai
+# rm test_sm.sam
\ No newline at end of file
diff --git a/src/nanoplot/test_data/summary.txt b/src/nanoplot/test_data/summary.txt
new file mode 100644
index 00000000..b566d6ec
--- /dev/null
+++ b/src/nanoplot/test_data/summary.txt
@@ -0,0 +1,51 @@
+filename	read_id	run_id	channel	start_time	duration	num_events	passes_filtering	template_start	num_events_template	template_duration	num_called_template	sequence_length_template	mean_qscore_template	strand_score_template
+nanopore2_20170302_FNFAF09967_MN17024_sequencing_run_170301_MG1655_PC_RAD002_87615_ch124_read148_strand.fast5	170fb1c5-979b-4df7-864f-c5c14689a14c	b5e83402e47ea9927694cb6e80d61180dfc8a49a	124	3733.02575	22.56375	12875	True	0.031	12875	22.53275	12875	8242	10.049	-0.0002
+nanopore2_20170302_FNFAF09967_MN17024_sequencing_run_170301_MG1655_PC_RAD002_87615_ch320_read27_strand.fast5	6d0956c2-c161-48f4-b2fa-142ca872406f	b5e83402e47ea9927694cb6e80d61180dfc8a49a	320	1826.8425	123.37625	34771	True	62.52675	34771	60.8495	34771	16881	11.164	-0.0002
+nanopore2_20170302_FNFAF09967_MN17024_sequencing_run_170301_MG1655_PC_RAD002_87615_ch496_read2_strand.fast5	e9a32f7d-4aa6-4b85-9f76-6764769ad99c	b5e83402e47ea9927694cb6e80d61180dfc8a49a	496	7.1315	121.414	52102	True	30.235	52102	91.179	52102	19346	9.822	-0.0002
+nanopore2_20170302_FNFAF09967_MN17024_sequencing_run_170301_MG1655_PC_RAD002_87615_ch485_read15_strand.fast5	b01da059-de21-4ed3-9eb8-6126ea59cb00	b5e83402e47ea9927694cb6e80d61180dfc8a49a	485	2586.54825	107.53375	36399	True	43.834	36399	63.69975	36399	19861	10.17	-0.0002
+nanopore2_20170302_FNFAF09967_MN17024_sequencing_run_170301_MG1655_PC_RAD002_87615_ch362_read219_strand.fast5	4d253e4f-2090-4adb-aa3e-16dc5e4d5e55	b5e83402e47ea9927694cb6e80d61180dfc8a49a	362	2720.77225	14.9615	2577	True	10.45175	2577	4.50975	2577	1672	12.663	-0.0004
+nanopore2_20170302_FNFAF09967_MN17024_sequencing_run_170301_MG1655_PC_RAD002_87615_ch163_read69_strand.fast5	4629b40a-aea4-4c92-9458-0e66ef4ecc17	b5e83402e47ea9927694cb6e80d61180dfc8a49a	163	673.69725	185.45225	95287	True	18.699	95287	166.75325	95287	59133	9.573	-0.0002
+nanopore2_20170302_FNFAF09967_MN17024_sequencing_run_170301_MG1655_PC_RAD002_87615_ch502_read25_strand.fast5	a8785b36-b442-4de7-9e43-5ddae6e39fdb	b5e83402e47ea9927694cb6e80d61180dfc8a49a	502	884.39875	187.91175	83750	True	41.3485	83750	146.56325	83750	55323	11.985	-0.0002
+nanopore2_20170302_FNFAF09967_MN17024_sequencing_run_170301_MG1655_PC_RAD002_87615_ch355_read19_strand.fast5	436405ef-1e7d-43a5-99b4-929e31897043	b5e83402e47ea9927694cb6e80d61180dfc8a49a	355	571.15325	94.5895	11586	True	74.31375	11586	20.27575	11586	7636	11.865	-0.0003
+nanopore2_20170302_FNFAF09967_MN17024_sequencing_run_170301_MG1655_PC_RAD002_87615_ch240_read291_strand.fast5	f31d3457-2065-4acf-a9d5-966a4818564c	b5e83402e47ea9927694cb6e80d61180dfc8a49a	240	3511.1415	57.23625	19778	True	22.62325	19778	34.613	19778	6176	8.535	-0.0002
+nanopore2_20170302_FNFAF09967_MN17024_sequencing_run_170301_MG1655_PC_RAD002_87615_ch124_read242_strand.fast5	d67b506a-b026-450d-803e-1e12bd1facaa	b5e83402e47ea9927694cb6e80d61180dfc8a49a	124	6315.02775	53.26525	8709	True	38.023	8709	15.24225	8709	5765	12.3	-0.0002
+nanopore2_20170302_FNFAF09967_MN17024_sequencing_run_170301_MG1655_PC_RAD002_87615_ch217_read62_strand.fast5	68a01ec4-bf8f-4aa4-8763-39cd9a15b8aa	b5e83402e47ea9927694cb6e80d61180dfc8a49a	217	3506.43875	16.38525	2944	True	11.23225	2944	5.153	2944	2011	9.229	-0.0007
+nanopore2_20170302_FNFAF09967_MN17024_sequencing_run_170301_MG1655_PC_RAD002_87615_ch321_read18_strand.fast5	63fcec17-46fd-4cdc-a381-7b09d6f652e9	b5e83402e47ea9927694cb6e80d61180dfc8a49a	321	820.995	47.1295	25668	True	2.21	25668	44.9195	25668	17575	12.18	-0.0002
+nanopore2_20170302_FNFAF09967_MN17024_sequencing_run_170301_MG1655_PC_RAD002_87615_ch235_read49_strand.fast5	45eb23a8-63d1-4870-9a31-c349836cc728	b5e83402e47ea9927694cb6e80d61180dfc8a49a	235	3662.59625	250.6945	122186	True	36.86825	122186	213.82625	122186	20295	8.707	-0.0003
+nanopore2_20170302_FNFAF09967_MN17024_sequencing_run_170301_MG1655_PC_RAD002_87615_ch150_read334_strand.fast5	1b05de41-d66d-4947-8533-c27bdafeee69	b5e83402e47ea9927694cb6e80d61180dfc8a49a	150	4017.1535	183.56	97579	True	12.79625	97579	170.76375	97579	61111	9.709	-0.0002
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch5_read33_strand.fast5	b5b5833b-9341-4886-9ffd-7dd7f876c009	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	5	142.765	25.96625	9812	True	8.79475	9812	17.1715	9812	225	7.694	-0.0002
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch438_read26_strand.fast5	76a5b578-7c92-458b-9981-437f48b82455	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	438	160.71825	55.85775	31896	True	0.03975	31896	55.818	31896	21845	10.004	-0.0002
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch450_read2842_strand.fast5	26cfa987-1a6d-4137-b4b7-19f84f990bfc	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	450	362.60825	76.74075	43851	True	0.0	43851	76.74075	43851	29248	10.348	-0.0002
+nanopore2_20170302_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_10881_ch151_read88_strand.fast5	6e2f5cdb-c978-4403-9611-4faaa35722f8	a3f8b1fb56e77905d115a86ef283e1f838d7476d	151	184.193	8.241	4709	True	0.0	4709	8.241	4709	2638	10.235	-0.0004
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch402_read37_strand.fast5	32762878-4ef4-4f27-bfcd-5fe902fb6497	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	402	250.694	77.26225	25086	True	33.3605	25086	43.90175	25086	16574	11.969	-0.0002
+nanopore2_20170302_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_10881_ch206_read39_strand.fast5	d52c84b1-7a31-4639-b41e-cf5847681395	a3f8b1fb56e77905d115a86ef283e1f838d7476d	206	164.9445	36.5865	20906	True	0.0	20906	36.5865	20906	10700	7.348	-0.0003
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch174_read239_strand.fast5	c61d655a-fa49-4376-a266-d1710fffdc60	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	174	140.031	20.596	11726	True	0.07425	11726	20.52175	11726	5285	7.139	-0.0003
+nanopore2_20170302_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_10881_ch240_read28_strand.fast5	e32e01c1-79ad-4436-96a6-afb4414bccab	a3f8b1fb56e77905d115a86ef283e1f838d7476d	240	96.7155	34.78475	3500	True	28.65875	3500	6.126	3500	2284	11.446	-0.0003
+nanopore2_20170302_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_10881_ch461_read3_strand.fast5	d7c4f400-faf1-4574-933c-14cfe563ecdb	a3f8b1fb56e77905d115a86ef283e1f838d7476d	461	22.223	40.1695	1803	True	37.0135	1803	3.156	1803	1216	11.478	-0.0006
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch142_read28_strand.fast5	0a779938-c2f0-4fe9-937b-19b8172322b3	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	142	152.8475	63.728	36416	True	0.0	36416	63.728	36416	22419	10.38	-0.0002
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch220_read62_strand.fast5	53d223e3-8341-4fb2-82a9-534b29d917f0	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	220	250.694	22.03525	10606	True	3.47325	10606	18.562	10606	7053	12.447	-0.0003
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch17_read37_strand.fast5	6cd9b908-7d7c-4df2-887b-557631f4ecc4	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	17	320.315	7.64125	4343	True	0.04025	4343	7.601	4343	1726	10.341	-0.0005
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch119_read68_strand.fast5	c1050d07-d676-4f09-bb50-5af9a0d36719	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	119	274.408	2.05275	1157	True	0.02775	1157	2.025	1157	804	11.135	-0.0024
+nanopore2_20170302_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_10881_ch260_read26_strand.fast5	e681ea0c-485a-4170-bb87-13e86878f0d5	a3f8b1fb56e77905d115a86ef283e1f838d7476d	260	280.141	2.97125	1281	True	0.728	1281	2.24325	1281	750	7.439	-0.0013
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch427_read24_strand.fast5	e4208eb0-c817-4512-a0d6-3472748d09a3	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	427	125.59	12.975	7397	True	0.02925	7397	12.94575	7397	4747	12.276	-0.0001
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch507_read3_strand.fast5	cd6e4550-22d9-49e5-8d4a-dc2d54eb78b9	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	507	22.127	64.9935	23188	True	24.41425	23188	40.57925	23188	5082	10.188	-0.0003
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch144_read32_strand.fast5	1ba73b61-7f74-46ce-acbe-643b8946ee07	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	144	147.0335	4.7515	2698	True	0.0285	2698	4.723	2698	1895	10.679	-0.0003
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch222_read21_strand.fast5	a045f9b2-93dd-467f-a7d9-ceb6d72a4f67	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	222	130.9055	1.071	612	True	0.0	612	1.071	612	392	7.268	-0.0036
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch363_read164_strand.fast5	49e5d9e0-b87d-4bb2-867b-fbc6a321bcf8	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	363	431.1165	8.23225	4674	True	0.05125	4674	8.181	4674	3212	11.092	-0.0001
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch170_read40_strand.fast5	7d15ba0b-67c8-4307-961e-5ddeb79b1056	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	170	232.9605	17.50725	9980	True	0.0415	9980	17.46575	9980	5658	10.647	-0.0002
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch410_read30_strand.fast5	a7fc1f72-648d-471e-87f9-e2186b246627	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	410	141.0205	5.52325	3140	True	0.02725	3140	5.496	3140	1913	11.971	-0.0003
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch349_read69_strand.fast5	17df9262-7bf6-4711-bc7d-a0569f473cd3	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	349	307.5495	20.40675	11647	True	0.02425	11647	20.3825	11647	7829	12.098	-0.0004
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch10_read65_strand.fast5	1bc8d128-eed3-41c2-baea-3ca8cd9f0dc9	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	10	250.694	35.269	9451	True	18.72825	9451	16.54075	9451	6468	10.704	-0.0002
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch67_read26_strand.fast5	09437fae-3ba4-40cd-b02a-40b67a067ffe	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	67	127.99425	10.7565	6059	True	0.15325	6059	10.60325	6059	4117	9.926	-0.0004
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch234_read31_strand.fast5	30a2e325-06d5-4c30-843c-153da097c13b	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	234	129.3055	9.26275	5270	True	0.04	5270	9.22275	5270	3704	11.268	-0.0005
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch237_read27_strand.fast5	740be0f7-60f5-4fc5-96d9-225eda8ff83e	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	237	250.6935	35.98925	15850	True	8.251	15850	27.73825	15850	10192	11.631	-0.0002
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch464_read31_strand.fast5	b298c02b-4e8e-4636-b7d2-4920b7e8c292	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	464	157.44275	12.122	6913	True	0.02375	6913	12.09825	6913	4148	10.846	-0.0003
+nanopore2_20170302_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_10881_ch192_read3_strand.fast5	7dd06578-5b15-4485-988f-b039a2d86ead	a3f8b1fb56e77905d115a86ef283e1f838d7476d	192	22.223	40.16925	21038	True	3.35275	21038	36.8165	21038	8534	8.957	-0.0003
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch507_read7_strand.fast5	94b3ba2e-2cc3-4a7c-a319-9b1bf976aeff	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	507	98.27225	5.3885	3073	True	0.01025	3073	5.37825	3073	1819	10.48	-0.0006
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch170_read42_strand.fast5	3eec21b1-872f-480b-8d11-daa41209338b	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	170	250.694	77.2625	44150	True	0.0	44150	77.2625	44150	24787	11.046	-0.0002
+nanopore2_20170302_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_10881_ch212_read68_strand.fast5	778f7330-179c-42f3-bdfe-f7c5ccddea01	a3f8b1fb56e77905d115a86ef283e1f838d7476d	212	164.93525	36.59575	20911	True	0.0	20911	36.59575	20911	14734	11.492	-0.0002
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch406_read32_strand.fast5	1592d38b-2bec-4892-8021-1a51507c6327	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	406	250.69425	77.26175	35190	True	15.6785	35190	61.58325	35190	19989	8.682	-0.0002
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch226_read66_strand.fast5	5f428477-799c-443a-986f-2ebd5b84ab18	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	226	351.44525	10.95275	6253	True	0.00925	6253	10.9435	6253	3877	11.287	-0.0004
+nanopore2_20170303_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_26713_ch275_read39_strand.fast5	890ec449-f329-40c8-9e57-f4eb2c358b4c	9ff0fede59c6669aa7f0d860aa73a4f0959d4b99	275	250.69425	8.092	4624	True	0.0	4624	8.092	4624	3122	12.351	-0.0005
+nanopore2_20170302_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_10881_ch466_read71_strand.fast5	db1765d2-0daa-4154-9a6d-6aed0cb13803	a3f8b1fb56e77905d115a86ef283e1f838d7476d	466	217.31975	17.3305	7267	True	4.6125	7267	12.718	7267	4838	11.926	-0.0003
+nanopore2_20170302_FNFAF09967_MN17024_mux_scan_170301_MG1655_PC_RAD002_10881_ch212_read32_strand.fast5	56ab6b26-7b8f-4447-93b8-331d2dea9a99	a3f8b1fb56e77905d115a86ef283e1f838d7476d	212	94.6505	1.855	1048	True	0.02075	1048	1.83425	1048	759	12.249	-0.0014
diff --git a/src/nanoplot/test_data/test.bam b/src/nanoplot/test_data/test.bam
new file mode 100644
index 0000000000000000000000000000000000000000..041bceb9ab119e2a6c7e51b4c9811ff5b09adadb
GIT binary patch
literal 2752
zcmV;x3P1H9iwFb&00000{{{d;LjnNQ0Cmve3W6{Y0O0j2COyPnprb{U?3a~LftfMr
zZ_J!2I7e>ei}wJjVS&((<BmJ-gS*Z8IGT(Q^7cNygp9%x^Ao#iT_t%btaLnn#yn&^
zlX)R<5VHAV;ds#wAAA<q0!Lw&n%x<!fUc$$rLK#&LE9!uDsq>lRuWQStRR&XiGlRd
z6GkYjbzxfPS4X``VLofEeng(RtjRPZRvHVjP(ugkQ2N&YxtMp})hK8tPV){lZsKz@
zdLe|~Nn!yQCjkHeABzYC000000RIL6LPG)o&IzqrO>-Mr74@)S#X#cNv87gbzkZ*0
zMcI-aNmVRRoT`Myu$fs<Q&2Dv7F59s_yKXTn7;@N4F3eqx%ahV1~aiFOP0Fz`Xlf3
z{W#~|m%kppe|_&qKRNyg+qd}qyS_O7$KU<o&%giTPrrTp!xu+C{_fGy%XcqtPoDDi
z^393h9na->^X2ezbNeoy#LX>!zPo+*a)Tcya*lD4A|hFGE?H78QgXyVz^6#Wu@ra|
zKfU<iWZ=m$%1p1~nS*Y!$0iQai!=^*6FoMIPbW!!`pM1FC$B!gytp{pe0uZtXP@32
z{p}98k3R_R@@h|Xg(&zIu`kX!m6%e>0iQ^~Nk?-I03C@<9MJYNX#rMxj24V}oYYH9
z9fmO8xv50&nmj!m-Ewc+5mYJo%8D+hVkjj<{f0qX9dVLmd=T|yAmXQZ0i>zWa=`|L
zgS^0p`_TP;Cv@}mUg$O(0J9kvI$(lh76RfJ{48O^EnY)3^R}{m5bDB%lxH+vSJWF*
zONg`Z{7ihp@8(QqK=%>Qt(P}*!uw)(<1N<vx~9I&96$n$Nm9(&!vPCnWMU$~Qi2R!
zi$L@|$!8juR}K?eskHh4d<Wb0Q;a-?QJ?e1{XNF}YA1A8i=ELut$|m@u#4)SI^%4%
zDOq8em*TM3bWsL1Wd>%41A(z94bN{iZUe9FQkVohiO_XYjkj9AzEpg_+}$}}tVVoa
z?vYqVNGWICBi@kdn%spkF=viBaoJ268CAKHKH-pSQ<5pL1RrKHSl$KTp~FrmL&dke
zS<kodc5}|<V*k8Jztd=faWuekPHHk_O|R*aSVDkz2DML8ilD)rJ`|DGJCcazl|hHH
zrkeZ%UU6{VTpp4)HTE?5W6#VNGKh;7B2MuJow1mP9y_p^^yePd^pz68d7hw?xa2))
zFZtmVHtqd==)T?w-TJ?5ck&Q)acesbzxo0s+5;Ih*UXMN`_T88d;$!$R_7R(z%-<u
zri1@Ym?PElJR|2b^-tnYOf26ay6eqyzPY?4#LvGOec%5u@m=h1K_aABVdz!+iFsX3
z|KRwT%uWKP*G!3+i|y9JR5u;+DgHHyjSI;+UKXO=<O9t20C@9#!_VBa7zb2fRi$v&
z%rqI>LB&cICtK1RpFvD9B216lD$;z?K+KcU!RNdCxnW0mSBt~qkI3+ys#sK+X+f7{
zRF}!%42BRfs#~THm+XWD-|d-q8gP4|MAH-Qf%lIc;a%?t?|7R$OVurWGq0$17?JeB
zB!dq_(Ojw49aMgY%J@O0S||?6Gn?^SjQ7~iye3(A`;(FTu3tkNkL2*xZUC2is_Hjp
zy^#UwSPC_pEYa9<k*o*~$A?~;T=B(pDYDe_YI84Pz;l?UCvxUE0B(@8Q|um%%51$+
z1O8?Y|6E<{9e(O=c=(wg*eu@(uF8kN1>)w0p~xJYn=+Kfyi-}8hVM^cU@!JqeBT5_
zW#{D){B!mH{qy!i%Cn|V6`HIWPzVtKr-*FQ7l;=!8=#WkGhH=hFSUvar4!o+xR`pC
z;T_8e2I4bk$&=>0y4YN=xBmIB9RYp|JG{CJaQGkfGLm~O{8%+8AtKoXs}T@~tSL@d
z<T8uKr1c$r+0pc&Pu@i9%nV<inGZDh9G`~DfLDvn{Q9%S#l8%<S{>k{Wvu0_gjY2~
zc*u!H_Ss}jq4Na`IBMu2nfnrHY_W}s@ma2^6~gBHOdi8W*Ow25_m}$%cV&0ugIrM>
z7xb7aBq`>bSkEEhfV$bhh~|ob<iLAp7>0XJ`~Dnad@6m*M}X7(=%Y)R@Ws9L!$SaG
z9YFE&{*`J?Zk3>UTJb5W#Q-(OPPt^T5`!7mBk}_6bxx~A8-{w3^PZ)LoIQeqSugjl
zZdF%T4Ko(z)Y2+VwCSR$M8$N?u8?K}-*!xuxvK%JBTdP@RT6s1!SLqui-S5Z3JBh*
zJtbO@S~PIt=#(+LNT`Ao)m@GuPFqk#Co_kOW9SWc1JfnVGA-3AhI&?7=ZC{v9ALe%
zTLcW5l^=ztDl;>cCkZmoWGI;d<c`XmD8U^@15fa)sIN}mKP%D1WOw*LZ{i{F_G!JG
zjIpPlsa{$&CboD7gI8;S)Kh`YdRInK>Soeta&_v5+A%{>;q!z*Jx2$WSJP3yEIxa&
zr#_qStFH$!cz5ff+NwYbga^8@%@<3qcF9JT1ttt#?Vm+eqBxS_)QDPVRitlugAw2A
zwN`(>-UHutjlSExyA#?bvkI^R)K+z}G9l|Th(3?}$ixb6K%*(LsXchew_bCqgYkG;
z+wjpquceA_v01DAeziyVEiVsh4Jxc$<%^lokdO(vP1Us0Chr|Q*A1N?tQ`Z@*7mH{
zwc+yYxmvV6pXhUw&^<!(T+R1Sp2mLL3Qn1;u%o1$fqCbkmPA#Q-Js2E%Nh}0$nMij
zn}Q(nC(b46Hfg653t7gcKOElr@(^WKWc$aY?n%hl!ZXQnOpQv6&fGO)(kNWJwo^Wj
zL#}#FGfZl7o^l;Rf#QP6BUo>}I7qmZO3PXvK}=~oEZLB+^E9HSZIZW4g1I+xiuogl
zP|?5-V^fG2)*R$~#=AnXJp#Pt{_rX_uq_{I?ZA3fOWCY|lBv)dmr=%2Sk9d%txsoL
z{5~kBXycxq+owD|7~W!j2(xdORm8lH^D~*2(zr)+b|&4l%NsxejkwAiK%vWmLyt(z
zGJ|}{0|sZO`F>{qdN+7icXZkD!{NQzSYfMw!4|wiRwt#Y!nVcabq{36o<v}dOvd-2
zKU0C}BVo7GR;;Hn<t0Px!=UiA_j$QlUE8wJ&v&nk7S}s1*lxbLU|Y)}O(Rr`wZ+85
ziUlNqM$W-g2+);LlMK4o4TI24NRE64MJon}Q*tv{LGN|xb4%{a_09bK%E~`?1NdTB
zdUyqZ{n(xu7mso=-Bnb8T6@V-=}3N6TMk+(rFvMaL*4~&x?Y;$G)T`YrVZ`rs|{CI
zo0aLaU+fWt%lQGTE7_(^fF><9<4~Y^DsBlTv3ZVz*F}R1EQ~g+kR=y&ZLH&kGg~KQ
zClP)<!|!r<j#=(s+TK}#K=?_j{EfmdArezHTSwdk#w9a5c^U>#;Q%zVKr%qN5w*J%
zn0PwgW{Stt$l>r-mxuh1iBbV-kxK)H+CM9`L#^Vo9kfz%=BbdIwqQ*+t^H5r8(SRH
zlr%vLk+9vt#nEDQy=QY*)k{%PO$};I`amMAbZ=`&tdMf&t&<c@!g0}(sPGkiIQ3^u
zCudz~OEWu{$!yFnvybm<?*0c=nKR(;A^-p%iwFb&00000{{{d;LjnLB00RI300000
G0000hEmb-I

literal 0
HcmV?d00001

diff --git a/src/nanoplot/test_data/test.bam.bai b/src/nanoplot/test_data/test.bam.bai
new file mode 100644
index 0000000000000000000000000000000000000000..1bf27ec266aa7ba9032fb45d9f7a79e77b005d1c
GIT binary patch
literal 96
wcmZ>A^kigYU|?VZVoxCk1`wNp;VPJ9U|7NhVt7r0$fJvEKvlsgRJ{;U0AS|^4*&oF

literal 0
HcmV?d00001

diff --git a/src/nanoplot/test_data/test.fasta b/src/nanoplot/test_data/test.fasta
new file mode 100644
index 00000000..78c66827
--- /dev/null
+++ b/src/nanoplot/test_data/test.fasta
@@ -0,0 +1,35 @@
+>640612206 slice:0-298
+TTTCTATTTGCCATTCATACCACCTAGTCTCGTTTAAACAGGTCGCGTG
+TATAGACCTTGTCCGCCACGTCCGCGAGCTCGTCGCTCCAGCGGTTGGC
+GACGATCACGTCGCAGCCGGCCTTGAAGGCCTCCAGGTCGTGCGTGACC
+TCGGAGCCGAAAAACTCCGGCGCGTCCAGCGTGGGCTCGTAGACCACCA
+CGGGCACGCCCTTGGCTTTCACGCGCTTCATGACGCCCTGGATGGAGCT
+CGCGCGGAAGTTGTCGGAGTTGGACTTCATCGTCAGGCGGTACACGCCC
+>640612206 slice:15000-15298
+GCTTTTACCTGCGGTTTTAATATCACCAAAATGCCTGTGGTTGAGATCA
+TTCAATTCGTCGTAGTAAACCGAAGTACTTTTGTTTGGCTACAAACAGT
+ATCGGTATAGGCGATTATGAATATCGCTATAATTTGGATGGTAAAACGA
+TTTTCTAGGACAACCGTTCGCCGATGGTAAACGGATGTTGTTTATACAG
+CCTGTGTACAACAGATATACTTACATCCTGTGCGTAAAGCCCATGGCCA
+GCAGGCCATGATTCTATCGAACTGGACCGTACTATGAGATTGATACACA
+>640612206 slice:30000-30298
+GAACCAACAGCGACAGCAGCGTCAACAACGACAGCAGCACCAGGCAAAC
+GGCAATGCGCCCAAGCAGCCCCCCACGCACGCTCGAGGCGATCGCGGCC
+CCGCGCGCAAGTCCGCCGGCAACAATAAGTCGGGCAAAAAGACGACGCT
+CTTTGTCGTCCTGGGTCTAATCGTCATTGTCTATATCGTTGGCGTCGTA
+GCATTTTCGCAGGTAGCCTACCCCAACACCATCATCGCCGGCGTCGACG
+TCTCGTTCTCTAACGCTTCGTCTGCCGCCACCAAGGTCAACTCGGCTTG
+>640612206 slice:45000-45298
+TCCTCGTAGTAGAACGAGAACGCCTCGTCACGCGCGACGGCGATGATGG
+GCCGCGCTCCCGCGATCGGCTCAAACCGGTAAGGTTCCTCGCAGATATC
+GGGTGCCGTCGCCGCTATTTCGAGCAAGCGGTCGACGTCGACGCTCTTT
+TCCACCAGCTCGGCCATCTTATCGATGCGCGCGGAGAGCTGCTCCACCT
+CGTCGGCGGTCACAAGCCCCAGATGCCGGCTTTCGAGCGAGAACGCCTC
+GTCGGCGGGGATATTCCCCAAAACCGCGACGCCCGTGTGCTTCTCGATC
+>640612206 slice:60000-60298
+TCGGCACGCTTAAGGTCCATGAGCTCGTCAATCAGGCGGGCCGTGTCGA
+CGCCCTCACCCGAAAGCGCGCGCATCATATTGAGCAGGCAGGAGCGCTC
+GGGGCGCAGCGGCTTGTCGTGATATTTGATGAGCAGGCACACGTCGCGC
+ACCAGGTCGTGCGAGAGCGCCAGGCGATCCATAATGACGCGCGCTTTCT
+TGGCGCCGAGCTCGGGATGACCGTAGAAGTGTCCGCTGCCGGCGTGATC
+GACCGTGAAACACTCGGGCTTGGACACATCGTGCAAAAACGCCGCCCAC
diff --git a/src/nanoplot/test_data/test1.fastq b/src/nanoplot/test_data/test1.fastq
new file mode 100644
index 00000000..f262027d
--- /dev/null
+++ b/src/nanoplot/test_data/test1.fastq
@@ -0,0 +1,49 @@
+@read_1
+TCCTAAGTTCGTTGGTTCAAGCCTCGCTTGCCAACGGCGCATGTCAGACCCGATGGAGTAGTGCACCGGA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+
+@read_2
+CCAGGACCAACAGAGTCTCTCAATACCGAGGCTGCGGAGGTAAAATACATCTACTCGAAGAAGAAAAAGCCGTACTACGTTTGTT
++
+00000000)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
+
+@read_3
+AAAAGCGGATCGGGTTGGTGGTTCCTCGAAGAGATTTGAATGGCACAATTCTCACAGCGGCTGACCCCGATATAGCCAAGTCAAATCATACGGTT
++
+///////////////////////////////////////////////////////////////////////////////////////////////
+
+@read_4
+GTTCGGAGATCAGAAAGAGAAACCCAACAAAGAGATGGCTCTA
++
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+@read_5
+GCTCCACCCAACATTGAACGACCCCCAACTTAATATGCTTGGG
++
+4444444444444444444444444444444444444444444
+
+@read_6
+AGCTATCACGTTAAATATATCAAACCCCTCGGTGAAAAGCAAGGCTCCGGTTAGCACGCCACGCTTAAGTAATTAGCTACCTAGTT
++
+22222222222222222222222222222222222222222222222222222222222222222222222222222222222222
+
+@read_7
+GGCACTCCATCACCGTACTTAACCTGTAAGTTACCTCGCCGAGCAAA
++
+99999999999999999999999999999999999999999999999
+
+@read_8
+CAGACTACTGGCAGACATCGGAAATGCCTTGCCTCGGTTTCGCTGTAGCGGT
++
+GGGGGGGGGGKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
+
+@read_9
+AACGTTAAAGCAGGGACGCGTGTTCCCTCCGA
++
+DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
+
+@read_10
+ACTGGTATGTCGTGGTACCCTTGA
++
+111111111111111111111111
\ No newline at end of file
diff --git a/src/nanoplot/test_data/test2.fastq b/src/nanoplot/test_data/test2.fastq
new file mode 100644
index 00000000..b9283728
--- /dev/null
+++ b/src/nanoplot/test_data/test2.fastq
@@ -0,0 +1,34 @@
+@read_1
+TCAGGATCCGACCGTTTTGG
++
+55555555555555555555
+
+@read_2
+CGTCAGGTCTTAATGTCGTGGTTGTGATTGTTAATAATATACTCTATGTTC
++
+777777777777777777777777777777777777777777777777777
+
+@read_3
+GCTATCTTCCGAAAGAGGCTATTTCAGGTCCTTCGTGGCTCGCCACTTAT
++
+22222222222222222222222222222222222222222222222222
+
+@read_4
+ACGGGATCGCCGGTCCATACTGGTTCGGGAACCTCTCTAACTTAACCATGAGAGGTTCGAGTCC
++
+MMMMMMMMMMMMMMMMMMMMKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
+
+@read_5
+ATTTCTAAGTCTGTGGCTTATGGACTGGCTCCATGCTCGGGCTGGTATACCGTT
++
+''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+@read_6
+CAAAGCCGACCCAAATATTTTCCTAGCCTCTCACCCCGTAGTCGCTCGACCGTCACTGTTCCCTTATCATATTACACTCTG
++
+AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
+@read_7
+AATAAAGCCCGTTCCACACTTTAGCAATGTCAAGACTGTATCATCGACAGCGGTAGTTATGTAGCCAGCACATTTCATTACCCCCTCGC
++
+77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777
\ No newline at end of file
diff --git a/src/nanoplot/test_data/test_rich.fastq b/src/nanoplot/test_data/test_rich.fastq
new file mode 100644
index 00000000..d47af6ae
--- /dev/null
+++ b/src/nanoplot/test_data/test_rich.fastq
@@ -0,0 +1,40 @@
+@32e13a1c-4171-4706-b6ce-a32c0f65fa16 runid=5a21d8a6996146deceeaea3784244c52741cae93 read=9 ch=282 start_time=2021-04-20T17:00:40Z flow_cell_id=FAP67897 protocol_group_id=2021-04-20_UKBC sample_id=RNAsst10002_spike_BA barcode=unclassified barcode_alias=unclassified
+GATCTGGGTGTTTTAACTTGATCCCGCTAATGGCTTCTAACTTCGTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTCGTGCGCCGCTTCACATGTTACCTTCTTCATCTACAATAAAATTGTTGATGAGCCCCTGAAGAACATGTCCAAATTCACACAATCGACGGTTCATCCGGAGTTGTTAATCCAGTAATGGAACAATTTATGATGAACCGACGACGACTACCAGTGCCTTTGTAAGCACAGCTGATGAGTACGAACTTATGTACTCATTCGTTTCGGAAGAGACAGGTACACGTTAATAGTTAATAGCGTACTTCTTTTG
++
+$#$#%&).6/*.-,,'##$.)*46$$$,$$;77;?B=6::<<>::<D;6465:@HHH@DHCAEB=<:BEC8B@9DHCB;@C=@0431289764+-+)+),+-55476EE<AB?BD=EFIKHFDBAB=<;323+'%(+.,&''',-+<8:@B?>9<228;<>DA;A<7<IJFCDA8575;=>>@=6.550.47===<?),09;?BBFHCED?9:::<;53251)),.'$%$(&45<==1/....%%'+16889;::<48<>>0095731+0;667?==>C@A79??6;.7/*++-1')69<=>>>??AD@=@8:?=@?GDC>A:50#
+@b87f011e-b802-4993-8f56-fd240b2e784f runid=5a21d8a6996146deceeaea3784244c52741cae93 read=19 ch=213 start_time=2021-04-20T17:00:41Z flow_cell_id=FAP67897 protocol_group_id=2021-04-20_UKBC sample_id=RNAsst10002_spike_BA barcode=unclassified barcode_alias=unclassified
+TTGTACTTCGTTCGGTGCAGATGGTGTTTAACCTCAATCAAAGACGACAGGTGTTTTCGCATTTATCGTGAAACGCTTTCGCCCAGCATTTTCGTCCCGCCACTTCACTTCTTGCATGTGACTTATGTCCCTGCACAAGAAAACTTCACAACTGCTCCTGCCATTTGTCTGGAAACACTTTCTGTGAAGGTGTCTTTGTTTCAAGTAAACACTGGTTTGTAACACAAAGGAATTTTTATGAACCACAAATCATTACTACACACAACATTTGTGTCTGGTAACTGTGATGTTGCTTAGCGGAATTGTCAACAACACAGTTTATGATCTTTGCAACCTGAATTAGACTCATTCAAGGAGGAGTTAGATAAATATTTTAAGAATCATACGTACCAGATGTTGGTTGGGAA
++
+%&$&#&'('*,-.'))%%$%#%%'2157//+2/037764-+*(*)''&((496;@<4,'(**.1+++(*))6:6).-///%&*&''(&(+++('($&$'((($$%%%&%.,.004+31211.++,..534;;8<6;)53430(,9<54/8958./0/-'&'**/84/42*'(*,*+3343.'$#/06350>678;>>9>C59/0&&''&&(%%#(17'$-20//557-&),+-1;::6878840,1())78<>D;8<:4'8:;=>/<;;=0'143//../(+)%2435(0*'$$(($$$'%))*-/0+-21-*'''90<-'+-//.$,('.)))%.$%'+2+++,==>=<:=<74-&')/740.-.485776<87-.699::0//4'&)7=;:7623-%&0*%'%##
+@6f64aedb-bb8e-4777-b494-43e661841e06 runid=5a21d8a6996146deceeaea3784244c52741cae93 read=13 ch=67 start_time=2021-04-20T17:00:41Z flow_cell_id=FAP67897 protocol_group_id=2021-04-20_UKBC sample_id=RNAsst10002_spike_BA barcode=unclassified barcode_alias=unclassified
+ATAGCCGCCGTTCATTGCATCTTAACGCGTTCAGTTATATTTGTTGGAATTGTTTAACCCTTATCCAGGGTTTAACCAGCAACTTTGTTTTCGCATTTATCGTGAAAACGCTTTCGCGTTTTCAATTGCGCCGCTTCAACATTACAAATACCATTTGCTATGCAAATGGCTTATAGATTTAATGGTATTGGAGTTACAGAATGTTCTCTATGAGAACCAAAAATTGATTGCCAACCAATTTAATAGTGCTATTGGCAAAATTCAAGACTCACTTTCTTCCACAGCAAGTGCACTTGGAAAACTTCAAGATGTGGTCAACCAAAATGCACAAGCTTTAAACACGCTTGTTAAACAA
++
+&%$'(($'%,12'(&($$$%&'*&$$')/*..+36(#&#$%$(&'''&((+5870.(&'&%)%57-&((('0*%%#$&%(((&%264;ACC=:ADCD@@B:+-(%&$$$$'''$$&('$(%&&%%&0+6586*057;455&&)1235908>@BABF?D:DBAFGH>;;:>@@;9('$%%)((%%),,,.7.0==<76@<@=A=<;1F=C9A64=>ADEDC9?7<967435>=:<=@EFHIJOKH>=G?D>DAE>?C@C;>:@<AD:?<@>>>EIG>CD>?H><;HIJ:BDC<>?GDEPIIH=@?7*6<?A<?:9=A@BAE@?CEF:9=?@A7>AB>DB>??-37>A=AA@A97-.
+@c372fb2c-dd45-4feb-81b2-c167c3d1ce93 runid=5a21d8a6996146deceeaea3784244c52741cae93 read=18 ch=337 start_time=2021-04-20T17:00:41Z flow_cell_id=FAP67897 protocol_group_id=2021-04-20_UKBC sample_id=RNAsst10002_spike_BA barcode=unclassified barcode_alias=unclassified
+ATACTTCGTTCAGTTATCGAAGGTGGGTGTGGCTTGCTGGTGTGTCCTGACGGTAGGTTCACCATTTATCAGTGAGCATTTCACAGAGTTTTGCACAATTGCGCCCTTCCCCATGGTAGATGGGTAAAGTGGGAGGCATCCTGCAAACCTGCTCTGAAGTGGCAGAACTCCTCTCCCATTCTCTGGACCTGCCATGTGGCCACATCCAGCTTCAGGGAGTTTGGGAGGGCCCAGAAGAAAGAAGGGAAACATTGTGTGGGCACACACCAACCCACCTGTCTCAACTCCCCTCAGCTGGTAACAGGAAGAGAATCCTT
++
+'0%''(&.00,+/0-#&$&&$&&-(,,)(&%&##$#$'%'*(($(*,&*,*(*''+02*&$$%('+&'(&'&('%%$$#'(*$$#&#'#&%$$$$%%%'/'&&&&(,45751(+$&%&&&''*+)675+:35-''&+013*%*2/1,+48:8</))%*/522-+++)<695451640**56:8:9:<64234..1/+-9;,-0:=8645:8:66:86-176-''-+$'&-/-$).*',+(')*$++((&&$$&'**-)012.5/8::;45794278;6<::90.7;@=641@@=<;<<98:9<;271;3631;8($%
+@18d04e8d-2816-4986-8e1b-e5be676837fc runid=5a21d8a6996146deceeaea3784244c52741cae93 read=18 ch=507 start_time=2021-04-20T17:00:41Z flow_cell_id=FAP67897 protocol_group_id=2021-04-20_UKBC sample_id=RNAsst10002_spike_BA barcode=unclassified barcode_alias=unclassified
+GTATTCACTTCAGTTCAATTTATAATTTGGGTGTTTTAGCAAATGCAACTTTCCCACAGGTAGTTCGTTTCTTCATTTATGGCAAATACTTTCTGCACATTCATTGCTGCTTTCTAGTCATTCATCGTTGTCAATTGGAAGCGTATTTCTTAAAGATCATCAGCAATTTTATTGTGAATGAAGAAGGTGTGTATGTTCGCTTTCCAATTGTCTGTACTCAATTGAGTTGAGTACAAGCTGGTATCAATCGAAGTGAAATAACAACAACCTAGCGATCTTTTTCTCCAGATTCCCATTTTTCGGAAATTATAACCACCAATCTGAGTAGTCTATGTTCAGAAATAGGACTTGTTGTGCCATCACCTGAAGTAATGACAATTGGAAGAAGTTACACTATTGTAGACGCTGCAATGGTCGTAACAATTAGTATGCAAGCAAAGAAAATAGTTGACATCCATAAAGTAATGAGTTTTTTGGAACATTTCAGCAAAGCCGAAAAACCTCGTATTATTCTTTAAGTTTATACTCACCTGAAATGACTAAAGCATAAAGATAGAAAGACTCTGACCAGCAGCATAACTGAGCAAAAGGTGTGAGTAAGCTGTACTAAACACTTCATAACTTAGTTGCAAACCAAAGTGAACTTTATGGAAGTTGCTGAGATTGCCATCTCTGCTGGGGTTGTATGATTTTGGAAGCGCTCTGAAAAACAGCCAAAGTGCAACGCTAACCATCAGCCAATACGAAGGTGAGGCTTGCTCATCAGTATCGTTGCAGTAGCGCGAACAAAATCTGAAGGAGTAGCATCCTTGATTTCACCTTGCTTCCAAAGTTACCAGTTCCAATTGTAGAGGTCATAAACAATCCATAAGTTCGTTTATGTGTAGTAATTTGACTCCTTTGAGCACTGGCTCAAGTCGATCTTCATCAAATTTGCAGCGCAGTCCACAAGAACAACAGCCCTT
++
+#%#%%'&''&%$$640%'%($$#$$#(-(*197=E@<20401),(&&&,09><:78344(%%64A@<A>71$$&%&),'('%%&%$#$%%))$$##$$$''%%#&##$#$&(('$%%%%%$&%&)%&%,%%#%%&(#&$##($$$$,.+-,*++(%.$$-+5(796:B@7**,%&$$,-*.5,,**%%%&$%%&%,+#&%'))(**))0+255596564:<<>92:<57%*''''$%%%$%'*$$%%%%$%&%.&+)&%#$$%%%&#%((($$%#-,06871)..0,.')1'&&),/04*0%&&%#&87@HF;;B?=?A=9('%&''%)(#%+18-17*976;F<=?ACDAAC=6(;<>@=DBB:;;55780/56675571-73/2*/334653($$(%$%%(&#$)'.--,*+9489>7<3532%%%%&$'$,&/*,&%.,'%./(2-+).,222,'110('*(+(%.6;:88,%&%(($',)/5-234-')&%'.,)$*-22%+++./3;555,'&(+50/%)-23*'$(%++//341-BDF7;:99.((92+%,+)%-+-.&)*&-%&%&&&##'(#$)+29:;3'9>>=>3).001)%$%'%%&-,'&$$#$%$%/(%$$$%-7(0*,$(+*,0162233))*$+$))&&$&###%#&$)10566655-&%%(&''*--''6>AAAAC;:344)@A@B<@;?9)6('',$-)*()0-,000(&%.-)()&%)#$$)$###%%(%).*%)'##(##(%%,)%9=AH==>>?>;?@54G@@9?A<57?A>=@<=<96321-(,.11,*7:9:;A=9B<?;+%%+#%#$&%*/%%((09>4==?@1+)&&(''++)*/0,,77(3.)++2+ADD9EFI@>.*21/&&&&()4883>>989;.*+/-+,..3,3,,*0,''.2.5/256&*7778*('-**'-/655..,9;=64&%&('**('(
+@aa81ca34-9310-42fd-9893-33112e283acc runid=5a21d8a6996146deceeaea3784244c52741cae93 read=19 ch=244 start_time=2021-04-20T17:00:41Z flow_cell_id=FAP67897 protocol_group_id=2021-04-20_UKBC sample_id=RNAsst10002_spike_BA barcode=unclassified barcode_alias=unclassified
+TACATGTACTTCGTTCAGGCTAGGTGTTTTTAACCGTAACCTATCGTGTTTCCCCTAGTTTTCGCATTTATCGTGCATTGCTTTCGCGTTTTTCGTGCGCCGCTTCATCTGGCATTAATGCTTCCAGTTGTAAACATTCAAAAAGAAATTGACCGCCTCAATGAGGTTGCCAAGAATTTAAATGAATCTCTGTCGATCTCCAAGAACTTGGAAAGTATGACAGTATATAAATGACATGTACATTTGGCTAGGTTTTTATAGCTGGCTTGATTGCCATATAGTAATGGTGACAATTATGCTTTGCTGTATGACCAGTTGCTGTAGTTGTCTCAAGGGCTGTTGTTCTTGTGGATCCTGCTGCAAATTTGATGAAGACGACTCTGAGCCAGTGCTCAAAGGAGTCAAATTACATTACACATAAACGAACTTATGGATTTGTTTATGAGAATCTTCACAATTGGAACTAACTTTGAAGCAAGGTGAAATCAGGATGCTACTCCTTCAGATTTTGTTCGCGCTACTGCAACGATGCCGATACAAGCCTCACTCCCTTTCGGATGGCTTATTGTTGGCGTTGCACTTCTTGCTGTTTTTCATAGCGCTTCCAAAATCATAACCCTCAAAGAGATGGCAACTAGCACTCTCCAGATTGTTCACTTTGTTTGCAACTTGCTGTTGTTGTTTGTAACAA
++
+#$$###'(334306/&$&$%+-34>:?CA=;92).&))(48>BD>9A;AAEB;=05014?D:<-4469:5:5*%$$$#'+--1002A;@HLI=999:A/:<3'';ABC@BA::444.')&%$&$,*8@E70::47@AA;=>9)$/33135>>:</<>0>CDDCG=@>H>3<)5/%'116@AB@9;<?9()&$+&010387@/?JGFF@B76:3+2;;823+(8=<=9@@?44#%%#$'+)0?@BH@B@@9:BCEFD?CI.4@<:<?<=?AAC3=:34''4;@MJIB@B<<5&&%(=@>@GHGHFE>DDFAG?B?ANH<87-*%&<54<:@?FF?6BAA8EGA@B?B@AC:<;?68?@D:?A58?>=@87<..37<88>>@<?;<:?A?C@86<622,)(%7:*5/8<>2???BA@9:AB???8?GCDCFGBDBFEEDBGE;./66;>:9513/&),,,/&&$##$''1264(%+(326)1<-77AA.C=CEFF=@6=G??DFACBEFHH>,B@>-('14554./(*/(&&%59=<==)44-:;A=2=@==>;@=948;<5<;>E>>>?A?98=;?@=?B@HH<BABJ?@IIFK<>222(&39EHEFGIG@=>--@@HF>=A51%.6;@BC>@22;($:.("$$$#&#'$%(35)6$547?<EFJ@C>?DDD8J@@BF?EF@FF@CAA54&&
+@c746fb2f-78f6-4a0a-9c75-39465c855c8d runid=5a21d8a6996146deceeaea3784244c52741cae93 read=35 ch=379 start_time=2021-04-20T17:00:42Z flow_cell_id=FAP67897 protocol_group_id=2021-04-20_UKBC sample_id=RNAsst10002_spike_BA barcode=unclassified barcode_alias=unclassified
+GTCATGGCGCTGGTTCAGCTCGATCTTGTACTTCGTTCCAGTTCAGTGGGTGTTTAACGAGTGGAAAAGGCTGGAGACCGTTTTCGCATTTATCGTTTCGCGTTTTTCGTGCGCCGCTTCATTGTTTGATGAAGCCAGCATCTCGTGTCACTTTGTTGAAAATGAATCTTCAATAAATGACCTCTTGCTTA
++
+%,)$$%'+**)()-**&$&(-))*)$$$&&&&&*02751.,$(%#$&$&%+'+,)#&&)(*/)-0/.,--8.-+(.2489>@@80%%*-.-//)+%%969@@ADGD>86;')*78587:?=ED@FGGECC>9.562.9:79.'&%**$*0357;49<5363''$$6;9;>18;:;:$8:980:<=<+00/$
+@99a108d2-8e72-42bf-bebf-ad8373cfe450 runid=5a21d8a6996146deceeaea3784244c52741cae93 read=38 ch=177 start_time=2021-04-20T17:00:42Z flow_cell_id=FAP67897 protocol_group_id=2021-04-20_UKBC sample_id=RNAsst10002_spike_BA barcode=unclassified barcode_alias=unclassified
+TGTGGCCTTTTAATTCAGTTACTGATTTGGTGTTTAACCTCGCCACACTCATAGAGGTCACACGGTGTCGCATTTATGAAACGCTTTCGCGCGTTTTTCGTGCGCCACTTCACTGAAAAATGCATTAGGTAAAAGACTGTGGCTAGCATTACACAGTTACTTCACTTCAGACTATTACCGACATACTCAACTCAATTGGTGCAGACATAAGTGTTGAACATATTTACCTTCTTCATCTACAATAAAATTGATGATGAACCTGAAAATTTATGTCCAAATTCCACTAATCGACGGTTCATCAGGTTGACCCAATCCAGTAATGGAACCAATTTATGATGAACCGACGACGACTACAGCGTGCCTTTGTAAGCACAAGCTGATGAGTACAGACTTGTAGCACTCATTCGTTTCGGGAAGAGACAGGTACGTTAATAGTTAACTTAATATGCTTCTTTT
++
+($.((('&'&$$(()#$'*'##%#%$%++,/.*+)435256573%14=90,)'$%-),-%)%&$''%(&$&')/.++(*,)((&&)).''%564=A?<777/..00(8898:5.14314.))'&')%)7:>?6/);7,/&&%%*($')-3)%'&%&4;:=??::6<;99894&$%'&'&%#%%&%*0565@?90-01%(+&&%$$$%'&**5358$$3.-6((@B<<@BGBEDBAKDDC?DE@B=6)**,$/)&%''$-'((,('&$&$%%445;47//8-($$$')('()(&/79.66)%0(('&&&,,12/:4224<=??C@>9;%=ACFCB=<A??CD><>3/,-55++'$'/4;A87:A@?;(1+7846??>;><:@A@;?A.,,7-*+-..-%%%(+00:979<75*-DAB(,45.(?;<;;9>:4,+%&2.-,$$&&&%#%$**3**0-*
+@5d01447f-f17b-4acb-b87e-d60d8aeeccc8 runid=5a21d8a6996146deceeaea3784244c52741cae93 read=21 ch=417 start_time=2021-04-20T17:00:41Z flow_cell_id=FAP67897 protocol_group_id=2021-04-20_UKBC sample_id=RNAsst10002_spike_BA barcode=unclassified barcode_alias=unclassified
+ATGATGGCCTTCTAGATTTCAGGCATTTGGTGTTTAACCCGACGTAAGTGGTTTTCGCATTTATCGTGGCTTTCGCGTTTTTCGTTGCCGCTTCATTACTATTAGTGTTACCACAGAAATTCTACCAGTGTCTATGACCAGACATCAGTAGATTGTACAATGTACATTTGTGGTGATTCAACTGAATGCAGCAATCTTTTGTTGCAATATGGCGGATTTTTGTACACAATTAAACCGTGCTTTAACTGGAATAGCTGTTGAATAAGACAAAAACACCCAAAGTTTTTGCACAAGTCAAACAAATTTACAAAACACCGCCAATTAAAGATTTTGGTGGTTTAATTTTTCACAAATATTGTAGATCCATCAAAACCAAGCAAGAGGTCATTTATTGAAGATCTACTTTTCAACAAAGTGACACTTGCAGATGCTGGCTTCCATCAAACAATATGGTGATTGCCTTGGTGATATTGCTGCTAGGGCCATTTGTGCACAAAGTTTAGCGGCCTTACTGTTTTGCCACCTTGCTCACAGATGAAATGACCAATACACTTCTGCACTGTTAGCGGGTACAATCACTTCTGGTTGGACCTTTGGTGCAGGTGCTGCATTACAAATACCATTTGCTATGCTATAGAGTTTAATGGTATTGAGTTACA
++
+(*+*+''%&),$&&%%%+(($)(&$#&%$&3*/2-/.($(%%&(()&,*)-2>?<6096688'<-1,++1/28277;@?996*,+)%%%&148456;A9=?=>==E?>=C@>:4326=<?BA:/;624EEDAE@1...*)*8<BC0+5*69:=<2B=<ABFD=;AAAA:,813*BDEB9F=F=;9595762>IJGBFILJBAB54831($%)+%'$148;86744.21312BH???>GCFGK@C?BC<<?@=;?@C?A>*(2$(.045?@6CB8?=<;@A:*=>>>90146>>:@A?AA:GHGFF>,./0.'&%(%%)4ABEFRQOHFGBGCG<?ADGI;<;?;:94/.75:41-,*##$%=>=8,=@CEEFDEAC38/5#%1.11/241-/,-/0-174+)39=DB>791;=@>B@?>;;?B:===;?45<942246*>ABCDBA<><66?>AGHG:C@BBA?==::1.-/.21016.1&%('$&*.'<78..==3-?A@:%?7:ADCF/EE?>BB=21:8?3=?,,.),2>@AA;8:=6220143=:32>?DJIGE=>D;?8,++,.)**2::358=@?>==6882424;<<;+/0,).166($-&+--/67?@==GEFHEFA8962-%#%(%%%$'&<:77C=<><>?@*=<>:;%
+@b0279f8e-e988-44c5-895f-201b68217623 runid=5a21d8a6996146deceeaea3784244c52741cae93 read=32 ch=435 start_time=2021-04-20T17:00:43Z flow_cell_id=FAP67897 protocol_group_id=2021-04-20_UKBC sample_id=RNAsst10002_spike_BA barcode=unclassified barcode_alias=unclassified
+AAATCATGGCCACTTCGTTCAGTTACGGAAAGGTAAGATTGTTTAACCGTCGATACTGGTTCTCATGGACCGCATTTATCGTGAAGCGCTTTCGCGCGTTTTCGTCGCCCGCTTCATGAAAATTAAAACCACCAAAATCTTTAATTGAATTTTGGTGTTTTGTAAATTTGTTTGACTTGTGCAAAAACTTCTTGGGTGTTTTTGTCTTGTTCAACAGCTATTCCAGTTAAAG
++
+('&.-'&&(((&**+'-./-,-/0&%&&**-,,*.03..77<>CAB??;@6542,+**&%)$(($%%&%$$#%&')-094)'%'($%$&.12..($44871.+()#%*-(*,2648A?GFA?-CCBC9:@11?@B@=69AA:+++,,###%(*14:6<<<4.4=;99:A=>=/33365%+#%9;BC<8GH>BCC3=96>>GLIBA<A?9>A812+:&<><;<8-'.::;;0'

From d6c9475ccf825f2df5666cdd0baf4048e98b8812 Mon Sep 17 00:00:00 2001
From: Leila011 <leilapaquay@gmail.com>
Date: Sat, 26 Oct 2024 15:07:08 +0200
Subject: [PATCH 32/42] Add agat sp statistics (#107)

* add help

* add config

* add running script

* add test data and expected output + script to fetch them

* add tests

* update changelog

* cleanup

* config: replace `-d` by a longer name `--plot`

* add set -eo pipefail to script and test files

* create temporary directory and clean up on exit

* improve config: add requirements, add keywords, format description, etc.

* cleanup changelog

* PR fixes, extended unit tests

* Smaller test data, small changes to version format and config format

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
Co-authored-by: jakubmajercik <jakub.majercik@gmail.com>
Co-authored-by: Emma Rousseau <emmarou1@icloud.com>
---
 CHANGELOG.md                                  |  7 ++
 src/agat/agat_sp_statistics/config.vsh.yaml   | 93 +++++++++++++++++++
 src/agat/agat_sp_statistics/help.txt          | 60 ++++++++++++
 src/agat/agat_sp_statistics/script.sh         | 26 ++++++
 src/agat/agat_sp_statistics/test.sh           | 65 +++++++++++++
 src/agat/agat_sp_statistics/test_data/1.gff   | 78 ++++++++++++++++
 .../agat_sp_statistics/test_data/script.sh    | 14 +++
 .../test_data/stats_out.txt                   | 93 +++++++++++++++++++
 8 files changed, 436 insertions(+)
 create mode 100644 src/agat/agat_sp_statistics/config.vsh.yaml
 create mode 100644 src/agat/agat_sp_statistics/help.txt
 create mode 100644 src/agat/agat_sp_statistics/script.sh
 create mode 100644 src/agat/agat_sp_statistics/test.sh
 create mode 100644 src/agat/agat_sp_statistics/test_data/1.gff
 create mode 100755 src/agat/agat_sp_statistics/test_data/script.sh
 create mode 100644 src/agat/agat_sp_statistics/test_data/stats_out.txt

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9e59f784..dbc4d95d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,8 @@
 
 * `agat`:
   - `agat/agat_convert_genscan2gff`: convert a genscan file into a GFF file (PR #100).
+  - `agat/agat_sp_statistics`: provides exhaustive statistics of a GTF/GFF file (PR #107).
+
 
 * `bd_rhapsody/bd_rhapsody_sequence_analysis`: BD Rhapsody Sequence Analysis CWL pipeline (PR #96).
 
@@ -49,12 +51,16 @@
                 based on a provided sequence IDs or region coordinates file (PR #85).
 
 * `agat`:
+  - `agat_convert_sp_gff2gtf`: convert any GTF/GFF file into a proper GTF file (PR #76).
+  - `agat_convert_bed2gff`: convert bed file to gff format (PR #97).
+  - `agat_convert_embl2gff`: convert an EMBL file into GFF format (PR #99).
   - `agat/agat_convert_sp_gff2gtf`: convert any GTF/GFF file into a proper GTF file (PR #76).
   - `agat/agat_convert_bed2gff`: convert bed file to gff format (PR #97).
   - `agat/agat_convert_embl2gff`: convert an EMBL file into GFF format (PR #99).
   - `agat/agat_convert_sp_gff2tsv`: convert gtf/gff file into tabulated file (PR #102).
   - `agat/agat_convert_sp_gxf2gxf`: fixes and/or standardizes any GTF/GFF file into full sorted GTF/GFF file (PR #103).
 
+
 * `bedtools`:
   - `bedtools/bedtools_intersect`: Allows one to screen for overlaps between two sets of genomic features (PR #94).
   - `bedtools/bedtools_sort`: Sorts a feature file (bed/gff/vcf) by chromosome and other criteria (PR #98).
@@ -91,6 +97,7 @@
 
 * `trimgalore`: Quality and adapter trimming for fastq files (PR #117). 
 
+
 ## MINOR CHANGES
 
 * `busco` components: update BUSCO to `5.7.1` (PR #72).
diff --git a/src/agat/agat_sp_statistics/config.vsh.yaml b/src/agat/agat_sp_statistics/config.vsh.yaml
new file mode 100644
index 00000000..6890bb84
--- /dev/null
+++ b/src/agat/agat_sp_statistics/config.vsh.yaml
@@ -0,0 +1,93 @@
+name: agat_sp_statistics
+namespace: agat
+description: |
+  Provides exhaustive statistics of a GTF/GFF file.
+  
+  If your file contains isoforms, some calculated values might look
+  incoherent even when they are correct: e.g. the total mRNA length can
+  exceed the genome size, because the lengths of all isoforms are added up.
+  This is why, by default, the statistics are always computed twice when
+  isoforms are present: once with the isoforms and once without (in that
+  case the longest isoform per locus is kept).
+keywords: [gene annotations, statistics, gff]
+links:
+  homepage: https://github.com/NBISweden/AGAT
+  documentation: https://agat.readthedocs.io/en/latest/tools/agat_sp_statistics.html
+  issue_tracker: https://github.com/NBISweden/AGAT/issues
+  repository: https://github.com/NBISweden/AGAT
+references: 
+  doi: 10.5281/zenodo.3552717
+license: GPL-3.0
+requirements:
+  commands: [agat]
+authors:
+  - __merge__: /src/_authors/leila_paquay.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --gff
+        alternatives: [-i]
+        description: Input GTF/GFF file.
+        type: file
+        required: true
+        example: input.gff
+      - name: --gs_fasta
+        description: |
+          FASTA file of the genome; its size is used to compute more statistics. Cannot be combined with `--gs_size`.
+        type: file
+        example: genome.fasta
+  - name: Outputs
+    arguments:
+      - name: --output
+        alternatives: [-o]
+        description: |
+          The file where the results will be written.
+        type: file
+        direction: output
+        required: true
+        example: output.txt
+  - name: Options
+    arguments:
+      - name: --plot
+        alternatives: [-p, -d]
+        description: |
+          When this option is used, a histogram of the distribution of the features is written to PDF files.
+        type: boolean_true
+      - name: --gs_size
+        description: |
+          Genome size in nucleotides, used to compute more statistics. Cannot be combined with `--gs_fasta`.
+        type: integer
+        example: 1000000
+      - name: --verbose
+        alternatives: [-v]
+        description: |
+          Verbosity level. Default is 1; 0 is quiet, 2 and 3 are increasingly verbose.
+        type: integer
+        example: 1
+      - name: --config
+        alternatives: [-c]
+        description: |
+          AGAT config file. By default AGAT takes the original agat_config.yaml shipped with AGAT. The `--config`
+          option gives you the possibility to use your own AGAT config file (located elsewhere or named differently).
+        type: file
+        example: custom_agat_config.yaml
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: test_data
+engines:
+  - type: docker
+    image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
+    setup:
+      - type: docker
+        run: |
+          agat --version | sed 's/.*v\.//; s/\s.*//' | sed 's/^/AGAT: /' > /var/software_versions.txt
+runners:
+  - type: executable
+  - type: nextflow
\ No newline at end of file
diff --git a/src/agat/agat_sp_statistics/help.txt b/src/agat/agat_sp_statistics/help.txt
new file mode 100644
index 00000000..fa6ef24d
--- /dev/null
+++ b/src/agat/agat_sp_statistics/help.txt
@@ -0,0 +1,60 @@
+```sh
+agat_sp_statistics.pl --help
+```
+
+  ------------------------------------------------------------------------------
+|   Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0                      |
+|   https://github.com/NBISweden/AGAT                                          |
+|   National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se         |
+ ------------------------------------------------------------------------------
+
+
+Name:
+    agat_sp_statistics.pl
+
+Description:
+    The script provides exhaustive statistics of a gft/gff file. /!\ If you
+    have isoforms in your file, even if correct, some values calculated
+    might sounds incoherent: e.g. total length mRNA can be superior than the
+    genome size. Because all isoforms length is added... It is why by
+    default we always compute the statistics twice when there are isoforms,
+    once with the isoforms, once without (In that case we keep the longest
+    isoform per locus).
+
+Usage:
+        agat_sp_statistics.pl --gff file.gff  [ -o outfile ]
+        agat_sp_statistics.pl --help
+
+Options:
+    --gff or -i
+            Input GTF/GFF file.
+
+    --gs, -f or -g
+            This option inform about the genome size in oder to compute more
+            statistics. You can give the size in Nucleotide or directly the
+            fasta file.
+
+    -d or -p
+            When this option is used, an histogram of distribution of the
+            features will be printed in pdf files. (d means distribution, p
+            means plot).
+
+    -v or --verbose
+            Verbose option. To modify verbosity. Default is 1. 0 is quiet, 2
+            and 3 are increasing verbosity.
+
+    --output or -o
+            File where will be written the result. If no output file is
+            specified, the output will be written to STDOUT.
+
+    -c or --config
+            String - Input agat config file. By default AGAT takes as input
+            agat_config.yaml file from the working directory if any,
+            otherwise it takes the orignal agat_config.yaml shipped with
+            AGAT. To get the agat_config.yaml locally type: "agat config
+            --expose". The --config option gives you the possibility to use
+            your own AGAT config file (located elsewhere or named
+            differently).
+
+    -h or --help
+            Display this helpful text.
\ No newline at end of file
diff --git a/src/agat/agat_sp_statistics/script.sh b/src/agat/agat_sp_statistics/script.sh
new file mode 100644
index 00000000..9865c4b2
--- /dev/null
+++ b/src/agat/agat_sp_statistics/script.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+# unset flags
+[[ "$par_plot" == "false" ]] && unset par_plot
+
+if [[ -n "$par_gs_size" && -n "$par_gs_fasta" ]]; then
+  echo "[error] Please provide only one of the following options to set genome size: --gs_size or --gs_fasta"
+  exit 1
+fi
+
+# run agat_sp_statistics
+agat_sp_statistics.pl \
+  -i "$par_gff" \
+  -o "$par_output" \
+  ${par_plot:+-d} \
+  ${par_gs_size:+--gs "${par_gs_size}"} \
+  ${par_gs_fasta:+--gs "${par_gs_fasta}"} \
+  ${par_verbose:+--verbose "${par_verbose}"} \
+  ${par_config:+--config "${par_config}"}
+
+
diff --git a/src/agat/agat_sp_statistics/test.sh b/src/agat/agat_sp_statistics/test.sh
new file mode 100644
index 00000000..35f42ee0
--- /dev/null
+++ b/src/agat/agat_sp_statistics/test.sh
@@ -0,0 +1,65 @@
+#!/bin/bash
+
+set -eo pipefail
+
+test_dir="${meta_resources_dir}/test_data"
+
+# create temporary directory and clean up on exit
+TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXX")
+function clean_up {
+  [[ -d "$TMPDIR" ]] && rm -rf "$TMPDIR"
+}
+trap clean_up EXIT
+
+cd "$TMPDIR"
+
+mkdir test1
+pushd test1
+
+echo "> Run $meta_name with test data"
+"$meta_executable" \
+  --gff "$test_dir/1.gff" \
+  --output "output.txt"
+
+echo ">> Checking output"
+[ ! -f "output.txt" ] && echo "Output file output.txt does not exist" && exit 1
+
+echo ">> Check if output is empty"
+[ ! -s "output.txt" ] && echo "Output file output.txt is empty" && exit 1
+
+echo ">> Check if output matches expected output"
+# test diff directly: under `set -e` a separate `$?` check would never be reached
+if ! diff "output.txt" "$test_dir/stats_out.txt"; then
+  echo "Output file output.txt does not match expected output"
+  exit 1
+fi
+
+echo "> Test successful"
+
+
+popd
+mkdir test2
+pushd test2
+
+cat <<EOF > genome.fasta
+>sample_sequence
+ATGCGTACGTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGC
+EOF
+
+echo "> Run $meta_name with both gs_size and gs_fasta"
+error_message=$("$meta_executable" \
+  --gff "$test_dir/1.gff" \
+  --output "output.txt" \
+  --gs_size "1000000" \
+  --gs_fasta "genome.fasta" 2>&1 || true)
+
+expected_error="[error] Please provide only one of the following options to set genome size: --gs_size or --gs_fasta"
+if [[ "$error_message" != *"$expected_error"* ]]; then
+  echo "Output error message: $error_message does not match expected error message: $expected_error"
+  exit 1
+fi
+
+echo "> Error test successful"
+
+echo "---- All tests succeeded! ----"
+exit 0
\ No newline at end of file
diff --git a/src/agat/agat_sp_statistics/test_data/1.gff b/src/agat/agat_sp_statistics/test_data/1.gff
new file mode 100644
index 00000000..775d14fd
--- /dev/null
+++ b/src/agat/agat_sp_statistics/test_data/1.gff
@@ -0,0 +1,78 @@
+##gff-version 3
+##sequence-region   1 1 43270923
+#!genome-build RAP-DB IRGSP-1.0
+#!genome-version IRGSP-1.0
+#!genome-date 2015-10
+#!genome-build-accession GCA_001433935.1
+1	RAP-DB	chromosome	1	43270923	.	.	.	ID=chromosome:1;Alias=Chr1,AP014957.1,NC_029256.1
+###
+1	irgsp	repeat_region	2000	2100	.	+	.	ID=fakeRepeat1
+###
+1	irgsp	gene	2983	10815	.	+	.	ID=gene:Os01g0100100;biotype=protein_coding;description=RabGAP/TBC domain containing protein. (Os01t0100100-01);gene_id=Os01g0100100;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	2983	10815	.	+	.	ID=transcript:Os01t0100100-01;Parent=gene:Os01g0100100;biotype=protein_coding;transcript_id=Os01t0100100-01
+1	irgsp	exon	2983	3268	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon1;rank=1
+1	irgsp	five_prime_UTR	2983	3268	.	+	.	Parent=transcript:Os01t0100100-01
+1	irgsp	five_prime_UTR	3354	3448	.	+	.	Parent=transcript:Os01t0100100-01
+1	irgsp	exon	3354	3616	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0100100-01.exon2;rank=2
+1	irgsp	CDS	3449	3616	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	4357	4455	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100100-01.exon3;rank=3
+1	irgsp	CDS	4357	4455	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	5457	5560	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon4;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100100-01.exon4;rank=4
+1	irgsp	CDS	5457	5560	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	7136	7944	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon5;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0100100-01.exon5;rank=5
+1	irgsp	CDS	7136	7944	.	+	1	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	8028	8150	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon6;constitutive=1;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0100100-01.exon6;rank=6
+1	irgsp	CDS	8028	8150	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	8232	8320	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon7;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100100-01.exon7;rank=7
+1	irgsp	CDS	8232	8320	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	8408	8608	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon8;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100100-01.exon8;rank=8
+1	irgsp	CDS	8408	8608	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	9210	9615	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon9;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0100100-01.exon9;rank=9
+1	irgsp	CDS	9210	9615	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	10102	10187	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon10;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100100-01.exon10;rank=10
+1	irgsp	CDS	10102	10187	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	10274	10297	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	10274	10430	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon11;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0100100-01.exon11;rank=11
+1	irgsp	three_prime_UTR	10298	10430	.	+	.	Parent=transcript:Os01t0100100-01
+1	irgsp	exon	10504	10815	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	three_prime_UTR	10504	10815	.	+	.	Parent=transcript:Os01t0100100-01
+###
+1	irgsp	gene	11218	12435	.	+	.	ID=gene:Os01g0100200;biotype=protein_coding;description=Conserved hypothetical protein. (Os01t0100200-01);gene_id=Os01g0100200;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	11218	12435	.	+	.	ID=transcript:Os01t0100200-01;Parent=gene:Os01g0100200;biotype=protein_coding;transcript_id=Os01t0100200-01
+1	irgsp	five_prime_UTR	11218	11797	.	+	.	Parent=transcript:Os01t0100200-01
+1	irgsp	exon	11218	12060	.	+	.	Parent=transcript:Os01t0100200-01;Name=Os01t0100200-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100200-01.exon1;rank=1
+1	irgsp	CDS	11798	12060	.	+	0	ID=CDS:Os01t0100200-01;Parent=transcript:Os01t0100200-01;protein_id=Os01t0100200-01
+1	irgsp	CDS	12152	12317	.	+	1	ID=CDS:Os01t0100200-01;Parent=transcript:Os01t0100200-01;protein_id=Os01t0100200-01
+1	irgsp	exon	12152	12435	.	+	.	Parent=transcript:Os01t0100200-01;Name=Os01t0100200-01.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100200-01.exon2;rank=2
+1	irgsp	three_prime_UTR	12318	12435	.	+	.	Parent=transcript:Os01t0100200-01
+###
+1	irgsp	gene	11372	12284	.	-	.	ID=gene:Os01g0100300;biotype=protein_coding;description=Cytochrome P450 domain containing protein. (Os01t0100300-00);gene_id=Os01g0100300;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	11372	12284	.	-	.	ID=transcript:Os01t0100300-00;Parent=gene:Os01g0100300;biotype=protein_coding;transcript_id=Os01t0100300-00
+1	irgsp	exon	11372	12042	.	-	.	Parent=transcript:Os01t0100300-00;Name=Os01t0100300-00.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100300-00.exon2;rank=2
+1	irgsp	CDS	11372	12042	.	-	2	ID=CDS:Os01t0100300-00;Parent=transcript:Os01t0100300-00;protein_id=Os01t0100300-00
+1	irgsp	exon	12146	12284	.	-	.	Parent=transcript:Os01t0100300-00;Name=Os01t0100300-00.exon1;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0100300-00.exon1;rank=1
+1	irgsp	CDS	12146	12284	.	-	0	ID=CDS:Os01t0100300-00;Parent=transcript:Os01t0100300-00;protein_id=Os01t0100300-00
+###
+1	irgsp	gene	12721	15685	.	+	.	ID=gene:Os01g0100400;biotype=protein_coding;description=Similar to Pectinesterase-like protein. (Os01t0100400-01);gene_id=Os01g0100400;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	12721	15685	.	+	.	ID=transcript:Os01t0100400-01;Parent=gene:Os01g0100400;biotype=protein_coding;transcript_id=Os01t0100400-01
+1	irgsp	five_prime_UTR	12721	12773	.	+	.	Parent=transcript:Os01t0100400-01
+1	irgsp	exon	12721	13813	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100400-01.exon1;rank=1
+1	irgsp	CDS	12774	13813	.	+	0	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	13906	14271	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100400-01.exon2;rank=2
+1	irgsp	CDS	13906	14271	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	14359	14437	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0100400-01.exon3;rank=3
+1	irgsp	CDS	14359	14437	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	14969	15171	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon4;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100400-01.exon4;rank=4
+1	irgsp	CDS	14969	15171	.	+	0	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	CDS	15266	15359	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	15266	15685	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon5;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100400-01.exon5;rank=5
+1	irgsp	three_prime_UTR	15360	15685	.	+	.	Parent=transcript:Os01t0100400-01
+###
+1	irgsp	gene	12808	13978	.	-	.	ID=gene:Os01g0100466;biotype=protein_coding;description=Hypothetical protein. (Os01t0100466-00);gene_id=Os01g0100466;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	12808	13978	.	-	.	ID=transcript:Os01t0100466-00;Parent=gene:Os01g0100466;biotype=protein_coding;transcript_id=Os01t0100466-00
+1	irgsp	three_prime_UTR	12808	12868	.	-	.	Parent=transcript:Os01t0100466-00
+1	irgsp	exon	12808	13782	.	-	.	Parent=transcript:Os01t0100466-00;Name=Os01t0100466-00.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100466-00.exon2;rank=2
+1	irgsp	CDS	12869	13102	.	-	0	ID=CDS:Os01t0100466-00;Parent=transcript:Os01t0100466-00;protein_id=Os01t0100466-00
+1	irgsp	five_prime_UTR	13103	13782	.	-	.	Parent=transcript:Os01t0100466-00
+1	irgsp	exon	13880	13978	.	-	.	Parent=transcript:Os01t0100466-00;Name=Os01t0100466-00.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100466-00.exon1;rank=1
+1	irgsp	five_prime_UTR	13880	13978	.	-	.	Parent=transcript:Os01t0100466-00
\ No newline at end of file
diff --git a/src/agat/agat_sp_statistics/test_data/script.sh b/src/agat/agat_sp_statistics/test_data/script.sh
new file mode 100755
index 00000000..5b1133ac
--- /dev/null
+++ b/src/agat/agat_sp_statistics/test_data/script.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+# clone repo
+if [ ! -d /tmp/agat_source ]; then
+  git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source
+fi
+
+# copy test data
+cp -r /tmp/agat_source/t/scripts_output/in/1.gff src/agat/agat_sp_statistics/test_data
+cp -r /tmp/agat_source/t/scripts_output/out/agat_sp_statistics_1.txt src/agat/agat_sp_statistics/test_data/stats_out.txt
+
+# keep only the first 78 lines of 1.gff
+head -n 78 src/agat/agat_sp_statistics/test_data/1.gff > src/agat/agat_sp_statistics/test_data/1.gff.tmp
+mv src/agat/agat_sp_statistics/test_data/1.gff.tmp src/agat/agat_sp_statistics/test_data/1.gff
\ No newline at end of file
diff --git a/src/agat/agat_sp_statistics/test_data/stats_out.txt b/src/agat/agat_sp_statistics/test_data/stats_out.txt
new file mode 100644
index 00000000..b160ea52
--- /dev/null
+++ b/src/agat/agat_sp_statistics/test_data/stats_out.txt
@@ -0,0 +1,93 @@
+--------------------------------------------------------------------------------
+
+---------------------------------- chromosome ----------------------------------
+Number of chromosome                         1
+Number chromosome overlapping                0
+Total chromosome length (bp)                 43270923
+mean chromosome length (bp)                  43270923
+Longest chromosome (bp)                      43270923
+Shortest chromosome (bp)                     43270923
+
+-------------------------------- repeat_region ---------------------------------
+Number of repeat_region                      1
+Number repeat_region overlapping             0
+Total repeat_region length (bp)              101
+mean repeat_region length (bp)               101
+Longest repeat_region (bp)                   101
+Shortest repeat_region (bp)                  101
+
+------------------------------------- mrna -------------------------------------
+Number of gene                               5
+Number of mrna                               5
+Number of mrnas with utr both sides          4
+Number of mrnas with at least one utr        4
+Number of cds                                5
+Number of exon                               23
+Number of five_prime_utr                     4
+Number of three_prime_utr                    4
+Number of exon in cds                        20
+Number of exon in five_prime_utr             6
+Number of exon in three_prime_utr            5
+Number of intron in cds                      15
+Number of intron in exon                     18
+Number of intron in five_prime_utr           2
+Number of intron in three_prime_utr          1
+Number gene overlapping                      2
+mean mrnas per gene                          1.0
+mean cdss per mrna                           1.0
+mean exons per mrna                          4.6
+mean five_prime_utrs per mrna                0.8
+mean three_prime_utrs per mrna               0.8
+mean exons per cds                           4.0
+mean exons per five_prime_utr                1.5
+mean exons per three_prime_utr               1.2
+mean introns in cdss per mrna                3.0
+mean introns in exons per mrna               3.6
+mean introns in five_prime_utrs per mrna     0.4
+mean introns in three_prime_utrs per mrna    0.2
+Total gene length (bp)                       14100
+Total mrna length (bp)                       14100
+Total cds length (bp)                        5364
+Total exon length (bp)                       8107
+Total five_prime_utr length (bp)             1793
+Total three_prime_utr length (bp)            950
+Total intron length per cds (bp)             5738
+Total intron length per exon (bp)            5993
+Total intron length per five_prime_utr (bp)  182
+Total intron length per three_prime_utr (bp) 73
+mean gene length (bp)                        2820
+mean mrna length (bp)                        2820
+mean cds length (bp)                         1072
+mean exon length (bp)                        352
+mean five_prime_utr length (bp)              448
+mean three_prime_utr length (bp)             237
+mean cds piece length (bp)                   268
+mean five_prime_utr piece length (bp)        298
+mean three_prime_utr piece length (bp)       190
+mean intron in cds length (bp)               382
+mean intron in exon length (bp)              332
+mean intron in five_prime_utr length (bp)    91
+mean intron in three_prime_utr length (bp)   73
+Longest gene (bp)                            7833
+Longest mrna (bp)                            7833
+Longest cds (bp)                             2109
+Longest exon (bp)                            1093
+Longest five_prime_utr (bp)                  779
+Longest three_prime_utr (bp)                 445
+Longest cds piece (bp)                       1040
+Longest five_prime_utr piece (bp)            680
+Longest three_prime_utr piece (bp)           326
+Longest intron into cds part (bp)            1575
+Longest intron into exon part (bp)           1575
+Longest intron into five_prime_utr part (bp) 97
+Longest intron into three_prime_utr part (bp)73
+Shortest gene (bp)                           913
+Shortest mrna (bp)                           913
+Shortest cds piece (bp)                      24
+Shortest five_prime_utr piece (bp)           53
+Shortest three_prime_utr piece (bp)          61
+Shortest intron into cds part (bp)           81
+Shortest intron into exon part (bp)          73
+Shortest intron into five_prime_utr part (bp)85
+Shortest intron into three_prime_utr part (bp)73
+

From 52f44f5049606ac655154cf54ed53fa76b49896f Mon Sep 17 00:00:00 2001
From: Leila011 <leilapaquay@gmail.com>
Date: Sat, 26 Oct 2024 15:07:43 +0200
Subject: [PATCH 33/42] Add agat sp add introns (#104)

* add help

* add config

* add run script

* add test data and expected output + script to fetch them

* add tests

* update changelog

* Update src/agat/agat_sp_add_introns/config.vsh.yaml

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>

* Update src/agat/agat_sp_add_introns/config.vsh.yaml

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>

* Update src/agat/agat_sp_add_introns/config.vsh.yaml

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>

* create temporary directory and clean up on exit

* add set -e to test

* fix create temporary directory

* fix create temporary directory

* add set -eo pipefail to test

* add set -eo pipefail to script

* remove file added by mistake

* update --config description

* cleanup changelog

* cleanup changelog

* minor changes to config

* reduce test data size

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>
Co-authored-by: Emma Rousseau <emmarou1@icloud.com>
---
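Note on the wrapper script in this patch: the optional `--config` argument is forwarded with the `${par_config:+...}` parameter expansion, so the flag only appears on the command line when a value was given. The sketch below illustrates that idiom in isolation. `build_args` is a hypothetical helper that is not part of the patch, and it prints the arguments instead of calling `agat_sp_add_introns.pl`.

```bash
#!/bin/bash
set -eo pipefail

# Hypothetical helper (not part of this patch): prints, one per line, the
# arguments that the wrapper would hand to the tool.
build_args() {
  local par_gff="$1" par_output="$2" par_config="$3"
  # ${par_config:+...} expands to nothing when par_config is unset or empty,
  # and to the two words `--config <value>` otherwise. The inner quotes keep
  # a value containing spaces together as a single word.
  printf '%s\n' \
    -f "$par_gff" \
    -o "$par_output" \
    ${par_config:+--config "${par_config}"}
}

# Without a config file only the required arguments are printed.
build_args input.gff output.gff ""

# With a config file the flag and its value are appended.
build_args input.gff output.gff custom_agat_config.yaml
```

The same expansion does not work directly for `boolean_true` arguments whose value is the string `false` when the flag is absent, because a non-empty `false` still triggers the expansion. Such variables have to be unset first.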
 CHANGELOG.md                                  |   5 +-
 src/agat/agat_sp_add_introns/config.vsh.yaml  |  64 +++++++++
 src/agat/agat_sp_add_introns/help.txt         |  62 +++++++++
 src/agat/agat_sp_add_introns/script.sh        |  11 ++
 src/agat/agat_sp_add_introns/test.sh          |  34 +++++
 .../test_data/1_truncated.gff                 | 106 +++++++++++++++
 .../agat_sp_add_introns/test_data/script.sh   |  12 ++
 .../test_data/test_output.gff                 | 125 ++++++++++++++++++
 8 files changed, 418 insertions(+), 1 deletion(-)
 create mode 100644 src/agat/agat_sp_add_introns/config.vsh.yaml
 create mode 100644 src/agat/agat_sp_add_introns/help.txt
 create mode 100644 src/agat/agat_sp_add_introns/script.sh
 create mode 100644 src/agat/agat_sp_add_introns/test.sh
 create mode 100644 src/agat/agat_sp_add_introns/test_data/1_truncated.gff
 create mode 100755 src/agat/agat_sp_add_introns/test_data/script.sh
 create mode 100644 src/agat/agat_sp_add_introns/test_data/test_output.gff

diff --git a/CHANGELOG.md b/CHANGELOG.md
index dbc4d95d..a8cfc83a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -13,6 +13,10 @@
 
 * `nanoplot`: Plotting tool for long read sequencing data and alignments (PR #95).
 
+* `agat`:
+  - `agat/agat_sp_add_introns`: add intron features to gtf/gff file without intron features (PR #104).
+
+
 ## BREAKING CHANGES
 
 * `falco`: Fix a typo in the `--reverse_complement` argument (PR #157).
@@ -94,7 +98,6 @@
     - `kallisto_index`: Create a kallisto index (PR #149).
     - `kallisto_quant`: Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads (PR #152).
 
-
 * `trimgalore`: Quality and adapter trimming for fastq files (PR #117). 
 
 
diff --git a/src/agat/agat_sp_add_introns/config.vsh.yaml b/src/agat/agat_sp_add_introns/config.vsh.yaml
new file mode 100644
index 00000000..06ec8474
--- /dev/null
+++ b/src/agat/agat_sp_add_introns/config.vsh.yaml
@@ -0,0 +1,64 @@
+name: agat_sp_add_introns
+namespace: agat
+description: |
+  Add intronic elements to a gtf/gff file without intron features.
+keywords: [gene annotations, GTF conversion]
+links:
+  homepage: https://github.com/NBISweden/AGAT
+  documentation: https://agat.readthedocs.io/en/latest/tools/agat_sp_add_introns.html
+  issue_tracker: https://github.com/NBISweden/AGAT/issues
+  repository: https://github.com/NBISweden/AGAT
+references: 
+  doi: 10.5281/zenodo.3552717
+license: GPL-3.0
+requirements:
+  commands: [agat]
+authors:
+  - __merge__: /src/_authors/leila_paquay.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --gff
+        alternatives: [-f, --ref, --reffile]
+        description: Input GTF/GFF file.
+        type: file
+        required: true
+        example: input.gff
+  - name: Outputs
+    arguments:       
+      - name: --output
+        alternatives: [-o, --out, --outfile, --gtf]
+        description: Output GFF3 file.
+        type: file
+        direction: output
+        required: true
+        example: output.gff
+  - name: Arguments
+    arguments:
+      - name: --config
+        alternatives: [-c]
+        description: |
+          AGAT config file. By default AGAT takes the original agat_config.yaml shipped with AGAT. The `--config` option 
+          gives you the possibility to use your own AGAT config file (located elsewhere or named differently).
+        type: file
+        example: custom_agat_config.yaml
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: test_data
+engines:
+  - type: docker
+    image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
+    setup:
+      - type: docker
+        run: |
+          agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
+runners:
+  - type: executable
+  - type: nextflow
\ No newline at end of file
diff --git a/src/agat/agat_sp_add_introns/help.txt b/src/agat/agat_sp_add_introns/help.txt
new file mode 100644
index 00000000..48dc1ace
--- /dev/null
+++ b/src/agat/agat_sp_add_introns/help.txt
@@ -0,0 +1,62 @@
+```sh
+agat_sp_add_introns.pl --help
+```
+ 
+  ------------------------------------------------------------------------------
+|   Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0                      |
+|   https://github.com/NBISweden/AGAT                                          |
+|   National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se         |
+ ------------------------------------------------------------------------------
+
+
+Name:
+    agat_sp_add_introns.pl
+
+Description:
+    The script aims to add intron features to gtf/gff file without intron
+    features.
+
+Usage:
+        agat_sp_add_introns.pl --gff infile --out outFile
+        agat_sp_add_introns.pl --help
+
+Options:
+    --gff, -f, --ref or -reffile
+            Input GTF/GFF file.
+
+    --out, --output or -o
+            Output GFF3 file.
+
+    -c or --config
+            String - Input agat config file. By default AGAT takes as input
+            agat_config.yaml file from the working directory if any,
+            otherwise it takes the orignal agat_config.yaml shipped with
+            AGAT. To get the agat_config.yaml locally type: "agat config
+            --expose". The --config option gives you the possibility to use
+            your own AGAT config file (located elsewhere or named
+            differently).
+
+    --help or -h
+            Display this helpful text.
+
+Feedback:
+  Did you find a bug?:
+    Do not hesitate to report bugs to help us keep track of the bugs and
+    their resolution. Please use the GitHub issue tracking system available
+    at this address:
+
+                https://github.com/NBISweden/AGAT/issues
+
+     Ensure that the bug was not already reported by searching under Issues.
+     If you're unable to find an (open) issue addressing the problem, open a new one.
+     Try as much as possible to include in the issue when relevant:
+     - a clear description,
+     - as much relevant information as possible,
+     - the command used,
+     - a data sample,
+     - an explanation of the expected behaviour that is not occurring.
+
+  Do you want to contribute?:
+    You are very welcome, visit this address for the Contributing
+    guidelines:
+    https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
\ No newline at end of file
diff --git a/src/agat/agat_sp_add_introns/script.sh b/src/agat/agat_sp_add_introns/script.sh
new file mode 100644
index 00000000..95cacee4
--- /dev/null
+++ b/src/agat/agat_sp_add_introns/script.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+agat_sp_add_introns.pl \
+  -f "$par_gff" \
+  -o "$par_output" \
+  ${par_config:+--config "${par_config}"}
diff --git a/src/agat/agat_sp_add_introns/test.sh b/src/agat/agat_sp_add_introns/test.sh
new file mode 100644
index 00000000..d7144d91
--- /dev/null
+++ b/src/agat/agat_sp_add_introns/test.sh
@@ -0,0 +1,34 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+test_dir="${meta_resources_dir}/test_data"
+
+# create temporary directory and clean up on exit
+TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_name-XXXXXX")
+function clean_up {
+  [[ -d "$TMPDIR" ]] && rm -rf "$TMPDIR"
+}
+trap clean_up EXIT
+
+echo "> Run $meta_name with test data"
+"$meta_executable" \
+  --gff "$test_dir/1_truncated.gff" \
+  --output "$TMPDIR/output.gff"
+
+echo ">> Checking output"
+[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1
+
+echo ">> Check if output is empty"
+[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1
+
+echo ">> Check if output matches expected output"
+# run diff inside the condition: under `set -e` a bare failing diff would exit before the message is printed
+if ! diff "$TMPDIR/output.gff" "$test_dir/test_output.gff"; then
+  echo "Output file output.gff does not match expected output"
+  exit 1
+fi
+echo "> Test successful"
\ No newline at end of file
diff --git a/src/agat/agat_sp_add_introns/test_data/1_truncated.gff b/src/agat/agat_sp_add_introns/test_data/1_truncated.gff
new file mode 100644
index 00000000..a86a94d9
--- /dev/null
+++ b/src/agat/agat_sp_add_introns/test_data/1_truncated.gff
@@ -0,0 +1,106 @@
+##gff-version 3
+##sequence-region   1 1 43270923
+#!genome-build RAP-DB IRGSP-1.0
+#!genome-version IRGSP-1.0
+#!genome-date 2015-10
+#!genome-build-accession GCA_001433935.1
+1	RAP-DB	chromosome	1	43270923	.	.	.	ID=chromosome:1;Alias=Chr1,AP014957.1,NC_029256.1
+###
+1	irgsp	repeat_region	2000	2100	.	+	.	ID=fakeRepeat1
+###
+1	irgsp	gene	2983	10815	.	+	.	ID=gene:Os01g0100100;biotype=protein_coding;description=RabGAP/TBC domain containing protein. (Os01t0100100-01);gene_id=Os01g0100100;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	2983	10815	.	+	.	ID=transcript:Os01t0100100-01;Parent=gene:Os01g0100100;biotype=protein_coding;transcript_id=Os01t0100100-01
+1	irgsp	exon	2983	3268	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon1;rank=1
+1	irgsp	five_prime_UTR	2983	3268	.	+	.	Parent=transcript:Os01t0100100-01
+1	irgsp	five_prime_UTR	3354	3448	.	+	.	Parent=transcript:Os01t0100100-01
+1	irgsp	exon	3354	3616	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0100100-01.exon2;rank=2
+1	irgsp	CDS	3449	3616	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	4357	4455	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100100-01.exon3;rank=3
+1	irgsp	CDS	4357	4455	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	5457	5560	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon4;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100100-01.exon4;rank=4
+1	irgsp	CDS	5457	5560	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	7136	7944	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon5;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0100100-01.exon5;rank=5
+1	irgsp	CDS	7136	7944	.	+	1	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	8028	8150	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon6;constitutive=1;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0100100-01.exon6;rank=6
+1	irgsp	CDS	8028	8150	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	8232	8320	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon7;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100100-01.exon7;rank=7
+1	irgsp	CDS	8232	8320	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	8408	8608	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon8;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100100-01.exon8;rank=8
+1	irgsp	CDS	8408	8608	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	9210	9615	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon9;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0100100-01.exon9;rank=9
+1	irgsp	CDS	9210	9615	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	10102	10187	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon10;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100100-01.exon10;rank=10
+1	irgsp	CDS	10102	10187	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	10274	10297	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	10274	10430	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon11;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0100100-01.exon11;rank=11
+1	irgsp	three_prime_UTR	10298	10430	.	+	.	Parent=transcript:Os01t0100100-01
+1	irgsp	exon	10504	10815	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	three_prime_UTR	10504	10815	.	+	.	Parent=transcript:Os01t0100100-01
+###
+1	irgsp	gene	11218	12435	.	+	.	ID=gene:Os01g0100200;biotype=protein_coding;description=Conserved hypothetical protein. (Os01t0100200-01);gene_id=Os01g0100200;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	11218	12435	.	+	.	ID=transcript:Os01t0100200-01;Parent=gene:Os01g0100200;biotype=protein_coding;transcript_id=Os01t0100200-01
+1	irgsp	five_prime_UTR	11218	11797	.	+	.	Parent=transcript:Os01t0100200-01
+1	irgsp	exon	11218	12060	.	+	.	Parent=transcript:Os01t0100200-01;Name=Os01t0100200-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100200-01.exon1;rank=1
+1	irgsp	CDS	11798	12060	.	+	0	ID=CDS:Os01t0100200-01;Parent=transcript:Os01t0100200-01;protein_id=Os01t0100200-01
+1	irgsp	CDS	12152	12317	.	+	1	ID=CDS:Os01t0100200-01;Parent=transcript:Os01t0100200-01;protein_id=Os01t0100200-01
+1	irgsp	exon	12152	12435	.	+	.	Parent=transcript:Os01t0100200-01;Name=Os01t0100200-01.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100200-01.exon2;rank=2
+1	irgsp	three_prime_UTR	12318	12435	.	+	.	Parent=transcript:Os01t0100200-01
+###
+1	irgsp	gene	11372	12284	.	-	.	ID=gene:Os01g0100300;biotype=protein_coding;description=Cytochrome P450 domain containing protein. (Os01t0100300-00);gene_id=Os01g0100300;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	11372	12284	.	-	.	ID=transcript:Os01t0100300-00;Parent=gene:Os01g0100300;biotype=protein_coding;transcript_id=Os01t0100300-00
+1	irgsp	exon	11372	12042	.	-	.	Parent=transcript:Os01t0100300-00;Name=Os01t0100300-00.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100300-00.exon2;rank=2
+1	irgsp	CDS	11372	12042	.	-	2	ID=CDS:Os01t0100300-00;Parent=transcript:Os01t0100300-00;protein_id=Os01t0100300-00
+1	irgsp	exon	12146	12284	.	-	.	Parent=transcript:Os01t0100300-00;Name=Os01t0100300-00.exon1;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0100300-00.exon1;rank=1
+1	irgsp	CDS	12146	12284	.	-	0	ID=CDS:Os01t0100300-00;Parent=transcript:Os01t0100300-00;protein_id=Os01t0100300-00
+###
+1	irgsp	gene	12721	15685	.	+	.	ID=gene:Os01g0100400;biotype=protein_coding;description=Similar to Pectinesterase-like protein. (Os01t0100400-01);gene_id=Os01g0100400;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	12721	15685	.	+	.	ID=transcript:Os01t0100400-01;Parent=gene:Os01g0100400;biotype=protein_coding;transcript_id=Os01t0100400-01
+1	irgsp	five_prime_UTR	12721	12773	.	+	.	Parent=transcript:Os01t0100400-01
+1	irgsp	exon	12721	13813	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100400-01.exon1;rank=1
+1	irgsp	CDS	12774	13813	.	+	0	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	13906	14271	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100400-01.exon2;rank=2
+1	irgsp	CDS	13906	14271	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	14359	14437	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0100400-01.exon3;rank=3
+1	irgsp	CDS	14359	14437	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	14969	15171	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon4;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100400-01.exon4;rank=4
+1	irgsp	CDS	14969	15171	.	+	0	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	CDS	15266	15359	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	15266	15685	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon5;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100400-01.exon5;rank=5
+1	irgsp	three_prime_UTR	15360	15685	.	+	.	Parent=transcript:Os01t0100400-01
+###
+1	irgsp	gene	12808	13978	.	-	.	ID=gene:Os01g0100466;biotype=protein_coding;description=Hypothetical protein. (Os01t0100466-00);gene_id=Os01g0100466;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	12808	13978	.	-	.	ID=transcript:Os01t0100466-00;Parent=gene:Os01g0100466;biotype=protein_coding;transcript_id=Os01t0100466-00
+1	irgsp	three_prime_UTR	12808	12868	.	-	.	Parent=transcript:Os01t0100466-00
+1	irgsp	exon	12808	13782	.	-	.	Parent=transcript:Os01t0100466-00;Name=Os01t0100466-00.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100466-00.exon2;rank=2
+1	irgsp	CDS	12869	13102	.	-	0	ID=CDS:Os01t0100466-00;Parent=transcript:Os01t0100466-00;protein_id=Os01t0100466-00
+1	irgsp	five_prime_UTR	13103	13782	.	-	.	Parent=transcript:Os01t0100466-00
+1	irgsp	exon	13880	13978	.	-	.	Parent=transcript:Os01t0100466-00;Name=Os01t0100466-00.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100466-00.exon1;rank=1
+1	irgsp	five_prime_UTR	13880	13978	.	-	.	Parent=transcript:Os01t0100466-00
+###
+1	irgsp	gene	16399	20144	.	+	.	ID=gene:Os01g0100500;biotype=protein_coding;description=Immunoglobulin-like domain containing protein. (Os01t0100500-01);gene_id=Os01g0100500;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	16399	20144	.	+	.	ID=transcript:Os01t0100500-01;Parent=gene:Os01g0100500;biotype=protein_coding;transcript_id=Os01t0100500-01
+1	irgsp	five_prime_UTR	16399	16598	.	+	.	Parent=transcript:Os01t0100500-01
+1	irgsp	exon	16399	16976	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon1;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0100500-01.exon1;rank=1
+1	irgsp	CDS	16599	16976	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	17383	17474	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100500-01.exon2;rank=2
+1	irgsp	CDS	17383	17474	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	17558	18258	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon3;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0100500-01.exon3;rank=3
+1	irgsp	CDS	17558	18258	.	+	1	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	18501	18571	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon4;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100500-01.exon4;rank=4
+1	irgsp	CDS	18501	18571	.	+	2	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	18968	19057	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon5;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100500-01.exon5;rank=5
+1	irgsp	CDS	18968	19057	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	19142	19321	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100500-01.exon6;rank=6
+1	irgsp	CDS	19142	19321	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	CDS	19531	19593	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	19531	19629	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon7;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0100500-01.exon7;rank=7
+1	irgsp	three_prime_UTR	19594	19629	.	+	.	Parent=transcript:Os01t0100500-01
+1	irgsp	exon	19734	20144	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon8;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100500-01.exon8;rank=8
+1	irgsp	three_prime_UTR	19734	20144	.	+	.	Parent=transcript:Os01t0100500-01
+###
+1	irgsp	gene	22841	26892	.	+	.	ID=gene:Os01g0100600;biotype=protein_coding;description=Single-stranded nucleic acid binding R3H domain containing protein. (Os01t0100600-01);gene_id=Os01g0100600;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	22841	26892	.	+	.	ID=transcript:Os01t0100600-01;Parent=gene:Os01g0100600;biotype=protein_coding;transcript_id=Os01t0100600-01
+1	irgsp	five_prime_UTR	22841	23231	.	+	.	Parent=transcript:Os01t0100600-01
+1	irgsp	exon	22841	23281	.	+	.	Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100600-01.exon1;rank=1
+1	irgsp	CDS	23232	23281	.	+	0	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	exon	23572	23847	.	+	.	Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100600-01.exon2;rank=2
diff --git a/src/agat/agat_sp_add_introns/test_data/script.sh b/src/agat/agat_sp_add_introns/test_data/script.sh
new file mode 100755
index 00000000..e5880652
--- /dev/null
+++ b/src/agat/agat_sp_add_introns/test_data/script.sh
@@ -0,0 +1,12 @@
+#!/bin/bash
+
+# clone repo
+if [ ! -d /tmp/agat_source ]; then
+  git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source
+fi
+
+# copy test data
+cp -r /tmp/agat_source/t/scripts_output/in/1.gff src/agat/agat_sp_add_introns/test_data
+cp -r /tmp/agat_source/t/scripts_output/out/agat_sp_add_introns_1.gff src/agat/agat_sp_add_introns/test_data
+
+head -n 106 "src/agat/agat_sp_add_introns/test_data/1.gff" > "src/agat/agat_sp_add_introns/test_data/1_truncated.gff"
\ No newline at end of file
diff --git a/src/agat/agat_sp_add_introns/test_data/test_output.gff b/src/agat/agat_sp_add_introns/test_data/test_output.gff
new file mode 100644
index 00000000..607907f6
--- /dev/null
+++ b/src/agat/agat_sp_add_introns/test_data/test_output.gff
@@ -0,0 +1,125 @@
+##gff-version 3
+##sequence-region   1 1 43270923
+#!genome-build RAP-DB IRGSP-1.0
+#!genome-version IRGSP-1.0
+#!genome-date 2015-10
+#!genome-build-accession GCA_001433935.1
+1	RAP-DB	chromosome	1	43270923	.	.	.	ID=chromosome:1;Alias=Chr1,AP014957.1,NC_029256.1
+1	irgsp	repeat_region	2000	2100	.	+	.	ID=fakeRepeat1
+1	irgsp	gene	2983	10815	.	+	.	ID=gene:Os01g0100100;biotype=protein_coding;description=RabGAP/TBC domain containing protein. (Os01t0100100-01);gene_id=Os01g0100100;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	2983	10815	.	+	.	ID=transcript:Os01t0100100-01;Parent=gene:Os01g0100100;biotype=protein_coding;transcript_id=Os01t0100100-01
+1	irgsp	exon	2983	3268	.	+	.	ID=Os01t0100100-01.exon1;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon1;rank=1
+1	irgsp	exon	3354	3616	.	+	.	ID=Os01t0100100-01.exon2;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0100100-01.exon2;rank=2
+1	irgsp	exon	4357	4455	.	+	.	ID=Os01t0100100-01.exon3;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100100-01.exon3;rank=3
+1	irgsp	exon	5457	5560	.	+	.	ID=Os01t0100100-01.exon4;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon4;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100100-01.exon4;rank=4
+1	irgsp	exon	7136	7944	.	+	.	ID=Os01t0100100-01.exon5;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon5;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0100100-01.exon5;rank=5
+1	irgsp	exon	8028	8150	.	+	.	ID=Os01t0100100-01.exon6;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon6;constitutive=1;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0100100-01.exon6;rank=6
+1	irgsp	exon	8232	8320	.	+	.	ID=Os01t0100100-01.exon7;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon7;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100100-01.exon7;rank=7
+1	irgsp	exon	8408	8608	.	+	.	ID=Os01t0100100-01.exon8;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon8;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100100-01.exon8;rank=8
+1	irgsp	exon	9210	9615	.	+	.	ID=Os01t0100100-01.exon9;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon9;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0100100-01.exon9;rank=9
+1	irgsp	exon	10102	10187	.	+	.	ID=Os01t0100100-01.exon10;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon10;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100100-01.exon10;rank=10
+1	irgsp	exon	10274	10430	.	+	.	ID=Os01t0100100-01.exon11;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon11;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0100100-01.exon11;rank=11
+1	irgsp	exon	10504	10815	.	+	.	ID=Os01t0100100-01.exon12;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	CDS	3449	3616	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	4357	4455	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	5457	5560	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	7136	7944	.	+	1	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	8028	8150	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	8232	8320	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	8408	8608	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	9210	9615	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	10102	10187	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	10274	10297	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	five_prime_UTR	2983	3268	.	+	.	ID=agat-five_prime_utr-1;Parent=transcript:Os01t0100100-01
+1	irgsp	five_prime_UTR	3354	3448	.	+	.	ID=agat-five_prime_utr-2;Parent=transcript:Os01t0100100-01
+1	irgsp	intron	3269	3353	.	+	.	ID=intron_added-1;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	intron	3617	4356	.	+	.	ID=intron_added-2;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	intron	4456	5456	.	+	.	ID=intron_added-3;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	intron	5561	7135	.	+	.	ID=intron_added-4;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	intron	7945	8027	.	+	.	ID=intron_added-5;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	intron	8151	8231	.	+	.	ID=intron_added-6;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	intron	8321	8407	.	+	.	ID=intron_added-7;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	intron	8609	9209	.	+	.	ID=intron_added-8;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	intron	9616	10101	.	+	.	ID=intron_added-9;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	intron	10188	10273	.	+	.	ID=intron_added-10;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	intron	10431	10503	.	+	.	ID=intron_added-11;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	three_prime_UTR	10298	10430	.	+	.	ID=agat-three_prime_utr-1;Parent=transcript:Os01t0100100-01
+1	irgsp	three_prime_UTR	10504	10815	.	+	.	ID=agat-three_prime_utr-2;Parent=transcript:Os01t0100100-01
+1	irgsp	gene	11218	12435	.	+	.	ID=gene:Os01g0100200;biotype=protein_coding;description=Conserved hypothetical protein. (Os01t0100200-01);gene_id=Os01g0100200;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	11218	12435	.	+	.	ID=transcript:Os01t0100200-01;Parent=gene:Os01g0100200;biotype=protein_coding;transcript_id=Os01t0100200-01
+1	irgsp	exon	11218	12060	.	+	.	ID=Os01t0100200-01.exon1;Parent=transcript:Os01t0100200-01;Name=Os01t0100200-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100200-01.exon1;rank=1
+1	irgsp	exon	12152	12435	.	+	.	ID=Os01t0100200-01.exon2;Parent=transcript:Os01t0100200-01;Name=Os01t0100200-01.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100200-01.exon2;rank=2
+1	irgsp	CDS	11798	12060	.	+	0	ID=CDS:Os01t0100200-01;Parent=transcript:Os01t0100200-01;protein_id=Os01t0100200-01
+1	irgsp	CDS	12152	12317	.	+	1	ID=CDS:Os01t0100200-01;Parent=transcript:Os01t0100200-01;protein_id=Os01t0100200-01
+1	irgsp	five_prime_UTR	11218	11797	.	+	.	ID=agat-five_prime_utr-3;Parent=transcript:Os01t0100200-01
+1	irgsp	intron	12061	12151	.	+	.	ID=intron_added-12;Parent=transcript:Os01t0100200-01;Name=Os01t0100200-01.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100200-01.exon2;rank=2
+1	irgsp	three_prime_UTR	12318	12435	.	+	.	ID=agat-three_prime_utr-3;Parent=transcript:Os01t0100200-01
+1	irgsp	gene	11372	12284	.	-	.	ID=gene:Os01g0100300;biotype=protein_coding;description=Cytochrome P450 domain containing protein. (Os01t0100300-00);gene_id=Os01g0100300;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	11372	12284	.	-	.	ID=transcript:Os01t0100300-00;Parent=gene:Os01g0100300;biotype=protein_coding;transcript_id=Os01t0100300-00
+1	irgsp	exon	11372	12042	.	-	.	ID=Os01t0100300-00.exon2;Parent=transcript:Os01t0100300-00;Name=Os01t0100300-00.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100300-00.exon2;rank=2
+1	irgsp	exon	12146	12284	.	-	.	ID=Os01t0100300-00.exon1;Parent=transcript:Os01t0100300-00;Name=Os01t0100300-00.exon1;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0100300-00.exon1;rank=1
+1	irgsp	CDS	11372	12042	.	-	2	ID=CDS:Os01t0100300-00;Parent=transcript:Os01t0100300-00;protein_id=Os01t0100300-00
+1	irgsp	CDS	12146	12284	.	-	0	ID=CDS:Os01t0100300-00;Parent=transcript:Os01t0100300-00;protein_id=Os01t0100300-00
+1	irgsp	intron	12043	12145	.	-	.	ID=intron_added-13;Parent=transcript:Os01t0100300-00;Name=Os01t0100300-00.exon1;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0100300-00.exon1;rank=1
+1	irgsp	gene	12721	15685	.	+	.	ID=gene:Os01g0100400;biotype=protein_coding;description=Similar to Pectinesterase-like protein. (Os01t0100400-01);gene_id=Os01g0100400;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	12721	15685	.	+	.	ID=transcript:Os01t0100400-01;Parent=gene:Os01g0100400;biotype=protein_coding;transcript_id=Os01t0100400-01
+1	irgsp	exon	12721	13813	.	+	.	ID=Os01t0100400-01.exon1;Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100400-01.exon1;rank=1
+1	irgsp	exon	13906	14271	.	+	.	ID=Os01t0100400-01.exon2;Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100400-01.exon2;rank=2
+1	irgsp	exon	14359	14437	.	+	.	ID=Os01t0100400-01.exon3;Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0100400-01.exon3;rank=3
+1	irgsp	exon	14969	15171	.	+	.	ID=Os01t0100400-01.exon4;Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon4;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100400-01.exon4;rank=4
+1	irgsp	exon	15266	15685	.	+	.	ID=Os01t0100400-01.exon5;Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon5;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100400-01.exon5;rank=5
+1	irgsp	CDS	12774	13813	.	+	0	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	CDS	13906	14271	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	CDS	14359	14437	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	CDS	14969	15171	.	+	0	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	CDS	15266	15359	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	five_prime_UTR	12721	12773	.	+	.	ID=agat-five_prime_utr-4;Parent=transcript:Os01t0100400-01
+1	irgsp	intron	13814	13905	.	+	.	ID=intron_added-14;Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon5;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100400-01.exon5;rank=5
+1	irgsp	intron	14272	14358	.	+	.	ID=intron_added-15;Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon5;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100400-01.exon5;rank=5
+1	irgsp	intron	14438	14968	.	+	.	ID=intron_added-16;Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon5;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100400-01.exon5;rank=5
+1	irgsp	intron	15172	15265	.	+	.	ID=intron_added-17;Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon5;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100400-01.exon5;rank=5
+1	irgsp	three_prime_UTR	15360	15685	.	+	.	ID=agat-three_prime_utr-4;Parent=transcript:Os01t0100400-01
+1	irgsp	gene	12808	13978	.	-	.	ID=gene:Os01g0100466;biotype=protein_coding;description=Hypothetical protein. (Os01t0100466-00);gene_id=Os01g0100466;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	12808	13978	.	-	.	ID=transcript:Os01t0100466-00;Parent=gene:Os01g0100466;biotype=protein_coding;transcript_id=Os01t0100466-00
+1	irgsp	exon	12808	13782	.	-	.	ID=Os01t0100466-00.exon2;Parent=transcript:Os01t0100466-00;Name=Os01t0100466-00.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100466-00.exon2;rank=2
+1	irgsp	exon	13880	13978	.	-	.	ID=Os01t0100466-00.exon1;Parent=transcript:Os01t0100466-00;Name=Os01t0100466-00.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100466-00.exon1;rank=1
+1	irgsp	CDS	12869	13102	.	-	0	ID=CDS:Os01t0100466-00;Parent=transcript:Os01t0100466-00;protein_id=Os01t0100466-00
+1	irgsp	five_prime_UTR	13103	13782	.	-	.	ID=agat-five_prime_utr-5;Parent=transcript:Os01t0100466-00
+1	irgsp	five_prime_UTR	13880	13978	.	-	.	ID=agat-five_prime_utr-6;Parent=transcript:Os01t0100466-00
+1	irgsp	intron	13783	13879	.	-	.	ID=intron_added-18;Parent=transcript:Os01t0100466-00;Name=Os01t0100466-00.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100466-00.exon1;rank=1
+1	irgsp	three_prime_UTR	12808	12868	.	-	.	ID=agat-three_prime_utr-5;Parent=transcript:Os01t0100466-00
+1	irgsp	gene	16399	20144	.	+	.	ID=gene:Os01g0100500;biotype=protein_coding;description=Immunoglobulin-like domain containing protein. (Os01t0100500-01);gene_id=Os01g0100500;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	16399	20144	.	+	.	ID=transcript:Os01t0100500-01;Parent=gene:Os01g0100500;biotype=protein_coding;transcript_id=Os01t0100500-01
+1	irgsp	exon	16399	16976	.	+	.	ID=Os01t0100500-01.exon1;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon1;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0100500-01.exon1;rank=1
+1	irgsp	exon	17383	17474	.	+	.	ID=Os01t0100500-01.exon2;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100500-01.exon2;rank=2
+1	irgsp	exon	17558	18258	.	+	.	ID=Os01t0100500-01.exon3;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon3;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0100500-01.exon3;rank=3
+1	irgsp	exon	18501	18571	.	+	.	ID=Os01t0100500-01.exon4;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon4;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100500-01.exon4;rank=4
+1	irgsp	exon	18968	19057	.	+	.	ID=Os01t0100500-01.exon5;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon5;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100500-01.exon5;rank=5
+1	irgsp	exon	19142	19321	.	+	.	ID=Os01t0100500-01.exon6;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100500-01.exon6;rank=6
+1	irgsp	exon	19531	19629	.	+	.	ID=Os01t0100500-01.exon7;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon7;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0100500-01.exon7;rank=7
+1	irgsp	exon	19734	20144	.	+	.	ID=Os01t0100500-01.exon8;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon8;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100500-01.exon8;rank=8
+1	irgsp	CDS	16599	16976	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	CDS	17383	17474	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	CDS	17558	18258	.	+	1	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	CDS	18501	18571	.	+	2	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	CDS	18968	19057	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	CDS	19142	19321	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	CDS	19531	19593	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	five_prime_UTR	16399	16598	.	+	.	ID=agat-five_prime_utr-7;Parent=transcript:Os01t0100500-01
+1	irgsp	intron	16977	17382	.	+	.	ID=intron_added-19;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon8;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100500-01.exon8;rank=8
+1	irgsp	intron	17475	17557	.	+	.	ID=intron_added-20;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon8;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100500-01.exon8;rank=8
+1	irgsp	intron	18259	18500	.	+	.	ID=intron_added-21;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon8;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100500-01.exon8;rank=8
+1	irgsp	intron	18572	18967	.	+	.	ID=intron_added-22;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon8;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100500-01.exon8;rank=8
+1	irgsp	intron	19058	19141	.	+	.	ID=intron_added-23;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon8;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100500-01.exon8;rank=8
+1	irgsp	intron	19322	19530	.	+	.	ID=intron_added-24;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon8;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100500-01.exon8;rank=8
+1	irgsp	intron	19630	19733	.	+	.	ID=intron_added-25;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon8;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100500-01.exon8;rank=8
+1	irgsp	three_prime_UTR	19594	19629	.	+	.	ID=agat-three_prime_utr-6;Parent=transcript:Os01t0100500-01
+1	irgsp	three_prime_UTR	19734	20144	.	+	.	ID=agat-three_prime_utr-7;Parent=transcript:Os01t0100500-01
+1	irgsp	gene	22841	26892	.	+	.	ID=gene:Os01g0100600;biotype=protein_coding;description=Single-stranded nucleic acid binding R3H domain containing protein. (Os01t0100600-01);gene_id=Os01g0100600;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	22841	26892	.	+	.	ID=transcript:Os01t0100600-01;Parent=gene:Os01g0100600;biotype=protein_coding;transcript_id=Os01t0100600-01
+1	irgsp	exon	22841	23281	.	+	.	ID=Os01t0100600-01.exon1;Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100600-01.exon1;rank=1
+1	irgsp	exon	23572	26892	.	+	.	ID=Os01t0100600-01.exon2;Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100600-01.exon2;rank=2
+1	irgsp	CDS	23232	23281	.	+	0	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	five_prime_UTR	22841	23231	.	+	.	ID=agat-five_prime_utr-8;Parent=transcript:Os01t0100600-01
+1	irgsp	intron	23282	23571	.	+	.	ID=intron_added-26;Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100600-01.exon2;rank=2
+1	AGAT	three_prime_UTR	23572	26892	.	+	.	ID=agat-three_prime_utr-8;Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100600-01.exon1;rank=1

From ebbc0d45eed4983d1184595420b5940026c2fcc9 Mon Sep 17 00:00:00 2001
From: Leila011 <leilapaquay@gmail.com>
Date: Sat, 26 Oct 2024 20:27:23 +0200
Subject: [PATCH 34/42] Add agat sp filter feature from kill list (#105)

* add help

* add config

* add run script

* add test data and expected output + script to fetch them

* update config: kill_list as Inputs

* all fetch kill_list.txt

* add tests

* update changelog

* run script: fix `verbose` usage

* Update src/agat/agat_sp_filter_feature_from_kill_list/config.vsh.yaml

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>

* Update src/agat/agat_sp_filter_feature_from_kill_list/config.vsh.yaml

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>

* Update src/agat/agat_sp_filter_feature_from_kill_list/config.vsh.yaml

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>

* Update src/agat/agat_sp_filter_feature_from_kill_list/config.vsh.yaml

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>

* Update src/agat/agat_sp_filter_feature_from_kill_list/config.vsh.yaml

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>

* update --config description

* add requirements

* format the description of --type

* update --config description

* update formatting --type description

* add multiple to --type

* create temporary directory and clean up on exit

* convert par_type to comma separated list

* add set -e

* fix create temporary directory

* add set -eo pipefail to script and test files

* fix create temporary directory

* fix typo

* cleanup changelog

* cleanup changelog

* Minor changes to config

* reduce test data size

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>
Co-authored-by: Emma Rousseau <emmarou1@icloud.com>
---
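Note on the `--type` handling in script.sh: the sketch below illustrates how the run script turns a multi-value `--type` into the comma-separated list that AGAT expects, and how `${var:+...}` drops flags for empty parameters. It assumes, as the `tr ';' ','` call in script.sh suggests, that multiple values arrive joined by `;`. The values assigned to `par_type` and `par_attribute` and the `args` array are hypothetical and only serve the illustration; the real script passes the expansions straight to `agat_sp_filter_feature_from_kill_list.pl`.

```bash
#!/bin/bash
set -eo pipefail

# Hypothetical inputs; in the real component Viash sets these variables.
par_type="mRNA;CDS;exon"
par_attribute=""

# Same conversion as in script.sh: ';' separators become ','.
par_type=$(echo "$par_type" | tr ';' ',')

# ${var:+word} expands to word only when var is set and non-empty,
# so an empty parameter contributes no flag at all.
args=(
  ${par_type:+--type "${par_type}"}
  ${par_attribute:+--attribute "${par_attribute}"}
)

# Print the resulting argument list, one argument per line.
printf '%s\n' "${args[@]}"
```

With these inputs the array holds two elements, `--type` and `mRNA,CDS,exon`; the empty `par_attribute` adds nothing.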
 CHANGELOG.md                                  |   9 +-
 .../config.vsh.yaml                           | 105 +++++++++++++++
 .../help.txt                                  |  85 ++++++++++++
 .../script.sh                                 |  22 ++++
 .../test.sh                                   |  36 +++++
 .../test_data/1_truncated.gff                 | 123 ++++++++++++++++++
 .../test_data/kill_list.txt                   |   3 +
 .../test_data/script.sh                       |  13 ++
 .../test_data/test_output.gff                 | 113 ++++++++++++++++
 9 files changed, 503 insertions(+), 6 deletions(-)
 create mode 100644 src/agat/agat_sp_filter_feature_from_kill_list/config.vsh.yaml
 create mode 100644 src/agat/agat_sp_filter_feature_from_kill_list/help.txt
 create mode 100644 src/agat/agat_sp_filter_feature_from_kill_list/script.sh
 create mode 100644 src/agat/agat_sp_filter_feature_from_kill_list/test.sh
 create mode 100644 src/agat/agat_sp_filter_feature_from_kill_list/test_data/1_truncated.gff
 create mode 100644 src/agat/agat_sp_filter_feature_from_kill_list/test_data/kill_list.txt
 create mode 100755 src/agat/agat_sp_filter_feature_from_kill_list/test_data/script.sh
 create mode 100644 src/agat/agat_sp_filter_feature_from_kill_list/test_data/test_output.gff

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a8cfc83a..76a1e2ec 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,8 +4,9 @@
 
 * `agat`:
   - `agat/agat_convert_genscan2gff`: convert a genscan file into a GFF file (PR #100).
-  - `agat_sp_statistics`: provides exhaustive statistics of a gft/gff file (PR #107).
-
+  - `agat/agat_sp_add_introns`: add intron features to gtf/gff file without intron features (PR #104).
+  - `agat/agat_sp_filter_feature_from_kill_list`: remove features in a GFF file based on a kill list (PR #105).
+  - `agat/agat_sp_statistics`: provides exhaustive statistics of a gtf/gff file (PR #107).
 
 * `bd_rhapsody/bd_rhapsody_sequence_analysis`: BD Rhapsody Sequence Analysis CWL pipeline (PR #96).
 
@@ -13,10 +14,6 @@
 
 * `nanoplot`: Plotting tool for long read sequencing data and alignments (PR #95).
 
-* `agat`:
-  - `agat/agat_sp_add_introns`: add intron features to gtf/gff file without intron features (PR #104).
-
-
 ## BREAKING CHANGES
 
 * `falco`: Fix a typo in the `--reverse_complement` argument (PR #157).
diff --git a/src/agat/agat_sp_filter_feature_from_kill_list/config.vsh.yaml b/src/agat/agat_sp_filter_feature_from_kill_list/config.vsh.yaml
new file mode 100644
index 00000000..0608ad4d
--- /dev/null
+++ b/src/agat/agat_sp_filter_feature_from_kill_list/config.vsh.yaml
@@ -0,0 +1,105 @@
+name: agat_sp_filter_feature_from_kill_list
+namespace: agat
+description: |
+  Remove features based on a kill list. The default behaviour is to look at the feature's ID.
+  If the feature has an ID (case insensitive) listed in the kill list, it will be removed.
+  Removing a level1 or level2 feature will automatically remove all linked subfeatures, and
+  removing all children of a feature will automatically remove this feature too.
+keywords: [gene annotations, filtering, gff]
+links:
+  homepage: https://github.com/NBISweden/AGAT
+  documentation: https://agat.readthedocs.io/en/latest/tools/agat_sp_filter_feature_from_kill_list.html
+  issue_tracker: https://github.com/NBISweden/AGAT/issues
+  repository: https://github.com/NBISweden/AGAT
+references: 
+  doi: 10.5281/zenodo.3552717
+license: GPL-3.0
+requirements:
+  - commands: [agat]
+authors:
+  - __merge__: /src/_authors/leila_paquay.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --gff
+        alternatives: [-f, --ref, --reffile]
+        description: Input GFF3 file that will be read.
+        type: file
+        required: true
+      - name: --kill_list
+        alternatives: [--kl]
+        description: Text file containing the kill list. One value per line.
+        type: file
+        required: true
+        example: kill_list.txt
+  - name: Outputs
+    arguments:
+      - name: --output
+        alternatives: [-o, --out]
+        description: |
+          Path to the output GFF file that contains filtered features. 
+        type: file
+        direction: output
+        required: true
+  - name: Arguments
+    arguments:
+      - name: --type
+        alternatives: [-p, -l]
+        description: |
+          Primary tag option, case insensitive, list. Specifies the feature types that
+          will be handled.
+  
+          You can select a specific feature by giving its primary tag name (column 3), for example:
+
+            * cds
+            * Gene
+            * mRNA
+            
+          You can also select all the features of a particular
+          level directly:
+
+            * level2=mRNA,ncRNA,tRNA,etc 
+            * level3=CDS,exon,UTR,etc. 
+          
+          By default all features are taken into account. Setting the option to "all"
+          has the same behaviour.
+        type: string
+        multiple: true
+      - name: --attribute
+        alternatives: [-a]
+        description: |
+          Attribute tag to specify the attribute to analyse. Case sensitive. Default: ID
+        type: string
+        example: ID
+      - name: --config
+        alternatives: [-c]
+        description: |
+          AGAT config file. By default AGAT takes the original agat_config.yaml shipped with AGAT.
+          The `--config` option gives you the possibility to use your own AGAT config file (located 
+          elsewhere or named differently).
+        type: file
+        example: custom_agat_config.yaml
+      - name: --verbose
+        alternatives: [-v]
+        description: Verbose option for debugging purposes.
+        type: boolean_true
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: test_data
+engines:
+  - type: docker
+    image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
+    setup:
+      - type: docker
+        run: |
+          agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
+runners:
+  - type: executable
+  - type: nextflow
\ No newline at end of file
diff --git a/src/agat/agat_sp_filter_feature_from_kill_list/help.txt b/src/agat/agat_sp_filter_feature_from_kill_list/help.txt
new file mode 100644
index 00000000..b0087916
--- /dev/null
+++ b/src/agat/agat_sp_filter_feature_from_kill_list/help.txt
@@ -0,0 +1,85 @@
+```sh
+agat_sp_filter_feature_from_kill_list.pl --help
+```
+
+ ------------------------------------------------------------------------------
+|   Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0                      |
+|   https://github.com/NBISweden/AGAT                                          |
+|   National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se         |
+ ------------------------------------------------------------------------------
+
+
+Name:
+    agat_sp_filter_feature_from_kill_list.pl
+
+Description:
+    The script aims to remove features based on a kill list. The default
+    behaviour is to look at the features's ID. If the feature has an ID
+    (case insensitive) listed among the kill list it will be removed. /!\
+    Removing a level1 or level2 feature will automatically remove all linked
+    subfeatures, and removing all children of a feature will automatically
+    remove this feature too.
+
+Usage:
+        agat_sp_filter_feature_from_kill_list.pl --gff infile.gff --kill_list file.txt  [ --output outfile ]
+        agat_sp_filter_feature_from_kill_list.pl --help
+
+Options:
+    -f, --reffile, --gff or -ref
+            Input GFF3 file that will be read
+
+    -p, --type or -l
+            primary tag option, case insensitive, list. Allow to specied the
+            feature types that will be handled. You can specified a specific
+            feature by given its primary tag name (column 3) as: cds, Gene,
+            MrNa You can specify directly all the feature of a particular
+            level: level2=mRNA,ncRNA,tRNA,etc level3=CDS,exon,UTR,etc By
+            default all feature are taking into account. fill the option by
+            the value "all" will have the same behaviour.
+
+    --kl or --kill_list
+            Kill list. One value per line.
+
+    -a or --attribute
+            Attribute tag to specify the attribute to analyse. Case
+            sensitive. Default: ID
+
+    -o or --output
+            Output GFF file. If no output file is specified, the output will
+            be written to STDOUT.
+
+    -v      Verbose option for debugging purpose.
+
+    -c or --config
+            String - Input agat config file. By default AGAT takes as input
+            agat_config.yaml file from the working directory if any,
+            otherwise it takes the orignal agat_config.yaml shipped with
+            AGAT. To get the agat_config.yaml locally type: "agat config
+            --expose". The --config option gives you the possibility to use
+            your own AGAT config file (located elsewhere or named
+            differently).
+
+    -h or --help
+            Display this helpful text.
+
+Feedback:
+  Did you find a bug?:
+    Do not hesitate to report bugs to help us keep track of the bugs and
+    their resolution. Please use the GitHub issue tracking system available
+    at this address:
+
+                https://github.com/NBISweden/AGAT/issues
+
+     Ensure that the bug was not already reported by searching under Issues.
+     If you're unable to find an (open) issue addressing the problem, open a new one.
+     Try as much as possible to include in the issue when relevant:
+     - a clear description,
+     - as much relevant information as possible,
+     - the command used,
+     - a data sample,
+     - an explanation of the expected behaviour that is not occurring.
+
+  Do you want to contribute?:
+    You are very welcome, visit this address for the Contributing
+    guidelines:
+    https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
\ No newline at end of file
diff --git a/src/agat/agat_sp_filter_feature_from_kill_list/script.sh b/src/agat/agat_sp_filter_feature_from_kill_list/script.sh
new file mode 100644
index 00000000..6779b857
--- /dev/null
+++ b/src/agat/agat_sp_filter_feature_from_kill_list/script.sh
@@ -0,0 +1,22 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+# unset flags
+[[ "$par_verbose" == "false" ]] && unset par_verbose
+
+# convert par_type to comma separated list
+par_type=$(echo "$par_type" | tr ';' ',')
+
+# run agat_sp_filter_feature_from_kill_list
+agat_sp_filter_feature_from_kill_list.pl \
+  --gff "$par_gff" \
+  --kill_list "$par_kill_list" \
+  --output "$par_output" \
+  ${par_type:+--type "${par_type}"} \
+  ${par_attribute:+--attribute "${par_attribute}"} \
+  ${par_config:+--config "${par_config}"} \
+  ${par_verbose:+-v}
diff --git a/src/agat/agat_sp_filter_feature_from_kill_list/test.sh b/src/agat/agat_sp_filter_feature_from_kill_list/test.sh
new file mode 100644
index 00000000..d9d775d5
--- /dev/null
+++ b/src/agat/agat_sp_filter_feature_from_kill_list/test.sh
@@ -0,0 +1,36 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+test_dir="${meta_resources_dir}/test_data"
+
+# create temporary directory and clean up on exit
+TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_name-XXXXXX")
+function clean_up {
+ [[ -d "$TMPDIR" ]] && rm -rf "$TMPDIR"
+}
+trap clean_up EXIT
+
+echo "> Run $meta_name with test data"
+"$meta_executable" \
+  --gff "$test_dir/1_truncated.gff" \
+  --kill_list "$test_dir/kill_list.txt" \
+  --output "$TMPDIR/output.gff" 
+
+echo ">> Checking output"
+[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1
+
+echo ">> Check if output is empty"
+[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1
+
+echo ">> Check if output matches expected output"
+# run diff inside the condition so that `set -e` does not abort before the message is printed
+if ! diff "$TMPDIR/output.gff" "$test_dir/test_output.gff"; then
+  echo "Output file output.gff does not match expected output"
+  exit 1
+fi
+
+echo "> Test successful"
\ No newline at end of file
diff --git a/src/agat/agat_sp_filter_feature_from_kill_list/test_data/1_truncated.gff b/src/agat/agat_sp_filter_feature_from_kill_list/test_data/1_truncated.gff
new file mode 100644
index 00000000..e0fb6bce
--- /dev/null
+++ b/src/agat/agat_sp_filter_feature_from_kill_list/test_data/1_truncated.gff
@@ -0,0 +1,123 @@
+##gff-version 3
+##sequence-region   1 1 43270923
+#!genome-build RAP-DB IRGSP-1.0
+#!genome-version IRGSP-1.0
+#!genome-date 2015-10
+#!genome-build-accession GCA_001433935.1
+1	RAP-DB	chromosome	1	43270923	.	.	.	ID=chromosome:1;Alias=Chr1,AP014957.1,NC_029256.1
+###
+1	irgsp	repeat_region	2000	2100	.	+	.	ID=fakeRepeat1
+###
+1	irgsp	gene	2983	10815	.	+	.	ID=gene:Os01g0100100;biotype=protein_coding;description=RabGAP/TBC domain containing protein. (Os01t0100100-01);gene_id=Os01g0100100;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	2983	10815	.	+	.	ID=transcript:Os01t0100100-01;Parent=gene:Os01g0100100;biotype=protein_coding;transcript_id=Os01t0100100-01
+1	irgsp	exon	2983	3268	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon1;rank=1
+1	irgsp	five_prime_UTR	2983	3268	.	+	.	Parent=transcript:Os01t0100100-01
+1	irgsp	five_prime_UTR	3354	3448	.	+	.	Parent=transcript:Os01t0100100-01
+1	irgsp	exon	3354	3616	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0100100-01.exon2;rank=2
+1	irgsp	CDS	3449	3616	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	4357	4455	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100100-01.exon3;rank=3
+1	irgsp	CDS	4357	4455	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	5457	5560	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon4;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100100-01.exon4;rank=4
+1	irgsp	CDS	5457	5560	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	7136	7944	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon5;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0100100-01.exon5;rank=5
+1	irgsp	CDS	7136	7944	.	+	1	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	8028	8150	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon6;constitutive=1;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0100100-01.exon6;rank=6
+1	irgsp	CDS	8028	8150	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	8232	8320	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon7;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100100-01.exon7;rank=7
+1	irgsp	CDS	8232	8320	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	8408	8608	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon8;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100100-01.exon8;rank=8
+1	irgsp	CDS	8408	8608	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	9210	9615	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon9;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0100100-01.exon9;rank=9
+1	irgsp	CDS	9210	9615	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	10102	10187	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon10;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100100-01.exon10;rank=10
+1	irgsp	CDS	10102	10187	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	10274	10297	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	10274	10430	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon11;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0100100-01.exon11;rank=11
+1	irgsp	three_prime_UTR	10298	10430	.	+	.	Parent=transcript:Os01t0100100-01
+1	irgsp	exon	10504	10815	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	three_prime_UTR	10504	10815	.	+	.	Parent=transcript:Os01t0100100-01
+###
+1	irgsp	gene	11218	12435	.	+	.	ID=gene:Os01g0100200;biotype=protein_coding;description=Conserved hypothetical protein. (Os01t0100200-01);gene_id=Os01g0100200;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	11218	12435	.	+	.	ID=transcript:Os01t0100200-01;Parent=gene:Os01g0100200;biotype=protein_coding;transcript_id=Os01t0100200-01
+1	irgsp	five_prime_UTR	11218	11797	.	+	.	Parent=transcript:Os01t0100200-01
+1	irgsp	exon	11218	12060	.	+	.	Parent=transcript:Os01t0100200-01;Name=Os01t0100200-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100200-01.exon1;rank=1
+1	irgsp	CDS	11798	12060	.	+	0	ID=CDS:Os01t0100200-01;Parent=transcript:Os01t0100200-01;protein_id=Os01t0100200-01
+1	irgsp	CDS	12152	12317	.	+	1	ID=CDS:Os01t0100200-01;Parent=transcript:Os01t0100200-01;protein_id=Os01t0100200-01
+1	irgsp	exon	12152	12435	.	+	.	Parent=transcript:Os01t0100200-01;Name=Os01t0100200-01.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100200-01.exon2;rank=2
+1	irgsp	three_prime_UTR	12318	12435	.	+	.	Parent=transcript:Os01t0100200-01
+###
+1	irgsp	gene	11372	12284	.	-	.	ID=gene:Os01g0100300;biotype=protein_coding;description=Cytochrome P450 domain containing protein. (Os01t0100300-00);gene_id=Os01g0100300;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	11372	12284	.	-	.	ID=transcript:Os01t0100300-00;Parent=gene:Os01g0100300;biotype=protein_coding;transcript_id=Os01t0100300-00
+1	irgsp	exon	11372	12042	.	-	.	Parent=transcript:Os01t0100300-00;Name=Os01t0100300-00.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100300-00.exon2;rank=2
+1	irgsp	CDS	11372	12042	.	-	2	ID=CDS:Os01t0100300-00;Parent=transcript:Os01t0100300-00;protein_id=Os01t0100300-00
+1	irgsp	exon	12146	12284	.	-	.	Parent=transcript:Os01t0100300-00;Name=Os01t0100300-00.exon1;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0100300-00.exon1;rank=1
+1	irgsp	CDS	12146	12284	.	-	0	ID=CDS:Os01t0100300-00;Parent=transcript:Os01t0100300-00;protein_id=Os01t0100300-00
+###
+1	irgsp	gene	12721	15685	.	+	.	ID=gene:Os01g0100400;biotype=protein_coding;description=Similar to Pectinesterase-like protein. (Os01t0100400-01);gene_id=Os01g0100400;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	12721	15685	.	+	.	ID=transcript:Os01t0100400-01;Parent=gene:Os01g0100400;biotype=protein_coding;transcript_id=Os01t0100400-01
+1	irgsp	five_prime_UTR	12721	12773	.	+	.	Parent=transcript:Os01t0100400-01
+1	irgsp	exon	12721	13813	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100400-01.exon1;rank=1
+1	irgsp	CDS	12774	13813	.	+	0	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	13906	14271	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100400-01.exon2;rank=2
+1	irgsp	CDS	13906	14271	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	14359	14437	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0100400-01.exon3;rank=3
+1	irgsp	CDS	14359	14437	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	14969	15171	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon4;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100400-01.exon4;rank=4
+1	irgsp	CDS	14969	15171	.	+	0	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	CDS	15266	15359	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	15266	15685	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon5;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100400-01.exon5;rank=5
+1	irgsp	three_prime_UTR	15360	15685	.	+	.	Parent=transcript:Os01t0100400-01
+###
+1	irgsp	gene	12808	13978	.	-	.	ID=gene:Os01g0100466;biotype=protein_coding;description=Hypothetical protein. (Os01t0100466-00);gene_id=Os01g0100466;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	12808	13978	.	-	.	ID=transcript:Os01t0100466-00;Parent=gene:Os01g0100466;biotype=protein_coding;transcript_id=Os01t0100466-00
+1	irgsp	three_prime_UTR	12808	12868	.	-	.	Parent=transcript:Os01t0100466-00
+1	irgsp	exon	12808	13782	.	-	.	Parent=transcript:Os01t0100466-00;Name=Os01t0100466-00.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100466-00.exon2;rank=2
+1	irgsp	CDS	12869	13102	.	-	0	ID=CDS:Os01t0100466-00;Parent=transcript:Os01t0100466-00;protein_id=Os01t0100466-00
+1	irgsp	five_prime_UTR	13103	13782	.	-	.	Parent=transcript:Os01t0100466-00
+1	irgsp	exon	13880	13978	.	-	.	Parent=transcript:Os01t0100466-00;Name=Os01t0100466-00.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100466-00.exon1;rank=1
+1	irgsp	five_prime_UTR	13880	13978	.	-	.	Parent=transcript:Os01t0100466-00
+###
+1	irgsp	gene	16399	20144	.	+	.	ID=gene:Os01g0100500;biotype=protein_coding;description=Immunoglobulin-like domain containing protein. (Os01t0100500-01);gene_id=Os01g0100500;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	16399	20144	.	+	.	ID=transcript:Os01t0100500-01;Parent=gene:Os01g0100500;biotype=protein_coding;transcript_id=Os01t0100500-01
+1	irgsp	five_prime_UTR	16399	16598	.	+	.	Parent=transcript:Os01t0100500-01
+1	irgsp	exon	16399	16976	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon1;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0100500-01.exon1;rank=1
+1	irgsp	CDS	16599	16976	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	17383	17474	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100500-01.exon2;rank=2
+1	irgsp	CDS	17383	17474	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	17558	18258	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon3;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0100500-01.exon3;rank=3
+1	irgsp	CDS	17558	18258	.	+	1	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	18501	18571	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon4;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100500-01.exon4;rank=4
+1	irgsp	CDS	18501	18571	.	+	2	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	18968	19057	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon5;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100500-01.exon5;rank=5
+1	irgsp	CDS	18968	19057	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	19142	19321	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100500-01.exon6;rank=6
+1	irgsp	CDS	19142	19321	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	CDS	19531	19593	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	19531	19629	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon7;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0100500-01.exon7;rank=7
+1	irgsp	three_prime_UTR	19594	19629	.	+	.	Parent=transcript:Os01t0100500-01
+1	irgsp	exon	19734	20144	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon8;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100500-01.exon8;rank=8
+1	irgsp	three_prime_UTR	19734	20144	.	+	.	Parent=transcript:Os01t0100500-01
+###
+1	irgsp	gene	22841	26892	.	+	.	ID=gene:Os01g0100600;biotype=protein_coding;description=Single-stranded nucleic acid binding R3H domain containing protein. (Os01t0100600-01);gene_id=Os01g0100600;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	22841	26892	.	+	.	ID=transcript:Os01t0100600-01;Parent=gene:Os01g0100600;biotype=protein_coding;transcript_id=Os01t0100600-01
+1	irgsp	five_prime_UTR	22841	23231	.	+	.	Parent=transcript:Os01t0100600-01
+1	irgsp	exon	22841	23281	.	+	.	Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100600-01.exon1;rank=1
+1	irgsp	CDS	23232	23281	.	+	0	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	exon	23572	23847	.	+	.	Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100600-01.exon2;rank=2
+1	irgsp	CDS	23572	23847	.	+	1	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	exon	23962	24033	.	+	.	Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon3;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100600-01.exon3;rank=3
+1	irgsp	CDS	23962	24033	.	+	1	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	exon	24492	24577	.	+	.	Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon4;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0100600-01.exon4;rank=4
+1	irgsp	CDS	24492	24577	.	+	1	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	exon	25445	25519	.	+	.	Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon5;constitutive=1;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0100600-01.exon5;rank=5
+1	irgsp	CDS	25445	25519	.	+	2	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	CDS	25883	26391	.	+	2	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	exon	25883	26892	.	+	.	Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon6;constitutive=1;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0100600-01.exon6;rank=6
+1	irgsp	three_prime_UTR	26392	26892	.	+	.	Parent=transcript:Os01t0100600-01
+###
+1	irgsp	gene	25861	26424	.	-	.	ID=gene:Os01g0100650;biotype=protein_coding;description=Hypothetical gene. (Os01t0100650-00);gene_id=Os01g0100650;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	25861	26424	.	-	.	ID=transcript:Os01t0100650-00;Parent=gene:Os01g0100650;biotype=protein_coding;transcript_id=Os01t0100650-00
+1	irgsp	three_prime_UTR	25861	26039	.	-	.	Parent=transcript:Os01t0100650-00
+1	irgsp	exon	25861	26424	.	-	.	Parent=transcript:Os01t0100650-00;Name=Os01t0100650-00.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100650-00.exon1;rank=1
+1	irgsp	CDS	26040	26423	.	-	0	ID=CDS:Os01t0100650-00;Parent=transcript:Os01t0100650-00;protein_id=Os01t0100650-00
+1	irgsp	five_prime_UTR	26424	26424	.	-	.	Parent=transcript:Os01t0100650-00
diff --git a/src/agat/agat_sp_filter_feature_from_kill_list/test_data/kill_list.txt b/src/agat/agat_sp_filter_feature_from_kill_list/test_data/kill_list.txt
new file mode 100644
index 00000000..a9d72f89
--- /dev/null
+++ b/src/agat/agat_sp_filter_feature_from_kill_list/test_data/kill_list.txt
@@ -0,0 +1,3 @@
+gene:Os01g0100700
+CDS:Os01t0100650-00
+transcript:Os01t0102700-01
diff --git a/src/agat/agat_sp_filter_feature_from_kill_list/test_data/script.sh b/src/agat/agat_sp_filter_feature_from_kill_list/test_data/script.sh
new file mode 100755
index 00000000..6f9d1584
--- /dev/null
+++ b/src/agat/agat_sp_filter_feature_from_kill_list/test_data/script.sh
@@ -0,0 +1,13 @@
+#!/bin/bash
+
+# clone repo
+if [ ! -d /tmp/agat_source ]; then
+  git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source
+fi
+
+# copy test data
+cp -r /tmp/agat_source/t/scripts_output/in/1.gff src/agat/agat_sp_filter_feature_from_kill_list/test_data
+cp -r /tmp/agat_source/t/scripts_output/out/agat_sp_filter_feature_from_kill_list_1.gff src/agat/agat_sp_filter_feature_from_kill_list/test_data
+cp -r /tmp/agat_source/t/scripts_output/in/kill_list.txt src/agat/agat_sp_filter_feature_from_kill_list/test_data
+
+head -n 123 src/agat/agat_sp_filter_feature_from_kill_list/test_data/1.gff > src/agat/agat_sp_filter_feature_from_kill_list/test_data/1_truncated.gff
\ No newline at end of file
diff --git a/src/agat/agat_sp_filter_feature_from_kill_list/test_data/test_output.gff b/src/agat/agat_sp_filter_feature_from_kill_list/test_data/test_output.gff
new file mode 100644
index 00000000..47838fe7
--- /dev/null
+++ b/src/agat/agat_sp_filter_feature_from_kill_list/test_data/test_output.gff
@@ -0,0 +1,113 @@
+##gff-version 3
+##sequence-region   1 1 43270923
+#!genome-build RAP-DB IRGSP-1.0
+#!genome-version IRGSP-1.0
+#!genome-date 2015-10
+#!genome-build-accession GCA_001433935.1
+1	RAP-DB	chromosome	1	43270923	.	.	.	ID=chromosome:1;Alias=Chr1,AP014957.1,NC_029256.1
+1	irgsp	repeat_region	2000	2100	.	+	.	ID=fakeRepeat1
+1	irgsp	gene	2983	10815	.	+	.	ID=gene:Os01g0100100;biotype=protein_coding;description=RabGAP/TBC domain containing protein. (Os01t0100100-01);gene_id=Os01g0100100;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	2983	10815	.	+	.	ID=transcript:Os01t0100100-01;Parent=gene:Os01g0100100;biotype=protein_coding;transcript_id=Os01t0100100-01
+1	irgsp	exon	2983	3268	.	+	.	ID=Os01t0100100-01.exon1;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon1;rank=1
+1	irgsp	exon	3354	3616	.	+	.	ID=Os01t0100100-01.exon2;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0100100-01.exon2;rank=2
+1	irgsp	exon	4357	4455	.	+	.	ID=Os01t0100100-01.exon3;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100100-01.exon3;rank=3
+1	irgsp	exon	5457	5560	.	+	.	ID=Os01t0100100-01.exon4;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon4;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100100-01.exon4;rank=4
+1	irgsp	exon	7136	7944	.	+	.	ID=Os01t0100100-01.exon5;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon5;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0100100-01.exon5;rank=5
+1	irgsp	exon	8028	8150	.	+	.	ID=Os01t0100100-01.exon6;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon6;constitutive=1;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0100100-01.exon6;rank=6
+1	irgsp	exon	8232	8320	.	+	.	ID=Os01t0100100-01.exon7;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon7;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100100-01.exon7;rank=7
+1	irgsp	exon	8408	8608	.	+	.	ID=Os01t0100100-01.exon8;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon8;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100100-01.exon8;rank=8
+1	irgsp	exon	9210	9615	.	+	.	ID=Os01t0100100-01.exon9;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon9;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0100100-01.exon9;rank=9
+1	irgsp	exon	10102	10187	.	+	.	ID=Os01t0100100-01.exon10;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon10;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100100-01.exon10;rank=10
+1	irgsp	exon	10274	10430	.	+	.	ID=Os01t0100100-01.exon11;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon11;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0100100-01.exon11;rank=11
+1	irgsp	exon	10504	10815	.	+	.	ID=Os01t0100100-01.exon12;Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	CDS	3449	3616	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	4357	4455	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	5457	5560	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	7136	7944	.	+	1	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	8028	8150	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	8232	8320	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	8408	8608	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	9210	9615	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	10102	10187	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	10274	10297	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	five_prime_UTR	2983	3268	.	+	.	ID=agat-five_prime_utr-1;Parent=transcript:Os01t0100100-01
+1	irgsp	five_prime_UTR	3354	3448	.	+	.	ID=agat-five_prime_utr-2;Parent=transcript:Os01t0100100-01
+1	irgsp	three_prime_UTR	10298	10430	.	+	.	ID=agat-three_prime_utr-1;Parent=transcript:Os01t0100100-01
+1	irgsp	three_prime_UTR	10504	10815	.	+	.	ID=agat-three_prime_utr-2;Parent=transcript:Os01t0100100-01
+1	irgsp	gene	11218	12435	.	+	.	ID=gene:Os01g0100200;biotype=protein_coding;description=Conserved hypothetical protein. (Os01t0100200-01);gene_id=Os01g0100200;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	11218	12435	.	+	.	ID=transcript:Os01t0100200-01;Parent=gene:Os01g0100200;biotype=protein_coding;transcript_id=Os01t0100200-01
+1	irgsp	exon	11218	12060	.	+	.	ID=Os01t0100200-01.exon1;Parent=transcript:Os01t0100200-01;Name=Os01t0100200-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100200-01.exon1;rank=1
+1	irgsp	exon	12152	12435	.	+	.	ID=Os01t0100200-01.exon2;Parent=transcript:Os01t0100200-01;Name=Os01t0100200-01.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100200-01.exon2;rank=2
+1	irgsp	CDS	11798	12060	.	+	0	ID=CDS:Os01t0100200-01;Parent=transcript:Os01t0100200-01;protein_id=Os01t0100200-01
+1	irgsp	CDS	12152	12317	.	+	1	ID=CDS:Os01t0100200-01;Parent=transcript:Os01t0100200-01;protein_id=Os01t0100200-01
+1	irgsp	five_prime_UTR	11218	11797	.	+	.	ID=agat-five_prime_utr-3;Parent=transcript:Os01t0100200-01
+1	irgsp	three_prime_UTR	12318	12435	.	+	.	ID=agat-three_prime_utr-3;Parent=transcript:Os01t0100200-01
+1	irgsp	gene	11372	12284	.	-	.	ID=gene:Os01g0100300;biotype=protein_coding;description=Cytochrome P450 domain containing protein. (Os01t0100300-00);gene_id=Os01g0100300;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	11372	12284	.	-	.	ID=transcript:Os01t0100300-00;Parent=gene:Os01g0100300;biotype=protein_coding;transcript_id=Os01t0100300-00
+1	irgsp	exon	11372	12042	.	-	.	ID=Os01t0100300-00.exon2;Parent=transcript:Os01t0100300-00;Name=Os01t0100300-00.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100300-00.exon2;rank=2
+1	irgsp	exon	12146	12284	.	-	.	ID=Os01t0100300-00.exon1;Parent=transcript:Os01t0100300-00;Name=Os01t0100300-00.exon1;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0100300-00.exon1;rank=1
+1	irgsp	CDS	11372	12042	.	-	2	ID=CDS:Os01t0100300-00;Parent=transcript:Os01t0100300-00;protein_id=Os01t0100300-00
+1	irgsp	CDS	12146	12284	.	-	0	ID=CDS:Os01t0100300-00;Parent=transcript:Os01t0100300-00;protein_id=Os01t0100300-00
+1	irgsp	gene	12721	15685	.	+	.	ID=gene:Os01g0100400;biotype=protein_coding;description=Similar to Pectinesterase-like protein. (Os01t0100400-01);gene_id=Os01g0100400;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	12721	15685	.	+	.	ID=transcript:Os01t0100400-01;Parent=gene:Os01g0100400;biotype=protein_coding;transcript_id=Os01t0100400-01
+1	irgsp	exon	12721	13813	.	+	.	ID=Os01t0100400-01.exon1;Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100400-01.exon1;rank=1
+1	irgsp	exon	13906	14271	.	+	.	ID=Os01t0100400-01.exon2;Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100400-01.exon2;rank=2
+1	irgsp	exon	14359	14437	.	+	.	ID=Os01t0100400-01.exon3;Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0100400-01.exon3;rank=3
+1	irgsp	exon	14969	15171	.	+	.	ID=Os01t0100400-01.exon4;Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon4;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100400-01.exon4;rank=4
+1	irgsp	exon	15266	15685	.	+	.	ID=Os01t0100400-01.exon5;Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon5;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100400-01.exon5;rank=5
+1	irgsp	CDS	12774	13813	.	+	0	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	CDS	13906	14271	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	CDS	14359	14437	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	CDS	14969	15171	.	+	0	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	CDS	15266	15359	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	five_prime_UTR	12721	12773	.	+	.	ID=agat-five_prime_utr-4;Parent=transcript:Os01t0100400-01
+1	irgsp	three_prime_UTR	15360	15685	.	+	.	ID=agat-three_prime_utr-4;Parent=transcript:Os01t0100400-01
+1	irgsp	gene	12808	13978	.	-	.	ID=gene:Os01g0100466;biotype=protein_coding;description=Hypothetical protein. (Os01t0100466-00);gene_id=Os01g0100466;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	12808	13978	.	-	.	ID=transcript:Os01t0100466-00;Parent=gene:Os01g0100466;biotype=protein_coding;transcript_id=Os01t0100466-00
+1	irgsp	exon	12808	13782	.	-	.	ID=Os01t0100466-00.exon2;Parent=transcript:Os01t0100466-00;Name=Os01t0100466-00.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100466-00.exon2;rank=2
+1	irgsp	exon	13880	13978	.	-	.	ID=Os01t0100466-00.exon1;Parent=transcript:Os01t0100466-00;Name=Os01t0100466-00.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100466-00.exon1;rank=1
+1	irgsp	CDS	12869	13102	.	-	0	ID=CDS:Os01t0100466-00;Parent=transcript:Os01t0100466-00;protein_id=Os01t0100466-00
+1	irgsp	five_prime_UTR	13103	13782	.	-	.	ID=agat-five_prime_utr-5;Parent=transcript:Os01t0100466-00
+1	irgsp	five_prime_UTR	13880	13978	.	-	.	ID=agat-five_prime_utr-6;Parent=transcript:Os01t0100466-00
+1	irgsp	three_prime_UTR	12808	12868	.	-	.	ID=agat-three_prime_utr-5;Parent=transcript:Os01t0100466-00
+1	irgsp	gene	16399	20144	.	+	.	ID=gene:Os01g0100500;biotype=protein_coding;description=Immunoglobulin-like domain containing protein. (Os01t0100500-01);gene_id=Os01g0100500;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	16399	20144	.	+	.	ID=transcript:Os01t0100500-01;Parent=gene:Os01g0100500;biotype=protein_coding;transcript_id=Os01t0100500-01
+1	irgsp	exon	16399	16976	.	+	.	ID=Os01t0100500-01.exon1;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon1;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0100500-01.exon1;rank=1
+1	irgsp	exon	17383	17474	.	+	.	ID=Os01t0100500-01.exon2;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100500-01.exon2;rank=2
+1	irgsp	exon	17558	18258	.	+	.	ID=Os01t0100500-01.exon3;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon3;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0100500-01.exon3;rank=3
+1	irgsp	exon	18501	18571	.	+	.	ID=Os01t0100500-01.exon4;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon4;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100500-01.exon4;rank=4
+1	irgsp	exon	18968	19057	.	+	.	ID=Os01t0100500-01.exon5;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon5;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100500-01.exon5;rank=5
+1	irgsp	exon	19142	19321	.	+	.	ID=Os01t0100500-01.exon6;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100500-01.exon6;rank=6
+1	irgsp	exon	19531	19629	.	+	.	ID=Os01t0100500-01.exon7;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon7;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0100500-01.exon7;rank=7
+1	irgsp	exon	19734	20144	.	+	.	ID=Os01t0100500-01.exon8;Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon8;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100500-01.exon8;rank=8
+1	irgsp	CDS	16599	16976	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	CDS	17383	17474	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	CDS	17558	18258	.	+	1	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	CDS	18501	18571	.	+	2	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	CDS	18968	19057	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	CDS	19142	19321	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	CDS	19531	19593	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	five_prime_UTR	16399	16598	.	+	.	ID=agat-five_prime_utr-7;Parent=transcript:Os01t0100500-01
+1	irgsp	three_prime_UTR	19594	19629	.	+	.	ID=agat-three_prime_utr-6;Parent=transcript:Os01t0100500-01
+1	irgsp	three_prime_UTR	19734	20144	.	+	.	ID=agat-three_prime_utr-7;Parent=transcript:Os01t0100500-01
+1	irgsp	gene	22841	26892	.	+	.	ID=gene:Os01g0100600;biotype=protein_coding;description=Single-stranded nucleic acid binding R3H domain containing protein. (Os01t0100600-01);gene_id=Os01g0100600;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	22841	26892	.	+	.	ID=transcript:Os01t0100600-01;Parent=gene:Os01g0100600;biotype=protein_coding;transcript_id=Os01t0100600-01
+1	irgsp	exon	22841	23281	.	+	.	ID=Os01t0100600-01.exon1;Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100600-01.exon1;rank=1
+1	irgsp	exon	23572	23847	.	+	.	ID=Os01t0100600-01.exon2;Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100600-01.exon2;rank=2
+1	irgsp	exon	23962	24033	.	+	.	ID=Os01t0100600-01.exon3;Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon3;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100600-01.exon3;rank=3
+1	irgsp	exon	24492	24577	.	+	.	ID=Os01t0100600-01.exon4;Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon4;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0100600-01.exon4;rank=4
+1	irgsp	exon	25445	25519	.	+	.	ID=Os01t0100600-01.exon5;Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon5;constitutive=1;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0100600-01.exon5;rank=5
+1	irgsp	exon	25883	26892	.	+	.	ID=Os01t0100600-01.exon6;Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon6;constitutive=1;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0100600-01.exon6;rank=6
+1	irgsp	CDS	23232	23281	.	+	0	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	CDS	23572	23847	.	+	1	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	CDS	23962	24033	.	+	1	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	CDS	24492	24577	.	+	1	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	CDS	25445	25519	.	+	2	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	CDS	25883	26391	.	+	2	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	five_prime_UTR	22841	23231	.	+	.	ID=agat-five_prime_utr-8;Parent=transcript:Os01t0100600-01
+1	irgsp	three_prime_UTR	26392	26892	.	+	.	ID=agat-three_prime_utr-8;Parent=transcript:Os01t0100600-01
+1	irgsp	gene	25861	26424	.	-	.	ID=gene:Os01g0100650;biotype=protein_coding;description=Hypothetical gene. (Os01t0100650-00);gene_id=Os01g0100650;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	25861	26424	.	-	.	ID=transcript:Os01t0100650-00;Parent=gene:Os01g0100650;biotype=protein_coding;transcript_id=Os01t0100650-00
+1	irgsp	exon	25861	26424	.	-	.	ID=Os01t0100650-00.exon1;Parent=transcript:Os01t0100650-00;Name=Os01t0100650-00.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100650-00.exon1;rank=1
+1	irgsp	five_prime_UTR	26424	26424	.	-	.	ID=agat-five_prime_utr-9;Parent=transcript:Os01t0100650-00
+1	irgsp	three_prime_UTR	25861	26039	.	-	.	ID=agat-three_prime_utr-9;Parent=transcript:Os01t0100650-00

From 11118fb144e22622c65d075d22febb67c40f9a94 Mon Sep 17 00:00:00 2001
From: Leila011 <leilapaquay@gmail.com>
Date: Sat, 26 Oct 2024 20:28:05 +0200
Subject: [PATCH 35/42] Add agat sp merge annotations (#106)

* add help

* add config

* add test data and expected output + script to fetch them

* add run script and handle multiple inputs

* add test

* update changelog

* fix typo

* add second test

* Update src/agat/agat_sp_merge_annotations/config.vsh.yaml

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>

* Update src/agat/agat_sp_merge_annotations/config.vsh.yaml

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>

* Update src/agat/agat_sp_merge_annotations/config.vsh.yaml

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>

* Update src/agat/agat_sp_merge_annotations/config.vsh.yaml

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>

* update --config description

* remove unset IFS

* add temporary directory and cleanup on exit

* update clean up on exit function

* add set -eo pipefail to test and script

* fix create temporary directory

* cleanup changelog

* cleanup changelog

* Minor formatting changes

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>
Co-authored-by: Emma Rousseau <emmarou1@icloud.com>
---
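Note for reviewers: a minimal sketch of how the semicolon-separated `par_gff` value can be expanded into repeated `--gff` flags, as `script.sh` in this patch needs to do. The helper name `build_gff_args`, the `gff_args` variable, and the sample file names are hypothetical and not part of the patch. The sketch collects the flags in a bash array so that paths containing spaces stay intact.

```shell
# Hypothetical helper (not part of the patch): split a ';'-separated list
# of file names into repeated --gff flags stored in the array gff_args.
build_gff_args() {
  local file_names file
  gff_args=()
  # IFS is only changed for this single read command
  IFS=";" read -ra file_names <<< "$1"
  for file in "${file_names[@]}"; do
    gff_args+=(--gff "$file")
  done
}

# Example with made-up file names, one of which contains a space
build_gff_args "file1.gff;my file2.gff"
printf '%s\n' "${gff_args[@]}"
```

The resulting array can be passed on as `"${gff_args[@]}"`, which keeps each file name as a single argument.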
 CHANGELOG.md                                  |  1 +
 .../agat_sp_merge_annotations/config.vsh.yaml | 67 +++++++++++++++++++
 src/agat/agat_sp_merge_annotations/help.txt   | 64 ++++++++++++++++++
 src/agat/agat_sp_merge_annotations/script.sh  | 19 ++++++
 src/agat/agat_sp_merge_annotations/test.sh    | 56 ++++++++++++++++
 .../test_data/agat_sp_merge_annotations_1.gff | 13 ++++
 .../test_data/agat_sp_merge_annotations_2.gff |  3 +
 .../test_data/file1.gff                       | 14 ++++
 .../test_data/file2.gff                       | 12 ++++
 .../test_data/fileA.gff                       |  2 +
 .../test_data/fileB.gff                       |  2 +
 .../test_data/script.sh                       | 15 +++++
 12 files changed, 268 insertions(+)
 create mode 100644 src/agat/agat_sp_merge_annotations/config.vsh.yaml
 create mode 100644 src/agat/agat_sp_merge_annotations/help.txt
 create mode 100644 src/agat/agat_sp_merge_annotations/script.sh
 create mode 100644 src/agat/agat_sp_merge_annotations/test.sh
 create mode 100644 src/agat/agat_sp_merge_annotations/test_data/agat_sp_merge_annotations_1.gff
 create mode 100644 src/agat/agat_sp_merge_annotations/test_data/agat_sp_merge_annotations_2.gff
 create mode 100644 src/agat/agat_sp_merge_annotations/test_data/file1.gff
 create mode 100644 src/agat/agat_sp_merge_annotations/test_data/file2.gff
 create mode 100644 src/agat/agat_sp_merge_annotations/test_data/fileA.gff
 create mode 100644 src/agat/agat_sp_merge_annotations/test_data/fileB.gff
 create mode 100755 src/agat/agat_sp_merge_annotations/test_data/script.sh

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 76a1e2ec..420a3d39 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,7 @@
   - `agat/agat_convert_genscan2gff`: convert a genscan file into a GFF file (PR #100).
   - `agat/agat_sp_add_introns`: add intron features to gtf/gff file without intron features (PR #104).
   - `agat/agat_sp_filter_feature_from_kill_list`: remove features in a GFF file based on a kill list (PR #105).
+  - `agat/agat_sp_merge_annotations`: merge different gff annotation files into one (PR #106).
   - `agat/agat_sp_statistics`: provides exhaustive statistics of a gft/gff file (PR #107).
 
 * `bd_rhapsody/bd_rhapsody_sequence_analysis`: BD Rhapsody Sequence Analysis CWL pipeline (PR #96).
diff --git a/src/agat/agat_sp_merge_annotations/config.vsh.yaml b/src/agat/agat_sp_merge_annotations/config.vsh.yaml
new file mode 100644
index 00000000..bc47921a
--- /dev/null
+++ b/src/agat/agat_sp_merge_annotations/config.vsh.yaml
@@ -0,0 +1,67 @@
+name: agat_sp_merge_annotations
+namespace: agat
+description: |
+  Merge different gff annotation files into one. It uses the AGAT parser that takes care of
+  duplicated names and fixes other oddities met in those files.
+keywords: [gene annotations, merge, gff]
+links:
+  homepage: https://github.com/NBISweden/AGAT
+  documentation: https://agat.readthedocs.io/en/latest/tools/agat_sp_merge_annotations.html
+  issue_tracker: https://github.com/NBISweden/AGAT/issues
+  repository: https://github.com/NBISweden/AGAT
+references: 
+  doi: 10.5281/zenodo.3552717
+license: GPL-3.0
+requirements:
+  commands: [agat]
+authors:
+  - __merge__: /src/_authors/leila_paquay.yaml
+    roles: [ author, maintainer ]
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --gff
+        alternatives: [-f]
+        description: |
+          Input GTF/GFF file(s).
+        type: file
+        multiple: true
+        required: true
+        example: input1.gff;input2.gff
+  - name: Outputs
+    arguments:       
+      - name: --output
+        alternatives: [-o, --out]
+        description: Output gff3 file to which the merged annotations will be written.
+        type: file
+        direction: output
+        required: true
+        example: output.gff
+  - name: Arguments
+    arguments:
+      - name: --config
+        alternatives: [-c]
+        description: |
+          AGAT config file. By default AGAT takes the original agat_config.yaml shipped with AGAT. 
+          The `--config` option gives you the possibility to use your own AGAT config file (located
+          elsewhere or named differently).
+        type: file
+        example: custom_agat_config.yaml
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: test_data
+engines:
+  - type: docker
+    image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
+    setup:
+      - type: docker
+        run: |
+          agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
+runners:
+  - type: executable
+  - type: nextflow
\ No newline at end of file
diff --git a/src/agat/agat_sp_merge_annotations/help.txt b/src/agat/agat_sp_merge_annotations/help.txt
new file mode 100644
index 00000000..2a17e7e4
--- /dev/null
+++ b/src/agat/agat_sp_merge_annotations/help.txt
@@ -0,0 +1,64 @@
+```sh
+agat_sp_merge_annotations.pl --help
+```
+ 
+  ------------------------------------------------------------------------------
+|   Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0                      |
+|   https://github.com/NBISweden/AGAT                                          |
+|   National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se         |
+ ------------------------------------------------------------------------------
+
+
+Name:
+    agat_sp_merge_annotations.pl
+
+Description:
+    This script merge different gff annotation files in one. It uses the
+    AGAT parser that takes care of duplicated names and fixes other oddities
+    met in those files.
+
+Usage:
+        agat_sp_merge_annotations.pl --gff infile1 --gff infile2 --out outFile
+        agat_sp_merge_annotations.pl --help
+
+Options:
+    --gff or -f
+            Input GTF/GFF file(s). You can specify as much file you want
+            like so: -f file1 -f file2 -f file3
+
+    --out, --output or -o
+            Output gff3 file where the gene incriminated will be write.
+
+    -c or --config
+            String - Input agat config file. By default AGAT takes as input
+            agat_config.yaml file from the working directory if any,
+            otherwise it takes the orignal agat_config.yaml shipped with
+            AGAT. To get the agat_config.yaml locally type: "agat config
+            --expose". The --config option gives you the possibility to use
+            your own AGAT config file (located elsewhere or named
+            differently).
+
+    --help or -h
+            Display this helpful text.
+
+Feedback:
+  Did you find a bug?:
+    Do not hesitate to report bugs to help us keep track of the bugs and
+    their resolution. Please use the GitHub issue tracking system available
+    at this address:
+
+                https://github.com/NBISweden/AGAT/issues
+
+     Ensure that the bug was not already reported by searching under Issues.
+     If you're unable to find an (open) issue addressing the problem, open a new one.
+     Try as much as possible to include in the issue when relevant:
+     - a clear description,
+     - as much relevant information as possible,
+     - the command used,
+     - a data sample,
+     - an explanation of the expected behaviour that is not occurring.
+
+  Do you want to contribute?:
+    You are very welcome, visit this address for the Contributing
+    guidelines:
+    https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
diff --git a/src/agat/agat_sp_merge_annotations/script.sh b/src/agat/agat_sp_merge_annotations/script.sh
new file mode 100644
index 00000000..5703745a
--- /dev/null
+++ b/src/agat/agat_sp_merge_annotations/script.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+# Convert a list of file names to multiple --gff arguments
+input_files=()
+IFS=";" read -ra file_names <<< "$par_gff"
+for file in "${file_names[@]}"; do
+    input_files+=(--gff "$file")
+done
+
+# run agat_sp_merge_annotations (array expansion keeps paths with spaces intact)
+agat_sp_merge_annotations.pl \
+  "${input_files[@]}" \
+  -o "$par_output" \
+  ${par_config:+--config "${par_config}"}
diff --git a/src/agat/agat_sp_merge_annotations/test.sh b/src/agat/agat_sp_merge_annotations/test.sh
new file mode 100644
index 00000000..7b882717
--- /dev/null
+++ b/src/agat/agat_sp_merge_annotations/test.sh
@@ -0,0 +1,56 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+test_dir="${meta_resources_dir}/test_data"
+
+# create temporary directory and clean up on exit
+TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXX")
+function clean_up {
+ [[ -d "$TMPDIR" ]] && rm -rf "$TMPDIR"
+}
+trap clean_up EXIT
+
+echo "> Run $meta_name with test data 1"
+"$meta_executable" \
+  --gff "$test_dir/file1.gff;$test_dir/file2.gff" \
+  --output "$TMPDIR/output.gff"
+
+echo ">> Checking output"
+[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1
+
+echo ">> Check if output is empty"
+[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1
+
+echo ">> Check if output matches expected output"
+if ! diff "$TMPDIR/output.gff" \
+    "$test_dir/agat_sp_merge_annotations_1.gff"; then
+  echo "Output file output.gff does not match expected output"
+  exit 1
+fi
+
+echo ">> cleanup"
+rm -rf "$TMPDIR/output.gff"
+
+echo "> Run $meta_name with test data 2"
+"$meta_executable" \
+  --gff "$test_dir/fileA.gff;$test_dir/fileB.gff" \
+  --output "$TMPDIR/output.gff"
+
+echo ">> Checking output"
+[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1
+
+echo ">> Check if output is empty"
+[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1
+
+echo ">> Check if output matches expected output"
+if ! diff "$TMPDIR/output.gff" \
+    "$test_dir/agat_sp_merge_annotations_2.gff"; then
+  echo "Output file output.gff does not match expected output"
+  exit 1
+fi
+
+echo "> Test successful"
\ No newline at end of file
diff --git a/src/agat/agat_sp_merge_annotations/test_data/agat_sp_merge_annotations_1.gff b/src/agat/agat_sp_merge_annotations/test_data/agat_sp_merge_annotations_1.gff
new file mode 100644
index 00000000..5f68f1f3
--- /dev/null
+++ b/src/agat/agat_sp_merge_annotations/test_data/agat_sp_merge_annotations_1.gff
@@ -0,0 +1,13 @@
+##gff-version 3
+chr10	BestRefSeq	gene	123237824	123357992	.	-	.	ID=gene-FGFR2;ontology=G0222
+chr10	BestRefSeq	mRNA	123237824	123357992	.	-	.	ID=rna-NM_022970.3;Parent=gene-FGFR2;ontology=G0222;merged_ID=IDmodified-mrna-1;merged_Ontology=G0333;merged_Parent=IDmodified-gene-1
+chr10	BestRefSeq	exon	123237824	123239535	.	-	.	ID=exon-NM_022970.3-18;Parent=rna-NM_022970.3
+chr10	BestRefSeq	exon	123243212	123243317	.	-	.	ID=exon-NM_022970.3-17;Parent=rna-NM_022970.3
+chr10	BestRefSeq	exon	123353223	123353481	.	-	.	ID=exon-NM_022970.3-2;Parent=rna-NM_022970.3
+chr10	BestRefSeq	exon	123357476	123357992	.	-	.	ID=exon-NM_022970.3-1;Parent=rna-NM_022970.3
+chr10	BestRefSeq	CDS	123239371	123239535	.	-	0	ID=cds-NP_075259.4;Parent=rna-NM_022970.3
+chr10	BestRefSeq	CDS	123243212	123243317	.	-	1	ID=cds-NP_075259.4;Parent=rna-NM_022970.3
+chr10	BestRefSeq	CDS	123353223	123353331	.	-	0	ID=cds-NP_075259.4;Parent=rna-NM_022970.3
+chr10	BestRefSeq	five_prime_UTR	123353332	123353481	.	-	.	ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3
+chr10	BestRefSeq	five_prime_UTR	123357476	123357992	.	-	.	ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3
+chr10	BestRefSeq	three_prime_UTR	123237824	123239370	.	-	.	ID=agat-three_prime_utr-54427;Parent=rna-NM_022970.3
diff --git a/src/agat/agat_sp_merge_annotations/test_data/agat_sp_merge_annotations_2.gff b/src/agat/agat_sp_merge_annotations/test_data/agat_sp_merge_annotations_2.gff
new file mode 100644
index 00000000..1c3846b2
--- /dev/null
+++ b/src/agat/agat_sp_merge_annotations/test_data/agat_sp_merge_annotations_2.gff
@@ -0,0 +1,3 @@
+##gff-version 3
+chr1	AUGUSTUS	gene	1000424	1039237	.	+	.	ID=A
+chr1	AUGUSTUS	mRNA	1000424	1039237	.	+	.	ID=A.t1;Parent=A;merged_ID=B.t1;merged_Parent=B
diff --git a/src/agat/agat_sp_merge_annotations/test_data/file1.gff b/src/agat/agat_sp_merge_annotations/test_data/file1.gff
new file mode 100644
index 00000000..d822ebfa
--- /dev/null
+++ b/src/agat/agat_sp_merge_annotations/test_data/file1.gff
@@ -0,0 +1,14 @@
+chr10	BestRefSeq	gene	123237824	123357992	.	-	.	ID=gene-FGFR2;Ontology=G0222;
+chr10	BestRefSeq	mRNA	123237824	123357992	.	-	.	ID=rna-NM_022970.3;Parent=gene-FGFR2;Ontology=G0222;
+chr10	BestRefSeq	exon	123237824	123239535	.	-	.	ID=exon-NM_022970.3-18;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	exon	123243212	123243317	.	-	.	ID=exon-NM_022970.3-17;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	exon	123353223	123353481	.	-	.	ID=exon-NM_022970.3-2;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	exon	123357476	123357992	.	-	.	ID=exon-NM_022970.3-1;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	CDS	123239371	123239535	.	-	0	ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	CDS	123243212	123243317	.	-	1	ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	CDS	123353223	123353331	.	-	0	ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	five_prime_UTR	123353332	123353481	.	-	.	ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	five_prime_UTR	123357476	123357992	.	-	.	ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	three_prime_UTR	123237824	123239370	.	-	.	ID=agat-three_prime_utr-54427;Parent=rna-NM_022970.3;
+
+	
\ No newline at end of file
diff --git a/src/agat/agat_sp_merge_annotations/test_data/file2.gff b/src/agat/agat_sp_merge_annotations/test_data/file2.gff
new file mode 100644
index 00000000..f072e1b3
--- /dev/null
+++ b/src/agat/agat_sp_merge_annotations/test_data/file2.gff
@@ -0,0 +1,12 @@
+chr10	BestRefSeq	gene	123237824	123357992	.	-	.	ID=gene-FGFR2;Ontology=G0222;
+chr10	BestRefSeq	mRNA	123237824	123357992	.	-	.	ID=rna-NM_022970.3;Parent=gene-FGFR2;Ontology=G0333;
+chr10	BestRefSeq	exon	123237824	123239535	.	-	.	ID=exon-NM_022970.3-18;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	exon	123243212	123243317	.	-	.	ID=exon-NM_022970.3-17;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	exon	123353223	123353481	.	-	.	ID=exon-NM_022970.3-2;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	exon	123357476	123357992	.	-	.	ID=exon-NM_022970.3-1;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	CDS	123239371	123239535	.	-	0	ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	CDS	123243212	123243317	.	-	1	ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	CDS	123353223	123353331	.	-	0	ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	five_prime_UTR	123353332	123353481	.	-	.	ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	five_prime_UTR	123357476	123357992	.	-	.	ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
+chr10	BestRefSeq	three_prime_UTR	123237824	123239370	.	-	.	ID=agat-three_prime_utr-54427;Parent=rna-NM_022970.3;
\ No newline at end of file
diff --git a/src/agat/agat_sp_merge_annotations/test_data/fileA.gff b/src/agat/agat_sp_merge_annotations/test_data/fileA.gff
new file mode 100644
index 00000000..03b2d16d
--- /dev/null
+++ b/src/agat/agat_sp_merge_annotations/test_data/fileA.gff
@@ -0,0 +1,2 @@
+chr1	AUGUSTUS	gene	1000424	1039237	.	+	.	ID=A;
+chr1	AUGUSTUS	mRNA	1000424	1039237	.	+	.	ID=A.t1;Parent=A;
diff --git a/src/agat/agat_sp_merge_annotations/test_data/fileB.gff b/src/agat/agat_sp_merge_annotations/test_data/fileB.gff
new file mode 100644
index 00000000..e796e5f0
--- /dev/null
+++ b/src/agat/agat_sp_merge_annotations/test_data/fileB.gff
@@ -0,0 +1,2 @@
+chr1	AUGUSTUS	gene	1000424	1039237	.	+	.	ID=B;
+chr1	AUGUSTUS	mRNA	1000424	1039237	.	+	.	ID=B.t1;Parent=B;
diff --git a/src/agat/agat_sp_merge_annotations/test_data/script.sh b/src/agat/agat_sp_merge_annotations/test_data/script.sh
new file mode 100755
index 00000000..0d3acae7
--- /dev/null
+++ b/src/agat/agat_sp_merge_annotations/test_data/script.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+
+# clone repo
+if [ ! -d /tmp/agat_source ]; then
+  git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source
+fi
+
+# copy test data
+cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/file1.gff src/agat/agat_sp_merge_annotations/test_data
+cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/file2.gff src/agat/agat_sp_merge_annotations/test_data
+cp -r /tmp/agat_source/t/scripts_output/out/agat_sp_merge_annotations_1.gff src/agat/agat_sp_merge_annotations/test_data
+
+cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/fileA.gff src/agat/agat_sp_merge_annotations/test_data
+cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/fileB.gff src/agat/agat_sp_merge_annotations/test_data
+cp -r /tmp/agat_source/t/scripts_output/out/agat_sp_merge_annotations_2.gff src/agat/agat_sp_merge_annotations/test_data
\ No newline at end of file

From f96bd72421969e920cb24717df56bb5127d9bf52 Mon Sep 17 00:00:00 2001
From: Theodoro Gasperin Terra Camargo
 <98555209+tgaspe@users.noreply.github.com>
Date: Sat, 26 Oct 2024 20:29:00 +0200
Subject: [PATCH 36/42] Bedtools bamtobed (#109)

* adding back my work

* adding more tests

- fixing bug
- more tests

* Final test added

* Update CHANGELOG.md

* minor change

- license name
- help file

* small changes on config

* small changes

* adding more links

* Update script.sh

* Adding $TMPDIR to test.sh

---------

Co-authored-by: Emma Rousseau <emmarou1@icloud.com>
Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
---
 CHANGELOG.md                                  |   4 +
 .../bedtools_bamtobed/config.vsh.yaml         | 118 +++++++++++
 src/bedtools/bedtools_bamtobed/help.txt       |  43 ++++
 src/bedtools/bedtools_bamtobed/script.sh      |  39 ++++
 src/bedtools/bedtools_bamtobed/test.sh        | 183 ++++++++++++++++++
 .../bedtools_bamtobed/test_data/example.bam   | Bin 0 -> 334 bytes
 .../bedtools_bamtobed/test_data/example.sam   |   3 +
 7 files changed, 390 insertions(+)
 create mode 100644 src/bedtools/bedtools_bamtobed/config.vsh.yaml
 create mode 100644 src/bedtools/bedtools_bamtobed/help.txt
 create mode 100644 src/bedtools/bedtools_bamtobed/script.sh
 create mode 100644 src/bedtools/bedtools_bamtobed/test.sh
 create mode 100644 src/bedtools/bedtools_bamtobed/test_data/example.bam
 create mode 100644 src/bedtools/bedtools_bamtobed/test_data/example.sam
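
Note on the expected coordinates used in test.sh: SAM positions are 1-based,
while BED intervals are 0-based and half-open. The sketch below shows that
arithmetic for the two reads in example.sam (POS 129 and 429, CIGAR 100M).
The helper name `sam_to_bed_interval` is hypothetical and only handles a
plain `<n>M` alignment; it is an illustration, not how bedtools is implemented.

```shell
#!/bin/bash
set -eo pipefail

# sam_to_bed_interval POS MATCH_LEN
# Hypothetical helper (not part of bedtools): converts a 1-based SAM start
# position and the length of a simple "<n>M" alignment into a 0-based,
# half-open BED interval, printed as start<TAB>end.
sam_to_bed_interval() {
  local pos="$1" len="$2"
  local start=$(( pos - 1 ))    # BED start is 0-based
  local end=$(( start + len ))  # BED end is exclusive
  printf '%s\t%s\n' "$start" "$end"
}

sam_to_bed_interval 129 100   # first mate in example.sam
sam_to_bed_interval 429 100   # second mate in example.sam
```

Under these assumptions the two calls give 128/228 and 428/528, which are the
start and end columns written to expected.bed in test.sh.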

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 420a3d39..dcc783c9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -11,10 +11,14 @@
 
 * `bd_rhapsody/bd_rhapsody_sequence_analysis`: BD Rhapsody Sequence Analysis CWL pipeline (PR #96).
 
+* `bedtools`:
+   - `bedtools/bedtools_bamtobed`: Converts BAM alignments to BED6 or BEDPE format (PR #109).
+
 * `rsem/rsem_calculate_expression`: Calculate expression levels (PR #93).
 
 * `nanoplot`: Plotting tool for long read sequencing data and alignments (PR #95).
 
+
 ## BREAKING CHANGES
 
 * `falco`: Fix a typo in the `--reverse_complement` argument (PR #157).
diff --git a/src/bedtools/bedtools_bamtobed/config.vsh.yaml b/src/bedtools/bedtools_bamtobed/config.vsh.yaml
new file mode 100644
index 00000000..22ef8b44
--- /dev/null
+++ b/src/bedtools/bedtools_bamtobed/config.vsh.yaml
@@ -0,0 +1,118 @@
+name: bedtools_bamtobed
+namespace: bedtools
+description: Converts BAM alignments to BED6 or BEDPE format.
+keywords: [Converts, BAM, BED, BED6, BEDPE]
+links:
+  documentation: https://bedtools.readthedocs.io/en/latest/content/tools/bamtobed.html
+  repository: https://github.com/arq5x/bedtools2
+  homepage: https://bedtools.readthedocs.io/en/latest/#
+  issue_tracker: https://github.com/arq5x/bedtools2/issues
+references:
+  doi: 10.1093/bioinformatics/btq033
+license: MIT
+requirements:
+  commands: [bedtools]
+authors:
+  - __merge__: /src/_authors/theodoro_gasperin.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --input
+        alternatives: -i
+        type: file
+        description: Input BAM file.
+        required: true
+    
+  - name: Outputs
+    arguments:
+      - name: --output
+        alternatives: -o
+        required: true
+        type: file
+        direction: output
+        description: Output BED file.
+
+  - name: Options
+    arguments:
+      - name: --bedpe
+        type: boolean_true
+        description: | 
+          Write BEDPE format. Requires BAM to be grouped or sorted by query.
+      
+      - name: --mate1
+        type: boolean_true
+        description: | 
+          When writing BEDPE (-bedpe) format, always report mate one as the first BEDPE "block".
+      
+      - name: --bed12
+        type: boolean_true
+        description: | 
+          Write "blocked" BED format (aka "BED12"). Forces -split.
+          See http://genome-test.cse.ucsc.edu/FAQ/FAQformat#format1
+
+      - name: --split
+        type: boolean_true
+        description: | 
+          Report "split" BAM alignments as separate BED entries.
+          Splits only on N CIGAR operations.
+
+      - name: --splitD
+        type: boolean_true
+        description: | 
+          Split alignments based on N and D CIGAR operators.
+          Forces -split.
+
+      - name: --edit_distance
+        alternatives: -ed
+        type: boolean_true
+        description: | 
+          Use BAM edit distance (NM tag) for BED score.
+          - Default for BED is to use mapping quality.
+          - Default for BEDPE is to use the minimum of
+            the two mapping qualities for the pair.
+          - When -ed is used with -bedpe, the total edit
+            distance from the two mates is reported.
+
+      - name: --tag
+        type: string
+        description: | 
+          Use other NUMERIC BAM alignment tag for BED score.
+          Default for BED is to use mapping quality. Disallowed with BEDPE output.
+        example: "SM"
+      
+      - name: --color
+        type: string
+        description: | 
+          An R,G,B string for the color used with BED12 format.
+          Default is (255,0,0).
+        example: "250,250,250"
+
+      - name: --cigar
+        type: boolean_true
+        description: | 
+          Add the CIGAR string to the BED entry as a 7th column.
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - path: test_data
+
+engines:
+  - type: docker
+    image: debian:stable-slim
+    setup:
+      - type: apt
+        packages: [bedtools, procps]
+      - type: docker
+        run: |
+          echo "bedtools: \"$(bedtools --version | sed -n 's/^bedtools //p')\"" > /var/software_versions.txt
+
+runners:
+  - type: executable
+  - type: nextflow
\ No newline at end of file
diff --git a/src/bedtools/bedtools_bamtobed/help.txt b/src/bedtools/bedtools_bamtobed/help.txt
new file mode 100644
index 00000000..0cfc23a2
--- /dev/null
+++ b/src/bedtools/bedtools_bamtobed/help.txt
@@ -0,0 +1,43 @@
+```bash
+bedtools bamtobed
+```
+
+Tool:    bedtools bamtobed (aka bamToBed)
+Version: v2.30.0
+Summary: Converts BAM alignments to BED6 or BEDPE format.
+
+Usage:   bedtools bamtobed [OPTIONS] -i <bam> 
+
+Options: 
+	-bedpe	Write BEDPE format.
+		- Requires BAM to be grouped or sorted by query.
+
+	-mate1	When writing BEDPE (-bedpe) format, 
+		always report mate one as the first BEDPE "block".
+
+	-bed12	Write "blocked" BED format (aka "BED12"). Forces -split.
+
+		http://genome-test.cse.ucsc.edu/FAQ/FAQformat#format1
+
+	-split	Report "split" BAM alignments as separate BED entries.
+		Splits only on N CIGAR operations.
+
+	-splitD	Split alignments based on N and D CIGAR operators.
+		Forces -split.
+
+	-ed	Use BAM edit distance (NM tag) for BED score.
+		- Default for BED is to use mapping quality.
+		- Default for BEDPE is to use the minimum of
+		  the two mapping qualities for the pair.
+		- When -ed is used with -bedpe, the total edit
+		  distance from the two mates is reported.
+
+	-tag	Use other NUMERIC BAM alignment tag for BED score.
+		- Default for BED is to use mapping quality.
+		  Disallowed with BEDPE output.
+
+	-color	An R,G,B string for the color used with BED12 format.
+		Default is (255,0,0).
+
+	-cigar	Add the CIGAR string to the BED entry as a 7th column.
+
diff --git a/src/bedtools/bedtools_bamtobed/script.sh b/src/bedtools/bedtools_bamtobed/script.sh
new file mode 100644
index 00000000..10c4cef4
--- /dev/null
+++ b/src/bedtools/bedtools_bamtobed/script.sh
@@ -0,0 +1,39 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+set -eo pipefail
+
+# Unset parameters
+unset_if_false=( 
+  par_bedpe
+  par_mate1
+  par_bed12
+  par_split
+  par_splitD
+  par_edit_distance
+  par_tag
+  par_color
+  par_cigar
+)
+
+for par in "${unset_if_false[@]}"; do
+    test_val="${!par}"
+    [[ "$test_val" == "false" ]] && unset "$par"
+done
+
+# Execute bedtools bamtobed with the provided arguments
+bedtools bamtobed \
+    ${par_bedpe:+-bedpe} \
+    ${par_mate1:+-mate1} \
+    ${par_bed12:+-bed12} \
+    ${par_split:+-split} \
+    ${par_splitD:+-splitD} \
+    ${par_edit_distance:+-ed} \
+    ${par_tag:+-tag "$par_tag"} \
+    ${par_cigar:+-cigar} \
+    ${par_color:+-color "$par_color"} \
+    -i "$par_input" \
+    > "$par_output"
+
diff --git a/src/bedtools/bedtools_bamtobed/test.sh b/src/bedtools/bedtools_bamtobed/test.sh
new file mode 100644
index 00000000..3ea8b59d
--- /dev/null
+++ b/src/bedtools/bedtools_bamtobed/test.sh
@@ -0,0 +1,183 @@
+#!/bin/bash
+
+# exit on error
+set -eo pipefail
+
+# directory of the bam file
+test_data="$meta_resources_dir/test_data"
+
+#############################################
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_identical_content() {
+  diff -a "$2" "$1" \
+    || { echo "Files are not identical!" && exit 1; }
+}
+#############################################
+
+echo "Creating Test Data..."
+TMPDIR=$(mktemp -d "$meta_temp_dir/XXXXXX")
+function clean_up {
+  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
+}
+trap clean_up EXIT
+
+# Generate expected files for comparison
+printf "chr2:172936693-172938111\t128\t228\tmy_read/1\t60\t+\nchr2:172936693-172938111\t428\t528\tmy_read/2\t60\t-\n" > "$TMPDIR/expected.bed"
+printf "chr2:172936693-172938111\t128\t228\tchr2:172936693-172938111\t428\t528\tmy_read\t60\t+\t-\n" > "$TMPDIR/expected.bedpe"
+printf "chr2:172936693-172938111\t128\t228\tmy_read/1\t60\t+\t128\t228\t255,0,0\t1\t100\t0\nchr2:172936693-172938111\t428\t528\tmy_read/2\t60\t-\t428\t528\t255,0,0\t1\t100\t0\n" > "$TMPDIR/expected.bed12"
+printf "chr2:172936693-172938111\t128\t228\tmy_read/1\t0\t+\nchr2:172936693-172938111\t428\t528\tmy_read/2\t0\t-\n" > "$TMPDIR/expected_ed.bed"
+printf "chr2:172936693-172938111\t128\t228\tmy_read/1\t60\t+\t128\t228\t250,250,250\t1\t100\t0\nchr2:172936693-172938111\t428\t528\tmy_read/2\t60\t-\t428\t528\t250,250,250\t1\t100\t0\n" > "$TMPDIR/expected_color.bed12"
+printf "chr2:172936693-172938111\t128\t228\tmy_read/1\t60\t+\t100M\nchr2:172936693-172938111\t428\t528\tmy_read/2\t60\t-\t100M\n" > "$TMPDIR/expected_cigar.bed"
+printf "chr2:172936693-172938111\t128\t228\tmy_read/1\t85\t+\nchr2:172936693-172938111\t428\t528\tmy_read/2\t85\t-\n" > "$TMPDIR/expected_tag.bed"
+
+
+# Test 1: 
+mkdir "$TMPDIR/test1" && pushd "$TMPDIR/test1" > /dev/null
+
+echo "> Run bedtools bamtobed on BAM file"
+"$meta_executable" \
+  --input "$test_data/example.bam" \
+  --output "output.bed"
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected.bed"
+echo "- test1 succeeded -"
+
+popd > /dev/null
+
+# Test 2:
+mkdir "$TMPDIR/test2" && pushd "$TMPDIR/test2" > /dev/null
+
+echo "> Run bedtools bamtobed on BAM file with -bedpe"
+"$meta_executable" \
+  --input "$test_data/example.bam" \
+  --output "output.bedpe" \
+  --bedpe
+
+# checks
+assert_file_exists "output.bedpe"
+assert_file_not_empty "output.bedpe"
+assert_identical_content "output.bedpe" "../expected.bedpe"
+echo "- test2 succeeded -"
+
+popd > /dev/null
+
+# Test 3:
+mkdir "$TMPDIR/test3" && pushd "$TMPDIR/test3" > /dev/null
+
+echo "> Run bedtools bamtobed on BAM file with -bed12"
+"$meta_executable" \
+  --input "$test_data/example.bam" \
+  --output "output.bed12" \
+  --bed12
+
+# checks
+assert_file_exists "output.bed12"
+assert_file_not_empty "output.bed12"
+assert_identical_content "output.bed12" "../expected.bed12"
+echo "- test3 succeeded -"
+
+popd > /dev/null
+
+# Test 4:
+mkdir "$TMPDIR/test4" && pushd "$TMPDIR/test4" > /dev/null
+
+echo "> Run bedtools bamtobed on BAM file with -ed"
+"$meta_executable" \
+  --input "$test_data/example.bam" \
+  --output "output_ed.bed" \
+  --edit_distance
+
+# checks
+assert_file_exists "output_ed.bed"
+assert_file_not_empty "output_ed.bed"
+assert_identical_content "output_ed.bed" "../expected_ed.bed"
+echo "- test4 succeeded -"
+
+popd > /dev/null
+
+# Test 5:
+mkdir "$TMPDIR/test5" && pushd "$TMPDIR/test5" > /dev/null
+
+echo "> Run bedtools bamtobed on BAM file with -color"
+"$meta_executable" \
+  --input "$test_data/example.bam" \
+  --output "output_color.bed12" \
+  --bed12 \
+  --color "250,250,250"
+
+# checks
+assert_file_exists "output_color.bed12"
+assert_file_not_empty "output_color.bed12"
+assert_identical_content "output_color.bed12" "../expected_color.bed12"
+echo "- test5 succeeded -"
+
+popd > /dev/null
+
+# Test 6:
+mkdir "$TMPDIR/test6" && pushd "$TMPDIR/test6" > /dev/null
+
+echo "> Run bedtools bamtobed on BAM file with -cigar"
+"$meta_executable" \
+  --input "$test_data/example.bam" \
+  --output "output_cigar.bed" \
+  --cigar
+
+# checks
+assert_file_exists "output_cigar.bed"
+assert_file_not_empty "output_cigar.bed"
+assert_identical_content "output_cigar.bed" "../expected_cigar.bed"
+echo "- test6 succeeded -"
+
+popd > /dev/null
+
+# Test 7:
+mkdir "$TMPDIR/test7" && pushd "$TMPDIR/test7" > /dev/null
+
+echo "> Run bedtools bamtobed on BAM file with -tag"
+"$meta_executable" \
+  --input "$test_data/example.bam" \
+  --output "output_tag.bed" \
+  --tag "XT"
+
+# checks
+assert_file_exists "output_tag.bed"
+assert_file_not_empty "output_tag.bed"
+assert_identical_content "output_tag.bed" "../expected_tag.bed"
+echo "- test7 succeeded -"
+
+popd > /dev/null
+
+# Test 8: 
+mkdir "$TMPDIR/test8" && pushd "$TMPDIR/test8" > /dev/null
+
+echo "> Run bedtools bamtobed on BAM file with other options"
+"$meta_executable" \
+  --input "$test_data/example.bam" \
+  --output "output.bed" \
+  --bedpe \
+  --mate1 \
+  --split \
+  --splitD
+
+# checks
+assert_file_exists "output.bed"
+assert_file_not_empty "output.bed"
+assert_identical_content "output.bed" "../expected.bedpe"
+echo "- test8 succeeded -"
+
+popd > /dev/null
+
+echo "---- All tests succeeded! ----"
+exit 0
diff --git a/src/bedtools/bedtools_bamtobed/test_data/example.bam b/src/bedtools/bedtools_bamtobed/test_data/example.bam
new file mode 100644
index 0000000000000000000000000000000000000000..ffc075ab83a83a98ed1edbf88b26cc27ad8946c6
GIT binary patch
literal 334
zcmb2|=3rp}f&Xj_PR>jWAq>SuUsA6mBqS7Y@IB%Aw%O~PhS4S?6Z1_bX2zRMuCZ>`
z;o;@Ato^fw$CpQUheTtRYNNz-r#8JXHa3Ry>s4lk0?m>~GxQF_-U<7&m>dP#pU;|5
z)~CHK)-&PMX8(zQnRkjz7tzu&Q_9lpm^;_nXXDZb**~)OH9hZA+GbYw!F2!1eU^u&
z=6?J8db>^n+vnS58VqGOpQde!^LhT^FPno$sK1R;RVb&i_o5|>-LG;Kg??MHCx&;~
ziZww?R#r16X1LX_ZFYQZ=WBLl9Y-y@V*W$>-;Wo3eOwoN_@m-GsXhDI<L7O#D!rR*
zAbon#WHIeSf_p=Iw@vX;QcA7~nW$-K`s?tN6@ejF_co|EotL|LtN#%rrst#?n85)E
FA^@Dlf^q-=

literal 0
HcmV?d00001

diff --git a/src/bedtools/bedtools_bamtobed/test_data/example.sam b/src/bedtools/bedtools_bamtobed/test_data/example.sam
new file mode 100644
index 00000000..4afb0aef
--- /dev/null
+++ b/src/bedtools/bedtools_bamtobed/test_data/example.sam
@@ -0,0 +1,3 @@
+@SQ	SN:chr2:172936693-172938111	LN:1418
+my_read	99	chr2:172936693-172938111	129	60	100M	=	429	400	CTAACTAGCCTGGGAAAAAAGGATAGTGTCTCTCTGTTCTTTCATAGGAAATGTTGAATCAGACCCCTACTGGGAAAAGAAATTTAATGCATATCTCACT	*	XT:A:U	NM:i:0	SM:i:37	AM:i:37	X0:i:1	X1:i:0	XM:i:0	XO:i:0	XG:i:0	MD:Z:100
+my_read	147	chr2:172936693-172938111	429	60	100M	=	129	-400	TCGAGCTCTGCATTCATGGCTGTGTCTAAAGGGCATGTCAGCCTTTGATTCTCTCTGAGAGGTAATTATCCTTTTCCTGTCACGGAACAACAAATGATAG	*	XT:A:U	NM:i:0	SM:i:37	AM:i:37	X0:i:1	X1:i:0	XM:i:0	XO:i:0	XG:i:0	MD:Z:100

From 2f8bf020dd5f391bbb3260ad33fa73b6094cae19 Mon Sep 17 00:00:00 2001
From: Emma Rousseau <emmarou1@icloud.com>
Date: Sat, 26 Oct 2024 20:39:23 +0200
Subject: [PATCH 37/42] Rseq bamstat (#155)

* initial commit dedup

* Revert "initial commit dedup"

This reverts commit 38f586bec0ac9e4312b016e29c3aa0bd53f292b2.

* initial commit

* add test, test data, version, help

* Update CHANGELOG.md

* adjust argument names, reduce test data size

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
Co-authored-by: Kai Waldrant <kai@data-intuitive.com>
---
 CHANGELOG.md                                  |   9 +--
 src/rseqc/rseqc_bamstat/config.vsh.yaml       |  59 ++++++++++++++++++
 src/rseqc/rseqc_bamstat/help.txt              |  18 ++++++
 src/rseqc/rseqc_bamstat/script.sh             |   9 +++
 src/rseqc/rseqc_bamstat/test.sh               |  49 +++++++++++++++
 .../rseqc_bamstat/test_data/ref_output.txt    |  22 +++++++
 .../test_data/ref_output_mapq.txt             |  22 +++++++
 src/rseqc/rseqc_bamstat/test_data/sample.bam  | Bin 0 -> 9240 bytes
 8 files changed, 184 insertions(+), 4 deletions(-)
 create mode 100644 src/rseqc/rseqc_bamstat/config.vsh.yaml
 create mode 100644 src/rseqc/rseqc_bamstat/help.txt
 create mode 100644 src/rseqc/rseqc_bamstat/script.sh
 create mode 100644 src/rseqc/rseqc_bamstat/test.sh
 create mode 100644 src/rseqc/rseqc_bamstat/test_data/ref_output.txt
 create mode 100644 src/rseqc/rseqc_bamstat/test_data/ref_output_mapq.txt
 create mode 100644 src/rseqc/rseqc_bamstat/test_data/sample.bam
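
Note on the optional `--mapq` argument in script.sh: the `${var:+word}`
expansion emits the flag only when the variable is set and non-empty, so
bam_stat.py falls back to its own default (30, per help.txt) otherwise. A
minimal sketch of that pattern follows; the `build_args` helper and the
`args` array are hypothetical names used for illustration and do not appear
in the component.

```shell
#!/bin/bash
set -eo pipefail

# build_args MAPQ
# Hypothetical helper: fills the global array `args` with the arguments that
# would be passed to bam_stat.py. The --mapq flag is added only when MAPQ is
# non-empty, mirroring the ${par_mapq:+...} expansion in script.sh.
build_args() {
  local mapq="$1"
  args=( --input-file "sample.bam" ${mapq:+--mapq "$mapq"} )
}

build_args ""      # no threshold given: --mapq is omitted
echo "${args[*]}"

build_args 50      # explicit threshold: --mapq 50 is appended
echo "${args[*]}"
```

The first call prints only the input-file arguments; the second appends
`--mapq 50`, which is the invocation exercised by the second test in test.sh.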

diff --git a/CHANGELOG.md b/CHANGELOG.md
index dcc783c9..5f720035 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -16,16 +16,17 @@
 
 * `rsem/rsem_calculate_expression`: Calculate expression levels (PR #93).
 
-* `nanoplot`: Plotting tool for long read sequencing data and alignments (PR #95).
+* `rseqc`:
+  - `rseqc/rseqc_bamstat`: Generate statistics from a bam file (PR #155).
 
+* `nanoplot`: Plotting tool for long read sequencing data and alignments (PR #95).
 
-## BREAKING CHANGES
+## BUG FIXES
 
 * `falco`: Fix a typo in the `--reverse_complement` argument (PR #157).
 
-## BUG FIXES
+* `cutadapt`: Fix the non-functional `action` parameter (PR #161).
 
-* `cutadapt`: fix the the non-functional `action` parameter (PR #161).
 
 ## MINOR CHANGES
 
diff --git a/src/rseqc/rseqc_bamstat/config.vsh.yaml b/src/rseqc/rseqc_bamstat/config.vsh.yaml
new file mode 100644
index 00000000..6d607e2f
--- /dev/null
+++ b/src/rseqc/rseqc_bamstat/config.vsh.yaml
@@ -0,0 +1,59 @@
+name: rseqc_bamstat
+namespace: rseqc
+keywords: [ rnaseq, genomics ]
+description: Generate statistics from a bam file.
+links:
+  homepage: https://rseqc.sourceforge.net/
+  documentation: https://rseqc.sourceforge.net/#bam-stat-py
+  issue_tracker: https://github.com/MonashBioinformaticsPlatform/RSeQC/issues
+  repository: https://github.com/MonashBioinformaticsPlatform/RSeQC
+references:
+  doi: 10.1093/bioinformatics/bts356
+license: GPL-3.0
+authors:
+  - __merge__: /src/_authors/emma_rousseau.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+- name: "Input"
+  arguments: 
+  - name: "--input_file"
+    alternatives: -i
+    type: file 
+    required: true
+    description: Input alignment file in BAM or SAM format.
+  - name: "--mapq"
+    alternatives: -q
+    type: integer
+    example: 30 
+    description: |
+      Minimum mapping quality (phred scaled) to determine uniquely mapped reads. Default: '30'.
+    
+- name: "Output"
+  arguments: 
+  - name: "--output"
+    type: file
+    direction: output
+    description: Output file (txt) with mapping quality statistics.
+
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: test_data
+  
+engines:
+- type: docker
+  image: python:3.10
+  setup:
+    - type: python
+      packages: [ RSeQC ]
+    - type: docker
+      run: |
+        echo "RSeQC bam_stat.py: $(bam_stat.py --version | cut -d' ' -f2-)" > /var/software_versions.txt
+runners: 
+- type: executable
+- type: nextflow
diff --git a/src/rseqc/rseqc_bamstat/help.txt b/src/rseqc/rseqc_bamstat/help.txt
new file mode 100644
index 00000000..b4e9c1d9
--- /dev/null
+++ b/src/rseqc/rseqc_bamstat/help.txt
@@ -0,0 +1,18 @@
+```
+bam_stat.py -h
+```
+
+Usage: bam_stat.py [options]
+
+Summarizing mapping statistics of a BAM or SAM file. 
+
+
+
+Options:
+  --version             show program's version number and exit
+  -h, --help            show this help message and exit
+  -i INPUT_FILE, --input-file=INPUT_FILE
+                        Alignment file in BAM or SAM format.
+  -q MAP_QUAL, --mapq=MAP_QUAL
+                        Minimum mapping quality (phred scaled) to determine
+                        "uniquely mapped" reads. default=30
\ No newline at end of file
diff --git a/src/rseqc/rseqc_bamstat/script.sh b/src/rseqc/rseqc_bamstat/script.sh
new file mode 100644
index 00000000..32927bb6
--- /dev/null
+++ b/src/rseqc/rseqc_bamstat/script.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+
+
+set -eo pipefail 
+
+bam_stat.py \
+    --input-file "${par_input_file}" \
+    ${par_mapq:+--mapq "${par_mapq}"} \
+> "$par_output"
diff --git a/src/rseqc/rseqc_bamstat/test.sh b/src/rseqc/rseqc_bamstat/test.sh
new file mode 100644
index 00000000..f9180da8
--- /dev/null
+++ b/src/rseqc/rseqc_bamstat/test.sh
@@ -0,0 +1,49 @@
+#!/bin/bash
+
+# define input and output for script
+
+input_bam="sample.bam"
+output_summary="mapping_quality.txt"
+
+# run executable and tests
+echo "> Running $meta_functionality_name."
+
+"$meta_executable" \
+    --input_file "$meta_resources_dir/test_data/$input_bam" \
+    --output "$output_summary"
+
+exit_code=$?
+[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
+
+echo ">> Checking whether output is present"
+[ ! -f "$output_summary" ] && echo "$output_summary file missing" && exit 1
+[ ! -s "$output_summary" ] && echo "$output_summary file is empty" && exit 1
+
+echo ">> Checking whether output is correct"
+diff "$meta_resources_dir/test_data/ref_output.txt" "$output_summary" || { echo "Output is not correct"; exit 1; }
+
+#############################################################################
+
+echo ">>> Test 2: Test with non-default mapping quality threshold"
+
+output_summary="mapping_quality_mapq_50.txt"
+
+# run executable and tests
+echo "> Running $meta_functionality_name."
+
+"$meta_executable" \
+    --input_file "$meta_resources_dir/test_data/$input_bam" \
+    --output "$output_summary" \
+    --mapq 50
+
+exit_code=$?
+[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
+
+echo ">> Checking whether output is present"
+[ ! -f "$output_summary" ] && echo "$output_summary file missing" && exit 1
+[ ! -s "$output_summary" ] && echo "$output_summary file is empty" && exit 1
+
+echo ">> Checking whether output is correct"
+diff "$meta_resources_dir/test_data/ref_output_mapq.txt" "$output_summary" || { echo "Output is not correct"; exit 1; }
+
+exit 0
\ No newline at end of file
diff --git a/src/rseqc/rseqc_bamstat/test_data/ref_output.txt b/src/rseqc/rseqc_bamstat/test_data/ref_output.txt
new file mode 100644
index 00000000..6b939096
--- /dev/null
+++ b/src/rseqc/rseqc_bamstat/test_data/ref_output.txt
@@ -0,0 +1,22 @@
+
+#==================================================
+#All numbers are READ count
+#==================================================
+
+Total records:                          90
+
+QC failed:                              0
+Optical/PCR duplicate:                  0
+Non primary hits                        0
+Unmapped reads:                         1
+mapq < mapq_cut (non-unique):           0
+
+mapq >= mapq_cut (unique):              89
+Read-1:                                 45
+Read-2:                                 44
+Reads map to '+':                       44
+Reads map to '-':                       45
+Non-splice reads:                       89
+Splice reads:                           0
+Reads mapped in proper pairs:           88
+Proper-paired reads map to different chrom:0
diff --git a/src/rseqc/rseqc_bamstat/test_data/ref_output_mapq.txt b/src/rseqc/rseqc_bamstat/test_data/ref_output_mapq.txt
new file mode 100644
index 00000000..be8af62f
--- /dev/null
+++ b/src/rseqc/rseqc_bamstat/test_data/ref_output_mapq.txt
@@ -0,0 +1,22 @@
+
+#==================================================
+#All numbers are READ count
+#==================================================
+
+Total records:                          90
+
+QC failed:                              0
+Optical/PCR duplicate:                  0
+Non primary hits                        0
+Unmapped reads:                         1
+mapq < mapq_cut (non-unique):           6
+
+mapq >= mapq_cut (unique):              83
+Read-1:                                 42
+Read-2:                                 41
+Reads map to '+':                       44
+Reads map to '-':                       39
+Non-splice reads:                       83
+Splice reads:                           0
+Reads mapped in proper pairs:           83
+Proper-paired reads map to different chrom:0
diff --git a/src/rseqc/rseqc_bamstat/test_data/sample.bam b/src/rseqc/rseqc_bamstat/test_data/sample.bam
new file mode 100644
index 0000000000000000000000000000000000000000..ed1e24333ba1df0efa75401d7928b790038b762a
GIT binary patch
literal 9240
zcmV+zB<I^7iwFb&00000{{{d;LjnM80fmy^PQox0#*25wm)Hxe<5ULHs|`qyWSi3o
zw@dd2T*6jd8#LbgV7{4Qz@|9#?rYEK@9TGR#<tt}yh6yjo8qO%fDCYO&tf6UBrCW|
zyH@ak1CO~+FrveONdP+@qoZ3o>ROL8JfAYa&X{eo2(a(4x#KL{xo6|RWh#{l`wJHF
zG8Rb+UCXZ?<XMsBd`q$KHG1hWN?@p$qdwq?Qx`OyziG16_AqHATybN?rQ(L<rHsXn
z8ncNV_5jSjY4%M5p&hu#(+<LQT8KQ0$*SPzh!-O%8cYbJx+LrOe;R4scnTzWu7udg
zeV|7BKf(clD%Yv5?XvV$(}PH>!F*E?$@6n6fpc!H+qhDcO4CJy-SVtVlLK9pDosel
z^VV|IVoooa6FAt@UQ4X!YKCDo!o4C#m$XQ}ed3qd%|$c%r<opgI|!j`giy1n=tCj&
z1x#_s3g7|&03VA81ONa4009360763o0F5HGeGRN7Np)Vo{qyd!bY9o=U5af5cXwL2
zSfknZ{{w4lx?dw2JJ@JPvW>+O@P<S|Kv1xg7-4CuQ-Kmj5e{px94T36?F5Th%8z0b
z$tYL>;z%eEf)+$#Wrc{9c#{alYg<Cr;_uW?cXi+U?tSm=`0mV``>OlCnfm%W=X~d!
zs@iTVGOxdj-^rtIU|Z4C(Z?hEx*yk`!=JBv?6Je0qtnxa)9ZVuM@J{o+wO}ZDY@i{
zNHkM@r&wASRVrEE2`+k7w4%|h)x9peP8ChXi#{!iUbb3yW!rUiStqSbd7GxC&htF3
z;>y4B;a!~3IId*$CC@8+UmVjLb8AdD+xK*dhUq)HKgMktE@zqfNpyFPvAl{G=1+P@
ze^*9d@g76;ag5*%?tS0x*>AjW?<{)k^*^~A{m2{7w(6TV%dNlqj=%OVe*V9`_HVuG
zEPCSIXD@xvd(NVtesBGzcfSwUo(piZSl*1@9F6!czWd$pE@Ao6yQ8QCD5BLY5oyAT
zqUsCQr&^UtYnFC>Q&&n<vQmv!^$q%$#tRDsO)R5tX^Qm9fMtgN7Bo4`5K}dH`rj6^
zyvp47vHft4(WMFBS>HT+67#+DzWaZ`&G(*hzGqLq!BMBbBZ`=r?h^xbp2OI8r_|Zq
zJvceuG2BVgx=~$Q@)RVg%DR_HS7^m_+SFAlRKpZ6HLp{y`=m~osu`0>A)8cnoh&Pr
zh(c8=6_Vxjialon>P)^Hm_`4s9M3aiLY@yJD{k{w^34BrS(aNYX`J5Y^t-A6<+np9
zBXH-{#2v!MVYvvn^Xjc1c3}R>0Or%tCmfg`zcY$%ZcV{FI6gf+iQWq^xk%EU>srXR
zWObw4Le#yGDyg+D(j*aWQiz1{wCGx;S=rUB=S_z@ld5jpLKdPDysLOpZ0F`xEgXP`
zWck9rH;gi-X3f=s?(H@+w}VF+Yz@D2e2MX0j)B;H+TWg=5p$DaZRKDzhMg_wQf|4N
z(<m|XYjabq2+l9JU-zbXEr7Gy^4xrGfb(MmH;J!XQ&SANc_E~j7T~X_RU=CwRH_P{
zq)ha(VI8QQqynELs^l4Q#9AZ0H?Nq;f?syxG;>QS`aZJ&Iaic>kOwkmBQbO15<M4J
zQZ^obgt>)Y7vW|y;pQI(JhWimjNTfJeEp3VreIobJ`2aDrLI9ffTyGsjUq^S-S9fm
zAgU~?nsvQOlDYw3bz^2%#b)w`AOw?hR@uUFGPnLZvs|Pv3r|%?uzUtXUipcCZ8q_<
z-=5C<C4qnYZO@_qGE936jW1yAKg|5w;o<4Qf#FccSesN@)Jd8OUUOB{MX5Drop9A8
zX##V_K*8HKQGJmhC~{p2ndp)y@OXmvNtv`Akg@J<@JZo=9pBC$yQ4xJQllElM3*uO
zd6s#`iJ428wKw+xTo~d-wa8-6QVavi-?|)!JeV&89P(fiz&|iE$PrAHZ~)U52m}Zb
zu(3q7HSbD*sHE;0FzpM;i<C)Sup(`NGu$4+2ycbvZI>px;sU~?#01E}<fH&dOf2FB
zoQ-wahB&xjJ>U=vtXWw?#_mZNX9h?dnjZhy15ZDDu^1mpHZ{$~_>?>^e`}M_xPKxv
ze6V=MU1Q|+!Q%JtvH~Oo^W@<0V9x}L8F=Lsyks4Csct%f57xX%@JDyUA!Qjjc=tT8
zMS{$IYeR<>bYM_y@9?2B_M|h9g5rn$u@oANz#((bO-!+1n!r(+`;2ajP0*n`2+n^L
z!g&Yae8>iipLo>4`9J{YLtEc=(7hpm^DV*HTV{rSbb7eEAAJ`7ZDy=>k~A7IK}*%j
zn)h|Ddf6A9s(Mx=Y6R@|y<*2F%=#^J=l1FIo!h4;bKbE5GMw|T&g}83b3Xr{{i2ZY
zhLdB?Z&@?vJHt8ur@%zKC}^V5l>T@Y$E?}pba&T4O?5J93ZW|U*QBy)Qe7~YR+UD_
z!0;E&j~Zo7QG8)Ng;6~2-@PzSV?pn2cqZ|^L|b@)vkzc0@4D}2J!|g`Ci8*UI3KY6
zk}*1c#7!o_*hglG+u`xa-fr~Sd!wikoTVDt1_8KEWuj7Dl}gc^Y9+ctLNjXEcq$Q^
zHPeh0qHVb1I^}iOrV^s3kQXL%ew*&8mIJ&09I*S@c{t*0!V%v(-5QZ%yEYvCd^95Z
z)3a2~qW3R>UMVHWortPqY0|+RB0m#Ia#fS^M)f>Nm{y2G9Tzp1LKY=cZJ}!|Yskd9
zgI{HQo8k{;993@9&)kI8+=i3PIKV7+j?SgO14wa*f^n3YduJoj;u3gT;t_Z|A$SkY
zz}pJIdvNP7JMjMI1Iyv(+DoJ8HPi63zjJ!9yJNynogyn~iARd3eNrT%Qyt<?T_LqY
z34uc>F-hKqCAw3{eUq|Jbc#YjLQtl<>nT*#sj|6>OGp+r0gHoTzz}8wt_1vX?*>0K
zYJeeN=f<Y}{WkiWFy$Ve2~Tkr+pIFS<AC!f+nST{`V7wp13a&9{m|Bk-8@+#jK47T
zOAb%cMw)9Fs}w-4;|w|}A*l&VPU=)3jCTp#G_o1EF5aYNqO?TFXDGYW$Z#5@;Sim&
zYNSX^fX2Tc<Rj7<>$YDg8Os(lRZ8V@r?9gT#60xpappBS?zA~w7H9Dw%wwmfyl5JW
zIaVYr7C!H^UU7!zDsjXn;^@D6CdNQBdIO81UyI%mj73w&xW9LBa&#C$)*zcp;MfQZ
zRhK;yI@B0CP_D+?sxAsClt>usdtE5rS+{NJXPvtFM>{=3zTfH=%Ot}fnyX2R4O@)F
zq&rE6=N5D;m~3&%pl0Sx^YjAROsgW2ym>UbbEic{C&;xorR1G7nMcZCU!sRY>KYVT
zFeJHXSe<IY;P(s7;rK->wN}Vw896*H=x*KJFlx&%ou(1nIoJmln;B;|L9p4GsQ}QN
zt$=fOWWmS`2wo@JpF)UQikSkJ7lBT4$M&Pv3h5J>F*xVLjzX5aq`rCf1(4+X`7ip?
z)m=gG{DK$I_iitHu*bKTs-FADkUyse%m#8<%C7Bdpb8~~TBeo^VYAho2?h&nMT-I!
zY%)<Itstd!4Na|Jhp<w77SR+XthP^Y++IDHZ@x7B3+I(EbIl6qXD{c#?sjm#Zh-UY
z=$+9x_RblcM@Of}htX5_fSYhqR0=EvL!Q^jriPkS$UwTLU@3wC=Sc|(f|_yBcF0yx
zC1n@IEx*E*=?hT2ZLX@nIz{5XfLC9g0P$KYD?%VXK9v<O7Kn?^X|eLIEU)qlVBrmH
z6A+<+ebsxb$G4Y58X4VU281;*Q-8bY<dv-36vY&lDg?2l>*#j)DL5;JPJ`ycQrc)M
z#~4>#t87s&cg1px#>JpqgVW{<9?EAo)lUAxq;}%<){n7eZ}km~Jv7x@2S<kohc?}3
zf)u?hnvN%`lReZt96z{Ll&B9gcz)#jB^StQ(0>q8l0567zZ86xVAPQ4I7$ErhBt1T
z)p@NDpZ^*AnDU`?nNx_h?=pLXuQb3!Z{EFix#F<bTi;;I<RpxLf2y|}o<z<8jT-ui
z6X`fdRXCNrlqJGyQ9|W+h@>3lJ;f5bvjE~f*o<tUZlPU~E`p?N1|~v3B?Cb(V;z`7
zDo|A`tAkg0Idaa>T*c076Lxwu|N6D%GU**@6z$H~d3<`Zb8O3`=y(-fiZ2>6zbL8#
zVyk7mD|lT<P!MUVLaj<j^t}{XMo=)|u1uIf&8kNwnFMa3HHop?Us@SoS?D!=em??d
zE2kuEhZ}8e4`}j@Xr?RCB;@9GQbPDkGj5J(LbU=mAsCX;w9&fmm@LVJ>t3W<3)J#D
z!nQ%0&{IDxL7Ko>$r2<7EkukKxPn}@mR_Md1GK^*cFl%($PB6p1k2xD`hlDqjyyLv
zqxrIxRp&oQm$`X;5k=2Tg~{O#`anlEvtntAS~9d;(m?eKtrT=yuhUYBE|G{jvPIz}
zB^?OUW>iICx<e6_Hw9HndPwP{?u1Hi3pZE%fDJ#^6|FOy_nLfT?1dO<^QBE-vI&@H
zTO)9un*T5w0e*NI(;b*OtrfT^RLHj|jtgaspNauUS0S-eJOo%da~Ua3ZxL8!=4U*Z
zW$cG85ATH^m5=62FjiaV7G80!b1?q<0mdhyUvw}&kFhV#IJv)fdVCta2{0Bg&O#te
zs|2{A@`^|TYb|vLn{Q=Oq=Lac_OgRZZPT8CwMg((pB5z(O`qa}LZ*f4w$I~ad^;kE
zP2~*TY&z&dH|4ka&>OIf%!3<%<>%&DKD;LOJUrO<?<|(>`=JM;=>N^IJlHwhvoiCx
zLCN@SiakCZ%O_-9LCnnc;RbJ+Pd%so*8t3KUM`UPr1<#(%#RH+lgN57vu{gGl9`o4
z%bO)A3n!dI4Z(Knjyj&5R1|R-<fcGHv_`Y8;V3(62@@A6`9VCjlFKs1PbdUW*tTf~
zWu_Kl+I5-3#AYit;m@pjXKv!L9nbPn#1En<4iKI-9W2)uffCKDRrL3E{<%#_acJk)
z2RkPodUxsV-0*b<r{yNu`Bkw)mLz&ectooXJ!e^0H*()mv#V45QmBRN<T}@BhATH@
zTEm`co>}>drg{J*esH)3XA#_!`|6JElsYWGxyjDASK9fF2bY%`eF9_uXJ+Te`^P&r
zj(6-r!0{PysEkqVKIzInP4FirMdshWsBYIQUOl`a5-lzB0GQ9LY<K?P|9bPIEA0HC
zfta6oaI|wzOaeHa5z~T6#6<9>wsFH#)+#}*p(O1^QFHv+l~S{oce)2$bpq)Q?TJ1J
z)OyhYOX}fscv(~t(KtZ>%0_D){248g3G}wFKnK)XkIvPDmnY^kn~3?+2{FInaQn3Z
z%cr9^24ioW6Z2^Q<RtnsT8b9~@s=l@?@|O+JIlr4`$gq%u~k+J<pdSX?+52qoL`<!
zFIZ@B_REvm?O7KV&hFpLnR40L`30QW<Vj?!pV07?Jj_(6P<bG~!0fid*_F-?-t_YE
z+5KEJj{V&9*&Xff?(XeJU;dsbLKzn&r!p-P$y1F?3fTfm!Ma7hm2xCvNEsyRQw4{B
zKzoKuSPq6!$P%f0rokheJ*Abe?EIl~tzrmkE_E}e1eAVej?#CgD8&Iv-<hDaV>u9_
z^!KJ5xP2(iCX;6)xmV0~r8mv5;t#wh*aJ#Gvj(MmLzKKv{o-F7xit9HS47bZGcFzO
z?(d%%pNd)=HCvz~(RV?Ll+>z9Q{+de1QTj0D|mwPlBmlRr8`;l9eSq<SmIXX?IlP3
z4tA!v6ZJL_$3sKYMQ#g3(0QwTwiLU1n)!%r-{roi?0e1GaJ%Vwl8eq<#lx6zZkFfJ
z?`)ATVz>B={5p{QAM^5DK5O8G!9L%rw?^Cu%R9ihe-MrM-Z5jG#j?fKr|!ik`WY^0
z-ifwHf=ObZdSCS7G%mnBR7mSaa#|vbZZlVKQRtHL)GiTU*mdK6wUYH!rbI`JIOE(2
zCkv6yul-W>CG}~0S*D3>7QQ-uI^>7Ge&E!6@x{<_3B>ye#P9sy=T@ynzkd+(ul(NS
z^~6tMY<p`;$djX;Q`e*K(*{iJP&(EvQLREfvq{ltRv=lU8OlWrKAaa!>!gzCD%VKa
z3v@<hrwWD+bSi4<B&K1Sc}YWDl0m~jyy`~tH7v_56{v#y2+lv<B<8b;nD+wosqbAb
z%MrxieqqW<3#UmmCx|3U^+Y=khu2GVD>O37rf+H4qS7^TP&mW{T;v_TrMnW{q4S*d
zit%}JIGKG`g+<AHCJ5HR99PR2U-0F*`KOyY^w%aGdSB!GkL7aK{-M7VML#~}<}n)C
zN4EanaCpFyTG2XHMTe}j=-~<(8a}iR0vFI1GV5=6(P*@0DaHUsw2<IR(F^JEk*4U_
zZ2eMxvB|ILSOh6k?%&LQcy9X`uD4;odjINwGHRULm#i53?hqm8sp&195dQTkH65B{
z=?Mz(O{oAMa!n+eD6%)yIP1~k5NHvh9D)iv08UdvnrRSJ0AeNEqG}ML>IBXLrMa%V
zWtL>ybt86m9tby4JJUQQaK4#zSMweI#{=w@c&=||dsF}W-?v<!yGumTgHv{%9Pga&
z+4QJj$Sfsmd(ckTigk$J0{x4+P>NPq2tjKv8l99PvAbJX&-*UvMACC=D<Mg3DK+i9
zL=1z$42gJpp>p9iSR9u5J9GaKsuif`!H_K(=RWM1ggy4r$&mA&txfq+2<F3EH)mkp
z8MH2+!Px(freGc(9qt`R+xO7YohW+SomT)=ib7X1?IlN(NDEZxsIiXVRcv3SQWq{8
zb5SM_3vTgn#TKYyw}6EzDfHf~m8snNAgXH7?lYs1OdW>Mg$MooN{Mt%2QwdFeP)1l
zEY%UzFPlmv2WzxtVSO*jq!JYn+9Pxx1$#uHu9LcjVv-7lJ*6P$Q~ZMdWNDeeMbz*S
zsk}v6iB{AvvSBe-W&|SSQ)Xo$Z6L6u%)Qvjja14)O5+2roEMwuWfwmZtL<%}O_1E0
z7x~=-ajpG1tak#|2ME^x`SxE~wGjM)t>1GH{!w81KkCLRnEv0MvGwHe^yn!1HuBv$
ze!A<?Rd46`-OAlcV6%m4_+@l<Y=?52yxU8$UrT3b=oXGyg>P=g6n|`1{5ktDX88Z>
zFZmhX9i%BA_OXKe*u2C`y!nG^tZ=@~E+QN4i1kn_94#7j4AHqkoTU;Q8a29>1zI{F
z1w$g&aRivIZ)mM}S{Jp@x&Wh^TQ*Ih#bS8gM3SK;VAsK#ydt*$+1$j5<;-s895dh8
zN^V;z9#$4E2XrUA{WrmDVMZ3Nzj)6jW5qK8IlLXdE+El|XLe`_^DNw2Np^_en{X7#
zjfLfh-3yyrc-)e^$}KpzgO)c=(k(m=X7PExiYd1+wU7MJJ6A3*f6OuE-->0&{nl3@
zIh#ho)BWRvy}js7cOyN!BQk5|H4@jZk?2L2FbuQs0Y(A!i{uQxyF(7!B7s1)Rr5B%
z-AvWEp-~M(`kN*#I?}j8otImeb==E0+j3)E`CW`5QS6G<*wq-an3kZDpRy$fvs210
z%(Tn;Ghd)uUM-Q=H_!ff>bPGPp!!X(Exsw1mG50Z_1{e$%<1mw(ebWPzUYH0po88N
z+WWNRkR#G5hIFFW$Q%#>p=njkk&cNr1;7FgE~LD)L;`7ss;Q%(V8Y2ome1vRHD6XT
z&8C7sgRFHiemu~__R<hEX>YP`LXUfX$d!G1lknv$`&9(oT=){eub&#91Ji7oy)t%b
zQ6OJ~FKlUL1((H*05UhF#Z+76PL$=<nml8CAgha?)kWzaY%2ToCSP{n#FzQl@YYu?
z#|H8OpPj}AXYmB*dW29e0ACvzd&iKCb?5}Bx-MF<SQNa12rdv&KyRkac5b5<oDR)w
zinPf6E6a!NdIn3gjn-+j2`lX@j=hDC3VLKexq-zRAU$EvTNkxHHpC9kPdz-^`g<Lo
zUwqhl{}9jP{nP!ODEc0<bqHPb|4=4Crx_tr=rW;vm-glr9JK;j(6*8e-0dl~twj%C
znJy&_S9`_hjOU|RvwsUX_KtZe`Pp;3C1=n6UNC##PVn-IQ2OZ1U@amfubA%9i)&IU
z<FTY?ZAUr3pnVOZKm#A#DPXBLve;=Rzc<4#Oken;A?)>h!&FBjddCKN6?POp{Bpm^
zb$H}YM1RLk@UMJ-6#dEUk)IqL9Ka*tkx%D=vQJm0DaJVEJsQdheK&;)9H=4j8(|e1
zlPiG=kz1G<PHD_ZmS36M!ZJS9nQI}yOw3!@<@SF2rjy&W<e#==W&r1R=Ns0(1#rIa
zsrw#Xu{Qc!KK2n1^HrAw*nP97Z^5D1$J>rensbkik}5ciL-pv|qOD5>Apw<1?eY?D
z70d#@hPE^SIw=*J=c%UE`W5i1i+nMz_s!e}dbj=Atwu3fNjxl3G0QT?9l`^!@v!M<
za{R>TgY3K6$rQ`c)bs3jH^sgOS1uF$&j9n<tRYTgAD?Ym-Pt`jI^DM&alv^b8q{i0
z-EDv@DvM~!aD1sHzyiwf*R*}87VtgbUD2n4<QkLaRF0y75h{$^j=1wU3%kw3ZH%^{
z*hXC(`$~B3>FLo3iRyolJC|j~UUz^*Y*ub(_Fxyc_bYQT&ul=kN<v?5W!nVJohdbG
z?47gCumPIwZA8-@Qe$Rl>S>sonXTDuV*^?4uwBHCU%qjZhwD@qS?*aL1iqo85E}c^
zyL@;+o|^^G)M0T#%*wlW255?{S2#4kJaF^r=>5?+_Wn6H_jZntkE5sH2457ImrD}z
zOI{A_c)pF0FnZ5C5I%qIivF|bCkVMf7$Wq+Ip1y*LgpW9#MFW^NqXng9=8G`b0u28
zlq$}r{ljpBW$;>rmpthfcG$Cs5PH=bgu+Pr{OcV;-K$6c_A!Uh*D&_Vtr>@oj`j{u
zqaBkeg+^5?<e)7I*hsUf0jPR9k)+a4p`8YNC};?sLQ|C-EyJ#Ddd-ubnhH7}0_$or
ztW1qO5ME+Vx8ArY66aw+$YW=m*-%#;zUE<9(C#i>u4lh;O@H8(gRcLluO7|O>-x`N
z>|f8#@n{cCd?W1Fp(7H(NSvm0Y*kk%D3qqCsq#hZ@o5b@LN>aoTA(l{+FSW$bNH*i
z{NIxW?HwB-7R!G<5%ZsK()DcRZrxuG@btmr>lpj|5}wC9`>uVZW!<Cy1U0KvE6}jw
zP%l#Sl0oaWN_G-e5M=pUx6pqb2de1Ipc9C<rQoP97Lwtq$YO+Zu~@kM-|;9s#$uOC
zIb(OykR@C`ax@!g=Z@vD>j7M;*0s2egwE^%9Con|YQ35JH{d=xg^c>*K=V_ZLhts<
z?$u`oG{+>K82cl0S2(isMPn%XRN>?BrniP#gg3OQ#}5dzB0tY(x8>o&9<R0Y>T=}w
zZgV3$@1C&pja#FWyf(|?$(erRg*l$aahekEi-J(&u3o1NZ5x($6LXM|8;&jfAqROr
z^qQQ$F%`COWsfCnyvD~)9zcHnr74~lXa}Y%&Hd{ELA|+?3wZ4kK?ifctdC4MV^kvN
zv~yU~T|%~i8-){Nx;9vwGyg1CArMVT*dKf75hsiEi*D`7Y`K%??$n}uvRE)5<ar#A
zhe^1{Fpc!%@*+piPi<=b-!p0b`y}wkua9xvH&;J|u`kS>=gG<O$)WA8c4@C!-=|1_
z6Y4>vvgQTaEYw<Qkt%n{nQ74}BvB70(jp4lh7Fai5;_Y3sWs{ds5!fg*`EGr`zo1>
z%x(?zK*YEjf)?e4057u(d}4Q$k^gN{6N_aweAM1%ma?0y*>83qmRVzT5!=!-f%%&&
z+f~2w$zOeR#lFx725Nrj`e^H(ngsA`=G3%c+Rb9?K}#)Y%7F&vHWakb(Q0VLp4lLQ
z{(qX7Q|>G}u9;`35gd7y<$)_TQ?{686Z2eOl5x&rX$k-Q6O**%H#bG1cx5Ci9$D5&
z4`b|~%#X9&-`hXjw~>gYZNY>DC_F8}Oq6i?5{e0B<)Wc-jc9s0k^|ikv<EvSfGn-0
zpz};LxT%3;CuqUXxXt$b+SK~jN|BX?S7&#alPiZh_3WD4^hbC;xyf-ZRyxjyA6a&s
zgz?+wM=)AE%^vU)4zrQ65k=kURF|TMq4NfuQ-Uc`q-c}zx<Nw{*m9&oy-H9GHT%F7
zfbPLwr5c&^OMJ;D5quCu`80;)vp-i{y69_GVfO5NauYl6UdhhKSFrOlk1UlIcXtnW
zkN0is_EP+7;ThmFbzP!nh9;ycSO-i|dFHLC&u_%d2hV5WIhnQRW#&s6fqw9n|M3Tm
z+RPkAvT8@k^))*~$8N`G9$D6%gz<fIc8++i!3W|V89AMf$I%ICp!VrhqZHl&7&A%j
zP&x$!E};<knr{0X)3zjsI<B?SEI7b%y8V4#PF(#YvMT7}h0<z%cRYCr=j_nAzUDx!
zQFrQ{Wqv-s8%6J$CG2~LCx<)s0NajyWsP#KED_+7f(|NT)Fti;S*A=%Mmw7zjygb2
z`|)UN4%Id2prbnJ;e9&<B_Jy=bEL{t$e%AD?Ufa4&h5#VW9{9AJO6Vtca^?}?tEaQ
z?z}bu^g9l`&kuk;5&cv!_EU3%KRMky+>3HLON1vyA>j*Ur4;RaVra5y)XIy7qJF~5
zqyV}p94b#Gl)#Hj&F2WMxC}!5#JocGx$}=F{M=;r2UgmIv;W6)^L!Drzk*M{_{*CU
z%|2T*`#ZwfdzHMiyR4Gmjj?|=d-h0)Pj>8f0`v%VN@pjaOC}QZU3m+u;z}|E&!lY%
zgi6sB3BFGxLa0uA!~=-Dx~NebgS;zSWGyUV#@D;#!iZLTn3u2F+fu?}VTWRS1Y2z7
zu{$fx+(0Y2+&cN#AM)wWcCx2r(NZ6?-!pSv8Dl%9zF-c*%3aMr^QU`$5Kum{P9@(x
zQOUlPcoKm5qyzIYjQy)Qm<LCP`zO&2+V4g{^7TOKH9$%Oq!OyQt;MfZKPjK|{Nm~D
ztGmL0@{@C)B-Z#OF$B%Wc9)9_1n-Z|tkOYgg64}sPt*z-N`>?UV@W>LzJ?X8c80;q
z%(=q-b9n{Hi?i}iZu0T>tPGlu?=FiF3ZzfIFvD{Unz}=vM5Vti6D=CFL%A$!6b=AU
zS3(stS+Nw|VWmV<bxamTA{q?S86jvKpht=^+M$`WH>})<o!k!^M!tsN7T5WS<|{E+
zhfj5ZPfUKQE7m-4$zqK^f9u@=LH|ZHI@6z=6Vy7+k0NYb1<38p9@1xXe{y1(+e5YR
zXVsa#LN4zg7Gk^8-The=);EqRhr5`&T1{;K-wYp|$Iq{vTUR~%inr8yZ&{B0(%#a+
z2*|xoc2BIl#T?N%FVH`!Sy3WTqP>o9i=vdY4LV7YYH`gO3RFp2>x3n^AW_wVHq$K%
zfvISUqGO$4lLgmaWX`1P+)`I6){z!tX(kkAc`y>&nzXG_TS1%I2;%E!7m~NvR;CNT
zw6|=jgfW{DE5I{pIm#u7A)HF7@M?6Xq=bWGsjPac8MG*%CPkg9Qh|ghz0%=jT69!7
zAYqGsAM#z6q}!F{mizxD*ew>Oupj%VoG-FL&1bV6E+eeN@Z_|q3ng~Tk0ieWo{#vm
z^#?qkh*UI=sX0IQcTY}EqZ{|q`URGlg~-tTbmAMu5>!{Y>@{trM1_E~9a3CaaFNus
znN*~zZNXD$A_)yR9Nf8Ts_UGa1vv{BcjVRVK;GcP=PsnmOz?gG9I|`Y__lkekhw()
zA!HwzQ}FgeHU%V~OlI9H`2R7))kR|QDnfdeJp{<!zYelHCy@CSqF>otZq@zp{?aDM
z-TmV|pk_)?morQtDbWLI+qz%{FQ6B@PSPGeg?Mp}_Pd%WSz4@RU)gBk@gUA^#nm-G
zJ&w*PP;}kexYO+AlNyv)(*Hex(<|w3?JZYG_YZask(7~=zCB3NxgG13^a_*9SG-#d
zZCLgT?#*Vm1;@O(7nu1AS5wm8S`kkO-nY+<*+OaJ$!&pAw0+ua`-9v}o7cb^hZ9|X
z7aR$*;dS0R<_)9X(tqJ<N}8;Cjwf0FY=5q|baO@~`8S==id$M$ioPa(6>Whs0(1|u
zVR%d4a@3rAp<C58yuz&lca<H2I8V`pLbeU(S>FoEKZD6FPtwJiO%SdQA4@@M{mU!q
ut*y&d@Ba*Ga5RY+ZU6uuiwFb&00000{{{d;LjnLB00RI30000000006cpcXO

literal 0
HcmV?d00001


From c3d87f54a1554a4dfbc2747d52b21d3b141b3e9f Mon Sep 17 00:00:00 2001
From: Emma Rousseau <emmarou1@icloud.com>
Date: Sat, 26 Oct 2024 20:40:41 +0200
Subject: [PATCH 38/42] Rseqc inferexperiment (#158)

* initial commit dedup

* Revert "initial commit dedup"

This reverts commit 38f586bec0ac9e4312b016e29c3aa0bd53f292b2.

* full component with two tests

* adjust arg names, container base image, test data size

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
---
 CHANGELOG.md                                  |   2 +-
 .../rseqc_inferexperiment/config.vsh.yaml     |  76 ++++++++++++++++++
 src/rseqc/rseqc_inferexperiment/help.txt      |  21 +++++
 src/rseqc/rseqc_inferexperiment/script.sh     |  10 +++
 src/rseqc/rseqc_inferexperiment/test.sh       |  72 +++++++++++++++++
 .../test_data/sample.bam                      | Bin 0 -> 5595 bytes
 .../test_data/test.bed12                      |   4 +
 .../test_data/test.paired_end.sorted.bam      | Bin 0 -> 19725 bytes
 8 files changed, 184 insertions(+), 1 deletion(-)
 create mode 100644 src/rseqc/rseqc_inferexperiment/config.vsh.yaml
 create mode 100644 src/rseqc/rseqc_inferexperiment/help.txt
 create mode 100644 src/rseqc/rseqc_inferexperiment/script.sh
 create mode 100644 src/rseqc/rseqc_inferexperiment/test.sh
 create mode 100644 src/rseqc/rseqc_inferexperiment/test_data/sample.bam
 create mode 100644 src/rseqc/rseqc_inferexperiment/test_data/test.bed12
 create mode 100644 src/rseqc/rseqc_inferexperiment/test_data/test.paired_end.sorted.bam

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5f720035..3fc134fd 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -17,6 +17,7 @@
 * `rsem/rsem_calculate_expression`: Calculate expression levels (PR #93).
 
 * `rseqc`:
+  - `rseqc/rseqc_inferexperiment`: Infer strandedness from sequencing reads (PR #158).
   - `rseqc/bam_stat`: Generate statistics from a bam file (PR #155).
 
 * `nanoplot`: Plotting tool for long read sequencing data and alignments (PR #95).
@@ -27,7 +28,6 @@
 
 * `cutadapt`: Fix the the non-functional `action` parameter (PR #161).
 
-
 ## MINOR CHANGES
 
 * `agat_convert_bed2gff`: change type of argument `inflate_off` from `boolean_false` to `boolean_true` (PR #160).
diff --git a/src/rseqc/rseqc_inferexperiment/config.vsh.yaml b/src/rseqc/rseqc_inferexperiment/config.vsh.yaml
new file mode 100644
index 00000000..184f2c10
--- /dev/null
+++ b/src/rseqc/rseqc_inferexperiment/config.vsh.yaml
@@ -0,0 +1,76 @@
+name: "rseqc_inferexperiment"
+namespace: "rseqc"
+description: |
+  Infer strandedness from sequencing reads
+links:
+  homepage: https://rseqc.sourceforge.net/
+  documentation: https://rseqc.sourceforge.net/#infer-experiment-py
+  issue_tracker: https://github.com/MonashBioinformaticsPlatform/RSeQC/issues
+  repository: https://github.com/MonashBioinformaticsPlatform/RSeQC
+references:
+  doi: 10.1093/bioinformatics/bts356
+license: GPL-3.0
+authors:
+  - __merge__: /src/_authors/emma_rousseau.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+- name: "Input"
+  arguments: 
+  - name: "--input_file"
+    alternatives: ["-i"]
+    type: file 
+    required: true
+    description: Input alignment file in BAM or SAM format
+  - name: "--refgene"
+    alternatives: ["-r"]
+    type: file 
+    required: true
+    description: Reference gene model in BED format
+  
+- name: "Output"
+  arguments: 
+  - name: "--output"
+    type: file
+    direction: output
+    required: true
+    description: Output file (txt) of strandedness report.
+    example: $id.strandedness.txt
+
+- name: "Options"
+  arguments:
+    - name: "--sample_size"
+      alternatives: ["-s"]
+      type: integer
+      description: |
+        Number of reads sampled from SAM/BAM file. Default: 200000
+      example: 200000
+    - name: "--mapq"
+      alternatives: ["-q"]
+      type: integer
+      description: |
+        Minimum mapping quality (phred scaled) to determine uniquely mapped reads. Default: 30
+      example: 30
+
+resources:
+  - type: bash_script
+    path: script.sh
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - path: test_data
+
+engines:
+- type: docker
+  image: python:3.10
+  setup:
+    - type: python
+      packages: [ RSeQC ]
+    - type: docker
+      run: |
+        echo "RSeQC - infer_experiment.py: $(infer_experiment.py --version | cut -d' ' -f2)" > /var/software_versions.txt
+        
+runners: 
+- type: executable
+- type: nextflow
diff --git a/src/rseqc/rseqc_inferexperiment/help.txt b/src/rseqc/rseqc_inferexperiment/help.txt
new file mode 100644
index 00000000..f19aa318
--- /dev/null
+++ b/src/rseqc/rseqc_inferexperiment/help.txt
@@ -0,0 +1,21 @@
+```
+infer_experiment.py --help
+```
+
+Usage: infer_experiment.py [options]
+
+
+Options:
+  --version             show program's version number and exit
+  -h, --help            show this help message and exit
+  -i INPUT_FILE, --input-file=INPUT_FILE
+                        Input alignment file in SAM or BAM format
+  -r REFGENE_BED, --refgene=REFGENE_BED
+                        Reference gene model in bed fomat.
+  -s SAMPLE_SIZE, --sample-size=SAMPLE_SIZE
+                        Number of reads sampled from SAM/BAM file.
+                        default=200000
+  -q MAP_QUAL, --mapq=MAP_QUAL
+                        Minimum mapping quality (phred scaled) for an
+                        alignment to be considered as "uniquely mapped".
+                        default=30
\ No newline at end of file
diff --git a/src/rseqc/rseqc_inferexperiment/script.sh b/src/rseqc/rseqc_inferexperiment/script.sh
new file mode 100644
index 00000000..c425b6f3
--- /dev/null
+++ b/src/rseqc/rseqc_inferexperiment/script.sh
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+set -eo pipefail 
+
+infer_experiment.py \
+    -i "$par_input_file" \
+    -r "$par_refgene" \
+    ${par_sample_size:+-s "${par_sample_size}"} \
+    ${par_mapq:+-q "${par_mapq}"} \
+> "$par_output"
diff --git a/src/rseqc/rseqc_inferexperiment/test.sh b/src/rseqc/rseqc_inferexperiment/test.sh
new file mode 100644
index 00000000..ff2e870c
--- /dev/null
+++ b/src/rseqc/rseqc_inferexperiment/test.sh
@@ -0,0 +1,72 @@
+#!/bin/bash
+
+# define input and output for script
+input_bam="$meta_resources_dir/test_data/sample.bam"
+input_bed="$meta_resources_dir/test_data/test.bed12"
+output="strandedness.txt"
+
+echo ">>> Prepare test output data"
+
+cat > "$meta_resources_dir/test_data/strandedness.txt" <<EOF
+
+
+This is PairEnd Data
+Fraction of reads failed to determine: 0.0000
+Fraction of reads explained by "1++,1--,2+-,2-+": 1.0000
+Fraction of reads explained by "1+-,1-+,2++,2--": 0.0000
+EOF
+
+cat > "$meta_resources_dir/test_data/strandedness2.txt" <<EOF
+Unknown Data type
+EOF
+
+################################################################################
+# run executable and tests
+
+echo ">>> Test 1: Test with default parameters"
+
+"$meta_executable" \
+    --input_file "$input_bam" \
+    --refgene "$input_bed" \
+    --output "$output"
+
+exit_code=$?
+[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
+
+echo ">> Checking whether output can be found and has content"
+
+[ ! -f "$output" ] && echo "$output is missing" && exit 1
+[ ! -s "$output" ] && echo "$output is empty" && exit 1
+
+
+echo ">> Checking whether output is correct"
+diff "$output" "$meta_resources_dir/test_data/strandedness.txt" || { echo "Output is not correct"; exit 1; }
+
+rm "$output"
+
+################################################################################
+
+echo ">>> Test 2: Test with non-default sample size and map quality"
+
+"$meta_executable" \
+    --input_file "$input_bam" \
+    --refgene "$input_bed" \
+    --output "$output" \
+    --sample_size 150000 \
+    --mapq 90
+
+exit_code=$?
+[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
+
+echo ">> Checking whether output can be found and has content"
+
+[ ! -f "$output" ] && echo "$output is missing" && exit 1
+[ ! -s "$output" ] && echo "$output is empty" && exit 1
+
+echo ">> Checking whether output is correct"
+diff "$output" "$meta_resources_dir/test_data/strandedness2.txt" || { echo "Output is not correct"; exit 1; }
+
+
+echo "All tests passed"
+
+exit 0
\ No newline at end of file
diff --git a/src/rseqc/rseqc_inferexperiment/test_data/sample.bam b/src/rseqc/rseqc_inferexperiment/test_data/sample.bam
new file mode 100644
index 0000000000000000000000000000000000000000..9b8d417c8cc1651725269c13a1c915b08b2e4d38
GIT binary patch
literal 5595
zcmV<16(s5(iwFb&00000{{{d;LjnM80fmy^PQox0#*25wm)Hxe<5ULHs|`qyWSi3o
zw@dd2T*6jd8#LbgV7{4Qz@|9#?rYEK@9TGR#<tt}yh6yjo8qO%fDCYO&tf6UBrCW|
zyH@ak1CO~+FrveONdP+@qoZ3o>ROL8JfAYa&X{eo2(a(4x#KL{xo6|RWh#{l`wJHF
zG8Rb+UCXZ?<XMsBd`q$KHG1hWN?@p$qdwq?Qx`OyziG16_AqHATybN?rQ(L<rHsXn
z8ncNV_5jSjY4%M5p&hu#(+<LQT8KQ0$*SPzh!-O%8cYbJx+LrOe;R4scnTzWu7udg
zeV|7BKf(clD%Yv5?XvV$(}PH>!F*E?$@6n6fpc!H+qhDcO4CJy-SVtVlLK9pDosel
z^VV|IVoooa6FAt@UQ4X!YKCDo!o4C#m$XQ}ed3qd%|$c%r<opgI|!j`giy1n=tCj&
z1x#_s3g7|&03VA81ONa4009360763o08kXITW_c>*HxdBm&7DZ#=Cd+dEI8Pof!w+
z#1ig(@BVGm@@DP?C)H@kgK8v|NUTC=YcQ=?Xd!#t;wB9uwoRaZNs0X+SP=ZsekfS6
zMub8lhF6J*p<vNQC1~|SD2czd|IC@0bKbl6y}o(xo;zpHo^$X1&2O#sTWjsxEfRP9
z4!+pE_wz;aLh@|lkK1tVb$q|?p@;Ui_m7TtkM22lw7-9tJbG)AXf3s5idk;@&Tvr|
zmC(HJlvKSbTGd$I+TNC3XNsnhMK6k?*R8c(*>+uB)~wY+wnCIP&-1iOt9azYS(?!}
zu4MF*=arwArgY+_rZm}~(<K_FcQil6v<#QC%zYBg&M}r(>C%0tGx}K>z2Y20bUQ|H
zg1Jxb9Dn7>bH~X;Kl95w$<IG<yr?f;EEm7_kzfC_&;HZ<e(R&h$zvZozT@K`KTbaJ
zME&5$p2W4+N4R+`FD4HsD}0x4zy0<amVbCtl9T{NwU#p_7%z&dFL*DkDUG#UbbV7-
zMpe2pjWzWJ`c2cN2Z9!s(OX&~9T~99@NY?r!wRufhiClqkmXet-lzWN9HVOsK2u*j
z{yf%u=GNQZAJ%*GYQ4vwzdul?zadGuTkc~6bzaBV+f(Z7?Cc&MY&-5SQ8%V*ODRB-
zs;qm>y22W6MN?NonT8u#T3HKedsZ`UYR)w)bR$gH>9R6R6{a#mX`a)OM$U|=GkG^K
zi~emK&og2|o)04%ro}6H7Jpop<sM5Kr}Lb?stQoPF@`b$ciuzXA#4Jc%Mo|pv-sHn
z=9dOAUr3${U_N^$NiHs?VD26q9UUf508FWv=%uZdZcAP_wk=fMD{WY9Z6O#_Eh`k`
zQWRZlEHAs7_p<3QlT~%w7P?TCl3gWPv6Z`{S_S|e$?~N?cZ_nT=FK&L9_==B)4`(*
zwuY}9FDc&T7)Zn2aeD4X+$6`^D!}LrJ6qDF+;cgnQDWw|=cZT@oG-Wk7)|lc5uDW`
za`W{8&d&|pB)%@Dra0#2sgUAUfxoIYjV_fk!W5PXu6o_@4%B8s;ZCATo)Jg9HPX2|
zQX&if`-RiWJ*DV<<^c+>D32fyWXx8?%$-Y&T--?6bodB2g^ts3lTEn!KO-J`FfS&*
zlC1dpmA9r~dTzc3$7Z#yK|X+Iv{sEFNM+r~npqH47gf!>-Y`}-;Hz!i>Z;T&-Vuag
z3eGB922SSQUuT|+^s<ap4Ft<)FyvKS__t>ZKmPi3-R~Irx35PI{r#})RcL$*V}CvK
zZ+m-3ySt7<9p^2ptg4w1O4ia8bx~T2RWoTCCK${W1BGl`X8M95C`wx@&1@+dJf4z0
zD_PqC8SmZ(pEQ23tJC>o4^&9U)TmZuqDvWtJkLDm#N4IK+grE+7lwFIEwePT6vKe>
zt5?UN2<BTO4n;5t;P=lAas{Sg62P<t0s%q<Y>b(<mR$)DjkY}prhTDhA-J{$FGLHR
z;r0+lWUH)fJHc!v6@*EN1(1WuNdb_USf)!j8}G6mad5$Vz#$fRv+{&Y!;^5%43Gpg
zBmQXwo<4iITs@R*YMRT{Q;NL&>PbT5wu#V)!Qx#vt|D&?7T>+m3y?9Ghr4^b=UlLu
zfmcq!V^;!C*rrpsv6c<Px9&nf$}@2A?)k_T39|6khYm03z@XI6h@mr$q;rpg;)nlO
z3Jp%+kh$kBrg$)2;Hca@r`u8&bZ7>_`Nv~8ZvdQk`C#!&_Xap`AHjLo;*9{^{UbOZ
z8I3(MGxYtVy`A&PH{jo9##%GhSi}UYO{;6!*S+a=Uv#GGdBMyIu<LiEflpZVBXj5W
zwaYuVuT9px?E~a^&3`hp$7@~l<$vuT3W;bqIoABh71n%ayym|fnMf1`O|mkj|C_}z
zZ#Fs2?i#46PP3*^rXqh$Dyt^d1#@Xl>2wSXe;NF!Q`Qv4m)=u2#S?z-rE?lfI``q3
z#P=F);RTLAjm3QQ)=x#&-ZEOur{5cVz}7of(dn767>2QDW{KP0!Qr``<QumnNuwke
z7TN{@xE7ijVXM*@T2pOQS7>NP3mX?2q1kfFd7;`y8fk^ByH;q3o<g6R%=vYiQ>_Q~
zcptF))I1!?D}*DtI9jZb;yXSZeKuJk`iWVp=F$6epjT=Qax+zRT(A!A5c!!xl4}|v
z8`De1xHX7Fom92dN*5(JZDDJxYskd9gJ0!xi{cMu998bq&)kL9+=r9QIlwFpjxMCX
z14!u@1?MQUaONY?@+$E3#3S&w$KaivfwvfecXsh}0leS5eLei#c}J4GcN%`qZy)XM
zY`gGN3uGlN@yJNgvx2G4bcj23h13ou1PP(UHF+1#Y-f=Bva)AZpisaFN@2U6LRBq{
z&s9P~vh)d9Iw}SXVRpoo5r4wj;D=TkU<lZ`v*|eBM}HTl!ozdnDa}%!Ri=I%a9-q#
zIT`Pn;dyq1=RJ!LE>_shlNG}F#i?Hkc#<}<(!yA^0&<<?&`FG>CM=oNLLrQI3~n0P
z3|yCNM9GZR2>Bdkml_#PgESnXQ&){v%mrxt{vaQb&RMsAp=2yu(o!jv%Y(wsRv_kM
zf1YMhlVhgO>9RCS2VtHDH5EnEs+i+N!g3k&PVW_GXl@cmd?JqiAu=%qn#uh<NxqzX
zWHgpc9pm|PyNCOG31khjIR?i@V3@k>k<g*W(1CI_POG{ov@(iu-uJdJvh!}+)6Y9~
z_e%#oL%!eZ7SANdAX=+Siyd2>#H5*|!*dV17fil5<xq3$rgcUE?Uq#$NgiGq-DQ_W
zMyJTNH>H-HcA1CZurJlaA$1LkEEv*KHM|y9arphhN;rPiYHJO0SxyemDw=J(3r=l0
zrqeP~zXt!nQn%vFCkQ?}a}@wuvlno|jyxEd10m`}|0#s1r<f~nMG+Vzcj|BTR!Dbb
z&ftO%+aI&!9reZI7eSJrmS2das~bnb^TjBjZ`oS+U?1LEt9qV4fc!afU^bA;T6b+%
z163#?)H1E+2%D{yTya=nt6CJWV3W)!Q9(-E8k*X`4q>IZ7ts_YthN^yuCE@<Prfw$
zf%8h3yXFP-YggyMZVGVTH^BKq@_4cudwd4x{{GRyUh@3S;3k|Dl>!UJk>@qKsi7tf
zGLWt*xIhq)l9iAks2LY+hinB^Qhr+8@@u#<Jpjd{b5;Gy6p33$yn1B<#QVIg7z6R_
zR95`3K%90>%Z;<Lyvk33MKrLJfEXLtE74nhcxyeRk<l$@KzIXl^|#Y5-srj&D5h{>
z5X4y5(RBDJI4h1$gO$ot+G(r6m{47-d{HhB#d43v>7cv<PA6Z8P`-9j?c_%%wUel~
zKE&6()vFl0YpS<)_xE=9e7esSDSBNron)rdJ=8oLKe$$us1I{^e&qY5RLE-3e^6SJ
zJnNyq41AU1)R5>TN&pCk7p|MtMXiut{vQ9B@}YE@Q;79vnV%3V4KUG*H!ZHNI2`rX
zt9+fDgz<N$dMn^b<Q&kbp`Rp?PD)gTg_fl*5mt*5D!)S{l_>8S&S+)<#CxzA*+Shy
zyCPi#N%;&+gnmi}M!k$HfjOiCRkg7?cr8aD=M2qF?7VZr&WPr3+__#Ry+I_&&WxQ0
zM~B-7zD$aaSJ4T)SjhaMs0xU!mdmb?b)i8)q^SnADy7l)GH4k=!GOD(afO;yk4iEL
z+`?KCW7og5vO2P{*Npl73OHXmC1E>E^tC;pDNjUGY($fgo7YJR;g@FIT%ifo3fPq5
zNJd3tZQXHQk_p$n64olz@;bt{L7LE0KaC+xki6s!$w3PdV->C-SFLqa=)nNJFi1nQ
zAssS<Y68LfyQ_X6mkmden<u0B&W%;)?}>G8-m^@SOH*O8w~ao~zR#?<5U3?X%drNk
zUs+?I+j=WXtvaR=b##lui69*a)8<q~;kH8&RW=1xN_t3XR(Hy<>%z?qA8^9QhN5-m
z^In&4ta>3%+PriUn0x}}@nQvBq~;$bD}e8w#`FMYPJ0DT3l;J$isQ;S<ELT((p5-o
zfrkJq=OH7d<t;~6nZ*?kW|_vJtB3d0k19s<H5jYKWecyk)&&^<a)9x%<O>1D?_=y+
zGftjAcXV)+JO~&I7-yjnrVRsbsJtSQz*<Y&!RA}d3ZXdMW3M~7)K>HytVM!ndQp^I
zHNC)%N{hmDTbJWxdOaeEPvsold^#9HH|4ka&>QfK%tsS|<u~V8-hGAGbN681f3;k<
z@4L<>$$!nT+}++g=Vj(~gObbZ6nkPimQTpIftXq7!wudtpL$OJ+W^e(TwNfKN%8jw
zFh4iQOd{*<%)UJ_NoH0CEpN_H7G_dH4Z(Knjyj&5HWYC<<fcMJv_`Y8ktjP`4HH)=
z`9VCjTIy2Z6AC36-*U~M%+*3%yDkfu_-w@|{FyiJEG#_r<5|8E@q;KzM+ncF4xa1F
zkrK_TP4tg;{>_t;;<26IGuS!tFuKcV=Z>#4I6XJX&aV|aWJ#)rgh#aM&~w&xbs>)(
zHNQJ0ZiQNgPHu3WZnz3Vt~DH)7MYcAXsVBZ#0QV};4DXT@>t#Rozj5icTTeNt&Mhm
z;q3ZWqvtU8A2T~YIDfG1<9NqU1)Q7#hsqe$?pas%g5f(QMecWBR@duDHxHi>iPpAx
z0L)7p+npcyw}<cDVCQ!Y#C-1T%FZJ(31BfJrU#RViQr9b<3<YJ8bz%kCVExW65qR0
zTi(jf_JFHqknYf)=z~D57ag#q9=?Q^MHLZ^69k}aWvzohqb)Kcy&WsiBWhh4oof$X
zk(ifGBIX?vV!j%1`|^P03&{hcu?ObF+&_PKn7oXZ;^{zqDv~aCDMnR0&&A>WvWioD
zmDNKzK?Upkz`Tm{#p&`!8x4-XI9c5}@508b`-?eKu6A{C17|jQ64~Y_G`y0BnF<vu
z59Fs<-D14DGT6a`Kel>ypGj6@pP4?p{hghibLW$ne<Vpz#zo1g6a~{#SY%Sj7ElVd
zE%GfPk%%E>(5O!p5&{D48L4487)BvWiSSHEk8t*sHXhmKhc0UsLs$!`n=>V#^vO9&
zZ%$E4M<~5HL227_V2sl5PC0P>P?{|!&sOB#u-c8@G`|*q5Iw;;p!CTrpmfU^rRY<?
z_2Vlp4L<c<N%Gc=OM5%#&mTIUidq{rTc9J+cR`B8YEub;{0Nm`LQU{OGL)B8T?&-$
zbkTR{of=??smR+)iTWMv%*Y_>eIQPUhGvM|9*9xrt%})F8tQ587T><hV^2BunzLcL
z>v@uk&O*h*nQ-p5=g?QS%$I4{d`5m9NdC*bJeSWJc;jH7FY3h#H^TA;FzyeM6}~sj
z80WEUarOCIa7RDG1<gCv_DC>E>{IWH4oTw*+(U)5ZnUH=vgkHT0~duZsT6*T_|oqi
zkGqw;uW}_i+QgaWK{$Dce109bs;{X}-^+4MWVi7(@G~Gk^z}zhEf!xK9oIm-jX-?!
zKVRRp7yY(D%>U%O*Y^`YkFl-Al#qw}+ee{C--`xJ>`*$kEm5sPJ+l#LG#iktu^i<h
z4j(QHZY`@cy2>?D_5z(z-I;=;11(fdokTHCGq2JR*JRKzkZzjjUdLs*wF1>>9>Mv8
zlf-;B5%W=iKL76ZvK&GD##>WPdN^I8IYERO(-Z9^9A2-{t+2=_o4%!Oi^kT-LE#V?
zxF|ckrCH2&=sdIDa5+y7C#%n@aZxg#34$xYoL1`?pYn*@{J}{b`a35b`dH)q+j2c?
z|NWmxl3$#1^8k(PeP4fXBs^eAt!OJu(IM+BdbmQ4h7aw7zy<V$-2NL`G#0H{iZOr@
zEhM;7^g?>v(i9z^?O)1IxA-e`EJi6)9#3W;p8GyV=xzAd=wIU}qsF;)l@;UI9U|l+
zHKU~y!hbrYW<ZlHouL5Vlm_r2*F=(uB6~xPvmPxDg%%OYA*iqeU?B);Zb485h>dQG
zszHdV8Jq=5b6xk8+mh|~jri3?Ai_fZO7k&+^OH$;ZN4LZJiy+F=ark;(bRwMr`GFp
zH>xB#J7wqL!S>NPpB@z)nWg4!58Bz<@DA}?p?^^q#?TH6rD*R(V_B(~|LYds%f4fs
zV!foc5|Y%GQqwL<#Bnf~A(2imR3Y4sHiu<#W*#3xwE^{fG~`Rhc?>%)VNYXpa^$>u
zaZ-LX2J`O4#Tl4qMy<<B82k5R3g+Ja-noNh>t@=zlO&Ixxdu?RDr}`iuO*s9R-r;i
zjdcXCV(VHeb>nSgA<E?Af?GOV@dc_hY+#{E3Z1*XGF8|gL{%;N>ocd2Tpfncg$Mnc
zjS}fH9n5@y_0j<As#Hf%zjG>)0<6ix!}<itq!JYn`WvCkQLsl8>N;&(C?;)C*fR!l
zUf@6UCl@7yi>Tovgno*&678s8X2WK#%n3xur_9Sj`U3(_$~=ml+)1TuOljN@%6X}a
zUVif<vD!}?+XTt2d67RnkkHzn!+Hj=-cGRo>#u%k(?;;y7yld}{NIu3|3Vn6VEVs3
zW9#AG(f)q&2J+oGezxn;Rd46`J(Y)@!0u0|hW}1yr+z5+$-BRl#=UfohGFBFSNLu+
zmiVDr@#pwytnk0<m*NU<8l@>u$5=srY+m9e-h6KwD}rzHo5)sn#Cxa}i53kyhUnZN
z&QggDjT&3)0xcbof+LaZBmzv=H?&t=)J3hVEx@SeDW9g$W-&Z(BFWGa@cUq0UXl8L
zJ~s(sIrD$axypP~FS&iCc-UFE9?*mEj&DZC(v2)bfAQw4j1`wg<cM~7--tv{&+O0>
z<~6vrlI##)C*i2rg{9|*|6ADH!xOgLRbj)qAN0Hll5QDsaGTHbO-y+TOMB*nk8j*w
p{{Oo8K`a<N001A02m}BC000301^_}s0stET0{{R300000003G?*<k<x

literal 0
HcmV?d00001

diff --git a/src/rseqc/rseqc_inferexperiment/test_data/test.bed12 b/src/rseqc/rseqc_inferexperiment/test_data/test.bed12
new file mode 100644
index 00000000..33a46951
--- /dev/null
+++ b/src/rseqc/rseqc_inferexperiment/test_data/test.bed12
@@ -0,0 +1,4 @@
+MT192765.1	1242	1264	nCoV-2019_5_LEFT	1	+	1242	1264	0	2	10,12,	0,10,
+MT192765.1	1573	1595	nCoV-2019_6_LEFT	2	+	1573	1595	0	2	7,15,	0,7,
+MT192765.1	1623	1651	nCoV-2019_5_RIGHT	1	-	1623	1651	0	2	14,14,	0,14,
+MT192765.1	1942	1964	nCoV-2019_6_RIGHT	2	-	1942	1964	0	2	11,11	0,11,
diff --git a/src/rseqc/rseqc_inferexperiment/test_data/test.paired_end.sorted.bam b/src/rseqc/rseqc_inferexperiment/test_data/test.paired_end.sorted.bam
new file mode 100644
index 0000000000000000000000000000000000000000..85cccf14b05a885509f1a1a2945ead6370bff9d5
GIT binary patch
literal 19725
zcmV)uK$gEBiwFb&00000{{{d;LjnNsOtrlWxGhI@Ce}wUbS3%Btx|2d;4pYsPa_wy
ze0BCdZy}speHtivBrJ0=0}QXr(Ul{DB%&?A<B!CuN_kv1cm|I#&PNiDFgC-CJ&yD6
zD}fA7usz{(VqPEEco_3B_z^Rck4eB`Fot|<t*Y+cz4v+DdtK7qrysTZocepcYSmh+
zYIFa}nQMyTylu|B^hI}1&TXH$@I7}QJoeb*4?g_p#V0P^bpFEkp1JU%yYGL=WN*5&
zy>)JK=Dru*J>A=#?%j0$#V<T_Z*%wL%zgFUZ+Q51lQR$8clW*b-S>U>-}@qS=ED7V
zKXGaH#F+=a@9x)KeEi_C2QN)-I{yHE;gN?QefW`!Z=9Za;6>f>OJ8*N^xS0UJ0IWM
zoSfOeue~w;&Wk@Vo;^N>ht2MK@Zu8}?|SIs?1_ta%`QGZJ9zBP)4N`{Z(g%LnQU!t
zZEbFyo4m$cxIdk}dGea?zx3#1k34*E?x9C74j=KK-u2+4-D6%eIfoa|eg6+0({b8%
z%mbQTeB_D89(%(K2GkutFgLi#xe1Od-ygsE;Y)8Bf9HkQjoSxbe8Z#XW)~kBKK0yW
z_}un**>h))J^sY_J0I&F`^Jk8KYr;!dirCJKXK{7bFaJj$kN-dyZFdWD@9S<Togt3
z!iyh%q9`m}FN)#^ilSim+{;&r_ZIJ>O9g!Ai$zg9|HUufTHoH=+uS=h-P_*YEna)3
zD4df{GR-0n-Ucp=s)Xa-YN>rtjjkhaqK~DuLDiL1UMS^V6QeB~YfWib<AiL4DC2lM
zUaeNE+{W$E>Ka^Mt*SNn7>_IJx4H@k*mV^;(=ptF%Www$S65f7wQ*Igt>KfP_jq-6
zbzH5E@FyI>?^=V;F?5GN1Fo#XZyL`g`~Q44Jt$uM!dFa+7u|obVlK?el?N|9^owu%
z!{<K!hJ)hZk%Mn}^wERji8q=D9+}O`l`r;iBP=fzFE2WLN8kMBHy5yc$Mr=~MqBD8
za;632N>yHQFJdS|j9ge>n=0taRiTc-oQJ=3bwohGz}Dath6rtKJRYx&*Ty3l-0HX*
zjd6_aKZ0yrAsp#TBSdC#z}K1!`|o`4(nHstx#@Xnyf<{?-GAr3iFEumMZs~n2QBG*
z@tUIe<~-?4CY!rE>qt9Hm^#>|lp^w`Dvfu{su*|_wW)*-H4n0k(g+znGmHnrxns)J
zB3SFnDln}=6@+$t42{Ov4w0y#f0i1Ie7yYncnv6FoNsk_4I6wP$O@0w*2aV-T!-T^
z{Hv<jWdBQjC<}98b|=sdV3V*M^=RkLmA{?9d`}DJdyAh=V7}|xqPVa!59a31-rjEU
z#xq4BH4|P&qg_*SQ^!VW<FyOS#Ha*gx?xH)E`_p9h`h9hds$mkM^>4pQBLVfOIt~%
z&LW!Q3dt$4R5c#aF;Wz=8nJ5vJ@f4vb|0^f*H+i6wbiw81%x?X8;{pkkrau==>d2J
zJ0aB~bnpP+;I$Fl8WSzUYgq-#{Pd6&D}eK(&F^PcytM~swUTM`i!Gc#+0rJ^>&iSU
z_G$A(Nb#s`Q)&}xS85$ZP>~6)y{oycgJnW{Sp%tz*MLTd8{rt+DiBMxHl~5Y$cd!j
z9I0u%nnV#&6WnNt5jv+sAUF7*PL`4^KBeuC?Ig6x=4kW(?a?8Dd7*f9(b4Na|HeF+
zM4O+xz9^!LMrmf6COD_-0FX*kOT%Iv8>cG6tq+Wu+65MC99;#n83}F;1$YuxYokPy
z$n8j7X@H4bljtW!G1rhS#|u<cm<#(Km>>7eJ^A*5OhX@Qhux{hZ(Lgx|Jy*mZEfvs
zZXyj?&Kp)mZI}>R8X1&PWsHVJCPU2xlMyZ`*)%M8#bUHFmfEpcO6Hl-(zB8^HZVrw
zEe&J-#de3o*PcjWwNDyJ{1GREJ_MRX%0nTBw}4skdDw3aH7DtdYP7bRsdQ}(<lC>E
zj)pRr-{{d$1`_~&-hd%HFawj0#YQp1V&GhR78)aM$z%*JdLEhd%1I@-i;64JM8gDQ
zLI>ICC>twStfaPDmeqI-1k4yL05B#-tE16s64^*NtD__VTU|hGMFdH7nM7e#jS`v}
zf8tocbzF@{bQc?<)}4;Jx0LDf*N!tZZkjVRva)#Ab)DwT%HmJ1BLk!l=I-X!<`k91
z0K9S@Ja!a#BGy)yOd46Ul1Y0$Aw?9}O80S(7Xh+#N~(hlx(aL-?2%O`Y7;^U<slFo
zesLA35H$qq5>hPqR^TG`L%VGil@9a(IRC3YoY$BOvuBX9c-cJ(&RcqLK4axe3A%fG
zaK5s44UYS8ZtrbPHi{2jUlapoHH_8KggUy=I3vCB!8@;Ps61CJbbwvGwwmZ^HaU3Z
zP`Lg2;ll0L=f=EF3bH@upBwPw<&OFAKl_T1NQ`qZo=p#4dBm8n?T`6?@3BZW3hJU`
zrN0>HF=CrB^tQE$(K=Qu9V(F5U}YJwx|m>1DAA6oR>z|xN71qdEj}Viffi5tyFrnS
zS4VJ6>KTmh1+rZ+7Y^R~;H8J&aOUr2s=cur%)wjlPBP%^&7F37dm0S8xxnN8WMgY*
zcRDFPbYoG}T5=H$_l;9VI2J^#$`D{oCTOc1V>;HX6wU`0dE{K_rj|iQA&qT>3$;`3
z#Ke4(PABO@Y(Ka_KW3AIzc(QMnLI)r$(6m84k<n%b@aBPL-dKkRE^O4<r|BlbWYj8
zbY;0<mPuoMt*s5A7P1arGR9+2Tv@4&blRzshekycoe3fsD;v$_S<pW)<EX&MLHh(~
z26Z-wJP>Rs1JQD%>2C{CR_k;EA1FpqqQWhh&-HqofQtaU^*(sF4ZvIJfp^==eF?nB
zZdp`6w|+}e0BQEX+gRV*oUEhzF@l-c038J>JX1{DU=?Skszne+No5pwAYC|%HVErk
z=~)y)HVmK?(Rxr-Mg*FxqzMU#Ku``DkX9`A2RlbU<AEINY=Bl_gI1xZp&5kQAJtTP
zd8nRN!I($?0bL)@CNsV=B*t?CJa6mad2Z#qS32ynT*|f)VEo(jaw*{nwo#OkG3~Tb
z(n>BJV_<4V+b|<EW5P04GG&`+ovcO4LUd(hTsFcqA=bjw+A&wvPBT={V=xDi=#abV
z114i@;BZ)jsT>kyn)N0SQOCE460sr?g4YFgH|lA^9}uLMM%_DS4jBm}wDZvzL*g8u
zxy(4?bH>ppG9|7S1uyR9Me*~+rQS7YeKe<=yW3lZG*X+uoN@q0Fr|-7geq7Ot%*!j
zMk%L*W}JH;Rgjj%HjZROLLL9;WY2)?C%c6x2`LE1iqm4G7Bn%TC)nXQM}&^3I?y?V
zsNuL_oY{cJVO2mRFYlP{&|v}7X%Ovo>7;cy^AKEmrhRnA)=?TJqm#PkMnuhH%~X`m
z2i>?BgRKJxffqIOjuxCxNE0wQ!|6Sl$><Nj5d&GE`~o1*0{~+t11|9qfdQ5veHBF-
z9s;67V(8$?Mqo0z$yJc#9DHL987|4N?LL*<Y%c8o_Jfxm`VRS1*>!bY&l33UY(U?5
zc2R=8>g<Bob7N<FZ)*?1ti5v1*~Xd>t!JjOHaZ@(ZKC9wJH{H_gqk~TSkR(kLPukm
z2qjZSN#`6_+Lcw~MbCh~8tsTD<y-T=FirZ}hWE&T{@m#V*!2m{=eKaaw|ITgU3>ii
z&h72Jovq@j8;Syi6u1JF=7GuD)rQqE2*zx!xZo^G$x0r1*;v)s$Yc{a`TY8S`62D_
zJ}3D$wAtj~wF4;jKRJ)YnI2tzat?^+l3CFQ;$8D*#gzha(lL#e_Cnvv@mORAI}V6G
zgMBhfD`2LfN&}`FxWW^Iq5tiqgAdL$qGp1NpdDk@LU$cOSaIcDl$uByZL35v$-4%V
zN}A}Svz*ZA54m@I;&$kl#zQC<*=y)K`5SY-lWcFjlrKuF&+wvn#=O0?xxKZyMbmw*
z!J>Dnwvq+weB@F4s2NjbV1`TZ&#_W!rGsvwb`H!lucZ%0NzGwF!b-uSkL>(a<GO4s
ztRDUydQHhOgNLK^yXg&*TE$1qCbR2TPUjrX_7-3_Bu>EiPv-5dgeMSZi-s{xkU(@&
zI;Dk^r7OAgs&pn;BcfDg44gqvrG4}<2*IprxQna_(J2sOG`2iKaLKLp)-vdFzt7_3
zY;hdk^W-fN=K#%R)Y(V#&KAw@zqJ$o8O>`%QA`HZxwE&szC&)($eFT&GZk6sm8z78
zw&Bt$X_N~_aS?+pnRYRlVCo>P(~LRBwQ|yPX~4i$F@iC6GUB|vmP7;{=lOkm1#Z%d
zYZr!4w<4PISTw~_Gy%DxpA-=Oy#Z}@XlfRuQCf3rgQ#OPmb(&I*mx~s)QuGuu&t$I
z9@f*CZJ6XGXV%FkIv%xjEHuivY@sIxB*TDKMbTDEGlQ<pL*E18bnQ@elxg#LG{1GJ
z*ZI%IB5j@<6~(9K4U?_)$>#1h&8)Z(N`NiL>Y$m9At>#96s6OaIl+Z%f`|g_Kp_|h
zFA9$~DlTgUo)RBx#*EbgUHD5!b(*&10mnQxIjzyJfKya*C=iyoMEQHif!Uu}-MO;A
z(gBxA^B)x*z<12+^aSP@Vg*hLRTV7gaUGEP!5L6S)>tc89ip_H#*CvL^|&(WZkk?V
z!+FSI)U>4a`knMEtu|kPv06FQ@j9zka>T{IXkk1kek#HE<xx@m?tmsYrh7Yk#RF%G
zLWM>t9bs)($;b?75@lQ&Ew{FDObN{ybKY6bR3kjsF+^#C7pmmC_QD0GolwD&^JK}G
zh(xZufy)4AfH)mYsvG9F=sqCnl6+rXhuLKR!$U0ZI70W_(en3iju!d*8MhV1|2@ER
zbA4+{X699clFP%eJY~z%vDWCplm2n~iFQI?TBdKdVi&(Oo9uu7bc#Grih&5TS(MF8
zAlA(Re-mMXnOTKk8E3UqOiC@B;=)*1@w6_0#^I_AT3Km$WVLLZicW{1MG#Ku#z|L-
z5UkRY@w2!Xw1&PAT&_zNJ1*^@9{`MYV(;WKMT(*xiR)W2Knr!bWy7;Ot-4qT>YhKA
z@%e|3n-ur?{9McDK!;geW<E!H9l%Mn34DIJ*h$H>cg!#ytqjVU>OA=-(bbHuyd)Y&
zX(cy_PP|O=bd8pVsbwD@K4pMQ?-XBDYq!E;aj#E`ry`lqV)^;w`26frK0klkVyMwi
z-&Pd=cEIO58$0Wy$1Oh*aB={gtt0sEnJv9wPBEAi;oqKzcI7P--JmUD__^B&)3(hf
z`=466-1%L9_VRm{@cA=Z!UW(lpJ&1ZfW?3?2}~eN0c#sB)>7~$XjmFz!fRzDXTp{)
z@<!U|9W2#z5jkgyE62F94yKl|kxhs;!otZ}h%{)qwUuWOB9j=K07&}2J@2)-lVh$P
z-!fr->Nvvehy2~K@|lF&&$n2<x46G|4UXrq+}_yTEk1f(QJf6KCo|E}xJck-G>{hC
z^HE=hp>MXuH}wE_)^`oPIPabxUN6vK|J`$=o016YkM4gMQp#yZmjgJvzZ7Ypch<Y!
zeu9J{;hh_}2gWBDU2h$0|J`Ncg9o18y}SRc=&r$WZ*<#}$z-}weDrBWA)V8Xl|nHm
zMRbfsE|_UzQ*th(W}yz+1!t7>&T6Nn6O3yXSd?0%cbfk0b_jFhS!_!%SzGqC{z<FW
z($o!EX*M}{(-5Vv&7-v1L+NXCC}E($g*lY|i+LKjdMFJBgH(}-ySunB*16X4%IPJ+
z!E`n~c+(Lm^;YBezm{d{@BYn>mRgzmtfKhFfR?r<8ymYQQ>72E*kZYi(peLj36&5$
zTBSLlCb*JJmQI^e_^6%oHb@a77l9es#8Q?L_$<g|)RQ8ns4cQ_2sIpBvb<@a>6BJ#
z(pPBtj!cFWztcq>;)D{-GYC{n9uH(;Sh#}an6>fRXgtD2tQ06ao9zG5&^<REEa3H(
z{dmP>W&bR~@|xm8@ehg)-)jaGM_4vRQ9N~HQE)%NB{HEK&zJ#YAJ)F0Nw}_^QCu5S
zI|(7O!g?7>7g>l>P>A@5;>L5d63Hrb(LoUBD%n$s5!YybokP_T-~m-)fGj`=fv<@^
z6Y})Rx|f4T<DOz#0P!XO@oWF?i_2osZ)%PC+rO<N$ZX6%^=(CQcHS$uy}P}>m)7XL
zsHL-3n-CiywaSQCi^y1rLDrF5C%KJYDjp-NT*-oAOe@B@vqABojL-&F5=DQS*^TQ&
zZM;nIV;f-FyW4lmb-$QS`~3jUuODa3ugw|r*#Ld&+ZNq&0P&aR-Es*|oM_G=!b0#s
zcG7C+osKSgp=#ei*kXt#DifSxnh9w`G|-DhtCV9taM>G2Zr!NRYHzU_4-A6CTK>-X
z_^wM@rp>P(w?f}HK+D$0`8#Dtn;Fa>`CCQtcjjqxXLoyRoBZ#!jM|m36dgrSHdv;7
zh@5lfAr8VB%Xz@~8>#AOZ3G=-q~$C|$4t;ZSXL{`2k}eDXh)BAKzhU8<6(IA+T5-5
z57sf#+6H}Q`I?_Ro6OFh(qo)ghXA=unwir9;eR?$nh8zdbY>&3%fLCeE(EQD2{kO7
zc^189s&r*E!g}!90n$816Sc8CxJFeqkJd2b!8K>?lNgdsaU(Rk3`80zjkIUWXAHXY
zB`w`Ms>`7b?B?-OJda+?&aD2!-`>qNvnsz%7sYM!)VaH}zBi@mk>XBi#~U95ghX;H
zqSLlyMuh+o7Ft8>MIBkG8D+O{FTG_}GcRFTsd1`-sj1B-;`neR=6n{3y6^?0^Ns?O
zDbUhf?%sVbUm}+=O)>G6QD86LrlqzUR*st=^})Pj<-!2WYkNzVpZfNq_`l}uqTQ|S
zt?5p2_67*uDT>!#dl{fQtzzYbcTzbSb+8;3)>()ty(JI)gssCTfrjZEEmNoWK&KeH
zlLA=4lLE&WD}#nUhW%jd`6^mS=!XGxXOsQkT52L4vVu8oVf|DKYv-y1sK0gIL`tw0
zD+KEs!AvTh#2lf+NYFFQtaY*B(z~FT48dt9v=cT+E=uM#GpcmLJqflFMAVOHxCjy&
zh@;gtv=c$*jgpm=L>>qVFqJxI6I{b-9cB1YNFN4^qg9%SvZY)ww-SZ-X&Jpy)P}HL
zYc9-g0a*X|;(L|_g5R?8*#zNNdaVCF=~|_W;>`nU-QC*T-Y&j$T~Q42i`Gl6njwC`
z*BA+mDOBzMXlDbrVSrf5Qltz<0ED2x>y*5Kga$oklY^HIoIm?N_TZ(5{vY#ud4$*Z
zrYS#`bp^;V8sI$L{Q11DNV1KB$T~jaUNlh#qoS5IW6@f0u|?KI<5aC==>+GRS;=g)
zzJ^$FVU*D^Dr;Sh!VSZ@AgU0Eq+JrAI9QxltkQopH%Z3w8YOeKl8wSYaTx&;JCgj$
zlrUZ}ys5PZRW_0+?PIBUJ6czKsz)4|51-#7q8}UZArZ{yt}TiZ_^^No&ZBYE>^z3T
zVRS1B5Q;&ikUNkoG*lcskr9oMtUqv-U{v8UN_q0ZOAo#MyI;RFy!^$9Qhr4*3hq}w
zrzp<PYr(yZoz3aAc;Nb?xaOJyW9B1Q($+4Bx@6H~;DhzjYglYnAy^S&6C+d3MA@*&
z6%WRGS%;bn9tCS0k5M+T^3wB|oUAQ&CU+yoB4S`+pHL`b74w^r+#m!M<P^CaFjI;G
zGbyY;<>=u&Vh}AcHW&8)VqS3f7Nqw-pKXiJ>qX1=y5|(dZ_Epry~*D8&IB#rGO}R3
zw2iYiK**sK(ghxZ_c4aZWz8XELb{UaMnn~Kqm<DOLL_u_!N7__;G$+H37?~|b3AG<
z<%8Za85HeDqB;C}?vhy*<9A4gLuORAZ(6+xO7Pc@GkoQo;hU$p0Nl{<1%O{T&wL3?
zOqsoG?1WN6)xq)xA}gd*=hOJezm`n=_8JY8EU5m!h1R{04A`L^j&2#-WpEB<Sub>P
zcCMXAy!ttdx&frXhvs!d;&_1bXoRejHo7`C(efIi(HslLs3zE`m8@jsN^u`q;1Scg
z(OO7$D8xl$7JF_OwucZBrkF-cgKmM%Kgn5@IJ~`_C=c&~<nKDqx0b9AkBd55H>@T+
zf8maf*FoY#``tT8`uFkN+1T4yFN&uDuSbY31~)<Ng|$w{lEHiza`Ti_qHzk+N^FSS
z!_?Mj&tnt|lZMM{Su_qCzK(nd|Ia1|mj;;Zf9}xLlKszpwm15`oRG~cK<OO=h9yM6
zyyDR-W@`a%JkC6CEX?^e<TYreB$N(NI=b^1l#cH;<!0o-^ieJiLHEbU47Gjh^8;LB
zUiDWLe(dS#m4Bms<p;%E(g0uc9Yyi~4PN=~_U7iKc;nNGVtyP}J6j6jw30CIi8{s@
zTwpAe5i&Sp4as&<1}&{lftmdwb#s!%2F4N=(nxJ#WfuTLoj0Oz`<%X+%xwtyhm_2<
z*<}A;4rVR;UwH7+L*M??nR}MRMt>peJ^&)imo}aHj-og-c>4qn=sww4>0r!0=opkV
zAqLO5WVHl`khamna(NlG4kZt+j*!wI8}D2wgBB5@^(z;pQQ&`};etItOCl-#IYp!3
ztc1o)Gh)ikK#`LWqT`(OGgp2h9kfTiTixf}K@!&g@;Kdh+tM(>|MNSF;?_iI8BDka
z$2~A7>yypxy$xCs*HYHHb~ZFY))q9OY+@rNn4Mb0;K9Ct^dY0Ib-@MYg$8qtJFLo~
zYG!x{F3^g2k~m<9=J^VF5)cE*<4ER2GEFlm*~my|=|QP3ON$V-R6t@CBe!98Noy9c
zghml=XyxL3U^dy0r;BWxLo+4A_R)O(AQ`rYW^)zMOhsyNgy>Hj!R#4>U?_DcMR5SO
zI8&xs06BQ*ZbEB>G(A*&saLv{zQ1skCzzcKRI}-R?6LVw58S)GEZdo0c~(O6qb+T|
zxA?)Ly9UQ|XinF6c6N%Vo>mlB1ZE<}6SiY|F5J#2&I3p|%_awr4k&K_%ZEnv?|*p?
zp`-}=2>tyby<H`Q@Q+1fY6LS$ILgzW6oG*<5&t7Q6lXm&$UNjsc>R{FsB|$A-H0w{
zDguO_a|A+tP5R{*CWP#roqYSrgwUt&EQ;F)6%N+7x2Ics#X8QE`b=eltz9DoM3lk;
zVDM0pq>9cl8>It>hK@oiQOa0K+tfZv=3!9*%2FlC-;O=Ec|_WSwj~Cdw=c7NOQ_~+
zsEhIU<aoBkru2p0svu>TF8Z@?KVp60_SUZd*%r!fCGdlH7R7%*WXJ94_Jrn1N1>yc
z=3tx(C|hM!h%wa4faeS1@gWA?#kzDcYA=~?XbMk*{_Ab|_hb>67#5cU7CX--`~T!P
zyPhx2*8QhFJhQU+{X2`|w-)f+S>H&@SJ4^IY8AZ-q0!noskL?5JMOe<g0oKX;H8bR
zDTS~y#OOi@(O4NuEu)Xh@nD%#9h{>PxmHQ{676*CnWvsmO(`uB=AawuNxGB{DMu6j
zQ506Y#Em92rb>b3hrbmnanRqe2DO^t6#S9<zi^!DJ-c-E>Vqws-6S3;_C-Tc*iq*d
zW2n6d&dJ)r@)e1>Eb9_#bkc9x_cM2`{b<j-z@BVtrx$bLmJP#Wsk7(cot;~`f29+W
zw+?3UAWT0qsBw`5C%SmA2*M~s^ikB1HtZ;gIS&{J4>8F_4p6MpT2r!b(8Jc7l_Y{C
z;g_Cqcsl~y@c_>gEC-6E?Ed>b0?q6WBH+0T1lqFuX=NmnoP&#8LgsLUUfMNU#%MXR
zE-7oQ`A4+623470C57g;tziNjnS*YTm#HubayJUWOrAu!^k7sr)SaG&EzN{FI7w`*
zX7c=n<Cgw=B?D(=ySVyy&UJb{FRuQ`xuW>(q43<@-Pzru)m1Be<lc)2N$T7S;fz#X
zYFJunw361A7<3vS#wQ90qJT7PS;oMi7D7~-N8iMlX3RJeSf)mwZo&#RM@NXbn3O=R
zRsp;<3h=?~{46cW;Q(tBC=c7OQa21`$5}IBNo#*N$6kXaFLEsdn7_Jox#~CH|8w^&
z$qT)uB~1V>^Li#t0Ql}9X%d)}EOs<#f<Oxiv=CzrN=r4ihKM~(kcid@hE?teowS&T
zq>;=?yhA)s)PqAN%*dP2qbzE3<ksFKVS#tMuPlBe)AJ&l%QVsIQca}pTC|hyxT`4s
z@vxlb#&lzAgESErjpEuVp`|Fj5Ybs*#!@O5R1NMLU3)0W;et?&wn4LCA(jH_nMB4!
zG=f<T0Y6=u?NT64&@#QwwYC1;K@Y54NgUC=s<u1%hXBua9w#_QO9dxjHx!(J@x@^Y
zM#2+wz{?2PB(B!V*eGJDz4B7lLdT$a&`L0<Tv3ZU#M;RwNFSID5%a(UGs4HfLZ}_{
z3-r>i-lXv*qbRv}+UtX#OKzRC^=1Hl>U`&M)Or0<>U`A_>ipnc3$De<WNWgsK})x%
zqQ6EnC|ndxX-!bI3W{51gJD88+8n+RH@vvC*MQ}7*Zi5xcD#V=Z@B|Wlq_01(SlzB
zb9Pxq$+;skL%Y?E5B8|DtpIY@f;X~{=MngLR=dEVI-X=))XsRQY9vIXn26lLawt>*
zsbf@H)-eT0qMpdQCW$aq@7erGObADL>9O}YXx+uMj?zlb-1?Fwi<<U<nLdxjlP$G7
z<N6|fzG_kwZx~G2r(3&Q>r}wjf~+)LGFN)Y2!}#N99D^~a;4zTamZ|9l3Qy6<l{kV
z4)`^o(2-$YabbfqK^fOc&*NJ{GPOg$lpeRb*pn(_w(l%`wpkZQ$W*}=BByRS*6!?w
z_?%n$S_1F4T0jqq*Y&Q!@f?Bf?oGF*#TaUdNT!r?jJYZV$b8~5@+fPmYS8*jmP|F#
z3Rw#&9FtdCm~e>DlH0S%!OMmovJW4cJ-7eixzTTuY11G5&kWh~3P#_5r&qltcEZuG
z9WnZA`lHWQ^7>@aN`7Qg6z?0n`|0*%cb(D+tYJ|=?F8nzX3^EMDSec|aiJq?Y9&3?
z_+yR>=A@~k@Y)8YxltzgXzNfm3@U$cl=W$HfkrD8^Gfd-!y>@>K$l>Gfu1Zha#O7|
z>;REmDNY_{cqqJ6Z6~Ud1))Be?}<JKV#_s%C4*TE<_we1AKecing0H4dT`fKR&rl+
z-m;QiiMu^8-<iOC@uVpJyCImH+glsE#d*kg10dzmK*l40bO>b7f!sC6kjjbMvL~e(
zOS&^ZXKUAJ=G$zt|Nfy&LeKsoGD){e6JQ%U7XaSx8E|ES5|!qapr=jXEL2fwr#aPA
z=zD8O2%|3{4U}$l2rP{Ks<iI0S!uriI2nJ#Ql$wBa%hBrB7Nt;S<_RR(K1y>Cz_H)
zT?fykQ${kViEqmgYwjv8q=k&vy0Y9U#dIB^us+xy*42?a1sR&GIZx&WIS>%NJgDqL
zNWpb^pyQ<&EVkwM4*SO3Qwh%#D9w3iO|HN7NRL1tE;?cQiXnlL;C#moMR7Smo&|3x
zO=dBbnhFpDUNEaC!H7#KBtqLj>{Q)7acK%y$m3659{3{pYQTYz#NM3#{P>}M)%}lW
zPBqg-bL_vIE)+(XY)p42yJX&4?!0r#z^)6bj8@oaoia+54$`2R2+YbTd32Ep6B%cv
zW6m@#SRNZ^qtLZdmRrr|0<N=>Ii7#p?Lj(QYO&_K{m6yBVV0gr)Ji^S@=}vQ3pMvs
zWGR}sCoym5t+9+JU^nDcz<7|%-^Y_R(lNy~li*5ioQ9f`t_~#^uJYhBXq>E>G9r|r
zWKr<|#ml0$;5h)pR@aWzkVtzLd2y8YOP~}BbnK@>r4VzSN@LLq$PnfP>v&<ID&1Uc
zOE30rpLDt~kLO(p&yTfu9u%SIuEFsfo*R?h-M!-cjS#=UA?1Vx1qFC(4Z4JBE1i#!
zN*RL&+b(D(Ac5aNGN~4!X*esH=T3&u7VaF5rTyq%=rsa4%}@zEIBHpUIVmsRSwn;P
z2ZxZ|c!X@baUQZ1q|k@#twR#Lx{#$YwS(&q=BP%CoB6Q+%|qHUg3MO*GiK9+A3O@O
zek8(;ZG`B@dr~mFp#Jv8LK5U;V`sXtK`}?CV@&D1tDS8cqqvevLoBz09KPVxVYc6r
zODAZ9C#??C&S*8c7pO-zQFb3BKMKVzuN<3c_VhU)RJNr5RS(W=Nq=Fw=po(MT;JN7
zq9uKGkc7G&hoCvYqt|+*+8!-!JtKL}88{&D?A`9Q*<}AGF3*zw!V-M~@D81$2};x_
zR|Q7hcu}8C1vbENNu<>Su%|rQqc)(ZyK+c7EROk@t687?#N}Diz|})Mf%^yfxtY_k
zMkdI&$ROetqDpzNJXqBz$&_`fY*?+NY@`y*>(~Ta%PK^xEf1v)CB)$?uOJ&rm?wMg
zpO)no2RU9Kv6Iqr_vMqeY)LQj`pLZa7dI9~r{5@w|8c<Uo0IA829?&jQWRsfMms28
zj%6JO{_{gxM&$6`NVRjagJz55RZf~5rDG0XOfn(Wc28%tQun^omDYN}N{3t~&m7JK
z2;Vng_k<>dGeMbOBTD9i@eCI;Z4;EYHu&IlUBy}{XS^(-NCazS41q@mo9nsY6mz9h
zJlfjH^E5*yia-k)7v3?kXsxH8J2B82l83Zb_4h$Ns`WV_Gy40Ei)OlMX*AQ>&CYPl
zjQ;%1qJTX<=8%M%G-QZ7X-aJy8zG^TF{Oa~DjzB<ZNr3hp$^W5AQksj#Yz>l7hY%+
zl#&%BkXKC%m9;Fb(k*Au+~zSM`dZRDDUmy*=$!Vroh#zzMXjqRy{(q2;b=D5|G?>H
z%vq-cc6r8}bvj_YGo(zy6LfkAhBZ+(8f)Zxrpwy!Qr1yfRR>vf=536OH#XKya3ZL>
zw6KeVtsA8pD4VNI@K$mf1=!WeGS7LAxQ$^RJPncH%KJJlwuBDmSrRJkw@%b>H^dQ=
z<_C_`={GIa>3_1-iOfuzU)?H-_YX*OYh!nBo4h%yl8rJY?4Aiuom$3##ciWCN=4XY
z$LZj?#9c8Us8PD;85Eo6!Ag*rg0WH==VCkO&aSZbAK(RHAi(qkj!o4Xmo)0v&2ZX-
z{)H5;Ow-bA9YK(CFEnK~*?;#iRs>nSmn+u~w7b1kL4X}ic)A5-!1!guEyD;;3~#t%
z7<p|auN9A!L5cP4a8pCT5B7V=oHtq>rFFVYkIC7Su)&|{BuUkTA>ouh(zRjBZ>2}Z
zc`o#v*>wNiN2#DQOJm&MvfT;JjOItSi{j6R)VVdC>=Bw$x~g)nf=bK{6lsOS8Y<{8
zE*!XWgRE7kn5+W?i!}juB#8p|vh$t=%XH&{jMaRd=96jm%WZ2zSi9r?)kK?re7d3=
zd4=YEEt*}ur(&llzBEL0dotPBp%og?Jy4NETN6rI!#)uZ5*DoGflFs#J2nZaKM-Rl
zP4rEWQ8~p*UaQ90M#M6}0@mTH@r!M-Uy21C_8Xk%zWG+qM(79mP_QyA;w;oET2>^Y
z59S@M-~=+vIw%WHAlKW5j83G95z3bdosZH*h1=q2DIM1Vln`wy0>h*IP1aJ3Jf#HT
zc(b4+ZYIMJLz(apx1>w+!_2x@(6Wv6$H(dPzp>PI2SUu&W@dFD*t0`Ci8#@AKP70Y
zsbTvQWvo&GBI}KIoQX1WN!gp^>85gFsI`+vD3cAf1*{I$+`XQQcbD{Eg61C`$Lim*
zRB+z7x40Pl{=K5u9ypP9);G6ysUVDTtWui8^juk?YaY4dEJQemfKXOemO&0YGuk^S
zT%@hltg#Um$ymdn_>Ped%Hrmk(2||4GJ`_vN=4Eo$Yn=$lvGgLH7jO3#whT1{fq(>
zDKfXcKua0qEuh_vFsxy8Y(bd*yz-9LB?rjeI@}m!uP8n;n5ZW-QK_p&f*&bpscI9Y
z(cF0p@!wo=7DAAD`><#sqAA)SnT5)DA#xLJ10GLVJFcL*8;+WMIfN_)v}2tXC}W4!
zg!8IQl7G~j))86)O?etEnq+A8T6=xr68%8SHf?sb2<Oog%jYCGpM7_yhg{KJgX4KP
z_cpe7inBKsMFZ2F$bx1iv&zFZY^8&^0il~1OBosL24GpE1+4W*6O}a91p)I=r?8gE
zP^&Cb<`P<+YYB-#OOG8qeQ>%^$`UeAzxv>%hpM0c=5v>%U43;9tdw@u2kVCha`Ebd
z)rmbcDJcy%PsfSRNQ<P1+1~mAeK(i4LmEuqUV=*3h&nrq5x~0R2(bDYSYORz@2&SN
zaRA*@6hAYhtexG-28C~ilEWIQtQ{1pgjyBRDxs~aY9&E?Kw%HfAvlLK*!Ed6D{Wab
z*cSzgf4Crm_s*GRdnYW{3x~B{*ZiQ?dz)4Y$;U%p^JLeg9{H6~Nul&S+B@NGr{mL?
zWCswSxen!2I{zdf{JJ6O5t^`h0#s~z0Jf^UN{~7s&$k#7?X`&XTE4U}HAzG_xK5AA
zC@r>uOk8N0Mt5e@{kI*pmCsE}xAJ+)0F{3C9F=~;6vamdRJymjzBQrj49f)g#~L9V
zY}NxA0!BkDaScVPc_kp!5b77ht|(9m(?y6&;ZiB5EDNOsmqrQO|16=>D{sqSr=x^r
zhj`wOk(PJxHkB3o&C}&UXG8oGEt(yb0>aw^Doto&@{onSB@Kkz!A_m9)47EhWAI06
zqhQi4oUB!3w&og2h1N<5p|rFP_DOBDbEV?kGxy}epfm6F<1O<i{Uhz1e$toPpLE<m
zJ9W#Z<8K~kh~Kibl;HnVchs3p$KS0NJVMj;jg9Sf%60+wIWty4@OErUS9|X?FQcw~
zEg|Ph)v@+o)|Iy4YjYt8)_{}4#ZWqD919M1@6Kg`=f*!ilsDVs;a9rYJ-XfdR>%9j
z5+M*7llEssw`HCzrmxK=`%fKbIrhu_+|jZ+0C(#;5ocy~0C?aI%3xw5)|Fz&b<J(<
ztp!#GHIdm3H~1W5jAJ_?B#lB)&51o!;Nu>a`9(l4>e-1AI%vD3lR;bU1ZO#$?mu-D
ztM?bo?ray#KGTwB=SP3jeo?H?&uXUYu+J0~2)-IHf)X>1&DoG?W6w-uu<2|XH`Rp0
zKD1>YI(L#QJaIj}0D2Y_Sj5nuHWmhJ3XQ7~l%}8;hasi3*)38!)>2E4f{D$yteh^P
zpe^my12psc+I~^|(va7;H}@v%$m=HtQymFrRC$xW)&Y^`c=!(|aD8(d<JB^k(hEo#
zmza8q;ks}i!i_Ne7axknVQ{k$gt7G;W05qR<YUMi){lXh-Bt7m{rGVu!2615-10gA
zm%+@u9`}plD+4em)2*#NQtQ$h0|l&^l(E)QNG2_03OZK?RRvuIXP_{!WYR+{8*D@6
z3>PkVNVE)&1r=*;Om&h%Y5mr9C0;(N$J}X4q21X~^`YN{+{m-Z{=Yk&v+`T}Xrko&
z$>L2#N3Jg(O3p-@P{ZFEdkQpphvt)zO-=#aNIRI3nv+<mA!VTq!}!uH@@N7CB~oFZ
zpVS){Dq(t_D3$i@K}1@SN@WgjpH26FykD{DQx9Hx=>E~$my|i&|5PTlH?@S;?S%rO
z@uTzZh%5$iGn6A~N5CqA4IWk?SoAE^N<n?v7(|Fx$y%AnZSBAUIzJvkg(?{H8vMlS
zm>`<Ev|dcaU!?^VQZ2OaI4<X)(*i+CCJ@%c>}ZH$^66IZ^U&FMPVQYI1COruyS37>
zXht*kWZ=yMLQ8D><c*NsE3J+JHrJE23}q19s9H!VA|$2-CZm!~3D$U!(ZKRI3#E{*
z3Ywvqbc(WVDTz6i4}~fxWQm|MPGOR%q1(1a=_Cw$MAvykae+h69>Jk!+xURLYJq!i
z8z1mwQxs5GY!0~Ht;uA9@c~3a)^ZtYQ$n6!1kIo-rDE;46R^>lU`jCOJcIOdD>!WX
z&7Et!GAhK<H%zA@o2l4nR;2hZEn9(=LuJ5l0|*fJ#KdWr1gZ_ON;OY#%8d0Ss9*%;
zIAbYf#2l+6BjZBd2y~jmRno_6P}~=O>uj?B)#2tw9FY9ckH7k!C0V?@)f`W2IwZ5^
zxU(sW|9Pl6wkEqNUH^CxOTl2l@CXonQ$<@26RMW8S+AX@FSYjZ^t1Qd$fra|<|}|G
zP6&UkpF4AkoakoJN_?~_ihnaib7y;FopPeD3_xvQiI(#aM40n5&i}}*)g8SnUF*xj
zVZ+O5&6>(h<JI&VvroKzO8N?Qnoaiq;PQ$|043%LcBW3C*yj%gWJjI6<`Om{Wlhbw
zGh7)7BGHQw9BysGv=p`!Tw1Ry!=tTg8@#RqlejWtjjdy_Q5y6Uv|~vIF`d)c7iS}<
zYV$hg_(o_a>O95DOs-M9pU{LF#Tm_CZi?dCfz+I?Z%wCr#XF$<IaJyL!oF;xj7@Yw
zGOUOW8+0J5nL0BQ_vtay=&=dnb3aD76hrcL&|0@$`+K|Aa{sd3Ii7Vy*u}F}{wN`M
zZ;$66=v{m9kmt7{I<+XShrIohL5Z`3Z1<62l1vyVkKYc0>Yh_AnS>}6^rRZ<lyn0}
zv&q5p2c)zA)S<KA{ii<PAL0(#DKG9_`=ucp(h$d|7-ETkBbHOkR$K}n>4FK<v<&P%
zTAw4Ye^Qz-Ff5iaf)|VtNsFWWxkJO>_kS+$*ZZ2DD*TBw$`3U~@#VpboKChk)_01t
zn38=gBbX(Oz-lg;;7qF8L*+1b7L%9A9%xAz-6X3cN{u3)Nvb?SvredMGair7^iTHv
zDwaE>a;_t=I{+|Bt%8l1&_|EU%UH=BrVg~)j8jycd902qcq;J2XNLSBfgdhC{qxUV
zq61|svq9fwqqePM(G<ns91;uZK&)eNVv+156n-Z`f~!hUB)~DKoa{ssqHO)0<Wb9f
zkCi*Xf<iXI5~k!2PCe#+%wHT_%d-hO9rhU?YU%#L<*lYaxkLkk_4}uT30Jao(P4ig
z5Zw_Vs!9cEqw>Up<Upc8*wxk18Z657hDUXj(tIoVh+2fuJA(V_5oMC^32Vcn9|ZWA
zv@BWAw8)E-vEEA|ok1aHh~`(1E2q@2o>s1W?@FibNCS4$SI~-oH9#|Izz?O3K;H2*
z*lc!=1cw~VLW~l&;C7zZP<hu&U9-p(&c#G+8z%&dW#s%kE}Q2`Ln0@fOVVhQ(-db3
zSSyKqP}I9^)H#>nZRsvOqwTsnw92v!>AJThx}X4`IRLf4R`<+yGWMY!iRQ@|$kJN|
z44>pCOvaAq<CX-2lFke83b>{MU%!F95me=P8FYY+d&zTxJL=KW&MG(nh)Br|x(2!N
zc{^k*iHp-KB36Q-JEn*S^ef;o%*9aaj_^`)<F!bFOPi!Gd1nyG&n!jqM?e0rm#nwX
zp4mE?*Pq{Mptfku^NZqd4Gpx-jrA!_9~KZXh6!zTlu?yYvZ$1ib$}8<%*5Cj2m<kp
z<0fjJiw5=_l}1ae4h@qV7+xoCb41x|q~ntFzeku|S9n_~aul7+Y~^(Gha5cv;0>iG
zfc)7ZzZ0N1e{jJ|=>a394O6w}K3WzWQzklTgN7ZEg<`@r%IR237JQIGa18}jMZ~pa
z;iF;XjibdLs)k0xOH-#T;0O%j@)m)9Yl*b~<8oSJ^d)c}?HJAUIeEb%eg5hTisG3=
z`rKJx-=V<61yEwit$+ebio+#sn6yka+{4Cm(F!Ll^E@<>%hEykJA9U2gj|m9WbLH=
zQd>a2wGQZF5}D~er8k-4OJ<YVtt*$8CjHzOEcy%H{DPwR*wEzM+S%HrRi&$grBRxl
zDlT@|T3F`JdlwG5OI$FFsZA9Di3d)!hy3zdvu}REqU{M7-!)W4geTgbP;0|z9hs29
zS=KnIbYq?Jnsb3`@LVek^I2Rv;(>u74-mRC)RJq~aA{!uZb&&TxO6%gbJvz#yDBX!
z_75M&?l&)GcVO)-Ju`KJ2>8NKdiL1;<Y)=*rPk5}*xd+sGgbmN?{TGODntjhW^oW$
z+94gKhE7l{v`d93QKge&rcOP#O;2ABTIzIq`FZOp*F3X2dgO(j{LJL}#TOQZ7|73!
z>F&<@CX%NLCN|oe;7!v6FKed-Y$X{RS2Eb+!l~Ld5w;3%LS&`0L0hglq-nwSSf!K9
zDHEf=;@UcW$6nXEw={PtQG2K3@@vQ!@*AB^W;d=}Uii(wdf}q<tZY&I>w)y#*qrWc
zQ*N7TqJ(|<xMogrA!96sh$3jlRaxssNF^f!8g?a*upz22+UsDX6`^t{RymW_iICz&
zPSujpbNZI`4jtv$5GkRAWeTSNe*g68v{5EcAjQn@nLL4H|LuU^`{dc9FJqx_A1irP
z24g&nj;Yd14)x@nVhxXFY+zNswi1P{Vj(g}If67pm}_?Sc=|~i1E-j6)xXYAM@2%t
z(ihtA(X8moTITlS$#Z$m%YW#RXP#_6V2k35L-O37Z0~MRApBLp(fh!KAl;s#3_4RL
z#ir*h_Brv0SJEPJo`vEK4F8_%*0)`8?w-ymmZZ&+$5rUGK!9;3v&sIyIb9rl*6hI7
zL(LBSUk)^T;&qH>IvF2x4tc;*Yi*RVm5aeg+0<M%IvN|e)Y4?X4=%lRGxhc}Bh#Ms
zFQ2Flzc4<TX8-i*s>tUo)(3iM<}6kq*U=D7B2CC*y)4`W#DW?og(-um6r{&!IbMTG
zcWI#$Rd*f{h0ws*!Iz7~3;x}X-;p-a)r&+5`^D1!rnsRanV0mqv@n_*8JzHIR_9Go
zLUJi#>LUty>Q-p}d<$i_sB+)EMe$t&oxVBQ+MSTltYZ+-DDSXBR0xc@hHF#TT-)GX
z2vTqe`wTjIrR%70OO7(cMtSL7Y-EI>2rZev936B@d#K0xb3d{I>V2XkhIZe$=2)RQ
zTI%ooXG`Gxx-E)78w$<ct*r#-k;G<=GZhq0sADWSWU6y0a;a^oCF~~}Obin0OF3EE
zNWP>|Z%{|Hya)=8O#Bj%D_&@~D^KCZ?H4+kORp_es3-8Kynkxos>;s6*<}BBE^phb
z-(P~}{r48d%?rf2v%XFKkIR8g8rX;rl8dlvK;yXPOtc2)5+sM05^J}2@_ZZ%F=Pmr
zJSQQSrT{r+f-ciPJSG<@z7=5gubeI^A*1=(C1_rMZ&7@3fi`#dc8NAm$<7rl5e3*(
zTFIt~kqgU8r&!HO+C(f3C}B0<x|`jl@zMA5)i!Rlcdh*m$x)Y&(+3LbHK5J+9ET=b
zissjsXmsFyIn+TNr(ZTDauuYuGI(Z7%#GtI#*WeeP;-!(LcYh$kaHtWV73u>>6~yG
zUdcz@mDEVMEH<>(hMc}^W#i1!jUs>!Gn(1h1bW?HKoe;bH;TACIy62wNUHG)E2XL6
z>L*c)Vsapjm`D{O6T7L(cpeYT63o-|1Rwmv0+HFv-u6zdJs;i}w5-$Lb6hxWzXa~M
z4V1X&-o-$Pm)*Ni%xAi}0l9C*ld$t<Q52uMt|(4If*o*JMx|z68ZE5VLfew#LQ`b*
zd<}q8w#)rqYNUgR&{Ng|Tug<9-#H956BH=%vSr(Eh}NBSUynrdbRDdx|8+3EOqLRD
z7;!8kODi<gRS8O}z+Dr(tAke+thiezst}ECxQmd>mFJ5+MKd?6zFgdT-x^=W1asb2
zT9(Xpc4;!#{XIb2owaOHd}RU9R=Y3d=W58$N0OxtJQyuQw|GPQpyt^ngk#X=c}ZB3
z9W%%O8Kv~|B)~840MQXcrvJ*2JTY$JyZ-Fu_bfqEwkd}%Yni?yPY?uynp2tSKXeT-
z{V(586j#LN8BgZCmMPPNG*gy}G=px(*$sFAOaFLTTRo<4|9-D!To>5>wgKDsh3RcA
zmA?JHOr@PL{rC43#a|AD>DF|T_chV5n!zJ(Y62VZ)KRh!EsLx%3Zk41>?9MI<;q4a
zRb-C)5TWL<h$dJYB&2@Wh>5Wj8$^37;vmyfI;MG~Y81KTPsRXenHXWzi*=wo6lB%n
zm#UoNZap~)9}BnRuSKs1t>05#39eFu)%4%?(Y(%Fm|X{Ge&shlbI+0l&g)t<|NVW7
zJP!z8KcLbcno?;SJ%=q{WD}wZ&d30XoQ}aJ5nRL0<FI86#3<EHR1mGtN&(B9FrR{4
zLm3-u8xC<E5H5}}G|4efu^Y6+4p}cL^?*{4AOtN%=wsA5S_!m>4$n&IZ*U1s^c25F
zShYzDxN1-DgkDrftC=<z(7b7YW<OTxrj;h4`N@}bwE2$`nxB2iLWR!h_WIUjYYWqi
zt&dzsFy*C%8pU1(ExEK>H>PC4awmk11{UMJ<|;baTGK*)r}j!4BfNtxJj7K>hV7n^
z5XmiDVkDVU=ka{SAKW1r&0yRfZP{)+sEH1l8FiWnFdI@OU^^(!nQ=t#oR6_&utR1O
zOIbUog7=}}P?Sb9(?G6Fscq??XaEwfgUz){9$*U|rkrG^(T$9;Nko%J)#?+%H;q0y
zhkAIVUiovO9dh{utuHa3b@E)pysqJ2)-4ldP4p`-Td3JL-QHT?-Jv}RrDU;$2~jM0
zj10E4HP-6VX-vzOTG!EO2GdbxqKd&uXBw#b<spix(k1M@qZ@WqwWPywN_PC|TK5{}
zuceyg!@puS+5gaS>l$a5t_lNo!*vb7cu=!1;|bL1ATir`*hrl_3FYR5^b%AM$i%>`
zg0Q99tEOf=7-nJ}q;t|a7P&G(f=Qrj>s+1KFv-bwl<kOVy1AsK&U3vFyK*_y`JvNA
zB<66hm%O5rnK|6+;a3#JFAS-3dv|wt9mBnBv<=sc3GXzE0V{j45IiJ&X{dV!adHY4
zFtyVLxSKH>b**E`l#GosM#VaC+H2*?Mt>^!s9*s0r*!7_x3|SbfG&ULbUO`ax_s3u
z7MFwH`--CY%n-?)&FLOV%@THQ(;W0MRFz}WS|y|6kn|ND#48938Y+OAfep)iD5F=L
z#ZU_YnGr@o`HMOXUB}~-I4Kr;X4W6~2$mr0i(<Ocx;Il+Y<oY=5fYjoJKb_yMiXdo
zsDpr3zi-IpM3)rnbro=|f-Jcp4_c>NFxim6=xd?$GoR&??!0}%mA1^~3$%H(4$3P;
zK!;go<`p8K*V~6^5^drN(WwCAJQ@=<l%rFnv_3+$W2>xnjt>#a0dw@3Czdov%9GI4
ztNn2m0nJc;?6?)8zLBX{U<C-;pZ?vB&Hq8sU4!F3lsg;i8@toubC7o9r8L$;(2R;r
zP>pk;(jrP_v@)7SUHYbKROKwVKRknd9|P1_kyhzoYr!D%f|zZw4NS#ljkd8$C;j<O
z*OFMoJVf}J*MKofb~Nf1L8y2TAovf52tM-&)_&$Zf~k~6AHl~5#Q4{MAeMV0!h^{h
zu{h}hQUOVB4XExApmVswVKh0soX@GS5@fcl&!+qT@F)cP?%`*)?%}Tif-8XF_qOig
zzx=yJ@rDHiH#c|4J<Ll9Fjy(;+F9oX_kug=y{_t}=2oC%L&;J?E;om`Sq?}WCcTJt
z<laf9wG8E1D(XdpNahZGrga2V4&!;c!f24J7-%9$6T%*-?NKuB(}ng)ZJUoDqriyS
z<lyevbUz*yi_njYyQ7tjf7vqhds^8D0K@STFhMq65fKLMv<4T4i#0B)^%2N*4%6W1
zN7n(aWG7rYnOh7`Yh}Gmczh%gDo*q<#~$Ocu{b4VX5mz71b#l4zt1N7@i^Jo=WN~X
z+W1#J+4#ENwbv~|xx2H0`xc%W$-|`J)oZiK!7B&jAMRUtX4$Tdf7O#~{m~D0ZM>q<
zANRMXb{IhI_0^?!lT>>8Y<lp@BSwE+fAnqH)mL@zJ{xTNuPTZ^9KQRV>FyqtUDdIQ
zHFKdBF~SZV;HR=!0i+CD^I6qMp>1GFaK<<<nSmZk2@l3y<cz73LwCr52sXOGa_S-N
zNIUpMZShy6;OESFog{>Ik_EuN3#M${*MJ<kue)GfK=|21S|T)yR59<3P%+Oaz()yY
zW2`x|iYbUOXPm2&L0~j&rVt~G;6{{?{{T{-aV3-xRz(+$kg)$b>!wG;k+rRp^GVR3
zs<kpa{XG74W$W#Cj;pHPpYD!VZb-O(>#Muz?(2*08XV8zy1TQrwOu@QLs8U0Lpn$m
zLqLzT<4qYk><teE^t4h?ro+VAGpDo<N?TVqC!)t;TW)n6)iAa5;6(#0_CNCArH9^h
z%`=wxp7%d8hs2a7SbZekHKedBM50ePq?Nh=#5pauczp7fCD(CC^lN>7<nFvw^qP()
z-d1$i;JA;(?#6U`XS?`i*jh|+#bB=oNOMz_4zMAhcC}Z)Ts3$?tc~ERlV$YW>cC+w
zqtQkL#iDUdG`_2DJU5i%Fu<bx)T!-DpKa@|ryHr#L#D};pwY=I;HWUv`M4kiU@QsS
zzHMa=y;NSQkKUUGbaeI6%aMU&tYMb-4X1&oNe<XJJqxvOVbT(KK1a(xopdeV9<(`n
zO*!h%o=x||QRppSPWzSDE^1A2u_*pxNPD}xJ9`v1<4e$lj2Rx4;z5_8gtDkXHjLGr
zN3W~E8mPe(J(CS?d=-P^5Hu@el!hzCymXo`S+rPc7aY?TaXSV`49mflvy{b7S5xc`
z(Bhw-Zi}|;Y6s*9%}#3q!Z!_Rk<i3_uwc_B*j84thQ;UtgA88<`@|Z>q2R35P**=l
z8=Mpd)`Fq7EQHjQF(|H$bVjSF#CfVglZMj{wxhp{_V_5cyPGFz$MnwH@=s6aYR@qq
z#X%<-GoC+iP!zWe0uJ}KHug3sq0QEw34?naO9u(T71Iu)x}X*>^HC`o3=fXSz-rb6
zCY3NjbFV!ZCb6nC4~iRCnk#n?FZ(jF{*sp7#To|Lli)}@N5L0xk8i5eG@HzdV^fCu
z5uE4R70T~DSgdsQO9u<vkWBY>rm(3Cc-wU}Lcz+dXAR^Fl@anYw6ANYYgS2KNzQ7m
zbg5lsU~xUTriwL>5Sb98GIbqGmWt_TebCLj=hr3joP2d{G+M**6RC(kR8vcf<LN7y
zl8tNQR23D9xKW)FL=D!8!BvEl($!UPAYr9>$buh_M(rBq+;&6Pr}EtI+Q0dEOA6P0
z)vk0zIE;7)V*T<URwfZ9%-ILT<&-Er_hs~rY&7hqQNvE$KGq>dFGQuiVp2I?dd7s1
zjpDB6-Ye0}<Cb_k@!H(Id2Y`?(-A4C`A2lOmbBHQd3*^?4kzvXKrX%Vyp@jrve5kT
zqNCHn{v(+tKZ-l6E&)s)3Rs9rM;mLV0~Aa~acd)gas<H_7$iW&QJ~&Q8s!M`#i>{h
zIw%q)5?N8bHF9iFy_B^SVwe#{!ozrk1YJ5vF-E}Q6nC)Nmy#bChQ!<g06y!V{Kw0}
z4sKbwF#-4=J=96b|0Dr;{-FYodjM{%Pj+^vyBK05O<<*|IBe!n+geA<z#IzFSZ*-G
z#Y=BvWrX0ua_gjsxom9TKXh4bjw%-#-Iz@dUOONstSGtMo3{U{Jgb0lXKOIeDgd9K
znkS}=1<WdfG`!-b;!cYylrcap6WD)FIMf|Zt5`+fFk@8Xs>qymk~R`Z(mcQY^!KmL
z`Ml>$kG%YsdmP@zy;;SxXVZh%_L2DH;q`|;nHN@PJ*vo}9<IUh91_#%bc2c-9`Wv7
zZF$r&>Y!s4tn?AQxJrvs!W691-b8C_%Y&|&JGJIq?0dww53!0{8J5qwb2wPGd%*2S
zB$3>{GKXZMsXmgA3~B1G1If;!qaVr7lT6hGl;-+g2}dsXRrSkWzwIa_`^$j0xBl6$
ze?QUWHN`>kn%=dS4ry|0XMMU}y!LwXf-9_oTPrS%6gEZ+@hQwY8NJqxhAkwOWjYES
zY;g2Pju-#-r6W3ZLyyM`hqCDx=ZD`n8h&wZ^cyhxpPU>0KO54~m5u(m0bG8|Zy!dl
zju`zl{n1yMiyrPw&;LAg(ZfaYLxXoenNIh1){D<U8Ds^?SEXYH>Q2PkDbEeBBkYK!
zSZqukD^*)7R1>%tHU?cwE`04&BSW3YAX9Q5m;p$6HKPq>OT7|bz>id+8%u4&V=*kb
zS2Yj6z6OQR*2Z8*08VciQpwj3&1BjC`t5z3k{2Ja$=BR~N)k?Ax^kSLTpVt=h9Q_V
z=<LJ5ZJByQY*)~*7w(Ii@8pYkno8BsGv{SVQk!&48!#Pg&ZY-%=_B=x!`C~$k$o4x
z^!i2L#iw6i6#rsKQM;SlIZV)bCp>o$_UD+3T1jUDGc_w&FeaMFxZ%<;=^!XGHcA`E
zxv!lLLFg!z=1PbV;_`sYJ_d_zby16D%A;@LI@nz7K?`-#13-2(lZW{atQwPp2|{6E
zDQz~{|BcH_l?3Fn8J^Jugw2rdIy9vb5TY&{A%ldOwhD|z=|Q>GJV>T=rFpcbj?Qye
zGGncExLb*EIyxp;8EYSP1A*1m2%TxL3<x&)2)7YH3}6NbV{Lc*>{$RQOgaS+cOlma
zwcI-2+;J^^KilNTqS+JmvpZKt3C+Dni{jSXi8h}|Xnx|+qWIGRZB94$wx`qLJaTd1
z7Iv79QOKyg0YD*k4GKcBV9Js)FJZe+UREmdCN{y8x->CGrPJIQcke~)fUB|A7*%^r
zGaVcsI1*BkXt+w(=?Hs+n4{YF6rN2tXvG{SO{3Z5;No!R{HqUMdg!SiU4PyZv-DU0
zum{4e&n7avzv#ev`hd(5Fdx0HC@KjtRgG`T63V}n5TLJX3(?ZN(pqW7WuwmGpNK{7
zu=XShs`N6`EG$(p?Y(G<+iztOek*Iax#3-FE*!k|!AlRl<$w9MCBr*->nGAszTUp~
zdkfpU25oM5J9}G`4b1flm5+@r%i2SkFcoUnFjGb*ed$fZSyS1v6*kG=k%p0C*J0ZY
zEE5hT>(Y!PjWmbF)6ij9)8Vk5c<|Cg|M>s7<9Umg+rbk*+aFf(-MlD%xM+?a7WjUJ
zDy3+cErs-<E`=>!<E&(oD<82`caG?3JrQXNo#w=t1_b5sQYlEJM5s@i2@(eZe{~jN
z&v<N6A^gx|Me!dN{>qK@b*k^Y1nmfml&WN|7G9aaL4#`UN+X2k%2W)@_9pPgF==3@
zUE!6Gjcec0p(=p{cuSR3Xg9Z>fTD#3>YOP#$u<=JXm${#lLVw)P|>T9(oF#~+{Sx0
z+5h6{@{`}5ko%6u@;m?eVyky;;hk^oZfxxql84H(CaSXLM93vTs~jW8A9+sj<)J*z
zQ)bG;pe-L0iLXL}P5XAUa~=O4EF1jt)){=n-viEI7E4_T*qOw7B^~EV3I!BtN+v_?
zG_RPcAv?%OZ<}C^wop6@cC)pmGr`ITuiiBb|44AlVtPInbi^!`?AW1e=xWUA<CxEa
zA}Q5b#u5~C6%QzeDen)HpdniTott>p%IsE)C7QypBjD)mVYbQ*@P2>fwm-OM$z<$?
zcCq4>k1aaa|C`5(;(ZI;va`9lMdcw+%nT7WE7HR7(3DVM%4lD@U~AZ>mP3XxL~$)+
zm!1vTe)KPNt)-Z3BjtRlj*#*h)x9P|3+;pyc$fO6UieCe8>TPwt@jP6b8Q?KNSbUw
zWuGueq^Auv8%ZDV2^-lU)m1&JV=dJ|?dPUeMrbI~1DoVQD7&d@sa$BlooO0V>%g2U
zaUUR{s#MxJZA}NYODphd;^G$2XWAPKt2z^Cz=??>DMpJbk7hZP8-bOu>g03_dA||E
z4@J+lc^irK``&lYl4O3q^35I@-nvH~mrF%QuCG{t5=rv`@Mz9s=NeWacGaQSou^%x
zXul=cwglxXE+$YNaOCb3vz9C)!YRe^qfW&=bA}+Kk{=0#IeNM#*-8KP#?Rif%udR>
z>@OZ))MYo%isBas+_|x_JKdrRMGBHf0{2lvon~cXfXoqHGR|VnAhgodB50!(_oCK}
zdsz#vloGX&P3@dxRg9I6zQ2}mHCleuv5!m^Kx*_?&<qDYXOsQ+Uf#kiAeVVF%Q`^#
zB?E#=Xo9ROgRX^)4Q$Qr6;m=O&b$gj!FH6MY1NdDX%&<yIcJSBq0v$@Zli}XCpL29
zoU@wyTzilPMH3Y|&U0&GuA|!_J%||R@1w!goi1saHW%<5%DR5oz`2$GAHlAj58(I!
g03VA81ONa4009360763o02=@U00000000000J~&Q761SM

literal 0
HcmV?d00001


From aa43543e1fb609901d09b7a9f0c5e72707cb47a4 Mon Sep 17 00:00:00 2001
From: Emma Rousseau <emmarou1@icloud.com>
Date: Sat, 26 Oct 2024 20:42:22 +0200
Subject: [PATCH 39/42] Rseqc innerdistance (#159)

* initial commit dedup

* Revert "initial commit dedup"

This reverts commit 38f586bec0ac9e4312b016e29c3aa0bd53f292b2.

* full component with two tests

* fix default values

* adjust argument names and container image

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
---
 CHANGELOG.md                                  |   1 +
 .../rseqc_inner_distance/config.vsh.yaml      | 116 ++++++++++++++++++
 src/rseqc/rseqc_inner_distance/help.txt       |  43 +++++++
 src/rseqc/rseqc_inner_distance/script.sh      |  25 ++++
 src/rseqc/rseqc_inner_distance/test.sh        |  77 ++++++++++++
 .../rseqc_inner_distance/test_data/test.bed12 |   4 +
 .../test_data/test.paired_end.sorted.bam      | Bin 0 -> 10205 bytes
 .../test_data/test1.inner_distance.txt        |  49 ++++++++
 .../test_data/test1.inner_distance_freq.txt   | 100 +++++++++++++++
 .../test_data/test2.inner_distance.txt        |   4 +
 .../test_data/test2.inner_distance_freq.txt   | 100 +++++++++++++++
 11 files changed, 519 insertions(+)
 create mode 100644 src/rseqc/rseqc_inner_distance/config.vsh.yaml
 create mode 100644 src/rseqc/rseqc_inner_distance/help.txt
 create mode 100644 src/rseqc/rseqc_inner_distance/script.sh
 create mode 100644 src/rseqc/rseqc_inner_distance/test.sh
 create mode 100644 src/rseqc/rseqc_inner_distance/test_data/test.bed12
 create mode 100644 src/rseqc/rseqc_inner_distance/test_data/test.paired_end.sorted.bam
 create mode 100644 src/rseqc/rseqc_inner_distance/test_data/test1.inner_distance.txt
 create mode 100644 src/rseqc/rseqc_inner_distance/test_data/test1.inner_distance_freq.txt
 create mode 100644 src/rseqc/rseqc_inner_distance/test_data/test2.inner_distance.txt
 create mode 100644 src/rseqc/rseqc_inner_distance/test_data/test2.inner_distance_freq.txt
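
Note on the optional flags in script.sh: the wrapper forwards `-k`, `-l`, `-u`, `-s` and `-q` only when the matching `par_*` variable is set, using the `${var:+...}` expansion. Below is a minimal sketch of that idiom, assuming bash; `show_args` and `build_args` are hypothetical helpers for illustration only and are not part of this patch.

```shell
#!/bin/bash
# Hypothetical helper: print one argument per line so word splitting is visible.
show_args() { printf '%s\n' "$@"; }

# Hypothetical helper: mimics how script.sh forwards an optional value.
# ${var:+word} expands to word only when var is set and non-empty,
# so an unset option contributes no arguments at all.
build_args() {
  local par_sample_size="$1"
  show_args ${par_sample_size:+-k "${par_sample_size}"}
}

build_args 4     # forwards two arguments: -k and 4
build_args ""    # forwards no arguments
```

The outer expansion is left unquoted on purpose so that the flag and its value become two separate words; the inner quotes keep the value itself intact.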

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3fc134fd..0e32edb1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -17,6 +17,7 @@
 * `rsem/rsem_calculate_expression`: Calculate expression levels (PR #93).
 
 * `rseqc`:
+  - `rseqc/rseqc_inner_distance`: Calculate inner distance between read pairs (PR #159).
   - `rseqc/rseqc_inferexperiment`: Infer strandedness from sequencing reads (PR #158).
   - `rseqc/bam_stat`: Generate statistics from a bam file (PR #155).
 
diff --git a/src/rseqc/rseqc_inner_distance/config.vsh.yaml b/src/rseqc/rseqc_inner_distance/config.vsh.yaml
new file mode 100644
index 00000000..e050bb24
--- /dev/null
+++ b/src/rseqc/rseqc_inner_distance/config.vsh.yaml
@@ -0,0 +1,116 @@
+name: "rseqc_inner_distance"
+namespace: "rseqc"
+description: |
+  Calculate inner distance between read pairs.
+links:
+  homepage: https://rseqc.sourceforge.net/
+  documentation: https://rseqc.sourceforge.net/#inner-distance-py
+  issue_tracker: https://github.com/MonashBioinformaticsPlatform/RSeQC/issues
+  repository: https://github.com/MonashBioinformaticsPlatform/RSeQC
+references:
+  doi: 10.1093/bioinformatics/bts356
+license: GPL-3.0
+authors:
+  - __merge__: /src/_authors/emma_rousseau.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+- name: "Input"
+  arguments: 
+  - name: "--input_file"
+    alternatives: ["-i"]
+    type: file 
+    required: true
+    description: input alignment file in BAM or SAM format
+
+  - name: "--refgene"
+    alternatives: ["-r"]
+    type: file 
+    required: true
+    description: Reference gene model in bed format
+
+  - name: "--sample_size"
+    alternatives: ["-k"]
+    type: integer
+    example: 1000000
+    description: Number of reads sampled from SAM/BAM file, default=1000000.
+  
+  - name: "--mapq"
+    alternatives: ["-q"]
+    type: integer
+    example: 30 
+    description: Minimum mapping quality (phred scaled) to determine uniquely mapped reads, default=30.
+
+  - name: "--lower_bound"
+    alternatives: ["-l"]
+    type: integer
+    example: -250 
+    description: Lower bound of inner distance (bp). This option is used for plotting the histogram, default=-250.
+
+  - name: "--upper_bound"
+    alternatives: ["-u"]
+    type: integer
+    example: 250 
+    description: Upper bound of inner distance (bp). This option is used for plotting the histogram, default=250.
+
+  - name: "--step"
+    alternatives: ["-s"]
+    type: integer
+    example: 5 
+    description: Step size (bp) of the histogram. This option is used for plotting the histogram, default=5.
+
+- name: "Output"
+  arguments: 
+  - name: "--output_prefix"
+    alternatives: ["-o"]
+    type: string
+    required: true
+    description: Prefix of output files.
+
+  - name: "--output_stats"
+    type: file
+    direction: output
+    description: output file (txt) with summary statistics of inner distances of paired reads
+
+  - name: "--output_dist"
+    type: file
+    direction: output
+    description: output file (txt) with inner distances of all paired reads
+
+  - name: "--output_freq"
+    type: file
+    direction: output
+    description: output file (txt) with frequencies of inner distances of all paired reads
+
+  - name: "--output_plot"
+    type: file
+    direction: output
+    description: output file (pdf) with histogram plot of inner distances of all paired reads
+
+  - name: "--output_plot_r"
+    type: file
+    direction: output
+    description: output file (R) with script of histogram plot of inner distances of all paired reads
+    
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - path: test_data
+  
+engines:
+- type: docker
+  image: python:3.10
+  setup:   
+    - type: apt
+      packages: [r-base]
+    - type: python
+      packages: [ RSeQC ]
+    - type: docker
+      run: |
+        echo "RSeQC - inner_distance.py: $(inner_distance.py --version | cut -d' ' -f2)" > /var/software_versions.txt
+runners: 
+- type: executable
+- type: nextflow
\ No newline at end of file
diff --git a/src/rseqc/rseqc_inner_distance/help.txt b/src/rseqc/rseqc_inner_distance/help.txt
new file mode 100644
index 00000000..18f97bb6
--- /dev/null
+++ b/src/rseqc/rseqc_inner_distance/help.txt
@@ -0,0 +1,43 @@
+```
+inner_distance.py --help
+```
+
+Usage: inner_distance.py [options]
+
+Calculate the inner distance (insert size)  of RNA-seq fragments. 
+
+               RNA fragment
+ _________________||_________________
+|                                    |
+|                                    |
+||||||||||------------------||||||||||
+  read_1      insert_size     read_2
+
+fragment size = read_1 + insert_size + read_2
+
+
+
+Options:
+  --version             show program's version number and exit
+  -h, --help            show this help message and exit
+  -i INPUT_FILE, --input-file=INPUT_FILE
+                        Alignment file in BAM or SAM format.
+  -o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX
+                        Prefix of output files(s)
+  -r REF_GENE, --refgene=REF_GENE
+                        Reference gene model in BED format.
+  -k SAMPLESIZE, --sample-size=SAMPLESIZE
+                        Number of read-pairs used to estimate inner distance.
+                        default=1000000
+  -l LOWER_BOUND_SIZE, --lower-bound=LOWER_BOUND_SIZE
+                        Lower bound of inner distance (bp). This option is
+                        used for ploting histograme. default=-250
+  -u UPPER_BOUND_SIZE, --upper-bound=UPPER_BOUND_SIZE
+                        Upper bound of inner distance (bp). This option is
+                        used for plotting histogram. default=250
+  -s STEP_SIZE, --step=STEP_SIZE
+                        Step size (bp) of histograme. This option is used for
+                        plotting histogram. default=5
+  -q MAP_QUAL, --mapq=MAP_QUAL
+                        Minimum mapping quality (phred scaled) for an
+                        alignment to be called "uniquely mapped". default=30
\ No newline at end of file
diff --git a/src/rseqc/rseqc_inner_distance/script.sh b/src/rseqc/rseqc_inner_distance/script.sh
new file mode 100644
index 00000000..fe00c590
--- /dev/null
+++ b/src/rseqc/rseqc_inner_distance/script.sh
@@ -0,0 +1,25 @@
+#!/bin/bash
+
+set -exo pipefail 
+
+
+inner_distance.py \
+    -i "$par_input_file" \
+    -r "$par_refgene" \
+    -o "$par_output_prefix" \
+    ${par_sample_size:+-k "${par_sample_size}"} \
+    ${par_lower_bound:+-l "${par_lower_bound}"} \
+    ${par_upper_bound:+-u "${par_upper_bound}"} \
+    ${par_step:+-s "${par_step}"} \
+    ${par_mapq:+-q "${par_mapq}"} \
+> stdout.txt
+
+if [[ -n "$par_output_stats" ]]; then head -n 2 stdout.txt > "$par_output_stats"; fi
+
+
+[[ -n "$par_output_dist" && -f "$par_output_prefix.inner_distance.txt" ]] && mv "$par_output_prefix.inner_distance.txt" "$par_output_dist"
+[[ -n "$par_output_plot" && -f "$par_output_prefix.inner_distance_plot.pdf" ]] && mv "$par_output_prefix.inner_distance_plot.pdf" "$par_output_plot"
+[[ -n "$par_output_plot_r" && -f "$par_output_prefix.inner_distance_plot.r" ]] && mv "$par_output_prefix.inner_distance_plot.r" "$par_output_plot_r"
+[[ -n "$par_output_freq" && -f "$par_output_prefix.inner_distance_freq.txt" ]] && mv "$par_output_prefix.inner_distance_freq.txt" "$par_output_freq"
+
+exit 0
\ No newline at end of file
diff --git a/src/rseqc/rseqc_inner_distance/test.sh b/src/rseqc/rseqc_inner_distance/test.sh
new file mode 100644
index 00000000..927a69a9
--- /dev/null
+++ b/src/rseqc/rseqc_inner_distance/test.sh
@@ -0,0 +1,77 @@
+#!/bin/bash
+
+
+# define input and output for script
+input_bam="$meta_resources_dir/test_data/test.paired_end.sorted.bam"
+input_bed="$meta_resources_dir/test_data/test.bed12"
+
+output_stats="inner_distance_stats.txt"
+output_dist="inner_distance.txt"
+output_plot="inner_distance_plot.pdf"
+output_plot_r="inner_distance_plot.r"
+output_freq="inner_distance_freq.txt"
+
+# Run executable
+echo "> Running $meta_functionality_name"
+
+"$meta_executable" \
+    --input_file $input_bam \
+    --refgene $input_bed \
+    --output_prefix "test" \
+    --output_stats $output_stats \
+    --output_dist $output_dist \
+    --output_plot $output_plot \
+    --output_plot_r $output_plot_r \
+    --output_freq $output_freq
+
+exit_code=$?
+[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
+
+echo ">> Check whether output is present and not empty"
+
+[[ -f "$output_stats" ]] || { echo "$output_stats was not created"; exit 1; }
+[[ -s "$output_stats" ]] || { echo "$output_stats is empty"; exit 1; }
+[[ -f "$output_dist" ]] || { echo "$output_dist was not created"; exit 1; }
+[[ -s "$output_dist" ]] || { echo "$output_dist is empty"; exit 1; }
+[[ -f "$output_plot" ]] || { echo "$output_plot was not created"; exit 1; }
+[[ -s "$output_plot" ]] || { echo "$output_plot is empty"; exit 1; }
+[[ -f "$output_plot_r" ]] || { echo "$output_plot_r was not created"; exit 1; }
+[[ -s "$output_plot_r" ]] || { echo "$output_plot_r is empty"; exit 1; }
+[[ -f "$output_freq" ]] || { echo "$output_freq was not created"; exit 1; }
+[[ -s "$output_freq" ]] || { echo "$output_freq is empty"; exit 1; }
+
+echo ">> Check whether output is correct"
+diff "$output_freq" "$meta_resources_dir/test_data/test1.inner_distance_freq.txt" || { echo "Output is not correct"; exit 1; }
+diff "$output_dist" "$meta_resources_dir/test_data/test1.inner_distance.txt" || { echo "Output is not correct"; exit 1; }
+
+# clean up
+rm "$output_stats" "$output_dist" "$output_plot" "$output_plot_r" "$output_freq"
+################################################################################
+
+echo "> Running $meta_functionality_name with non-default parameters and default output file names"
+"$meta_executable" \
+    --input_file "$input_bam" \
+    --refgene "$input_bed" \
+    --output_prefix "test" \
+    --sample_size 4 \
+    --mapq 10
+
+exit_code=$?
+[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
+
+echo ">> Check whether output is present and not empty"
+
+[[ -f "test.inner_distance.txt" ]] || { echo "test.inner_distance.txt was not created"; exit 1; }
+[[ -s "test.inner_distance.txt" ]] || { echo "test.inner_distance.txt is empty"; exit 1; }
+[[ -f "test.inner_distance_plot.pdf" ]] || { echo "test.inner_distance_plot.pdf was not created"; exit 1; }
+[[ -s "test.inner_distance_plot.pdf" ]] || { echo "test.inner_distance_plot.pdf is empty"; exit 1; }
+[[ -f "test.inner_distance_plot.r" ]] || { echo "test.inner_distance_plot.r was not created"; exit 1; }
+[[ -s "test.inner_distance_plot.r" ]] || { echo "test.inner_distance_plot.r is empty"; exit 1; }
+[[ -f "test.inner_distance_freq.txt" ]] || { echo "test.inner_distance_freq.txt was not created"; exit 1; }
+[[ -s "test.inner_distance_freq.txt" ]] || { echo "test.inner_distance_freq.txt is empty"; exit 1; }
+
+echo ">> Check whether output is correct"
+diff "test.inner_distance_freq.txt" "$meta_resources_dir/test_data/test2.inner_distance_freq.txt" || { echo "Output is not correct"; exit 1; }
+diff "test.inner_distance.txt" "$meta_resources_dir/test_data/test2.inner_distance.txt" || { echo "Output is not correct"; exit 1; }
+
+exit 0
\ No newline at end of file
diff --git a/src/rseqc/rseqc_inner_distance/test_data/test.bed12 b/src/rseqc/rseqc_inner_distance/test_data/test.bed12
new file mode 100644
index 00000000..33a46951
--- /dev/null
+++ b/src/rseqc/rseqc_inner_distance/test_data/test.bed12
@@ -0,0 +1,4 @@
+MT192765.1	1242	1264	nCoV-2019_5_LEFT	1	+	1242	1264	0	2	10,12,	0,10,
+MT192765.1	1573	1595	nCoV-2019_6_LEFT	2	+	1573	1595	0	2	7,15,	0,7,
+MT192765.1	1623	1651	nCoV-2019_5_RIGHT	1	-	1623	1651	0	2	14,14,	0,14,
+MT192765.1	1942	1964	nCoV-2019_6_RIGHT	2	-	1942	1964	0	2	11,11	0,11,
diff --git a/src/rseqc/rseqc_inner_distance/test_data/test.paired_end.sorted.bam b/src/rseqc/rseqc_inner_distance/test_data/test.paired_end.sorted.bam
new file mode 100644
index 0000000000000000000000000000000000000000..8b215e12d1a932f1619cf7ded7e172141f45d479
GIT binary patch
literal 10205
zcmV<3CnDG%iwFb&00000{{{d;LjnMD0fmy^PQox0hpTtRm)Hxe<5ULHD+VM;vd!s)
z+ok&hE@3OK4I1x#Sl_~Iz@|9l?zE@<zWz=+ww$Z4YlKGkQ@nH;kUsVwSR_Odr#V+i
zXXHO(-(}7-4C$b662P|0=<tpXJENx=o=+KId(1Xz2-vgP+_o3a+_kXpFqKG!y#)(-
z5s5f~&d94SWNDsed`q$CHuchFl)ykQhCRR&yKZPYf7fK8l`v_<TybN?rQ(L<g^a|G
z8ncNNbOFqsY4%J4f#o}g)eeGnl8Y>j$SUWjh*u-d8%zMC+9d0b3kPX^@EAz)ObM}(
zWuT_^euV=9Rjy-S+oj2yru(5*gZU;Wl4qw>0;k-%ZsST(C`}g)cFWTuiT89-s3ayK
z&sy7Ii=3X56WHr%w<Xp~HN&79VOf!bCCTF-kN6|IJu9RC%5=SqbWvFj_X;6&gAi&y
zH2PEseFL0k-#`ch001A02m}BC000301^_}s0su`WwS5b$EmwKo9KT{4GdZ(7+bldP
znKd0b3kTWf{QxJrGk08uBqYdJN+7}%;?k-JC@M+QAR>89S7C{uiWCPTDz#3XP$C+s
z&xA^#5*26?p+YG#h&GCa0JXTSQ0t}<icG$LJ?6Fd+54PxZ|2@}&faTg?{n5S-~ayC
z+U-_kkH3qr<e^uwt>~%fLy<e~#JOkj^WjGxIovrqJv}&mVDI$k=p=gM9Z{qdS3Hr4
zVY+KIODj>Pigm5zveTlGwPB6vOwqPl)Fl^PDnzFmW7?u=+p4INMy0$-)57F=9+z<$
zj=X;qXEcs88GXz1(p?wF^uS&k)5Y#FouXlSM%Tx<EW_z6v+qP#=NQY&cwv9iBl=w#
zeaB-A(aSM{2e|f~yH|hvoqJcKM_&0eyU|a)`s!A7?OL(**WUcs|M@Tf*H`@Qw_c4N
zd)w6)zWwc2qn~?6^_sW66X%{C;pVWs7QHSS@LhcGd*55a@&mU<Q2|h7V^|{7gb7i0
zf_16Ug*JwzZC6*NmZd6nZFF^o{?d5ifS`$G^pvJZM+Pi2{9DlEFhfk$;u*diWO<qS
z=dpWnj?twF-&|e0`UK{C^Bs5ou%GX3!+ftk@hVT9?xrYWcDj%C)Oi+T-<wirclY4r
zc*k-lNvm47O~F%;q%5jVC9N=;nY6CTRO*^(UKn1b+;mBmFkLaG5~1o;x2-Blok*ce
zol3=WI^w{Y5p^cddS=nTmE(CvOvv+oWW{CSOrC{bmu0!blE&$APG4mSD8Dm?G6HvA
zM%*E6JeG?QcV4#jV;;<3@4<X3`mhJ{LpMj!wXG?b2gj$UC(%0qCYMRtaZ^dv6s)RE
zBV^S{rIX4SktT_35+M`DQ_(isu%fM4$LkhXCS}z$LJ3()-j+NO+qpf;g$K}*EMK_C
zmQhyJoVa?>gWP8Ja_}gFtl=xiw;0cI48;D`;qu&$*o!P{OAn(J>})}&a>wPIMv0kU
znTujcaDK7*Y7oV{M{t%~ft%0vaDK4oCh>J^DvD!no(m~vB={>UU8_P$ooZo{l*vxj
ztOd1`RN|FHl{_PkIBBHE_K1lr`0pl8Gk27t=a~b@tD-!B+><dIh?!fJ7`V8SvT^?w
z_7XbI!_8#E&Hp>%p#$?;^oD5Q>u*0l1=Df!87MZROa<}*Jgt<hH9^X&npcSdQI#kw
z)^<8csv3MXwVhoV+sRvk5KLZKWed;A-0ADgagn|)0#!Z1@)-<y87BUf*~G7Yaysu9
zjP%<l1Bd=|KkdOZK98~gFw<{`ho=VzmP0LLO;Q?JC21;o#kHtJVGL%Sa9t;90&&HF
z;7yb0P9!jj+!RtJrr-%Qp5$FpBuxusoO<hZ()h*>59hbtQz0Hxqa4UYr!q2mj(Jvz
z*;AR5H~#{h=<P+h$l}0K3<JvFy%>iAn9q+m6u=~ae|RR41DHDD0HzTz1TYbhu|zi&
zZwr8^mFXBT?S$eYWy%O9(grv~?ZJ%jMjGC<X<|w);g}Sd04bQ9JOFYg7V!eg#;I&e
z9Gq|(u(t(HtQ;X@e<!Rm10){JfPWl-r}thghMSU2MRPISrNGPI-Q;N8IdL?Cv3SWX
zgXImz;#;>k2V@N9$-&{lo;4OT@QNvT$vW^-Q@0W?tazQ^M|;I1<rvs&_k1La1et&8
zOowymz@XS&5lm+sNM~;a*$?-&WE!l4L*kxWo8rK<hNHCCS-&l|Mu)B-IRDWY&YJ+|
zz0O#?_8|}FT_ZT}-TID)?o}f=Uq2dq{Y=o0P7inYqfbM>&4jf|lG?x~7^NFk@viE0
zr#jK<vST9A17J7qh&`V$>(|ef+b7OeZl9RUdB+*Z@tl8SCXbh$^Vxs)bs-T1C&!#$
zzh=%ikLUacBM}KsK^+aE^d~br=ENqatJ@mhRI8F&NL`Y?CNHZZuM6VR=)(Fj5d4ML
zqt>$~E52}=!g@UZ?_OA?v7pD!Jd^XiL|bTqtMA5S-g?I`2G-s_n#{Xj?sdTS3kK`-
zem|K6WAC2@ZimMwd%Mx6Z;zr{a+VtSHZZ_dsuG==ve265RB71?1)tGC##05;Y?xt8
z$fo9+o0L~=lPWkpLY<q;`3<_JTsG|Cb-?Zy=jMp7F-LssbZbD0?K*Sx%h7=7&&@(L
zhu*&cdWBLTcOuJ{rAZ5Qi1<t*$klbqYu)iAVMfCawOm$QNhJ!V8(}J=D!7SN3%$zt
zHrXGFILh3GpSd-yxicr3Re)LS6`c=%dywKW3RY2O{*kjpiwofCh)3Y<jKRBS2Hw^P
zynD8O+=KVG?piiKcfT--UOqKH`#YxxyF1qWR4Jm8hIpiT+9e{9t!`m=suH0cQV1N5
z5>upISYld@*f%M<#H2_RBm`w@+Kx<Bm1-BO_<&^L0<d_L4Cu}5h$|!h_(#1S8hn7>
zVCPn*!}ZSkTQlWvo;6Q#7Q3i2cH@Ba8rzza@qrnh_l)p-VCzS>2JGg+3Ss=Ysb2DU
zl5b?Vfv_qG<XX<)Cne-+LXwjzl`!LN0yT|j2C9qKX_087VDcH#E)^o28euq`PF2=Q
zCe}dX?+5vabXL0EHwwnG1x=Mgx!il$*#Kfb*5`2+d~#grV!ABO;@&Zjy*Cw{rXiW*
zoP@<9#GOtn&d^-t9I=UW^j`uKW1tzmibc_HMsFUCMN`GNzjttQbQr;{K{S^@u@M-$
zDmnyo$T75_T!n{aC4`b%CX98R5t_G7Z9DoorEdRd@6V9#cYcdwl4TIh)rQ5EE!M@P
zE6ESf9q7(sa>*%+nw>YzGdR$8S|yR>b%W5IIV}=8Nvgdr6mOM{JW>Yvk{uLMTO-MW
zA;o3Qs?<mZy)O)h;+KswMkAJGr0|TSt4({wdRvz1G>zEJ!QHUf&Ny=cf{V^<27u=5
z95^pW4vfr#5d1{<E`+F~m`!j6C(ygxv3seLLV87J1<vcRqcKZfP+hzFIgsRs_^*c4
z)h#3A`MKag-@d)9!5-aS%6je}!~Hq6VAgP#m1>)|0;=$YkjpfR!E82$Gsz%<jcky>
zf=nh_rX^fyQ^BX!kV8l*UJGlA6jsxvS8l97m~TEc{e|;Pm_6ql=w~jbz;5+$KHS6k
zRP>f;7<<bM&ZDE#<HP95+rUjIDM|$j$q?t&s;=NoYD6GyEm#U8z<E-@1wqbOG%cbP
zWJ%e1amz1pW_kdMH_pB4FHe!UW5la3Pk?xZb1TL`d}!)cTrUvkozr6FRasu<=fENe
z*d`#x0`}#gtsdQ8Hfbbuiy073z-<2Qypz|eYEmRqSgK*flD4JGp{Jm%7%B~hOGjzz
zTY1L#>{{uPa=A~IJ2cJ*<r<tep9oMsvnhA-{gd2D@V9=7Eo-Z<VeH<izjbhQcyQ>#
zeJ07HSE6otqFdF$n}^~D*F=H*FoWhtykBsMs0Q^1sT8?q9sDm1T_qV6BwCIX0F2?4
z8zyzZSBTGkjk`^GpSsM+#JWeByC7s5Afnf9-MUzEIQUy%W6R_ujK4MYw>+Li&K`{l
z{u3wCagMBTs(7Iam{n20%Wq*xInsNYC3K|#;vLwGXrXH0yCPf!N!bicn0^WdMzxG}
zVD_OvS+2|uUgq%RoT0gjop(>z8PNP2cQ2<&Z%U(RcgD`+)03TJmnKEUOSCDz8MygE
zlmgCH!+0xrB@`%#FjXU0B^Byk8YLrmFyO9Am_*L1LnfIVTwx43V>dpvG91}hYlirK
z0M2Dj$+7J(bh$mC$v2{zu0)fNo99Uh;V;a%IiLxz6|hOk5R9g^F;&Y{K_XmrGBrjb
zm)8=uHNu3B>S+nW1kMVUAUJ5?#2AS)h*c{UJajLB&M}C6u_5jwgK`4F^0N!ykTcCu
z;O1sDU$ipo{Oxp^n-46a=;^6ra=3#!(2<L*Sehc23|}s(;q^<SHT<^Dq=k}gqF{AY
zgTzTnejr4fQ5J=n7D-fI3(AyqaHW%~l{&d0++6Vn8{XC@tuq(*+IVBAg;>|-3!A`X
z6ELrC4ZsCz{=;Yh`2ML)_h9DKD{x+@5O0wkm)Z(HB?Azy!o^N;6CmZxN2D~p#Yif%
zFyme<<1ln_c+Y*Sf;C@)vD`W{@rrY;hw<O{Fg_Ojs)zA8jD3E_$^E_4<J0IhfKfm=
zg@l>b32;N^6_y0jT9_6x->5{Sl0iLos)b5z(vE?(2=H{5ih{|yOYuUfQlZ=Jvp5;w
z2utEZIZHPe4tmo~@onDM1{@>v(FMTrGjlBOUt@dj@8$dV7R&N|?>$lUzh_t;>>Tbn
zH}i%;$@mSjJs}*+CuCef%*@x}dTp6cHK+S)0Or>(X2?TO{9F&_2YWY@$a*l7Z%0gW
zGfRz<H%pKfPB@1*1lg@xs(7|alf_|hHzhKn6^eB=N7~sah`2<`56)AgxGGY-gOEI7
z+qM{#*<6S%*JU0P7p=H}KXc-p`H9DFJj(|szITe^5yG>ggX8*Q<ca3xD*6XG|IDVK
zcr51+^m0x-4C*q-x#jB&PRC7>^UGp~C`op3;bE;>)SOjSUdcm6&2^`QR;Y!q<a*U<
zhf6<XOT&R_fm!*AqWTC(yz#gPXEC}a57`}8DfL)>eUqGTuaxsE_bj&>eHdf^b0+7<
z`^P)Zj<@Vw!0{Py$c$0$K52_CP4JU~BKx;5${TgWtA{sOqNO$ufcf;wa_5iy_t!nN
zLeB5)iTUAs200JJB!JTyF&&shOc-w}8`nH#jh0jzO43e>isNTnD8m}wnhtPP30!yh
zo~VPsTNf>`q#8blmPHm3g%cQ{Y>?JopHYj<$lneb=n=KnqjP!V6^QxtCStyDLd>su
z+<vpi@~P<6qp?@diFvetauR(JCB^fB_;?^)s8Wowc8-hv^F<jhaamRe<pdSX??>iY
zoR3VWH)=Gv`jN@(_M8eE&+cE(nR3zDg$B-SawoFYcc}kP8fNOKP<kLg$LzMovn#wD
zyynHjz5AtT82hE^y*t|7-QC-dzW9AngfuQvPDLsb#Z!Yw3ef^m!KOjHm2w1P2pJUe
zQ-Z^RKzW8MNDhXP$dakQr=wdqyGtvN?Cha4rJ^@$K6JC91eAV$j?#ChD8(a`zB@r_
z$8lhc(%+kM;Krden@pY!<X$n`mD)7Fj6MjOU=Jw${2G*QAEOj>>gWH`z@=WNz9fpC
zpK<AMcYpuH>Qv;~sMrEO5_K1ZNJ*v3G(~)bOfaFAGQkt1mt<9>NZl#XwWytHV2Mi+
zw-+4wJII;l-l=znIPMFY-f}x2MwPcRL`$*Hr`ead@?9Qk%AwYr^_Sb4C#mSnXFRM3
z=e9kEzOqHWh<)=J>2)Ca$Mf`DJ}cmjjeWjVZ4I~)mN$WM{~#Lhy=lfchh>AaPu`AK
zbTeE~ypv6b0F#`3s(sPHX<UMP$dFdG;?yFGYBSeRQK*vg)U}8&T;F);R&u(^rs$}N
zGtRx^<REhKb!b&z@;+TH%NCJs<E!VVN4~G?kCa+SzF0agfp{l@_}%~Y?5bY$J9}sT
zz8_fbCw>xR+gnpYo*eC*`Wk(g)?i|b)Uj!ZY9;cSb&5i>2FYr}kS=1-;ao6dl2W0n
zTp?r^sEn#s3x*1GDl4iarsFX4f`Yi@2K58+stfJ+aawLEK{dLL;QX^q&U`j;=7R(J
z<PR*T<p|>MJU`{6gVP3@6GRfNJE9$j!s`^O6$TMy-8Iy<s7-|!6bdl`7kP`PbX8(n
zRGyPgGd>RvC$rDWaZ)m$IRtB9j>~0?&v^uH{@JDq{oRuaeaLbC$6`5ZfA3$8qMx2}
z^B9HfBbR@#IW%BFrD&6C(IV;;9aJGh!H4=FZ~}E9+keAFZBUve8v__oLV_wqEu_Os
zYSFS;|5ARw$*-wcj6$Y7Jea+C?&=u6w&A`9{TkjGInM11W{g91h>#1^43bU=|JIb6
z9!-+;1PS=M(0~uICW1^P*=s7CbtrL2l!%ZHL53Xwrzs)L3<xR#u~v;JYnZ4ifwDkq
zu5BN;E!nPb#LX@M;V0^5nvV&bZzkR4bVqo5fV~pW^~LNU>VN+Sm-BPC$SAsJ%FdJH
zozp!R9tnoXQn983?M$Uv3;Qimzo>-P)L|hd^<LB_DP-bSx3G?PZPLo5<5X5cklIja
z+6EVKYz$^d#M2X{H@Bnauq<4ehnrBYKs_G~xs-7p%#ID%<6xaEId9wA6d#SjynpN3
z49uHHrOT%=_CKO2n1@G)d&klCZPdCGMQ^<M5}+z6OsUdNaTJM+M23zE>o8to`!boj
zaod=;GWj^+7WZddf-3e6ER;#1$F^6d^!-7U)uPpB)<d#+7(y2s^s_5nq%$g*c@OK;
zJ*-2jj-Y<g)J5{JMq3WncaWP@AOk{cgwCR12P@RJ$~5qpltyAtYq;|%{=@%dX^}uh
zRL~KrdYpVE>Zo62eKS{P9YnZKnR5$i0f8fB9-N)rx=PuY(s+SS=f&1~x#mY=wYzLA
z6Xb5qll=aM_|pCy)|&zAT?Fel-}LLN8o}?{`XdkF{~L+^Px`SEqW^bhY&|(VJvxfM
zgLrq2pJ_W()tfnfkLSJ<*e;>!|64yhc0;)f-rcD<^wL=x`o=Nm;oFNb#h;obf3Cg{
zGyFf*7s3p09fc|H3$}vv*gVBcy!rOjR(Rd!n#cw@Vja8{juH(jhN#@Y&QgjEg&I>S
zfszhL!4Sx`90sQCYU&kFD^W>f1Q=C6?!pvm7Q_7}lJq43*9U9kirD>gv59w<Gq;#?
zh<sz`a=T1%-&we9(7odwo{Wx#9a;GL;%yhS6;F@I5#;dU5sBV6lS4<CXQ0*!l0$rL
z!city7LFfoEo|=K@hx|yZ#Z{@jyK+=TLc_z^Lf6CDUV}n@Bh)atZXlT#53jJ%4Nm<
zhL<8Zn_9us{o{kZz34T!B0Re(vOV(#fooeU)S?RrhHZR+P{8{|a0cDoB8F`cKp@*{
zc$462rYl@f>zX0_O_K%{X`G?T%i|YS+{+icbYonGRg7bzxQ^A>=NPh>T2M((xs-!l
zDdie7U0Z(^5>(6G5_xs)>Yq#%_lriTem(dWUzf|C?=3*}-%S<F>F(*#@vilJQ3usP
z2em1b_o?NO!_sMnaH2Da9AE+A)9Q*N9Ft87fF%lC2zjYR0%3-(sG^`D!pV8s=kmOq
zx0OtzsnO3-)Vdg6KJtg%sorSP+T>7#9@hMjD*MDH$Ct0HR}pY?$Cm(p<y81Qn0CqR
zrLjweK)eQB*idH$SK`VDGC!ro)VIpLQ<j%&;*8;jtWJKG=cWH>QQ0Rp>9RW}x-8g+
zH@tM&Hjozh^wc(Zi6=PMBZPDT_*z5QTZU+?MI}I2m1w|XDR>DdSiquy-pttL+}2z0
zerV=Gq(vT%Ebo`=SuE``TJNJxSm};94iY|i&;$EP4J_6G83=n;U(|Zr5PLj7e*Ylr
z@9=nj{(h(Z$9Nv^pYHEO(f5(8!_h_k4`~8anqe}fDH4izX>Fe1$Q39-%Su{sx1-Ru
zk{xtqx)d~A?ugGA&j)AC{T6WS&GS_9GiO#yu0HcequGaYLU6ALr4P&m)*(Xf6*C=b
zadk>*JeG8<X({HHw5~x46!5_v0ZF}*#olKMYcoP)`XX!%;m+q9raB1Gn>V;u<BG!j
zUK|#=_P6}8=okD1|LPA#(f^;_@{^;3185}N^65N~_GybW#Tci!$G~$!-Ay9{2WlvM
z!K}i^<XR#_<Qp@`QyOBD<s<V;ScbbgQ!NCTwRsEIZXcF!dUu;z{%J{O25^3VzF_Se
z0OtpvyyKx2z0uzYwvT|AcU=--cg*g-1BYxMZ(6Qs&K)XBT5t%5?ohQwS(g$*5?&^i
z%L_cz5DVxUTG9aMl+q}kr-r)qOW;+Bd@=O<W_|&^U;ga7QEXHa_bn>6Epu2Q+yfi;
zi+(2CPYgQ9J(}%Iv20BP&whWC?Yn1Xo8W%}n0L<#;xzW5*^<?r-GigkeOD2eoY%5O
zt`^zd8ptBEh@uR~w@LvlpbULY%ZDln-2>i<E|ui2F=cn<5H*BQW8785y~bI%WgdQE
zv;)Nz>f$(L!t+4SfX0}p;f?&hEHn1{4J2aQxt+O<UHsaw%-cM>0L6I{=3<>~6Et_G
z)TFVu%of9r&}?oXntqcSJ42gK!_@3-?P42e$a0VEBKG_Wg`0fbr@F}U!19sd>njRl
zVPATb_cthTQvgjfPEN>KdiUlLnsVzU9?dWI+<Yo}S2T>hYtGHRo#W%<=t-!->jLv)
zK|+4P(<3>aFC!$3-aa>k&z;%PfAzTuLf#OL5qi&@Z#M`b`^Q>mYC(}CJqlru@4(1>
ziZ--T#rd>;I9^~mIu_$ofpiNu99TpMy>tyiV@vwnD?LK(%Le`SL66W^Fm~V8j6+99
zdxxjdj*XPYLRD$RpbZk(2(zgGs5{z`q%`nCTLbuz(2ypDPgQV~4BMve3{N^LDww=C
ztgFeeGBgT61Q&C<^u}9}I3F8?Jodtw^?Akqdp@oTy49u2`Rx1F)CcbC{rZ3UvOye!
zU;inL{j0e+9_^usZyoz}=*UDea!ym)wyG5x9!f1L%6w5jKJ}o(WTTpDBobq?c|5;p
z41d`p{GM!7-f<RUvHa_anEz~(U(Z&q*8Qy!p21jr6=R=W!t;1%-<Pk9syfu4;LU2?
zNEEC%ycZ=q#o+5TTD1yU5JdULH1Pjg4pdQ@K_w6m3(1jR6pG=lC^5jfSS<Yd?|5)L
zhGdryIb*-mkRx0^a5U>_=eOl>{Qy2y>r33$h0fdt9Ijahx!%nG4Y-bWA)~rD(ERu&
z)4RR0diALu%^`><#{R@y6%OpYZVY9YYP=lU^zl9y5d>|j@kfMN5ufL?%kpt!Pw=(#
z@?zu<YI7qyZ=JC7)mwv-yn7bKlQR9<^K(3{;<PE=>w?hWs?MY}EgM#D5%VY@w;Wr9
zO%C$BuQhr9#%9>Yojs1Q@fsaFxdHjvr>1zGqa2v76!))=2pYtlRKP2i2-=JLMRjDt
z8KV?Ar<KEou9B(&+(?{Q(RIe!?)m4q3J1}qgu}Ln0dbPZ&~)o|X3M>M?sqN9C(VL+
z@1Do;u$hFv4O2@$v=@1Letc8u|F%i#KLmk4{lH+?Lvi)J82j8@d7hjcpB%dCYMXY3
zbzO?^H=!Ctswyr}W}(tbgHX9e%uG$EaEUrFk(wxI88*CZozPwg2(6JvK+f4m%x?Eb
zS69irW%fPLJrU!wH(C@I0=&#M_{4rIqwr-z6NhEiztml3TiI>Y92UFxZPplF#4fc=
zVE)d^a@FsB>^B}-u`cwko|^A{V374dO#=Akb80#;-D0uzprsBp?Lh-`Ycg7>Xw}rQ
zXBSAI{+}jxmpg}!FXmZljJCYW@{uaFQ+AkT6Y*T1lJUx7sfB<3ib>e=JDV&~ywVbh
z2bcY%`!V)U=G$5B@9iJ%J4?jUMlh)W3Qr3#6DgdofX9TivZyItBkPW~<Uln9<-t}<
zAWOXzw4aFqHw}<%B{lpE%WTiiO|3sxI$7EH?Cc7=b7h~Wo}KfH{s_+}HYv`<O2v8q
zgUgDOFn-f~3r2^hT?1Y~Vb)63QdF%;O(8o7I<LVwEt!@gMVX9OH42izmLnAEbb@TC
zT?eiKbO-ioU8|&9;!8HM;Js6nPi;s(`?=!Od5>9!Ik5AIP3*jNB|9Ho!Ol-TxRhGl
z-96kr-gl+j3-PakW`NE#Re_utijZ2c7MLRQ%o|yqU5J|xp3TBDBJ04*ETl3<{zJ(8
z4{tE|W_B}@RVzv!ShF&8sCInn!DYXbFur5X&H>Lg_`tbGL{7Woaa2NTc>A=gQ3~w<
zjG3ZxDD46Ql^`U(r`sXMbS23?kL#o~8*Sh?UH(2#C$7E|SvjiWja{qx)$!yeoUuaZ
zfi)Xy4SuKDS?1@XyHWJkS-`$`cyhSoHn43;S5`>pssaW+5wuYeqbhMLRFN{J7_DrA
zbJPNIT8~Fdb11Jt8y!_i2kqO!C;?e@ku6oGj{MmUq&u^s=iKd#Ipp5GxeM=``LpyK
ze&<~q{m#25fPUA5_t_rM$D*Gdjs5Ig;7?BX4)>y*_7dTV5DL0bm0HuvCx#-ML9SfX
zWc3qXBm(HBP^dgr@C2?iG@m21;xq{L+IfcTGiPs4_?gM<51eZ=p8Y?bi|2LB{t`O<
z{6`Qcntirr_BV}ZAH3w9-DNNNZ5aFL?A{|JKG|{02~Z<6DeawrDw#}BcjXPFifhGS
zJd>stFqN_u37#hkCe)-I>;as-N>s>=!M!UQL@g{~R@eLB!aA*PGp~@dcd3NM!VSf4
z3%1y~$9}IgdjWNF`9AqLZ1U;%c5=I9QLB$#@0t0kjFlamU$7ft<v!=1h21^h11KL{
z=Oy1d@sdL-@ni(%6CTV*F!nF!U>+PD?w>?gXuTT&$=3sE)&Qw2kXmYSLz7>t-YK8d
z{Nm~5tE<9*@?&$IB-iL9*&EGAc9)Y21n*DIq|!rajpp^BCo7Ezr9^mwu_W(vUwucb
zo8hQ4bME2(g&d>c#Y_3eHtG1=RvOJmcbA<IGNeyDKf`k{nx=)JM5ezf5+iGrL%9+a
z5(j{&E#MV1Rk9S-VXb9dwoD0;$Qr}6M+gcBsF7lfR%j;873Xfm-rXM+j6x2<H`j%U
z<|{E+hfjHqS4`g3Cu{DyV6rCczxB2eLH}knDAS*r6Vxfr55R0(2FUHqZPMpre^O$Z
zUqiJBd)3*sLO$;A8?pWF?qRPAryGZm!=KE3t|oS$&G4SN{ru9IdDW|53X)pwExTjC
zwzsq~0%EU|-4o~DVvZ=B3)D|4CJGozl-Kc8h(gga=p;p`#SLdjP$g+)5|-eELRJgD
znQ4#+Ol2)Z%Ua1M4cEcRoVl)ZNnPo&4zw6jGh@drAC1H=C+)J-&Y{hm1qu1HbGf&7
zuM8J{ZEsmp31c=R)(Fp};YgRjhHy%yLaWi9k_rlrrK;>GXV4&lnusdZg$4<cdZo?F
zjBF`$K#ncyeTa8il5UrdTmJJB+!70$*pGu%&KFtl&1bU}E(5In@MO2Ca|L$Gw<Ny=
zo)3n-^?N)Yi*z)M={Z05cTY}Eqbs*l{{l;FBQpGc+VPES39_qPb%vHwB11sF9YS14
zut+LeOe#~|G~g+GA_X6?-?($tRM#mt2XZ!U?#Rp8hP<N}pV>&2S>yYzIb^r5(QUU+
zA@fZNW60h;r{Il+Yzjy|nasLl^mz<%c^+H5ijaY2_X4tat%L0531p!|^h<lorMe&6
zUs?pYyMMd~)NBgsVuA@I6>1<&Qwb)xfWO$biq`OH*o!lKzsor#OU+vD$OZ|Id*|F`
zTz&B~;OLbCS=X(NE6rX!$w38A`oE3f44(8i_LeiG`v*IR2+GKlzA;GBz8&j5=@lZE
zuXwZ?+Hu-%bZs`fY_!drKfugixtu5cjTQEU;C<6vm>rbXp4<=^Wz(hgcG$?xmU%6#
z@pz)kk49U<Y&g#9W8N?tB>h(|=Sh=P&+#PbpRLagl5Y3NB>iR*>bRw@Qq(o^Uo-+~
z1o%COhM_Ha!;y3Dq-k_p^AeXzTvfC%;ygtW3eh%{XZ83f{nLxw@+7^T*#zNg^RZ;4
z)*r!>-rBlc_Wn)#%c`@(*uR;{`oZqr$^L2h^qcEK5%tarnfw%a)%@H$?gT^@Yp<B<
z4&V3bIi35Wb6$q_#gGtnd%Fh>Vd?qNi*2p-SGES^0zE@J6CwQLGjaE5?%0h8S&4F7
zf>Z=j_*Ben8zetbJnl5o`lW%hjXX25TAUnoZnS1d={E|=3m9`%Ak0HNt@sreA%_qy
zvpc=JB4lXxDtq3>Wx2Z8HbFES0ePNdECE5#Kf0-#>CTniOxp*8;}`_}VT}F3oR_pG
zjdR42XDDRbw1yUxCP)$z{mM?4Ey^CCo6^vs(2yL(<+4l)p=FnLDUuID@RAmgmyN;I
z;HAqm$LE`@f0nPiY^+k_ksgji%`^6O>8wxn_`5Yp@Bq(GUMyk`HvQoq&ky!CoiIM0
zGt=QoHeI9Y(HN9H>>l}WD2uvc1<D$t6}9FyOS;w|J_O=*qg9G>Nzu}$IJ9*mpc-VV
zfNj@u=4O%EQjVveoA|=T!SS?{)tBwg*d{3USGbPZ3$KT;pN6=w)coWon||j?oBplC
z<(T_VG4`?f=Y{Q`oF2JsjwpE}s)9Z}lae|$q}Xn53tWYOG*=12#*T6OsTkDM$d&05
z+H4+LmXpRz6A-pi%Jeb!Xpyq}%6yl0o|SVK%J0^=3QothGs5*=t-6>fb9$wZjuxZV
z6yW)hd9TQQvqjEqBI>~e5q2)%84AdR@oVQF8Rqb`?G4urBdfv78miN+ptSqj#eTI>
zwg~%ClX;34zE0P#x3}Lym>maB4%_W`470(MpQ#;2OAdP9)GN-H0?m)CGeLK(>~a5v
zqvaI?e~7VfFMXQb-tMVGlXF#;N|m$|b3>c7(!-i|&_R(?Q7RYGE_9h7hNNaO2)nF=
zN$qFZbscKW2|P~CO*!4C`Nr*~x9Z5|u$~zR5Srh-*rpqyLi5q1LDoZsCOU?{IY;wo
z7e1$}(2(t+9Z4k0nNXMXIT6$nrr~~S##Kch$Hr;tk8-T+8bBvt%L@<^3AQFRNK7JP
zzsR!K<?FJQe*EgU8w&R$A7R|&<C>MJh_kd;(W*@%#$evxE6zVWS}rRPUGG`?Y(Ps*
z+o60(=unGLM%a(Vk)St|nz;{Pu>zj^&o{|@kGxwFWak?crITjo7}{1Q`!IGN9GK<T
z(Z4gk8D7-Cxyhz~|4QGTh&b=OA;NCY@pR<0zWW716ActCZE(~Ip{cVTo;$+N!Z7Yu
zZ~D(Mb=yGXp(kM;94%pWrsbYDd(B^U4mAIGlc>LNBI<t*BU)<_B6|P;ABzYC00000
X0RIL6LPG)o8vp|U0000000000Q5X>i

literal 0
HcmV?d00001

diff --git a/src/rseqc/rseqc_inner_distance/test_data/test1.inner_distance.txt b/src/rseqc/rseqc_inner_distance/test_data/test1.inner_distance.txt
new file mode 100644
index 00000000..e5f09f8f
--- /dev/null
+++ b/src/rseqc/rseqc_inner_distance/test_data/test1.inner_distance.txt
@@ -0,0 +1,49 @@
+ERR5069949.29668	-4	sameTranscript=No,dist=genomic
+ERR5069949.114870	-45	sameTranscript=No,dist=genomic
+ERR5069949.147998	94	sameTranscript=No,dist=genomic
+ERR5069949.155944	-105	sameTranscript=No,dist=genomic
+ERR5069949.184542	49	sameTranscript=No,dist=genomic
+ERR5069949.169513	-92	sameTranscript=No,dist=genomic
+ERR5069949.257821	-139	sameTranscript=No,dist=genomic
+ERR5069949.309410	13	sameTranscript=No,dist=genomic
+ERR5069949.376959	-66	sameTranscript=No,dist=genomic
+ERR5069949.366975	-106	sameTranscript=No,dist=genomic
+ERR5069949.465452	-19	sameTranscript=No,dist=genomic
+ERR5069949.479807	5	sameTranscript=No,dist=genomic
+ERR5069949.501486	-82	sameTranscript=No,dist=genomic
+ERR5069949.532979	-96	sameTranscript=No,dist=genomic
+ERR5069949.540529	-61	sameTranscript=No,dist=genomic
+ERR5069949.573706	-63	sameTranscript=No,dist=genomic
+ERR5069949.576388	-77	sameTranscript=No,dist=genomic
+ERR5069949.611123	-125	sameTranscript=No,dist=genomic
+ERR5069949.651338	-33	sameTranscript=No,dist=genomic
+ERR5069949.686090	-29	sameTranscript=No,dist=genomic
+ERR5069949.786562	42	sameTranscript=No,dist=genomic
+ERR5069949.870926	-22	sameTranscript=No,dist=genomic
+ERR5069949.856527	-69	sameTranscript=No,dist=genomic
+ERR5069949.885966	-32	sameTranscript=No,dist=genomic
+ERR5069949.937422	18	sameTranscript=No,dist=genomic
+ERR5069949.919671	-116	sameTranscript=No,dist=genomic
+ERR5069949.973930	-79	sameTranscript=No,dist=genomic
+ERR5069949.986441	-22	sameTranscript=No,dist=genomic
+ERR5069949.1014693	-150	sameTranscript=No,dist=genomic
+ERR5069949.1020777	-122	sameTranscript=No,dist=genomic
+ERR5069949.1066259	-4	sameTranscript=No,dist=genomic
+ERR5069949.1062611	-124	sameTranscript=No,dist=genomic
+ERR5069949.1067032	-103	sameTranscript=No,dist=genomic
+ERR5069949.1088785	-101	sameTranscript=No,dist=genomic
+ERR5069949.1132353	-142	sameTranscript=No,dist=genomic
+ERR5069949.1151736	-55	sameTranscript=No,dist=genomic
+ERR5069949.1258508	62	sameTranscript=No,dist=genomic
+ERR5069949.1189252	-98	sameTranscript=No,dist=genomic
+ERR5069949.1261808	-88	sameTranscript=No,dist=genomic
+ERR5069949.1246538	-122	sameTranscript=No,dist=genomic
+ERR5069949.1328186	-64	sameTranscript=No,dist=genomic
+ERR5069949.1331889	-132	sameTranscript=No,dist=genomic
+ERR5069949.1372331	-29	sameTranscript=No,dist=genomic
+ERR5069949.1340552	-140	sameTranscript=No,dist=genomic
+ERR5069949.1412839	-117	sameTranscript=No,dist=genomic
+ERR5069949.1476386	-98	sameTranscript=No,dist=genomic
+ERR5069949.1538968	-133	sameTranscript=No,dist=genomic
+ERR5069949.1552198	-67	sameTranscript=No,dist=genomic
+ERR5069949.1561137	-59	sameTranscript=No,dist=genomic
diff --git a/src/rseqc/rseqc_inner_distance/test_data/test1.inner_distance_freq.txt b/src/rseqc/rseqc_inner_distance/test_data/test1.inner_distance_freq.txt
new file mode 100644
index 00000000..908326ff
--- /dev/null
+++ b/src/rseqc/rseqc_inner_distance/test_data/test1.inner_distance_freq.txt
@@ -0,0 +1,100 @@
+-250	-245	0
+-245	-240	0
+-240	-235	0
+-235	-230	0
+-230	-225	0
+-225	-220	0
+-220	-215	0
+-215	-210	0
+-210	-205	0
+-205	-200	0
+-200	-195	0
+-195	-190	0
+-190	-185	0
+-185	-180	0
+-180	-175	0
+-175	-170	0
+-170	-165	0
+-165	-160	0
+-160	-155	0
+-155	-150	1
+-150	-145	0
+-145	-140	2
+-140	-135	1
+-135	-130	2
+-130	-125	1
+-125	-120	3
+-120	-115	2
+-115	-110	0
+-110	-105	2
+-105	-100	2
+-100	-95	3
+-95	-90	1
+-90	-85	1
+-85	-80	1
+-80	-75	2
+-75	-70	0
+-70	-65	3
+-65	-60	3
+-60	-55	2
+-55	-50	0
+-50	-45	1
+-45	-40	0
+-40	-35	0
+-35	-30	2
+-30	-25	2
+-25	-20	2
+-20	-15	1
+-15	-10	0
+-10	-5	0
+-5	0	2
+0	5	1
+5	10	0
+10	15	1
+15	20	1
+20	25	0
+25	30	0
+30	35	0
+35	40	0
+40	45	1
+45	50	1
+50	55	0
+55	60	0
+60	65	1
+65	70	0
+70	75	0
+75	80	0
+80	85	0
+85	90	0
+90	95	1
+95	100	0
+100	105	0
+105	110	0
+110	115	0
+115	120	0
+120	125	0
+125	130	0
+130	135	0
+135	140	0
+140	145	0
+145	150	0
+150	155	0
+155	160	0
+160	165	0
+165	170	0
+170	175	0
+175	180	0
+180	185	0
+185	190	0
+190	195	0
+195	200	0
+200	205	0
+205	210	0
+210	215	0
+215	220	0
+220	225	0
+225	230	0
+230	235	0
+235	240	0
+240	245	0
+245	250	0
diff --git a/src/rseqc/rseqc_inner_distance/test_data/test2.inner_distance.txt b/src/rseqc/rseqc_inner_distance/test_data/test2.inner_distance.txt
new file mode 100644
index 00000000..a1930c9e
--- /dev/null
+++ b/src/rseqc/rseqc_inner_distance/test_data/test2.inner_distance.txt
@@ -0,0 +1,4 @@
+ERR5069949.29668	-4	sameTranscript=No,dist=genomic
+ERR5069949.114870	-45	sameTranscript=No,dist=genomic
+ERR5069949.147998	94	sameTranscript=No,dist=genomic
+ERR5069949.155944	-105	sameTranscript=No,dist=genomic
diff --git a/src/rseqc/rseqc_inner_distance/test_data/test2.inner_distance_freq.txt b/src/rseqc/rseqc_inner_distance/test_data/test2.inner_distance_freq.txt
new file mode 100644
index 00000000..021311a2
--- /dev/null
+++ b/src/rseqc/rseqc_inner_distance/test_data/test2.inner_distance_freq.txt
@@ -0,0 +1,100 @@
+-250	-245	0
+-245	-240	0
+-240	-235	0
+-235	-230	0
+-230	-225	0
+-225	-220	0
+-220	-215	0
+-215	-210	0
+-210	-205	0
+-205	-200	0
+-200	-195	0
+-195	-190	0
+-190	-185	0
+-185	-180	0
+-180	-175	0
+-175	-170	0
+-170	-165	0
+-165	-160	0
+-160	-155	0
+-155	-150	0
+-150	-145	0
+-145	-140	0
+-140	-135	0
+-135	-130	0
+-130	-125	0
+-125	-120	0
+-120	-115	0
+-115	-110	0
+-110	-105	1
+-105	-100	0
+-100	-95	0
+-95	-90	0
+-90	-85	0
+-85	-80	0
+-80	-75	0
+-75	-70	0
+-70	-65	0
+-65	-60	0
+-60	-55	0
+-55	-50	0
+-50	-45	1
+-45	-40	0
+-40	-35	0
+-35	-30	0
+-30	-25	0
+-25	-20	0
+-20	-15	0
+-15	-10	0
+-10	-5	0
+-5	0	1
+0	5	0
+5	10	0
+10	15	0
+15	20	0
+20	25	0
+25	30	0
+30	35	0
+35	40	0
+40	45	0
+45	50	0
+50	55	0
+55	60	0
+60	65	0
+65	70	0
+70	75	0
+75	80	0
+80	85	0
+85	90	0
+90	95	1
+95	100	0
+100	105	0
+105	110	0
+110	115	0
+115	120	0
+120	125	0
+125	130	0
+130	135	0
+135	140	0
+140	145	0
+145	150	0
+150	155	0
+155	160	0
+160	165	0
+165	170	0
+170	175	0
+175	180	0
+180	185	0
+185	190	0
+190	195	0
+195	200	0
+200	205	0
+205	210	0
+210	215	0
+215	220	0
+220	225	0
+225	230	0
+230	235	0
+235	240	0
+240	245	0
+245	250	0

From cc67547928466ba5e4bd36173b249ebb539f9509 Mon Sep 17 00:00:00 2001
From: Leila011 <leilapaquay@gmail.com>
Date: Sat, 2 Nov 2024 10:28:08 +0100
Subject: [PATCH 40/42] Add agat sq stat basic (#110)

* add help

* add config

* add run script

* add test data and expected output + script to fetch them

* add test

* update changelog

* handle input --gff has multiple=true

* cleanup config

* add direction for input arguments

* update config: add requirements, add keywords, update --config description

* remove unset IFS

* add set -eo pipefail to script and test files

* create temporary directory and clean up on exit

* cleanup changelog

* Update CHANGELOG.md

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
---
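Note on the `--gff` handling: script.sh splits `$par_gff` on `;` and passes
each entry to agat_sq_stat_basic.pl as its own `--gff` flag. The sketch below
illustrates that expansion only. It is not the component's code: it is written
for plain POSIX sh with positional parameters, where script.sh uses bash
`read -ra`, and the file names are hypothetical.

```shell
# Hypothetical value, standing in for what the wrapper receives in $par_gff.
par_gff="a.gff;b.gff"

# Collect the flags in the positional parameters so each file name stays
# a single argument.
set --
old_ifs=$IFS
IFS=';'
# Unquoted on purpose: the shell splits the string on ';' here.
# (Names containing glob characters would also be expanded.)
for file in $par_gff; do
    set -- "$@" --gff "$file"
done
IFS=$old_ifs

# Show the resulting argument list, joined by spaces.
echo "$*"
```

The resulting list can then be passed on as `"$@"`, which keeps file names
containing spaces intact.
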
 CHANGELOG.md                                  |   2 +-
 src/agat/agat_sq_stat_basic/config.vsh.yaml   |  92 ++
 src/agat/agat_sq_stat_basic/help.txt          |  79 ++
 src/agat/agat_sq_stat_basic/script.sh         |  31 +
 src/agat/agat_sq_stat_basic/test.sh           |  36 +
 src/agat/agat_sq_stat_basic/test_data/1.gff   | 942 ++++++++++++++++++
 .../test_data/agat_sq_stat_basic_1.gff        |  12 +
 .../agat_sq_stat_basic/test_data/script.sh    |  10 +
 8 files changed, 1203 insertions(+), 1 deletion(-)
 create mode 100644 src/agat/agat_sq_stat_basic/config.vsh.yaml
 create mode 100644 src/agat/agat_sq_stat_basic/help.txt
 create mode 100644 src/agat/agat_sq_stat_basic/script.sh
 create mode 100644 src/agat/agat_sq_stat_basic/test.sh
 create mode 100644 src/agat/agat_sq_stat_basic/test_data/1.gff
 create mode 100644 src/agat/agat_sq_stat_basic/test_data/agat_sq_stat_basic_1.gff
 create mode 100755 src/agat/agat_sq_stat_basic/test_data/script.sh

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0e32edb1..c8d86fa5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,6 +8,7 @@
   - `agat/agat_sp_filter_feature_from_kill_list`: remove features in a GFF file based on a kill list (PR #105).
   - `agat/agat_sp_merge_annotations`: merge different gff annotation files in one (PR #106).
   - `agat/agat_sp_statistics`: provides exhaustive statistics of a gft/gff file (PR #107).
+  - `agat/agat_sq_stat_basic`: provide basic statistics of a gtf/gff file (PR #110).
 
 * `bd_rhapsody/bd_rhapsody_sequence_analysis`: BD Rhapsody Sequence Analysis CWL pipeline (PR #96).
 
@@ -68,7 +69,6 @@
   - `agat/agat_convert_sp_gff2tsv`: convert gtf/gff file into tabulated file (PR #102).
   - `agat/agat_convert_sp_gxf2gxf`: fixes and/or standardizes any GTF/GFF file into full sorted GTF/GFF file (PR #103).
 
-
 * `bedtools`:
   - `bedtools/bedtools_intersect`: Allows one to screen for overlaps between two sets of genomic features (PR #94).
   - `bedtools/bedtools_sort`: Sorts a feature file (bed/gff/vcf) by chromosome and other criteria (PR #98).
diff --git a/src/agat/agat_sq_stat_basic/config.vsh.yaml b/src/agat/agat_sq_stat_basic/config.vsh.yaml
new file mode 100644
index 00000000..64958991
--- /dev/null
+++ b/src/agat/agat_sq_stat_basic/config.vsh.yaml
@@ -0,0 +1,92 @@
+name: agat_sq_stat_basic
+namespace: agat
+description: |
+  The script aims to provide basic statistics of a gtf/gff file.
+keywords: [gene annotations, gff, statistics]
+links:
+  homepage: https://github.com/NBISweden/AGAT
+  documentation: https://agat.readthedocs.io/en/latest/tools/agat_sq_stat_basic.html
+  issue_tracker: https://github.com/NBISweden/AGAT/issues
+  repository: https://github.com/NBISweden/AGAT
+references: 
+  doi: 10.5281/zenodo.3552717
+license: GPL-3.0
+requirements:
+ - commands: [agat]
+authors:
+  - __merge__: /src/_authors/leila_paquay.yaml
+    roles: [ author, maintainer ]
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --gff
+        alternatives: [-i, --file, --input]
+        description: |
+          Input GTF/GFF file.
+        type: file
+        required: true
+        multiple: true
+        direction: input
+        example: input.gff
+      - name: --genome_size
+        alternatives: [-g]
+        description: |
+          Genome size, used to calculate the percentage of the genome represented by each feature type. Provide an integer here, or pass a fasta file using the argument --genome_size_fasta instead. If both are provided, only the value of --genome_size is considered.
+        type: integer
+        required: false
+        direction: input
+        example: 10000
+      - name: --genome_size_fasta
+        description: |
+          Genome in fasta format, used to calculate the percentage of the genome represented by each feature type. The genome size is calculated on the fly from the fasta. Alternatively, pass the size directly as an integer using the argument --genome_size. If both are provided, only the value of --genome_size is considered.
+        type: file
+        required: false
+        direction: input
+        example: genome.fasta
+  - name: Outputs
+    arguments:
+      - name: --output
+        alternatives: [-o]
+        description: |
+          Output file. The result is in tabular format.
+        type: file
+        direction: output
+        required: true
+        example: output.txt
+  - name: Arguments
+    arguments:
+      - name: --inflate
+        description: |
+            Inflate the statistics to take into account features with
+            multiple parents. To avoid redundant information, some GFF
+            files factorize identical features, e.g. one exon used in two
+            different isoforms is defined only once and has multiple
+            parents. By default the script counts such a feature only
+            once. The inflate option counts the feature and its size
+            once per parent.
+        type: boolean_true
+      - name: --config
+        alternatives: [-c]
+        description: |
+          AGAT config file. By default AGAT takes the original agat_config.yaml shipped with AGAT. The `--config` option gives you the possibility to use your own AGAT config file (located elsewhere or named differently).
+        type: file
+        required: false
+        example: custom_agat_config.yaml
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: test_data
+engines:
+  - type: docker
+    image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
+    setup:
+      - type: docker
+        run: |
+          agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
+runners:
+  - type: executable
+  - type: nextflow
\ No newline at end of file
diff --git a/src/agat/agat_sq_stat_basic/help.txt b/src/agat/agat_sq_stat_basic/help.txt
new file mode 100644
index 00000000..65096991
--- /dev/null
+++ b/src/agat/agat_sq_stat_basic/help.txt
@@ -0,0 +1,79 @@
+```sh
+agat_sq_stat_basic.pl --help
+```
+
+ ------------------------------------------------------------------------------
+|   Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0                      |
+|   https://github.com/NBISweden/AGAT                                          |
+|   National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se         |
+ ------------------------------------------------------------------------------
+
+
+Name:
+    agat_sq_stat_basic.pl
+
+Description:
+    The script aims to provide basic statistics of a gtf/gff file.
+
+Usage:
+        agat_sq_stat_basic.pl -i <input file> [-g <integer or fasta> -o <output file>]
+        agat_sq_stat_basic.pl --help
+
+Options:
+    -i, --gff, --file or --input
+            STRING: Input GTF/GFF file. Several files can be processed at
+            once: -i file1 -i file2
+
+    -g, --genome
+            That input is design to know the genome size in order to
+            calculate the percentage of the genome represented by each kind
+            of feature type. You can provide an INTEGER or the genome in
+            fasta format. If you provide the fasta, the genome size will be
+            calculated on the fly.
+
+    --inflate
+            Inflate the statistics taking into account feature with
+            multi-parents. Indeed to avoid redundant information, some gff
+            factorize identical features. e.g: one exon used in two
+            different isoform will be defined only once, and will have
+            multiple parent. By default the script count such feature only
+            once. Using the inflate option allows to count the feature and
+            its size as many time there are parents.
+
+    -o or --output
+            STRING: Output file. If no output file is specified, the output
+            will be written to STDOUT. The result is in tabulate format.
+
+    -c or --config
+            String - Input agat config file. By default AGAT takes as input
+            agat_config.yaml file from the working directory if any,
+            otherwise it takes the orignal agat_config.yaml shipped with
+            AGAT. To get the agat_config.yaml locally type: "agat config
+            --expose". The --config option gives you the possibility to use
+            your own AGAT config file (located elsewhere or named
+            differently).
+
+    --help or -h
+            Display this helpful text.
+
+Feedback:
+  Did you find a bug?:
+    Do not hesitate to report bugs to help us keep track of the bugs and
+    their resolution. Please use the GitHub issue tracking system available
+    at this address:
+
+                https://github.com/NBISweden/AGAT/issues
+
+     Ensure that the bug was not already reported by searching under Issues.
+     If you're unable to find an (open) issue addressing the problem, open a new one.
+     Try as much as possible to include in the issue when relevant:
+     - a clear description,
+     - as much relevant information as possible,
+     - the command used,
+     - a data sample,
+     - an explanation of the expected behaviour that is not occurring.
+
+  Do you want to contribute?:
+    You are very welcome, visit this address for the Contributing
+    guidelines:
+    https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
\ No newline at end of file
diff --git a/src/agat/agat_sq_stat_basic/script.sh b/src/agat/agat_sq_stat_basic/script.sh
new file mode 100644
index 00000000..0f4ab2a6
--- /dev/null
+++ b/src/agat/agat_sq_stat_basic/script.sh
@@ -0,0 +1,31 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+# unset flags
+[[ "$par_inflate" == "false" ]] && unset par_inflate
+
+# Convert a list of file names to multiple --gff arguments
+input_files=()
+IFS=";" read -ra file_names <<< "$par_gff"
+for file in "${file_names[@]}"; do
+    input_files+=(--gff "$file")
+done
+
+# take care of --genome (can originally be either a fasta file or an integer)
+if [[ -n "$par_genome_size" ]]; then
+  genome_arg=$par_genome_size
+elif [[ -n "$par_genome_size_fasta" ]]; then
+  genome_arg=$par_genome_size_fasta
+fi
+
+# run agat_sq_stat_basic.pl
+agat_sq_stat_basic.pl \
+  "${input_files[@]}" \
+  ${genome_arg:+--genome "${genome_arg}"} \
+  --output "${par_output}" \
+  ${par_inflate:+--inflate} \
+  ${par_config:+--config "${par_config}"}
diff --git a/src/agat/agat_sq_stat_basic/test.sh b/src/agat/agat_sq_stat_basic/test.sh
new file mode 100644
index 00000000..12bd28cd
--- /dev/null
+++ b/src/agat/agat_sq_stat_basic/test.sh
@@ -0,0 +1,36 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+test_dir="${meta_resources_dir}/test_data"
+
+# create temporary directory and clean up on exit
+TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXX")
+function clean_up {
+ [[ -d "$TMPDIR" ]] && rm -rf "$TMPDIR"
+}
+trap clean_up EXIT
+
+
+echo "> Run $meta_name with test data"
+"$meta_executable" \
+  --gff "$test_dir/1.gff" \
+  --output "$TMPDIR/output.txt"
+
+echo ">> Checking output"
+[ ! -f "$TMPDIR/output.txt" ] && echo "Output file output.txt does not exist" && exit 1
+
+echo ">> Check if output is empty"
+[ ! -s "$TMPDIR/output.txt" ] && echo "Output file output.txt is empty" && exit 1
+
+echo ">> Check if output matches expected output"
+# Test diff directly: under `set -e` a bare failing diff would abort before the message is printed
+if ! diff "$TMPDIR/output.txt" "$test_dir/agat_sq_stat_basic_1.gff"; then
+  echo "Output file output.txt does not match expected output"
+  exit 1
+fi
+
+echo "> Test successful"
\ No newline at end of file
diff --git a/src/agat/agat_sq_stat_basic/test_data/1.gff b/src/agat/agat_sq_stat_basic/test_data/1.gff
new file mode 100644
index 00000000..40a06c78
--- /dev/null
+++ b/src/agat/agat_sq_stat_basic/test_data/1.gff
@@ -0,0 +1,942 @@
+##gff-version 3
+##sequence-region   1 1 43270923
+#!genome-build RAP-DB IRGSP-1.0
+#!genome-version IRGSP-1.0
+#!genome-date 2015-10
+#!genome-build-accession GCA_001433935.1
+1	RAP-DB	chromosome	1	43270923	.	.	.	ID=chromosome:1;Alias=Chr1,AP014957.1,NC_029256.1
+###
+1	irgsp	repeat_region	2000	2100	.	+	.	ID=fakeRepeat1
+###
+1	irgsp	gene	2983	10815	.	+	.	ID=gene:Os01g0100100;biotype=protein_coding;description=RabGAP/TBC domain containing protein. (Os01t0100100-01);gene_id=Os01g0100100;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	2983	10815	.	+	.	ID=transcript:Os01t0100100-01;Parent=gene:Os01g0100100;biotype=protein_coding;transcript_id=Os01t0100100-01
+1	irgsp	exon	2983	3268	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon1;rank=1
+1	irgsp	five_prime_UTR	2983	3268	.	+	.	Parent=transcript:Os01t0100100-01
+1	irgsp	five_prime_UTR	3354	3448	.	+	.	Parent=transcript:Os01t0100100-01
+1	irgsp	exon	3354	3616	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0100100-01.exon2;rank=2
+1	irgsp	CDS	3449	3616	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	4357	4455	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100100-01.exon3;rank=3
+1	irgsp	CDS	4357	4455	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	5457	5560	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon4;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100100-01.exon4;rank=4
+1	irgsp	CDS	5457	5560	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	7136	7944	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon5;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0100100-01.exon5;rank=5
+1	irgsp	CDS	7136	7944	.	+	1	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	8028	8150	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon6;constitutive=1;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0100100-01.exon6;rank=6
+1	irgsp	CDS	8028	8150	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	8232	8320	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon7;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100100-01.exon7;rank=7
+1	irgsp	CDS	8232	8320	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	8408	8608	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon8;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100100-01.exon8;rank=8
+1	irgsp	CDS	8408	8608	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	9210	9615	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon9;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0100100-01.exon9;rank=9
+1	irgsp	CDS	9210	9615	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	10102	10187	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon10;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100100-01.exon10;rank=10
+1	irgsp	CDS	10102	10187	.	+	2	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	CDS	10274	10297	.	+	0	ID=CDS:Os01t0100100-01;Parent=transcript:Os01t0100100-01;protein_id=Os01t0100100-01
+1	irgsp	exon	10274	10430	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon11;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0100100-01.exon11;rank=11
+1	irgsp	three_prime_UTR	10298	10430	.	+	.	Parent=transcript:Os01t0100100-01
+1	irgsp	exon	10504	10815	.	+	.	Parent=transcript:Os01t0100100-01;Name=Os01t0100100-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100100-01.exon12;rank=12
+1	irgsp	three_prime_UTR	10504	10815	.	+	.	Parent=transcript:Os01t0100100-01
+###
+1	irgsp	gene	11218	12435	.	+	.	ID=gene:Os01g0100200;biotype=protein_coding;description=Conserved hypothetical protein. (Os01t0100200-01);gene_id=Os01g0100200;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	11218	12435	.	+	.	ID=transcript:Os01t0100200-01;Parent=gene:Os01g0100200;biotype=protein_coding;transcript_id=Os01t0100200-01
+1	irgsp	five_prime_UTR	11218	11797	.	+	.	Parent=transcript:Os01t0100200-01
+1	irgsp	exon	11218	12060	.	+	.	Parent=transcript:Os01t0100200-01;Name=Os01t0100200-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100200-01.exon1;rank=1
+1	irgsp	CDS	11798	12060	.	+	0	ID=CDS:Os01t0100200-01;Parent=transcript:Os01t0100200-01;protein_id=Os01t0100200-01
+1	irgsp	CDS	12152	12317	.	+	1	ID=CDS:Os01t0100200-01;Parent=transcript:Os01t0100200-01;protein_id=Os01t0100200-01
+1	irgsp	exon	12152	12435	.	+	.	Parent=transcript:Os01t0100200-01;Name=Os01t0100200-01.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100200-01.exon2;rank=2
+1	irgsp	three_prime_UTR	12318	12435	.	+	.	Parent=transcript:Os01t0100200-01
+###
+1	irgsp	gene	11372	12284	.	-	.	ID=gene:Os01g0100300;biotype=protein_coding;description=Cytochrome P450 domain containing protein. (Os01t0100300-00);gene_id=Os01g0100300;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	11372	12284	.	-	.	ID=transcript:Os01t0100300-00;Parent=gene:Os01g0100300;biotype=protein_coding;transcript_id=Os01t0100300-00
+1	irgsp	exon	11372	12042	.	-	.	Parent=transcript:Os01t0100300-00;Name=Os01t0100300-00.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100300-00.exon2;rank=2
+1	irgsp	CDS	11372	12042	.	-	2	ID=CDS:Os01t0100300-00;Parent=transcript:Os01t0100300-00;protein_id=Os01t0100300-00
+1	irgsp	exon	12146	12284	.	-	.	Parent=transcript:Os01t0100300-00;Name=Os01t0100300-00.exon1;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0100300-00.exon1;rank=1
+1	irgsp	CDS	12146	12284	.	-	0	ID=CDS:Os01t0100300-00;Parent=transcript:Os01t0100300-00;protein_id=Os01t0100300-00
+###
+1	irgsp	gene	12721	15685	.	+	.	ID=gene:Os01g0100400;biotype=protein_coding;description=Similar to Pectinesterase-like protein. (Os01t0100400-01);gene_id=Os01g0100400;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	12721	15685	.	+	.	ID=transcript:Os01t0100400-01;Parent=gene:Os01g0100400;biotype=protein_coding;transcript_id=Os01t0100400-01
+1	irgsp	five_prime_UTR	12721	12773	.	+	.	Parent=transcript:Os01t0100400-01
+1	irgsp	exon	12721	13813	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100400-01.exon1;rank=1
+1	irgsp	CDS	12774	13813	.	+	0	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	13906	14271	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100400-01.exon2;rank=2
+1	irgsp	CDS	13906	14271	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	14359	14437	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0100400-01.exon3;rank=3
+1	irgsp	CDS	14359	14437	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	14969	15171	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon4;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100400-01.exon4;rank=4
+1	irgsp	CDS	14969	15171	.	+	0	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	CDS	15266	15359	.	+	1	ID=CDS:Os01t0100400-01;Parent=transcript:Os01t0100400-01;protein_id=Os01t0100400-01
+1	irgsp	exon	15266	15685	.	+	.	Parent=transcript:Os01t0100400-01;Name=Os01t0100400-01.exon5;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100400-01.exon5;rank=5
+1	irgsp	three_prime_UTR	15360	15685	.	+	.	Parent=transcript:Os01t0100400-01
+###
+1	irgsp	gene	12808	13978	.	-	.	ID=gene:Os01g0100466;biotype=protein_coding;description=Hypothetical protein. (Os01t0100466-00);gene_id=Os01g0100466;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	12808	13978	.	-	.	ID=transcript:Os01t0100466-00;Parent=gene:Os01g0100466;biotype=protein_coding;transcript_id=Os01t0100466-00
+1	irgsp	three_prime_UTR	12808	12868	.	-	.	Parent=transcript:Os01t0100466-00
+1	irgsp	exon	12808	13782	.	-	.	Parent=transcript:Os01t0100466-00;Name=Os01t0100466-00.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100466-00.exon2;rank=2
+1	irgsp	CDS	12869	13102	.	-	0	ID=CDS:Os01t0100466-00;Parent=transcript:Os01t0100466-00;protein_id=Os01t0100466-00
+1	irgsp	five_prime_UTR	13103	13782	.	-	.	Parent=transcript:Os01t0100466-00
+1	irgsp	exon	13880	13978	.	-	.	Parent=transcript:Os01t0100466-00;Name=Os01t0100466-00.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100466-00.exon1;rank=1
+1	irgsp	five_prime_UTR	13880	13978	.	-	.	Parent=transcript:Os01t0100466-00
+###
+1	irgsp	gene	16399	20144	.	+	.	ID=gene:Os01g0100500;biotype=protein_coding;description=Immunoglobulin-like domain containing protein. (Os01t0100500-01);gene_id=Os01g0100500;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	16399	20144	.	+	.	ID=transcript:Os01t0100500-01;Parent=gene:Os01g0100500;biotype=protein_coding;transcript_id=Os01t0100500-01
+1	irgsp	five_prime_UTR	16399	16598	.	+	.	Parent=transcript:Os01t0100500-01
+1	irgsp	exon	16399	16976	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon1;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0100500-01.exon1;rank=1
+1	irgsp	CDS	16599	16976	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	17383	17474	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100500-01.exon2;rank=2
+1	irgsp	CDS	17383	17474	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	17558	18258	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon3;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0100500-01.exon3;rank=3
+1	irgsp	CDS	17558	18258	.	+	1	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	18501	18571	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon4;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100500-01.exon4;rank=4
+1	irgsp	CDS	18501	18571	.	+	2	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	18968	19057	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon5;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100500-01.exon5;rank=5
+1	irgsp	CDS	18968	19057	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	19142	19321	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100500-01.exon6;rank=6
+1	irgsp	CDS	19142	19321	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	CDS	19531	19593	.	+	0	ID=CDS:Os01t0100500-01;Parent=transcript:Os01t0100500-01;protein_id=Os01t0100500-01
+1	irgsp	exon	19531	19629	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon7;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0100500-01.exon7;rank=7
+1	irgsp	three_prime_UTR	19594	19629	.	+	.	Parent=transcript:Os01t0100500-01
+1	irgsp	exon	19734	20144	.	+	.	Parent=transcript:Os01t0100500-01;Name=Os01t0100500-01.exon8;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100500-01.exon8;rank=8
+1	irgsp	three_prime_UTR	19734	20144	.	+	.	Parent=transcript:Os01t0100500-01
+###
+1	irgsp	gene	22841	26892	.	+	.	ID=gene:Os01g0100600;biotype=protein_coding;description=Single-stranded nucleic acid binding R3H domain containing protein. (Os01t0100600-01);gene_id=Os01g0100600;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	22841	26892	.	+	.	ID=transcript:Os01t0100600-01;Parent=gene:Os01g0100600;biotype=protein_coding;transcript_id=Os01t0100600-01
+1	irgsp	five_prime_UTR	22841	23231	.	+	.	Parent=transcript:Os01t0100600-01
+1	irgsp	exon	22841	23281	.	+	.	Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100600-01.exon1;rank=1
+1	irgsp	CDS	23232	23281	.	+	0	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	exon	23572	23847	.	+	.	Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100600-01.exon2;rank=2
+1	irgsp	CDS	23572	23847	.	+	1	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	exon	23962	24033	.	+	.	Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon3;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100600-01.exon3;rank=3
+1	irgsp	CDS	23962	24033	.	+	1	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	exon	24492	24577	.	+	.	Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon4;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0100600-01.exon4;rank=4
+1	irgsp	CDS	24492	24577	.	+	1	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	exon	25445	25519	.	+	.	Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon5;constitutive=1;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0100600-01.exon5;rank=5
+1	irgsp	CDS	25445	25519	.	+	2	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	CDS	25883	26391	.	+	2	ID=CDS:Os01t0100600-01;Parent=transcript:Os01t0100600-01;protein_id=Os01t0100600-01
+1	irgsp	exon	25883	26892	.	+	.	Parent=transcript:Os01t0100600-01;Name=Os01t0100600-01.exon6;constitutive=1;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0100600-01.exon6;rank=6
+1	irgsp	three_prime_UTR	26392	26892	.	+	.	Parent=transcript:Os01t0100600-01
+###
+1	irgsp	gene	25861	26424	.	-	.	ID=gene:Os01g0100650;biotype=protein_coding;description=Hypothetical gene. (Os01t0100650-00);gene_id=Os01g0100650;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	25861	26424	.	-	.	ID=transcript:Os01t0100650-00;Parent=gene:Os01g0100650;biotype=protein_coding;transcript_id=Os01t0100650-00
+1	irgsp	three_prime_UTR	25861	26039	.	-	.	Parent=transcript:Os01t0100650-00
+1	irgsp	exon	25861	26424	.	-	.	Parent=transcript:Os01t0100650-00;Name=Os01t0100650-00.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0100650-00.exon1;rank=1
+1	irgsp	CDS	26040	26423	.	-	0	ID=CDS:Os01t0100650-00;Parent=transcript:Os01t0100650-00;protein_id=Os01t0100650-00
+1	irgsp	five_prime_UTR	26424	26424	.	-	.	Parent=transcript:Os01t0100650-00
+###
+1	irgsp	gene	27143	28644	.	+	.	ID=gene:Os01g0100700;biotype=protein_coding;description=Similar to 40S ribosomal protein S5-1. (Os01t0100700-01);gene_id=Os01g0100700;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	27143	28644	.	+	.	ID=transcript:Os01t0100700-01;Parent=gene:Os01g0100700;biotype=protein_coding;transcript_id=Os01t0100700-01
+1	irgsp	five_prime_UTR	27143	27220	.	+	.	Parent=transcript:Os01t0100700-01
+1	irgsp	exon	27143	27292	.	+	.	Parent=transcript:Os01t0100700-01;Name=Os01t0100700-01.exon1;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0100700-01.exon1;rank=1
+1	irgsp	CDS	27221	27292	.	+	0	ID=CDS:Os01t0100700-01;Parent=transcript:Os01t0100700-01;protein_id=Os01t0100700-01
+1	irgsp	exon	27370	27641	.	+	.	Parent=transcript:Os01t0100700-01;Name=Os01t0100700-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100700-01.exon2;rank=2
+1	irgsp	CDS	27370	27641	.	+	0	ID=CDS:Os01t0100700-01;Parent=transcript:Os01t0100700-01;protein_id=Os01t0100700-01
+1	irgsp	exon	28090	28293	.	+	.	Parent=transcript:Os01t0100700-01;Name=Os01t0100700-01.exon3;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100700-01.exon3;rank=3
+1	irgsp	CDS	28090	28293	.	+	1	ID=CDS:Os01t0100700-01;Parent=transcript:Os01t0100700-01;protein_id=Os01t0100700-01
+1	irgsp	CDS	28365	28419	.	+	1	ID=CDS:Os01t0100700-01;Parent=transcript:Os01t0100700-01;protein_id=Os01t0100700-01
+1	irgsp	exon	28365	28644	.	+	.	Parent=transcript:Os01t0100700-01;Name=Os01t0100700-01.exon4;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0100700-01.exon4;rank=4
+1	irgsp	three_prime_UTR	28420	28644	.	+	.	Parent=transcript:Os01t0100700-01
+###
+1	irgsp	gene	29818	34453	.	+	.	ID=gene:Os01g0100800;biotype=protein_coding;description=Protein of unknown function DUF1664 family protein. (Os01t0100800-01);gene_id=Os01g0100800;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	29818	34453	.	+	.	ID=transcript:Os01t0100800-01;Parent=gene:Os01g0100800;biotype=protein_coding;transcript_id=Os01t0100800-01
+1	irgsp	five_prime_UTR	29818	29939	.	+	.	Parent=transcript:Os01t0100800-01
+1	irgsp	exon	29818	29976	.	+	.	Parent=transcript:Os01t0100800-01;Name=Os01t0100800-01.exon1;constitutive=1;ensembl_end_phase=1;ensembl_phase=-1;exon_id=Os01t0100800-01.exon1;rank=1
+1	irgsp	CDS	29940	29976	.	+	0	ID=CDS:Os01t0100800-01;Parent=transcript:Os01t0100800-01;protein_id=Os01t0100800-01
+1	irgsp	exon	30146	30228	.	+	.	Parent=transcript:Os01t0100800-01;Name=Os01t0100800-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100800-01.exon2;rank=2
+1	irgsp	CDS	30146	30228	.	+	2	ID=CDS:Os01t0100800-01;Parent=transcript:Os01t0100800-01;protein_id=Os01t0100800-01
+1	irgsp	exon	30735	30806	.	+	.	Parent=transcript:Os01t0100800-01;Name=Os01t0100800-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100800-01.exon3;rank=3
+1	irgsp	CDS	30735	30806	.	+	0	ID=CDS:Os01t0100800-01;Parent=transcript:Os01t0100800-01;protein_id=Os01t0100800-01
+1	irgsp	exon	30885	30963	.	+	.	Parent=transcript:Os01t0100800-01;Name=Os01t0100800-01.exon4;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0100800-01.exon4;rank=4
+1	irgsp	CDS	30885	30963	.	+	0	ID=CDS:Os01t0100800-01;Parent=transcript:Os01t0100800-01;protein_id=Os01t0100800-01
+1	irgsp	exon	31258	31325	.	+	.	Parent=transcript:Os01t0100800-01;Name=Os01t0100800-01.exon5;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0100800-01.exon5;rank=5
+1	irgsp	CDS	31258	31325	.	+	2	ID=CDS:Os01t0100800-01;Parent=transcript:Os01t0100800-01;protein_id=Os01t0100800-01
+1	irgsp	exon	31505	31606	.	+	.	Parent=transcript:Os01t0100800-01;Name=Os01t0100800-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100800-01.exon6;rank=6
+1	irgsp	CDS	31505	31606	.	+	0	ID=CDS:Os01t0100800-01;Parent=transcript:Os01t0100800-01;protein_id=Os01t0100800-01
+1	irgsp	exon	32377	32466	.	+	.	Parent=transcript:Os01t0100800-01;Name=Os01t0100800-01.exon7;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100800-01.exon7;rank=7
+1	irgsp	CDS	32377	32466	.	+	0	ID=CDS:Os01t0100800-01;Parent=transcript:Os01t0100800-01;protein_id=Os01t0100800-01
+1	irgsp	exon	32542	32616	.	+	.	Parent=transcript:Os01t0100800-01;Name=Os01t0100800-01.exon8;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100800-01.exon8;rank=8
+1	irgsp	CDS	32542	32616	.	+	0	ID=CDS:Os01t0100800-01;Parent=transcript:Os01t0100800-01;protein_id=Os01t0100800-01
+1	irgsp	exon	32712	32744	.	+	.	Parent=transcript:Os01t0100800-01;Name=Os01t0100800-01.exon9;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100800-01.exon9;rank=9
+1	irgsp	CDS	32712	32744	.	+	0	ID=CDS:Os01t0100800-01;Parent=transcript:Os01t0100800-01;protein_id=Os01t0100800-01
+1	irgsp	exon	32828	32905	.	+	.	Parent=transcript:Os01t0100800-01;Name=Os01t0100800-01.exon10;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100800-01.exon10;rank=10
+1	irgsp	CDS	32828	32905	.	+	0	ID=CDS:Os01t0100800-01;Parent=transcript:Os01t0100800-01;protein_id=Os01t0100800-01
+1	irgsp	exon	33274	33330	.	+	.	Parent=transcript:Os01t0100800-01;Name=Os01t0100800-01.exon11;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100800-01.exon11;rank=11
+1	irgsp	CDS	33274	33330	.	+	0	ID=CDS:Os01t0100800-01;Parent=transcript:Os01t0100800-01;protein_id=Os01t0100800-01
+1	irgsp	exon	33400	33471	.	+	.	Parent=transcript:Os01t0100800-01;Name=Os01t0100800-01.exon12;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100800-01.exon12;rank=12
+1	irgsp	CDS	33400	33471	.	+	0	ID=CDS:Os01t0100800-01;Parent=transcript:Os01t0100800-01;protein_id=Os01t0100800-01
+1	irgsp	exon	33543	33617	.	+	.	Parent=transcript:Os01t0100800-01;Name=Os01t0100800-01.exon13;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100800-01.exon13;rank=13
+1	irgsp	CDS	33543	33617	.	+	0	ID=CDS:Os01t0100800-01;Parent=transcript:Os01t0100800-01;protein_id=Os01t0100800-01
+1	irgsp	CDS	33975	34124	.	+	0	ID=CDS:Os01t0100800-01;Parent=transcript:Os01t0100800-01;protein_id=Os01t0100800-01
+1	irgsp	exon	33975	34453	.	+	.	Parent=transcript:Os01t0100800-01;Name=Os01t0100800-01.exon14;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0100800-01.exon14;rank=14
+1	irgsp	three_prime_UTR	34125	34453	.	+	.	Parent=transcript:Os01t0100800-01
+###
+1	irgsp	gene	35623	41136	.	+	.	ID=gene:Os01g0100900;Name=SPHINGOSINE-1-PHOSPHATE LYASE 1%2C Sphingosine-1-Phoshpate Lyase 1;biotype=protein_coding;description=Sphingosine-1-phosphate lyase%2C Disease resistance response (Os01t0100900-01);gene_id=Os01g0100900;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	35623	41136	.	+	.	ID=transcript:Os01t0100900-01;Parent=gene:Os01g0100900;biotype=protein_coding;transcript_id=Os01t0100900-01
+1	irgsp	five_prime_UTR	35623	35742	.	+	.	Parent=transcript:Os01t0100900-01
+1	irgsp	exon	35623	35939	.	+	.	Parent=transcript:Os01t0100900-01;Name=Os01t0100900-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0100900-01.exon1;rank=1
+1	irgsp	CDS	35743	35939	.	+	0	ID=CDS:Os01t0100900-01;Parent=transcript:Os01t0100900-01;protein_id=Os01t0100900-01
+1	irgsp	exon	36027	36072	.	+	.	Parent=transcript:Os01t0100900-01;Name=Os01t0100900-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0100900-01.exon2;rank=2
+1	irgsp	CDS	36027	36072	.	+	1	ID=CDS:Os01t0100900-01;Parent=transcript:Os01t0100900-01;protein_id=Os01t0100900-01
+1	irgsp	exon	36517	36668	.	+	.	Parent=transcript:Os01t0100900-01;Name=Os01t0100900-01.exon3;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100900-01.exon3;rank=3
+1	irgsp	CDS	36517	36668	.	+	0	ID=CDS:Os01t0100900-01;Parent=transcript:Os01t0100900-01;protein_id=Os01t0100900-01
+1	irgsp	exon	36818	36877	.	+	.	Parent=transcript:Os01t0100900-01;Name=Os01t0100900-01.exon4;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100900-01.exon4;rank=4
+1	irgsp	CDS	36818	36877	.	+	1	ID=CDS:Os01t0100900-01;Parent=transcript:Os01t0100900-01;protein_id=Os01t0100900-01
+1	irgsp	exon	37594	37818	.	+	.	Parent=transcript:Os01t0100900-01;Name=Os01t0100900-01.exon5;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100900-01.exon5;rank=5
+1	irgsp	CDS	37594	37818	.	+	1	ID=CDS:Os01t0100900-01;Parent=transcript:Os01t0100900-01;protein_id=Os01t0100900-01
+1	irgsp	exon	37892	38033	.	+	.	Parent=transcript:Os01t0100900-01;Name=Os01t0100900-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0100900-01.exon6;rank=6
+1	irgsp	CDS	37892	38033	.	+	1	ID=CDS:Os01t0100900-01;Parent=transcript:Os01t0100900-01;protein_id=Os01t0100900-01
+1	irgsp	exon	38276	38326	.	+	.	Parent=transcript:Os01t0100900-01;Name=Os01t0100900-01.exon7;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0100900-01.exon7;rank=7
+1	irgsp	CDS	38276	38326	.	+	0	ID=CDS:Os01t0100900-01;Parent=transcript:Os01t0100900-01;protein_id=Os01t0100900-01
+1	irgsp	exon	38434	38525	.	+	.	Parent=transcript:Os01t0100900-01;Name=Os01t0100900-01.exon8;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0100900-01.exon8;rank=8
+1	irgsp	CDS	38434	38525	.	+	0	ID=CDS:Os01t0100900-01;Parent=transcript:Os01t0100900-01;protein_id=Os01t0100900-01
+1	irgsp	exon	39319	39445	.	+	.	Parent=transcript:Os01t0100900-01;Name=Os01t0100900-01.exon9;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0100900-01.exon9;rank=9
+1	irgsp	CDS	39319	39445	.	+	1	ID=CDS:Os01t0100900-01;Parent=transcript:Os01t0100900-01;protein_id=Os01t0100900-01
+1	irgsp	exon	39553	39568	.	+	.	Parent=transcript:Os01t0100900-01;Name=Os01t0100900-01.exon10;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0100900-01.exon10;rank=10
+1	irgsp	CDS	39553	39568	.	+	0	ID=CDS:Os01t0100900-01;Parent=transcript:Os01t0100900-01;protein_id=Os01t0100900-01
+1	irgsp	exon	39939	40046	.	+	.	Parent=transcript:Os01t0100900-01;Name=Os01t0100900-01.exon11;constitutive=1;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0100900-01.exon11;rank=11
+1	irgsp	CDS	39939	40046	.	+	2	ID=CDS:Os01t0100900-01;Parent=transcript:Os01t0100900-01;protein_id=Os01t0100900-01
+1	irgsp	exon	40135	40189	.	+	.	Parent=transcript:Os01t0100900-01;Name=Os01t0100900-01.exon12;constitutive=1;ensembl_end_phase=2;ensembl_phase=1;exon_id=Os01t0100900-01.exon12;rank=12
+1	irgsp	CDS	40135	40189	.	+	2	ID=CDS:Os01t0100900-01;Parent=transcript:Os01t0100900-01;protein_id=Os01t0100900-01
+1	irgsp	exon	40456	40602	.	+	.	Parent=transcript:Os01t0100900-01;Name=Os01t0100900-01.exon13;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0100900-01.exon13;rank=13
+1	irgsp	CDS	40456	40602	.	+	1	ID=CDS:Os01t0100900-01;Parent=transcript:Os01t0100900-01;protein_id=Os01t0100900-01
+1	irgsp	exon	40703	40781	.	+	.	Parent=transcript:Os01t0100900-01;Name=Os01t0100900-01.exon14;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0100900-01.exon14;rank=14
+1	irgsp	CDS	40703	40781	.	+	1	ID=CDS:Os01t0100900-01;Parent=transcript:Os01t0100900-01;protein_id=Os01t0100900-01
+1	irgsp	CDS	40885	41007	.	+	0	ID=CDS:Os01t0100900-01;Parent=transcript:Os01t0100900-01;protein_id=Os01t0100900-01
+1	irgsp	exon	40885	41136	.	+	.	Parent=transcript:Os01t0100900-01;Name=Os01t0100900-01.exon15;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0100900-01.exon15;rank=15
+1	irgsp	three_prime_UTR	41008	41136	.	+	.	Parent=transcript:Os01t0100900-01
+###
+1	irgsp	gene	58658	61090	.	+	.	ID=gene:Os01g0101150;biotype=protein_coding;description=Hypothetical conserved gene. (Os01t0101150-00);gene_id=Os01g0101150;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	58658	61090	.	+	.	ID=transcript:Os01t0101150-00;Parent=gene:Os01g0101150;biotype=protein_coding;transcript_id=Os01t0101150-00
+1	irgsp	exon	58658	61090	.	+	.	Parent=transcript:Os01t0101150-00;Name=Os01t0101150-00.exon1;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0101150-00.exon1;rank=1
+1	irgsp	CDS	58658	61090	.	+	0	ID=CDS:Os01t0101150-00;Parent=transcript:Os01t0101150-00;protein_id=Os01t0101150-00
+###
+1	irgsp	gene	62060	65537	.	+	.	ID=gene:Os01g0101200;biotype=protein_coding;description=2%2C3-diketo-5-methylthio-1-phosphopentane phosphatase domain containing protein. (Os01t0101200-01)%3B2%2C3-diketo-5-methylthio-1-phosphopentane phosphatase domain containing protein. (Os01t0101200-02);gene_id=Os01g0101200;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	62060	63576	.	+	.	ID=transcript:Os01t0101200-01;Parent=gene:Os01g0101200;biotype=protein_coding;transcript_id=Os01t0101200-01
+1	irgsp	five_prime_UTR	62060	62103	.	+	.	Parent=transcript:Os01t0101200-01
+1	irgsp	exon	62060	62295	.	+	.	Parent=transcript:Os01t0101200-01;Name=Os01t0101200-01.exon1;constitutive=0;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0101200-01.exon1;rank=1
+1	irgsp	CDS	62104	62295	.	+	0	ID=CDS:Os01t0101200-01;Parent=transcript:Os01t0101200-01;protein_id=Os01t0101200-01
+1	irgsp	exon	62385	62905	.	+	.	Parent=transcript:Os01t0101200-01;Name=Os01t0101200-02.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0101200-02.exon2;rank=2
+1	irgsp	CDS	62385	62905	.	+	0	ID=CDS:Os01t0101200-01;Parent=transcript:Os01t0101200-01;protein_id=Os01t0101200-01
+1	irgsp	exon	62996	63114	.	+	.	Parent=transcript:Os01t0101200-01;Name=Os01t0101200-02.exon3;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0101200-02.exon3;rank=3
+1	irgsp	CDS	62996	63114	.	+	1	ID=CDS:Os01t0101200-01;Parent=transcript:Os01t0101200-01;protein_id=Os01t0101200-01
+1	irgsp	CDS	63248	63345	.	+	2	ID=CDS:Os01t0101200-01;Parent=transcript:Os01t0101200-01;protein_id=Os01t0101200-01
+1	irgsp	exon	63248	63576	.	+	.	Parent=transcript:Os01t0101200-01;Name=Os01t0101200-01.exon4;constitutive=0;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0101200-01.exon4;rank=4
+1	irgsp	three_prime_UTR	63346	63576	.	+	.	Parent=transcript:Os01t0101200-01
+1	irgsp	mRNA	62112	65537	.	+	.	ID=transcript:Os01t0101200-02;Parent=gene:Os01g0101200;biotype=protein_coding;transcript_id=Os01t0101200-02
+1	irgsp	five_prime_UTR	62112	62112	.	+	.	Parent=transcript:Os01t0101200-02
+1	irgsp	exon	62112	62295	.	+	.	Parent=transcript:Os01t0101200-02;Name=Os01t0101200-02.exon1;constitutive=0;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0101200-02.exon1;rank=1
+1	irgsp	CDS	62113	62295	.	+	0	ID=CDS:Os01t0101200-02;Parent=transcript:Os01t0101200-02;protein_id=Os01t0101200-02
+1	irgsp	exon	62385	62905	.	+	.	Parent=transcript:Os01t0101200-02;Name=Os01t0101200-02.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0101200-02.exon2;rank=2
+1	irgsp	CDS	62385	62905	.	+	0	ID=CDS:Os01t0101200-02;Parent=transcript:Os01t0101200-02;protein_id=Os01t0101200-02
+1	irgsp	exon	62996	63114	.	+	.	Parent=transcript:Os01t0101200-02;Name=Os01t0101200-02.exon3;constitutive=1;ensembl_end_phase=1;ensembl_phase=2;exon_id=Os01t0101200-02.exon3;rank=3
+1	irgsp	CDS	62996	63114	.	+	1	ID=CDS:Os01t0101200-02;Parent=transcript:Os01t0101200-02;protein_id=Os01t0101200-02
+1	irgsp	CDS	63248	63345	.	+	2	ID=CDS:Os01t0101200-02;Parent=transcript:Os01t0101200-02;protein_id=Os01t0101200-02
+1	irgsp	exon	63248	65537	.	+	.	Parent=transcript:Os01t0101200-02;Name=Os01t0101200-02.exon4;constitutive=0;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0101200-02.exon4;rank=4
+1	irgsp	three_prime_UTR	63346	65537	.	+	.	Parent=transcript:Os01t0101200-02
+###
+1	irgsp	gene	63350	66302	.	-	.	ID=gene:Os01g0101300;biotype=protein_coding;description=Similar to MRNA%2C partial cds%2C clone: RAFL22-26-L17. (Fragment). (Os01t0101300-01);gene_id=Os01g0101300;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	63350	66302	.	-	.	ID=transcript:Os01t0101300-01;Parent=gene:Os01g0101300;biotype=protein_coding;transcript_id=Os01t0101300-01
+1	irgsp	three_prime_UTR	63350	63669	.	-	.	Parent=transcript:Os01t0101300-01
+1	irgsp	exon	63350	63783	.	-	.	Parent=transcript:Os01t0101300-01;Name=Os01t0101300-01.exon7;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0101300-01.exon7;rank=7
+1	irgsp	CDS	63670	63783	.	-	0	ID=CDS:Os01t0101300-01;Parent=transcript:Os01t0101300-01;protein_id=Os01t0101300-01
+1	irgsp	exon	63877	64020	.	-	.	Parent=transcript:Os01t0101300-01;Name=Os01t0101300-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0101300-01.exon6;rank=6
+1	irgsp	CDS	63877	64020	.	-	0	ID=CDS:Os01t0101300-01;Parent=transcript:Os01t0101300-01;protein_id=Os01t0101300-01
+1	irgsp	exon	64339	64431	.	-	.	Parent=transcript:Os01t0101300-01;Name=Os01t0101300-01.exon5;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0101300-01.exon5;rank=5
+1	irgsp	CDS	64339	64431	.	-	0	ID=CDS:Os01t0101300-01;Parent=transcript:Os01t0101300-01;protein_id=Os01t0101300-01
+1	irgsp	exon	64665	64779	.	-	.	Parent=transcript:Os01t0101300-01;Name=Os01t0101300-01.exon4;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0101300-01.exon4;rank=4
+1	irgsp	CDS	64665	64779	.	-	1	ID=CDS:Os01t0101300-01;Parent=transcript:Os01t0101300-01;protein_id=Os01t0101300-01
+1	irgsp	exon	64902	65152	.	-	.	Parent=transcript:Os01t0101300-01;Name=Os01t0101300-01.exon3;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0101300-01.exon3;rank=3
+1	irgsp	CDS	64902	65152	.	-	0	ID=CDS:Os01t0101300-01;Parent=transcript:Os01t0101300-01;protein_id=Os01t0101300-01
+1	irgsp	exon	65248	65431	.	-	.	Parent=transcript:Os01t0101300-01;Name=Os01t0101300-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0101300-01.exon2;rank=2
+1	irgsp	CDS	65248	65431	.	-	1	ID=CDS:Os01t0101300-01;Parent=transcript:Os01t0101300-01;protein_id=Os01t0101300-01
+1	irgsp	CDS	65628	65950	.	-	0	ID=CDS:Os01t0101300-01;Parent=transcript:Os01t0101300-01;protein_id=Os01t0101300-01
+1	irgsp	exon	65628	66302	.	-	.	Parent=transcript:Os01t0101300-01;Name=Os01t0101300-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0101300-01.exon1;rank=1
+1	irgsp	five_prime_UTR	65951	66302	.	-	.	Parent=transcript:Os01t0101300-01
+###
+1	irgsp	gene	72816	78349	.	+	.	ID=gene:Os01g0101600;biotype=protein_coding;description=Immunoglobulin-like fold domain containing protein. (Os01t0101600-01)%3BImmunoglobulin-like fold domain containing protein. (Os01t0101600-02)%3BHypothetical conserved gene. (Os01t0101600-03);gene_id=Os01g0101600;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	72816	78349	.	+	.	ID=transcript:Os01t0101600-01;Parent=gene:Os01g0101600;biotype=protein_coding;transcript_id=Os01t0101600-01
+1	irgsp	five_prime_UTR	72816	72902	.	+	.	Parent=transcript:Os01t0101600-01
+1	irgsp	exon	72816	73935	.	+	.	Parent=transcript:Os01t0101600-01;Name=Os01t0101600-01.exon1;constitutive=0;ensembl_end_phase=1;ensembl_phase=-1;exon_id=Os01t0101600-01.exon1;rank=1
+1	irgsp	CDS	72903	73935	.	+	0	ID=CDS:Os01t0101600-01;Parent=transcript:Os01t0101600-01;protein_id=Os01t0101600-01
+1	irgsp	exon	74468	74981	.	+	.	Parent=transcript:Os01t0101600-01;Name=Os01t0101600-02.exon2;constitutive=0;ensembl_end_phase=2;ensembl_phase=1;exon_id=Os01t0101600-02.exon2;rank=2
+1	irgsp	CDS	74468	74981	.	+	2	ID=CDS:Os01t0101600-01;Parent=transcript:Os01t0101600-01;protein_id=Os01t0101600-01
+1	irgsp	CDS	75619	77008	.	+	1	ID=CDS:Os01t0101600-01;Parent=transcript:Os01t0101600-01;protein_id=Os01t0101600-01
+1	irgsp	exon	75619	77205	.	+	.	Parent=transcript:Os01t0101600-01;Name=Os01t0101600-01.exon3;constitutive=0;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0101600-01.exon3;rank=3
+1	irgsp	three_prime_UTR	77009	77205	.	+	.	Parent=transcript:Os01t0101600-01
+1	irgsp	exon	77333	78349	.	+	.	Parent=transcript:Os01t0101600-01;Name=Os01t0101600-01.exon4;constitutive=0;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0101600-01.exon4;rank=4
+1	irgsp	three_prime_UTR	77333	78349	.	+	.	Parent=transcript:Os01t0101600-01
+1	irgsp	mRNA	72823	77699	.	+	.	ID=transcript:Os01t0101600-02;Parent=gene:Os01g0101600;biotype=protein_coding;transcript_id=Os01t0101600-02
+1	irgsp	five_prime_UTR	72823	72902	.	+	.	Parent=transcript:Os01t0101600-02
+1	irgsp	exon	72823	73935	.	+	.	Parent=transcript:Os01t0101600-02;Name=Os01t0101600-02.exon1;constitutive=0;ensembl_end_phase=1;ensembl_phase=-1;exon_id=Os01t0101600-02.exon1;rank=1
+1	irgsp	CDS	72903	73935	.	+	0	ID=CDS:Os01t0101600-02;Parent=transcript:Os01t0101600-02;protein_id=Os01t0101600-02
+1	irgsp	exon	74468	74981	.	+	.	Parent=transcript:Os01t0101600-02;Name=Os01t0101600-02.exon2;constitutive=0;ensembl_end_phase=2;ensembl_phase=1;exon_id=Os01t0101600-02.exon2;rank=2
+1	irgsp	CDS	74468	74981	.	+	2	ID=CDS:Os01t0101600-02;Parent=transcript:Os01t0101600-02;protein_id=Os01t0101600-02
+1	irgsp	CDS	75619	77008	.	+	1	ID=CDS:Os01t0101600-02;Parent=transcript:Os01t0101600-02;protein_id=Os01t0101600-02
+1	irgsp	exon	75619	77699	.	+	.	Parent=transcript:Os01t0101600-02;Name=Os01t0101600-02.exon3;constitutive=0;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0101600-02.exon3;rank=3
+1	irgsp	three_prime_UTR	77009	77699	.	+	.	Parent=transcript:Os01t0101600-02
+1	irgsp	mRNA	75942	77699	.	+	.	ID=transcript:Os01t0101600-03;Parent=gene:Os01g0101600;biotype=protein_coding;transcript_id=Os01t0101600-03
+1	irgsp	five_prime_UTR	75942	75943	.	+	.	Parent=transcript:Os01t0101600-03
+1	irgsp	exon	75942	77699	.	+	.	Parent=transcript:Os01t0101600-03;Name=Os01t0101600-03.exon1;constitutive=0;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0101600-03.exon1;rank=1
+1	irgsp	CDS	75944	77008	.	+	0	ID=CDS:Os01t0101600-03;Parent=transcript:Os01t0101600-03;protein_id=Os01t0101600-03
+1	irgsp	three_prime_UTR	77009	77699	.	+	.	Parent=transcript:Os01t0101600-03
+###
+1	irgsp	gene	82426	84095	.	+	.	ID=gene:Os01g0101700;Name=DnaJ domain protein C1%2C rice DJC26 homolog;biotype=protein_coding;description=Similar to chaperone protein dnaJ 20. (Os01t0101700-00);gene_id=Os01g0101700;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	82426	84095	.	+	.	ID=transcript:Os01t0101700-00;Parent=gene:Os01g0101700;biotype=protein_coding;transcript_id=Os01t0101700-00
+1	irgsp	five_prime_UTR	82426	82506	.	+	.	Parent=transcript:Os01t0101700-00
+1	irgsp	exon	82426	82932	.	+	.	Parent=transcript:Os01t0101700-00;Name=Os01t0101700-00.exon1;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0101700-00.exon1;rank=1
+1	irgsp	CDS	82507	82932	.	+	0	ID=CDS:Os01t0101700-00;Parent=transcript:Os01t0101700-00;protein_id=Os01t0101700-00
+1	irgsp	CDS	83724	83864	.	+	0	ID=CDS:Os01t0101700-00;Parent=transcript:Os01t0101700-00;protein_id=Os01t0101700-00
+1	irgsp	exon	83724	84095	.	+	.	Parent=transcript:Os01t0101700-00;Name=Os01t0101700-00.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0101700-00.exon2;rank=2
+1	irgsp	three_prime_UTR	83865	84095	.	+	.	Parent=transcript:Os01t0101700-00
+###
+1	irgsp	gene	85337	88844	.	+	.	ID=gene:Os01g0101800;biotype=protein_coding;description=Conserved hypothetical protein. (Os01t0101800-01);gene_id=Os01g0101800;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	85337	88844	.	+	.	ID=transcript:Os01t0101800-01;Parent=gene:Os01g0101800;biotype=protein_coding;transcript_id=Os01t0101800-01
+1	irgsp	five_prime_UTR	85337	85378	.	+	.	Parent=transcript:Os01t0101800-01
+1	irgsp	exon	85337	85600	.	+	.	Parent=transcript:Os01t0101800-01;Name=Os01t0101800-01.exon1;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0101800-01.exon1;rank=1
+1	irgsp	CDS	85379	85600	.	+	0	ID=CDS:Os01t0101800-01;Parent=transcript:Os01t0101800-01;protein_id=Os01t0101800-01
+1	irgsp	exon	85737	85830	.	+	.	Parent=transcript:Os01t0101800-01;Name=Os01t0101800-01.exon2;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0101800-01.exon2;rank=2
+1	irgsp	CDS	85737	85830	.	+	0	ID=CDS:Os01t0101800-01;Parent=transcript:Os01t0101800-01;protein_id=Os01t0101800-01
+1	irgsp	exon	85935	86086	.	+	.	Parent=transcript:Os01t0101800-01;Name=Os01t0101800-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0101800-01.exon3;rank=3
+1	irgsp	CDS	85935	86086	.	+	2	ID=CDS:Os01t0101800-01;Parent=transcript:Os01t0101800-01;protein_id=Os01t0101800-01
+1	irgsp	exon	86212	86299	.	+	.	Parent=transcript:Os01t0101800-01;Name=Os01t0101800-01.exon4;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0101800-01.exon4;rank=4
+1	irgsp	CDS	86212	86299	.	+	0	ID=CDS:Os01t0101800-01;Parent=transcript:Os01t0101800-01;protein_id=Os01t0101800-01
+1	irgsp	exon	86399	87681	.	+	.	Parent=transcript:Os01t0101800-01;Name=Os01t0101800-01.exon5;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0101800-01.exon5;rank=5
+1	irgsp	CDS	86399	87681	.	+	2	ID=CDS:Os01t0101800-01;Parent=transcript:Os01t0101800-01;protein_id=Os01t0101800-01
+1	irgsp	exon	88291	88398	.	+	.	Parent=transcript:Os01t0101800-01;Name=Os01t0101800-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0101800-01.exon6;rank=6
+1	irgsp	CDS	88291	88398	.	+	0	ID=CDS:Os01t0101800-01;Parent=transcript:Os01t0101800-01;protein_id=Os01t0101800-01
+1	irgsp	CDS	88500	88583	.	+	0	ID=CDS:Os01t0101800-01;Parent=transcript:Os01t0101800-01;protein_id=Os01t0101800-01
+1	irgsp	exon	88500	88844	.	+	.	Parent=transcript:Os01t0101800-01;Name=Os01t0101800-01.exon7;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0101800-01.exon7;rank=7
+1	irgsp	three_prime_UTR	88584	88844	.	+	.	Parent=transcript:Os01t0101800-01
+###
+1	irgsp	gene	86211	88583	.	-	.	ID=gene:Os01g0101850;biotype=protein_coding;description=Hypothetical protein. (Os01t0101850-00);gene_id=Os01g0101850;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	86211	88583	.	-	.	ID=transcript:Os01t0101850-00;Parent=gene:Os01g0101850;biotype=protein_coding;transcript_id=Os01t0101850-00
+1	irgsp	exon	86211	86277	.	-	.	Parent=transcript:Os01t0101850-00;Name=Os01t0101850-00.exon4;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0101850-00.exon4;rank=4
+1	irgsp	three_prime_UTR	86211	86277	.	-	.	Parent=transcript:Os01t0101850-00
+1	irgsp	three_prime_UTR	86384	87326	.	-	.	Parent=transcript:Os01t0101850-00
+1	irgsp	exon	86384	87694	.	-	.	Parent=transcript:Os01t0101850-00;Name=Os01t0101850-00.exon3;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0101850-00.exon3;rank=3
+1	irgsp	CDS	87327	87662	.	-	0	ID=CDS:Os01t0101850-00;Parent=transcript:Os01t0101850-00;protein_id=Os01t0101850-00
+1	irgsp	five_prime_UTR	87663	87694	.	-	.	Parent=transcript:Os01t0101850-00
+1	irgsp	exon	88308	88396	.	-	.	Parent=transcript:Os01t0101850-00;Name=Os01t0101850-00.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0101850-00.exon2;rank=2
+1	irgsp	five_prime_UTR	88308	88396	.	-	.	Parent=transcript:Os01t0101850-00
+1	irgsp	exon	88496	88583	.	-	.	Parent=transcript:Os01t0101850-00;Name=Os01t0101850-00.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0101850-00.exon1;rank=1
+1	irgsp	five_prime_UTR	88496	88583	.	-	.	Parent=transcript:Os01t0101850-00
+###
+1	irgsp	gene	88883	89228	.	-	.	ID=gene:Os01g0101900;biotype=protein_coding;description=Similar to OSIGBa0075F02.3 protein. (Os01t0101900-00);gene_id=Os01g0101900;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	88883	89228	.	-	.	ID=transcript:Os01t0101900-00;Parent=gene:Os01g0101900;biotype=protein_coding;transcript_id=Os01t0101900-00
+1	irgsp	three_prime_UTR	88883	88985	.	-	.	Parent=transcript:Os01t0101900-00
+1	irgsp	exon	88883	89228	.	-	.	Parent=transcript:Os01t0101900-00;Name=Os01t0101900-00.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0101900-00.exon1;rank=1
+1	irgsp	CDS	88986	89204	.	-	0	ID=CDS:Os01t0101900-00;Parent=transcript:Os01t0101900-00;protein_id=Os01t0101900-00
+1	irgsp	five_prime_UTR	89205	89228	.	-	.	Parent=transcript:Os01t0101900-00
+###
+1	irgsp	gene	89763	91465	.	-	.	ID=gene:Os01g0102000;Name=NON-SPECIFIC PHOSPHOLIPASE C5;biotype=protein_coding;description=Phosphoesterase family protein. (Os01t0102000-01);gene_id=Os01g0102000;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	89763	91465	.	-	.	ID=transcript:Os01t0102000-01;Parent=gene:Os01g0102000;biotype=protein_coding;transcript_id=Os01t0102000-01
+1	irgsp	three_prime_UTR	89763	89824	.	-	.	Parent=transcript:Os01t0102000-01
+1	irgsp	exon	89763	91465	.	-	.	Parent=transcript:Os01t0102000-01;Name=Os01t0102000-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0102000-01.exon1;rank=1
+1	irgsp	CDS	89825	91411	.	-	0	ID=CDS:Os01t0102000-01;Parent=transcript:Os01t0102000-01;protein_id=Os01t0102000-01
+1	irgsp	five_prime_UTR	91412	91465	.	-	.	Parent=transcript:Os01t0102000-01
+###
+1	irgsp	gene	134300	135439	.	+	.	ID=gene:Os01g0102300;Name=OsTLP27;biotype=protein_coding;description=Thylakoid lumen protein%2C Photosynthesis and chloroplast development (Os01t0102300-01);gene_id=Os01g0102300;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	134300	135439	.	+	.	ID=transcript:Os01t0102300-01;Parent=gene:Os01g0102300;biotype=protein_coding;transcript_id=Os01t0102300-01
+1	irgsp	five_prime_UTR	134300	134310	.	+	.	Parent=transcript:Os01t0102300-01
+1	irgsp	exon	134300	134615	.	+	.	Parent=transcript:Os01t0102300-01;Name=Os01t0102300-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0102300-01.exon1;rank=1
+1	irgsp	CDS	134311	134615	.	+	0	ID=CDS:Os01t0102300-01;Parent=transcript:Os01t0102300-01;protein_id=Os01t0102300-01
+1	irgsp	exon	134698	134824	.	+	.	Parent=transcript:Os01t0102300-01;Name=Os01t0102300-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0102300-01.exon2;rank=2
+1	irgsp	CDS	134698	134824	.	+	1	ID=CDS:Os01t0102300-01;Parent=transcript:Os01t0102300-01;protein_id=Os01t0102300-01
+1	irgsp	CDS	134912	135253	.	+	0	ID=CDS:Os01t0102300-01;Parent=transcript:Os01t0102300-01;protein_id=Os01t0102300-01
+1	irgsp	exon	134912	135439	.	+	.	Parent=transcript:Os01t0102300-01;Name=Os01t0102300-01.exon3;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0102300-01.exon3;rank=3
+1	irgsp	three_prime_UTR	135254	135439	.	+	.	Parent=transcript:Os01t0102300-01
+###
+1	irgsp	gene	139826	141555	.	+	.	ID=gene:Os01g0102400;Name=HAP5H SUBUNIT OF CCAAT-BOX BINDING COMPLEX;biotype=protein_coding;description=Histone-fold domain containing protein. (Os01t0102400-01);gene_id=Os01g0102400;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	139826	141555	.	+	.	ID=transcript:Os01t0102400-01;Parent=gene:Os01g0102400;biotype=protein_coding;transcript_id=Os01t0102400-01
+1	irgsp	exon	139826	139906	.	+	.	Parent=transcript:Os01t0102400-01;Name=Os01t0102400-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0102400-01.exon1;rank=1
+1	irgsp	five_prime_UTR	139826	139906	.	+	.	Parent=transcript:Os01t0102400-01
+1	irgsp	five_prime_UTR	140120	140149	.	+	.	Parent=transcript:Os01t0102400-01
+1	irgsp	exon	140120	141555	.	+	.	Parent=transcript:Os01t0102400-01;Name=Os01t0102400-01.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0102400-01.exon2;rank=2
+1	irgsp	CDS	140150	141415	.	+	0	ID=CDS:Os01t0102400-01;Parent=transcript:Os01t0102400-01;protein_id=Os01t0102400-01
+1	irgsp	three_prime_UTR	141416	141555	.	+	.	Parent=transcript:Os01t0102400-01
+###
+1	irgsp	gene	141959	144554	.	+	.	ID=gene:Os01g0102500;biotype=protein_coding;description=Conserved hypothetical protein. (Os01t0102500-01);gene_id=Os01g0102500;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	141959	144554	.	+	.	ID=transcript:Os01t0102500-01;Parent=gene:Os01g0102500;biotype=protein_coding;transcript_id=Os01t0102500-01
+1	irgsp	five_prime_UTR	141959	142083	.	+	.	Parent=transcript:Os01t0102500-01
+1	irgsp	exon	141959	142631	.	+	.	Parent=transcript:Os01t0102500-01;Name=Os01t0102500-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0102500-01.exon1;rank=1
+1	irgsp	CDS	142084	142631	.	+	0	ID=CDS:Os01t0102500-01;Parent=transcript:Os01t0102500-01;protein_id=Os01t0102500-01
+1	irgsp	exon	143191	143431	.	+	.	Parent=transcript:Os01t0102500-01;Name=Os01t0102500-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0102500-01.exon2;rank=2
+1	irgsp	CDS	143191	143431	.	+	1	ID=CDS:Os01t0102500-01;Parent=transcript:Os01t0102500-01;protein_id=Os01t0102500-01
+1	irgsp	exon	143563	143680	.	+	.	Parent=transcript:Os01t0102500-01;Name=Os01t0102500-01.exon3;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0102500-01.exon3;rank=3
+1	irgsp	CDS	143563	143680	.	+	0	ID=CDS:Os01t0102500-01;Parent=transcript:Os01t0102500-01;protein_id=Os01t0102500-01
+1	irgsp	CDS	143817	143908	.	+	2	ID=CDS:Os01t0102500-01;Parent=transcript:Os01t0102500-01;protein_id=Os01t0102500-01
+1	irgsp	exon	143817	144554	.	+	.	Parent=transcript:Os01t0102500-01;Name=Os01t0102500-01.exon4;constitutive=1;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0102500-01.exon4;rank=4
+1	irgsp	three_prime_UTR	143909	144554	.	+	.	Parent=transcript:Os01t0102500-01
+###
+1	irgsp	gene	145603	147847	.	+	.	ID=gene:Os01g0102600;Name=Shikimate kinase 4;biotype=protein_coding;description=Shikimate kinase domain containing protein. (Os01t0102600-01)%3BSimilar to shikimate kinase family protein. (Os01t0102600-02);gene_id=Os01g0102600;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	145603	147847	.	+	.	ID=transcript:Os01t0102600-01;Parent=gene:Os01g0102600;biotype=protein_coding;transcript_id=Os01t0102600-01
+1	irgsp	five_prime_UTR	145603	145644	.	+	.	Parent=transcript:Os01t0102600-01
+1	irgsp	exon	145603	145786	.	+	.	Parent=transcript:Os01t0102600-01;Name=Os01t0102600-01.exon1;constitutive=0;ensembl_end_phase=1;ensembl_phase=-1;exon_id=Os01t0102600-01.exon1;rank=1
+1	irgsp	CDS	145645	145786	.	+	0	ID=CDS:Os01t0102600-01;Parent=transcript:Os01t0102600-01;protein_id=Os01t0102600-01
+1	irgsp	exon	145905	145951	.	+	.	Parent=transcript:Os01t0102600-01;Name=Os01t0102600-01.exon2;constitutive=0;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0102600-01.exon2;rank=2
+1	irgsp	CDS	145905	145951	.	+	2	ID=CDS:Os01t0102600-01;Parent=transcript:Os01t0102600-01;protein_id=Os01t0102600-01
+1	irgsp	exon	146028	146082	.	+	.	Parent=transcript:Os01t0102600-01;Name=Os01t0102600-01.exon3;constitutive=0;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0102600-01.exon3;rank=3
+1	irgsp	CDS	146028	146082	.	+	0	ID=CDS:Os01t0102600-01;Parent=transcript:Os01t0102600-01;protein_id=Os01t0102600-01
+1	irgsp	exon	146179	146339	.	+	.	Parent=transcript:Os01t0102600-01;Name=Os01t0102600-01.exon4;constitutive=0;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0102600-01.exon4;rank=4
+1	irgsp	CDS	146179	146339	.	+	2	ID=CDS:Os01t0102600-01;Parent=transcript:Os01t0102600-01;protein_id=Os01t0102600-01
+1	irgsp	exon	146450	146532	.	+	.	Parent=transcript:Os01t0102600-01;Name=Os01t0102600-01.exon5;constitutive=0;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0102600-01.exon5;rank=5
+1	irgsp	CDS	146450	146532	.	+	0	ID=CDS:Os01t0102600-01;Parent=transcript:Os01t0102600-01;protein_id=Os01t0102600-01
+1	irgsp	exon	146611	146719	.	+	.	Parent=transcript:Os01t0102600-01;Name=Os01t0102600-01.exon6;constitutive=0;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0102600-01.exon6;rank=6
+1	irgsp	CDS	146611	146719	.	+	1	ID=CDS:Os01t0102600-01;Parent=transcript:Os01t0102600-01;protein_id=Os01t0102600-01
+1	irgsp	exon	147106	147184	.	+	.	Parent=transcript:Os01t0102600-01;Name=Os01t0102600-01.exon7;constitutive=0;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0102600-01.exon7;rank=7
+1	irgsp	CDS	147106	147184	.	+	0	ID=CDS:Os01t0102600-01;Parent=transcript:Os01t0102600-01;protein_id=Os01t0102600-01
+1	irgsp	exon	147311	147375	.	+	.	Parent=transcript:Os01t0102600-01;Name=Os01t0102600-02.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0102600-02.exon2;rank=8
+1	irgsp	CDS	147311	147375	.	+	2	ID=CDS:Os01t0102600-01;Parent=transcript:Os01t0102600-01;protein_id=Os01t0102600-01
+1	irgsp	CDS	147507	147575	.	+	0	ID=CDS:Os01t0102600-01;Parent=transcript:Os01t0102600-01;protein_id=Os01t0102600-01
+1	irgsp	exon	147507	147847	.	+	.	Parent=transcript:Os01t0102600-01;Name=Os01t0102600-01.exon9;constitutive=0;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0102600-01.exon9;rank=9
+1	irgsp	three_prime_UTR	147576	147847	.	+	.	Parent=transcript:Os01t0102600-01
+1	irgsp	mRNA	147104	147805	.	+	.	ID=transcript:Os01t0102600-02;Parent=gene:Os01g0102600;biotype=protein_coding;transcript_id=Os01t0102600-02
+1	irgsp	five_prime_UTR	147104	147105	.	+	.	Parent=transcript:Os01t0102600-02
+1	irgsp	exon	147104	147184	.	+	.	Parent=transcript:Os01t0102600-02;Name=Os01t0102600-02.exon1;constitutive=0;ensembl_end_phase=1;ensembl_phase=-1;exon_id=Os01t0102600-02.exon1;rank=1
+1	irgsp	CDS	147106	147184	.	+	0	ID=CDS:Os01t0102600-02;Parent=transcript:Os01t0102600-02;protein_id=Os01t0102600-02
+1	irgsp	exon	147311	147375	.	+	.	Parent=transcript:Os01t0102600-02;Name=Os01t0102600-02.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0102600-02.exon2;rank=2
+1	irgsp	CDS	147311	147375	.	+	2	ID=CDS:Os01t0102600-02;Parent=transcript:Os01t0102600-02;protein_id=Os01t0102600-02
+1	irgsp	CDS	147507	147575	.	+	0	ID=CDS:Os01t0102600-02;Parent=transcript:Os01t0102600-02;protein_id=Os01t0102600-02
+1	irgsp	exon	147507	147805	.	+	.	Parent=transcript:Os01t0102600-02;Name=Os01t0102600-02.exon3;constitutive=0;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0102600-02.exon3;rank=3
+1	irgsp	three_prime_UTR	147576	147805	.	+	.	Parent=transcript:Os01t0102600-02
+###
+1	irgsp	gene	148085	150568	.	+	.	ID=gene:Os01g0102700;biotype=protein_coding;description=Translocon-associated beta family protein. (Os01t0102700-01);gene_id=Os01g0102700;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	148085	150568	.	+	.	ID=transcript:Os01t0102700-01;Parent=gene:Os01g0102700;biotype=protein_coding;transcript_id=Os01t0102700-01
+1	irgsp	five_prime_UTR	148085	148146	.	+	.	Parent=transcript:Os01t0102700-01
+1	irgsp	exon	148085	148313	.	+	.	Parent=transcript:Os01t0102700-01;Name=Os01t0102700-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0102700-01.exon1;rank=1
+1	irgsp	CDS	148147	148313	.	+	0	ID=CDS:Os01t0102700-01;Parent=transcript:Os01t0102700-01;protein_id=Os01t0102700-01
+1	irgsp	exon	149450	149548	.	+	.	Parent=transcript:Os01t0102700-01;Name=Os01t0102700-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0102700-01.exon2;rank=2
+1	irgsp	CDS	149450	149548	.	+	1	ID=CDS:Os01t0102700-01;Parent=transcript:Os01t0102700-01;protein_id=Os01t0102700-01
+1	irgsp	exon	149634	149742	.	+	.	Parent=transcript:Os01t0102700-01;Name=Os01t0102700-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0102700-01.exon3;rank=3
+1	irgsp	CDS	149634	149742	.	+	1	ID=CDS:Os01t0102700-01;Parent=transcript:Os01t0102700-01;protein_id=Os01t0102700-01
+1	irgsp	exon	149856	149931	.	+	.	Parent=transcript:Os01t0102700-01;Name=Os01t0102700-01.exon4;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0102700-01.exon4;rank=4
+1	irgsp	CDS	149856	149931	.	+	0	ID=CDS:Os01t0102700-01;Parent=transcript:Os01t0102700-01;protein_id=Os01t0102700-01
+1	irgsp	CDS	150152	150318	.	+	2	ID=CDS:Os01t0102700-01;Parent=transcript:Os01t0102700-01;protein_id=Os01t0102700-01
+1	irgsp	exon	150152	150568	.	+	.	Parent=transcript:Os01t0102700-01;Name=Os01t0102700-01.exon5;constitutive=1;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0102700-01.exon5;rank=5
+1	irgsp	three_prime_UTR	150319	150568	.	+	.	Parent=transcript:Os01t0102700-01
+###
+1	irgsp	gene	152853	156449	.	+	.	ID=gene:Os01g0102800;Name=Cockayne syndrome WD-repeat protein;biotype=protein_coding;description=Similar to chromatin remodeling complex subunit. (Os01t0102800-01);gene_id=Os01g0102800;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	152853	156449	.	+	.	ID=transcript:Os01t0102800-01;Parent=gene:Os01g0102800;biotype=protein_coding;transcript_id=Os01t0102800-01
+1	irgsp	five_prime_UTR	152853	152853	.	+	.	Parent=transcript:Os01t0102800-01
+1	irgsp	exon	152853	153025	.	+	.	Parent=transcript:Os01t0102800-01;Name=Os01t0102800-01.exon1;constitutive=1;ensembl_end_phase=1;ensembl_phase=-1;exon_id=Os01t0102800-01.exon1;rank=1
+1	irgsp	CDS	152854	153025	.	+	0	ID=CDS:Os01t0102800-01;Parent=transcript:Os01t0102800-01;protein_id=Os01t0102800-01
+1	irgsp	exon	153178	154646	.	+	.	Parent=transcript:Os01t0102800-01;Name=Os01t0102800-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0102800-01.exon2;rank=2
+1	irgsp	CDS	153178	154646	.	+	2	ID=CDS:Os01t0102800-01;Parent=transcript:Os01t0102800-01;protein_id=Os01t0102800-01
+1	irgsp	exon	155010	155450	.	+	.	Parent=transcript:Os01t0102800-01;Name=Os01t0102800-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0102800-01.exon3;rank=3
+1	irgsp	CDS	155010	155450	.	+	0	ID=CDS:Os01t0102800-01;Parent=transcript:Os01t0102800-01;protein_id=Os01t0102800-01
+1	irgsp	CDS	155543	156214	.	+	0	ID=CDS:Os01t0102800-01;Parent=transcript:Os01t0102800-01;protein_id=Os01t0102800-01
+1	irgsp	exon	155543	156449	.	+	.	Parent=transcript:Os01t0102800-01;Name=Os01t0102800-01.exon4;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0102800-01.exon4;rank=4
+1	irgsp	three_prime_UTR	156215	156449	.	+	.	Parent=transcript:Os01t0102800-01
+###
+1	irgsp	gene	164577	168921	.	+	.	ID=gene:Os01g0102850;biotype=protein_coding;description=Similar to nitrilase 2. (Os01t0102850-00);gene_id=Os01g0102850;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	164577	168921	.	+	.	ID=transcript:Os01t0102850-00;Parent=gene:Os01g0102850;biotype=protein_coding;transcript_id=Os01t0102850-00
+1	irgsp	exon	164577	164905	.	+	.	Parent=transcript:Os01t0102850-00;Name=Os01t0102850-00.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0102850-00.exon1;rank=1
+1	irgsp	five_prime_UTR	164577	164905	.	+	.	Parent=transcript:Os01t0102850-00
+1	irgsp	five_prime_UTR	168499	168804	.	+	.	Parent=transcript:Os01t0102850-00
+1	irgsp	exon	168499	168921	.	+	.	Parent=transcript:Os01t0102850-00;Name=Os01t0102850-00.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0102850-00.exon2;rank=2
+1	irgsp	CDS	168805	168921	.	+	0	ID=CDS:Os01t0102850-00;Parent=transcript:Os01t0102850-00;protein_id=Os01t0102850-00
+###
+1	irgsp	gene	169390	170316	.	-	.	ID=gene:Os01g0102900;Name=LIGHT-REGULATED GENE 1;biotype=protein_coding;description=Light-regulated protein%2C Regulation of light-dependent attachment of LEAF-TYPE FERREDOXIN-NADP+ OXIDOREDUCTASE (LFNR) to the thylakoid membrane (Os01t0102900-01);gene_id=Os01g0102900;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	169390	170316	.	-	.	ID=transcript:Os01t0102900-01;Parent=gene:Os01g0102900;biotype=protein_coding;transcript_id=Os01t0102900-01
+1	irgsp	three_prime_UTR	169390	169598	.	-	.	Parent=transcript:Os01t0102900-01
+1	irgsp	exon	169390	169656	.	-	.	Parent=transcript:Os01t0102900-01;Name=Os01t0102900-01.exon3;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0102900-01.exon3;rank=3
+1	irgsp	CDS	169599	169656	.	-	1	ID=CDS:Os01t0102900-01;Parent=transcript:Os01t0102900-01;protein_id=Os01t0102900-01
+1	irgsp	exon	169751	169909	.	-	.	Parent=transcript:Os01t0102900-01;Name=Os01t0102900-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=Os01t0102900-01.exon2;rank=2
+1	irgsp	CDS	169751	169909	.	-	1	ID=CDS:Os01t0102900-01;Parent=transcript:Os01t0102900-01;protein_id=Os01t0102900-01
+1	irgsp	CDS	170091	170260	.	-	0	ID=CDS:Os01t0102900-01;Parent=transcript:Os01t0102900-01;protein_id=Os01t0102900-01
+1	irgsp	exon	170091	170316	.	-	.	Parent=transcript:Os01t0102900-01;Name=Os01t0102900-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0102900-01.exon1;rank=1
+1	irgsp	five_prime_UTR	170261	170316	.	-	.	Parent=transcript:Os01t0102900-01
+###
+1	irgsp	gene	170798	173144	.	-	.	ID=gene:Os01g0103000;biotype=protein_coding;description=Snf7 family protein. (Os01t0103000-01);gene_id=Os01g0103000;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	170798	173144	.	-	.	ID=transcript:Os01t0103000-01;Parent=gene:Os01g0103000;biotype=protein_coding;transcript_id=Os01t0103000-01
+1	irgsp	three_prime_UTR	170798	171044	.	-	.	Parent=transcript:Os01t0103000-01
+1	irgsp	exon	170798	171095	.	-	.	Parent=transcript:Os01t0103000-01;Name=Os01t0103000-01.exon7;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0103000-01.exon7;rank=7
+1	irgsp	CDS	171045	171095	.	-	0	ID=CDS:Os01t0103000-01;Parent=transcript:Os01t0103000-01;protein_id=Os01t0103000-01
+1	irgsp	exon	171406	171554	.	-	.	Parent=transcript:Os01t0103000-01;Name=Os01t0103000-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0103000-01.exon6;rank=6
+1	irgsp	CDS	171406	171554	.	-	2	ID=CDS:Os01t0103000-01;Parent=transcript:Os01t0103000-01;protein_id=Os01t0103000-01
+1	irgsp	exon	171764	171875	.	-	.	Parent=transcript:Os01t0103000-01;Name=Os01t0103000-01.exon5;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0103000-01.exon5;rank=5
+1	irgsp	CDS	171764	171875	.	-	0	ID=CDS:Os01t0103000-01;Parent=transcript:Os01t0103000-01;protein_id=Os01t0103000-01
+1	irgsp	exon	172398	172469	.	-	.	Parent=transcript:Os01t0103000-01;Name=Os01t0103000-01.exon4;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0103000-01.exon4;rank=4
+1	irgsp	CDS	172398	172469	.	-	0	ID=CDS:Os01t0103000-01;Parent=transcript:Os01t0103000-01;protein_id=Os01t0103000-01
+1	irgsp	exon	172578	172671	.	-	.	Parent=transcript:Os01t0103000-01;Name=Os01t0103000-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0103000-01.exon3;rank=3
+1	irgsp	CDS	172578	172671	.	-	1	ID=CDS:Os01t0103000-01;Parent=transcript:Os01t0103000-01;protein_id=Os01t0103000-01
+1	irgsp	exon	172770	172921	.	-	.	Parent=transcript:Os01t0103000-01;Name=Os01t0103000-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0103000-01.exon2;rank=2
+1	irgsp	CDS	172770	172921	.	-	0	ID=CDS:Os01t0103000-01;Parent=transcript:Os01t0103000-01;protein_id=Os01t0103000-01
+1	irgsp	CDS	173004	173072	.	-	0	ID=CDS:Os01t0103000-01;Parent=transcript:Os01t0103000-01;protein_id=Os01t0103000-01
+1	irgsp	exon	173004	173144	.	-	.	Parent=transcript:Os01t0103000-01;Name=Os01t0103000-01.exon1;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0103000-01.exon1;rank=1
+1	irgsp	five_prime_UTR	173073	173144	.	-	.	Parent=transcript:Os01t0103000-01
+###
+1	irgsp	gene	178607	180575	.	+	.	ID=gene:Os01g0103100;biotype=protein_coding;description=TGF-beta receptor%2C type I/II extracellular region family protein. (Os01t0103100-01)%3BSimilar to predicted protein. (Os01t0103100-02);gene_id=Os01g0103100;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	178607	180548	.	+	.	ID=transcript:Os01t0103100-01;Parent=gene:Os01g0103100;biotype=protein_coding;transcript_id=Os01t0103100-01
+1	irgsp	five_prime_UTR	178607	178641	.	+	.	Parent=transcript:Os01t0103100-01
+1	irgsp	exon	178607	180548	.	+	.	Parent=transcript:Os01t0103100-01;Name=Os01t0103100-01.exon1;constitutive=0;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0103100-01.exon1;rank=1
+1	irgsp	CDS	178642	180462	.	+	0	ID=CDS:Os01t0103100-01;Parent=transcript:Os01t0103100-01;protein_id=Os01t0103100-01
+1	irgsp	three_prime_UTR	180463	180548	.	+	.	Parent=transcript:Os01t0103100-01
+1	irgsp	mRNA	178652	180575	.	+	.	ID=transcript:Os01t0103100-02;Parent=gene:Os01g0103100;biotype=protein_coding;transcript_id=Os01t0103100-02
+1	irgsp	five_prime_UTR	178652	178677	.	+	.	Parent=transcript:Os01t0103100-02
+1	irgsp	exon	178652	180575	.	+	.	Parent=transcript:Os01t0103100-02;Name=Os01t0103100-02.exon1;constitutive=0;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0103100-02.exon1;rank=1
+1	irgsp	CDS	178678	180462	.	+	0	ID=CDS:Os01t0103100-02;Parent=transcript:Os01t0103100-02;protein_id=Os01t0103100-02
+1	irgsp	three_prime_UTR	180463	180575	.	+	.	Parent=transcript:Os01t0103100-02
+###
+1	irgsp	gene	178815	180433	.	-	.	ID=gene:Os01g0103075;biotype=protein_coding;description=Hypothetical protein. (Os01t0103075-00);gene_id=Os01g0103075;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	178815	180433	.	-	.	ID=transcript:Os01t0103075-00;Parent=gene:Os01g0103075;biotype=protein_coding;transcript_id=Os01t0103075-00
+1	irgsp	three_prime_UTR	178815	179511	.	-	.	Parent=transcript:Os01t0103075-00
+1	irgsp	exon	178815	180433	.	-	.	Parent=transcript:Os01t0103075-00;Name=Os01t0103075-00.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0103075-00.exon1;rank=1
+1	irgsp	CDS	179512	180054	.	-	0	ID=CDS:Os01t0103075-00;Parent=transcript:Os01t0103075-00;protein_id=Os01t0103075-00
+1	irgsp	five_prime_UTR	180055	180433	.	-	.	Parent=transcript:Os01t0103075-00
+###
+1	Ensembl_Plants	ncRNA_gene	182074	182154	.	+	.	ID=gene:ENSRNA049442722;Name=tRNA-Leu;biotype=tRNA;description=tRNA-Leu for anticodon AAG;gene_id=ENSRNA049442722;logic_name=trnascan_gene
+1	Ensembl_Plants	tRNA	182074	182154	.	+	.	ID=transcript:ENSRNA049442722-T1;Parent=gene:ENSRNA049442722;biotype=tRNA;transcript_id=ENSRNA049442722-T1
+1	Ensembl_Plants	exon	182074	182154	.	+	.	Parent=transcript:ENSRNA049442722-T1;Name=ENSRNA049442722-E1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSRNA049442722-E1;rank=1
+###
+1	irgsp	gene	185189	185828	.	-	.	ID=gene:Os01g0103400;biotype=protein_coding;description=Hypothetical gene. (Os01t0103400-01);gene_id=Os01g0103400;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	185189	185828	.	-	.	ID=transcript:Os01t0103400-01;Parent=gene:Os01g0103400;biotype=protein_coding;transcript_id=Os01t0103400-01
+1	irgsp	three_prime_UTR	185189	185434	.	-	.	Parent=transcript:Os01t0103400-01
+1	irgsp	exon	185189	185828	.	-	.	Parent=transcript:Os01t0103400-01;Name=Os01t0103400-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0103400-01.exon1;rank=1
+1	irgsp	CDS	185435	185827	.	-	0	ID=CDS:Os01t0103400-01;Parent=transcript:Os01t0103400-01;protein_id=Os01t0103400-01
+1	irgsp	five_prime_UTR	185828	185828	.	-	.	Parent=transcript:Os01t0103400-01
+###
+1	irgsp	repeat_region	186000	186100	.	+	.	ID=fakeRepeat2
+###
+1	irgsp	gene	186250	190904	.	-	.	ID=gene:Os01g0103600;biotype=protein_coding;description=Similar to sterol-8%2C7-isomerase. (Os01t0103600-01)%3BEmopamil-binding family protein. (Os01t0103600-02);gene_id=Os01g0103600;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	186250	190262	.	-	.	ID=transcript:Os01t0103600-02;Parent=gene:Os01g0103600;biotype=protein_coding;transcript_id=Os01t0103600-02
+1	irgsp	three_prime_UTR	186250	186515	.	-	.	Parent=transcript:Os01t0103600-02
+1	irgsp	exon	186250	186771	.	-	.	Parent=transcript:Os01t0103600-02;Name=Os01t0103600-02.exon4;constitutive=0;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0103600-02.exon4;rank=4
+1	irgsp	CDS	186516	186771	.	-	1	ID=CDS:Os01t0103600-02;Parent=transcript:Os01t0103600-02;protein_id=Os01t0103600-02
+1	irgsp	exon	189607	189715	.	-	.	Parent=transcript:Os01t0103600-02;Name=Os01t0103600-02.exon3;constitutive=0;ensembl_end_phase=2;ensembl_phase=1;exon_id=Os01t0103600-02.exon3;rank=3
+1	irgsp	CDS	189607	189715	.	-	2	ID=CDS:Os01t0103600-02;Parent=transcript:Os01t0103600-02;protein_id=Os01t0103600-02
+1	irgsp	exon	189841	189990	.	-	.	Parent=transcript:Os01t0103600-02;Name=Os01t0103600-02.exon2;constitutive=1;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0103600-02.exon2;rank=2
+1	irgsp	CDS	189841	189990	.	-	2	ID=CDS:Os01t0103600-02;Parent=transcript:Os01t0103600-02;protein_id=Os01t0103600-02
+1	irgsp	CDS	190087	190231	.	-	0	ID=CDS:Os01t0103600-02;Parent=transcript:Os01t0103600-02;protein_id=Os01t0103600-02
+1	irgsp	exon	190087	190262	.	-	.	Parent=transcript:Os01t0103600-02;Name=Os01t0103600-02.exon1;constitutive=0;ensembl_end_phase=1;ensembl_phase=-1;exon_id=Os01t0103600-02.exon1;rank=1
+1	irgsp	five_prime_UTR	190232	190262	.	-	.	Parent=transcript:Os01t0103600-02
+1	irgsp	mRNA	187345	190904	.	-	.	ID=transcript:Os01t0103600-01;Parent=gene:Os01g0103600;biotype=protein_coding;transcript_id=Os01t0103600-01
+1	irgsp	three_prime_UTR	187345	189395	.	-	.	Parent=transcript:Os01t0103600-01
+1	irgsp	exon	187345	189715	.	-	.	Parent=transcript:Os01t0103600-01;Name=Os01t0103600-01.exon3;constitutive=0;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0103600-01.exon3;rank=3
+1	irgsp	CDS	189396	189715	.	-	2	ID=CDS:Os01t0103600-01;Parent=transcript:Os01t0103600-01;protein_id=Os01t0103600-01
+1	irgsp	exon	189841	189990	.	-	.	Parent=transcript:Os01t0103600-01;Name=Os01t0103600-02.exon2;constitutive=1;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0103600-02.exon2;rank=2
+1	irgsp	CDS	189841	189990	.	-	2	ID=CDS:Os01t0103600-01;Parent=transcript:Os01t0103600-01;protein_id=Os01t0103600-01
+1	irgsp	CDS	190087	190231	.	-	0	ID=CDS:Os01t0103600-01;Parent=transcript:Os01t0103600-01;protein_id=Os01t0103600-01
+1	irgsp	exon	190087	190904	.	-	.	Parent=transcript:Os01t0103600-01;Name=Os01t0103600-01.exon1;constitutive=0;ensembl_end_phase=1;ensembl_phase=-1;exon_id=Os01t0103600-01.exon1;rank=1
+1	irgsp	five_prime_UTR	190232	190904	.	-	.	Parent=transcript:Os01t0103600-01
+###
+1	irgsp	gene	187545	188586	.	+	.	ID=gene:Os01g0103650;biotype=protein_coding;description=Hypothetical gene. (Os01t0103650-00);gene_id=Os01g0103650;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	187545	188586	.	+	.	ID=transcript:Os01t0103650-00;Parent=gene:Os01g0103650;biotype=protein_coding;transcript_id=Os01t0103650-00
+1	irgsp	five_prime_UTR	187545	187546	.	+	.	Parent=transcript:Os01t0103650-00
+1	irgsp	exon	187545	188020	.	+	.	Parent=transcript:Os01t0103650-00;Name=Os01t0103650-00.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0103650-00.exon1;rank=1
+1	irgsp	CDS	187547	187768	.	+	0	ID=CDS:Os01t0103650-00;Parent=transcript:Os01t0103650-00;protein_id=Os01t0103650-00
+1	irgsp	three_prime_UTR	187769	188020	.	+	.	Parent=transcript:Os01t0103650-00
+1	irgsp	exon	188060	188385	.	+	.	Parent=transcript:Os01t0103650-00;Name=Os01t0103650-00.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0103650-00.exon2;rank=2
+1	irgsp	three_prime_UTR	188060	188385	.	+	.	Parent=transcript:Os01t0103650-00
+1	irgsp	exon	188455	188586	.	+	.	Parent=transcript:Os01t0103650-00;Name=Os01t0103650-00.exon3;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0103650-00.exon3;rank=3
+1	irgsp	three_prime_UTR	188455	188586	.	+	.	Parent=transcript:Os01t0103650-00
+###
+1	irgsp	gene	191037	196287	.	+	.	ID=gene:Os01g0103700;biotype=protein_coding;description=Conserved hypothetical protein. (Os01t0103700-01);gene_id=Os01g0103700;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	191037	196287	.	+	.	ID=transcript:Os01t0103700-01;Parent=gene:Os01g0103700;biotype=protein_coding;transcript_id=Os01t0103700-01
+1	irgsp	exon	191037	191161	.	+	.	Parent=transcript:Os01t0103700-01;Name=Os01t0103700-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0103700-01.exon1;rank=1
+1	irgsp	five_prime_UTR	191037	191161	.	+	.	Parent=transcript:Os01t0103700-01
+1	irgsp	five_prime_UTR	191625	191693	.	+	.	Parent=transcript:Os01t0103700-01
+1	irgsp	exon	191625	191705	.	+	.	Parent=transcript:Os01t0103700-01;Name=Os01t0103700-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0103700-01.exon2;rank=2
+1	irgsp	CDS	191694	191705	.	+	0	ID=CDS:Os01t0103700-01;Parent=transcript:Os01t0103700-01;protein_id=Os01t0103700-01
+1	irgsp	exon	192399	192506	.	+	.	Parent=transcript:Os01t0103700-01;Name=Os01t0103700-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0103700-01.exon3;rank=3
+1	irgsp	CDS	192399	192506	.	+	0	ID=CDS:Os01t0103700-01;Parent=transcript:Os01t0103700-01;protein_id=Os01t0103700-01
+1	irgsp	exon	192958	193161	.	+	.	Parent=transcript:Os01t0103700-01;Name=Os01t0103700-01.exon4;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0103700-01.exon4;rank=4
+1	irgsp	CDS	192958	193161	.	+	0	ID=CDS:Os01t0103700-01;Parent=transcript:Os01t0103700-01;protein_id=Os01t0103700-01
+1	irgsp	exon	193248	193356	.	+	.	Parent=transcript:Os01t0103700-01;Name=Os01t0103700-01.exon5;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0103700-01.exon5;rank=5
+1	irgsp	CDS	193248	193356	.	+	0	ID=CDS:Os01t0103700-01;Parent=transcript:Os01t0103700-01;protein_id=Os01t0103700-01
+1	irgsp	CDS	193434	193507	.	+	2	ID=CDS:Os01t0103700-01;Parent=transcript:Os01t0103700-01;protein_id=Os01t0103700-01
+1	irgsp	exon	193434	196287	.	+	.	Parent=transcript:Os01t0103700-01;Name=Os01t0103700-01.exon6;constitutive=1;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0103700-01.exon6;rank=6
+1	irgsp	three_prime_UTR	193508	196287	.	+	.	Parent=transcript:Os01t0103700-01
+###
+1	irgsp	gene	197647	200803	.	+	.	ID=gene:Os01g0103800;Name=OsDW1-01g;biotype=protein_coding;description=Conserved hypothetical protein. (Os01t0103800-01);gene_id=Os01g0103800;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	197647	200803	.	+	.	ID=transcript:Os01t0103800-01;Parent=gene:Os01g0103800;biotype=protein_coding;transcript_id=Os01t0103800-01
+1	irgsp	exon	197647	197838	.	+	.	Parent=transcript:Os01t0103800-01;Name=Os01t0103800-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0103800-01.exon1;rank=1
+1	irgsp	five_prime_UTR	197647	197838	.	+	.	Parent=transcript:Os01t0103800-01
+1	irgsp	five_prime_UTR	198034	198129	.	+	.	Parent=transcript:Os01t0103800-01
+1	irgsp	exon	198034	198225	.	+	.	Parent=transcript:Os01t0103800-01;Name=Os01t0103800-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0103800-01.exon2;rank=2
+1	irgsp	CDS	198130	198225	.	+	0	ID=CDS:Os01t0103800-01;Parent=transcript:Os01t0103800-01;protein_id=Os01t0103800-01
+1	irgsp	exon	198830	200036	.	+	.	Parent=transcript:Os01t0103800-01;Name=Os01t0103800-01.exon3;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0103800-01.exon3;rank=3
+1	irgsp	CDS	198830	200036	.	+	0	ID=CDS:Os01t0103800-01;Parent=transcript:Os01t0103800-01;protein_id=Os01t0103800-01
+1	irgsp	CDS	200253	200479	.	+	2	ID=CDS:Os01t0103800-01;Parent=transcript:Os01t0103800-01;protein_id=Os01t0103800-01
+1	irgsp	exon	200253	200803	.	+	.	Parent=transcript:Os01t0103800-01;Name=Os01t0103800-01.exon4;constitutive=1;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0103800-01.exon4;rank=4
+1	irgsp	three_prime_UTR	200480	200803	.	+	.	Parent=transcript:Os01t0103800-01
+###
+1	irgsp	gene	201944	206202	.	+	.	ID=gene:Os01g0103900;biotype=protein_coding;description=Polynucleotidyl transferase%2C Ribonuclease H fold domain containing protein. (Os01t0103900-01);gene_id=Os01g0103900;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	201944	206202	.	+	.	ID=transcript:Os01t0103900-01;Parent=gene:Os01g0103900;biotype=protein_coding;transcript_id=Os01t0103900-01
+1	irgsp	five_prime_UTR	201944	202041	.	+	.	Parent=transcript:Os01t0103900-01
+1	irgsp	exon	201944	202110	.	+	.	Parent=transcript:Os01t0103900-01;Name=Os01t0103900-01.exon1;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0103900-01.exon1;rank=1
+1	irgsp	CDS	202042	202110	.	+	0	ID=CDS:Os01t0103900-01;Parent=transcript:Os01t0103900-01;protein_id=Os01t0103900-01
+1	irgsp	exon	202252	202359	.	+	.	Parent=transcript:Os01t0103900-01;Name=Os01t0103900-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0103900-01.exon2;rank=2
+1	irgsp	CDS	202252	202359	.	+	0	ID=CDS:Os01t0103900-01;Parent=transcript:Os01t0103900-01;protein_id=Os01t0103900-01
+1	irgsp	exon	203007	203127	.	+	.	Parent=transcript:Os01t0103900-01;Name=Os01t0103900-01.exon3;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0103900-01.exon3;rank=3
+1	irgsp	CDS	203007	203127	.	+	0	ID=CDS:Os01t0103900-01;Parent=transcript:Os01t0103900-01;protein_id=Os01t0103900-01
+1	irgsp	exon	203302	203429	.	+	.	Parent=transcript:Os01t0103900-01;Name=Os01t0103900-01.exon4;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0103900-01.exon4;rank=4
+1	irgsp	CDS	203302	203429	.	+	2	ID=CDS:Os01t0103900-01;Parent=transcript:Os01t0103900-01;protein_id=Os01t0103900-01
+1	irgsp	exon	203511	203658	.	+	.	Parent=transcript:Os01t0103900-01;Name=Os01t0103900-01.exon5;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0103900-01.exon5;rank=5
+1	irgsp	CDS	203511	203658	.	+	0	ID=CDS:Os01t0103900-01;Parent=transcript:Os01t0103900-01;protein_id=Os01t0103900-01
+1	irgsp	exon	203760	203938	.	+	.	Parent=transcript:Os01t0103900-01;Name=Os01t0103900-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0103900-01.exon6;rank=6
+1	irgsp	CDS	203760	203938	.	+	2	ID=CDS:Os01t0103900-01;Parent=transcript:Os01t0103900-01;protein_id=Os01t0103900-01
+1	irgsp	exon	204203	204440	.	+	.	Parent=transcript:Os01t0103900-01;Name=Os01t0103900-01.exon7;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0103900-01.exon7;rank=7
+1	irgsp	CDS	204203	204440	.	+	0	ID=CDS:Os01t0103900-01;Parent=transcript:Os01t0103900-01;protein_id=Os01t0103900-01
+1	irgsp	exon	204543	204635	.	+	.	Parent=transcript:Os01t0103900-01;Name=Os01t0103900-01.exon8;constitutive=1;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0103900-01.exon8;rank=8
+1	irgsp	CDS	204543	204635	.	+	2	ID=CDS:Os01t0103900-01;Parent=transcript:Os01t0103900-01;protein_id=Os01t0103900-01
+1	irgsp	exon	204730	204875	.	+	.	Parent=transcript:Os01t0103900-01;Name=Os01t0103900-01.exon9;constitutive=1;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0103900-01.exon9;rank=9
+1	irgsp	CDS	204730	204875	.	+	2	ID=CDS:Os01t0103900-01;Parent=transcript:Os01t0103900-01;protein_id=Os01t0103900-01
+1	irgsp	exon	205042	205149	.	+	.	Parent=transcript:Os01t0103900-01;Name=Os01t0103900-01.exon10;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0103900-01.exon10;rank=10
+1	irgsp	CDS	205042	205149	.	+	0	ID=CDS:Os01t0103900-01;Parent=transcript:Os01t0103900-01;protein_id=Os01t0103900-01
+1	irgsp	exon	205290	205378	.	+	.	Parent=transcript:Os01t0103900-01;Name=Os01t0103900-01.exon11;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0103900-01.exon11;rank=11
+1	irgsp	CDS	205290	205378	.	+	0	ID=CDS:Os01t0103900-01;Parent=transcript:Os01t0103900-01;protein_id=Os01t0103900-01
+1	irgsp	CDS	205534	205543	.	+	1	ID=CDS:Os01t0103900-01;Parent=transcript:Os01t0103900-01;protein_id=Os01t0103900-01
+1	irgsp	exon	205534	206202	.	+	.	Parent=transcript:Os01t0103900-01;Name=Os01t0103900-01.exon12;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0103900-01.exon12;rank=12
+1	irgsp	three_prime_UTR	205544	206202	.	+	.	Parent=transcript:Os01t0103900-01
+###
+1	irgsp	gene	206131	209606	.	-	.	ID=gene:Os01g0104000;biotype=protein_coding;description=C-type lectin domain containing protein. (Os01t0104000-01)%3BSimilar to predicted protein. (Os01t0104000-02);gene_id=Os01g0104000;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	206131	209581	.	-	.	ID=transcript:Os01t0104000-02;Parent=gene:Os01g0104000;biotype=protein_coding;transcript_id=Os01t0104000-02
+1	irgsp	three_prime_UTR	206131	206449	.	-	.	Parent=transcript:Os01t0104000-02
+1	irgsp	exon	206131	207029	.	-	.	Parent=transcript:Os01t0104000-02;Name=Os01t0104000-02.exon4;constitutive=0;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0104000-02.exon4;rank=4
+1	irgsp	CDS	206450	207029	.	-	1	ID=CDS:Os01t0104000-02;Parent=transcript:Os01t0104000-02;protein_id=Os01t0104000-02
+1	irgsp	exon	207706	208273	.	-	.	Parent=transcript:Os01t0104000-02;Name=Os01t0104000-02.exon3;constitutive=0;ensembl_end_phase=2;ensembl_phase=1;exon_id=Os01t0104000-02.exon3;rank=3
+1	irgsp	CDS	207706	208273	.	-	2	ID=CDS:Os01t0104000-02;Parent=transcript:Os01t0104000-02;protein_id=Os01t0104000-02
+1	irgsp	exon	208408	208836	.	-	.	Parent=transcript:Os01t0104000-02;Name=Os01t0104000-01.exon2;constitutive=1;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0104000-01.exon2;rank=2
+1	irgsp	CDS	208408	208836	.	-	2	ID=CDS:Os01t0104000-02;Parent=transcript:Os01t0104000-02;protein_id=Os01t0104000-02
+1	irgsp	CDS	209438	209525	.	-	0	ID=CDS:Os01t0104000-02;Parent=transcript:Os01t0104000-02;protein_id=Os01t0104000-02
+1	irgsp	exon	209438	209581	.	-	.	Parent=transcript:Os01t0104000-02;Name=Os01t0104000-02.exon1;constitutive=0;ensembl_end_phase=1;ensembl_phase=-1;exon_id=Os01t0104000-02.exon1;rank=1
+1	irgsp	five_prime_UTR	209526	209581	.	-	.	Parent=transcript:Os01t0104000-02
+1	irgsp	mRNA	206134	209606	.	-	.	ID=transcript:Os01t0104000-01;Parent=gene:Os01g0104000;biotype=protein_coding;transcript_id=Os01t0104000-01
+1	irgsp	three_prime_UTR	206134	206449	.	-	.	Parent=transcript:Os01t0104000-01
+1	irgsp	exon	206134	207029	.	-	.	Parent=transcript:Os01t0104000-01;Name=Os01t0104000-01.exon4;constitutive=0;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0104000-01.exon4;rank=4
+1	irgsp	CDS	206450	207029	.	-	1	ID=CDS:Os01t0104000-01;Parent=transcript:Os01t0104000-01;protein_id=Os01t0104000-01
+1	irgsp	exon	207706	208276	.	-	.	Parent=transcript:Os01t0104000-01;Name=Os01t0104000-01.exon3;constitutive=0;ensembl_end_phase=2;ensembl_phase=1;exon_id=Os01t0104000-01.exon3;rank=3
+1	irgsp	CDS	207706	208276	.	-	2	ID=CDS:Os01t0104000-01;Parent=transcript:Os01t0104000-01;protein_id=Os01t0104000-01
+1	irgsp	exon	208408	208836	.	-	.	Parent=transcript:Os01t0104000-01;Name=Os01t0104000-01.exon2;constitutive=1;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0104000-01.exon2;rank=2
+1	irgsp	CDS	208408	208836	.	-	2	ID=CDS:Os01t0104000-01;Parent=transcript:Os01t0104000-01;protein_id=Os01t0104000-01
+1	irgsp	CDS	209438	209525	.	-	0	ID=CDS:Os01t0104000-01;Parent=transcript:Os01t0104000-01;protein_id=Os01t0104000-01
+1	irgsp	exon	209438	209606	.	-	.	Parent=transcript:Os01t0104000-01;Name=Os01t0104000-01.exon1;constitutive=0;ensembl_end_phase=1;ensembl_phase=-1;exon_id=Os01t0104000-01.exon1;rank=1
+1	irgsp	five_prime_UTR	209526	209606	.	-	.	Parent=transcript:Os01t0104000-01
+###
+1	irgsp	gene	209771	214173	.	+	.	ID=gene:Os01g0104100;Name=cold-inducible%2C cold-inducible zinc finger protein;biotype=protein_coding;description=Similar to protein binding / zinc ion binding. (Os01t0104100-01)%3BSimilar to protein binding / zinc ion binding. (Os01t0104100-02);gene_id=Os01g0104100;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	209771	214173	.	+	.	ID=transcript:Os01t0104100-01;Parent=gene:Os01g0104100;biotype=protein_coding;transcript_id=Os01t0104100-01
+1	irgsp	exon	209771	209896	.	+	.	Parent=transcript:Os01t0104100-01;Name=Os01t0104100-01.exon1;constitutive=0;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104100-01.exon1;rank=1
+1	irgsp	CDS	209771	209896	.	+	0	ID=CDS:Os01t0104100-01;Parent=transcript:Os01t0104100-01;protein_id=Os01t0104100-01
+1	irgsp	exon	210244	210563	.	+	.	Parent=transcript:Os01t0104100-01;Name=Os01t0104100-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0104100-01.exon2;rank=2
+1	irgsp	CDS	210244	210563	.	+	0	ID=CDS:Os01t0104100-01;Parent=transcript:Os01t0104100-01;protein_id=Os01t0104100-01
+1	irgsp	exon	210659	210890	.	+	.	Parent=transcript:Os01t0104100-01;Name=Os01t0104100-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0104100-01.exon3;rank=3
+1	irgsp	CDS	210659	210890	.	+	1	ID=CDS:Os01t0104100-01;Parent=transcript:Os01t0104100-01;protein_id=Os01t0104100-01
+1	irgsp	exon	211015	211160	.	+	.	Parent=transcript:Os01t0104100-01;Name=Os01t0104100-01.exon4;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0104100-01.exon4;rank=4
+1	irgsp	CDS	211015	211160	.	+	0	ID=CDS:Os01t0104100-01;Parent=transcript:Os01t0104100-01;protein_id=Os01t0104100-01
+1	irgsp	exon	212265	212352	.	+	.	Parent=transcript:Os01t0104100-01;Name=Os01t0104100-01.exon5;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0104100-01.exon5;rank=5
+1	irgsp	CDS	212265	212352	.	+	1	ID=CDS:Os01t0104100-01;Parent=transcript:Os01t0104100-01;protein_id=Os01t0104100-01
+1	irgsp	exon	212433	212579	.	+	.	Parent=transcript:Os01t0104100-01;Name=Os01t0104100-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104100-01.exon6;rank=6
+1	irgsp	CDS	212433	212579	.	+	0	ID=CDS:Os01t0104100-01;Parent=transcript:Os01t0104100-01;protein_id=Os01t0104100-01
+1	irgsp	exon	213490	213639	.	+	.	Parent=transcript:Os01t0104100-01;Name=Os01t0104100-01.exon7;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104100-01.exon7;rank=7
+1	irgsp	CDS	213490	213639	.	+	0	ID=CDS:Os01t0104100-01;Parent=transcript:Os01t0104100-01;protein_id=Os01t0104100-01
+1	irgsp	CDS	213741	213788	.	+	0	ID=CDS:Os01t0104100-01;Parent=transcript:Os01t0104100-01;protein_id=Os01t0104100-01
+1	irgsp	exon	213741	214173	.	+	.	Parent=transcript:Os01t0104100-01;Name=Os01t0104100-01.exon8;constitutive=0;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0104100-01.exon8;rank=8
+1	irgsp	three_prime_UTR	213789	214173	.	+	.	Parent=transcript:Os01t0104100-01
+1	irgsp	mRNA	209794	214147	.	+	.	ID=transcript:Os01t0104100-02;Parent=gene:Os01g0104100;biotype=protein_coding;transcript_id=Os01t0104100-02
+1	irgsp	five_prime_UTR	209794	209794	.	+	.	Parent=transcript:Os01t0104100-02
+1	irgsp	exon	209794	209896	.	+	.	Parent=transcript:Os01t0104100-02;Name=Os01t0104100-02.exon1;constitutive=0;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0104100-02.exon1;rank=1
+1	irgsp	CDS	209795	209896	.	+	0	ID=CDS:Os01t0104100-02;Parent=transcript:Os01t0104100-02;protein_id=Os01t0104100-02
+1	irgsp	exon	210244	210563	.	+	.	Parent=transcript:Os01t0104100-02;Name=Os01t0104100-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0104100-01.exon2;rank=2
+1	irgsp	CDS	210244	210563	.	+	0	ID=CDS:Os01t0104100-02;Parent=transcript:Os01t0104100-02;protein_id=Os01t0104100-02
+1	irgsp	exon	210659	210890	.	+	.	Parent=transcript:Os01t0104100-02;Name=Os01t0104100-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0104100-01.exon3;rank=3
+1	irgsp	CDS	210659	210890	.	+	1	ID=CDS:Os01t0104100-02;Parent=transcript:Os01t0104100-02;protein_id=Os01t0104100-02
+1	irgsp	exon	211015	211160	.	+	.	Parent=transcript:Os01t0104100-02;Name=Os01t0104100-01.exon4;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0104100-01.exon4;rank=4
+1	irgsp	CDS	211015	211160	.	+	0	ID=CDS:Os01t0104100-02;Parent=transcript:Os01t0104100-02;protein_id=Os01t0104100-02
+1	irgsp	exon	212265	212352	.	+	.	Parent=transcript:Os01t0104100-02;Name=Os01t0104100-01.exon5;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0104100-01.exon5;rank=5
+1	irgsp	CDS	212265	212352	.	+	1	ID=CDS:Os01t0104100-02;Parent=transcript:Os01t0104100-02;protein_id=Os01t0104100-02
+1	irgsp	exon	212433	212579	.	+	.	Parent=transcript:Os01t0104100-02;Name=Os01t0104100-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104100-01.exon6;rank=6
+1	irgsp	CDS	212433	212579	.	+	0	ID=CDS:Os01t0104100-02;Parent=transcript:Os01t0104100-02;protein_id=Os01t0104100-02
+1	irgsp	exon	213490	213639	.	+	.	Parent=transcript:Os01t0104100-02;Name=Os01t0104100-01.exon7;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104100-01.exon7;rank=7
+1	irgsp	CDS	213490	213639	.	+	0	ID=CDS:Os01t0104100-02;Parent=transcript:Os01t0104100-02;protein_id=Os01t0104100-02
+1	irgsp	CDS	213741	213788	.	+	0	ID=CDS:Os01t0104100-02;Parent=transcript:Os01t0104100-02;protein_id=Os01t0104100-02
+1	irgsp	exon	213741	214147	.	+	.	Parent=transcript:Os01t0104100-02;Name=Os01t0104100-02.exon8;constitutive=0;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0104100-02.exon8;rank=8
+1	irgsp	three_prime_UTR	213789	214147	.	+	.	Parent=transcript:Os01t0104100-02
+###
+1	irgsp	gene	216212	217345	.	+	.	ID=gene:Os01g0104200;Name=NAC DOMAIN-CONTAINING PROTEIN 16;biotype=protein_coding;description=No apical meristem (NAM) protein domain containing protein. (Os01t0104200-00);gene_id=Os01g0104200;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	216212	217345	.	+	.	ID=transcript:Os01t0104200-00;Parent=gene:Os01g0104200;biotype=protein_coding;transcript_id=Os01t0104200-00
+1	irgsp	exon	216212	216769	.	+	.	Parent=transcript:Os01t0104200-00;Name=Os01t0104200-00.exon1;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104200-00.exon1;rank=1
+1	irgsp	CDS	216212	216769	.	+	0	ID=CDS:Os01t0104200-00;Parent=transcript:Os01t0104200-00;protein_id=Os01t0104200-00
+1	irgsp	exon	216884	217345	.	+	.	Parent=transcript:Os01t0104200-00;Name=Os01t0104200-00.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104200-00.exon2;rank=2
+1	irgsp	CDS	216884	217345	.	+	0	ID=CDS:Os01t0104200-00;Parent=transcript:Os01t0104200-00;protein_id=Os01t0104200-00
+###
+1	irgsp	gene	226897	229301	.	+	.	ID=gene:Os01g0104400;biotype=protein_coding;description=Ricin B-related lectin domain containing protein. (Os01t0104400-01)%3BRicin B-related lectin domain containing protein. (Os01t0104400-02)%3BRicin B-related lectin domain containing protein. (Os01t0104400-03);gene_id=Os01g0104400;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	226897	229229	.	+	.	ID=transcript:Os01t0104400-01;Parent=gene:Os01g0104400;biotype=protein_coding;transcript_id=Os01t0104400-01
+1	irgsp	five_prime_UTR	226897	227181	.	+	.	Parent=transcript:Os01t0104400-01
+1	irgsp	exon	226897	227634	.	+	.	Parent=transcript:Os01t0104400-01;Name=Os01t0104400-01.exon1;constitutive=0;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0104400-01.exon1;rank=1
+1	irgsp	CDS	227182	227634	.	+	0	ID=CDS:Os01t0104400-01;Parent=transcript:Os01t0104400-01;protein_id=Os01t0104400-01
+1	irgsp	exon	227742	227864	.	+	.	Parent=transcript:Os01t0104400-01;Name=Os01t0104400-03.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104400-03.exon2;rank=2
+1	irgsp	CDS	227742	227864	.	+	0	ID=CDS:Os01t0104400-01;Parent=transcript:Os01t0104400-01;protein_id=Os01t0104400-01
+1	irgsp	exon	228557	228785	.	+	.	Parent=transcript:Os01t0104400-01;Name=Os01t0104400-03.exon3;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0104400-03.exon3;rank=3
+1	irgsp	CDS	228557	228785	.	+	0	ID=CDS:Os01t0104400-01;Parent=transcript:Os01t0104400-01;protein_id=Os01t0104400-01
+1	irgsp	CDS	228930	228931	.	+	2	ID=CDS:Os01t0104400-01;Parent=transcript:Os01t0104400-01;protein_id=Os01t0104400-01
+1	irgsp	exon	228930	229229	.	+	.	Parent=transcript:Os01t0104400-01;Name=Os01t0104400-01.exon4;constitutive=0;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0104400-01.exon4;rank=4
+1	irgsp	three_prime_UTR	228932	229229	.	+	.	Parent=transcript:Os01t0104400-01
+1	irgsp	mRNA	227139	229301	.	+	.	ID=transcript:Os01t0104400-02;Parent=gene:Os01g0104400;biotype=protein_coding;transcript_id=Os01t0104400-02
+1	irgsp	five_prime_UTR	227139	227181	.	+	.	Parent=transcript:Os01t0104400-02
+1	irgsp	exon	227139	227634	.	+	.	Parent=transcript:Os01t0104400-02;Name=Os01t0104400-02.exon1;constitutive=0;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0104400-02.exon1;rank=1
+1	irgsp	CDS	227182	227634	.	+	0	ID=CDS:Os01t0104400-02;Parent=transcript:Os01t0104400-02;protein_id=Os01t0104400-02
+1	irgsp	exon	227742	227864	.	+	.	Parent=transcript:Os01t0104400-02;Name=Os01t0104400-03.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104400-03.exon2;rank=2
+1	irgsp	CDS	227742	227864	.	+	0	ID=CDS:Os01t0104400-02;Parent=transcript:Os01t0104400-02;protein_id=Os01t0104400-02
+1	irgsp	exon	228557	228785	.	+	.	Parent=transcript:Os01t0104400-02;Name=Os01t0104400-03.exon3;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0104400-03.exon3;rank=3
+1	irgsp	CDS	228557	228785	.	+	0	ID=CDS:Os01t0104400-02;Parent=transcript:Os01t0104400-02;protein_id=Os01t0104400-02
+1	irgsp	CDS	228930	228931	.	+	2	ID=CDS:Os01t0104400-02;Parent=transcript:Os01t0104400-02;protein_id=Os01t0104400-02
+1	irgsp	exon	228930	229301	.	+	.	Parent=transcript:Os01t0104400-02;Name=Os01t0104400-02.exon4;constitutive=0;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0104400-02.exon4;rank=4
+1	irgsp	three_prime_UTR	228932	229301	.	+	.	Parent=transcript:Os01t0104400-02
+1	irgsp	mRNA	227179	229214	.	+	.	ID=transcript:Os01t0104400-03;Parent=gene:Os01g0104400;biotype=protein_coding;transcript_id=Os01t0104400-03
+1	irgsp	five_prime_UTR	227179	227181	.	+	.	Parent=transcript:Os01t0104400-03
+1	irgsp	exon	227179	227634	.	+	.	Parent=transcript:Os01t0104400-03;Name=Os01t0104400-03.exon1;constitutive=0;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0104400-03.exon1;rank=1
+1	irgsp	CDS	227182	227634	.	+	0	ID=CDS:Os01t0104400-03;Parent=transcript:Os01t0104400-03;protein_id=Os01t0104400-03
+1	irgsp	exon	227742	227864	.	+	.	Parent=transcript:Os01t0104400-03;Name=Os01t0104400-03.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104400-03.exon2;rank=2
+1	irgsp	CDS	227742	227864	.	+	0	ID=CDS:Os01t0104400-03;Parent=transcript:Os01t0104400-03;protein_id=Os01t0104400-03
+1	irgsp	exon	228557	228785	.	+	.	Parent=transcript:Os01t0104400-03;Name=Os01t0104400-03.exon3;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0104400-03.exon3;rank=3
+1	irgsp	CDS	228557	228785	.	+	0	ID=CDS:Os01t0104400-03;Parent=transcript:Os01t0104400-03;protein_id=Os01t0104400-03
+1	irgsp	CDS	228930	228931	.	+	2	ID=CDS:Os01t0104400-03;Parent=transcript:Os01t0104400-03;protein_id=Os01t0104400-03
+1	irgsp	exon	228930	229214	.	+	.	Parent=transcript:Os01t0104400-03;Name=Os01t0104400-03.exon4;constitutive=0;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0104400-03.exon4;rank=4
+1	irgsp	three_prime_UTR	228932	229214	.	+	.	Parent=transcript:Os01t0104400-03
+###
+1	irgsp	gene	241680	243440	.	+	.	ID=gene:Os01g0104500;Name=NAC DOMAIN-CONTAINING PROTEIN 20;biotype=protein_coding;description=No apical meristem (NAM) protein domain containing protein. (Os01t0104500-01);gene_id=Os01g0104500;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	241680	243440	.	+	.	ID=transcript:Os01t0104500-01;Parent=gene:Os01g0104500;biotype=protein_coding;transcript_id=Os01t0104500-01
+1	irgsp	exon	241680	241702	.	+	.	Parent=transcript:Os01t0104500-01;Name=Os01t0104500-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0104500-01.exon1;rank=1
+1	irgsp	five_prime_UTR	241680	241702	.	+	.	Parent=transcript:Os01t0104500-01
+1	irgsp	five_prime_UTR	241866	241907	.	+	.	Parent=transcript:Os01t0104500-01
+1	irgsp	exon	241866	242091	.	+	.	Parent=transcript:Os01t0104500-01;Name=Os01t0104500-01.exon2;constitutive=1;ensembl_end_phase=1;ensembl_phase=-1;exon_id=Os01t0104500-01.exon2;rank=2
+1	irgsp	CDS	241908	242091	.	+	0	ID=CDS:Os01t0104500-01;Parent=transcript:Os01t0104500-01;protein_id=Os01t0104500-01
+1	irgsp	CDS	242199	242977	.	+	2	ID=CDS:Os01t0104500-01;Parent=transcript:Os01t0104500-01;protein_id=Os01t0104500-01
+1	irgsp	exon	242199	243440	.	+	.	Parent=transcript:Os01t0104500-01;Name=Os01t0104500-01.exon3;constitutive=1;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0104500-01.exon3;rank=3
+1	irgsp	three_prime_UTR	242978	243440	.	+	.	Parent=transcript:Os01t0104500-01
+###
+1	irgsp	gene	248828	256872	.	-	.	ID=gene:Os01g0104600;Name=DE-ETIOLATED1;biotype=protein_coding;description=Homolog of Arabidopsis DE-ETIOLATED1 (DET1)%2C Modulation of the ABA signaling pathway and ABA biosynthesis%2C Regulation of chlorophyll content (Os01t0104600-01)%3BSimilar to Light-mediated development protein DET1 (Deetiolated1 homolog) (tDET1) (High pigmentation protein 2) (Protein dark green). (Os01t0104600-02);gene_id=Os01g0104600;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	248828	256571	.	-	.	ID=transcript:Os01t0104600-02;Parent=gene:Os01g0104600;biotype=protein_coding;transcript_id=Os01t0104600-02
+1	irgsp	three_prime_UTR	248828	248970	.	-	.	Parent=transcript:Os01t0104600-02
+1	irgsp	exon	248828	249107	.	-	.	Parent=transcript:Os01t0104600-02;Name=Os01t0104600-01.exon11;constitutive=1;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0104600-01.exon11;rank=11
+1	irgsp	CDS	248971	249107	.	-	2	ID=CDS:Os01t0104600-02;Parent=transcript:Os01t0104600-02;protein_id=Os01t0104600-02
+1	irgsp	exon	249369	249468	.	-	.	Parent=transcript:Os01t0104600-02;Name=Os01t0104600-01.exon10;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0104600-01.exon10;rank=10
+1	irgsp	CDS	249369	249468	.	-	0	ID=CDS:Os01t0104600-02;Parent=transcript:Os01t0104600-02;protein_id=Os01t0104600-02
+1	irgsp	exon	249861	249956	.	-	.	Parent=transcript:Os01t0104600-02;Name=Os01t0104600-01.exon9;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104600-01.exon9;rank=9
+1	irgsp	CDS	249861	249956	.	-	0	ID=CDS:Os01t0104600-02;Parent=transcript:Os01t0104600-02;protein_id=Os01t0104600-02
+1	irgsp	exon	250617	250781	.	-	.	Parent=transcript:Os01t0104600-02;Name=Os01t0104600-01.exon8;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104600-01.exon8;rank=8
+1	irgsp	CDS	250617	250781	.	-	0	ID=CDS:Os01t0104600-02;Parent=transcript:Os01t0104600-02;protein_id=Os01t0104600-02
+1	irgsp	exon	250860	250940	.	-	.	Parent=transcript:Os01t0104600-02;Name=Os01t0104600-01.exon7;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104600-01.exon7;rank=7
+1	irgsp	CDS	250860	250940	.	-	0	ID=CDS:Os01t0104600-02;Parent=transcript:Os01t0104600-02;protein_id=Os01t0104600-02
+1	irgsp	exon	251026	251082	.	-	.	Parent=transcript:Os01t0104600-02;Name=Os01t0104600-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104600-01.exon6;rank=6
+1	irgsp	CDS	251026	251082	.	-	0	ID=CDS:Os01t0104600-02;Parent=transcript:Os01t0104600-02;protein_id=Os01t0104600-02
+1	irgsp	exon	251316	251384	.	-	.	Parent=transcript:Os01t0104600-02;Name=Os01t0104600-01.exon5;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104600-01.exon5;rank=5
+1	irgsp	CDS	251316	251384	.	-	0	ID=CDS:Os01t0104600-02;Parent=transcript:Os01t0104600-02;protein_id=Os01t0104600-02
+1	irgsp	exon	251695	251790	.	-	.	Parent=transcript:Os01t0104600-02;Name=Os01t0104600-01.exon4;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104600-01.exon4;rank=4
+1	irgsp	CDS	251695	251790	.	-	0	ID=CDS:Os01t0104600-02;Parent=transcript:Os01t0104600-02;protein_id=Os01t0104600-02
+1	irgsp	exon	255325	255553	.	-	.	Parent=transcript:Os01t0104600-02;Name=Os01t0104600-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0104600-01.exon3;rank=3
+1	irgsp	CDS	255325	255553	.	-	1	ID=CDS:Os01t0104600-02;Parent=transcript:Os01t0104600-02;protein_id=Os01t0104600-02
+1	irgsp	exon	255674	256098	.	-	.	Parent=transcript:Os01t0104600-02;Name=Os01t0104600-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0104600-01.exon2;rank=2
+1	irgsp	CDS	255674	256098	.	-	0	ID=CDS:Os01t0104600-02;Parent=transcript:Os01t0104600-02;protein_id=Os01t0104600-02
+1	irgsp	CDS	256361	256441	.	-	0	ID=CDS:Os01t0104600-02;Parent=transcript:Os01t0104600-02;protein_id=Os01t0104600-02
+1	irgsp	exon	256361	256571	.	-	.	Parent=transcript:Os01t0104600-02;Name=Os01t0104600-02.exon1;constitutive=0;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0104600-02.exon1;rank=1
+1	irgsp	five_prime_UTR	256442	256571	.	-	.	Parent=transcript:Os01t0104600-02
+1	irgsp	mRNA	248828	256872	.	-	.	ID=transcript:Os01t0104600-01;Parent=gene:Os01g0104600;biotype=protein_coding;transcript_id=Os01t0104600-01
+1	irgsp	three_prime_UTR	248828	248970	.	-	.	Parent=transcript:Os01t0104600-01
+1	irgsp	exon	248828	249107	.	-	.	Parent=transcript:Os01t0104600-01;Name=Os01t0104600-01.exon11;constitutive=1;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0104600-01.exon11;rank=11
+1	irgsp	CDS	248971	249107	.	-	2	ID=CDS:Os01t0104600-01;Parent=transcript:Os01t0104600-01;protein_id=Os01t0104600-01
+1	irgsp	exon	249369	249468	.	-	.	Parent=transcript:Os01t0104600-01;Name=Os01t0104600-01.exon10;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0104600-01.exon10;rank=10
+1	irgsp	CDS	249369	249468	.	-	0	ID=CDS:Os01t0104600-01;Parent=transcript:Os01t0104600-01;protein_id=Os01t0104600-01
+1	irgsp	exon	249861	249956	.	-	.	Parent=transcript:Os01t0104600-01;Name=Os01t0104600-01.exon9;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104600-01.exon9;rank=9
+1	irgsp	CDS	249861	249956	.	-	0	ID=CDS:Os01t0104600-01;Parent=transcript:Os01t0104600-01;protein_id=Os01t0104600-01
+1	irgsp	exon	250617	250781	.	-	.	Parent=transcript:Os01t0104600-01;Name=Os01t0104600-01.exon8;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104600-01.exon8;rank=8
+1	irgsp	CDS	250617	250781	.	-	0	ID=CDS:Os01t0104600-01;Parent=transcript:Os01t0104600-01;protein_id=Os01t0104600-01
+1	irgsp	exon	250860	250940	.	-	.	Parent=transcript:Os01t0104600-01;Name=Os01t0104600-01.exon7;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104600-01.exon7;rank=7
+1	irgsp	CDS	250860	250940	.	-	0	ID=CDS:Os01t0104600-01;Parent=transcript:Os01t0104600-01;protein_id=Os01t0104600-01
+1	irgsp	exon	251026	251082	.	-	.	Parent=transcript:Os01t0104600-01;Name=Os01t0104600-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104600-01.exon6;rank=6
+1	irgsp	CDS	251026	251082	.	-	0	ID=CDS:Os01t0104600-01;Parent=transcript:Os01t0104600-01;protein_id=Os01t0104600-01
+1	irgsp	exon	251316	251384	.	-	.	Parent=transcript:Os01t0104600-01;Name=Os01t0104600-01.exon5;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104600-01.exon5;rank=5
+1	irgsp	CDS	251316	251384	.	-	0	ID=CDS:Os01t0104600-01;Parent=transcript:Os01t0104600-01;protein_id=Os01t0104600-01
+1	irgsp	exon	251695	251790	.	-	.	Parent=transcript:Os01t0104600-01;Name=Os01t0104600-01.exon4;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104600-01.exon4;rank=4
+1	irgsp	CDS	251695	251790	.	-	0	ID=CDS:Os01t0104600-01;Parent=transcript:Os01t0104600-01;protein_id=Os01t0104600-01
+1	irgsp	exon	255325	255553	.	-	.	Parent=transcript:Os01t0104600-01;Name=Os01t0104600-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0104600-01.exon3;rank=3
+1	irgsp	CDS	255325	255553	.	-	1	ID=CDS:Os01t0104600-01;Parent=transcript:Os01t0104600-01;protein_id=Os01t0104600-01
+1	irgsp	exon	255674	256098	.	-	.	Parent=transcript:Os01t0104600-01;Name=Os01t0104600-01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0104600-01.exon2;rank=2
+1	irgsp	CDS	255674	256098	.	-	0	ID=CDS:Os01t0104600-01;Parent=transcript:Os01t0104600-01;protein_id=Os01t0104600-01
+1	irgsp	CDS	256361	256441	.	-	0	ID=CDS:Os01t0104600-01;Parent=transcript:Os01t0104600-01;protein_id=Os01t0104600-01
+1	irgsp	exon	256361	256872	.	-	.	Parent=transcript:Os01t0104600-01;Name=Os01t0104600-01.exon1;constitutive=0;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0104600-01.exon1;rank=1
+1	irgsp	five_prime_UTR	256442	256872	.	-	.	Parent=transcript:Os01t0104600-01
+###
+1	irgsp	gene	261530	268145	.	+	.	ID=gene:Os01g0104800;biotype=protein_coding;description=Sas10/Utp3 family protein. (Os01t0104800-01)%3BHypothetical conserved gene. (Os01t0104800-02);gene_id=Os01g0104800;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	261530	268145	.	+	.	ID=transcript:Os01t0104800-01;Parent=gene:Os01g0104800;biotype=protein_coding;transcript_id=Os01t0104800-01
+1	irgsp	five_prime_UTR	261530	261561	.	+	.	Parent=transcript:Os01t0104800-01
+1	irgsp	exon	261530	261661	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon1;constitutive=0;ensembl_end_phase=1;ensembl_phase=-1;exon_id=Os01t0104800-01.exon1;rank=1
+1	irgsp	CDS	261562	261661	.	+	0	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	exon	261767	261805	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon2;constitutive=0;ensembl_end_phase=1;ensembl_phase=1;exon_id=Os01t0104800-01.exon2;rank=2
+1	irgsp	CDS	261767	261805	.	+	2	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	exon	261895	261941	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon3;constitutive=0;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0104800-01.exon3;rank=3
+1	irgsp	CDS	261895	261941	.	+	2	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	exon	262582	262681	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon4;constitutive=0;ensembl_end_phase=1;ensembl_phase=0;exon_id=Os01t0104800-01.exon4;rank=4
+1	irgsp	CDS	262582	262681	.	+	0	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	exon	262925	263181	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon5;constitutive=0;ensembl_end_phase=0;ensembl_phase=1;exon_id=Os01t0104800-01.exon5;rank=5
+1	irgsp	CDS	262925	263181	.	+	2	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	exon	263525	263640	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon6;constitutive=0;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0104800-01.exon6;rank=6
+1	irgsp	CDS	263525	263640	.	+	0	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	exon	264014	264098	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon7;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0104800-01.exon7;rank=7
+1	irgsp	CDS	264014	264098	.	+	1	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	exon	265236	265415	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon8;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104800-01.exon8;rank=8
+1	irgsp	CDS	265236	265415	.	+	0	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	exon	265506	265649	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon9;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104800-01.exon9;rank=9
+1	irgsp	CDS	265506	265649	.	+	0	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	exon	265740	265817	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon10;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104800-01.exon10;rank=10
+1	irgsp	CDS	265740	265817	.	+	0	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	exon	265909	266045	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon11;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0104800-01.exon11;rank=11
+1	irgsp	CDS	265909	266045	.	+	0	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	exon	266138	266246	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon12;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0104800-01.exon12;rank=12
+1	irgsp	CDS	266138	266246	.	+	1	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	exon	267237	267514	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon13;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0104800-01.exon13;rank=13
+1	irgsp	CDS	267237	267514	.	+	0	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	exon	267591	267657	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon14;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0104800-01.exon14;rank=14
+1	irgsp	CDS	267591	267657	.	+	1	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	exon	267734	267802	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon15;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104800-01.exon15;rank=15
+1	irgsp	CDS	267734	267802	.	+	0	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	CDS	267880	268011	.	+	0	ID=CDS:Os01t0104800-01;Parent=transcript:Os01t0104800-01;protein_id=Os01t0104800-01
+1	irgsp	exon	267880	268145	.	+	.	Parent=transcript:Os01t0104800-01;Name=Os01t0104800-01.exon16;constitutive=0;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0104800-01.exon16;rank=16
+1	irgsp	three_prime_UTR	268012	268145	.	+	.	Parent=transcript:Os01t0104800-01
+1	irgsp	mRNA	263523	268120	.	+	.	ID=transcript:Os01t0104800-02;Parent=gene:Os01g0104800;biotype=protein_coding;transcript_id=Os01t0104800-02
+1	irgsp	five_prime_UTR	263523	263524	.	+	.	Parent=transcript:Os01t0104800-02
+1	irgsp	exon	263523	263640	.	+	.	Parent=transcript:Os01t0104800-02;Name=Os01t0104800-02.exon1;constitutive=0;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0104800-02.exon1;rank=1
+1	irgsp	CDS	263525	263640	.	+	0	ID=CDS:Os01t0104800-02;Parent=transcript:Os01t0104800-02;protein_id=Os01t0104800-02
+1	irgsp	exon	264014	264098	.	+	.	Parent=transcript:Os01t0104800-02;Name=Os01t0104800-01.exon7;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0104800-01.exon7;rank=2
+1	irgsp	CDS	264014	264098	.	+	1	ID=CDS:Os01t0104800-02;Parent=transcript:Os01t0104800-02;protein_id=Os01t0104800-02
+1	irgsp	exon	265236	265415	.	+	.	Parent=transcript:Os01t0104800-02;Name=Os01t0104800-01.exon8;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104800-01.exon8;rank=3
+1	irgsp	CDS	265236	265415	.	+	0	ID=CDS:Os01t0104800-02;Parent=transcript:Os01t0104800-02;protein_id=Os01t0104800-02
+1	irgsp	exon	265506	265649	.	+	.	Parent=transcript:Os01t0104800-02;Name=Os01t0104800-01.exon9;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104800-01.exon9;rank=4
+1	irgsp	CDS	265506	265649	.	+	0	ID=CDS:Os01t0104800-02;Parent=transcript:Os01t0104800-02;protein_id=Os01t0104800-02
+1	irgsp	exon	265740	265817	.	+	.	Parent=transcript:Os01t0104800-02;Name=Os01t0104800-01.exon10;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104800-01.exon10;rank=5
+1	irgsp	CDS	265740	265817	.	+	0	ID=CDS:Os01t0104800-02;Parent=transcript:Os01t0104800-02;protein_id=Os01t0104800-02
+1	irgsp	exon	265909	266045	.	+	.	Parent=transcript:Os01t0104800-02;Name=Os01t0104800-01.exon11;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0104800-01.exon11;rank=6
+1	irgsp	CDS	265909	266045	.	+	0	ID=CDS:Os01t0104800-02;Parent=transcript:Os01t0104800-02;protein_id=Os01t0104800-02
+1	irgsp	exon	266138	266246	.	+	.	Parent=transcript:Os01t0104800-02;Name=Os01t0104800-01.exon12;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0104800-01.exon12;rank=7
+1	irgsp	CDS	266138	266246	.	+	1	ID=CDS:Os01t0104800-02;Parent=transcript:Os01t0104800-02;protein_id=Os01t0104800-02
+1	irgsp	exon	267237	267514	.	+	.	Parent=transcript:Os01t0104800-02;Name=Os01t0104800-01.exon13;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0104800-01.exon13;rank=8
+1	irgsp	CDS	267237	267514	.	+	0	ID=CDS:Os01t0104800-02;Parent=transcript:Os01t0104800-02;protein_id=Os01t0104800-02
+1	irgsp	exon	267591	267657	.	+	.	Parent=transcript:Os01t0104800-02;Name=Os01t0104800-01.exon14;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0104800-01.exon14;rank=9
+1	irgsp	CDS	267591	267657	.	+	1	ID=CDS:Os01t0104800-02;Parent=transcript:Os01t0104800-02;protein_id=Os01t0104800-02
+1	irgsp	exon	267734	267802	.	+	.	Parent=transcript:Os01t0104800-02;Name=Os01t0104800-01.exon15;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0104800-01.exon15;rank=10
+1	irgsp	CDS	267734	267802	.	+	0	ID=CDS:Os01t0104800-02;Parent=transcript:Os01t0104800-02;protein_id=Os01t0104800-02
+1	irgsp	CDS	267880	268011	.	+	0	ID=CDS:Os01t0104800-02;Parent=transcript:Os01t0104800-02;protein_id=Os01t0104800-02
+1	irgsp	exon	267880	268120	.	+	.	Parent=transcript:Os01t0104800-02;Name=Os01t0104800-02.exon11;constitutive=0;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0104800-02.exon11;rank=11
+1	irgsp	three_prime_UTR	268012	268120	.	+	.	Parent=transcript:Os01t0104800-02
+###
+1	irgsp	gene	270179	275084	.	-	.	ID=gene:Os01g0104900;biotype=protein_coding;description=Transferase family protein. (Os01t0104900-01)%3BHypothetical conserved gene. (Os01t0104900-02);gene_id=Os01g0104900;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	270179	275084	.	-	.	ID=transcript:Os01t0104900-01;Parent=gene:Os01g0104900;biotype=protein_coding;transcript_id=Os01t0104900-01
+1	irgsp	three_prime_UTR	270179	270355	.	-	.	Parent=transcript:Os01t0104900-01
+1	irgsp	exon	270179	271333	.	-	.	Parent=transcript:Os01t0104900-01;Name=Os01t0104900-01.exon2;constitutive=0;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0104900-01.exon2;rank=2
+1	irgsp	CDS	270356	271333	.	-	0	ID=CDS:Os01t0104900-01;Parent=transcript:Os01t0104900-01;protein_id=Os01t0104900-01
+1	irgsp	CDS	274529	274957	.	-	0	ID=CDS:Os01t0104900-01;Parent=transcript:Os01t0104900-01;protein_id=Os01t0104900-01
+1	irgsp	exon	274529	275084	.	-	.	Parent=transcript:Os01t0104900-01;Name=Os01t0104900-01.exon1;constitutive=0;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0104900-01.exon1;rank=1
+1	irgsp	five_prime_UTR	274958	275084	.	-	.	Parent=transcript:Os01t0104900-01
+1	irgsp	mRNA	270250	271518	.	-	.	ID=transcript:Os01t0104900-02;Parent=gene:Os01g0104900;biotype=protein_coding;transcript_id=Os01t0104900-02
+1	irgsp	three_prime_UTR	270250	270355	.	-	.	Parent=transcript:Os01t0104900-02
+1	irgsp	exon	270250	271333	.	-	.	Parent=transcript:Os01t0104900-02;Name=Os01t0104900-02.exon2;constitutive=0;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0104900-02.exon2;rank=2
+1	irgsp	CDS	270356	271309	.	-	0	ID=CDS:Os01t0104900-02;Parent=transcript:Os01t0104900-02;protein_id=Os01t0104900-02
+1	irgsp	five_prime_UTR	271310	271333	.	-	.	Parent=transcript:Os01t0104900-02
+1	irgsp	exon	271457	271518	.	-	.	Parent=transcript:Os01t0104900-02;Name=Os01t0104900-02.exon1;constitutive=0;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0104900-02.exon1;rank=1
+1	irgsp	five_prime_UTR	271457	271518	.	-	.	Parent=transcript:Os01t0104900-02
+###
+1	irgsp	gene	284762	291892	.	-	.	ID=gene:Os01g0105300;biotype=protein_coding;description=Similar to HAT family dimerisation domain containing protein%2C expressed. (Os01t0105300-01);gene_id=Os01g0105300;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	284762	291892	.	-	.	ID=transcript:Os01t0105300-01;Parent=gene:Os01g0105300;biotype=protein_coding;transcript_id=Os01t0105300-01
+1	irgsp	three_prime_UTR	284762	284930	.	-	.	Parent=transcript:Os01t0105300-01
+1	irgsp	exon	284762	287047	.	-	.	Parent=transcript:Os01t0105300-01;Name=Os01t0105300-01.exon5;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0105300-01.exon5;rank=5
+1	irgsp	CDS	284931	285020	.	-	0	ID=CDS:Os01t0105300-01;Parent=transcript:Os01t0105300-01;protein_id=Os01t0105300-01
+1	irgsp	five_prime_UTR	285021	287047	.	-	.	Parent=transcript:Os01t0105300-01
+1	irgsp	exon	291398	291436	.	-	.	Parent=transcript:Os01t0105300-01;Name=Os01t0105300-01.exon4;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0105300-01.exon4;rank=4
+1	irgsp	five_prime_UTR	291398	291436	.	-	.	Parent=transcript:Os01t0105300-01
+1	irgsp	exon	291520	291534	.	-	.	Parent=transcript:Os01t0105300-01;Name=Os01t0105300-01.exon3;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0105300-01.exon3;rank=3
+1	irgsp	five_prime_UTR	291520	291534	.	-	.	Parent=transcript:Os01t0105300-01
+1	irgsp	exon	291678	291738	.	-	.	Parent=transcript:Os01t0105300-01;Name=Os01t0105300-01.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0105300-01.exon2;rank=2
+1	irgsp	five_prime_UTR	291678	291738	.	-	.	Parent=transcript:Os01t0105300-01
+1	irgsp	exon	291838	291892	.	-	.	Parent=transcript:Os01t0105300-01;Name=Os01t0105300-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0105300-01.exon1;rank=1
+1	irgsp	five_prime_UTR	291838	291892	.	-	.	Parent=transcript:Os01t0105300-01
+###
+1	irgsp	gene	288372	292296	.	+	.	ID=gene:Os01g0105400;biotype=protein_coding;description=Similar to Kinesin heavy chain. (Os01t0105400-01);gene_id=Os01g0105400;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	288372	292296	.	+	.	ID=transcript:Os01t0105400-01;Parent=gene:Os01g0105400;biotype=protein_coding;transcript_id=Os01t0105400-01
+1	irgsp	exon	288372	288846	.	+	.	Parent=transcript:Os01t0105400-01;Name=Os01t0105400-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0105400-01.exon1;rank=1
+1	irgsp	five_prime_UTR	288372	288846	.	+	.	Parent=transcript:Os01t0105400-01
+1	irgsp	exon	288950	289116	.	+	.	Parent=transcript:Os01t0105400-01;Name=Os01t0105400-01.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0105400-01.exon2;rank=2
+1	irgsp	five_prime_UTR	288950	289116	.	+	.	Parent=transcript:Os01t0105400-01
+1	irgsp	exon	289202	289572	.	+	.	Parent=transcript:Os01t0105400-01;Name=Os01t0105400-01.exon3;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0105400-01.exon3;rank=3
+1	irgsp	five_prime_UTR	289202	289572	.	+	.	Parent=transcript:Os01t0105400-01
+1	irgsp	exon	289661	289830	.	+	.	Parent=transcript:Os01t0105400-01;Name=Os01t0105400-01.exon4;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0105400-01.exon4;rank=4
+1	irgsp	five_prime_UTR	289661	289830	.	+	.	Parent=transcript:Os01t0105400-01
+1	irgsp	five_prime_UTR	290395	290432	.	+	.	Parent=transcript:Os01t0105400-01
+1	irgsp	exon	290395	290512	.	+	.	Parent=transcript:Os01t0105400-01;Name=Os01t0105400-01.exon5;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0105400-01.exon5;rank=5
+1	irgsp	CDS	290433	290512	.	+	0	ID=CDS:Os01t0105400-01;Parent=transcript:Os01t0105400-01;protein_id=Os01t0105400-01
+1	irgsp	CDS	291372	291558	.	+	1	ID=CDS:Os01t0105400-01;Parent=transcript:Os01t0105400-01;protein_id=Os01t0105400-01
+1	irgsp	exon	291372	291574	.	+	.	Parent=transcript:Os01t0105400-01;Name=Os01t0105400-01.exon6;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0105400-01.exon6;rank=6
+1	irgsp	three_prime_UTR	291559	291574	.	+	.	Parent=transcript:Os01t0105400-01
+1	irgsp	exon	291648	291779	.	+	.	Parent=transcript:Os01t0105400-01;Name=Os01t0105400-01.exon7;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0105400-01.exon7;rank=7
+1	irgsp	three_prime_UTR	291648	291779	.	+	.	Parent=transcript:Os01t0105400-01
+1	irgsp	exon	291859	291948	.	+	.	Parent=transcript:Os01t0105400-01;Name=Os01t0105400-01.exon8;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0105400-01.exon8;rank=8
+1	irgsp	three_prime_UTR	291859	291948	.	+	.	Parent=transcript:Os01t0105400-01
+1	irgsp	exon	292073	292296	.	+	.	Parent=transcript:Os01t0105400-01;Name=Os01t0105400-01.exon9;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0105400-01.exon9;rank=9
+1	irgsp	three_prime_UTR	292073	292296	.	+	.	Parent=transcript:Os01t0105400-01
+###
+1	irgsp	gene	303233	306736	.	+	.	ID=gene:Os01g0105700;Name=basic helix-loop-helix protein 071;biotype=protein_coding;description=Basic helix-loop-helix dimerisation region bHLH domain containing protein. (Os01t0105700-01);gene_id=Os01g0105700;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	303233	306736	.	+	.	ID=transcript:Os01t0105700-01;Parent=gene:Os01g0105700;biotype=protein_coding;transcript_id=Os01t0105700-01
+1	irgsp	five_prime_UTR	303233	303328	.	+	.	Parent=transcript:Os01t0105700-01
+1	irgsp	exon	303233	303471	.	+	.	Parent=transcript:Os01t0105700-01;Name=Os01t0105700-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0105700-01.exon1;rank=1
+1	irgsp	CDS	303329	303471	.	+	0	ID=CDS:Os01t0105700-01;Parent=transcript:Os01t0105700-01;protein_id=Os01t0105700-01
+1	irgsp	exon	303981	304509	.	+	.	Parent=transcript:Os01t0105700-01;Name=Os01t0105700-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0105700-01.exon2;rank=2
+1	irgsp	CDS	303981	304509	.	+	1	ID=CDS:Os01t0105700-01;Parent=transcript:Os01t0105700-01;protein_id=Os01t0105700-01
+1	irgsp	exon	305572	305718	.	+	.	Parent=transcript:Os01t0105700-01;Name=Os01t0105700-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0105700-01.exon3;rank=3
+1	irgsp	CDS	305572	305718	.	+	0	ID=CDS:Os01t0105700-01;Parent=transcript:Os01t0105700-01;protein_id=Os01t0105700-01
+1	irgsp	exon	305834	305899	.	+	.	Parent=transcript:Os01t0105700-01;Name=Os01t0105700-01.exon4;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0105700-01.exon4;rank=4
+1	irgsp	CDS	305834	305899	.	+	0	ID=CDS:Os01t0105700-01;Parent=transcript:Os01t0105700-01;protein_id=Os01t0105700-01
+1	irgsp	exon	305993	306058	.	+	.	Parent=transcript:Os01t0105700-01;Name=Os01t0105700-01.exon5;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0105700-01.exon5;rank=5
+1	irgsp	CDS	305993	306058	.	+	0	ID=CDS:Os01t0105700-01;Parent=transcript:Os01t0105700-01;protein_id=Os01t0105700-01
+1	irgsp	exon	306171	306245	.	+	.	Parent=transcript:Os01t0105700-01;Name=Os01t0105700-01.exon6;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0105700-01.exon6;rank=6
+1	irgsp	CDS	306171	306245	.	+	0	ID=CDS:Os01t0105700-01;Parent=transcript:Os01t0105700-01;protein_id=Os01t0105700-01
+1	irgsp	CDS	306353	306493	.	+	0	ID=CDS:Os01t0105700-01;Parent=transcript:Os01t0105700-01;protein_id=Os01t0105700-01
+1	irgsp	exon	306353	306736	.	+	.	Parent=transcript:Os01t0105700-01;Name=Os01t0105700-01.exon7;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0105700-01.exon7;rank=7
+1	irgsp	three_prime_UTR	306494	306736	.	+	.	Parent=transcript:Os01t0105700-01
+###
+1	irgsp	gene	306871	308842	.	-	.	ID=gene:Os01g0105800;Name=IRON-SULFUR CLUSTER PROTEIN 9;biotype=protein_coding;description=Similar to Iron sulfur assembly protein 1. (Os01t0105800-01);gene_id=Os01g0105800;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	306871	308842	.	-	.	ID=transcript:Os01t0105800-01;Parent=gene:Os01g0105800;biotype=protein_coding;transcript_id=Os01t0105800-01
+1	irgsp	three_prime_UTR	306871	307123	.	-	.	Parent=transcript:Os01t0105800-01
+1	irgsp	exon	306871	307217	.	-	.	Parent=transcript:Os01t0105800-01;Name=Os01t0105800-01.exon4;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=Os01t0105800-01.exon4;rank=4
+1	irgsp	CDS	307124	307217	.	-	1	ID=CDS:Os01t0105800-01;Parent=transcript:Os01t0105800-01;protein_id=Os01t0105800-01
+1	irgsp	exon	307296	307413	.	-	.	Parent=transcript:Os01t0105800-01;Name=Os01t0105800-01.exon3;constitutive=1;ensembl_end_phase=2;ensembl_phase=1;exon_id=Os01t0105800-01.exon3;rank=3
+1	irgsp	CDS	307296	307413	.	-	2	ID=CDS:Os01t0105800-01;Parent=transcript:Os01t0105800-01;protein_id=Os01t0105800-01
+1	irgsp	CDS	308397	308601	.	-	0	ID=CDS:Os01t0105800-01;Parent=transcript:Os01t0105800-01;protein_id=Os01t0105800-01
+1	irgsp	exon	308397	308626	.	-	.	Parent=transcript:Os01t0105800-01;Name=Os01t0105800-01.exon2;constitutive=1;ensembl_end_phase=1;ensembl_phase=-1;exon_id=Os01t0105800-01.exon2;rank=2
+1	irgsp	five_prime_UTR	308602	308626	.	-	.	Parent=transcript:Os01t0105800-01
+1	irgsp	exon	308703	308842	.	-	.	Parent=transcript:Os01t0105800-01;Name=Os01t0105800-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Os01t0105800-01.exon1;rank=1
+1	irgsp	five_prime_UTR	308703	308842	.	-	.	Parent=transcript:Os01t0105800-01
+###
+1	irgsp	gene	309520	313170	.	-	.	ID=gene:Os01g0105900;biotype=protein_coding;description=Carbohydrate/purine kinase domain containing protein. (Os01t0105900-01);gene_id=Os01g0105900;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	309520	313170	.	-	.	ID=transcript:Os01t0105900-01;Parent=gene:Os01g0105900;biotype=protein_coding;transcript_id=Os01t0105900-01
+1	irgsp	three_prime_UTR	309520	309821	.	-	.	Parent=transcript:Os01t0105900-01
+1	irgsp	exon	309520	310070	.	-	.	Parent=transcript:Os01t0105900-01;Name=Os01t0105900-01.exon8;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0105900-01.exon8;rank=8
+1	irgsp	CDS	309822	310070	.	-	0	ID=CDS:Os01t0105900-01;Parent=transcript:Os01t0105900-01;protein_id=Os01t0105900-01
+1	irgsp	exon	310256	310367	.	-	.	Parent=transcript:Os01t0105900-01;Name=Os01t0105900-01.exon7;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0105900-01.exon7;rank=7
+1	irgsp	CDS	310256	310367	.	-	1	ID=CDS:Os01t0105900-01;Parent=transcript:Os01t0105900-01;protein_id=Os01t0105900-01
+1	irgsp	exon	310455	310552	.	-	.	Parent=transcript:Os01t0105900-01;Name=Os01t0105900-01.exon6;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=Os01t0105900-01.exon6;rank=6
+1	irgsp	CDS	310455	310552	.	-	0	ID=CDS:Os01t0105900-01;Parent=transcript:Os01t0105900-01;protein_id=Os01t0105900-01
+1	irgsp	exon	310632	310739	.	-	.	Parent=transcript:Os01t0105900-01;Name=Os01t0105900-01.exon5;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0105900-01.exon5;rank=5
+1	irgsp	CDS	310632	310739	.	-	0	ID=CDS:Os01t0105900-01;Parent=transcript:Os01t0105900-01;protein_id=Os01t0105900-01
+1	irgsp	exon	310880	310918	.	-	.	Parent=transcript:Os01t0105900-01;Name=Os01t0105900-01.exon4;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0105900-01.exon4;rank=4
+1	irgsp	CDS	310880	310918	.	-	0	ID=CDS:Os01t0105900-01;Parent=transcript:Os01t0105900-01;protein_id=Os01t0105900-01
+1	irgsp	exon	311002	311073	.	-	.	Parent=transcript:Os01t0105900-01;Name=Os01t0105900-01.exon3;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0105900-01.exon3;rank=3
+1	irgsp	CDS	311002	311073	.	-	0	ID=CDS:Os01t0105900-01;Parent=transcript:Os01t0105900-01;protein_id=Os01t0105900-01
+1	irgsp	exon	311163	311426	.	-	.	Parent=transcript:Os01t0105900-01;Name=Os01t0105900-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Os01t0105900-01.exon2;rank=2
+1	irgsp	CDS	311163	311426	.	-	0	ID=CDS:Os01t0105900-01;Parent=transcript:Os01t0105900-01;protein_id=Os01t0105900-01
+1	irgsp	CDS	312867	313064	.	-	0	ID=CDS:Os01t0105900-01;Parent=transcript:Os01t0105900-01;protein_id=Os01t0105900-01
+1	irgsp	exon	312867	313170	.	-	.	Parent=transcript:Os01t0105900-01;Name=Os01t0105900-01.exon1;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=Os01t0105900-01.exon1;rank=1
+1	irgsp	five_prime_UTR	313065	313170	.	-	.	Parent=transcript:Os01t0105900-01
+###
+1	irgsp	gene	319754	322205	.	+	.	ID=gene:Os01g0106200;biotype=protein_coding;description=Similar to RER1A protein (AtRER1A). (Os01t0106200-01);gene_id=Os01g0106200;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	319754	322205	.	+	.	ID=transcript:Os01t0106200-01;Parent=gene:Os01g0106200;biotype=protein_coding;transcript_id=Os01t0106200-01
+1	irgsp	five_prime_UTR	319754	319874	.	+	.	Parent=transcript:Os01t0106200-01
+1	irgsp	exon	319754	320236	.	+	.	Parent=transcript:Os01t0106200-01;Name=Os01t0106200-01.exon1;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=Os01t0106200-01.exon1;rank=1
+1	irgsp	CDS	319875	320236	.	+	0	ID=CDS:Os01t0106200-01;Parent=transcript:Os01t0106200-01;protein_id=Os01t0106200-01
+1	irgsp	exon	321468	321648	.	+	.	Parent=transcript:Os01t0106200-01;Name=Os01t0106200-01.exon2;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=Os01t0106200-01.exon2;rank=2
+1	irgsp	CDS	321468	321648	.	+	1	ID=CDS:Os01t0106200-01;Parent=transcript:Os01t0106200-01;protein_id=Os01t0106200-01
+1	irgsp	CDS	321928	321975	.	+	0	ID=CDS:Os01t0106200-01;Parent=transcript:Os01t0106200-01;protein_id=Os01t0106200-01
+1	irgsp	exon	321928	322205	.	+	.	Parent=transcript:Os01t0106200-01;Name=Os01t0106200-01.exon3;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=Os01t0106200-01.exon3;rank=3
+1	irgsp	three_prime_UTR	321976	322205	.	+	.	Parent=transcript:Os01t0106200-01
+###
+1	irgsp	gene	322591	323923	.	-	.	ID=gene:Os01g0106300;biotype=protein_coding;description=Similar to Isoflavone reductase homolog IRL (EC 1.3.1.-). (Os01t0106300-01);gene_id=Os01g0106300;logic_name=irgspv1.0-20170804-genes
+1	irgsp	mRNA	322591	323923	.	-	.	ID=transcript:Os01t0106300-01;Parent=gene:Os01g0106300;biotype=protein_coding;transcript_id=Os01t0106300-01
+1	irgsp	three_prime_UTR	322591	322809	.	-	.	Parent=transcript:Os01t0106300-01
+1	irgsp	exon	322591	322973	.	-	.	Parent=transcript:Os01t0106300-01;Name=Os01t0106300-01.exon2;constitutive=1;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Os01t0106300-01.exon2;rank=2
diff --git a/src/agat/agat_sq_stat_basic/test_data/agat_sq_stat_basic_1.gff b/src/agat/agat_sq_stat_basic/test_data/agat_sq_stat_basic_1.gff
new file mode 100644
index 00000000..d8fc1f4e
--- /dev/null
+++ b/src/agat/agat_sq_stat_basic/test_data/agat_sq_stat_basic_1.gff
@@ -0,0 +1,12 @@
+Type (3rd column)	Number	Size total (kb)	Size mean (bp)	/!\Results are rounding to two decimal places 
+cds	290	69.69	240.30
+chromosome	1	43270.92	43270923.00
+exon	320	107.30	335.32
+five_prime_utr	79	11.77	149.03
+gene	52	158.83	3054.40
+mrna	65	197.99	3045.94
+ncrna_gene	1	0.08	81.00
+repeat_region	2	0.20	101.00
+three_prime_utr	70	25.60	365.66
+trna	1	0.08	81.00
+Total	881	43842.46	49764.43
diff --git a/src/agat/agat_sq_stat_basic/test_data/script.sh b/src/agat/agat_sq_stat_basic/test_data/script.sh
new file mode 100755
index 00000000..5527955d
--- /dev/null
+++ b/src/agat/agat_sq_stat_basic/test_data/script.sh
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+# clone repo
+if [ ! -d /tmp/agat_source ]; then
+  git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source
+fi
+
+# copy test data
+cp -r /tmp/agat_source/t/scripts_output/in/1.gff src/agat/agat_sq_stat_basic/test_data/
+cp -r /tmp/agat_source/t/scripts_output/out/agat_sq_stat_basic_1.gff src/agat/agat_sq_stat_basic/test_data/
\ No newline at end of file

From 06005a79b49911f1197ccfddf066fc566d5b1def Mon Sep 17 00:00:00 2001
From: Leila011 <leilapaquay@gmail.com>
Date: Sat, 2 Nov 2024 10:29:37 +0100
Subject: [PATCH 41/42] Add agat convert mfannot2gff (#112)

* add help

* add config

* add run script

* add test data and expected output + script to fetch them

* add test

* update changelog

* cleanup

* create temporary directory and clean up on exit

* add requirements

* update keywords

* update --config description

* add set -eo pipefail to script and test files

* fix create temporary directory

* cleanup changelog

* cleanup changelog

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
---
 CHANGELOG.md                                  |    1 +
 .../agat_convert_mfannot2gff/config.vsh.yaml  |   66 +
 src/agat/agat_convert_mfannot2gff/help.txt    |   67 +
 src/agat/agat_convert_mfannot2gff/script.sh   |   11 +
 src/agat/agat_convert_mfannot2gff/test.sh     |   35 +
 .../test_data/agat_convert_mfannot2gff_1.gff  |  240 ++
 .../test_data/script.sh                       |   10 +
 .../test_data/test.mfannot                    | 2914 +++++++++++++++++
 8 files changed, 3344 insertions(+)
 create mode 100644 src/agat/agat_convert_mfannot2gff/config.vsh.yaml
 create mode 100644 src/agat/agat_convert_mfannot2gff/help.txt
 create mode 100644 src/agat/agat_convert_mfannot2gff/script.sh
 create mode 100644 src/agat/agat_convert_mfannot2gff/test.sh
 create mode 100644 src/agat/agat_convert_mfannot2gff/test_data/agat_convert_mfannot2gff_1.gff
 create mode 100755 src/agat/agat_convert_mfannot2gff/test_data/script.sh
 create mode 100644 src/agat/agat_convert_mfannot2gff/test_data/test.mfannot

diff --git a/CHANGELOG.md b/CHANGELOG.md
index c8d86fa5..35aa33b5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -65,6 +65,7 @@
   - `agat_convert_embl2gff`: convert an EMBL file into GFF format (PR #99).
   - `agat/agat_convert_sp_gff2gtf`: convert any GTF/GFF file into a proper GTF file (PR #76).
   - `agat/agat_convert_bed2gff`: convert bed file to gff format (PR #97).
+  - `agat/agat_convert_mfannot2gff`: convert MFannot "masterfile" annotation to gff format (PR #112).
   - `agat/agat_convert_embl2gff`: convert an EMBL file into GFF format (PR #99).
   - `agat/agat_convert_sp_gff2tsv`: convert gtf/gff file into tabulated file (PR #102).
   - `agat/agat_convert_sp_gxf2gxf`: fixes and/or standardizes any GTF/GFF file into full sorted GTF/GFF file (PR #103).
diff --git a/src/agat/agat_convert_mfannot2gff/config.vsh.yaml b/src/agat/agat_convert_mfannot2gff/config.vsh.yaml
new file mode 100644
index 00000000..625c4613
--- /dev/null
+++ b/src/agat/agat_convert_mfannot2gff/config.vsh.yaml
@@ -0,0 +1,66 @@
+name: agat_convert_mfannot2gff
+namespace: agat
+description: |
+  Conversion utility for MFannot "masterfile" annotation produced by the
+  [MFannot pipeline](http://megasun.bch.umontreal.ca/RNAweasel/). Reports
+  GFF3 format.
+keywords: [gene annotations, GFF, MFannot]
+links:
+  homepage: https://github.com/NBISweden/AGAT
+  documentation: https://agat.readthedocs.io/en/latest/tools/agat_convert_mfannot2gff.html
+  issue_tracker: https://github.com/NBISweden/AGAT/issues
+  repository: https://github.com/NBISweden/AGAT
+references: 
+  doi: 10.5281/zenodo.3552717
+license: GPL-3.0
+requirements:
+  - command: [agat]
+authors:
+  - __merge__: /src/_authors/leila_paquay.yaml
+    roles: [ author, maintainer ]
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --mfannot
+        alternatives: [-m, -i]
+        description: The mfannot input file.
+        type: file
+        required: true
+        direction: input
+        example: input.mfannot
+  - name: Outputs
+    arguments:
+      - name: --gff
+        alternatives: [-g, -o]
+        description: The GFF output file.
+        type: file
+        direction: output
+        required: true
+        example: output.gff
+  - name: Arguments
+    arguments:
+      - name: --config
+        alternatives: [-c]
+        description: |
+          AGAT config file. By default AGAT takes the original agat_config.yaml shipped with AGAT. The `--config` option gives you the possibility to use your own AGAT config file (located elsewhere or named differently).
+        type: file
+        required: false
+        example: custom_agat_config.yaml
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: test_data
+engines:
+  - type: docker
+    image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
+    setup:
+      - type: docker
+        run: |
+          agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
+runners:
+  - type: executable
+  - type: nextflow
\ No newline at end of file
diff --git a/src/agat/agat_convert_mfannot2gff/help.txt b/src/agat/agat_convert_mfannot2gff/help.txt
new file mode 100644
index 00000000..83536c5a
--- /dev/null
+++ b/src/agat/agat_convert_mfannot2gff/help.txt
@@ -0,0 +1,67 @@
+```sh
+agat_convert_mfannot2gff.pl --help
+```
+
+ ------------------------------------------------------------------------------
+|   Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0                      |
+|   https://github.com/NBISweden/AGAT                                          |
+|   National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se         |
+ ------------------------------------------------------------------------------
+
+
+Name:
+    agat_convert_mfannot2gff.pl
+
+Description:
+    Conversion utility for MFannot "masterfile" annotation produced by the
+    MFannot pipeline (http://megasun.bch.umontreal.ca/RNAweasel/). Reports
+    GFF3 format.
+
+Usage:
+        agat_convert_mfannot2gff.pl -m <mfannot> -o <gff>
+        agat_convert_mfannot2gff.pl --help
+
+Copyright and License:
+    Copyright (C) 2015, Brandon Seah (kbseah@mpi-bremen.de) ... GPL-3 ...
+    modified by jacques dainat 2017-11
+
+Options:
+    -m or -i or --mfannot
+            The mfannot input file
+
+    -g or -o or --gff
+            the gff output file
+
+    -c or --config
+            String - Input agat config file. By default AGAT takes as input
+            agat_config.yaml file from the working directory if any,
+            otherwise it takes the orignal agat_config.yaml shipped with
+            AGAT. To get the agat_config.yaml locally type: "agat config
+            --expose". The --config option gives you the possibility to use
+            your own AGAT config file (located elsewhere or named
+            differently).
+
+    -h or --help
+            Display this helpful text.
+
+Feedback:
+  Did you find a bug?:
+    Do not hesitate to report bugs to help us keep track of the bugs and
+    their resolution. Please use the GitHub issue tracking system available
+    at this address:
+
+                https://github.com/NBISweden/AGAT/issues
+
+     Ensure that the bug was not already reported by searching under Issues.
+     If you're unable to find an (open) issue addressing the problem, open a new one.
+     Try as much as possible to include in the issue when relevant:
+     - a clear description,
+     - as much relevant information as possible,
+     - the command used,
+     - a data sample,
+     - an explanation of the expected behaviour that is not occurring.
+
+  Do you want to contribute?:
+    You are very welcome, visit this address for the Contributing
+    guidelines:
+    https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
\ No newline at end of file
diff --git a/src/agat/agat_convert_mfannot2gff/script.sh b/src/agat/agat_convert_mfannot2gff/script.sh
new file mode 100644
index 00000000..e4a32b1e
--- /dev/null
+++ b/src/agat/agat_convert_mfannot2gff/script.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+agat_convert_mfannot2gff.pl \
+  --mfannot "$par_mfannot" \
+  --gff "$par_gff" \
+  ${par_config:+--config "${par_config}"}
diff --git a/src/agat/agat_convert_mfannot2gff/test.sh b/src/agat/agat_convert_mfannot2gff/test.sh
new file mode 100644
index 00000000..19f79b6d
--- /dev/null
+++ b/src/agat/agat_convert_mfannot2gff/test.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+
+set -eo pipefail
+
+## VIASH START
+## VIASH END
+
+test_dir="${meta_resources_dir}/test_data"
+
+# create temporary directory and clean up on exit
+TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_name-XXXXXX")
+function clean_up {
+ [[ -d "$TMPDIR" ]] && rm -rf "$TMPDIR"
+}
+trap clean_up EXIT
+
+echo "> Run $meta_name with test data"
+"$meta_executable" \
+  --mfannot "$test_dir/test.mfannot" \
+  --gff "$TMPDIR/output.gff"
+
+echo ">> Checking output"
+[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1
+
+echo ">> Check if output is empty"
+[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1
+
+echo ">> Check if output matches expected output"
+# run diff inside the condition: with 'set -e' a bare failing diff would abort before the message is printed
+if ! diff "$TMPDIR/output.gff" "$test_dir/agat_convert_mfannot2gff_1.gff"; then
+  echo "Output file output.gff does not match expected output"
+  exit 1
+fi
+
+echo "> Test successful"
\ No newline at end of file
diff --git a/src/agat/agat_convert_mfannot2gff/test_data/agat_convert_mfannot2gff_1.gff b/src/agat/agat_convert_mfannot2gff/test_data/agat_convert_mfannot2gff_1.gff
new file mode 100644
index 00000000..6c6c6e2f
--- /dev/null
+++ b/src/agat/agat_convert_mfannot2gff/test_data/agat_convert_mfannot2gff_1.gff
@@ -0,0 +1,240 @@
+##gff-version 3
+tig00000088	mfannot	mRNA	375	3557	.	-	.	ID=mRNA_1;Name=atp1;gene=atp1;transl_table=4
+tig00000088	mfannot	exon	375	3557	.	-	.	ID=exon_1;Parent=atp1;Name=atp1;gene=atp1;transl_table=4
+tig00000088	mfannot	mRNA	2947	3618	.	+	.	ID=mRNA_2;Name=orf223;gene=orf223;transl_table=4
+tig00000088	mfannot	exon	2947	3618	.	+	.	ID=exon_2;Parent=orf223;Name=orf223;gene=orf223;transl_table=4
+tig00000088	mfannot	mRNA	3948	8683	.	-	.	ID=mRNA_3;Name=cox3;gene=cox3;transl_table=4
+tig00000088	mfannot	exon	3948	8683	.	-	.	ID=exon_3;Parent=cox3;Name=cox3;gene=cox3;transl_table=4
+tig00000088	mfannot	group_II_intron	8789	9291	.	+	.	ID=group_II_intron_1;Name=group%3DII;gene=group%3DII;transl_table=4
+tig00000088	mfannot	mRNA	9292	9432	.	-	.	ID=mRNA_4;Name=nad9;gene=nad9;transl_table=4
+tig00000088	mfannot	exon	9292	9432	.	-	.	ID=exon_4;Parent=nad9;Name=nad9;gene=nad9;transl_table=4
+tig00000088	mfannot	group_II_intron	9491	9970	.	+	.	ID=group_II_intron_2;Name=group%3DII(derived);gene=group%3DII(derived);transl_table=4
+tig00000088	mfannot	mRNA	9971	10423	.	-	.	ID=mRNA_5;Name=nad9;gene=nad9;transl_table=4
+tig00000088	mfannot	exon	9971	10423	.	-	.	ID=exon_5;Parent=nad9;Name=nad9;gene=nad9;transl_table=4
+tig00000088	mfannot	mRNA	10429	10545	.	-	.	ID=mRNA_6;Name=cox2;gene=cox2;transl_table=4
+tig00000088	mfannot	exon	10429	10545	.	-	.	ID=exon_6;Parent=cox2;Name=cox2;gene=cox2;transl_table=4
+tig00000088	mfannot	group_II_intron	10613	11201	.	+	.	ID=group_II_intron_3;Name=group%3DII;gene=group%3DII;transl_table=4
+tig00000088	mfannot	mRNA	11202	11519	.	-	.	ID=mRNA_7;Name=cox2;gene=cox2;transl_table=4
+tig00000088	mfannot	exon	11202	11519	.	-	.	ID=exon_7;Parent=cox2;Name=cox2;gene=cox2;transl_table=4
+tig00000088	mfannot	group_II_intron	11584	12755	.	+	.	ID=group_II_intron_4;Name=group%3DII(derived);gene=group%3DII(derived);transl_table=4
+tig00000088	mfannot	mRNA	12756	13190	.	-	.	ID=mRNA_8;Name=cox2;gene=cox2;transl_table=4
+tig00000088	mfannot	exon	12756	13190	.	-	.	ID=exon_8;Parent=cox2;Name=cox2;gene=cox2;transl_table=4
+tig00000088	mfannot	mRNA	13595	15460	.	-	.	ID=mRNA_9;Name=orf621;gene=orf621;transl_table=4
+tig00000088	mfannot	exon	13595	15460	.	-	.	ID=exon_9;Parent=orf621;Name=orf621;gene=orf621;transl_table=4
+tig00000088	mfannot	mRNA	15841	33346	.	-	.	ID=mRNA_10;Name=cox1;gene=cox1;transl_table=4
+tig00000088	mfannot	exon	15841	33346	.	-	.	ID=exon_10;Parent=cox1;Name=cox1;gene=cox1;transl_table=4
+tig00000088	mfannot	group_II_intron	33462	34862	.	+	.	ID=group_II_intron_5;Name=group%3DII;gene=group%3DII;transl_table=4
+tig00000088	mfannot	group_II_intron	35352	35430	.	+	.	ID=group_II_intron_6;Name=group%3DII(derived);gene=group%3DII(derived);transl_table=4
+tig00000088	mfannot	mRNA	35431	37011	.	-	.	ID=mRNA_11;Name=orf526;gene=orf526;transl_table=4
+tig00000088	mfannot	exon	35431	37011	.	-	.	ID=exon_11;Parent=orf526;Name=orf526;gene=orf526;transl_table=4
+tig00000088	mfannot	mRNA	37784	38089	.	-	.	ID=mRNA_12;Name=nad4L;gene=nad4L;transl_table=4
+tig00000088	mfannot	exon	37784	38089	.	-	.	ID=exon_12;Parent=nad4L;Name=nad4L;gene=nad4L;transl_table=4
+tig00000088	mfannot	group_II_intron	38283	38632	.	+	.	ID=group_II_intron_7;Name=group%3DII(derived);gene=group%3DII(derived);transl_table=4
+tig00000088	mfannot	mRNA	38633	40147	.	-	.	ID=mRNA_13;Name=orf504;gene=orf504;transl_table=4
+tig00000088	mfannot	exon	38633	40147	.	-	.	ID=exon_13;Parent=orf504;Name=orf504;gene=orf504;transl_table=4
+tig00000088	mfannot	mRNA	43290	43955	.	-	.	ID=mRNA_14;Name=nad1;gene=nad1;transl_table=4
+tig00000088	mfannot	exon	43290	43955	.	-	.	ID=exon_14;Parent=nad1;Name=nad1;gene=nad1;transl_table=4
+tig00000088	mfannot	group_II_intron	44168	44599	.	+	.	ID=group_II_intron_8;Name=group%3DII;gene=group%3DII;transl_table=4
+tig00000088	mfannot	mRNA	44600	53026	.	-	.	ID=mRNA_15;Name=cob;gene=cob;transl_table=4
+tig00000088	mfannot	exon	44600	53026	.	-	.	ID=exon_15;Parent=cob;Name=cob;gene=cob;transl_table=4
+tig00000088	mfannot	mRNA	54956	55507	.	-	.	ID=mRNA_16;Name=rpl5;gene=rpl5;transl_table=4
+tig00000088	mfannot	exon	54956	55507	.	-	.	ID=exon_16;Parent=rpl5;Name=rpl5;gene=rpl5;transl_table=4
+tig00000088	mfannot	mRNA	55526	55897	.	-	.	ID=mRNA_17;Name=rpl14;gene=rpl14;transl_table=4
+tig00000088	mfannot	exon	55526	55897	.	-	.	ID=exon_17;Parent=rpl14;Name=rpl14;gene=rpl14;transl_table=4
+tig00000088	mfannot	mRNA	56168	56542	.	-	.	ID=mRNA_18;Name=atp8;gene=atp8;transl_table=4
+tig00000088	mfannot	exon	56168	56542	.	-	.	ID=exon_18;Parent=atp8;Name=atp8;gene=atp8;transl_table=4
+tig00000088	mfannot	mRNA	57298	58023	.	-	.	ID=mRNA_19;Name=orf241;gene=orf241;transl_table=4
+tig00000088	mfannot	exon	57298	58023	.	-	.	ID=exon_19;Parent=orf241;Name=orf241;gene=orf241;transl_table=4
+tig00000088	mfannot	mRNA	58024	58434	.	-	.	ID=mRNA_20;Name=rpl16;gene=rpl16;transl_table=4
+tig00000088	mfannot	exon	58024	58434	.	-	.	ID=exon_20;Parent=rpl16;Name=rpl16;gene=rpl16;transl_table=4
+tig00000088	mfannot	mRNA	58447	59346	.	-	.	ID=mRNA_21;Name=rps3;gene=rps3;transl_table=4
+tig00000088	mfannot	exon	58447	59346	.	-	.	ID=exon_21;Parent=rps3;Name=rps3;gene=rps3;transl_table=4
+tig00000088	mfannot	mRNA	58447	59430	.	-	.	ID=mRNA_22;Name=orf327;gene=orf327;transl_table=4
+tig00000088	mfannot	exon	58447	59430	.	-	.	ID=exon_22;Parent=orf327;Name=orf327;gene=orf327;transl_table=4
+tig00000088	mfannot	mRNA	59324	59578	.	-	.	ID=mRNA_23;Name=rps19;gene=rps19;transl_table=4
+tig00000088	mfannot	exon	59324	59578	.	-	.	ID=exon_23;Parent=rps19;Name=rps19;gene=rps19;transl_table=4
+tig00000088	mfannot	mRNA	62407	64761	.	-	.	ID=mRNA_24;Name=orf784;gene=orf784;transl_table=4
+tig00000088	mfannot	exon	62407	64761	.	-	.	ID=exon_24;Parent=orf784;Name=orf784;gene=orf784;transl_table=4
+tig00000088	mfannot	mRNA	62484	64694	.	-	.	ID=mRNA_25;Name=orf736;gene=orf736;transl_table=4
+tig00000088	mfannot	exon	62484	64694	.	-	.	ID=exon_25;Parent=orf736;Name=orf736;gene=orf736;transl_table=4
+tig00000088	mfannot	mRNA	62497	64800	.	+	.	ID=mRNA_26;Name=orf767;gene=orf767;transl_table=4
+tig00000088	mfannot	exon	62497	64800	.	+	.	ID=exon_26;Parent=orf767;Name=orf767;gene=orf767;transl_table=4
+tig00000088	mfannot	mRNA	62505	64790	.	+	.	ID=mRNA_27;Name=orf761;gene=orf761;transl_table=4
+tig00000088	mfannot	exon	62505	64790	.	+	.	ID=exon_27;Parent=orf761;Name=orf761;gene=orf761;transl_table=4
+tig00000088	mfannot	mRNA	62579	64786	.	+	.	ID=mRNA_28;Name=orf735;gene=orf735;transl_table=4
+tig00000088	mfannot	exon	62579	64786	.	+	.	ID=exon_28;Parent=orf735;Name=orf735;gene=orf735;transl_table=4
+tig00000088	mfannot	mRNA	67403	71938	.	-	.	ID=mRNA_29;Name=orf1511;gene=orf1511;transl_table=4
+tig00000088	mfannot	exon	67403	71938	.	-	.	ID=exon_29;Parent=orf1511;Name=orf1511;gene=orf1511;transl_table=4
+tig00000088	mfannot	mRNA	67413	71873	.	-	.	ID=mRNA_30;Name=orf1486;gene=orf1486;transl_table=4
+tig00000088	mfannot	exon	67413	71873	.	-	.	ID=exon_30;Parent=orf1486;Name=orf1486;gene=orf1486;transl_table=4
+tig00000088	mfannot	mRNA	67417	71835	.	-	.	ID=mRNA_31;Name=orf1472;gene=orf1472;transl_table=4
+tig00000088	mfannot	exon	67417	71835	.	-	.	ID=exon_31;Parent=orf1472;Name=orf1472;gene=orf1472;transl_table=4
+tig00000088	mfannot	mRNA	68331	70100	.	+	.	ID=mRNA_32;Name=orf589;gene=orf589;transl_table=4
+tig00000088	mfannot	exon	68331	70100	.	+	.	ID=exon_32;Parent=orf589;Name=orf589;gene=orf589;transl_table=4
+tig00000088	mfannot	mRNA	68495	70594	.	+	.	ID=mRNA_33;Name=orf699;gene=orf699;transl_table=4
+tig00000088	mfannot	exon	68495	70594	.	+	.	ID=exon_33;Parent=orf699;Name=orf699;gene=orf699;transl_table=4
+tig00000088	mfannot	mRNA	69979	71091	.	+	.	ID=mRNA_34;Name=orf370;gene=orf370;transl_table=4
+tig00000088	mfannot	exon	69979	71091	.	+	.	ID=exon_34;Parent=orf370;Name=orf370;gene=orf370;transl_table=4
+tig00000088	mfannot	tRNA	72094	72164	.	+	.	ID=tRNA_1;Name=trnW(uca)_1;gene=trnW(uca)_1;transl_table=4
+tig00000088	mfannot	exon	72094	72164	.	+	.	ID=exon_35;Parent=tRNA_1;Name=trnW(uca)_1;gene=trnW(uca)_1;transl_table=4
+tig00000088	mfannot	mRNA	72179	72577	.	+	.	ID=mRNA_35;Name=rps13_1;gene=rps13_1;transl_table=4
+tig00000088	mfannot	exon	72179	72577	.	+	.	ID=exon_36;Parent=rps13_1;Name=rps13_1;gene=rps13_1;transl_table=4
+tig00000088	mfannot	mRNA	72669	91559	.	+	.	ID=mRNA_36;Name=rps11;gene=rps11;transl_table=4
+tig00000088	mfannot	exon	72669	91559	.	+	.	ID=exon_37;Parent=rps11;Name=rps11;gene=rps11;transl_table=4
+tig00000088	mfannot	mRNA	72981	73280	.	+	.	ID=mRNA_37;Name=rps14_1;gene=rps14_1;transl_table=4
+tig00000088	mfannot	exon	72981	73280	.	+	.	ID=exon_38;Parent=rps14_1;Name=rps14_1;gene=rps14_1;transl_table=4
+tig00000088	mfannot	mRNA	73309	74238	.	+	.	ID=mRNA_38;Name=rps8_1;gene=rps8_1;transl_table=4
+tig00000088	mfannot	exon	73309	74238	.	+	.	ID=exon_39;Parent=rps8_1;Name=rps8_1;gene=rps8_1;transl_table=4
+tig00000088	mfannot	mRNA	73708	74238	.	+	.	ID=mRNA_39;Name=rpl6_1;gene=rpl6_1;transl_table=4
+tig00000088	mfannot	exon	73708	74238	.	+	.	ID=exon_40;Parent=rpl6_1;Name=rpl6_1;gene=rpl6_1;transl_table=4
+tig00000088	mfannot	mRNA	74288	74656	.	+	.	ID=mRNA_40;Name=rps12_1;gene=rps12_1;transl_table=4
+tig00000088	mfannot	exon	74288	74656	.	+	.	ID=exon_41;Parent=rps12_1;Name=rps12_1;gene=rps12_1;transl_table=4
+tig00000088	mfannot	mRNA	74597	74917	.	-	.	ID=mRNA_41;Name=orf106;gene=orf106;transl_table=4
+tig00000088	mfannot	exon	74597	74917	.	-	.	ID=exon_42;Parent=orf106;Name=orf106;gene=orf106;transl_table=4
+tig00000088	mfannot	tRNA	75137	75208	.	+	.	ID=tRNA_2;Name=trnP(ugg)_1;gene=trnP(ugg)_1;transl_table=4
+tig00000088	mfannot	exon	75137	75208	.	+	.	ID=exon_43;Parent=tRNA_2;Name=trnP(ugg)_1;gene=trnP(ugg)_1;transl_table=4
+tig00000088	mfannot	mRNA	76605	77011	.	-	.	ID=mRNA_42;Name=rpl16;gene=rpl16;transl_table=4
+tig00000088	mfannot	exon	76605	77011	.	-	.	ID=exon_44;Parent=rpl16;Name=rpl16;gene=rpl16;transl_table=4
+tig00000088	mfannot	mRNA	81073	83373	.	+	.	ID=mRNA_43;Name=orf766;gene=orf766;transl_table=4
+tig00000088	mfannot	exon	81073	83373	.	+	.	ID=exon_45;Parent=orf766;Name=orf766;gene=orf766;transl_table=4
+tig00000088	mfannot	mRNA	81081	83363	.	+	.	ID=mRNA_44;Name=orf760;gene=orf760;transl_table=4
+tig00000088	mfannot	exon	81081	83363	.	+	.	ID=exon_46;Parent=orf760;Name=orf760;gene=orf760;transl_table=4
+tig00000088	mfannot	mRNA	81155	83359	.	+	.	ID=mRNA_45;Name=orf734;gene=orf734;transl_table=4
+tig00000088	mfannot	exon	81155	83359	.	+	.	ID=exon_47;Parent=orf734;Name=orf734;gene=orf734;transl_table=4
+tig00000088	mfannot	mRNA	81661	82935	.	-	.	ID=mRNA_46;Name=orf424;gene=orf424;transl_table=4
+tig00000088	mfannot	exon	81661	82935	.	-	.	ID=exon_48;Parent=orf424;Name=orf424;gene=orf424;transl_table=4
+tig00000088	mfannot	mRNA	82320	83267	.	-	.	ID=mRNA_47;Name=orf315;gene=orf315;transl_table=4
+tig00000088	mfannot	exon	82320	83267	.	-	.	ID=exon_49;Parent=orf315;Name=orf315;gene=orf315;transl_table=4
+tig00000088	mfannot	mRNA	85976	90457	.	-	.	ID=mRNA_48;Name=orf1493;gene=orf1493;transl_table=4
+tig00000088	mfannot	exon	85976	90457	.	-	.	ID=exon_50;Parent=orf1493;Name=orf1493;gene=orf1493;transl_table=4
+tig00000088	mfannot	mRNA	85986	90419	.	-	.	ID=mRNA_49;Name=orf1477;gene=orf1477;transl_table=4
+tig00000088	mfannot	exon	85986	90419	.	-	.	ID=exon_51;Parent=orf1477;Name=orf1477;gene=orf1477;transl_table=4
+tig00000088	mfannot	mRNA	85990	90522	.	-	.	ID=mRNA_50;Name=orf1510;gene=orf1510;transl_table=4
+tig00000088	mfannot	exon	85990	90522	.	-	.	ID=exon_52;Parent=orf1510;Name=orf1510;gene=orf1510;transl_table=4
+tig00000088	mfannot	mRNA	86082	89342	.	+	.	ID=mRNA_51;Name=orf1086;gene=orf1086;transl_table=4
+tig00000088	mfannot	exon	86082	89342	.	+	.	ID=exon_53;Parent=orf1086;Name=orf1086;gene=orf1086;transl_table=4
+tig00000088	mfannot	mRNA	86161	89838	.	+	.	ID=mRNA_52;Name=orf1225;gene=orf1225;transl_table=4
+tig00000088	mfannot	exon	86161	89838	.	+	.	ID=exon_54;Parent=orf1225;Name=orf1225;gene=orf1225;transl_table=4
+tig00000088	mfannot	mRNA	89216	90571	.	+	.	ID=mRNA_53;Name=orf451;gene=orf451;transl_table=4
+tig00000088	mfannot	exon	89216	90571	.	+	.	ID=exon_55;Parent=orf451;Name=orf451;gene=orf451;transl_table=4
+tig00000088	mfannot	tRNA	90678	90748	.	+	.	ID=tRNA_3;Name=trnW(uca)_2;gene=trnW(uca)_2;transl_table=4
+tig00000088	mfannot	exon	90678	90748	.	+	.	ID=exon_56;Parent=tRNA_3;Name=trnW(uca)_2;gene=trnW(uca)_2;transl_table=4
+tig00000088	mfannot	mRNA	90763	91161	.	+	.	ID=mRNA_54;Name=rps13_2;gene=rps13_2;transl_table=4
+tig00000088	mfannot	exon	90763	91161	.	+	.	ID=exon_57;Parent=rps13_2;Name=rps13_2;gene=rps13_2;transl_table=4
+tig00000088	mfannot	mRNA	91566	91865	.	+	.	ID=mRNA_55;Name=rps14_2;gene=rps14_2;transl_table=4
+tig00000088	mfannot	exon	91566	91865	.	+	.	ID=exon_58;Parent=rps14_2;Name=rps14_2;gene=rps14_2;transl_table=4
+tig00000088	mfannot	mRNA	91894	92277	.	+	.	ID=mRNA_56;Name=rps8_2;gene=rps8_2;transl_table=4
+tig00000088	mfannot	exon	91894	92277	.	+	.	ID=exon_59;Parent=rps8_2;Name=rps8_2;gene=rps8_2;transl_table=4
+tig00000088	mfannot	mRNA	92295	92825	.	+	.	ID=mRNA_57;Name=rpl6_2;gene=rpl6_2;transl_table=4
+tig00000088	mfannot	exon	92295	92825	.	+	.	ID=exon_60;Parent=rpl6_2;Name=rpl6_2;gene=rpl6_2;transl_table=4
+tig00000088	mfannot	mRNA	92875	93243	.	+	.	ID=mRNA_58;Name=rps12_2;gene=rps12_2;transl_table=4
+tig00000088	mfannot	exon	92875	93243	.	+	.	ID=exon_61;Parent=rps12_2;Name=rps12_2;gene=rps12_2;transl_table=4
+tig00000088	mfannot	mRNA	93224	93682	.	+	.	ID=mRNA_59;Name=rps7;gene=rps7;transl_table=4
+tig00000088	mfannot	exon	93224	93682	.	+	.	ID=exon_62;Parent=rps7;Name=rps7;gene=rps7;transl_table=4
+tig00000088	mfannot	tRNA	93720	93791	.	+	.	ID=tRNA_4;Name=trnP(ugg)_2;gene=trnP(ugg)_2;transl_table=4
+tig00000088	mfannot	exon	93720	93791	.	+	.	ID=exon_63;Parent=tRNA_4;Name=trnP(ugg)_2;gene=trnP(ugg)_2;transl_table=4
+tig00000088	mfannot	mRNA	93823	94440	.	+	.	ID=mRNA_60;Name=rps4;gene=rps4;transl_table=4
+tig00000088	mfannot	exon	93823	94440	.	+	.	ID=exon_64;Parent=rps4;Name=rps4;gene=rps4;transl_table=4
+tig00000088	mfannot	mRNA	95255	96652	.	+	.	ID=mRNA_61;Name=orf465;gene=orf465;transl_table=4
+tig00000088	mfannot	exon	95255	96652	.	+	.	ID=exon_65;Parent=orf465;Name=orf465;gene=orf465;transl_table=4
+tig00000088	mfannot	group_II_intron	96715	97278	.	+	.	ID=group_II_intron_9;Name=group%3DII;gene=group%3DII;transl_table=4
+tig00000088	mfannot	group_II_intron	97835	97857	.	+	.	ID=group_II_intron_10;Name=group%3DII;gene=group%3DII;transl_table=4
+tig00000088	mfannot	mRNA	97858	100740	.	+	.	ID=mRNA_62;Name=nad5;gene=nad5;transl_table=4
+tig00000088	mfannot	exon	97858	100740	.	+	.	ID=exon_66;Parent=nad5;Name=nad5;gene=nad5;transl_table=4
+tig00000088	mfannot	mRNA	100756	100971	.	+	.	ID=mRNA_63;Name=nad6;gene=nad6;transl_table=4
+tig00000088	mfannot	exon	100756	100971	.	+	.	ID=exon_67;Parent=nad6;Name=nad6;gene=nad6;transl_table=4
+tig00000088	mfannot	mRNA	101416	103482	.	+	.	ID=mRNA_64;Name=orf688;gene=orf688;transl_table=4
+tig00000088	mfannot	exon	101416	103482	.	+	.	ID=exon_68;Parent=orf688;Name=orf688;gene=orf688;transl_table=4
+tig00000088	mfannot	group_II_intron	103569	103575	.	+	.	ID=group_II_intron_11;Name=group%3DII;gene=group%3DII;transl_table=4
+tig00000088	mfannot	mRNA	103576	103974	.	+	.	ID=mRNA_65;Name=orf132;gene=orf132;transl_table=4
+tig00000088	mfannot	exon	103576	103974	.	+	.	ID=exon_69;Parent=orf132;Name=orf132;gene=orf132;transl_table=4
+tig00000088	mfannot	tRNA	104056	104128	.	+	.	ID=tRNA_5;Name=trnR(ucu);gene=trnR(ucu);transl_table=4
+tig00000088	mfannot	exon	104056	104128	.	+	.	ID=exon_70;Parent=tRNA_5;Name=trnR(ucu);gene=trnR(ucu);transl_table=4
+tig00000088	mfannot	mRNA	104153	104224	.	-	.	ID=mRNA_66;Name=nad3;gene=nad3;transl_table=4
+tig00000088	mfannot	exon	104153	104224	.	-	.	ID=exon_71;Parent=nad3;Name=nad3;gene=nad3;transl_table=4
+tig00000088	mfannot	group_II_intron	104436	105029	.	+	.	ID=group_II_intron_12;Name=group%3DII(derived);gene=group%3DII(derived);transl_table=4
+tig00000088	mfannot	mRNA	105030	107969	.	-	.	ID=mRNA_67;Name=atp6;gene=atp6;transl_table=4
+tig00000088	mfannot	exon	105030	107969	.	-	.	ID=exon_72;Parent=atp6;Name=atp6;gene=atp6;transl_table=4
+tig00000088	mfannot	mRNA	108059	108412	.	-	.	ID=mRNA_68;Name=rps10;gene=rps10;transl_table=4
+tig00000088	mfannot	exon	108059	108412	.	-	.	ID=exon_73;Parent=rps10;Name=rps10;gene=rps10;transl_table=4
+tig00000088	mfannot	mRNA	108421	109893	.	-	.	ID=mRNA_69;Name=nad2;gene=nad2;transl_table=4
+tig00000088	mfannot	exon	108421	109893	.	-	.	ID=exon_74;Parent=nad2;Name=nad2;gene=nad2;transl_table=4
+tig00000088	mfannot	mRNA	110001	118556	.	+	.	ID=mRNA_70;Name=nad7;gene=nad7;transl_table=4
+tig00000088	mfannot	exon	110001	118556	.	+	.	ID=exon_75;Parent=nad7;Name=nad7;gene=nad7;transl_table=4
+tig00000088	mfannot	group_II_intron	119144	119308	.	+	.	ID=group_II_intron_13;Name=group%3DII;gene=group%3DII;transl_table=4
+tig00000088	mfannot	mRNA	119309	121269	.	+	.	ID=mRNA_71;Name=nad4;gene=nad4;transl_table=4
+tig00000088	mfannot	exon	119309	121269	.	+	.	ID=exon_76;Parent=nad4;Name=nad4;gene=nad4;transl_table=4
+tig00000088	mfannot	mRNA	121551	121778	.	+	.	ID=mRNA_72;Name=atp9;gene=atp9;transl_table=4
+tig00000088	mfannot	exon	121551	121778	.	+	.	ID=exon_77;Parent=atp9;Name=atp9;gene=atp9;transl_table=4
+tig00000088	mfannot	tRNA	121887	121959	.	+	.	ID=tRNA_6;Name=trnD(guc);gene=trnD(guc);transl_table=4
+tig00000088	mfannot	exon	121887	121959	.	+	.	ID=exon_78;Parent=tRNA_6;Name=trnD(guc);gene=trnD(guc);transl_table=4
+tig00000088	mfannot	tRNA	121962	122033	.	+	.	ID=tRNA_7;Name=trnC(gca);gene=trnC(gca);transl_table=4
+tig00000088	mfannot	exon	121962	122033	.	+	.	ID=exon_79;Parent=tRNA_7;Name=trnC(gca);gene=trnC(gca);transl_table=4
+tig00000088	mfannot	tRNA	122051	122123	.	+	.	ID=tRNA_8;Name=trnH(gug);gene=trnH(gug);transl_table=4
+tig00000088	mfannot	exon	122051	122123	.	+	.	ID=exon_80;Parent=tRNA_8;Name=trnH(gug);gene=trnH(gug);transl_table=4
+tig00000088	mfannot	tRNA	122142	122214	.	+	.	ID=tRNA_9;Name=trnV(uac);gene=trnV(uac);transl_table=4
+tig00000088	mfannot	exon	122142	122214	.	+	.	ID=exon_81;Parent=tRNA_9;Name=trnV(uac);gene=trnV(uac);transl_table=4
+tig00000088	mfannot	mRNA	122234	122446	.	+	.	ID=mRNA_73;Name=rnpB;gene=rnpB;transl_table=4
+tig00000088	mfannot	exon	122234	122446	.	+	.	ID=exon_82;Parent=rnpB;Name=rnpB;gene=rnpB;transl_table=4
+tig00000088	mfannot	rRNA	122544	123762	.	+	.	ID=rRNA_1;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	exon	122544	123762	.	+	.	ID=exon_83;Parent=rRNA_1;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	group_II_intron	123576	123762	.	+	.	ID=group_II_intron_14;Name=group%3DII;gene=group%3DII;transl_table=4
+tig00000088	mfannot	rRNA	123763	124009	.	+	.	ID=rRNA_2;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	exon	123763	124009	.	+	.	ID=exon_84;Parent=rRNA_2;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	rRNA	124010	124127	.	+	.	ID=rRNA_3;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	exon	124010	124127	.	+	.	ID=exon_85;Parent=rRNA_3;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	rRNA	124128	124832	.	+	.	ID=rRNA_4;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	exon	124128	124832	.	+	.	ID=exon_86;Parent=rRNA_4;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	mRNA	124833	125279	.	+	.	ID=mRNA_74;Name=orf148;gene=orf148;transl_table=4
+tig00000088	mfannot	exon	124833	125279	.	+	.	ID=exon_87;Parent=orf148;Name=orf148;gene=orf148;transl_table=4
+tig00000088	mfannot	group_II_intron	124847	124962	.	+	.	ID=group_II_intron_15;Name=group%3DII;gene=group%3DII;transl_table=4
+tig00000088	mfannot	rRNA	124963	125117	.	+	.	ID=rRNA_5;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	exon	124963	125117	.	+	.	ID=exon_88;Parent=rRNA_5;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	rRNA	125118	125231	.	+	.	ID=rRNA_6;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	exon	125118	125231	.	+	.	ID=exon_89;Parent=rRNA_6;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	rRNA	125232	125279	.	+	.	ID=rRNA_7;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	exon	125232	125279	.	+	.	ID=exon_90;Parent=rRNA_7;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	rRNA	125493	125529	.	+	.	ID=rRNA_8;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	exon	125493	125529	.	+	.	ID=exon_91;Parent=rRNA_8;Name=rns;gene=rns;transl_table=4
+tig00000088	mfannot	mRNA	125530	125635	.	+	.	ID=mRNA_75;Name=rrn5;gene=rrn5;transl_table=4
+tig00000088	mfannot	exon	125530	125635	.	+	.	ID=exon_92;Parent=rrn5;Name=rrn5;gene=rrn5;transl_table=4
+tig00000088	mfannot	tRNA	125644	125715	.	+	.	ID=tRNA_10;Name=trnF(gaa);gene=trnF(gaa);transl_table=4
+tig00000088	mfannot	exon	125644	125715	.	+	.	ID=exon_93;Parent=tRNA_10;Name=trnF(gaa);gene=trnF(gaa);transl_table=4
+tig00000088	mfannot	tRNA	125734	125806	.	+	.	ID=tRNA_11;Name=trnK(uuu);gene=trnK(uuu);transl_table=4
+tig00000088	mfannot	exon	125734	125806	.	+	.	ID=exon_94;Parent=tRNA_11;Name=trnK(uuu);gene=trnK(uuu);transl_table=4
+tig00000088	mfannot	tRNA	126093	126165	.	+	.	ID=tRNA_12;Name=trnT(ugu);gene=trnT(ugu);transl_table=4
+tig00000088	mfannot	exon	126093	126165	.	+	.	ID=exon_95;Parent=tRNA_12;Name=trnT(ugu);gene=trnT(ugu);transl_table=4
+tig00000088	mfannot	tRNA	126180	126251	.	+	.	ID=tRNA_13;Name=trnM(cau)_1;gene=trnM(cau)_1;transl_table=4
+tig00000088	mfannot	exon	126180	126251	.	+	.	ID=exon_96;Parent=tRNA_13;Name=trnM(cau)_1;gene=trnM(cau)_1;transl_table=4
+tig00000088	mfannot	tRNA	126284	126356	.	+	.	ID=tRNA_14;Name=trnM(cau)_2;gene=trnM(cau)_2;transl_table=4
+tig00000088	mfannot	exon	126284	126356	.	+	.	ID=exon_97;Parent=tRNA_14;Name=trnM(cau)_2;gene=trnM(cau)_2;transl_table=4
+tig00000088	mfannot	tRNA	126364	126435	.	+	.	ID=tRNA_15;Name=trnA(ugc);gene=trnA(ugc);transl_table=4
+tig00000088	mfannot	exon	126364	126435	.	+	.	ID=exon_98;Parent=tRNA_15;Name=trnA(ugc);gene=trnA(ugc);transl_table=4
+tig00000088	mfannot	tRNA	126453	126525	.	+	.	ID=tRNA_16;Name=trnR(ucg);gene=trnR(ucg);transl_table=4
+tig00000088	mfannot	exon	126453	126525	.	+	.	ID=exon_99;Parent=tRNA_16;Name=trnR(ucg);gene=trnR(ucg);transl_table=4
+tig00000088	mfannot	tRNA	126528	126600	.	+	.	ID=tRNA_17;Name=trnI(gau);gene=trnI(gau);transl_table=4
+tig00000088	mfannot	exon	126528	126600	.	+	.	ID=exon_100;Parent=tRNA_17;Name=trnI(gau);gene=trnI(gau);transl_table=4
+tig00000088	mfannot	tRNA	126629	126710	.	+	.	ID=tRNA_18;Name=trnL(uag);gene=trnL(uag);transl_table=4
+tig00000088	mfannot	exon	126629	126710	.	+	.	ID=exon_101;Parent=tRNA_18;Name=trnL(uag);gene=trnL(uag);transl_table=4
+tig00000088	mfannot	tRNA	126724	126796	.	+	.	ID=tRNA_19;Name=trnN(guu);gene=trnN(guu);transl_table=4
+tig00000088	mfannot	exon	126724	126796	.	+	.	ID=exon_102;Parent=tRNA_19;Name=trnN(guu);gene=trnN(guu);transl_table=4
+tig00000088	mfannot	tRNA	126797	126881	.	+	.	ID=tRNA_20;Name=trnY(gua);gene=trnY(gua);transl_table=4
+tig00000088	mfannot	exon	126797	126881	.	+	.	ID=exon_103;Parent=tRNA_20;Name=trnY(gua);gene=trnY(gua);transl_table=4
+tig00000088	mfannot	tRNA	126907	126978	.	+	.	ID=tRNA_21;Name=trnE(uuc);gene=trnE(uuc);transl_table=4
+tig00000088	mfannot	exon	126907	126978	.	+	.	ID=exon_104;Parent=tRNA_21;Name=trnE(uuc);gene=trnE(uuc);transl_table=4
+tig00000088	mfannot	tRNA	127002	127072	.	+	.	ID=tRNA_22;Name=trnQ(uug);gene=trnQ(uug);transl_table=4
+tig00000088	mfannot	exon	127002	127072	.	+	.	ID=exon_105;Parent=tRNA_22;Name=trnQ(uug);gene=trnQ(uug);transl_table=4
+tig00000088	mfannot	tRNA	127097	127167	.	+	.	ID=tRNA_23;Name=trnG(ucc);gene=trnG(ucc);transl_table=4
+tig00000088	mfannot	exon	127097	127167	.	+	.	ID=exon_106;Parent=tRNA_23;Name=trnG(ucc);gene=trnG(ucc);transl_table=4
+tig00000088	mfannot	rRNA	127170	132900	.	+	.	ID=rRNA_9;Name=rnl;gene=rnl;transl_table=4
+tig00000088	mfannot	exon	127170	132900	.	+	.	ID=exon_107;Parent=rRNA_9;Name=rnl;gene=rnl;transl_table=4
+tig00000088	mfannot	group_II_intron	128101	130559	.	+	.	ID=group_II_intron_16;Name=group%3DII;gene=group%3DII;transl_table=4
+tig00000088	mfannot	group_II_intron	132446	132900	.	+	.	ID=group_II_intron_17;Name=group%3DII(derived);gene=group%3DII(derived);transl_table=4
+tig00000088	mfannot	rRNA	132901	132923	.	+	.	ID=rRNA_10;Name=rnl;gene=rnl;transl_table=4
+tig00000088	mfannot	exon	132901	132923	.	+	.	ID=exon_108;Parent=rRNA_10;Name=rnl;gene=rnl;transl_table=4
+tig00000088	mfannot	tRNA	132924	133010	.	+	.	ID=tRNA_24;Name=trnS(gcu);gene=trnS(gcu);transl_table=4
+tig00000088	mfannot	exon	132924	133010	.	+	.	ID=exon_109;Parent=tRNA_24;Name=trnS(gcu);gene=trnS(gcu);transl_table=4
+tig00000088	mfannot	tRNA	133023	133103	.	+	.	ID=tRNA_25;Name=trnL(uaa);gene=trnL(uaa);transl_table=4
+tig00000088	mfannot	exon	133023	133103	.	+	.	ID=exon_110;Parent=tRNA_25;Name=trnL(uaa);gene=trnL(uaa);transl_table=4
+tig00000088	mfannot	tRNA	133131	133218	.	+	.	ID=tRNA_26;Name=trnS(uga);gene=trnS(uga);transl_table=4
+tig00000088	mfannot	exon	133131	133218	.	+	.	ID=exon_111;Parent=tRNA_26;Name=trnS(uga);gene=trnS(uga);transl_table=4
diff --git a/src/agat/agat_convert_mfannot2gff/test_data/script.sh b/src/agat/agat_convert_mfannot2gff/test_data/script.sh
new file mode 100755
index 00000000..f60aa8dd
--- /dev/null
+++ b/src/agat/agat_convert_mfannot2gff/test_data/script.sh
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+# clone repo
+if [ ! -d /tmp/agat_source ]; then
+  git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source
+fi
+
+# copy test data
+cp -r /tmp/agat_source/t/scripts_output/in/test.mfannot src/agat/agat_convert_mfannot2gff/test_data/
+cp -r /tmp/agat_source/t/scripts_output/out/agat_convert_mfannot2gff_1.gff src/agat/agat_convert_mfannot2gff/test_data/
\ No newline at end of file
diff --git a/src/agat/agat_convert_mfannot2gff/test_data/test.mfannot b/src/agat/agat_convert_mfannot2gff/test_data/test.mfannot
new file mode 100644
index 00000000..7a33b19a
--- /dev/null
+++ b/src/agat/agat_convert_mfannot2gff/test_data/test.mfannot
@@ -0,0 +1,2914 @@
+;; Masterfile modified automatically by mfannot version 1.33
+;;    - Gene Totals: 106
+;;    - List of genes added:
+;;      atp1 (3 introns)     atp6 (1 introns)     atp8                
+;;      atp9                 cob (6 introns)      cox1 (11 introns)   
+;;      cox3 (3 introns)     nad1                 nad2                
+;;      nad3                 nad4 (1 introns)     nad4L               
+;;      nad5 (2 introns)     nad7 (6 introns)     orf101              
+;;      orf106               orf1086              orf119              
+;;      orf1225              orf123               orf132              
+;;      orf1472              orf1477              orf148              
+;;      orf1486              orf149               orf1493             
+;;      orf1510              orf1511              orf158              
+;;      orf204               orf223               orf240              
+;;      orf241               orf259               orf269              
+;;      orf315               orf327               orf353              
+;;      orf370               orf385               orf424              
+;;      orf451               orf465               orf499              
+;;      orf504               orf505               orf511              
+;;      orf526               orf550               orf580              
+;;      orf589               orf621               orf671              
+;;      orf673               orf676               orf688              
+;;      orf699               orf734               orf735              
+;;      orf736               orf750               orf760              
+;;      orf761               orf766               orf767              
+;;      orf784               rnpB                 rpl14               
+;;      rpl16                rpl5                 rpl6                
+;;      rps10                rps11                rps12               
+;;      rps13                rps14                rps19               
+;;      rps3                 rps4                 rps7                
+;;      rps8                 rrn5                 trnA(ugc)           
+;;      trnC(gca)            trnD(guc)            trnE(uuc)           
+;;      trnF(gaa)            trnG(ucc)            trnH(gug)           
+;;      trnI(gau)            trnK(uuu)            trnL(uaa)           
+;;      trnL(uag)            trnM(cau)            trnN(guu)           
+;;      trnP(ugg)            trnQ(uug)            trnR(ucg)           
+;;      trnR(ucu)            trnS(gcu)            trnS(uga)           
+;;      trnT(ugu)            trnV(uac)            trnW(uca)           
+;;      trnY(gua)           
+;;
+;; end mfannot
+;;
+
+
+>tig00000088 gc=4
+     1  GAATTTTAAGTTTATCTAAAATATAGAAAATAAAAATATATTTTTATTTTATGCAGTTTT
+    61  TGTATATCATAAATCTTAAGTGTTATTTAACATTTATTTTAGTAAATTTAAGAATAGATT
+   121  TTTAAAATAACAAATATAATAATGAACCAGTTATTATTTATAAATTATTTGTAGTAATAA
+   181  GATAAATTAACTTTATATTTTAGTTATATAGTTATAATTAGTATAGTATGTATAAATTGG
+   241  CATTTATAATATTAGTTACATTAACTATAAAATTAATTTTATATGTTTTTTGATTTTTTC
+   301  TAAAAAAATTTGTATCATTTGGAGAAATCTAAGATGAGTTGGTATTAACTAATGATGGTT
+   361  ATTGGTTAAAAATA
+;     G-atp1 <== end
+;     G-atp1-E4 <== end
+   375  TTAAAGATTAATTTCAGAGGTAAGCAGAGATTGTAATTTTTTTTTAAGCTCGGGAGAAAT
+   435  TTTTTTTTGTTCTTTAATTTCATTTAAAATTTTTGTATGTTTTGTTTTTAAAAGGTTTAA
+   495  AAGTTTTTGTTCAAAATTTGATACTTTGTTTGTAGCAATTTTATCTAAAAACCCATTCAT
+   555  CCCAGCGAAAATTATAACAACTTGGTATTCAATTGGCATTGGTATGAATTGATTTTGTTT
+   615  TAACAATTCGATTAGACGAGAACCTCGATTTAATACGTGTTGTGTAGATGCATCTAAATC
+   675  AGACCCGAATTGAGCAAAAGCTTCAACTTCACGGTATTGGGCTAGTTCTAGTTTTAAACC
+   735  CCCGGCAACTTGTCTCATTGCTGGAATTTGAGCAGCAGAACCAACACGACTTACTGATAA
+   795  ACCCACATTAATTGCGGGCCGAATTCCTTTATAAAAAAGTTCAGCTTCTAGAAAGATTTG
+   855  ACCATCTGTAA
+;     G-atp1-E4 <== start
+;     G-atp1-I3 <== end
+   866  aatgggtagataaatattgttaattattatatcccccaatgtaaactgtacatgatagtt
+   926  agttatcatacagcttcttttaagagaaaagaattatgtttataaattaaaatatatttt
+   986  gataagttataaactacacataattatccagttaggtataattatgtgtagtttattata
+  1046  caatttattttattaaaaaataatatttactataaaaacttcccctcacactgttcagtt
+  1106  tgattgtttataaaaacaatttttttaagaaaaatgatacagttttcatgccttttttaa
+  1166  aaaaaagcttttatttttacaaaaatttgtacttaattttttggaaaaaatatccaatga
+  1226  tagctgatatcaaaatttaatattgtttattttggttaaaaagttttaaataaaaattaa
+  1286  ttttctattaaaaatagatttttctaaaaaaatttttttttcagtttaatgtttttctat
+  1346  aaatgaatttttaatattattatattatgataattaaatatgtataattagataatgtaa
+  1406  tttgatgtaaaaattacaagttttcatatataaaattttttaaaaaaaatttcttatcat
+  1466  aatttatataattttatacaaattgatgtaattacaaataactgccc
+;     G-atp1-I3 <== start /group=II ;; mfannot: splice boundaries uncertain
+;     G-atp1-E3 <== end
+  1513  TAGAAATAACATTTGTTGGAATATAAGCTGAAACATCTCCAGCTTGTGTTTCTATTATAG
+  1573  GAAGCGCGGTTAATGATCCAGCCCCATAGTCTTTATTTAATTTAGCTGCACGTTCTAATA
+  1633  AACGAGAATGTAGATAAA
+;     G-atp1-E3 <== start
+;     G-atp1-I2 <== end
+  1651  attgagtcaaaaacttattagtaatgttaattactgttgtatgctcttagagctttacaa
+  1711  aataattacttattataaagctctcttttcgttgaaaaattggattttgtgaattattaa
+  1771  ttaagtttaatattttttgatgtaaaaaaatatttataaattttatcgaaataaattcat
+  1831  attatgaattttataaaattttattgttttataaaattacataaaaagaattgtctgttc
+  1891  tatattattttatacaatataaactctagtattaggaactttatgaaaaagttttaaaca
+  1951  aaaaaataattatgaatttgtcatatttttgcttgaaatgtttatgaaatacgtcaaaat
+  2011  ttctctataactattttttcttagcggtaaagatatgtatatatattaaaaagtttattt
+  2071  tatttttgaataaaactttttgacaaacacataaatagttatttatttaaatatacattt
+  2131  atgaatatactgtatatttaaaattttttggaaaaaatttacctaattaactaaataccc
+;     G-atp1-I2 <== start /group=II(derived)
+;     G-atp1-E2 <== end
+  2191  AAACGTCTCCGGGATATGCTTCACGACCTGGTGGTCGTCTTAATAGTAAAGACATTTGTC
+  2251  TATAAGCTACTGCCTGTTTACTTAAATCATCATAAATGATTAACGCATGCTTTTTATTAT
+  2311  CGCGAAAATATTCTCCTATTGTACACCCAGTATATG
+;     G-atp1-E2 <== start
+;     G-atp1-I1 <== end
+  2347  tttggatagaaaaaatttcttccacaaaacttaacgtataaatttctttatattaagctt
+  2407  aatgaaaaaatttctagttaaattattaataacctaataaatacatttaatgtagatgtg
+  2467  atatacgtctaaaatttggtatttataaattagatttttaaagaaattttttcaaaaact
+  2527  gtttttactttaaagtaatttcagattcaaaattataaaattaattataaacttaactag
+  2587  tttcttttatactatttataattaaagcgaatttttttagtagataatatttaatttttt
+  2647  tgcattgttttatgataatagcaccttttaatcaaaaagtttttatataaatatttataa
+  2707  ttgatttgttatataacaaacgtatacatatacttattaatataagtattaaactacctt
+  2767  aaatttggggtttagtaatatataaaaagaaacgattgttaaatac
+;     G-atp1-I1 <== start /group=II(derived) ;; mfannot: splice boundaries uncertain
+;     G-atp1-E1 <== end
+  2813  GTGCTAAAAACTGTAATGGAGCGGCTTCAGATGCCGTTGCAGCTACAATGATTGTATATG
+  2873  AAAATGCGTTTTCTTTTTCTAATATAGATACTAGTTGAGCAACTGTTGAACGTTTTTGTC
+  2933  CGATTGCGACATAA
+;     G-orf223 ==> start
+  2947  ATGCAGTATAACTTATCGGAATCGTTTAGCTCATTATTTTGATATTTTTGATTTAAAATG
+  3007  GTGTCAATTGCAATTGCAGTTTTTCCAGTTTGCCTGTCACCAATGATTAGTTCCCGTTGA
+  3067  CCACGTCCAATAGGAACTAAACTGTCAACAGCTTTTAATCCAGTTTGCATTGGCTCAGAA
+  3127  ACTGATTTTCTTGGAATAATTCCGGGTGCTTTAACTTCTACCCGCCGAGTTTCATTACTT
+  3187  TTAATTGCTCCTTTTCCGTCGATAGGGGCACCTAAAGCGTTAATTGCTCGGCCTAAAAGG
+  3247  TCTGTACCTACAGGCACACTAACAATATTTTTAGTACGTTTTACGGATTCTCCTTCTGAA
+  3307  ACAAATTTGTCGTTTCCAAAAATAACAATTCCTGCATTATCATTTTCTAAATTTAGAGCC
+  3367  ATTCCTTTTAAACCGGAACTAAATTCGACCATTTCACCAGCTTTTAAATTTTGTAATCCA
+  3427  AAAACTCGAGCAATTCCGTCTCCTACAGTTAATACTTTTCCTTTTTCAGTGAAGGAATTT
+  3487  TTATTAATTCCTGTTGTTGCTATTTGAATTTCTAATAATTGAGATAATTCGTTTATATGT
+  3547  AGTTTTTGCAT
+;     G-atp1-E1 <== start
+;     G-atp1 <== start
+  3558  TTCTGCTATAAGTTTTGTATTAAAATTTAAAGTATTTTTTTGTATAATTTTTTTTAACTA
+  3618  A
+;     G-orf223 ==> end
+  3619  AACCGCTAATTTGATAAAAAATTGTTAAGAAATTTATTCATAAAATCTAGAAAACTAAAA
+  3679  GAATTTTCCAGTAAAGGAAAAAGTTATTTAATATAAAATTTTTTACATATTAAAAATAAT
+  3739  AATATAATTTATATTTTATTTAATTTTTAAGATTTTAAAATTAATGATCCTTTTTTAAAA
+  3799  AATGTAGAATTTTATTAAAAATTGAATATCCCATAAACTTATGGTTTATGGGATAATTTT
+  3859  CTTACCCATGAAAAATAAGTTTTTAACTTAATCAAAATATTAATATATAATATTTATATA
+  3919  TTTTTCATGTAAGTGAACACTAGTCAAGT
+;     G-cox3 <== end
+;     G-cox3-E4 <== end
+  3948  TTAAGCTTTATTTCCCCATATATATATGGAAATGAATAAAAAAAGTCAAACAACATATA
+;     G-cox3-E4 <== start
+;     G-cox3-I3 <== end
+  4007  aacttaaaaatatgcttttttagattttgatttttgcgcatattaatagtttttgaacca
+  4067  aacgtaataatttcttattattaggctcttatacaaattaaatatttgtctttatactgt
+  4127  atatgtttaatcgtgtagtataaaaaattgcaggaggtaggtaattaaattttaattttt
+  4187  ttaaaaaaacatatttaatttgtttagaaggtgttctaatcatttttaaatttctaaaaa
+  4247  agaaaagaatatcagcatcaataaacataattatataaatataattatgtttttgctaaa
+  4307  atttttgttaagtgtacagtaatatgttttaaactttctaaaagaaagtattttacataa
+  4367  gttttattttattctatttgttaaaaatgaattttttttttgaaaaacgtataattagaa
+  4427  gtctttaaaagagattttggcttaaaaagtcaatttcaatataatgttgaatttttgatc
+  4487  tttttaaagcaacttctatctaattaagaaaaaggacctaactataaatttataagcaca
+  4547  caccaacaaaatatattaatcgatatgctttgaagcaataagcgttaattcacacatggc
+  4607  gtgcaaattgctaaacaccatgcgttttttatgaaatactttaaatttaaaaaatttttt
+  4667  ttcataaataagatacttttaaagcgaagtatcttgcaatactaatttagtgtattagta
+  4727  aaataatgctttacattttttttaatttaaaaatctgtttttagattagagtaaattttt
+  4787  tggttaaagaagaaatatattggctataatattttttctttaaaaactttggtgtaaaaa
+  4847  atatattaaaaacagtatactatatttttatataaaagtattatatattcaaatgcaaag
+  4907  aaatat
+;     G-cox3-I3 <== start /group=II(derived) ;; mfannot: splice boundaries uncertain
+;     G-cox3-E3 <== end
+  4913  TCGCCCGCCAATATCATGCTGCTGCTTCGAAAGCAAAATGATGATTATCCGTAAAATGAT
+  4973  GTTTTATTAAACGTATTAAACAAATACCCAAAAAAATACTTCCTATTAAAA
+;     G-cox3-E3 <== start
+;     G-cox3-I2 <== end
+  5024  tttgagattaattataaattattaattttctcttagaactgtacatataattttattata
+  5084  tacggctcaacataataaattgttattgtgcatacaaaattgatggaattagtattatgc
+  5144  aatacatttttattataaatagtgtagtacttaagaattcctattatagggaagcgtagt
+  5204  aaatattataaaatttttttgttataatactgttgtttactattttcatatttcattttt
+  5264  ttatttaaaaaaaaatgaaaaagtttataatgcatatttgtttttttaaatgcaaattta
+  5324  gatatttattattgttataatttttaatatcaaaaatgcaataaaatttgtttgtattag
+  5384  aattttcatgtcaaaagaaatatttacaactttaaaaaatatactaaaatatttttatta
+  5444  aacaaatacaataaaaaccgtac
+;     G-cox3-I2 <== start /group=II(derived)
+;     G-cox3-E2 <== end
+  5467  CATGAAATCCGTGAAAACCTGTTGCTAAATAAAAAGTTGAACCATAAATACTATCTGAAA
+  5527  TATCAAAATCAGCATTTCAATATTCGAAAATCTGTAAAGTAGTAAATATAAATGCTAATA
+  5587  TTACTGTCAACAGTAAGCTAATTATAGCTTCTTCTCTAAACCTTTTTAAAATAGTATGAT
+  5647  GGCACCACGTTACGCTGCATCCAGATAATAATAAAATTCCAGTGTTTAAAGCAGGCACAT
+  5707  ATTTAGCGCTTAAAGAAAAAATACCAAGAGGGGGCCATTTAGTGCCAAGTTCAATAATTG
+  5767  GGGCAAAGCTTGAAGTG
+;     G-cox3-E2 <== start
+;     G-cox3-I1 <== end
+  5784  catcaaataatattaattagttatttgtgctcaaaaccgtatgaacttattgtactaagt
+  5844  attacggctcccaggaaaaaacaacgtttttaaaa
+;     G-cox3-I1-orf673 <== end
+  5879  ttaattaatatgcctgatgcggtattctatatctttaatgcgtaattgttttacattacc
+  5939  gcattgtactattcctttttttgagaaaaagttctgatatatataataacgaagttgtgg
+  5999  ttttgatccaaactttgtaattaaataattaagaaaatagcaagaagataaaaaatcaaa
+  6059  ttgctttaattggaaaacaatagttttacttagaccaaaatatataataactttaattaa
+  6119  ccattgattatatttttgaattaaaacgtttagtggtaattttaattgtattggggcaaa
+  6179  tatatttctaagatcttctttaaggttagaaaaagttttagagcatatagtaatagttat
+  6239  attattgtaataattaaataaaatttttctataaaagaaattttcttgatttagatatct
+  6299  agtgcattgaatttttcgattaaacatattagtaaacttaaaacctaaatattcaaaaaa
+  6359  catatttggatataatatttgtattgaagttacgtttttatctacttgaataaatttttt
+  6419  ttttaaaaaaatcaataatcgataataaattattaaaaaatatgaaaaattagcagtaaa
+  6479  atctactattaaaatgttacctaaaaatctataaatttgtgtatttaacttaaaatattg
+  6539  taaaataagtgtactctgttgataatttttatttaaaaaacaattagaacggtcatttaa
+  6599  tttttcggttaatttaaatgttaacggtaacaatacaaatgattccatattatttaacat
+  6659  aacatttgcaattaatgcacccaaaattgtatttcataattttttaatatgtaatgtatt
+  6719  atgtattcttctttcaagaagatatttatataaaaaaggataacatcagataattataag
+  6779  ggagcggtatttgttacaaatgggcatatgttttgccatgacaagataagaattcatatt
+  6839  taagttcttaaaaatatctatattaacaaattttttataaaaaatagttttataaaataa
+  6899  ctttaattttattttcatattatgaatataatataaatatggtttgttttgctggtctaa
+  6959  tttagaaattcaaaaattatacaatttttgtttaaaattaaaaaattttttatattttaa
+  7019  aattaatagttttttataaaaaattaaataatatttaatttttcgagaaaactgtaaata
+  7079  ccgaattaatgattttactaaaaatgtcttagatgaattagagaaagttgtaaattgttg
+  7139  aatatttttctgccataaaattataggcaataatgctacataaacaattttttgtagtat
+  7199  acgatcctgtattaaaatgttcggtaacacattatactttttataaaattgtaaatacct
+  7259  ggcagcacttttatagtctaatcagaaattagaaatattatgtcttttaagaagacatca
+  7319  gtttaattctttcctttttcttaaaatttgctggaatgtgcaattaattaattttctaat
+  7379  tgaatataattcagttatagattttttagatttacaaaattgagatttattgcaatttga
+  7439  atttttaccactgctttttacataacatcattttgattttaatatagaaaaagtaaacgt
+  7499  tttgttatttaaatttgaaattgtacttgcttcttgacattgtatattatatataatata
+  7559  tcatcgaattgctggtgattctaaaatccattgttggagtaatttaactttgaagggaag
+  7619  agatgcagctgtaacagcatataaataattaattccttgactttgttgttgtaatagtaa
+  7679  taataccatataattataaatactaattgcagtatttatacatttcatgagctcgacaaa
+  7739  cgaatgttgttttaacaaccgaatattttcgttatttttatgagcgtatgctttaaggat
+  7799  cagattattcaatcgaattaaatcttttttccaaaatttaagcgtatcaaaacttttcct
+  7859  accaaaaaattttataagtttttttccatgaaaaatatacat
+;     G-cox3-I1-orf673 <== start
+  7901  ttttttaattttctaaaatatttttttgtattttttttaaattgattagaaaaaatctta
+  7961  tttttttattagattctgtctgaattaagaatacaaatgatgtatgtttacataacataa
+  8021  aatttaataaaatatatattttattaaattttattttaataaatattgattatactgcaa
+  8081  taaaagactattattgattaatttctgaaaaatccacacataattcaaaaaaagactact
+  8141  tacgagagtaacttttaaaaccaatttttatacattatttatcaaatacatcacatatac
+  8201  tacatgtattttgttaaaatgcgtacgtgaaatattttataaaataataattataaaata
+  8261  tctcttcttacaattatttattaataaccaatttatctatatgctagatatgtatcttgc
+  8321  cgacaaattcagagtatacccatgg
+;     G-cox3-I1 <== start ;; mfannot: no intron type identified
+;     G-cox3-E1 <== end
+  8346  AAAAAGGCTCAAAAAAAAGCAAAAAAAAATAAAACTTCTGAAAGAATGAAAAGCGCCATT
+  8406  CCAAAACTCAAACCAGTCTGTACTATTTGTGTATGCTGACCTTCGAAAGTTGATTCACGG
+  8466  ATTACATCTCGCCATCAACATGTAATGCAAAAAATTATTGCGATTAACCCAAAAAGAACA
+  8526  AACATATTACTATATTTATACGAATGCAAATAACTTACAAATCCACTTGTAAAAATTCAG
+  8586  GCAGCGCAAGCCGTGAAAATTGGCCATGGGCTAGAATCCACTAAATGAAAACCATGAGTA
+  8646  CATGTTAAAATTTTTTTTTTTAAAGATTTTAATAACAC
+;     G-cox3-E1 <== start
+;     G-cox3 <== start
+  8684  TTAAATCCAAGTTTTATTACAAATTTTTACAACAATTGTTAGTTGACCTAAACTATTATG
+  8744  A
+;; mfannot:
+  8745  tccctaattcgaaactatgcgtggtattttctaccacatagctt
+;; mfannot:     /group=II
+  8789  CTTTATTGTAAGTAAAACTCTCCTACTTAATGTACCATTATTACTTACAATAATAAACAA
+  8849  ACTTTGCATAGTATTGCTTAATCCATTCTTTTAATATTGATACAATTTCGCTTAATTTTT
+  8909  CATGTTTTAAAATTTTAAACTTAGGATTTATATTTTATACCAAATAGATTTTTTCTTTAT
+  8969  TTATACTTATGTTTTACGACATAAATACTATCTCTAATAAATAAATTTAAAAAATTTTTT
+  9029  TTAAAGTAAATCAATAATAATAATTTAAAATTTATCCATTTTCAAAATATTAATATTGTA
+  9089  GCAAATACATTAAATTTTGTTAAAACCTAATATATTTACTACAAATAATTAATTCTACTA
+  9149  ATTACAGATTATCATTATTAATTAAAAATATAAAAACTAATATGTAACCTTTTATAGAAA
+  9209  TAACAAAATACTAGAAAAATTTTATAAAATTAGCCTACTACATAAAATACTCAATTTATT
+  9269  TGATAATAGTTTTGAATTAGAAT
+;;     G-nad9 <== end
+  9292  TTATAA
+;;     G-nad9 <== end
+  9298  AAAATCGAAATCCCGATACTCTTGAGCCATTTCTAAGGATTCAGTTAAAATACGTTTTTG
+  9358  ATTTTCATCATACCGTACCTCAACATATCCACTTAAAGGAAAATCTTTCCGAAAAGGATG
+  9418  TCCATCGGTAGGATA
+;;     G-nad9 <== start ;; 138,182
+  9433  GGAAATTCTATA
+;; mfannot:
+  9445  aaccctccactaaaaccacgcatacaatttatattataagtggctt
+;; mfannot:     /group=II(derived)
+  9491  TCGTTAAATTTACTTTTTTCAATCAAAAAATTTCTATAAAATTTATAAACAGTATATACT
+  9551  GTTTCCATTTTTTGGAAAAAAAGTAATTTAAACTTTTATCAAATTATACTCTAAATGATT
+  9611  ACTCCAATTCGTACACAATAATTTATATTATCTAGTAAAAACGAATCCATATTTCAAATT
+  9671  TAATATTTTTTGTTTTCAATATTTTTATATTAATTTAGTATAAAAAACAGGAAAACTAAT
+  9731  AAATACCTTTTTTTGATTAAAAACTTATTATAAACTATAGAAACTAGTCTCTGTTTTCCT
+  9791  TTTTAACATAAAAATGTTATTATTTAATCATTATACAGCAAATTCACAAACTATTATTGT
+  9851  ATTTATATTTTATTAAAACCCATTTTAGCCAATTATCCTTTTATATTAAATAATATTATA
+  9911  TATTTTTATTTAACTGTATTTATTAAGAATAAATAGGGAATAATAATTAAATTTTTAAAA
+;;     G-nad9 <== end
+  9971  ATGGCTAACAAAACCGTAATCTGTTAGAATACGTCGTAAGTCAAAATTATTTATAAAAAA
+ 10031  AATACCAAACATATCCCACACTTCCCTTTCAAATCAAACTGCTGCTGGATAAATTAATGA
+ 10091  GATTGAATTAATTGTTGCTAATAAAGTTAAATTACTTTTTAAAAAAAATCTAGAATTTCG
+ 10151  GGATATACTTAAAAAATTATATATAATCTCAAAACGTTTTAATTTTGAAAGATAATCTAC
+ 10211  AGCAATAATATCAATTAAAATTTTATATTGTGTAAGTGTATGATTTTTTAAAAAAATAGA
+ 10271  AATGGGTTGGATAAATTCGTTTCAAACCCCCATGGCTATAATTTTTCTGTTTACGCATAC
+ 10331  AGAAATAATTCCACGCAAACAAGACTTTACTATATTTAAAGTATACTTTTCTAT
+;;     G-nad9 <== start ;; 6,143
+ 10385  TAATTTATGTAATTGTCCAACTTTCAAAACTTTTTCCAT
+;;     G-nad9 <== start
+ 10424  CGTTT
+;;     G-cox2 <== end
+ 10429  TTAAATTAATTCTCCATTTGAATC
+;;     G-cox2 <== end
+ 10453  TTCAACATATTTGAAGAAAATTCAAGATACATATTCTTTAAAAGGTACAGCTTCAAGCGC
+ 10513  AATTGGCATAAATCCATGATTAATACCACTTAG
+;;     G-cox2 <== start ;; 238,268
+ 10546  GTAGATATTATATAAAAATAATTA
+;; mfannot:
+ 10570  cccctaattgaacttaacaagcgcttctcaacgcattaagctc
+;; mfannot:     /group=II
+ 10613  GATTTCAATCTAAAATCTTAGTGACGAAATTTCACAATATTTTTTATATATTTATCTTTT
+ 10673  GGGACATTGTATTTATTTTTACAAAAATAATTTAGTCATAATAAACATATAAACAGACTA
+ 10733  TATCTAAAAAAAAAATATTCTATGTAAAATTTAAAAAATATATTAAAGAAAGATGTACAG
+ 10793  TTTTTAAAAATATTTAGTTATCTAAGATTTTCCAAACTGTATCTTATTCACTTATAATCT
+ 10853  TAATATTAAAATAAAAGCAAAAAGAAATATCTTAATACATTTTTATAATATTAAAATTTT
+ 10913  AAATGAAATTTTTATAGCTACATTTATTACTAAAATTAGTATATAATTTATATATCACAG
+ 10973  TATTCCCAACATCTGTAATTTCAACTGAAAAAACTTACTCAATAAATACAATCTGATATA
+ 11033  TATTTATTTTTTAGAAAATATTTACGTAAATTTGATAAAATTTTAACTGTTGGCTCTAAA
+ 11093  GTTTTATAGATTTCCCAAAGCTAGTGCACTATAATATTTTTATATTACACATAGGAAATC
+ 11153  GACTTGTTTCTTTTCTAAACAAAAATTTAATAAATTAACTATACCGCCA
+;;     G-cox2 <== end
+ 11202  ACAGATCTCACTACACTGGCCATAATAAACACCAGGACGATCGATAAAAACTAGCACTTG
+ 11262  ATTTAATCTACCAGGACATGCATCGATTTTAATGCCTAACGAAGGTAATGCTCAACTATG
+ 11322  TAAAACATCAGTTGACGTTACAATCGCACGGATATTTGTATATATAGGTAAAATAATTCG
+ 11382  TTTATCTACTTCTAATAATCGAAAACTCCCTTCTTGTAAATCATCATCCCCTATAAGGTA
+ 11442  ACTATCAAATAAGAACGATACATCTGTTGGTAAATTAACCACTGTATAATCTGAATACTC
+ 11502  ATAACTTCACTGCCAATA
+;;     G-cox2 <== start ;; 137,239
+ 11520  AGTAATTTCACAAGGAA
+;; mfannot:
+ 11537  ttttctttggcaagaaccgtacaagcgttttgcaacgcatacggctc
+;; mfannot:     /group=II(derived)
+ 11584  TAGTAAATTTCTACGTAAACGTATAAACACATAAAATAGGAATTATAATTTGCAGACTGT
+ 11644  ATTATTTTTTTAAATTAATAACTACTAACTCTGAAAAAATTTTCAATATAATAATCATAT
+ 11704  TTTTTTTGAAAAATTTGAAATATACCTTAAGCTCTATACGTATTTGATAAATTCATACTG
+ 11764  ATTTATATGGCAAAAAAAATTCAAATTTTCTGAAAAAACACTGAATTTTAAAAACATATT
+ 11824  TTTATAAAAGAATTTTAATAAAAAATTAATATTATTTTCATTAAACAAAATAAATACAAT
+ 11884  TTTGAAGATTATAAACTATAAAGGCATTTATTTAAAATTTTTTCAAAAAACTAATATTAA
+ 11944  TTTTATATTAAATTTTTTTTTCCTTCAAAAAATCGAATTCATTTTACTTTGTAAAAAAAT
+ 12004  ATTTTTTTCTAAATATTTTTTCTGCTAAATCAATCCTCCTATTATTATTTTTATCTAAAA
+ 12064  ATAAAATACAATTATAAATAATTATTTCTACTTAAAAATAGAATATAACTTACTCTCTGA
+ 12124  ACTACCATAACCTACTTATGCAGATTTTTTTAGCTATTAACCTTTTTGTAAATTTTTTAA
+ 12184  TTATACTAAAAACACGTACGTTATTTTCAAACTAAATAAAATTTCTTTAGTGTCTAATTT
+ 12244  ACCAGAAATAACAAATTTCTTTTCATAAATTTGACCATTTCATCGTAATCTTAAACTTAA
+ 12304  TTTTTTATAAGACTATAAGTTTAAATTATAAAAAATTATAATATTAAGCTTAAGAAAAAC
+ 12364  TCAACTTCTATCCTAAAATACTTATATCGAATTCAATAAATACCTATGGTTTCAACAAAG
+ 12424  TAAAATTAAACTTTGTTTTTAATCTTTTGTACATTATTTTTGTACGAAATTATATTCATT
+ 12484  TCTTACATAAAGATATACTTATACACCGACCAATCAATACTTTTAATTTTATTATCACCT
+ 12544  TCGAACAAAAAACTTGTGTTTAAGGATTTTAATTTAATAATACAACCAATTTACTTACTA
+ 12604  CATTTAAACTTTTAATTCTATTATTAAATAACTCAAGCATAAGAATACGTATTAATACGA
+ 12664  CATTAACGTAAACCCTTATTCAAAAACTTTGAAACCTTATACATGTATTACGTATTTTCT
+ 12724  ATAATTTAGAAAATTTAATGAATTTCCCACTC
+;;     G-cox2 <== end
+ 12756  ATACCATTGGTGACCAATAACCTTTAAAGTTAGAACTGGATCTATAATTTCATCTATTGA
+ 12816  ATAAAGTATAGCTAAAGAAGAACTCATCACTCCTACTAAAAGTAACGCAGGTATTAAAAC
+ 12876  CCATAAAAACTCTAATATCATAATAACCCGATCTGACATATGTCATTCAGTTGTTCTAGG
+ 12936  GCTTGTAACATCATATTGCTTAGCTATACAAAATAAAATCCACATAACAATTCCTAAAAT
+ 12996  TAAAAATGCTATGAAAAATAAATCTTGGTACAGTGTAACAATACCATCCATAATAGGAGA
+ 13056  TGCAGAATCTTGAAATTCAACTTGCCAATTTTCAGCAGAATCAGCAAATAACTCATACCG
+ 13116  AAAAAGATCCAAAAAAAATATAAATATTAATATAAAATTAAAATT
+;;     G-cox2 <== start ;; 10,139
+ 13161  ACGAAAAAACGAACTTTTTAATAAACACAT
+;;     G-cox2 <== start
+ 13191  AGAACACATAGATATCATATTTTTATGCTTTATACACAAACCCTAAAGTTTTTCTCTCTT
+ 13251  CAAATTTTTTACCAGACGTTAAAGTTTTATAAACAAGGACAAAAAATACTATTAACGAAA
+ 13311  TTACTGAAATATATGAACCTAGTGACGCAACTCAATTTCAATGAATAAACGCATCAGGAT
+ 13371  AATAGTACACTAGTCTAAATAAACCGTGCCCAAAACCGTATAAACTTATTATGCTAAGAA
+ 13431  TTACGGCTTCGAAGAAAGTAAAATTTAACATTTCTACCGTTATACGATAAATATATATGT
+ 13491  TTATTATATTAATTATATCAAAAATTATATACTATTATTTAATAACTATTTTTTATAATT
+ 13551  TTAACATCCGAAGTTATCCTGTATTATTAATTTAATAATATAAC
+;     G-orf621 <== end
+ 13595  TTATACTAAATTAATTTTAATAAATCTACTATAAGTAGATAAATATCGATATATATACGT
+ 13655  ACGTATTTTAGATACTGAATTATATTTTTTATACAAAAATTTTAAAATTCTTTTATAAAG
+ 13715  AATATAACTTAAAATTATTAACTGCCTATGTAGACTCTCAAAATAACGATAATATTGTAA
+ 13775  TATTTTACTCATAAACCTATTTACTTCATTTACAAGAATTTTTAAATTGAGTAATAGATT
+ 13835  CTTACAAGAAAATAACTGTAAAATCAATCTTCTAACCCGGTTAAAAAAATTTATATTCAA
+ 13895  TGAAACAGTCCACTTACTCAAAATTTTTTCATAAAAAGATAAATAATTTTTACATATTAT
+ 13955  AAAATTTTTAAAATTATATTCATTATTTTTAAAAAAAATGCTATTCAAACAATTAAACTT
+ 14015  CAATCCTGATAAATTCAACGTTGCATTTGGCCTACAATATTTAAATTTCACTATATTTGA
+ 14075  GCAATTTAACACACTTAATCCACATTTTATCAAAAATTTCACAAAAAATTCATAAAATCG
+ 14135  AATAAAATATTTACAACTTTTTTTTCCAAAAATTAAAAGTCTACCAGAATAATATACTAT
+ 14195  TTGTACCATTTGTCGATACTTCATTAATTCTAAATATAAATTACTCTTCTTTAACTCTAA
+ 14255  ACCACCGTTACTATTAGTAGTAATTTCTCTTGTGAATAAAAAAAATTCAAATAAAATAAA
+ 14315  TAAAAAAAAACTTAATAAACTTCTTAAAAATACATTTCACAGCATTTCATATTCAAAATG
+ 14375  CACTCCTGCGGCCTTCAATAAAACTTTATTAACATGGTTAAAAGCAAATGCACCAACTCA
+ 14435  AATTTTCGATAAAAAAAAATATTTTTTGCAAACTAGTATATTCTCCATAATAGGTAGACA
+ 14495  AGATATAAAACCCAAATATCTATTAATATTAATATCAATATATTTAGAAAAAGTCATAAT
+ 14555  ACCACTTATAATTAAACGTTTTCTCCGAACAGATAAATATCAAAATCAAAACTTATAACT
+ 14615  AATTTTACTTTGCCCAGTAGTTTTTTCGTGTAAAAAATTGTTAGAATTAACAAAAAAATA
+ 14675  TGAAATATTACAATTTCTTAAATAAAACAAATCCATATCATTAGACATGATACAGTTTGC
+ 14735  TAATTTTATATAACTTGGTATTAAATTATTTTTTTGTAAGTCTAAAACCTTACCTAAATT
+ 14795  TAAACAAATTCGAGAATACTCTAAATAGGGACGTAAAGAAAAACCAAAAATCTTTTGCAT
+ 14855  AATACAATCTTTTAATATAAATAAATAAATAATACAAAATTTCCTAAAATCGTATCTAAA
+ 14915  ATTAATAAGAGTTTCTTTAAAAGAAATTAACTTATAATAACATAAATAATACTTACAACT
+ 14975  ACAAATTAAACTTAACAACTGCAAACATCAATAATTAACTTTTTTAAATTTGATTAAAAG
+ 15035  ACAAATGAATTTTTTTTTTATACTATAGATTTTTTTTGGAAAATCTTGCCTTAACCTAAT
+ 15095  ATTTTTTTTAGATTTACCATATTTAGTTGTACTCAAAAATTTTTCTAAAAGACTTTTATT
+ 15155  ATAAACATTCTTAGAAGATAAAAATATAATATTTTTATTACCCATTACTTCACTAAGTGG
+ 15215  AAGTAACTGCGTATATCAAATTAAATACATTCGTACACTGAATGATTCCATAAATATAAT
+ 15275  TTGCATTATCACAAAAAATGTAGGTAATTTGAAACTTCCAAAATAATTAAGATTAGCCCC
+ 15335  ACCATATATTAACATCATTTTAAATACAACATGATTATAAAAAATAATTAAACTTTCTAA
+ 15395  TTTCAAAATTATCTCTATTTTAAATAAATTCGTATCTTTTCGTAGAAAAATATATCGTAA
+ 15455  CCGCAT
+;     G-orf621 <== start
+ 15461  ATCTACTATATGACAGCATCTAACTCAATACCTTAATAAATTAAAACTTTCTTTACAACT
+ 15521  AATAATATCTTTTTTTAAAACCTACCATTCTACGATAGTGCTTCAATTAACATTTTTAGC
+ 15581  TATTCTATCTGAAAAAAATTACAATAATTAACAAAAATCTTACCTAATCACAAATAATTT
+ 15641  TGCAAATTAAATATAACATTGGTATATACATAACAATAAAACTAACTTTTATCAATTTTT
+ 15701  TATACACGTCAACTTAACTGTAAACTATCCAATCGAAAACAGTACTTATCTTAATAAAAA
+ 15761  ATTTATAATATACTATTCTTATACATTACCATTATTTTAAGTATATTTGCTTGATTTAAG
+ 15821  GTATTATAATTATCTTCATA
+;     G-cox1 <== end
+;     G-cox1-E12 <== end
+ 15841  TTATCTACAATAAGAAAATAAATCAAGCTCGTTAACAATTTTTTTAATTTCAGACTTTAC
+ 15901  TCCTGGAATTCGTCTTGGCATTCCGGCTAAACCTAAAGCATGCATTGGAAAAAAAGTTAT
+ 15961  ATTTACACCAAAAAAAAATGTTCAAAAATGTATTTTACCTAATCTTTCAGGATATTTATA
+ 16021  TCCACTAATTTTACCAATTCATAAATAAAATCCGGCAAATA
+;     G-cox1-E12 <== start
+;     G-cox1-I11 <== end
+ 16062  ctccacaaagaatagaatttttaaaaacaaatccttctaaaaaactgcacatacaactta
+ 16122  attttgtatacagcttatttttataactaataaggcactatattatccattaaaaaatat
+ 16182  aaaataaatttaattaagtactttacttaacacttttattttattaagtttatcgattta
+ 16242  tactaaatttataattttaaaactaaccttttttctatttacctgtgcttatattaatta
+ 16302  ataaaaatatattaaaataataactacatatattcaataatctttcctttttaaaaaagt
+ 16362  ataacctaacgtattaatacaactatgttactaataaaaagctcctgtaatcctattgaa
+ 16422  ttaatttttttgtttactacaaataaaaatatgaataaaaatgaatttatatatttctag
+ 16482  attataatattataatatgtttacttataatttcaaagaatttactttatatagttatta
+ 16542  ttcaaaaaacaaagcaaatcttcaatacatactaaaatatactgaaatacattaaaattc
+ 16602  ttaaacgtaaaatttttacctttttcatatttaattgaattctatgattttacgctaaaa
+ 16662  tatatacatataaaccttataaaaaaattatcatatatgcacacgtccatctcttataaa
+ 16722  ttttttattaacttgttaaacatagatacatatattcaaaaattctactattcaaatact
+ 16782  tttttcaaaatcattattaacatccaattgattttaggt
+;     G-cox1-I11 <== start /group=II(derived) ;; mfannot: splice boundaries uncertain
+;     G-cox1-E11 <== end
+ 16821  ACAAAAACATAGCCCCCATAGATAAATAATACTTCGAATGCTGAA
+;     G-cox1-E11 <== start
+;     G-cox1-I10 <== end
+ 16866  aagcaagctgtattccacacataacaagcgatttttcaacggcattatgcgttctgatga
+ 16926  aacaaagcaatatttttatcacaatagtataatatcatctaaacaaataat
+;     G-cox1-I10-orf671 <== end
+ 16977  ttaacttttattaaaattaactttcaataatttttgtaatgaaaatccatcatatttacc
+ 17037  tgaacatattcatatataatgaattttacataaaaataattgtttaacaaaaaaaagtga
+ 17097  attctttcaaagtatctctgcagtcttatcaactttttcttcccgcgaagaaaaatatca
+ 17157  ttcagtagcttctaataaacaattcggaatacaacaacgaagtttctctactgaagtttc
+ 17217  taatgaaaaatcagacaaaattggaaaaaatactgaatctggagaaaaattagttaaaca
+ 17277  ccttttgtaaaaatttgttctctcagttggtatatataaaagaattactttacgggaatc
+ 17337  tttagaacatcgaactgttagatcaataccaaatttcttaaatgcccagtttgcagactg
+ 17397  ttttttataccgatgtgctaatgttaacgcagcacttcgttttaatgcatgtcaaaattt
+ 17457  gaaaaaaatctttcgagcaaaaataaaataataattttcaatattttgtataataagatt
+ 17517  ataccgatagacaacctctcaatctgaagcaaaggcaagtcatttatcttgacatttccc
+ 17577  aacaaatttaatacgtttgcctaaacgattaatcctaaaaaatccgaattcaacatactg
+ 17637  tttaaataatttcattattggaatgttaaattgtaattgaaacttagaaaattcttgaaa
+ 17697  tatatttatattagtactaatacaaaattgataatttttatgtagataataattcaaaaa
+ 17757  aaaaatctttttttcagaatattttataaaaatattaaatttcaaatcaatacctaaaga
+ 17817  acaacttatataattagataaacacaccaacactgcatgcataagttcttttttaccagc
+ 17877  aatacctaataaaatacaatttaaattccgcacataatataatttattatgataccataa
+ 17937  atattccttatttaataaatttaatgaatttttcattgttaaatatttgtgaattcgaaa
+ 17997  ttcgacaaacttatcaagctcccgaaaaaaaatatcataaaataacaagtttaacataaa
+ 18057  atcttgcaaatgtaatacactattataatagtaatcaccatcatataaattttcaaaaaa
+ 18117  aatataaccacaattccaaaatttattaattaacgaaactactcaataatcatttaaatg
+ 18177  atgactaataacgcttaaaaaaaaagtacaatttttaaaatcaaatatttgaataaattc
+ 18237  actcttgataaaccaagttacccccttccacttatctttaatatgttgcaaaaataaatg
+ 18297  attcttagtagtctcaaaaaacattttaaaagaaaatggcttaaacactacttctaaaag
+ 18357  tattttcaatgcctgctgaattaacttatctcgcataggtattaaactaaacaattttat
+ 18417  actaccataaacgttacttccaaaaaatcgcttaattggatgcggattataacattttga
+ 18477  ctctaattcttctgaaagctttacaatttgctctaaggtaagatttatcgaaaataactt
+ 18537  aactttcttgctattataataggatttcaaaaaataattacaataacaaattaacaaata
+ 18597  acgtggatcacataatattctatatataaaaaattgagtatcaacaatatttttactatt
+ 18657  aacgccagatccaatgtgaattaaaaaatgatctaaatctttaaaaaagattgtcaattc
+ 18717  tttaagagaaaaaatactttgaattttcttatcaataactacgcgcttaaacttattttt
+ 18777  gtaaaattttaacaaataccatgcttcactaatttgtttcatctgagatatttcacaaac
+ 18837  ttgatcgtaaattatatatcaaacatttatacttactaatacatccgctactgctatctc
+ 18897  gcttctaccaagtgtagaaaattcatatgtacacttgacatgtcttaactgcttagctca
+ 18957  agctaaaaaactatttaggtaacttattctaaacat
+;     G-cox1-I10-orf671 <== start
+ 18993  cctctttgaaaactattaacttttaaataaaattcgattaaaaacatttgtttattcgaa
+ 19053  taaaacattgtaatcagtgtacactccatatattatttattttagaataattaataacgc
+ 19113  aaccatataatcatattttaaaaagctcttataggtttaacctactaaaaaatatttact
+ 19173  aagaaaaatgttcatcttagattagcctttcaagcgttaatcgcccaacgtaatgaaaat
+ 19233  gtgctacaacataattgggatagtcaattttgctaagctatgtgtactcaactacattta
+ 19293  ctcccaatgaacatagcaaactaattacttagtactatgctctattatccgttttctata
+ 19353  tcctatttttcataaaaaaatgacttcataataaaacaaccaataatattaaaagcatag
+ 19413  tataatatttaaattaacaatgcacttagcctatttttatgaaaaatttaacttacataa
+ 19473  gatttactacgacagtaagtacttttataaaataaaataacgcattactaatttcaaaaa
+ 19533  ttcttaaaattaatctttaattcaattttctaaaaagaaattgtattaactaccctcaaa
+ 19593  ttactttgaactactaaatataaccagaaacgaaattacttcatataaaaatataacaat
+ 19653  ttcgacaaaacaatatacttcaatcaattatataatttatatttgctatatattaataaa
+ 19713  ataatgtttagaaatctctcacttaaattaatttaaaacataaaaattaaaccgttcata
+ 19773  ctttttctaatccatattattaaattaattatacatacttatctaataattagcgtatat
+ 19833  ttaatcttttctttctatcaatcagttttagaaaacacaaagttattggaattag
+;     G-cox1-I10 <== start /group=II ;; mfannot: splice boundaries uncertain
+;     G-cox1-E10 <== end
+ 19888  CGAAACCATAAGTATCATGAAGTGCAATATCAATTCCCGAATTTGCTAAAATAACCCCCG
+ 19948  TTAGTCCTCCAATAGTAAATAAAAATAAAAATCCAAAAGTAAATAAAACAGACGTATTAA
+ 20008  ATTGTATAACACCTCCTCACATAGTAACTAATCAACTAAA
+;     G-cox1-E10 <== start
+;     G-cox1-I9 <== end
+ 20048  ttagattggataaaaccataaattttcgccttttggttttaaactctattgaactttacg
+ 20108  caatacatcctatactataaagctcacatatataatttaaaacagcttactattaattta
+ 20168  ttaaaaaaacttttaataccctaagaacatatatgaatatttaagcattcaaaaatattt
+ 20228  agtaatttttttaaaatctaattaaataatatattgaacataaaaaaaatttagtaaaac
+ 20288  actaagataaattttttagtcatcttatatcataagatatgaattaaaaataataaaaag
+ 20348  cgaaatataaacaacgtactcaacatagccttagttatacatacttaaaatatttataaa
+ 20408  attaaatacttatattcaatattattcatttcccaaataaaagcccatcagtatatcaca
+ 20468  taattgcatcttacaataagatacttcctatttttagaaaataaattttttaacttattt
+ 20528  cttataaagaaatttcaaacaaaaacacataacataaattttcactattatgtaaactaa
+ 20588  taatagacagtaaacactactaccactctctatttttccttttgttactttaccaatctt
+ 20648  ttttcaaaaatatcttaccctaaatttttttctaaaaatacaatccattaacaccacatt
+ 20708  ctatcactacccactaatccaatctacaataaatttaaactaaatttctttgcatattaa
+ 20768  acgaaaataaatgatctaaattaaataaattattaataatgacatactgaacgtactact
+ 20828  ccca
+;     G-cox1-I9 <== start ;; mfannot: no intron type identified
+;     G-cox1-E9 <== end
+ 20832  AACTTTGATACCTGTAGGAACCGCAATAAT
+;     G-cox1-E9 <== start
+;     G-cox1-I8 <== end
+ 20862  aaggatgatcgatataaattttatctaccccttccgaaccttccaagctaattactcagc
+ 20922  ataaggctctgtaatgaaaacaactcgacttacgacttgttgataaaaagctaaaaaaaa
+ 20982  taatttctttactatatacaatatattgagaatataaaataaaattatccaaaataaata
+ 21042  caataggaacatcttacctacatgttaaccaataacgtaaaataaacatattaaaacaaa
+ 21102  tcttaagcctataaggacattaaacatatttaatatattttaactaa
+;     G-cox1-I8-orf385 <== end
+ 21149  ctaaactaaaataggtttatttaatgtaaaaaacaataacaatcccgtttggtaatatga
+ 21209  gatttctaaaaaaaatattggtacattactagtatgttgtatcgacgcaaaccgtcttaa
+ 21269  ttttaaaaacatattaagccaaatacaataacaatcagtagctattttagaatctgactt
+ 21329  aacacgtaaatgccaaggtaatccataaggagaacagtttgcttgatttttcaaaaaatt
+ 21389  acgtataacccaagtagagcgccgtaaaccctctaatcgaaacttctgaattaaatactt
+ 21449  tttaaatactttatatattagtttgtcaaaaagttttaatttatttgaaatacctactct
+ 21509  cacaaaataactaaaaatcttttgaataataacattaacttttaatattactatttttgt
+ 21569  agctaaaaccaaaagacttttaatagtttgaatcaaacactgttttaataaactaaaaac
+ 21629  ttgtcaagtagggtatatagacactattctcgaagaataaaaacaatctgaaattaaaaa
+ 21689  attcttacttaccgctccaaatttaataaaaaatataaagcctaaatagaaacaataaat
+ 21749  tttacttaacctctgaaaaaaataaggctgcactctaatatatatcaaaccaacttgata
+ 21809  aaacaataactgaagttttgaccaaaaggccataatatttatgttatctgtaacaagtaa
+ 21869  taaaatactcccattacaataaaataattctataggaacataactataatctatttttcc
+ 21929  taaaaaataacatcttttataaacccccctaattttcttatcattaaatttaatacagca
+ 21989  cttaaccaaaaaatcattaaatatccatcaaattacaacattttgcactaacaaccaaac
+ 22049  acgcaatcaaatttctaaatttatatcgcaagaaaactgtagtttgcattggctagtgta
+ 22109  caaactaatccactgctttaatagatatctcaaatacaaaggaatgtgaaaaaaacatct
+ 22169  attaattaaatctgaatttaatttctctgcgtatttaaataatttgattttaaaaaaagt
+ 22229  taaacgtttattacccatcatttccttcaatgttctaaaagcacacgtagcatttcgacc
+ 22289  ttttcgatttgaatacat
+;     G-cox1-I8-orf385 <== start
+ 22307  attccgtgtaaactttgcttcataatagggttcaattaaataacaaaaaataacctgtac
+ 22367  tacattatctggaaagtttctttctacataacctattcatttttttttatttcaattaat
+ 22427  tcaaaatcattttcgaaataaacgaaatgtgcttaataaaatacaataatttgattcttt
+ 22487  aataccattaatttttccagtaaattttctaattctagaaaaaatatttcctatgtctgt
+ 22547  taccgccaaataataaatacaaatattaataataaatttatttattaatacaaacgaact
+ 22607  agcatcttggttattagatgaaagacgacaaatttctcgttgtaatttatacaatatccc
+ 22667  caaaataaatttatgaaattttatacccaattgccaaaccccgcactcaaggcacttaac
+ 22727  taattgatttgtatagaatttaaactgtgataaaaaaaaccgttctattttttggcctaa
+ 22787  atgcacctccctaaaaatcaaattttcataaaaaatctacaattatttctcttaacccta
+ 22847  catttataaatacccttaaatctagttaaaagcattcaatctgaataagtatataaacct
+ 22907  acttataaaatttacctaaaatctaaaaatcaaatactcatagtatacatatttaattta
+ 22967  agatatttttgctgttttacttaatttgaatttactgtttaatcaacacattgaatcctt
+ 23027  aataaactatattaatttttagacaaaaattatataatattttatcacacaaaaagcaat
+ 23087  attctaacagtacaccaatatcacaaaatattttaagtgcatcaaatcaatcaaaaattt
+ 23147  gaaatttatataatttaatccaaaattaaaacaatttttccaagaattataaatagaaaa
+ 23207  attatgtatttaatttatttctaataattaacttagaaataaccaccaacatcgcaaaaa
+ 23267  aagtatctttaataatctatcat
+;     G-cox1-I8 <== start /group=II ;; mfannot: splice boundaries uncertain
+;     G-cox1-E8 <== end
+ 23290  CATAGTTGCTGCTGTAAAATAAGCACGAGTATCTACATCTAAACCTACTGTATACATATG
+;     G-cox1-E8 <== start
+;     G-cox1-I7 <== end
+ 23350  gttaagatcgaggtacataacctctccctcttgaactgtgcatgccagttagccagcaca
+ 23410  cagctcacaataaaaaaaaatctttttaaacgattactaattaaacaaggtttaattacc
+ 23470  tttaattattttgtaattttttcactaaaataatacataatatagcacaaatattaatta
+ 23530  gttaagttataaagtaaatctaataatatttttaaaatcttccaattacaaaaattttct
+ 23590  cttaacgctctgatattatttttcctaaaaaaaaaataaattcaaaatcctaaattccat
+ 23650  tttacaatatccatataccacatttctaaagtttaaatagaacattaactgacttctaat
+ 23710  tagaatcgctatatgtaatttgtactatataaaaagcaaaatccaacggatcctcttaca
+ 23770  aactcacttgcataaaccaaagattcgcattagccacataaaactaatcatttggtacta
+ 23830  gtatatgcaaaatcgttttacaaacatttcagagaataccactactttcctctgcttctt
+ 23890  cttactctatcgctggataactactcaattagctgctaatttatacactcacaatagttt
+ 23950  attttttttacaaaactatcttcataaattaacaaaaattttaataaaaaatttaattaa
+ 24010  ttctaataaaattccagtcgaac
+;     G-cox1-I7 <== start /group=II
+;     G-cox1-E7 <== end
+ 24033  ATGCGCTCATACAATAAATCCTAAAAAACCAATACAAAGCATA
+;     G-cox1-E7 <== start
+;     G-cox1-I6 <== end
+ 24076  attgttataggtaataaatcaaattctatgattcataaaccccaactcagatccgtacac
+ 24136  gcaaatctctaagcatacggctctttaaatctaaaatagcaaattttctatctttcattt
+ 24196  tcttttaacaataccaaattaataaagcattagatatttacaacatatactaaattaaca
+ 24256  tttcttctttaatcaacttatagaaacaatctttttatcttttatacgcatattaattaa
+ 24316  caactgacactttaaaaattctcttatataagatatcaaaagcacattaaaaataaaaaa
+ 24376  atctcataaatt
+;     G-cox1-I6-orf676 <== end
+ 24388  ttatcgtcgcatcaaaaattttttatacagttttcaagaataccaatccaaacttctacg
+ 24448  aacttcacgttttccaattaacggtaaaaaataataaaaccaattactaagcaacaatca
+ 24508  aagttttgttttcaatatacttattgaaaatctaccacttgataacacattagaatataa
+ 24568  gtaacgtattttacatcgtagtgtaacaataccactgtttataggatataagctaaaatt
+ 24628  accgcaacataaaaacatacttaatataccccaatttactatagtcggaatacgtaccca
+ 24688  tttaaataagtgtcaataaaaaattcagatataaaaattaaaacaattcctatatgaata
+ 24748  ctctcatttaaaaatatctaattgagactctaacaacgtaaaattacgttttcaaaaata
+ 24808  taaacttatcctaaactttaaatttctaatctctgctaacgtacgcaagttagttataat
+ 24868  aagcctattcccatactgtatatgaaaaattttcttaaaataatcctggcacaaacagta
+ 24928  attaaatcaattataggaaaaatgcgacaattctattcgtcaattttttattctcctgta
+ 24988  attacatatatatataaaatgaaaattcaataaagaataatatcaaccaacctccctaat
+ 25048  acttattagtaaataatttatgaacacaaaacctaaaatctttattgaaaaagcgcatac
+ 25108  ccaattagtttttaatttttcatcaacacccagaatataattcacacctaacattcatcg
+ 25168  aaaatttttaaaatttgatacctttctaaaataatattgtaattttttcgagatatttat
+ 25228  gcaatttgtaaaccaactaaaatcagcacaactaaaataatttttaaaattaatatcaaa
+ 25288  tatataaaatttctgaaaaaacccaatacaactttgattatcaatattattatattctga
+ 25348  taaagatttaattttttgcaaaaaaataatttcgccactttcacttaaaccataattaaa
+ 25408  tcataaactatactgctgaatatcatagctaattttaaaaattgtaggaaatcattgaca
+ 25468  attttgtcgaagtgagaatccggaaaaaaacacattatttaatacttctaataagggctc
+ 25528  aatgttaaattgaactaatttttgcaataacttctcaaatttagaaaattggtaaaaaat
+ 25588  ttgaaactttttatatttatctataaaaaaataaactaattttaaattctttaagtattg
+ 25648  acaaaatatagaattagtataaatactattaaataatccacatttatgtgccgtattcaa
+ 25708  acgcatcaaaaaaattctgtgatgcatcaaaaattttattggtcctttaaaaaaaattca
+ 25768  ttttttttcaaatgaatctaacattcattttaactttaaattatcataatttaattcatt
+ 25828  aaaaaaccccaataacatcagcgtgtaccgagccttagttgaaaatattttaaacctaac
+ 25888  gcgtcaatttacacaaaacttcaaatatctttcatgaatgctattaatattttttataaa
+ 25948  agaatcttttgaaatttgctcaattatgtaaattttccaaataaatgaattgatattaga
+ 26008  ttgtattcgtaccaatcaccctttccaaaaaaaattaggtaaatctaataaaacaagaca
+ 26068  aaatcgccataattgacagcgcttgagaaaaaaaactgaatctataataaatttagaaaa
+ 26128  aatatctgaaaataatataataattctttgaaataaaaatctaaaattcatatcaatttg
+ 26188  ctgaacaatccgctgtgcacaccacgcgtaaatcataggtcatatttcttttttaaccaa
+ 26248  tcaaaatacttctaaaattaattttgattcttctaaaaaacattgatttttaaaagtatt
+ 26308  tacaaaaatatttttaaaaattgaatgatgatatcttacttgaaaacagctattatatag
+ 26368  atttcaaaatttattaactttatctaaccgacttttaatcgactgtgccat
+;     G-cox1-I6-orf676 <== start
+ 26419  cttattctaaaatacgttttccttttatctttacatgtctaatctcgttacttttaaata
+ 26479  aacaatacgctatttatattttcttttattttatacattcataaaatttaatctattttc
+ 26539  taatatattttagcgttcacggaaatatgtgcgaaataactactcatgcaaataaaactt
+ 26599  ttggttaaatagtatttaaattatttctaaaaattaaccattataaccctaaattttgtt
+ 26659  tttatattcaaataaaattaaattgaactgctactatttttagagttgtataaacaccac
+ 26719  ac
+;     G-cox1-I6 <== start /group=II
+;     G-cox1-E6 <== end
+ 26721  GCATAAACCATACCTAAAAAACCAAAAATACGTTTTTTTGAAAATAATTCTATAGTTTGA
+ 26781  CTTATTGTACCAAATGCTGGTAAAATTAAAATATATACTTCTGG
+;     G-cox1-E6 <== start
+;     G-cox1-I5 <== end
+ 26825  agtatgatagattacaatatttactaatgtagccccataccgaactgcacaagcaattta
+ 26885  cactgcaaacagctcttaacaaacaaattataactttt
+;     G-cox1-I5-orf550 <== end
+ 26923  ttataaaaaaaatcttttattcaaactataaataataaaatgccaagcaaatcaaataga
+ 26983  atggctaaataaactaaaagtatacgaaccaaatcaattattcgcttttaaaaaaaaact
+ 27043  cctaaatatcttacacattttgcataaaaattgaattagtactttacgaaattctataca
+ 27103  aactttcttattaactaaaatacaactcgatacagaaaaatctatatatatcacaaattt
+ 27163  taaccttaaacttctcaaaaaactccttactatttgaacatagcgttgaaaaacaaatcc
+ 27223  taaaaaatatatactaatgtttttataccataccaaccactcaatagttcgaatcactac
+ 27283  agttaacccacgcgccttagaaaacactttaaaacgttcttgtataaaacttaaacaact
+ 27343  tttcttcctaataacaacaataaaaacatctttataacgaaccaacaaccataacctctc
+ 27403  ttctaactgcttgcgtaaatataatgttgaatttaaacaatttcttccttttttttgacc
+ 27463  gttgagtctctcaatttcaacattaaaaatatcccttaacccatctaatataaaatttac
+ 27523  taaagaaactccaattctgcttttggaaaaaaatcgatttcaaatataacctcctgtttt
+ 27583  caaccagaatgacagatttcttaatcaatatattataatatttttaaaaaaatagggcat
+ 27643  tggaaaattaattcgaatccaattacagccctttgtatcaaaaaaatttacaaaataccc
+ 27703  gcataaaatatgtttaacttctaataagttagtagaaaacctaaataacctcttttgggt
+ 27763  acaaatcgtattttctgacaaaatcgtatacaagtgtagaagagcacgcggcgcattccg
+ 27823  tcctacacaataaccataattatcaaaatcagcatgtacatcaactactggttctaaaag
+ 27883  ctgcataaacaattcttgcacaattttatcgtaaataaaaaattttactgagaatttact
+ 27943  aatccgattataatttaactgcttagagtagaaagcaactaaatttttctttgtaatttt
+ 28003  cgtaaaccaagtgaatttatgcgttgaggcttttagatacagttttggaaccaaatattg
+ 28063  aaacgacttacttacaatttcaatcgcagctaaacatacatctggctttaaaactcaatc
+ 28123  cattatcagggattgaactaagattgatcgcataccatgtttataagaaagtaatgatat
+ 28183  gtatttttgacgtaattttactaattctaaaatttccatgctatagctacgcatgggtca
+ 28243  taattttaaacttaacatagtaactaattttggtattaattttttacttttattcataat
+ 28303  tctgaacctgtattgtagtacaatacctcttatactatgaattaaatagaaacgtttaat
+ 28363  tctttgctcattataatgacaaatgccaatatgccatttaatatttcatctatcaacata
+ 28423  tttgctaacagcaacgcttcgatattttccactagaaaaccacagcttaccgtagactac
+ 28483  aatatgcagtagaaacgtactctttctgaataagctgcttataataaactttgagtaaaa
+ 28543  cataataccatcgtgtgatccaccctttaacat
+;     G-cox1-I5-orf550 <== start
+ 28576  atttacctaattcggaaaaagcttcctttacatatactttggcactaaagtatttgacta
+ 28636  attacctttttaaaagaaacaaatctaaatgataaatattatataatattaccaaccatt
+ 28696  catttaaattcgatgtataaaaatgtgtcttccagcagatttttgctctttctaattaca
+ 28756  aaatgataaagggaatatccactaaaatatattcaaaatgttacgtcgccc
+;     G-cox1-I5 <== start /group=II
+;     G-cox1-E5 <== end
+ 28807  ATGACCAAAAAACCAAAAAAGATGCTGAAATAATACAGGATCACCACCACC
+;     G-cox1-E5 <== start
+;     G-cox1-I4 <== end
+ 28858  atttaaaatagaatctttttactaaaattctttaaaaaaaactgtacttgttacttatta
+ 28918  acatacagcttaactaaataaatataatttactcatattgaaatacttgctcttaaaatt
+ 28978  atacaccaccagctaaaagaggattcttttacaagcatcacaattatcttcttttcatag
+ 29038  aattcttttaaaaaaataattaacaatcattattaaaaaaatttcatgaaattactgtat
+ 29098  aaatttttaaacacactctcagatactactacaagaaaatctaaattaaattaattaata
+ 29158  tgaaacgcaaaatttatccaacaataataaatttttcccatattgaaaaatcaaatacaa
+ 29218  tttttatttattaaaatctctaatcttcattaagtacattactaattataaacaaattac
+ 29278  aattcaactaactgatatcgtcatctttttcttttttcacacaacataacgactaaatac
+ 29338  tacaattttaattaaaaaactaatctaaaaattctaaaaccaactaagattataatttta
+ 29398  atgataaaataacacatcacac
+;     G-cox1-I4 <== start /group=II(derived)
+;     G-cox1-E4 <== end
+ 29420  TGCCGGATCA
+;     G-cox1-E4 <== start
+;     G-cox1-I3 <== end
+ 29430  ttcttgatagaatttattttatttatataaataccccaattaaaacttattaagctaatc
+ 29490  tcttagcaataagctctttaaattttttcttaaaaaatttccgtaaatacataaattttc
+ 29550  taacgtatatacaacttttataatctctaaaaaaaaattatttttacccgtttttataag
+ 29610  taaaacattttattgctcaagtaaacaaaatcttaatttatttcgctttaatcgctaaca
+ 29670  cactttgtttactactaacaataaaaccgccattattcccatacttaatcttcacacttg
+ 29730  taaataagcgaaactaaatctaaaaattttgttattctgcgttgtttaacgttttattta
+ 29790  cttaaaagataatattatatagctctgaattttttctcaatatcaactaatatatacact
+ 29850  atatctatttaaattaaacataatcaaaaatttacttctactcaaaaaatacataatttc
+ 29910  aactaaaaacaacactataatccacatttaaacactaactgcttcgatgtcataaaaaaa
+ 29970  atttttactttaaaaaagaaaattgaatcggcgcac
+;     G-cox1-I3 <== start ;; mfannot: no intron type identified
+;     G-cox1-E3 <== end
+ 30006  AGAAACGTTGTATTAAAATTTCTATCAGTTAAAAGA
+;     G-cox1-E3 <== start
+;     G-cox1-I2 <== end
+ 30042  taatagttcgatcagttattctttaaattaaccagcgctatttcacacagaaacaagcta
+ 30102  atttcttagcatatctgcgttccgataattctattgagaaatgttttctaaacaaatgaa
+ 30162  aacctaaaaaatctataatataaaatttatacacaactgctaattgtttaaaagtattat
+ 30222  taaattttaatgaatctaaacacacacgcataccaaactcacaaactaaagtaacctaaa
+ 30282  tcttaatatctaaattttttataattatt
+;     G-cox1-I2-orf580 <== end
+ 30311  ttatctatctttaaattctgagacaaaatccataattaaatttgtaccaaaacgaacata
+ 30371  aacctttttagcagattttagcttataccaatgcgctatcgtcaacgctaaacatctctt
+ 30431  caaaagataaaaaatttcatataaaataaccgtattactcgtaattttgtaataaagttt
+ 30491  aatagctatccataatctaccaaaccatcttgtaatagcgtcaatagaacctaaagctaa
+ 30551  taatttatcacaacgccgagcaacatattttatatgatttgttttgcgcgcaatttgaaa
+ 30611  aaaacctaattttgtataatacttatataactgagataaaggtactttaaaaaaaatacc
+ 30671  accagcacaaacttttttaaccaaaccaggcgcagtatttaaaagaattccattttttac
+ 30731  taagttatatcccaaaaaatgagtacctatcccactacaacaataaattccagttttagc
+ 30791  agtacatatttgtaaaaaaagtttcgtttcaataaaaaaaacaatttgttgcaatatcaa
+ 30851  taaagcttcctgttcagagcctataaaatataaaagaatttcgtttgaataacgataata
+ 30911  atgtaaactactacgcgaataaattcccttagtatgaattctcgtcaacttataaaaata
+ 30971  tttaacaaacttccttttaatcaatctagaccaaaaacaatttattttaaagaacatgcc
+ 31031  tcttcaaaaaaatgcagacacaatatctgaaatttcaaatctaccagtcaaacttctatt
+ 31091  aaattgaggaattaagctgctataaatcattatatctaattcatgtaaacaaatattcaa
+ 31151  aataaaaaaagaaaaaatatgttttaaaaaaaactttttacaatattttacagtataatt
+ 31211  attcgaaaacactacaaaattatccttgaaaacttgtattattaattgaattaaagaata
+ 31271  ttcacaaagtttactatagagaatacaaaataaagattgaacagtaaaaatactatctga
+ 31331  cattccaacaatattcaagttaattgatcaaattggagatttagttgttgagcgaatacg
+ 31391  tgataaacaagaaaacacattacgtctataccgaaaaccaaaagaaacattcaaaaatct
+ 31451  atactcataaatgggccctaataataaaattatagcctgttgaataattacctctgaaag
+ 31511  atgattaaattcaaacttatcactgtaaaaatttcgtgaattaacaaataacctcttagc
+ 31571  acaacaatatgtacctaaccgaatactttctgctaagtaaacaatcccacctaaagtagc
+ 31631  tttcatcggaaaatttgaaatacctcgaattctaatccccctataaacataaatcaaaaa
+ 31691  attaggatctatgagtaatttaaataaaccactacttttaacagacaaacagcgtttgca
+ 31751  attaagaacaaaagcattgtattcatataaaatgccaattaacctctcctctgaaaaatg
+ 31811  ctttcggattttagcagaaactctatttaatttaggtaaaatttcactatataaacgaaa
+ 31871  accttcccaatataattgatttatctgaggaacttccctcttcgaacaataaaccgttgc
+ 31931  aataaccatatgctttaaaactcctatgtttttaaacggatgccgataatttagtactcc
+ 31991  aacctcgcttttacattttaaagaaaattttctttgaactagcattaacttaccaaaatg
+ 32051  cat
+;     G-cox1-I2-orf580 <== start
+ 32054  atcatttcttcttcaagtatattttattttatcataaaataaatgattttagactttatg
+ 32114  cgattcataaatactacatacatgtatcattccaactacaacaatccagagacacaatgc
+ 32174  gacacactgatgttgaatttaatataatattaatcaaaaatttataataaaaatatattg
+ 32234  tgaaattaaaactgaaccgtcccaaatacctcaacgtacccttaaccatttaaaaataac
+ 32294  tctaaatgcttttttaaacgaaaatcaaaccaatgccacttcgtataaaatttagagcat
+ 32354  tctaataactcacaagtattaataagtatactttaatagaaattcgcccc
+;     G-cox1-I2 <== start ;; mfannot: no intron type identified
+;     G-cox1-E2 <== end
+ 32404  ATTGTAATTCCTCCTGCAAAAACTGGAAGAGATAATAATAATAAAAATGCAGTGATGAAT
+ 32464  ACTGATCAAACAAATAAAGGAAGGCGTTGTCAATTCATACCTAACAACCTCATATTAACT
+ 32524  ATCGTAGTTATAAAATTAATTGCACCTAAAATTGATGAAATTCCTGATAAATGTAAACTA
+ 32584  AAAATAGCCATATCTACAGACGGTCCTGAGTGCGATTGTTCTGCAGATAATGGAGGATAA
+ 32644  ACCGTTCACCCGGTACCAGCACCTACTTCCACTAAAGATGAACCCAATAATAACAAAAGA
+ 32704  GACGGAGGTAGTAATCAAAAGCTTACGTTAT
+;     G-cox1-E2 <== start
+;     G-cox1-I1 <== end
+ 32735  aaaaagtaaaacgttggatagaaaagcccctttttctattttatttaattctaagcccaa
+ 32795  cagagttctcataaaactttacgtatcaatcccttattataaagcttttttatatatcaa
+ 32855  atttttaaaattgtacctaactttatattatgttaaataatataagaaccaatacacaat
+ 32915  attcaaacagatatacattttttagtttctaccaactggtatcatactatgtataaacat
+ 32975  tctaacaataatactttaattaagtaaaattcttattcgttgcggtattccaataacttt
+ 33035  taccatttaccttttatctaaaaaaattattaaactgttgtcttaaaatttccaaaatat
+ 33095  ttcctcacctgaaaatattataaattaaattattgattaaacaaataaaagtattacgta
+ 33155  aaatcaaaactgcaaaacacagttttaattacctaaatataccccgtaaagtaactaaat
+ 33215  ttttagattcaacgctactattcaacttttcaacaaaattagataacaaaatatattaaa
+ 33275  taaactgaacatataaaatacacatatattacagaatgcattcgcaaccacct
+;     G-cox1-I1 <== start ;; mfannot: no intron type identified
+;     G-cox1-E1 <== end
+ 33328  TTAATCGAGGAAATGCCAT
+;     G-cox1-E1 <== start
+;     G-cox1 <== start
+ 33347  ATCAGGAGCACCAATATAGATAGGAACAAATCAATTTCCATTTGAGATAGAGCATAT
+;; mfannot: G-cox1 <== start Def by similarity
+ 33404  TTAACCAGTTAAAAC
+;; mfannot:
+ 33419  cccctcaaagaaccatatttgcaaagcaatccacacatggctc
+;; mfannot:     /group=II
+ 33462  AATATTATACATATCGCTAATGCACAATACTATCTCGAATTTTACAATAAAAAATTAATC
+ 33522  CGATTATCTACTTCCTTGCGATACAAAATTAAGTAAACAATATTGACTTCCTACTTTTAT
+ 33582  GCGCTATTAAAAAAATAAATATATTTAACTAAAAACCTTACTATTCTTAACAACAACATA
+ 33642  AATTATAACCAGTACTTACAATTACATACTTTCACCGCAAATTTACCTTAAAAATATGTA
+ 33702  AATTTTTTAATTTACTTCTAAATTATAGATATTATTAGCCCCTTTCAGTCTAATTTAAAC
+ 33762  ATATTTTAACTAAATTCTTAGCAAATCAAAATAGCTTTAAATTTAATCTGCTAAATACTT
+ 33822  ATTTCTTATTTGTCTCACTATGCATAAAGATTAATTTTTAAAAATATATTTGTATTAAAC
+ 33882  TAAAATATCTATTTACTTTATCCTTAGCTGCTTAACGTAATTAACCTGAATAATAGATTA
+ 33942  ATACTAAAAAATACACTAAACAAACACTAAAATACATATCTTTTTGTAACATAAAAAGCC
+ 34002  ACCAATTAAAACTGGCATAAGCATAAAAAATATTTGAGTCAAATATTTATAGCTTGAATT
+ 34062  TGCTCATAGAATCAAACATGATAATTGCATACCATTTAACTCCACATTTTTACTTTAAGA
+ 34122  AGAATTTTAATTTAATAAAAAATACTAAATATTCCTTAAAATTGAAAATTTTTATTTTAA
+ 34182  TCTACAACAATCATGTATAAATTATGAATGCGCGATAAATCAATCCATTTTTACCCTATT
+ 34242  TTAAGCGATTTTATCACCGCGTTTTGCCAGAGAATAAAACTCTATCTTATATACTATTAT
+ 34302  TTACACAAACTCTTCTAACACCATTAAATCAATACATACTAATAATAAATAGAAAAGTTA
+ 34362  GTTCCAACGCCCCCTGCCAATTTATCCCATATAACTATAAATAAACTAACCAAAACCGAA
+ 34422  CTATATATTACAATAATATACAAAATTAAATTAGTTTTATTTTTGAAAACAATGTATCAG
+ 34482  GTTTGTAATTCTAAACGTAAAAACTTTTTAAAAATCTTTAGCACTTGTTATCTCTTTTCT
+ 34542  AGCTTTTTCTATGATATAATAAATCCTACAAGTTTTAACTTAATACAAGTCATATCTATT
+ 34602  ATTAATTAAATTAAAATAGTATATAAATAAAAATAAACTTACTCATAATAAAAGCATGTG
+ 34662  CCGTAACAACAACGTTATAAAATTGATGATTTCCTAATAAAATCTGGTTTCCTGGGTAAG
+ 34722  CCAATTCAGCTCGTATTAAAATTGATAACGTTGTACCAATAACACCAGAAAATGCCCCAA
+ 34782  ACAATAAATATAAAGGTCAGGTAGGTACATATATTTTTT
+;; mfannot:
+ 34821  ccctcctattaatctgtacatgcgttacaacgcatacagctt
+;; mfannot:     /group=II
+ 34863  AGTTTAATTTAGAATTTATCAACAGTATAATATAAATAATGATAAAATAATAAATATTAA
+ 34923  AATCATTACCTTTTACAAAAAATTCAGATCAGATTTCCCTTTATTCAAAAAACACTATCT
+ 34983  TGAAACACCCAATTATATTTTTCATGAAAATTTTTATATTAAATTCTTGATACCATACAA
+ 35043  AAAGTTATTAATAAAAATTCAATTTTTCAATTTGGAACCCTTGATTTATTATAAAACCTA
+ 35103  TATAAAAAAATTAACTAAATTAAAATTTCATATCAAAATTTTTTCATACGGTTTAACATT
+ 35163  AGTATCCTTATAAATATAATTACAGATACTCCTAAAAAATACTACAAACTATTAAAATTT
+ 35223  TCTTTAAAAATTAAAAAAATACACAACTGAAAAACACAGTAATCGACACACTTCCAAATT
+ 35283  GGATAGAATTGGCAACTAAGCAA
+;; mfannot:
+ 35306  atcccccaagtaaactgtacatatgaattaacttcacatacagctt
+;; mfannot:     /group=II(derived)
+ 35352  AATTCAGGTTAAAAAAAATTATTATTACCACTGTCCATTTAAAAAAATGCTATATAATAA
+ 35412  AGAAGTTACAGTACAATTA
+;     G-orf526 <== end
+ 35431  TTACATTAACCACTGCAAATATGCTCTAATATTTGAATGAAATGCATGGATTCTACGAAA
+ 35491  TTTAACAGGAAGAAAAAAACTATAAATTGATAAACAATTTGTATCTACAATAGGCGACCA
+ 35551  CAAATACACAATCTCAGTATTGTAATTTAAATTACTTTGTAATTTATTTTTAATTAAACC
+ 35611  TTTAAAAACTCATTTGCGCTTAGAAAATTTAAATGTAGTTTGTAAAAAATATTGATACGC
+ 35671  TATCATATTTTTTCCCCACTTCGGATGTTTTCTGCGAGCCCAATTTCAACATAATTTAAA
+ 35731  TAAATAATTATCTAACTTCAAACGATATCAAAATGAATATCCAAAAGAATAATATTGGCA
+ 35791  TCACCGCATAATTAAAGGATTAACCTGCGTTATTAATTCAAAGGCTGTTTTATGTGTTTG
+ 35851  ATAATAAAAAATATCATGCAATTGCCTACAAATAACTACAAATTTACTAAACGTTGGAAA
+ 35911  TAAAATAAAAAAAAACATATTTACATTTTTACAAGTATAACCAAAATCATACCCCAAAAA
+ 35971  CGATAAATTCTCATTTTTTAATGAAAATAATCTTAAAATATACTGTGTTGCATGTACTCC
+ 36031  CCTAAATTGTAAAAAATTAACTATAAATGATCTTAAATTTAAAACTTGTAACCACCATAA
+ 36091  ATTCCCAATTATAATAAATTCCCCAGCATATCTAATAAACTGAAAAGTATCAATAACCTG
+ 36151  GCTTAAATTACAATGCTTATTTAAATTACTACTAATAGATAAAGATTTTAAATAAAAAAA
+ 36211  CCGTCCTCACCGCTTTAATTTTTTCTCCAAATTATTTAAAATAAAATTAATAACAGTATT
+ 36271  GGTTAAAATTCCATTTACAAAAACCCCACCTTCTGCTGAAGAGGTAGTTAAGGCTTTTCG
+ 36331  GCGCAATAGGCCGGAACATAATCAATTATGCAATAACGGAACACATCTAATAGGTACCGG
+ 36391  TAAATATTTTAATATTCACGTAGAAACAGAAAAAGTAAAAAAATTCAAAACATTACATTT
+ 36451  TAAGATTCCTACTTCTTTTTTAAATTTTGATTGGATAGCAACATAAACATCTGAAATAGC
+ 36511  TTGCTGCTGTGAACGATGTCTTCGGAATCCGTAATTATTATAATCAGAAATCGATTCCAC
+ 36571  AATAGGCTCAAGTATTAAATTAAATAAACTTTGAGCTGCACGTTCTTCTAAAGTAACTAC
+ 36631  ATATGAAACACAAGTTTTTTTTTTGCTAAACTTAGAAAAAATTTTATATTTTACTAATTC
+ 36691  AAATTTCAAATCTGAAAAAGAATTTAATTTAGCTACTAACTTAAGCTTATCTATATTTCG
+ 36751  ACAGACAAAAGATTTCCTATTTTGAACTAATTTAATATCTTTACTCTCTACAACACGACG
+ 36811  AACTGCTACTAACTTAAACACCAAAGACGAAAGTAAATAATTTTGATATTGTTGTACTAC
+ 36871  TACATGACGTGATCCATATAAAACTGTTAACTTGGCTAAATTCATCTGCCTTAAATAAAC
+ 36931  AAGTCGCTCAATTCGACCTCAATATTTAGGCCAACTAAAAATTGAATTGTTGTAAATCAA
+ 36991  AAATCAATTTCATCCTAACAT
+;     G-orf526 <== start
+ 37012  ACTTAAATTTAATTTTGCTAAATACGACAAATTACTTAATTATTAATATTTTAAAAGCAT
+ 37072  AATTTATACTTTATTAGATTACAAAGTATATTATATTTTGTAAAAAAAAATAACTAAAAC
+ 37132  TTATATTAACTAGTAAATCATAACTGTTACAAAACCCGAAAATCTGAAAAAATCATTTTT
+ 37192  ACATCAAATTATAACATTTTTTCAAAAATCTTTCTATGCATAAAACAACCTACATTACAC
+ 37252  CTGTATTTTATCCTCTCTCTTCAAAAATAAAGTATTAAACTATAAAAGTCTAAAATAAAA
+ 37312  TCCAAGTTAATTACAAAACGCCCATGCATACTTTCTATGATTAGAAAAAAATTAACAATT
+ 37372  ATTTAAAATAATCGTTACATAAAATACGAATTTTTCTATAATTTACAAACATAATTAAAA
+ 37432  AAAACGATATCCAATATTTTTTATTTAAAAAAACAAATTACCGCCATATATTACTGCTAT
+ 37492  TAAAAAAATTCAAATATTAATCGCATTCACATCTAAAAATAAAAATAACTTAATAACAGT
+ 37552  ACTAGCATAATTACATTTTAATTATAATAGTACAAAATAATCTTCTCGCATATACATTAA
+ 37612  AATATGCTACAATAAATACACTTCTAACCTCTATAATACAATATCCTTATGATTAGTTGA
+ 37672  GAAAAATCACCGCATTAATGAAAAATTGCTTGGAATCAAAAACATTAAAAAAAAAATTCA
+ 37732  AATAAATAAATAACTATAAACTTATTAACCCAATTCAAAAATAACATAACGA
+;     G-nad4L <== end
+ 37784  CTACCCACGTAATCCATAAATAAAATCAAAATCAATATTTTGATGTTTTTTATAAAATAT
+ 37844  AACAAGAATAGCTAATCCAATAGCAGATTCTGAAGCAGCAACCGTTAAAATCAACAAAGA
+ 37904  AAAAACCTGCCCTTTTAAATCATCCATAAAAATAGAAAAAAAAATAAAATTTAAACTTGC
+ 37964  CCCCAATAATAAAATTTCAACTGCCATAATTAATATTATCACATTTTTTCTATTAAGAAC
+ 38024  TATACCCCATAAACCAATTGCAAACATAAATATTGAAAAAACTAAACACTGAAATGAAAT
+ 38084  TAACAT
+;     G-nad4L <== start
+ 38090  GCTTACGTATTTTTATAAACTAAACATAATATTAGGTAGACCATTATACAAGATCAAAAA
+ 38150  ACTAGAAACAATTATAATCAAACTAATAGATACTGGGAGAAACGACTTTCAACCTAACTG
+ 38210  CATTAATTGAGGACAAGTAAGAAAATTT
+;; mfannot:
+ 38238  tcttctttaagaaactgtacatgataattacttatcatacagctt
+;; mfannot:     /group=II(derived)
+ 38283  CACTCACATACTTCCTAAAATAAAAAAATAAATTAAAAAAAAATAATAACACTAAAACTA
+ 38343  ATCTTTTTTTAAAAATATATTCCAATTTAAATCAAAAAAACAACACAAAAGAACAATTCT
+ 38403  TTTAATGGATTGATTTAATATAACATCTGCTCATTTATTAAATTTAAAATATTTCGCTAA
+ 38463  AATAGACAACTAACCAATACAATTAAAATAAACATAATTGTATACAAAAATAAAGTTCCT
+ 38523  AATGATTTGAAAAAAAAATTTTGTTAACAAATTAAACTTTCATAATTTTTAAAGAATTAT
+ 38583  CACTATATAACATAAAAAATAATAAAAAAACTATTATCAAAATAAAGTAT
+;     G-orf504 <== end
+ 38633  TTAAATAAATATTCCATTAACACATTTATAATAAGTTTCTAAATTTCAATTTTTTCTATA
+ 38693  AAATATAAGTATATCCATTATAAAAAAACTTTTATAAACTATAATAAATTTTTTTATAAA
+ 38753  ATTAAAAAAATACTTAAGTAACAAAATATCAAAAATCTTATTTTTATACAAAAAAATATT
+ 38813  ATATGCTAGTAAACAAATATCTAAATTTTGTAAATATCCATATCTCACAAATTTTTTAAT
+ 38873  TGCTTTATTAAAAACTATCTCTTTCAATAAAGCAATTAAAAAAAACTTACCCACTTTATA
+ 38933  TATAATAATTCCAAACAATTTAACAAAACACCTAAATACTGCACTTCTCATAAAATCTAT
+ 38993  ACATGTTTTTTTGCAAAAAATATAATTGAAACAAAATTTACTAAAAACTTTAAATAAACC
+ 39053  CCCTTTCCGTAACACAGAAATATTCTGGTATTTGATTAATTTTATTTTCTGATTAAAATT
+ 39113  CAATACCTGCTGTTTTAAAAAAAAAATACGATCAATAAAAGCGTTAATTAAAGTATCCCT
+ 39173  TATATTTTTAGAATTTAATTGAAAAACACCCAATAAAAAAGAAAAAATTATTTTCTTAAT
+ 39233  TTTCAAAGCAAACTCCTTATCACCATAAATTCCAATTAAACAATTTTGAGTATTTCGAAT
+ 39293  ATACTTAACACTAATAAATTCATGGCTATTTATCAAATTGTATTTAACCTTACCAAAACA
+ 39353  CTTCGCATAAAAGCATTTTCGCTTTAGTTTATCTATAAAATCATCTAAAATTAATAAAAA
+ 39413  AAAAGATCTAAATAAATCAGTTGATCTTACATTTAAATGGCAGTTATTTTCAAACTCTAA
+ 39473  GACTAATTTTTCAGGAAAAATTTCCCTGTTACGCAATTGAACAGTAAATACTCTTCTCCA
+ 39533  GCTATACATAATATAAGGAAATAAAACTAACAATTCATCTTGTAAATATTTATTTACCCA
+ 39593  ATCCATTAAAACATTATAATTAAAAAATTTTAAAATTTTTTGAAAATTAAATTTTATATA
+ 39653  TCAGCGTGAAGCAACTCAGTGAAATTTCAAAGCTTTATAGAAAAATTGATTTCCTACCGT
+ 39713  CTGGACAATAAAACGGGTTTTTAAAAATAATTGCTTTGTTCAAATTTCCATTAAAATTAT
+ 39773  ATAAAGACTTTGTTCTATAAACGTATAAATTAAATACAATTCAATTACCTTACAACCTCC
+ 39833  CATCTTTCATTTAATAGATCTAAATCACCATAAAAATCTGCAATATAATTGATTTCCTAA
+ 39893  TTTCAAAAAAGTTTCTATTTTTAAACCAAAATAATATTTGTAATACTTATTTTTTACATT
+ 39953  TCTATATTTCTGGTTGTATACAAACCACAAAAATGTTGGTTGCATTAATAAACCGGATAA
+ 40013  TAACAACTTAAATTTACCATTTTTCTTTTCTAGGTTAGATAACGTAAAAAAAGGCTCATA
+ 40073  CTTTCCTAAACAAAAAAAATCCATTTCCCCATATAACACATATAATCAATTTTTAATAAA
+ 40133  TCGAAAAAAATACAT
+;     G-orf504 <== start
+ 40148  ATCTTTACTACTTACACATGTATTTACTTCAACAAAAATACACAAGCACTTTTATTACAA
+ 40208  TTTCTGTGAGTCTTAGATCTTCTAAAATATATTAATTTTACTTAAACAAAAAATGTATAT
+ 40268  CAAAACTATCTTCTGAATTCCCTTTTCTTGTTTTTGTTATTATTAAAATCCAAAAATCCT
+ 40328  CCTTTATTCTATTTACACCTAAATACATTCAAAGTATGTAAAAGAATTTCAAGAAATTAT
+ 40388  ACTTAACGTAATATATTCACTTATAGTATCCTAAAAATTTTTATTAATAAAACTTTTAAC
+ 40448  TTCTACTTATATCTATAGCTAACACCTTTATAAAATATTATTTTTTACAATTATATTATC
+ 40508  ACCTTTCCAAGTCATGATCAAACCATACTTCAAATTCTTTAATTAATAAAAAAAAACTAT
+ 40568  TTAAAATCAATAAAATTTACATTAAATAGTCTAAACTAAATAACTTCTTATACTAAATTA
+ 40628  TGAAAATTCTTAAATTTTAGCTTAATATCAATACTCAATATTGATACCTAAAAAATTTTT
+ 40688  TAAGTCTATAAATAAAAAATAAACTATACGCAATGTCTTCTATCTCGCACTCATAACGTA
+ 40748  ACCTAGGGTAGGTTGCACGTACTCAAACAAAACAAAATGAAAAAACGGAAATTTTTATAC
+ 40808  CAAACCACACCAATAATTCAACTGTCATATTAAAAACGGAAAACCATCCTCCACAAAAAA
+ 40868  ATAAAGAAAGCAATACACACATAACAAGATATTAGTTCGAAACGCTAAAATGTAATTTTA
+ 40928  ACGTGCGCTAATTAACACATCACATGCGAGTTTCCAAGCATTATGCGTTTCGCTGTTTTA
+ 40988  CTAAAACTAGAAAAAAATTTGCATTAAAAATTGCGTTAATACGTATTAATAATGCAATTT
+ 41048  TTTTATTTCTTAAAAATACTTCTAATTAATAATGCAATAATACATTCTACAAAAAATACA
+ 41108  TCTATAGTATATACACATAATATACTATTCAATTGATGGAATGGATGCAAAATATTTTGA
+ 41168  TACAGCAAAAATTTTTCGATTTACTAACTACTTAATTTTAGTTTTAGATATCTACGTATA
+ 41228  CTGCACTAAACCGTAACATCCATAATTATACTATAATTTTACTTTGTAGATATTTTAATA
+ 41288  AATTAATATAAAAATACATACGCCATAGGTTTATTTACAACAACACACTCAAATAAATTC
+ 41348  GTATAATCAACTCTTCTTTAATTATCTGCAACAAATTACTAAATTGAATAATCGCTAAAT
+ 41408  TAAAATATACATATGCACCTTTCAATATATATGCAACTATAATTAAATACAATTAATATA
+ 41468  TGTTATCATACACCAACCGTTAAGAATTAACTAAATTAAGCATTTATACCTATGATTAGT
+ 41528  GAAAAGCTCTAAATATTTAAATAATTTTTAAATACTAAACATTAACCAAAATATTAGCTG
+ 41588  CCACTGCTAATACAATTATTGATAAAATACCTATTCGCCCCATATTTGAATACTCACCTA
+ 41648  AAAAAAATAAAGCAAACGCCATCGCTGAATATTCAACAAAATAACCCGCTACTAATTGGT
+ 41708  AGGCTAAAATCAATTTTAATTCCTGATTTTCCCGTAAAACATGACGAACAATTTTCACTG
+ 41768  TATCCTGCTTCAATACTATCTACTTTATCTTCGCTGCAATAACTAAATAGATTTTATTTG
+ 41828  AAATTTTTTTAAAATAAAAATTATATTTGACAAAAAATACCTTTTATATATATTTATTTT
+ 41888  TAATTTAATTCTTTAAAAAAAAAGAAAATAACAATAAAAAAAATTTAATCTAGTTGCGAA
+ 41948  TTTAAATAAATACTTATCGGTATATATATTTAGTAGTAACCATTAAAAAACATCATTAAA
+ 42008  ATCTAAATCAGCCTATAATATAGACTTATCTTATATACCATCCCTGCGATATTACATACT
+ 42068  CTATATACCAAAACATTTCAAAAACAATTGAATATTCTCTTTATAAATAAAAAAAATAAC
+ 42128  TAATCGCCAAACCTATCGTAAAATATATATATTGTATGTTATCCACTTAGCCTATTATTA
+ 42188  ATATTTATAATCTCATTTTAGTATTATAATACAGTATATAAGTTATTTTTGAAAATCTCA
+ 42248  ATTTCTAATTTTTTTATATATATATTAGAATAAGCAGCCTACAAAAAATACTTAAATTAT
+ 42308  AATATTACAAAATTAGCCAAATATGCTAATCACCTTAAATACGCATTATATATACTTAAA
+ 42368  TTTTAATTTTTAAAATTAAAAACATCATCTCTGAAAATTTGTTTTCTATCAGTACTTCAT
+ 42428  AAAAAATGAAAATGAATTAAACAATTATAAATATTTATTAGCGCAATACCCCTGCTTCAG
+ 42488  CCTCTGCTAAATCATTCGAGATGGAATTTAAAATCCCCCCAAAGAACCACAAATATTAAC
+ 42548  CACTTAATATGCAGCTCAGCTTGAAAACTCTCCGTTGAATTAAATAGAAAAACTTAAATA
+ 42608  ATTGTAACACTGCCTCCTATCATCATTTTCCTTTTTCTACTAAAAATTATTTTTATAACT
+ 42668  TAATATCTATTTTTTTTCTCCACAGACCAATTTTCCAAGCATAAACAAAAAAAAATAATA
+ 42728  AATATATAAATATTTTTTTTACTTAAATCAGAATCTTATATACCTAATTTATTATTAACA
+ 42788  TAAAATTATATACACTATTACCTAAACTCAATACAAAACTAAATCTTTTTAGAAGCTATA
+ 42848  TTCTATTTACTTCATTATATATATAAAAACTTCCTTCTTAAGTTTTTTAACTATTTTCTA
+ 42908  TAAAAAATTTATATAAACCCAAAGTTGAAAACATATATTTTTTTATAAATACCTATACAA
+ 42968  GTACAGACTAGCCTTCACCCCCCTGTGCACACAAAAAACAACACTTTCATTTTAAGCTAA
+ 43028  CTCTAATTTAACCGTAATTCATTAAAATATCTTTCCATAATATAAATATTTCCTATAAAA
+ 43088  ATGTAAAACATTACATTTACATTTTCAACTATCTACAAAAAACTAATATAAATTTTAATC
+ 43148  TAAAATATACATAAATAATAATACTATAGCATTAAAAAATTCATAAATAATTAATATTAG
+ 43208  CCTAGTATAACAAAAATTGATAATTAACAACAAATATCAATTTGCTAAAAACTGCACGTT
+ 43268  TCACTTTCCAATAAACTTCAGA
+;     G-nad1 <== end
+ 43290  TTATGAAATGACTATAATCGTAGATAAATCAATATACCGTAAAAAAGGAGCTCGATTTGT
+ 43350  TTCCGCTAACGCAGAAAAAAAAAATAAAAAAAATTGTGGTCATAAATACCAACAATGTCA
+ 43410  AAAAAAAAAATTATTCTGATGTAATATCAACTGATACAAATTAGCCGAGCCTACACAAAT
+ 43470  TAAAACCGAAATTATAATAAAACCAATTGATACTTCATAAGAAATCATCTGTGCAGCCGC
+ 43530  TCGTAACGAACCTAAAAACGCATATTTAGAATTACTTGACCAACCAGCAAAAATTATACC
+ 43590  GTAAACACCAAATGATGATATTGCAAGAATAAATAAAACACCAGTTTCTACATCCACCAA
+ 43650  AGAACCATAATTTGTATATGGTATTAATGATCAACTTGCTAAACTAACAACAAATGTCAA
+ 43710  CATTGGTGCTAGATTAAATAAAAATCCAGTTGCATTGGTTGGTACAACAAGCTCTTTTAC
+ 43770  CAATAATTTTAAACCATCGGCTAAAGGCTGAAGCAATCCCCAAATACCAACAACATTAGG
+ 43830  CCCACGTCTACGCTGCATGCTAGCCATCACTTTACGATCTAACAATGTAAAATACGCAAC
+ 43890  AGCTATTAAAACACAAACTACTATTAATAAACTATAAATAACAATATAAATAAAATAACT
+ 43950  AAACAT
+;     G-nad1 <== start
+ 43956  CTGTATGTTATACCTCTAATACACAAAATTTATAATCTAAACTGAAACGTTTCAGAACAA
+ 44016  AATTCAGCATCCTCTATTAATGGGAATATAACTAAGAAAATAAAAAAATACAAACTAGTC
+ 44076  GCTATTTGACCAATAATCATATAAGGAGTACACTAGAAATCTAATATTT
+;; mfannot:
+ 44125  ccgtgcctaaaactgtagaaacttatcactaagaatacagctc
+;; mfannot:     /group=II
+ 44168  CAAAGAAATTTCAATAAAAAATTATTCTGAATTAAAATAATAAATTGTATAAGTTAAAAA
+ 44228  AACAAATTATAACAGTCAAAAAAACATTAATTTACGTAAAATAAATTTTTAAAATAACAT
+ 44288  TAAAAAAATATCAATTATTATATAATAATAAACAGATTTGAAATTTCTAATAAAACTATT
+ 44348  ATTTTTTATTTTTAAAAAACAATTTAAATTTAAAAAAATCTATTACATTTTTAATCTTAT
+ 44408  TGGTTTCCAAGAATATACATATTCGGATATAATTTAATCCAAAAAATTCATAACCTACTT
+ 44468  AAACAATTTACTTTTAAAGTATGCATATAAATTAAAAATTAAGTAACCACCACACCTATA
+ 44528  TGTAAGGTTATTTTTGTAATTATCCAGTTATTCTCTAAAAAGGCAAATATTTATACTTTT
+ 44588  TAATCTATATAA
+;     G-cob <== end
+;     G-cob-E7 <== end
+ 44600  CTAAACAAAATCGTTATTGCCACTATATATTCAATTACTAAATTCCAATTTATACTCATA
+ 44660  CTCAACAGGTTTCCCTCCAATTCAACCTAAAATAAAAAAAACTCCAATTAAAATTCAATA
+ 44720  CATATTTTTATAAGAAGACTTAAACAGTGCACTACGTACTGAAGAAGTACTATAAAACGG
+ 44780  TAAAAATAATCAAACTAAAATAGATCCTAGCATAGCTATAACTCCTAACAATTTGT
+;     G-cob-E7 <== start
+;     G-cob-I6 <== end
+ 44836  agtgcacactagactaattctaataatccgtgcctcgaaccggataaatatcttacaaaa
+ 44896  aatccggctcccaagaacaataaataaaaataataattcttatgcaaatttacttcggtt
+ 44956  tgcgtataatcatcaaaaaattaactccaccttaacttttaaaaagataattttttaaac
+ 45016  aatcttaattaaaacttataatttcataaaaaacctgtatattctgctatatcattatac
+ 45076  caaaatcttttaacaatcaatagactttataattaatacctatgttcccattaaattcct
+ 45136  tttaattttttttgttagctaataaatttaaaaaacaagctaaatatttcccataacctt
+ 45196  tacttaactacataaaattaaattttcttcattctaactgaattacggcaaaaaatctac
+ 45256  aatttcaaaaacatttttaatagttagtagatattatgtatttatacatttctaccaaaa
+ 45316  atagtccatttcatgcatacctctctaagcttcccactcacttgaaaaattaaactaatt
+ 45376  ctaataattaaactaataaaaattatcatcaattatctactaattaaaccatatacacaa
+ 45436  aatttatgtttacccctacattatcttcctattttaatatcaacattgtgtattttattt
+ 45496  cttccgtatcaaaatcagacaacactc
+;     G-cob-I6 <== start ;; mfannot: no intron type identified
+;     G-cob-E6 <== end
+ 45523  CAGGAATTGAACGCAAAATCGCATAAAAAGGCAAAAA
+;     G-cob-E6 <== start
+;     G-cob-I5 <== end
+ 45560  ctaccagtaagtattattaattttcaactaaaaattcctctggttagaactgtacaagct
+ 45620  tttcgcaaagcatacagctcttcgtagatttctcctcttaaataaaaaactataatacaa
+ 45680  aataaacatatcgttttacatgtttttaaaaacaataaatatattcatcaacaacactaa
+ 45740  agtatccaacgaactctatatttaatgttatactctactgtattaaaagattataaacat
+ 45800  gcaacctaaattcctttcacaattttttctttacctgataaccactttacaatactaaat
+ 45860  agcaaatatacaattataataacgtctaattttcaattgattaacaattaattttctaat
+ 45920  atttaaataaaattaaaacaaaatactgacatatattcattacttaatttatttatataa
+ 45980  agcaaata
+;     G-cob-I5-orf353 <== end
+ 45988  ttaattatatttcctcctaactttaatccttagtttacctaaacaattaacataataaat
+ 46048  caatacgcaaccgcttcacgtctcaattaaaaaattataattaaattgcacaaaataaaa
+ 46108  atttacttctttactaaaatttgttaaactaacatattttgttctaaaacataaacctaa
+ 46168  caatttatagaaataaactaaacccattaaaccgtaatcaattaaacaaaaatcatcaaa
+ 46228  atttgaatatacattttgcacaaaaaaatttctatttatagatataacttttatgttatt
+ 46288  aaatcaattttttatacttctatttaaaactaaaaagaaattacaatataaaattaagaa
+ 46348  agcaccgtctgtactaaaaactatatataaattattaacaccaaatatccaataaaagca
+ 46408  taaacttttaattcaacaataaaaacagataaactcaaacaaccaactaaaaaaaataat
+ 46468  cctaacattttttcgcgttaaaaattctgaagtaaatttaaaaaatctagctaaatgtga
+ 46528  actttttattttcaaaatatttcaaaaaaatttacaaattagcaaattaatatagacact
+ 46588  aatccctcttaacccataattatcaatcaataaaacactgactaaattttcctctgaaaa
+ 46648  aaaaaaaatcttaagtttccttaacaaatacgtgcaagaagacacaaaaaatttattact
+ 46708  tacaaatttatcaaaaaaattgtagtaataaataatttgatctataacatagaggcgatt
+ 46768  tctaatatcaccgaaaaaatctctactcttatctggtcataaacaattttgcaaattaaa
+ 46828  acaaataataaattttcatctaagctcaacaaaacgaataactgctaaacgtaacagtat
+ 46888  actccacattgatcatcgaccacaatacaaatcgcgcaaaataaagtaaattaatttaat
+ 46948  atacaatctatcaaaaaacacctttaaaaattttctactaaagaaaacaacattattttt
+ 47008  tacacaagtaaatcgttttctatctcgcaaattattttccat
+;     G-cob-I5-orf353 <== start
+ 47050  acctttttatttttaaaataaaaattataaaataattagacccacttttaaatacttgtt
+ 47110  ctaaaatacatattatcactatatctattaaaaattttaatacttacgtatatttatact
+ 47170  tacatattcatttcgatattttatttatttatatctattaaaaatatacgattaattttt
+ 47230  gtttcataaaaaatattaaaactaaaaaattttgatatccaattttaaaacattatcatt
+ 47290  aatttagtgataataaaaattttttattaggaatttaatttatttaaaatttaaaatcga
+ 47350  taaacaaccctaaaacaaataaaaattttaaattctgaaccttcagctatactgctatca
+ 47410  tatttataaaagtacgcttatgtaactgtaactaaaaattactaaatcaagtccacaaac
+ 47470  acaaatttaagcactccattc
+;     G-cob-I5 <== start /group=II ;; mfannot: splice boundaries uncertain
+;     G-cob-E5 <== end
+ 47491  ATACCACTCAGGAACTATATGCGTCGGCGTAACCATCGCATTAGCTT
+;     G-cob-E5 <== start
+;     G-cob-I4 <== end
+ 47538  cattttactagacttgttttaaaatagtctgcaaataaaactgtaaaaacttattaacta
+ 47598  agaatacagctctaaaaaaatacaaataaaggttaaaaaataaaacttttaattgacatt
+ 47658  taacccatgatatgataatcaactctcaaattcatatttcaagaaaattttttttagaaa
+ 47718  ataaaaaataatccagaatgtatatacgaaataaaataacctccctgtctgtatcttccg
+ 47778  ctttttcataaattttgaaaaagctctcttctatccttaattctaaaaattgatttaact
+ 47838  ttacaattaaatcccatttgaaaacccgttgttatattaaaatcttttaaactatagcta
+ 47898  catacacaattataactctaaaacaatattattttctgattaacactaataaatttctaa
+ 47958  aaatttcatacaaacatactgcatatgaaattaatcaaagctaattaaagctattttttt
+ 48018  ttcataattttaaaagattatttctatactttaattctctaaatttaataaaaaatatcg
+ 48078  ttctaccaatttacccccccctattaataattttttttaagtatatccaggttactaagt
+ 48138  caatagaatatttc
+;     G-cob-I4 <== start ;; mfannot: no intron type identified
+;     G-cob-E4 <== end
+ 48152  CTATGTAATTATCAGAATGCCCTAACACATTCGGAATAAAAAATACAAGTAACGACGCAC
+ 48212  CGATAAACAATAAAATCAAACTATAAATATCCTTAATATAAGAATACGGATAAAACGGTA
+ 48272  AATTTTCAACCCTTCAATCTACTCCCAAAGGACTACTAGAGCCTACTAAATGTAAAAGAT
+ 48332  ATAGATGCACTAATGCTATCGCAGCAATAATAAATGGAATAAGATAGTGTATAGCAAAAA
+ 48392  ATCTATTTAGAGTCGCA
+;     G-cob-E4 <== start
+;     G-cob-I3 <== end
+ 48409  aacgagtaaagatttatattaaatcctcctcaataaacagtacatggtaattattcacca
+ 48469  tactgctttaaaataaaaatacattagatattattattcgtacctttaataaactataac
+ 48529  caattaatctatgacaataaacttgacaaattttaataacaccatctataaaaaataata
+ 48589  ttctaaattatgtagagttattgataaattcttctatctatatttttactcgtttatttt
+ 48649  tatactttacgggagtaataccaataacgcaacgcttaatttaaataaaatctttgaatt
+ 48709  tatactttatagattcacttattcatccatatcataataaatcatgaaaattttttacaa
+ 48769  aatttaactcataaaagcaacttacttttaaaactagtatttaaggatttgtatatcact
+ 48829  aatagtatatcattaaaaaatatagaacttatattaaaattttaccaaatttatcgattc
+ 48889  ttccaatttatatgacagataatgtagtctaattttggatactaaattttaataactatt
+ 48949  tttttcaattataaaacttaaatttaattcattttttcttttatactaacacaaaacaaa
+ 49009  ttctattttatcaatccacaattttaaatttaattatttatcctaaaaattatactattc
+ 49069  aactttaaccttttcctattgattaaatcaatgctttaagttaactcaaaactgcatata
+ 49129  tcaaacagtatttcacaccaattcaataaaaaataaatattcccacac
+;     G-cob-I3 <== start /group=II
+;     G-cob-E3 <== end
+ 49177  TTATCAACACTAAAACCACCTCAAAGCCGATAAACTATTGAATTACCTATACCGGGTATC
+ 49237  GCAGACGCTAAATTCGTTATAACGGTTGCTCCTCAAAATGACATCTGACCCCAAGGAAGA
+ 49297  ACATATCCTAAAAAAGCTGCAGCCATTGTTAATAAAAAAATGATAACCCCTGAACATCAA
+ 49357  AGCCATTGCTTTGGATAGGAATAAGAACCATAATATAACCCTTTACCAATATGAATATAT
+ 49417  AACATAATAAAAAAAATTGACGCACCATTCGCATGAATGTACCGCAACAACCATCCATAA
+ 49477  TTAA
+;     G-cob-E3 <== start
+;     G-cob-I2 <== end
+ 49481  gacaagtcaataatctactattgctctcgaagctgtacatacacatttacgcgtatacag
+ 49541  ctttcataagcgttaaataaaattcctaaaataacaaaaat
+;     G-cob-I2-orf750 <== end
+ 49582  ttacaaacaatttatactatataaaatatacctaaaacttaatttccaaacaaagaatcc
+ 49642  aaatttaaaaaatttcctatcaagattttgtatataacaagttgacgggaattgaataac
+ 49702  tacattaccatcttcattatacacacaaggatcaagcccataatatttacgaacccaatt
+ 49762  acaattctgacgatgcttaaccgcaagagtcaatacgcaacttttacgtaaaaaaccaac
+ 49822  tatacgttttacttctagaaaattatcaacacaccgatatttactcattaaacaataagt
+ 49882  caaaattaaaaatcgcttcaaaatctctgaatctggcattaaaatgtaatagcgattgaa
+ 49942  tactcctttacaagtaattggatgaacaaattccattttcttaagtttatctaaaatatc
+ 50002  atctactggagctaataaaactatgcgtttcgcagaaaattgcgaagtcagcgcagtatt
+ 50062  atttattttatttctgaatcttcaaatactcaaatgtttttgttttttaatagttggcgt
+ 50122  caacttttgtatcatatggttttttacttcaagctctaacgaaattattttagcaataaa
+ 50182  atcaaactttgattgctgtgcctgactaattatgtgcctaatgcctgtgtttaaatttcg
+ 50242  cttaaaaaaaagtgaaataaatttagaattttctacaccacgccgcaatttagttagcat
+ 50302  taataaaacattttttgcctcttcacgaataaaatattgtttccagtttactaaatttag
+ 50362  attcataggcgaaagtttttccccctcacctctaaaaacagataaataatttcctaccaa
+ 50422  gcgtataaaaaatttctttacaatagtatttaccgtaatcaaccgatgtctcacatttga
+ 50482  agagctaagccctaattgtaaaatattgcgccataaaatttttatcaataaacgttgaat
+ 50542  actgcttattacattatgagttaaatcaaaatctttaaaaatcccattagagtctatttg
+ 50602  agccctaacaggtggcaaagcggaagaaggcccaatcgcacatacttttttagcgtcttc
+ 50662  tgctttaaccatacaaattttataccctaaaaattcaataaaccctttactacaacatgc
+ 50722  tagtttatttactcttacaattatgcataaatcactttttaaaaagtgatttatgcagtt
+ 50782  ctgaataaacataatgaactctttagatccaacactgccaatcaaaatattatctaaaca
+ 50842  ccgaacatactgaaaataattatgcgtcctccgccgtgttgcacaaaaataaaacataaa
+ 50902  ataggattttaaaaaccttctagaaattttatgaatttttttgaaaaacctatgtttgtt
+ 50962  attattagctattagttttatttgatacacattcaataatttaaatttttttaatttaat
+ 51022  tcttccataaaactgtgtaatcgccttaacaaatgaatctaaatgagacaaataaaaatt
+ 51082  aagcaaaaaaatcaatattaaactattagaacacatcaaaaaactttcattttgcttcaa
+ 51142  acaacaagaaacctcactctttactattttttctatttctcttcatatacgataatccaa
+ 51202  tatatacatttttagcagatttgctaaatgacctaaattaacactattaattataaggtg
+ 51262  tgcattgatatttaaaaatcaacttgtatgcaaacttcatccttttacgcatcgtaaaat
+ 51322  aatttgaggcgataaactaattcaattagataataaattaaatcctaacatcgaatttaa
+ 51382  taacttagaaacgcgatatgaatctcaatctaatgcatctcgattatttttttttaatct
+ 51442  aaaaaaattaaaaggaaaaattaacttttccatacctaatttattaaaaatagttacaat
+ 51502  ccctaataaaatagctacctcaattattttaacttttcaatcaaaaccttgatattgttt
+ 51562  atacgaatttactaccttccattgcttttttaagtagctatagtttccaattaatagagc
+ 51622  cttgctcgtttttgcaaaccatacaacgggtattcgatctaaagaaagtttttgataacc
+ 51682  aaaattacctcctgcattcaacaacactaaccgcaactgacatcaggctgtaactaagtt
+ 51742  ctcactagaaattatatattcatataaactattattattaatttttttttcgcttatggc
+ 51802  cgactcatttcctttaaaaacactacaaaccat
+;     G-cob-I2-orf750 <== start
+ 51835  aaagttttcacaagcctcccccgtaataccaaaaaaaaatacaattatctcaaatctttt
+ 51895  tagtccttacttgtgtacatatttctgaattcaacgccaataactgatttattactaaag
+ 51955  tgatatgataataaattctcttaaaatattcaaaaaatttaatcctaatacaattaatca
+ 52015  tctttacgcataggtcctatttaaaactatactaattgaaattatttctctttgacacct
+ 52075  cagtactccaacttcaaacttaaccactaaatcatacaactatcacagtaataaaacatt
+ 52135  atttctcttcgagaaatttttaacaaataatttttttaaataaattcattattcctcgga
+ 52195  tataatgctgaaaaccaaacgatacccacac
+;     G-cob-I2 <== start /group=II(derived)
+;     G-cob-E2 <== end
+ 52226  CATCACGCATAATATGCTCAACACTACTAAACGCCAATGCTATGTTTGGCGTATAATGCA
+ 52286  TTGTTAAAAAAAGCCCCGACAATAATTGTATTACTAAACACATCCCAGCTAATGAC
+;     G-cob-E2 <== start
+;     G-cob-I1 <== end
+ 52342  cgtataagatagaatttataagttccctttaaagaactacacatttaatttaaactattt
+ 52402  gtagctcaatttattaatttaaataaaatttattaattttttattgtaattcttttgcta
+ 52462  tatttcttcaaacctacactaaaaaattattatactaaatactataaaattaataaataa
+ 52522  caccgaatttaatttataattaacttaatttaaaatctatattatatagattttaccatg
+ 52582  atataataatgtagcacacgttaatttataattttttcgggaatcctttcacaattcaat
+ 52642  ttaatgtacattaattaaattataaagtaatttttaaatactctcaatcaaaaatatatt
+ 52702  tgaaaacaaatttatatatgcttagaaaatataaattctaaaaaattaacgttttatgat
+ 52762  taattatccaaaaaataatattattttaaaattattcattctaagtaatatttgtataaa
+ 52822  ttaaacctgatccaaacttttaaaccaaaaaataggattatacattcaaacatatattgc
+ 52882  aattacaggttaattcacatacatatttaaatctcaataaaaatttctaattgaacttta
+ 52942  ttaca
+;     G-cob-I1 <== start ;; mfannot: no intron type identified
+;     G-cob-E1 <== end
+ 52947  CCAAAACTTCATAAATATGAAATATTTCCTACAACAGGATAATCAACAATATGATTGTTA
+ 53007  ATTCAACTTCATGCTTTCAT
+;     G-cob-E1 <== start
+;     G-cob <== start ;; mfannot: alternative ATG start pos 53041
+ 53027  ATGTAAATACCGCATAAAACCGAATTAATATACATTTAACAATATACCTAAAAAAGTACT
+ 53087  CAATATGCAATACAGCGTTAATCATAATTCATTAAATAAATATAGTTGAACAAGAAATTT
+ 53147  TATAATTGAGAACCAACTATCGGTAAAAAATATACAAACAATTAATTATAGTTTTATATT
+ 53207  TAATCTAATAATAAAAATTTAATTTTTAAGTCATAAAAAACCCATCCCTATACATTAATT
+ 53267  TTAAAACGAAAATCTACCTACATTTAAAAAAATTAATAGTAAACATAATTTCATTAAAAA
+ 53327  TTTTTGTATAAAAAACAAAAAATTTAACAACCAATAAATAATTATCTGCAAATTACCAAA
+ 53387  AAACCTAACTAAATTATAGCAAAAAATTTGATTAACCCTAACTAAATATTTTTACCCAAT
+ 53447  TCAAAAAAACCTTTTAATAATACTTTTAATATAATGGTTTACTTATATAAAATTCATATT
+ 53507  AATATTGACCCACCCTGATCCTCCTCTACAATATAGACAATGTCTATATATTTATTATAA
+ 53567  TTTTATAAAAACCAAGATATATTTTATTTTTAATTTTATCAATTAATGTTTATTTAAAAA
+ 53627  ATCTACTCTAATAATATTTTTATTAGCATTAAATAATATAAAATAAATTAACTTACTAAT
+ 53687  AATTCACATAAACATCAAAAAACTTGTTTCTTTTAGACGCACAATCTACAAAATTTAATT
+ 53747  AAAAAATATGCATAAACACAAAAATTTTAATATATAAAATTAAATTCTATTTTTACTTAT
+ 53807  GATAATTTTTTCTAGACTACAGCCAGCTTACGTAGATAAACTCATACATAATCTAAACTA
+ 53867  ATAATAATTTTCCTTTCATACATATCCTTAAACAACAATTATACACCAGTAATAATAAAA
+ 53927  AATAACAAAAAAAAGCTAACTACTACAATAAACATAAACATATATACTTAGTTAATCTCG
+ 53987  TAAACGAATAAAAAAATTCAAAAAATCTATTAAATAAAGTAATTAATTATTGACTTTTCT
+ 54047  CTTAATCTAGTTAACATCACATAAAAACACAGCCAAGGTAAGTATACAATAAATATAAAA
+ 54107  ATTTACAATAATTTTTAATATTGCTACACAAAATAAACTCTTTTTGACTATTCTTAATAT
+ 54167  CTCTAGTTACCTGACTAAACTAACTTCTCCTTAATACCTAACTTCAAACTAAATTTATCA
+ 54227  AACCCCCACACCTATAATCTTCATCATTGCTATAAAACAAAAAAAAATAAACACTAACCA
+ 54287  AAATTTAAAACTATTTTACAACCAAAAACCTAAAAAGCTAACATATAAAAATTTATATTG
+ 54347  AAAATGCTTCTAATATATTATTACATTCTTAATAAAAATATTGTTTTTATTAAATTATCT
+ 54407  ATTATATTTTTACTTGTTTTTTCAATTTATAGTAATTTTTTTAACTTAAAAATAAAAACA
+ 54467  CCATCTGCCTTTATTAAACATAAAATTCCCTTTAACATCTTAAGTAAAACACATTTACGT
+ 54527  ACCTAGTACATAAATTTAAACACTTATGTACAAATTCTATATCAATAAACTCTAATATAA
+ 54587  ATAAATAATTCTTTTAGATATTTCCTAAAACAATAATAACACTATTTTTTTACAACCCAC
+ 54647  CTAAACCTATTTTAAAATTATGAAAATACGCACTTTAAAATTAATAAACGATTATTTATA
+ 54707  TATTATAGTAACCATAAAAATACTTACACATATTACAGTATTCAAAATATTTTTATGTTT
+ 54767  ACATAAAAATATTTTGAATACTGTAAGCATATATATATCACTAACATATTAACATTCTAC
+ 54827  ACAATTTTTAAACATATATACGTATTTACTCCAAAATCTAAACGGCAATAAATATTACTT
+ 54887  CTATATATCTACAAATATAGTACAAGTAAACATATATAAAAAAAGCATAAAAACATGCAT
+ 54947  TCAAAAAAG
+;     G-rpl5 <== end
+ 54956  TTACGAAACTGAAACTCGACAAGGAATTTTATAACTGATTAACAAAGAATGAAAAGCCTG
+ 55016  GTACACAGTTCCAACCGTTCCGTATACGTTTATATTGTAAACGAATGAATCTTCAAATTT
+ 55076  TAATACTTGCGAAAGAAAATCATCATCTTGTATTCTTTGAACAATTTTAAGCAAAAAATT
+ 55136  TGGTTTTAAACTATTCAATTTAATAGGATAAAACTGTTGACTCAAAGGAAGTTGCCTTGC
+ 55196  AAAAAAAAACTCCGATATTCAAATACTATGACGCAACGTTAGCCAAAGCCCACTCAACTT
+ 55256  CTTTTTCTTCAAGCCACGAATACTATGAGTACGTATTAAAATTTGTGGTTTTTGTCCGGT
+ 55316  TGTTAAGTATAGCAAAACTAATAACCGGTAAAAATTATTAGTTACCTTAAGATCTGATAA
+ 55376  AAATTTTGAATAAATTACAATAGAATCCAATTTTGGGCAATTATAAATATTACTTAATAA
+ 55436  AAATTTATCAAATAAAAAAATTTGAGTAAATATAGATTTATATAATAAAATTTTAGACTC
+ 55496  AATAGGTCGCAT
+;     G-rpl5 <== start
+ 55508  ATAACTAAAATATTTATA
+;     G-rpl14 <== end
+ 55526  CTATACTAACTTACTAACAACTGAGGCTAATCTCATAAACAATCCACAGCGAATTTCTTT
+ 55586  TAACGCAGGTCCAAATACACGCGTACCTAAAAGTTTTTTTGTTTCCGATAAAACAATACC
+ 55646  TCGTGTCTCATCAAAACGTATACGAATACCATTTTTTCTACTGATATTTCTCTTAACAGT
+ 55706  TACGATTAAAGCCAAACATCTCTGTTTTTTTTGAACTTTACGACCTACTCGATATCTAAA
+ 55766  TATTGATCCTAATACCAATTCTCCAACCTTACTATAATTTTGTATCAAAGAATACCCAAA
+ 55826  TAAATGAAATATTCTAATCAATTTAGCTCCCGAATTATCAACGATTTTTAACTTAGTTTG
+ 55886  TTTTCTAATCAT
+;     G-rpl14 <== start
+ 55898  TTATCAAATAAATACACTTATATTATAAAGCACCCTTCTAAACACCACCAATCTAACAGA
+ 55958  ACATGATTAAAGAATATTTAACCAGTCAATTCCAACAATAATCCTACTTAACAGCACAAA
+ 56018  CCTAATCTTTATAATTTTATTAATAAAAATTATATTTTAGTAGCTAAACCATTAAAATCA
+ 56078  TCCTCACAATTCAACTAATAGTTTTAAAAGCTAAAGAACATGCCGAATACCTAACGCAAG
+ 56138  GAGATACTTTTTATAAAATTTTATTTACTT
+;     G-atp8 <== end
+ 56168  TTATGTAAAAAAAATTTCAGACACACATGCCTGTCTTAATAGCACAACACTATAATTCTT
+ 56228  GCTATACTTCTGCTCAATAAAATCCCACTGCTTTTGCTTATAATGTTTTCAGTTCAATTC
+ 56288  TTTATTAAATAATAAAGAATTACCGGTTATATTTAAATTTTTAAAATGTACTAAAAAATT
+ 56348  ATAAATAAAATTATTTACAAAAACACTGCGTTCATTTAAAACACCACAACCAATAGAATT
+ 56408  ATAAATCTCACGAAGTTTAAATAACTTACTAAAATCTAACAAATAATACTTTCATAAAAT
+ 56468  TAAAAAAAAAAAATAATAAAACAAAATTGTTCAAAAAACTTGAGAAAATACGGTTACTTT
+ 56528  ATCTAATTGTGGCAT
+;     G-atp8 <== start
+ 56543  CAATTTACATTAACAAAAATACAACTAGTTTAAGGCAAAGTTTTCTAGTATACGTCACAA
+ 56603  CCATAATTTTTCAAATACAAAAACATAAATAGAATTTTTTAAAAAATAATGAACCATACA
+ 56663  TCATCTGATACCCCTAAACCTTTATCTTTTTTAATTTATAATAGTTTTATTTAGCAAAAC
+ 56723  TAATATATTGAAAATATTTTGCTAACAGCCACGAAATAGTGCATTTAATTTAATCATAAA
+ 56783  AATACCTTCCTAATATTTTAGAAACAACCTTAACTAGTAAAATTCTATGAATATTTTTAG
+ 56843  TATTAATTTTAATGGCTGTTCAAAATAACCAACAATATAATCAAAATCATATCTAACAAA
+ 56903  TAAAAATATTTAAATTTTAATTTGGAATTTGCACTTTAAACTGAAATCAATCTGTACTTC
+ 56963  ACTTTTATACAATATTTTCTCGTAGTACATAAAACTCCCACACTACTAACCGTTAGCTAT
+ 57023  AGCATTTAATTACAATCTATAATTTACTCCTATACCGTAGTAACTCTTTCTAATATTAAA
+ 57083  AAATCAAACTATACCCCAACTTTTTTATTAATGTTTAAATAATCTTTTTTAAAGTAAAAC
+ 57143  AGTAAAGGCTTTGATAAGTTTATACAATCTTAAAGCTAATTTAAATTATTCAATTTCTAT
+ 57203  ATAGTATTAAATTAAAAATACTTTTCATAAAATGAATACGAAAATCGTTGCCATAATATG
+ 57263  TACACGAAAATTATTTATCACAAATTAATAAACCG
+;     G-orf241 <== end
+ 57298  TTAGATGTTATTTCTACTAGTAGTAAAAGGCGGTATTAATCCTAAACCATGAAAATTAAT
+ 57358  CAATTTCAATCGAATTTGTAATAAATTAAAAATTTGTAAAAAATTAACCGTTTTCTGATA
+ 57418  CTCAGACATCGCTAAAAAAAATATTTTACCAAAATTAACGACCCGAAAATAATCATTTCG
+ 57478  TAAAGTAAATTTTTTCAAAAAACTTCACTTATAACGATAAAAAAGTTTTTTTACAAAAAA
+ 57538  TATTACAGGCCTATCTAGAAAACTAAACCAAGAAATTACTTTTGTTAGATATTCTAACAT
+ 57598  GTAAAATGCCAAAGTAGCATTAATAACCAATGCTAAAATGTTTTCCCTCAAATTAGTAAA
+ 57658  TACTACATTAAATAAATCAGCCTCAACATCTTCATCCTCTATTTCATTTTTATTTTCCTC
+ 57718  TAATGGGCGAATAAGTGTTTCTTTTCAAGTACTATTTAAATACATTGATAAAGGTAAAAC
+ 57778  CCCAACAACACGAACATAAGATGAATAAATTTTAGGAAAAACTACTATATTCAATATACA
+ 57838  ATAACGCGTAAATAAAAACTTCAAAAAAGAAAAAATATATTTAGTTTTATCAACAAAAAC
+ 57898  ATCAAAAGTACATATACTTAATTTGCAAGATTCGTATTCCTCTTGATATATCAATTTTCT
+ 57958  TGATTCAAATTGAAGTTTTCGTTTCTTAATTTTCTTATGCCCTCCAAATCAACTTCGCTT
+ 58018  TTTCAT
+;     G-orf241 <== start
+;     G-rpl16 <== end
+ 58024  TTATTTACAAATACAATAAACCTGAATTGGTAGTTTTCGTGCAATACTTTTCAACAGTAA
+ 58084  ACGAGCCTGATTTGATGGCAATCCAGATACTTCACATAGCACAAAACCAGCCTTAACTTT
+ 58144  ACATACCCAATCGTCTATGTACCCCTTACCTTTTCCCATACGTACCTCAAGCGGCTTAGC
+ 58204  TGTTATAGCTTGGTGAGGAAATACACGTATTCAATACTGACCAATTCGCTTAGTACTCTT
+ 58264  AGACAAATTTAACTTAACCATTTCTAACTGCTTAGATGTAATATAACCATTCTTTTTAGC
+ 58324  TTTTAAACCAAAATTGCCAAAATCCAAATTTAAAAAACGTGTTGCTAAATTTTTAATCTT
+ 58384  CTTTTTCTGAAATTTTATAAACTTAGTTTTTTTAGGAATAATTCCAACCAT
+;     G-rpl16 <== start
+ 58435  TTTCTCAACAAA
+;     G-orf327 <== end
+;     G-rps3 <== end
+ 58447  TTAATATATTCAAATTTTAACTCCTATAATTCCTGCCCGTGTAATAGCTTCAGCAAATCC
+ 58507  ATATCCCAGTATTAAAGGACTATTTTTAGAAGCAACAGAACCAATTTGTATATGTTTAGT
+ 58567  ACGAGCACGGCTAAAACCATTTAGTTTACCTGCTAATAAAATCTTTATCCCTTTAAACCT
+ 58627  AAAATATTTTCAAATCTCCTTCAATACTTTTGACAAAAATGATAAAAAAGATAAATGTTT
+ 58687  ATATATTTTAGACAAAATTGGAGCAATATAATTTGCAATTAATTTCGCATCCGGAATTTT
+ 58747  ATATTTTAGCGCATGCACTAAAAATACTAACGTATTCCGATATATTCCAACATCTTTATT
+ 58807  AAAACGTAAACGGAAGCGCTTGAAACTATCTGAAACAACTCGTAATAACGTTGACAAATT
+ 58867  TTTTTTTCATTCTTTATAACCTACAACACATATATTTACGAATTTGTAAAATATCTGCCT
+ 58927  CTTTAGCAAAAACTCTAAAACGTATAAAAACATTATACTACGTAATTGCACTCGGCATTT
+ 58987  ATGAATTGGCTTCATAACAGTTCCTAGTGTAGCTAAAATCTTTTTTCTATCCTTACGTTT
+ 59047  TTTAGTTTTCTTACTTTTATATTGAAAATTTTTCTTGCGCTTCTCTTCTTCCCTTATTGG
+ 59107  ATAAAAAAATAAGAAATTTACGTATAATTTCCCTAAAACTCAAAATAAACGCACCGTACC
+ 59167  CACTACTCTACGTTTTCGAGCTTTTGTTCAACGTGAAAGAAAAAAATTAACATACTTACG
+ 59227  AACAAAAATCTCTTCATAAATATGCCTACAATACTCTAAAGATTTCTTAACAAATCAATT
+ 59287  TGATTTGAATAAAAATTTTTGATAAATAACTTTATGT
+;     G-rps19 <== end
+ 59324  TTACCGTTTCGCTTTATATACAT
+;     G-rps3 <== start
+ 59347  GCAATTTTCGAGTAAACACAAAACATCCAAATTTATACCCAACCATTCCTGGAGAAATAC
+ 59407  ACAAATCAAAAAATCTACATCCAT
+;     G-orf327 <== start
+ 59431  TATAAATTTTAATCCTAACCTGAATAAAATCTGGCAAAATCATACTATCTTTACGTTTTA
+ 59491  AAAAAATAATTTTATTACTTTTGTTCTCACCATAAAAACTTGAATAAACTTTTTGCGTTA
+ 59551  TAAAAGGCCCTTTTCAAATTGCTCTCAT
+;     G-rps19 <== start
+ 59579  ATTTATATATTATTTATTACCTAAGCAAGCATTTTACAAAATGCGATACAGCATTTCCAA
+ 59639  AAATACATATTAAAATTTCAATCATAAACCAAAAAAAGGGAGGGGGGGGGTAAGGCGTTT
+ 59699  CAACAATTCCAGCCTTCGCAGAACAAAAACGCATCACATTTATCTGCGAAGCACTTTTAT
+ 59759  GGAACAGTGAATAAATCACAGCAAAACGCATGGATTTAGGTGTAAAATCGCTTAGGCAAT
+ 59819  TCTTTTAGAATACAACCGCGTAAAATTTCCTTAGCGATTCTGTGCGACGATGCATAAAAA
+ 59879  TTGGAGGGAATTTTGCACCTACGCCAGAGCAAGCATTTCCAGAAATGCATATTAGAATTT
+ 59939  TAAGCGTAAACCAAGGGGGGGGTAAGGCGTTTCAACAATTCCAGCCTTCGCAGAACAAAA
+ 59999  ACGCATCACATTTATCTGCGAAGCACTTTTATGGAACAGTGAATAAATCACAGCAAAACG
+ 60059  CATGGATTTAGGTGTAAAATCGCTTAGGCAATTCTTTTAGAATACAACCGCGTAAAATTT
+ 60119  CCTTAGCGATTCTGTGCGACGATGCATAAAAATTGGAGGGAATTTTGCACCTACGCCAGA
+ 60179  GCAAGCATTTCCAGAAATGCATATTAGAATTTTAAGCGTAAACCAAGGGGGGGGGGGGTA
+ 60239  AGGCGTTTCAACAATTCCAGCCTTCGCAGAACAAAAACGCATCACACTTATCTGCGAAGA
+ 60299  ACTTTTATGGAACAGTGAATAAATCACAGCAAAACGCATGGATTTAGGTGTAAAATCGCT
+ 60359  TAGGCAATTCTTTTAGAATACAACCGCGTAAAATTTCCTTAGCGATTCTGTGCGACGATG
+ 60419  CATAAAAATTGGAGGGAATTTTGCACCTACGCCAGAGCAAGCATTTCCAGAAATGCATAT
+ 60479  TAGAATTTTAAGCGTAAATCAAGGAGGGGGGGGGGGGTAAGGCGTTTCAACAATTCCAGC
+ 60539  CTTCGCAGAACAAAAACGCATCACATTTATCTGCGAAGCACTTTTATGGAACAGTGAATA
+ 60599  AATCACAGCAAAACGCATAGGTTTGGGTGTAAAATCGCTTAGGCAATTCTTTTAGAATAC
+ 60659  AACCGCGTAAAATTTCCTTAGCGATTCTGTGCGACGATGCATAAAAATTGGAGGGAATTT
+ 60719  TGCACCTACGCCAGAGCAAGCATTTCCAGAAATGCATATTAGAATTTTAAGCGTAAATCA
+ 60779  AGGAGGGGGGGTAAGGCGTTTCAACAATTCCAGCCTTCGCAGAACAAAAACGCATCACAC
+ 60839  TTATCTGCGAAGAACTTTTATGGAACAGTGAATAAATCACAGCAAAACGCATAGATTTAG
+ 60899  GTGTAAAATCGCTTAGGCAATTACTTTAAAATACAACCGCGTAAAATTTCCTTAGCGATT
+ 60959  CTGTGCGACGATGCATAAAAATTGGAGGGAATTTTGCACCTACGCCAGAGCAAGCATTTC
+ 61019  CAGAAATGCATATTAGAATTTTAAGCGTAAATCAAGGAGGGGGGGGGTAAGGCGTTTCAA
+ 61079  CAATTCCAGCCTTCGCAGAACAAAAACGCATCACACTTATCTGCGAAGAACTTTTATGGA
+ 61139  ACAGTGAATAAATCACAGCAAAACGCATAGATTTAGGTGTAAAATCGCTTAGGCAATTCT
+ 61199  TTTAGAACACAACCGCGTAAAATTTCCTTAGCGATTCTGTGCGACGATGCATAAAAATTG
+ 61259  GAGGGAATTTTGCACCTACGCCAGAGCAAGCATTTCCAGAAATGCATATTAGAATTTTAA
+ 61319  GCGTAAATCAAGGAGGGGGGTACGGCGTTTCAACAATTCCAGCCTTCGCAGAACAAAAAC
+ 61379  GCATCACACTTATCTGCGAAGCATTTTTATGGAACAATAAATAAATCACAGCAAAACGCA
+ 61439  TGGATTTAGGTGTAAAATCGCTTAGGCAATTACTTTAAAATACAACCGCGTAAAATTTCC
+ 61499  TTAGCGATTCTGTGCGACAATGCATAAAAATTAGAGGGAACTTTGCACCTACGCCAGAGC
+ 61559  AAGCATTTCCAGAAATACATATTAGAATTTTAAGCGTAAATCAAGGAGGGGGGGGGGGGT
+ 61619  AAACGTTTCGATAACTTCAGCCACCACAGAACAAAAACGCATCAAATTTATCTGCGAAGC
+ 61679  ACTTTTACGGAACAATAAATAAATCACAGCAAAACGCATGGATTTAGGTGTAAAATCGCT
+ 61739  TAGGCAATTACTTTAAAATACAACCGCGTAAAATTTCCTTAGCGATTCTGTGCGACAATG
+ 61799  CATAAAAATTAGAGGGAACTTTGCACCTACGCCAGAGCAAGCATTTCCAGAAATACATAT
+ 61859  TAGAATTTTAAGCGTAAATCAAGGAGGGGGGGGTAAACGTTTCGATAACTTCAGCCACCA
+ 61919  CAGAACAAAAACGCATCAAATTTATCTGCGAAGCACTTTTACGGAACAATAAATAAATCA
+ 61979  CAGCAAAACGCATAGATTTAGGTGTAAAATCGCTTAGGCAATTACTTTAAAATACAACCG
+ 62039  CGTAAAATTTCCTTAGCGATTCTGTGCGACAATGCATAAAAATTAGAGGGAACTTTGCAC
+ 62099  CTACGCCAGAGCAAGCATTTCCAGAAATACATATTAGAATTTTAAGCGTAAATCAAGGAG
+ 62159  GGGGGTGGGGTAAACGTTTCGATAACTTCAGCCACCACAGAACAAAAACGCATCAAATTT
+ 62219  ATCTGCGAAGCACTTTTACGGAACAATAAGTAAATCACAGCAAAACGCATAGATTTAGGT
+ 62279  GTAAAATCGCTTAGGCAATTACTTTAAAATACAACCGCATAAAATTTCCTTAGCAATGAC
+ 62339  GCATAAAAATTAGAGGGAACTTTGCACCTACGCCAGAGCAAGCATTTCCAGAAATACATA
+ 62399  TTAGAATT
+;     G-orf784 <== end
+ 62407  TTAAGCGTAAATCAAGGAGGGGGGGGGTGGGGTAAACGTTTCGATAACTTCAGCCACCAC
+ 62467  AGAACAAAAGCACACTT
+;     G-orf736 <== end
+ 62484  TTAGGGGCAGCAA
+;     G-orf767 ==> start
+ 62497  ATGCCAGA
+;     G-orf761 ==> start
+ 62505  ATGTATGAAAAGCTCGCGCGCGCCGCCCCCCCTCGGCACGGCATATATACTCGAAAAAGT
+ 62565  TATGCAAATCAAAA
+;     G-orf735 ==> start
+ 62579  ATGTGACAAGTGCTACAGATCTTTCAGACGTGCCCATTTTGGCGAGAAAATTCATGGAGT
+ 62639  TGCCACCCTCCCCAGGGCAGCAAATGCCAGAATGTATGAAAAGCTCGCGCGCGCCGCCCC
+ 62699  CCCTCGGCACGGCATATATACTCGAAAAAGTTATGCAAATCAAAAATGTGACAAGTGCTA
+ 62759  CAGATCTTTCAGACGTGCCCATTTTGGCGAGAAAATTCATGGAGTTGCCACCCTCCCCAG
+ 62819  GGCAGCAAATGCCAGAATGTATGAAAAGCTCGCGCGCGCCGCCCCCCCTCGGCACGGCAT
+ 62879  ATATACTCGAAAAAGTTATGCAAATCAAAAATGTGACAAGTGCTACAGATCTTTCAGACG
+ 62939  TGCCCATTTTGGCGAGAAAATTCATGGAGTTGCCACCCTCCCCAGGGCAGCAAATGCCAG
+ 62999  AATGTATGAAAAGCTCGCGCGCGCCGCCCCCCCTCGGCACGGCATATATACTCGAAAAAG
+ 63059  TTATGCAAATCAAAAATGTGACAAGTGCTACAGATCTTTCAGACGTGCCCATTTTGGCGA
+ 63119  GAAAATTCATGGAGTTGCCACCCTCCCCAGGGCAGCAAATGCCAGAATGTATGAAAAGCT
+ 63179  CGCGCGCGCCGCCCCCCCTCGGCACGGCATATATACTCGAAAAAGTTATGCAAATCAAAA
+ 63239  ATGTGACAAGTGCTACAGATCTTTCAGACGTGCCCATTTTGGCGAGAAAATTCATGGAGT
+ 63299  TGCCACCCTCCCCAGGGCAGCAAATGCCAGAATGTATGAAAAGCTCGCGCGCGCCGCCCC
+ 63359  CCCTCGGCACGGCATATATACTCGAAAAAGTTATGCAAATCAAAAATGTGACAAGTGCTA
+ 63419  CAGATCTTTCAGACGTGCCCATTTTGGCGAGAAAATTCATGGAGTTGCCACCCTCCCCAG
+ 63479  GGCAGCAAATGCCAGAATGTATGAAAAGCTCGCGCGCGCCGCCCCCCCTCGGCACGGCAT
+ 63539  ATATACTCGAAAAAGTTATGCAAATCAAAAATGTGACAAGTGCTACAGATCTTTCAGACG
+ 63599  TGCCCATTTTGGCGAGAAAATTCATGGAGTTGCCACCCTCCCCAGGGCAGCAAATGCCAG
+ 63659  AATGTATGAAAAGCTCGCGCGCGCCGCCCCCCCTCGGCACGGCATATATACTCGAAAAAG
+ 63719  TTATGCAAATCAAAAATGTGACAAGTGCTACAGATCTTTCAGACGTGCCCATTTTGGCGA
+ 63779  GAAAATTCATGGAGTTGCCACCCTCCCCAGGGCAGCAAATGCCAGAATGTATGAAAAGCT
+ 63839  CGCGCGCGCCGCCCCCCCTCGGCACGGCATATATACTCGAAAAAGTTATGCAAATCAAAA
+ 63899  ATGTGACAAGTGCTACAGATCTTTCAGACGTGCCCATTTTGGCGAGAAAATTCATGGAAT
+ 63959  TGCCACCCTCCCCAGGGCAGCAAATGCCAGAATGTATGAAAAGCTCGCGCGCGCCGCCCC
+ 64019  CCCTCGGCACGGCATATATACTCGAAAAAGTTATGCAAATCAAAAATGTGACAAGTGCTA
+ 64079  CAGATCTTTCAGACGTGCCCATTTTGGCGAGAAAATTCATGGAGTTGCCACCCTCCCCAG
+ 64139  GGCAGCAAATGCCAGAATGTATGAAAAGCTCGCGCGCGCCGCCCCCCCTCGGCACGGCAT
+ 64199  ATATACTCGAAAAAGTTATGCAAATCAAAAATGTGACAAGTGCTACAGATCTTTCAGACG
+ 64259  TGCCCATTTTGGCGAGAAAATTCATGGAGTTGCCACCCTCCCCAGGGCAGCAAATGCCAG
+ 64319  AATGTATGAAAAGCTCGCGCGCGCCGCCCCCCCTCGGCACGGCATATATACTCGAAAAAG
+ 64379  TTATGCAAATCAAAAATGTGACAAGTGCTACAGATCTTTCAGACGTGCCCATTTTGGCGA
+ 64439  GAAAATTCATGGAGTTGCCACCCTCCCCAGGGCAGCAAATGCCAGAATGTATGAAAAGCT
+ 64499  CGCGCGCGCCGCCCCCCCTCGGCACGGCATATATACTCGAAAAAGTTATGCAAATCAAAA
+ 64559  ATGTGACAAGTGCTACAGATCTTTCAGACGTGCCCATTTTGGCGAGAAAATTCATGGAGT
+ 64619  TGCCACCCTCCCCAGTGCAGCAAATGCCAGAATGTATGAAAAGCTCGCGCGCGCCGCCCC
+ 64679  CCCCTCGGCACGGCAT
+;     G-orf736 <== start
+ 64695  ATATACTCGAAAAAGTTATGCAAATCAAAAATGTGACAAGTGCTACAGATCTTTCAGACG
+ 64755  TGCCCAT
+;     G-orf784 <== start
+ 64762  TTTGGCGAGAAAATTTGTACGTTAA
+;     G-orf735 ==> end
+ 64787  CTAG
+;     G-orf761 ==> end
+ 64791  TTTATGGTAA
+;     G-orf767 ==> end
+ 64801  TATATATAGTATAAACATTAATAATATTTATAATATATGTATACATTATACTTAATATAT
+ 64861  ATAGTATAAACATTAATAATATTTATAACATATGTATACATTATACTTAATATATATAGT
+ 64921  ATAGACATTAATAATATTTATAATATATGTATAATGTATACATATGTTAACATTATACTT
+ 64981  AATATATATAGTATAGACATTAATAATATTTATAATATATGTATAATGTATACATATGTT
+ 65041  AACATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATAT
+ 65101  ATGTATAATGTATACATATGTTAACATATGTATACATTATACTTAATATATATAGTATAG
+ 65161  ACATTAATAATATTTATAATATATGTATAATGTATACATATGTTAACATATGTATACATT
+ 65221  ATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATAATGTATACA
+ 65281  TATGTTAACATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTA
+ 65341  TAATATATGTATAATGTATACATATGTTAACATATGTATACATTATACTTAATATATATA
+ 65401  GTATAGACATTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTATA
+ 65461  AACATTAATAATATTTATAATATATGTATAATGTATACATATGTTAACATATGTATACAT
+ 65521  TATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATAATGTATAC
+ 65581  ATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATG
+ 65641  TATAATGTATACATATGTTAACATATGTATACATTATACTTAATATATATAGTATAGACA
+ 65701  TTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAAT
+ 65761  AATATTTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATAT
+ 65821  TTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTATA
+ 65881  ATATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATATA
+ 65941  TGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTAT
+ 66001  ACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATACATT
+ 66061  ATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATACATTATACT
+ 66121  TAATATATATAGTATAGACATTAATAATATTTATAATATATGTATACATTATACTTAATA
+ 66181  TATATAGTATAGACATTAATAATATTTATAATATATGTATACATTATACTTAATATATAT
+ 66241  AGTATAGACATTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTAT
+ 66301  AGACATTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTATAGACA
+ 66361  TTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAAT
+ 66421  AATATTTATAATATATGTATAATGTATACATATGTATACATTATACTTAATATATATAGT
+ 66481  ATAGACATTAATAATATTTATAATATATGTATAATGTATACATATGTTAACATATGTATA
+ 66541  CATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATAATGTA
+ 66601  TACATATGTTAACATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATA
+ 66661  TTTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTAT
+ 66721  AATATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATAT
+ 66781  ATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTA
+ 66841  TACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATACAT
+ 66901  TATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATACATTATAC
+ 66961  TTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATACATTATACTTAAT
+ 67021  ATATATAGTATAGACATTAATAATATTTATAATATATGTATACATTATACTTAATATATA
+ 67081  TAGTATAGACATTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTA
+ 67141  TAGACATTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTATAGAC
+ 67201  ATTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAA
+ 67261  TAATATTTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATA
+ 67321  TTTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTAT
+ 67381  AATATATGTATAATGTATCATA
+;     G-orf1511 <== end
+ 67403  TTACCATAAA
+;     G-orf1486 <== end
+ 67413  CTAG
+;     G-orf1472 <== end
+ 67417  TTAACGTACAAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCAC
+ 67477  ATTTTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGCGGCGCGCGC
+ 67537  GAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAATTCCATGAATTTT
+ 67597  CTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATA
+ 67657  ACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACAT
+ 67717  TCTGGCATTTGCTGCCCTGGGGAGGGTGGCAATTCCATGAATTTTCTCGCCAAAATGGGC
+ 67777  ACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATA
+ 67837  TATGCCGTGCCGAGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCC
+ 67897  CTGGGGAGGGTGGCAATTCCATGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTG
+ 67957  TAGCACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGG
+ 68017  GGGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAA
+ 68077  CTCCGTGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATT
+ 68137  TTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGA
+ 68197  GCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCCATGAATTTTCT
+ 68257  CGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAAC
+ 68317  TTTTTCGAGTATAT
+;     G-orf589 ==> start
+ 68331  ATGCCGTGCCGAGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCC
+ 68391  TGGGGAGGGTGGCAACTCCATGAATTTTCTTGCCAAAATGGGCACGTCTGAAAGATCTGT
+ 68451  AGCACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATAT
+;     G-orf699 ==> start
+ 68495  ATGCCGTGCCGAGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCC
+ 68555  TGGGGAGGGTGGCAACTCCATGAATTTTCTTGCCAAAATGGGCACGTCTGAAAGATCTGT
+ 68615  AGCACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGG
+ 68675  GGGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAA
+ 68735  CTCCATGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATT
+ 68795  TTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGA
+ 68855  GCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCCATGAATTTTCT
+ 68915  TGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAAC
+ 68975  TTTTTCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACATTC
+ 69035  TGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCCGTGAATTTTCTCGCCAAAATGGGCAC
+ 69095  GTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATA
+ 69155  TGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCC
+ 69215  TGGGGAGGGTGGCAACTCCATGAATTTTCTTGCCAAAATGGGCACGTCTGAAAGATCTGT
+ 69275  AGCACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGG
+ 69335  GGGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAA
+ 69395  CTCCATGAATTTTCTTGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATT
+ 69455  TTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGA
+ 69515  GCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCCGTGAATTTTCT
+ 69575  CGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAAC
+ 69635  TTTTTCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACATTC
+ 69695  TGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCCATGAATTTTCTCGCCAAAATGGGCAC
+ 69755  GTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATA
+ 69815  TGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCC
+ 69875  TGGGGAGGGTGGCAACTCCGTGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGT
+ 69935  AGCACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATAT
+;     G-orf370 ==> start
+ 69979  ATGCCGTGCCGAGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCC
+ 70039  TGGGGAGGGTGGCAACTCCGTGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGT
+ 70099  AG
+;     G-orf589 ==> end
+ 70101  CACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGG
+ 70161  GCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAATT
+ 70221  CCATGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTT
+ 70281  TGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGC
+ 70341  TTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAATTCCATGAATTTTCTCG
+ 70401  CCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTT
+ 70461  TTTCGAGTATATATGCCGTGCCGAGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGG
+ 70521  CATTTGCTGCCCTGGGGAGGGTGGCAACTCCGTGAATTTTCTCGCCAAAATGGGCACGTC
+ 70581  TGAAAGATCTGTAG
+;     G-orf699 ==> end
+ 70595  CACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGG
+ 70655  GCGGCGCGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGC
+ 70715  AACTCCATGAATTTTCTTGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACA
+ 70775  TTTTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGGGCGCGCGCG
+ 70835  AGCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCCATGAATTTTC
+ 70895  TTGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAA
+ 70955  CTTTTTCGAGTATATATGCCGTGCCGAGGGGGGCGGCGCGCGCGAGCTTTTCATACATTC
+ 71015  TGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCCGTGAATTTTCTCGCCAAAATGGGCAC
+ 71075  GTCTGAAAGATCTGTAG
+;     G-orf370 ==> end
+ 71092  CACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGG
+ 71152  GGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAAC
+ 71212  TCCGTGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTT
+ 71272  TTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGCGGCGCGCGCGAGC
+ 71332  TTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCCATGAATTTTCTCG
+ 71392  CCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTT
+ 71452  TTTCGAGTATATATGCCGTGCCGAGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGG
+ 71512  CATTTGCTGCCCTGGGGAGGGTGGCAATTCCATGAATTTTCTCGCCAAAATGGGCACGTC
+ 71572  TGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGC
+ 71632  CGTGCCGAGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGG
+ 71692  GAGGGTGGCAACTCCATGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCA
+ 71752  CTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGC
+ 71812  GGCGCGCGCGAGCTTTTCATACAT
+;     G-orf1472 <== start
+ 71836  TCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCCAT
+;     G-orf1486 <== start ;; mfannot: GTG upstream: 71924
+ 71874  GAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTGATT
+ 71934  TGCAT
+;     G-orf1511 <== start
+ 71939  AACTTTTTCGAGTATACGTTATTATAAGTTATATTTAGAAATGATATAAATTTTCTAGCG
+ 71999  GTGGTTAATAACGACCAACATTATATACATTTTTATTTTTGTTAATAGCAGTTAAGTATT
+ 72059  GGTTAGAAAATAATATAATTTTTAATTTTCTTTAA
+;     G-trnW(uca)_1 ==> start
+ 72094  AGGGAGATAGTTTAACGGTAAAATATCGATCT!TCA!ACATCGAGGTTATAGGTTCAAAT
+ 72152  CCTTTTCTCCCTG
+;     G-trnW(uca)_1 ==> end
+ 72165  AGATTTTTTAAGGT
+;     G-rps13_1 ==> start
+ 72179  ATGAAAACATCAATTCAATTTTTTAATTTACAGTTTTTGATTGAAAAAAAATTATTAATT
+ 72239  TCGTTAACGCAAATTTTTGGCATTGGTTTTTACTCTGCTATAGTAATTTGCAAAAAATTT
+ 72299  GGTTTTAATAAAAATACATATATTAAGAGTGTGGATGTAAGGATTGTAAATGCAATGCGT
+ 72359  AACTTTATTTTGGATAAATTTGTTGTTCAAGAACAACTGAAAGAGCAGATTCAGGTATCT
+ 72419  ATAGTAGAGTTGGACACTATAAAGAGTATTAGAGGGTTTCGGCATAAATTGTGTTTACCT
+ 72479  GTTCATGGACAGCGAACTAAAACTAATCGGCGTACTCAACGTAAATTTAAAAGAATGCAG
+ 72539  AGTAAATTATGGGAAGAGGATTCAACACATATTCGTTAA
+;     G-rps13_1 ==> end
+ 72578  AACATAAATTTCAGTTTAGAAAATTACGTAGAACCCTTTTATCCTTTCGAAAGAGATCTT
+ 72638  GTATTCTAAATATTAAAATTACATTGAATAA
+;;     G-rps11 ==> start ;; First ATG found at 72867 HMMmatch = 60,139
+ 72669  CATATATTTAACTTTATCTGATTGATTTGGTCAAATTATTATGGTGAAATCTGGTGGGTT
+ 72729  ATTAAAATTGCCGGTTCCGGTAGAAATACGAATTATGCCTTAGAGCTTTTAATATTAGAT
+ 72789  GCTATTAAGCAATTAACTTTGTTAAATACAAAACATATTGTTTTAAAGTTTGATCATCGT
+ 72849  GTTTTAAGGAAAAAGAAAATGATTTTAAAGTTATTAAAAAAATTTAATATTAAAATTTTT
+ 72909  CTTATACGATTAATTATGTGTAAAGTTCATAATGGAATTACATTAGCTAAAAAACGGCGG
+ 72969  GTTTAA
+;;     G-rps11 ==> end
+ 72975  GTTATC
+;     G-rps14_1 ==> start
+ 72981  ATGTTGCGTAAGGTTATTTTTGAGTCAAATACCAGATATACATTTAAGTATTTTGAGATT
+ 73041  AAACAAAGAATTATAAAATCGTTATCAAAAAATTTATACTTGCCTATATTAGTTCGACGT
+ 73101  AAATTGTTGTGGCAATTAGATAAATTATCTTTATTATCATCTTTAATTTATGTAAAAAAT
+ 73161  CGATGTGTTGTTTCTGGTCGTGCTAAATCGATTTATAAATTTTTTAATTTATCTAGAATT
+ 73221  GTTATAAAAAAATTTTTTAGATTAGGTTATATACCTGGTTTAAATAGATCAAGTTGGTAA
+;     G-rps14_1 ==> end
+ 73281  TTTAGTAATATAAAATAAAAGTTTATTG
+;     G-rps8_1 ==> start
+ 73309  ATGGTTAAATTAGGACAATTTATTTCAATTTTAAATTTTAATATTAAAGCAGGAAAGTCT
+ 73369  TTTTTTGTAATAGTTAAAACAAGGATAATTTTGGATATTGTAAAAATCTTGATTGAGCAA
+ 73429  AATTACATTCTTGGTTATACGGATTTAAAAGAAAATGGTGATAAAATTATTGTGTTTTTT
+ 73489  AAGTTAGATTTTGCGAAAAGTAATAGCCTTTTACTTAAGGGATGTAAATTTGCATTATAT
+ 73549  AAAAATAGATTTACAAGTATTGGTGCCAATAATATAGTGAATAACTCGTCGTTGGTACTT
+ 73609  GTGTCTACTGTGAAGGGCGTTATGACTCAGTTGGAGGCTAAAAAACTTCGACTTGGGTAT
+ 73669  TATCTTGTGTTATATAATATAAAATTGTATAAAAAAATA
+;     G-rpl6_1 ==> start
+ 73708  ATGAGAGCTAAATTTATTTATCAAATTTTTAATAGGTTGTTTATCTATATATTTCAACAC
+ 73768  AATAAATTACTGTATATTCGAGGCCCTCTGGGTTTACTACGCTATAATGTTCCCAGTGGC
+ 73828  ATTGATATTTGTAAATATCGGTCAATGGTGTATATTTCTGGACAAAAAGCTGCCCACCCT
+ 73888  TTAGTTGCAATGTCACATAGAATAGTTTGCCAGAAAATGAAAGGGCTTGAGGTTGGTTTT
+ 73948  TCTGAAATTATGATAATTGCTGGTATGGGTTGGCGCGTTGATAAAGAAGACGTTTTATTA
+ 74008  AAATTTACAATTGGTTATAGTCATATTGTACATTATCTGATTCCGAATGATATTGAAATT
+ 74068  GTTTTACTTAGTAAAAATCTTTTTAAGATTTTTGGTTCTGATTTGAGTCGAATTCAGTGC
+ 74128  ATTGCGTCCGAATTGTGCAAACTGCGTTCATCTGATGTGTATAAAGGTAAAGGAATTCGT
+ 74188  CGTCAAGCTTTTAAAGTAGTTTTAAAATCAAGTACTAAATCGAAAGTTTAA
+;     G-rps8_1 ==> end
+;     G-rpl6_1 ==> end
+ 74239  TTTATGAAGAAAGTAAGCAGTGTTTTTATATTTTATTGTTTTTTAAATT
+;     G-rps12_1 ==> start
+ 74288  ATGGTTACAATTAATCAATTAATTCGATTAAGTCATCCTACTAAAAATAGGAAAAATACG
+ 74348  GTGCCCGCTTTAGACAGTAGTCCATACAAGAAGGGTGTTTGTTTAAGAGTATTTACGATG
+ 74408  ACTCCTAAAAAACCAAATTCCGCATTACGAAAAGTTGCCCGCATTAGATTATCAAATGGA
+ 74468  TATAAAATAACGGCGCATATCCCTGGTGAAGGTCACAATTTACAAGAATATTCGATTGTA
+ 74528  TTAGTACGTGGTGGGCGTGCTCGCGATTTGCCTAGTGTTCGATATAAAGTTGTTAGAGGT
+ 74588  AAATACGAT
+;     G-orf106 <== end
+ 74597  TTAGAACCTGTACGTAATAGGAGAACTCGGCGATCTAAATATGGTATTAAAAAAATATAA
+;     G-rps12_1 ==> end
+ 74657  AAATTGATTGATTGCGTCTGAGAGTATACATAAGTATTTGTTGTGGGTTAATGTTGAATG
+ 74717  GAAAAGTGTCTCAATTAGAAAAAATTGTTTTTTTTGTTTTCGAGACTTAAAATATAAGTT
+ 74777  TAATATGGATTCGTTGTTCTCTGTTTTTATATGTTGTAGACGAGATAATGCCTTATATAG
+ 74837  AGCTTCGTACGTTAAGGTTAGGGAGTGTTTTTTATCGAATACCAAAGCCTCTTCGAAAAG
+ 74897  TAAGCAGTTAAATTGTGGCAT
+;     G-orf106 <== start
+ 74918  TAAGCTGTTAGCCAAAACTGTCTAATTTAAACTTGTGTGTACGCAATGTAGCGGCGCTAT
+ 74978  AAAAAATACAACAGGAAATTTTAGCTGTTCTTCAAAAGAAAAGTTTACTTTTTAAGCAAA
+ 75038  ATAGAAATCTGTATCAAGTTGCGTCAACAACAGATCGTTTGCACATTACCGGTGGGATTA
+ 75098  GTTTTTACGAAATAGCGTGTCATATATGTGTAGTACAAT
+;     G-trnP(ugg)_1 ==> start
+ 75137  CGGAATATAGGCATAATGTAATGTATCTGATT!TGG!GATCAGATGAGTATAGGTTCGAG
+ 75195  TCCTATTATTCCGA
+;     G-trnP(ugg)_1 ==> end
+ 75209  AGTAAGGTATTTATTATAATTAGAAGTGTATATGAAGCGAATTAAATATTTTAAATTTAA
+ 75269  GTTTAGGGATATTTCAAAGGAAATTATTTAAGAAAGTCATATTTTAGATTGTTAAAAACA
+ 75329  AAAGCATATTTTAAGATTTTATTGGTGGATTAAAACAACGACAATTAGCACGGATTTACA
+ 75389  AAATTATTTATTCTAAACGGTTGTTTTTAACTTTTCTTACGAAATTAGAATATCGTATGA
+ 75449  ATTTATCTTGAAAGCCGGGTTTGTTTTAACCGGAAAACAGGCTAGGCAATTAATTCGCAT
+ 75509  AAGCATGTTATTGTGAATGGACAGCGGACTCAATTTTGCAATTTGCATATAAAAACATTT
+ 75569  GATATTATATCTCTAGAATCAGTAGTATTTTCAAAGTATAAACGCAAACTAGTATCAAGT
+ 75629  TTTTTTAAAACTCCAGGTTTTTTTGGTTATTTACGGCGACGTGGTATAAAAAAGAAACTT
+ 75689  ACAGTTCAACGTATGTTTATTTATGCTAAATTTCAATTTTTTTGCGAAACTAATTATAAA
+ 75749  ATCTACGATGTTTTTTGTGCGAAAGCTTAATTTGCATAAGATTTCTTCGTCTCAAGTTCT
+ 75809  TTTAATGTATGGGTGGTGACGAATACGTTTTTTATTTTAAAAAACGTTTTGTGATTAGTT
+ 75869  AAATTTTTATTATAATTATTTTTAGGTTTATAATGAGTTTAATTTTAGGTGATGTGTGTT
+ 75929  TAACAATAGTGTGCTTAGTTTAATTATTTTTTTACCGTTGTTTAGTAGTTTTTGTTCTGG
+ 75989  ATTGTTTTTGTTGGATTGGAGCTAAAGGTGTTGCTTTTATAACTTTTTTATCTCTACTAG
+ 76049  GGTCATTAATTTTAACTTGTAATTATTTAAGTTTTATAAGTTTTTATTTGGTTTCAAATT
+ 76109  ATGTATCCGTATTATCTTGGATGAAATTAGGTTCATTTTATGTGACATGGTCATTTTGTT
+ 76169  TTGATAGTTTATCTCGTTAATGCGGTTTTTTAGTTACTGTGTTAGTTAGTTTAGTTTATC
+ 76229  TATATGTGCGTCCTAGGTTATTTTTATTTTATAAATTTATTAAAGTAAAGTTGGAAATTT
+ 76289  AAGTGAGTTGTAATGGGCGAATAAGTGTTTCTTTCAAGTACTATTAAAATACATTGATAA
+ 76349  AGGTAAAACCCCAACAACACGAACATAAGATGAATAAATTTTTAGGAAAAACTACTATAT
+ 76409  TCAATATACAATAACGCGTAAATAAAAACTTCAAAAAAGAAAAAATATATTTAGTTTTAT
+ 76469  CAACAAAAACATCAAAAGTACATATACTTAATTTGCAAGATTCGTATTCCTCTTGATATA
+ 76529  TCAATTTTCTTGATTCAAATTGAAGTTTTCGTTTCTTAATTTTCTTATGCCCTCCAAATC
+ 76589  AACTTCGCTTTTTCAT
+;;     G-rpl16 <== end
+ 76605  TTATTTACA
+;;     G-rpl16 <== end
+ 76614  AATACAATAAACCTGAATTGGTAGTTTTCGTGCAATACTTTTCAACAGTAAACGAGCCTG
+ 76674  ATTTGATGGCAATCCAGATACTTCACATAGCACAAAACCAGCCTTAACTTTACATACCCA
+ 76734  TCGTC
+;;     G-rpl16 <== end
+ 76739  TATACCCCTTACCTTTCCC
+;;     G-rpl16 <== start ;; 86,134
+ 76758  ATACGTACCTCAAGCGGCTTAGCTGTTATAGCTTGGTGAGGAAATACACGTATTCAATAC
+ 76818  TGACCAATTCGCTTAGTACTCTTAGACAAATTTAACTTAACCATTTCTAACTGCTTAGAT
+ 76878  GTAATATAACCATTCTTTTTAGCTTTAAAACCAAAATTGCCAAAATCCAAATTTAAAAAA
+ 76938  CGTGTTGCTAAATTTTTAATCTTCTTTTTCTGAAATTTTATAAACTTAGTTTTTTTAGGA
+ 76998  AT
+;;     G-rpl16 <== start ;; 4,91
+ 77000  AATTCCAACCAT
+;;     G-rpl16 <== start
+ 77012  TTTCTCAACAAATTAATATATTCAAATTTTAACTCCTATAATTCCTGCCCGTTAATAGCT
+ 77072  TCAGCAAATCCATATCCCAGTATTAAAGGACTATTTTTAGAAGCAACAACCAATTTGTAT
+ 77132  ATGTTTAGTACGAGCACGGCTAAAAACCATTTAGTTTACCTGCTAATAAAATCTTTATCC
+ 77192  CTTTAACCTAAAATATTTTCAAATCTCCTTCAATACTTTTGACAAAAATGATAAAAAAGA
+ 77252  TAAATGTTTATATATTTTAGACAAAATTGGAGCAATATAATTTGCAATTAATTTCGCATC
+ 77312  CGGAATTATATTTTAGCGCATGCACTAAAAATACTAACGTATTCCGATATATCCAACATC
+ 77372  TTTATTAAAACGTAAACGGAAGCGCTTGAAACTATCTGAAACAACTCGTAATAACGTTGA
+ 77432  CAAATTTTTTTTTCATTCTTTATAACCTACAACACATATATTTACGAATTTGTAAAATAT
+ 77492  CTGCCTCTTTAGCAAAAACTCTAAAACGTATAAAAACATTATACTACGTAATTGCACTCG
+ 77552  GCATTTATGAATTGGCTTCATAACAGTTCCTAGTGTAGCTAAAATCTTTTTTCTATCCTT
+ 77612  ACGTTTTTTAGTTTTCTTACTTTTATATTGAAAATTTTTCTTGCGCTTCTCTTCTTCCCT
+ 77672  TATTGGATAAAAAAATAAGAAATTTACGTATAATTTCCCTAAAACTCAAAATAAACGCAC
+ 77732  CGTACCCACTACTCTACGTTTTCGAGCTTTTGTTCAACGTGAAAAAAAAAATTAACATAC
+ 77792  TTACGAACAAAAAATCTCTTCATAAATATGCCTACAATACTCTAAAGATTTCTTAACAAA
+ 77852  TCAATTTGATTTGAATAAAAATTTTTGATAAATAACTTTATGTTTACCGTTTCGCTTTAT
+ 77912  ATACATGCAATTTTCGAGTAAACACAAAACATCCAAATTTATACCCAACCATTCCTGGAG
+ 77972  AAATACACAAATCAAAAAAATCTACATCCATTATAAATTTTAATCCTAACCTGAATAAAA
+ 78032  TCTGGCAAAATCATACTATCTTTACGTTTTAAAAAAAATAATTTTATTACTTTTGTTCTC
+ 78092  ACCATAAAAACTTGAATAAACTTTTTGCGTTATAAAAGGCCCTTTTCAAATTGCTCTCAT
+ 78152  ATTTATATATTATTTATTACCTAAGCAAGCATTTTACAAAATGCGATACAGCATTTCCAA
+ 78212  AAATACATATTAAAATTTCAATCATAAACCAAAAAAAGGGAGGGGGGGGGTAAGGCGTTT
+ 78272  CAACAATTCCAGCCTTCGCAGAACAAAAAACGCATCACATTTATCTGCGAAGCACTTTTA
+ 78332  TGGAACAGTGAATAAATCACAGCAAAACGCATGGATTTAGGTGTAAAATCGCTTAGGCAA
+ 78392  TTCTTTTAGAATACAACCGCGTAAAATTTCCTTAGCGATTCTGTGCGACGATGCATAAAA
+ 78452  ATTGGAGGGAATTTTGCACCTACGCCAGAGCAAGCATTTCCAGAAATGCATATTAGAATT
+ 78512  TTAAGCGTAAACCAAGGGGGGGGTAAGGCGTTTCAACAATTCCAGCCTTCGCAGAACAAA
+ 78572  AACGCATCACATTTATCTGCGAAGCACTTTTATGGAACAGTGAATAAATCACAGCAAAAC
+ 78632  GCATGGATTTAGGTGTAAAATCGCTTAGGCAATTCTTTTAGAATACAACCGCGTAAAATT
+ 78692  TCCTTAGCGATTCTGTGCGACGATGCATAAAAATTGGAGGGAATTTTGCACCTACGCCAG
+ 78752  AGCAAGCATTTCCAGAAATGCATATTAGAATTTTAAGCGTAAACCAAGGGGGGGGGGGTA
+ 78812  AGGCGTTTCAACAATTCCAGCCTTCGCAGAACAAAAAACGCATCACACTTATCTGCGAAG
+ 78872  AACTTTTATGGAACAGTGAATAAATCACAGCAAAACGCATGGATTTAGGTGTAAAATCGC
+ 78932  TTAGGCAATTCTTTTAGAATACAACCGCGTAAAATTTCCTTAGCGATTCTGTGCGACGAT
+ 78992  GCATAAAAATTGGAGGGAATTTTGCACCTACGCCAGAGCAAGCATTTCCAGAAATGCATA
+ 79052  TTAGAATTTTAAGCGTAAATCAAGGAGGGGGGGGGGGGTAAGGCGTTTCAACAATTCCAG
+ 79112  CCTTCGCAGAACAAAAACGCATCACATTTATCTGCGAAGCACTTTTATGGAACAGTGAAT
+ 79172  AAATCACAGCAAAACGCATAGGTTTGGGTGTAAAATCGCTTAGGCAATTCTTTTAGAATA
+ 79232  CAACCGCGTAAAATTTCCTTAGCGATTCTGTGCGACGATGCATAAAAATTGGAGGGAATT
+ 79292  TTGCACCTACGCCAGAGCAAGCATTTCCAGAAATGCATATTAGAATTTTAAGCGTAAATC
+ 79352  AAGGAGGGGGGGGTAAGGCGTTTCAACAATTCCAGCCTTCGCAGAACAAAAACGCATCAC
+ 79412  ACTTATCTGCGAAGAACTTTTATGGAACAGTGAATAAATCACAGCAAAACGCATAGATTT
+ 79472  AGGTGTAAAATCGCTTAGGCAATTACTTTAAAATACAACCGCGTAAAATTTCCTTAGCGA
+ 79532  TTCTGTGCGACGATGCATAAAAATTGGAGGGAATTTTGCACCTACGCCAGAGCAAGCATT
+ 79592  TCCAGAAATGCATATTAGAATTTTAAGCGTAAATCAAGGAGGGGGGGGGTAAGGCGTTTC
+ 79652  AACAATTCCAGCCTTCGCAGAACAAAAACGCATCACACTTATCTGCGAAGAACTTTTATG
+ 79712  GAACAGTGAATAAATCACAGCAAAACGCATAGATTTAGGTGTAAAATCGCTTAGGCAATT
+ 79772  CTTTTAGAACACAACCGCGTAAAATTTCCTTAGCGATTCTGTGCGACGATGCATAAAAAT
+ 79832  TGGAGGGAATTTTGCACCTACGCCAGAGCAAGCATTTCCAGAAATGCATATTAGAATTTT
+ 79892  AAGCGTAAATCAAGGAGGGGGGTACGGCGTTTCAACAATTCCAGCCTTCGCAGAACAAAA
+ 79952  ACGCATCACACTTATCTGCGAAGCATTTTTATGGAACAATAAATAAATCACAGCAAAACG
+ 80012  CATGGATTTAGGTGTAAAATCGCTTAGGCAATTACTTTAAAATACAACCGCGTAAAATTT
+ 80072  CCTTAGCGATTCTGTGCGACAATGCATAAAAATTAGAGGGAACTTTGCACCTACGCCAGA
+ 80132  GCAAGCATTTCCAGAAATACATATTAGAATTTTAAGCGTAAATCAAGGAGGGGGGGGGGG
+ 80192  GTAAACGTTTCGATAACTTCAGCCACCACAGAACAAAAACGCATCAAATTTATCTGCGAA
+ 80252  GCACTTTTACGGAACAATAAATAAATCACAGCAAAACGCATGGATTTAGGTGTAAAATCG
+ 80312  CTTAGGCAATTACTTTAAAATACAACCGCGTAAAATTTCCTTAGCGATTCTGTGCGACAA
+ 80372  TGCATAAAAATTAGAGGGAACTTTGCACCTACGCCAGAGCAAGCATTTCCAGAAATACAT
+ 80432  ATTAGAATTTTAAGCGTAAATCAAGGAGGGGGGGGTAAACGTTTCGATAACTTCAGCCAC
+ 80492  CACAGAACAAAAACGCATCAAATTTATCTGCGAAGCACTTTTACGGAACAATAAATAAAT
+ 80552  CACAGCAAAACGCATAGATTTAGGTGTAAAATCGCTTAGGCAATTACTTTAAAATACAAC
+ 80612  CGCGTAAAATTTCCTTAGCGATTCTGTGCGACAATGCATAAAAATTAGAGGGAACTTTGC
+ 80672  ACCTACGCCAGAGCAAGCATTTCCAGAAATACATATTAGAATTTTAAGCGTAAATCAAGG
+ 80732  AGGGGGGGTGGGGTAAACGTTTCGATAACTTCAGCCACCACAGAACAAAAACGCATCAAA
+ 80792  TTTATCTGCGAAGCACTTTTACGGAACAATAAGTAAATCACAGCAAAACGCATAGATTTA
+ 80852  GGTGTAAAATCGCTTAGGCAATTACTTTAAAATACAACCGCATAAAATTTCCTTAGCAAT
+ 80912  GACGCATAAAAATTAGAGGGAACTTTGCACCTACGCCAGAGCAAGCATTTCCAGAAATAC
+ 80972  ATATTAGAATTTTAAGCGTAAATCAAGGAGGGGGGGGGTGGGGTAAACGTTTCGATAACT
+ 81032  TCAGCCACCACAGAACAAAAGCACACTTTTAGGGGCAGCAA
+;     G-orf766 ==> start
+ 81073  ATGCCAGA
+;     G-orf760 ==> start
+ 81081  ATGTATGAAAAGCTCGCGCGCGCCGCCCCCCCTCGGCACGGCATATATACTCGAAAAAGT
+ 81141  TATGCAAATCAAAA
+;     G-orf734 ==> start
+ 81155  ATGTGACAAGTGCTACAGATCTTTCAGACGTGCCCATTTTGGCGAGAAAATTCATGGAGT
+ 81215  TGCCACCCTCCCCAGGGCAGCAAATGCCAGAATGTATGAAAAGCTCGCGCGCGCCGCCCC
+ 81275  CCTCGGCACGGCATATATACTCGAAAAAGTTATGCAAATCAAAAATGTGACAAGTGCTAC
+ 81335  AGATCTTTCAGACGTGCCCATTTTGGCGAGAAAATTCATGGAGTTGCCACCCTCCCCAGG
+ 81395  GCAGCAAATGCCAGAATGTATGAAAAGCTCGCGCGCGCCGCCCCCCCTCGGCACGGCATA
+ 81455  TATACTCGAAAAAGTTATGCAAATCAAAAATGTGACAAGTGCTACAGATCTTTCAGACGT
+ 81515  GCCCATTTTGGCGAGAAAATTCATGGAGTTGCCACCCTCCCCAGGGCAGCAAATGCCAGA
+ 81575  ATGTATGAAAAGCTCGCGCGCGCCGCCCCCCCTCGGCACGGCATATATACTCGAAAAAGT
+ 81635  TATGCAAATCAAAAATGTGACAAGTG
+;     G-orf424 <== end
+ 81661  CTACAGATCTTTCAGACGTGCCCATTTTGGCGAGAAAATTCATGGAGTTGCCACCCTCCC
+ 81721  CAGGGCAGCAAATGCCAGAATGTATGAAAAGCTCGCGCGCGCCGCCCCCCTCGGCACGGC
+ 81781  ATATATACTCGAAAAAGTTATGCAAATCAAAAATGTGACAAGTGCTACAGATCTTTCAGA
+ 81841  CGTGCCCATTTTGGCGAGAAAATTCATGGAGTTGCCACCCTCCCCAGGGCAGCAAATGCC
+ 81901  AGAATGTATGAAAAGCTCGCGCGCGCCGCCCCCCCTCGGCACGGCATATATACTCGAAAA
+ 81961  AGTTATGCAAATCAAAAATGTGACAAGTGCTACAGATCTTTCAGACGTGCCCATTTTGGC
+ 82021  GAGAAAATTCATGGAGTTGCCACCCTCCCCAGGGCAGCAAATGCCAGAATGTATGAAAAG
+ 82081  CTCGCGCGCGCCGCCCCCCCTCGGCACGGCATATATACTCGAAAAAGTTATGCAAATCAA
+ 82141  AAATGTGACAAGTGCTACAGATCTTTCAGACGTGCCCATTTTGGCGAGAAAATTCATGGA
+ 82201  GTTGCCACCCTCCCCAGGGCAGCAAATGCCAGAATGTATGAAAAGCTCGCGCGCGCCGCC
+ 82261  CCCCCTCGGCACGGCATATATACTCGAAAAAGTTATGCAAATCAAAAATGTGACAAGTG
+;     G-orf315 <== end
+ 82320  CTACAGATCTTTCAGACGTGCCCATTTTGGCGAGAAAATTCATGGAGTTGCCACCCTCCC
+ 82380  CAGGGCAGCAAATGCCAGAATGTATGAAAAGCTCGCGCGCGCCGCCCCCCTCGGCACGGC
+ 82440  ATATATACTCGAAAAAGTTATGCAAATCAAAAATGTGACAAGTGCTACAGATCTTTCAGA
+ 82500  CGTGCCCATTTTGGCGAGAAAATTCATGGAATTGCCACCCTCCCCAGGGCAGCAAATGCC
+ 82560  AGAATGTATGAAAAGCTCGCGCGCGCCGCCCCCCCTCGGCACGGCATATATACTCGAAAA
+ 82620  AGTTATGCAAATCAAAAATGTGACAAGTGCTACAGATCTTTCAGACGTGCCCATTTTGGC
+ 82680  GAGAAAATTCATGGAGTTGCCACCCTCCCCAGGGCAGCAAATGCCAGAATGTATGAAAAG
+ 82740  CTCGCGCGCGCCGCCCCCCCTCGGCACGGCATATATACTCGAAAAAGTTATGCAAATCAA
+ 82800  AAATGTGACAAGTGCTACAGATCTTTCAGACGTGCCCATTTTGGCGAGAAAATTCATGGA
+ 82860  GTTGCCACCCTCCCCAGGGCAGCAAATGCCAGAATGTATGAAAAGCTCGCGCGCGCCGCC
+ 82920  CCCCTCGGCACGGCAT
+;     G-orf424 <== start
+ 82936  ATATACTCGAAAAAGTTATGCAAATCAAAAATGTGACAAGTGCTACAGATCTTTCAGACG
+ 82996  TGCCCATTTTGGCGAGAAAATTCATGGAGTTGCCACCCTCCCCAGGGCAGCAAATGCCAG
+ 83056  AATGTATGAAAAGCTCGCGCGCGCCGCCCCCCCTCGGCACGGCATATATACTCGAAAAAG
+ 83116  TTATGCAAATCAAAAATGTGACAAGTGCTACAGATCTTTCAGACGTGCCCATTTTGGCGA
+ 83176  GAAAATTCATGGAGTTGCCACCCTCCCCAGTGCAGCAAATGCCAGAATGTATGAAAAGCT
+ 83236  CGCGCGCGCCGCCCCCCCCCTCGGCACGGCAT
+;     G-orf315 <== start
+ 83268  ATATACTCGAAAAAGTTATGCAAATCAAAAATGTGACAAGTGCTACAGATCTTTCAGACG
+ 83328  TGCCCATTTTGGCGAGAAAATTTGTACGTTAA
+;     G-orf734 ==> end
+ 83360  CTAG
+;     G-orf760 ==> end
+ 83364  TTTATGGTAA
+;     G-orf766 ==> end
+ 83374  TATATATAGTATAAACATTAATAATATTTATAATATATGTATACATTATACTTAATATAT
+ 83434  ATAGTATAAACATTAATAATATTTATAACATATGTATACATTATACTTAATATATATAGT
+ 83494  ATAGACATTAATAATATTTATAATATATGTATAATGTATACATATGTTAACATTATACTT
+ 83554  AATATATATAGTATAGACATTAATAATATTTATAATATATGTATAATGTATACATATGTT
+ 83614  AACATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATAT
+ 83674  ATGTATAATGTATACATATGTTAACATATGTATACATTATACTTAATATATATAGTATAG
+ 83734  ACATTAATAATATTTATAATATATGTATAATGTATACATATGTTAACATATGTATACATT
+ 83794  ATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATAATGTATACA
+ 83854  TATGTTAACATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTA
+ 83914  TAATATATGTATAATGTATACATATGTTAACATATGTATACATTATACTTAATATATATA
+ 83974  GTATAGACATTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTATA
+ 84034  AACATTAATAATATTTATAATATATGTATAATGTATACATATGTTAACATATGTATACAT
+ 84094  TATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATAATGTATAC
+ 84154  ATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATG
+ 84214  TATAATGTATACATATGTTAACATATGTATACATTATACTTAATATATATAGTATAGACA
+ 84274  TTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAAT
+ 84334  AATATTTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATAT
+ 84394  TTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTATA
+ 84454  ATATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATATA
+ 84514  TGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTAT
+ 84574  ACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATACATT
+ 84634  ATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATACATTATACT
+ 84694  TAATATATATAGTATAGACATTAATAATATTTATAATATATGTATACATTATACTTAATA
+ 84754  TATATAGTATAGACATTAATAATATTTATAATATATGTATACATTATACTTAATATATAT
+ 84814  AGTATAGACATTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTAT
+ 84874  AGACATTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTATAGACA
+ 84934  TTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAAT
+ 84994  AATATTTATAATATATGTATAATGTATACATATGTATACATTATACTTAATATATATAGT
+ 85054  ATAGACATTAATAATATTTATAATATATGTATAATGTATACATATGTTAACATATGTATA
+ 85114  CATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATAATGTA
+ 85174  TACATATGTTAACATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATA
+ 85234  TTTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTAT
+ 85294  AATATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATAT
+ 85354  ATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTA
+ 85414  TACATTATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATACAT
+ 85474  TATACTTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATACATTATAC
+ 85534  TTAATATATATAGTATAGACATTAATAATATTTATAATATATGTATACATTATACTTAAT
+ 85594  ATATATAGTATAGACATTAATAATATTTATAATATATGTATACATTATACTTAATATATA
+ 85654  TAGTATAGACATTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTA
+ 85714  TAGACATTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTATAGAC
+ 85774  ATTAATAATATTTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAA
+ 85834  TAATATTTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATA
+ 85894  TTTATAATATATGTATACATTATACTTAATATATATAGTATAGACATTAATAATATTTAT
+ 85954  AATATATGTATAATGTATCATA
+;     G-orf1493 <== end
+ 85976  TTACCATAAA
+;     G-orf1477 <== end
+ 85986  CTAG
+;     G-orf1510 <== end
+ 85990  TTAACGTACAAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCAC
+ 86050  ATTTTTGATTTGCATAACTTTTTCGAGTATAT
+;     G-orf1086 ==> start
+ 86082  ATGCCGTGCCGAGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCC
+ 86142  TGGGGAGGGTGGCAATTCC
+;     G-orf1225 ==> start
+ 86161  ATGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTG
+ 86221  ATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTT
+ 86281  TTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAATTCCATGAATTTTCTCGCC
+ 86341  AAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTTTT
+ 86401  TCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGC
+ 86461  ATTTGCTGCCCTGGGGAGGGTGGCAATTCCATGAATTTTCTCGCCAAAATGGGCACGTCT
+ 86521  GAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCC
+ 86581  GTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGG
+ 86641  GAGGGTGGCAACTCCGTGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCA
+ 86701  CTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGC
+ 86761  GGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCC
+ 86821  ATGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTG
+ 86881  ATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTT
+ 86941  TTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCCATGAATTTTCTTGCC
+ 87001  AAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTTTT
+ 87061  TCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGC
+ 87121  ATTTGCTGCCCTGGGGAGGGTGGCAACTCCATGAATTTTCTTGCCAAAATGGGCACGTCT
+ 87181  GAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCC
+ 87241  GTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGG
+ 87301  GAGGGTGGCAACTCCATGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCA
+ 87361  CTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGC
+ 87421  GGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCC
+ 87481  ATGAATTTTCTTGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTG
+ 87541  ATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTT
+ 87601  TTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCCGTGAATTTTCTCGCC
+ 87661  AAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTTTT
+ 87721  TCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGC
+ 87781  ATTTGCTGCCCTGGGGAGGGTGGCAACTCCATGAATTTTCTTGCCAAAATGGGCACGTCT
+ 87841  GAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCC
+ 87901  GTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGG
+ 87961  GAGGGTGGCAACTCCATGAATTTTCTTGCCAAAATGGGCACGTCTGAAAGATCTGTAGCA
+ 88021  CTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGC
+ 88081  GGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCC
+ 88141  GTGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTG
+ 88201  ATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTT
+ 88261  TTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCCATGAATTTTCTCGCC
+ 88321  AAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTTTT
+ 88381  TCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGC
+ 88441  ATTTGCTGCCCTGGGGAGGGTGGCAACTCCGTGAATTTTCTCGCCAAAATGGGCACGTCT
+ 88501  GAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCC
+ 88561  GTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGG
+ 88621  GAGGGTGGCAACTCCGTGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCA
+ 88681  CTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGC
+ 88741  GGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAATTCC
+ 88801  ATGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTG
+ 88861  ATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTT
+ 88921  TTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAATTCCATGAATTTTCTCGCC
+ 88981  AAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTTTT
+ 89041  TCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTGGC
+ 89101  ATTTGCTGCCCTGGGGAGGGTGGCAACTCCGTGAATTTTCTCGCCAAAATGGGCACGTCT
+ 89161  GAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATAT
+;     G-orf451 ==> start
+ 89216  ATGCCGTGCCGAGGGGGGGCGGCGCGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGC
+ 89276  TGCCCTGGGGAGGGTGGCAACTCCATGAATTTTCTTGCCAAAATGGGCACGTCTGAAAGA
+ 89336  TCTGTAG
+;     G-orf1086 ==> end
+ 89343  CACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGG
+ 89403  GCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACT
+ 89463  CCATGAATTTTCTTGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTT
+ 89523  TGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGC
+ 89583  TTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCCGTGAATTTTCTCG
+ 89643  CCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTT
+ 89703  TTTCGAGTATATATGCCGTGCCGAGGGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCT
+ 89763  GGCATTTGCTGCCCTGGGGAGGGTGGCAACTCCGTGAATTTTCTCGCCAAAATGGGCACG
+ 89823  TCTGAAAGATCTGTAG
+;     G-orf1225 ==> end
+ 89839  CACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGG
+ 89899  GCGGCGCGCGCGAGCTTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACT
+ 89959  CCATGAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTT
+ 90019  TGATTTGCATAACTTTTTCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGC
+ 90079  TTTTCATACATTCTGGCATTTGCTGCCCTGGGGAGGGTGGCAATTCCATGAATTTTCTCG
+ 90139  CCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTT
+ 90199  TTTCGAGTATATATGCCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACATTCTG
+ 90259  GCATTTGCTGCCCTGGGGAGGGTGGCAACTCCATGAATTTTCTCGCCAAAATGGGCACGT
+ 90319  CTGAAAGATCTGTAGCACTTGTCACATTTTTGATTTGCATAACTTTTTCGAGTATATATG
+ 90379  CCGTGCCGAGGGGGGGCGGCGCGCGCGAGCTTTTCATACAT
+;     G-orf1477 <== start
+ 90420  TCTGGCATTTGCTGCCCTGGGGAGGGTGGCAACTCCAT
+;     G-orf1493 <== start ;; mfannot: GTG upstream: 90508
+ 90458  GAATTTTCTCGCCAAAATGGGCACGTCTGAAAGATCTGTAGCACTTGTCACATTTTGATT
+ 90518  TGCAT
+;     G-orf1510 <== start
+ 90523  AACTTTTTCGAGTATACGTTATTATAAGTTATATTTAGAAATGATATAA
+;     G-orf451 ==> end
+ 90572  ATTTTCTAGCGGTGGTTAATAACGACCAACATTATATACATTTTTATTTTTGTTAATAGC
+ 90632  AGTTAAGTATTGGTTAGAAAATAATATAATTTTTAATTTTCTTTAA
+;     G-trnW(uca)_2 ==> start
+ 90678  AGGGAGATAGTTTAACGGTAAAATATCGATCT!TCA!ACATCGAGGTTATAGGTTCAAAT
+ 90736  CCTTTTCTCCCTG
+;     G-trnW(uca)_2 ==> end
+ 90749  AGATTTTTTAAGGT
+;     G-rps13_2 ==> start
+ 90763  ATGAAAACATCAATTCAATTTTTTAATTTACAGTTTTTGATTGAAAAAAAATTATTAATT
+ 90823  TCGTTAACGCAAATTTTTGGCATTGGTTTTTACTCTGCTATAGTAATTTGCAAAAAATTT
+ 90883  GGTTTTAATAAAAATACATATATTAAGAGTGTGGATGTAAGGATTGTAAATGCAATGCGT
+ 90943  AACTTTATTTTGGATAAATTTGTTGTTCAAGAACAACTGAAAGAGCAGATTCAGGTATCT
+ 91003  ATAGTAGAGTTGGACACTATAAAGAGTATTAGAGGGTTTCGGCATAAATTGTGTTTACCT
+ 91063  GTTCATGGACAGCGAACTAAAACTAATCGGCGTACTCAACGTAAATTTAAAAGAATGCAG
+ 91123  AGTAAATT
+;     G-rps11 ==> start
+ 91131  ATGGGAAGAGGATTCAACACATATTCGTTAA
+;     G-rps13_2 ==> end
+ 91162  AACATAAATTTCAGTTTAGAAAATTACGTAGAACCCTTTTATCCTTTCGAAAGAGATCTT
+ 91222  GTATTCTAAATATTAAAATTACATTGAATAACATATATTTAACTTTATCTGATTGATTTG
+ 91282  GTCAAATTATTATGGTGAAATCTGGTGGGTTATTAAAATTGCCGGGTTCCGGTAGAAATA
+ 91342  CGAATTATGCCTTAGAGCTTTTAATATTAGATGCTATTAAGCAATTAACTTTGTTAAATA
+ 91402  CAAAACATATTGTTTTAAAGTTTGATCATCGTGTTTTAAGGAAAAAGAAAATGATTTTAA
+ 91462  AGTTATTAAAAAAATTTAATATTAAAATTTTTCTTATACGATTAATTATGTGTAAAGTTC
+ 91522  ATAATGGAATTACATTAGCTAAAAAACGGCGGGTTTAA
+;     G-rps11 ==> end
+ 91560  GTTATC
+;     G-rps14_2 ==> start
+ 91566  ATGTTGCGTAAGGTTATTTTTGAGTCAAATACCAGATATACATTTAAGTATTTTGAGATT
+ 91626  AAACAAAGAATTATAAAATCGTTATCAAAAAATTTATACTTGCCTATATTAGTTCGACGT
+ 91686  AAATTGTTGTGGCAATTAGATAAATTATCTTTATTATCATCTTTAATTTATGTAAAAAAT
+ 91746  CGATGTGTTGTTTCTGGTCGTGCTAAATCGATTTATAAATTTTTTAATTTATCTAGAATT
+ 91806  GTTATAAAAAAATTTTTTAGATTAGGTTATATACCTGGTTTAAATAGATCAAGTTGGTAA
+;     G-rps14_2 ==> end
+ 91866  TTTAGTAATATAAAATAAAAGTTTATTG
+;     G-rps8_2 ==> start
+ 91894  ATGGTTAAATTAGGACAATTTATTTCAATTTTAAATTTTAATATTAAAGCAGGAAAGTCT
+ 91954  TTTTTTGTAATAGTTAAAACAAGGATAATTTTGGATATTGTAAAAATCTTGATTGAGCAA
+ 92014  AATTACATTCTTGGTTATACGGATTTAAAAGAAAATGGTGATAAAATTATTGTGTTTTTT
+ 92074  AAGTTAGATTTTGCGAAAAGTAATAGCCTTTTACTTAAGGGATGTAAATTTGCATTATAT
+ 92134  AAAAATAGATTTACAAGTATTGGTGCCAATAATATAGTGAATAACTCGTCGTTGGTACTT
+ 92194  GTGTCTACTGTGAAGGGCGTTATGACTCAGTTGGAGGCTAAAAAACTTCGACTTGGGGGT
+ 92254  ATTATCTTGTGTTATATAATATAA
+;     G-rps8_2 ==> end
+ 92278  AATTGTATAAAAAAATA
+;     G-rpl6_2 ==> start
+ 92295  ATGAGAGCTAAATTTATTTATCAAATTTTTAATAGGTTGTTTATCTATATATTTCAACAC
+ 92355  AATAAATTACTGTATATTCGAGGCCCTCTGGGTTTACTACGCTATAATGTTCCCAGTGGC
+ 92415  ATTGATATTTGTAAATATCGGTCAATGGTGTATATTTCTGGACAAAAAGCTGCCCACCCT
+ 92475  TTAGTTGCAATGTCACATAGAATAGTTTGCCAGAAAATGAAAGGGCTTGAGGTTGGTTTT
+ 92535  TCTGAAATTATGATAATTGCTGGTATGGGTTGGCGCGTTGATAAAGAAGACGTTTTATTA
+ 92595  AAATTTACAATTGGTTATAGTCATATTGTACATTATCTGATTCCGAATGATATTGAAATT
+ 92655  GTTTTACTTAGTAAAAATCTTTTTAAGATTTTTGGTTCTGATTTGAGTCGAATTCAGTGC
+ 92715  ATTGCGTCCGAATTGTGCAAACTGCGTTCATCTGATGTGTATAAAGGTAAAGGAATTCGT
+ 92775  CGTCAAGCTTTTAAAGTAGTTTTAAAATCAAGTACTAAATCGAAAGTTTAA
+;     G-rpl6_2 ==> end
+ 92826  TTTATGAAGAAAGTAAGCAGTGTTTTTATATTTTATTGTTTTTTAAATT
+;     G-rps12_2 ==> start
+ 92875  ATGGTTACAATTAATCAATTAATTCGATTAAGTCATCCTACTAAAAATAGGAAAAATACG
+ 92935  GTGCCCGCTTTAGACAGTAGTCCATACAAGAAGGGTGTTTGTTTAAGAGTATTTACGATG
+ 92995  ACTCCTAAAAAACCAAATTCCGCATTACGAAAAGTTGCCCGCATTAGATTATCAAATGGA
+ 93055  TATAAAATAACGGCGCATATCCCTGGTGAAGGTCACAATTTACAAGAATATTCGATTGTA
+ 93115  TTAGTACGTGGTGGGCGTGCTCGCGATTTGCCTAGTGTTCGATATAAAGTTGTTAGAGGT
+ 93175  AAATACGATTTAGAACCTGTACGTAATAGGAGAACTCGGCGATCTAAAT
+;     G-rps7 ==> start
+ 93224  ATGGTATTAAAAAAATATAA
+;     G-rps12_2 ==> end
+ 93244  AAATTGATTGATTGCGTCTGAGAGTATACATAAGTTTATTTGTGGGTTAATGTTGAATGG
+ 93304  AAAAGTGTCTCAATTAGAAAAAATTGTTTTTTTTTGTTTTCGAGACTTAAAATATAAGTT
+ 93364  TAATATGGATTCGTTGTCTCTGTTTTTATATGTTGTAGACGAGATAATGCCTTATATAGA
+ 93424  GCTTCGTACGTTAAGGTTAGGGAGTGTTTTTTATCGAATACCAAAGCCTCTTTCGGAAAG
+ 93484  TAAGCAGTTAAATTGTGGCATTAAGCTGTTAGCCAAAACTGTTAAAATTACTTGTGTACG
+ 93544  CAATGTAGCGGCTGCTATAAAAATACAACAGGAAATTTTAGCTGTTCTTCAAAAGAAAAG
+ 93604  TTTACTTTTTAAGCAAAATAGAAATCTGTATCAAGTTGCGTCCAACAACAGATCGTTTGC
+ 93664  ACATTACCGGTGGGATTAG
+;     G-rps7 ==> end
+ 93683  TTTTTGACGAATAGCGTGTCATATTGTGTAGTACAAT
+;     G-trnP(ugg)_2 ==> start
+ 93720  CGGAATATAGCATAATGGTAATGTATCTGATT!TGG!GATCAGATGAGTATAGGTTCGAG
+ 93778  TCCTATTATTCCGA
+;     G-trnP(ugg)_2 ==> end
+ 93792  AGTAAGGTATTTATTATAATTAGAAGTGTAT
+;     G-rps4 ==> start
+ 93823  ATGAAGCGAATTAAATATTTTAAATTTAAGTTTAGGGATATTTCAAAGGAAATTTATTTA
+ 93883  AGAAAGTCATATTTTAGATTGTTAAAAACAAAGCATATTTTAAGATTTTTTATTGGTGGA
+ 93943  TTAAAACAACGACAATTAGCACGGATTTACAAAATTATTTATTCTAAACGGTTGTTTTTA
+ 94003  ACTTTTCTTACGAAATTAGAATATCGTATTGAATTTATCTTGATAAAAGCCGGGTTTGTT
+ 94063  TTAACCGGAAAACAGGCTAGGCAATTAATTTCGCATAAGCATGTTATTGTGAATGGACAG
+ 94123  CGGACTCAATTTTGCAATTTGCATATAAAAACATTTGATATTATATCTCTAGAATCAGTA
+ 94183  GTATTTTCAAAGTATAAACGCAAACTAGTATCAAGTTTTTTTAAAACTCCAGGTTTTTTT
+ 94243  GGTTATTTACGGCGACGTGGTATAAAAAAGAAACTTACAGTTCAACGTATGTTTATTTAT
+ 94303  GCTAAATTTCAATTTTTTTGCGAAACTAATTATAAAATCTTTACGATGGTTTTTGTGCGA
+ 94363  AAGCTTAATTTGCATAAGATTTCTTCGTCTCAAGTTCTTTTAATGTATGGGTGGTGACGA
+ 94423  ATACGTTTTTTATTTTAA
+;     G-rps4 ==> end
+ 94441  AAAACGTTTTGTGATTAGTTAAATTTTTATTATAATTATTTTTAGGTTTATAATGAGTTT
+ 94501  AATTTTAGGTGATGTGTGTTTAACAATAGTGTGCTTAGTTTAATTATTTTTTTACCGTTG
+ 94561  TTTAGTAGTTTTTGTTCTGGATTGTTTGGTTGTTGGATTGGAGCTAAAGGTGTTGCTTTT
+ 94621  ATAACTTTTTTATCTCTACTAGGGTCATTAATTTTAACTTGTAATTATTTAAGTTTTATA
+ 94681  AGTTTTTATTTGGTTTCAAATTATGTATCCGTATTATCTTGGATGAAATTAGGTTCATTT
+ 94741  TATGTGACATGGTCATTTTGTTTTGATAGTTTATCGTCGTTAATGGCGGTTTTAGTTACT
+ 94801  GTTGTTAGTTGTTTAGTTTATCTATATGTGCGTCCTAGGTTATTTTTATTTTATAAAATT
+ 94861  TATAAAGTAAAGTTGGAAATTTAAGTGAGTTGAAAAAGAGGATTCATAAAAAGTACATTA
+ 94921  AAGGATTTTTCATTATTCTTTATGAAGCAGGTAATGCTCAACTAGTTAAGTTTGAAAAGT
+ 94981  AGAATTAGCAATTACAAATATGTATTTAACGCGTATGCTATGGGATTTATTTAATTTTTC
+ 95041  TACTATTTGTGTATTTGTATAAGTACACAAATAATTTAGCTTTAGTGTAAAATATATCTA
+ 95101  CTTAATTAAATGTGTGAGGTGAGTGTTAGTAATGGGTAAGCTACGTTAATACTGGTTTTT
+ 95161  TCGTAGATATAGGATAATATTAATCAATTACATAAATTTAATTGTAAAATTTTTTTTAAT
+ 95221  AAAATTTATGAACTGAGCAAATGTAGTATAGAAG
+;     G-orf465 ==> start
+ 95255  ATGAAAAAATTTAATTTTATGCCATCAGTTTCTGTTTTTTTTCAATTTAGTTCGAATTTT
+ 95315  TTATTTATCTTTAATTTTTGTTTTTCTTGGGAAGTTGTGAATTGGAATTCGATTAATAGG
+ 95375  CATCTGTATAAATATCAAAGAGTAATATTTATTAACGTTAAGACTAGGTGGGGTTTGTCT
+ 95435  ATATGTGATCATTGGGAATTAAAATTAGATGTGTTTTGGTTTCAAATTAAATGTTTTGGT
+ 95495  TCATTTAGTTTTAATTTGGTTTGTATTAGAATTTATTTCTTAGATTTGATCTTCAAATTT
+ 95555  TTTTCTTATAGTAAAATTGGTTGATTTTTAGATCGAGGTATACGTTTAAGCTGTTTCGGA
+ 95615  GTAGAAGGTTATATTCGTAATTATATTTTATTAGTAAATTTTGATTTGAATAATATTCAA
+ 95675  GAAAAGCATGATTTTTTATGATTATCTAAACGATTAATTAGTTTAGTATGAGAGCCTGAG
+ 95735  TGTGTGGCAAGATATTTGTTTAACTTTTGTGGTTTTGTAGGAACTCGTAATGTGTATTTT
+ 95795  ATTTTGCGAATAGTTTACCAAACTGCTTTATGCGGAATTAAATTTGTTTTTACAATTAAT
+ 95855  TTGTTAAAATTATTTAAATTTATAACAGTCAAACATTTATTTTTTTGATTGGGTTTTCCT
+ 95915  GTTTATATTTGTAAATGATATGAATACGATAAAAAAATAATATTAAGCAATCTTTCTGAT
+ 95975  TTTCTTAGACAGGATACAGAAAAACACTTGTTATTTATATTAATTAATAAATTTTTTTGT
+ 96035  GAATTAGAAAACCACTTAAGGAGATTTTATGTTTTTTTATTAGGTAATATTTTGCCATTT
+ 96095  TTTTCTGAAAATTTTATAGCGAACTTGGTAGTTTTATGTTATTTAGGAGATCTAATAATT
+ 96155  ATACATAGGGATAATTTAATCGTTGATTTATTAAGGTTAGAGTTTTTTTACAAATTAACT
+ 96215  ACGCTTGGAGTCGATGTAGTAGACGAACAAACGTTTGGTTTATCAAATTGTATTGATACA
+ 96275  ACTAAAGGGTTTAATTTTATAGGTTTTTATATTCGATTTAATAATGCGTTTTTATTTGGT
+ 96335  GTATACCAAACTAAAAATTGTATAGTTTTGCAACCTTCTGTTGGTTCTATTAAAGGGCTT
+ 96395  TTAACCGGTATTCAACGTTTTTTGAAAAATAATAATTGTGAACAAGTAGTATTAACTAAT
+ 96455  ATATTTTATATTATGCGAAGATGGTTTTGTTATTATTTTCCATTCATTAGGTTATGCCGT
+ 96515  AGATTAATTGTTTCGTTAATTTTTTGTTTACGTCTTAAATTATTTTATTGGTTGTTTAGA
+ 96575  AAATATGGTCGAATGGGTAAAAAGTACATCTATAAGCAATATATAAAATTTTTATTTAGC
+ 96635  CAATTTAAATTTTGGTAA
+;     G-orf465 ==> end
+ 96653  AAAAATAAAAATTATTTTTTTTTTTGCTATTGCAGTTTAGTATAAATTTTATTTTTAAAA
+ 96713  AT
+;; mfannot:     /group=II
+ 96715  gagctgtaagatgaaaaattatcgtgtacagttcagaagtaggg
+;; mfannot:
+ 96759  ATTTGTTTATTAAATTAAATCTACTATAACTCATTGGCATACATGTTAGAAGATCCTCAT
+ 96819  ATAGTTAGGTTTTCATGTTATATTTCGTTATGTGTGGCTTGTTAGGTGATAGGTATAGAT
+ 96879  AATTTTGGAACTATACATTAATTTATTAAATTTAAAAATTATAGTCATAGCTGATTTTAT
+ 96939  AAGCTTAGATTGAGCATAATTGTAGAATTTGTTAAAATCTATTCTATAATACATATATAG
+ 96999  ATATAATGCATAATTACATATTTTCTTCGTTACCGTACGGTATGGGTGGTAAATTTTTTA
+ 97059  TAAATAGCGATAAAAAAAATCGGTCAATAGGACTTATGTTAAATTTGTAGTTATATGGGT
+ 97119  AGGTTGTTTGGTTAAGTGTAAATAAATGTATGGAACAAGTTTTTATGGCTAAGTTTTAGT
+ 97179  GTATAAGAAGGATAGGATGAAAAATTAAGTTTGATATTTTTCCATAAATTCAAAAGTATA
+ 97239  AATTTTATGTTATTATGCTAATTATGAATTTACTGGTAAG
+;; mfannot:     /group=II
+ 97279  aagccgtatgattttgaaaatcatgtacggttttgaattagagg
+;; mfannot:
+ 97323  TTTAATTGATCGACTAAAACTTACATTTTTCAGTGCGGCACGTAAAAATTTTGTTATTAA
+ 97383  AATTTAAAAGAAATTAATACGAATTGTAATTGTGTTTTGATTTTTGTGTAAGTTCTACAG
+ 97443  TTATCGGATTTTATTATTTATTAATTTTTTAATTTTAAGCTGATTATTTGTATCTTAATG
+ 97503  GTTGTAATTTAAATATATAGGGATACTGGTTTTAAATGGTATACTTAAAGTATTTAATCT
+ 97563  ATTAAGTAGTTTCTAGTTTATTATAAATAAACTTTAGATAGTTTTGCTAAATTTTATTTT
+ 97623  GAAAAGTAAAGTTTAGTTTTAACTATTTATATAAAATGCTAATCCTATTTTATATTTTTG
+ 97683  CGTAATGCGCGAAACGTGTTATGGTGTAAGTAATAAAAAATTTTTTCATCTAGGTTAAAT
+ 97743  AAGATATGGTATATATCTTTTATATAGGATAAACTTAGTTTTATTTATTTTTAAATTTAA
+ 97803  TAAGCTTTAATACAGTTAAGAATTTAGAAAAT
+;; mfannot:     /group=II
+ 97835  gagccgtatgctaacaaattagc
+;     G-nad5 ==> start
+;     G-nad5-E1 ==> start
+ 97858  atgtacggttttgagtcag
+;; mfannot:
+ 97877  AAATTTCAGCCAATTTTATTGAAGTTTTATGA
+;; mfannot: G-nad5 ==> start Def by similarity
+ 97909  ATACTGTTAGTGTTAGTATCATCCGAAAATTTTGTTCAATTGTTTTTTGGATGGGAAGGT
+ 97969  GTAGGATTATGTTCTTACTTATTAATAAATTTTTGATATATTCGATTACAAGCTAATAAA
+ 98029  GCGGCTATTCAAGCTTTAATGGTTAATAAAATAGGTGATATTGGGGTATTATTAGGTATT
+ 98089  TGTTCTATTTTTTCATTGTATCGTTCAGTTGAATTTAGTATTATTTTTGCGTTAACTTCT
+ 98149  TATATGCAAGGTGAATCATTTATTTTATCAATTTTTAATGTTAATGGTTTATTAATGATT
+ 98209  GGTTTATTTTTGTTTGTTGGAGTTGTTGGAAAATCCGCGCAATTAGGGTTACATACATGA
+ 98269  TTACCATCTGCAATGGAGGGACCTACTCCAGTGTCTGCATTAATACATGCTGCAACAATG
+ 98329  GT
+;     G-nad5-E1 ==> end
+;     G-nad5-I1 ==> start /group=II(derived)
+ 98331  gtatgaaatgctaacaattccagtttaggttattttaaattgtatacatttgttaatggc
+ 98391  aatttttatttttttgcgatgattttggataagtatttgtcgaatttaagattaattttc
+ 98451  ttgaataacatttttttagttggataaggttatacgaattttttaatgataacggtgttt
+ 98511  ataatttgttattgtatttgtgagatacataaggtgttggcgattcatgattttttttga
+ 98571  taaattaaagcctaataaaatataatatgctgaaatttaattatgcggtcatgatcttta
+ 98631  tattgttaatgtgatacagaatatccagtgaaatatattaatagtctgtgcctagataga
+ 98691  tctaaacgttatttttgtgtaatgtaggttaggaaatatgttagtggaataattgtgaaa
+ 98751  ttaattcagaaataatatttaatttaaataattaagtattattgaatatttttgtaaatt
+ 98811  cataggaaatatttaaaaattgcttgttaattgagtatgtttaaaattttgatagtttta
+ 98871  gctgatattatttaatagagtattattaattggtattttaatggattttctgaaatttta
+ 98931  aagctgaatgcaaagtaatttgctcgttcagtttaatgagggtttaaccgtaaagtctta
+ 98991  aactacctttat
+;     G-nad5-I1 ==> end
+;     G-nad5-E2 ==> start
+ 99003  AACTGCTGGTATATTTCTTATTATTAGGTGTTCAGAATTTTTTGAATATGTGGATTTTAT
+ 99063  TTTAGTTTGTCTAGTGTTATTAGGGGCGTTAACTGCTTTTTTTGCAGCAACTGTTGGTTT
+ 99123  ATTTCAAAATGATCTTAAACGTGTTATTGCGTACTCTACGTGTTCACAACTGGGTTATAT
+ 99183  GGCGTTTTCTTGTGGATTATCTGCTTATTCTGTAGCGTTTTTTCATTTAGTCAATCATGG
+ 99243  GT
+;     G-nad5-E2 ==> end
+;     G-nad5-I2 ==> start ;; mfannot: no intron type identified
+ 99245  ttgggcgaagttctagtttaattaaaatataattttttataaatttgatgcaaaaaaata
+ 99305  ttaggataatttaaaacttagaagttgagtgaaattacgaatatatagttagtagataat
+ 99365  aaaatttctatatggttatgaacccatattctatagagtatatttaataatttttttatt
+ 99425  tgatgcataatgataataatacaagagctaatttttcttttaaatattaaactgaagatt
+ 99485  tatttattaatttataattttattaaaagtagtttataataaaattggaataaaaagatt
+ 99545  tctatttgagagtaatgcatagaatggtttaaatgatacttttttagaattaaagttgaa
+ 99605  aaaaatttgatttattttatcgtaatggagttataagcatacgggtacgtgttttaaatt
+ 99665  atgatttaacaaataaatgaaaaatacttgtggaagcttagtgtattgagatatactagt
+ 99725  tgagtttcaaagggggatttaataagtaaggattccatccaa
+;     G-nad5-I2 ==> end
+;     G-nad5-E3 ==> start
+ 99767  ATTTTAAGGCTTTGCTATTTTTAAGCGCAGGGTCAGTGATTCATGGTTTTTCTGATGAGC
+ 99827  AGGATTTACGCCGTATGGGTGGATTAGGTAAGGTTTATCCTTTAACGTATTGTAGTATAT
+ 99887  TAATTGGATCGTTTGCTTTAATGGGTTTTCCATTTTTATCTGGTTTTTATTCTAAAGATT
+ 99947  TAATTTTAGAAATTACTTTTATTCAACATACTGTTGCTAGTTTTTTTGTTTATTGTTTAG
+100007  GAGTGTTTTCCGCATTTTTTACTGCATTTTATTCTTTTCGTGTTATTTATTTAACTTTTA
+100067  TTGTTCCAACAAATAGTACCCGGCAATTTATATTACGTATTCATGAATCTCCGCTATTAA
+100127  TTATAATACCTTTATGTATTTTGAGTATGGGAAGTGTATTTAGTGGTTTTTTACTTAAAG
+100187  ATATGTTTATAGGTTTAGGTTCAGTATTTCTAGGAAATTCTATTTTTAGAATGGCCGGTA
+100247  GATTTGATTTAATAGAAGCAGAAATTTTACCTGTAGAAGTTAAATTGGTACCTTTAATTG
+100307  TTAGTTTAGGTGGAGTTTTAGCTGTTATATGTATAAATTATGTCTATAGGCAAACTGCAT
+100367  TTTACTTAAAGATTAGTAATAAGTACCTTATGAAGTATTATTCATTTTTTAATCAAAAAT
+100427  GGTACATTGATGGTATATATAATGTTTATTGTATAAAGCATTTTTTTAACTTTGGGTATT
+100487  TGGTGCCTTTTCAGATGCTAGATAAAGGCTTTATTGAGTTAGTTGGACCATTTGGTGTAT
+100547  CTTCTAAATTTAATATAATTTCAAGAAAAATAAGTGAATTTCAAACTGGATTAATATATC
+100607  ATTATACATTTGTTATTTCAGTTGGTGTACTTGTTTATATCAATATATTATCAATTTTTA
+100667  ATGCAGTTTCAGTATTTATTGAATTAGAAGGTATACTGGTATATATTTTTATTTCATATA
+100727  TTATACTGTTATAA
+;     G-nad5-E3 ==> end
+;     G-nad5 ==> end
+100741  ATTTAGTAGTGAATG
+;;     G-nad6 ==> start
+100756  ATGTTAGTTTTTTTT
+;;     G-nad6 ==> start ;; 4,70
+100771  CAATTTTTCTTTTATTTGTTTTCGAGCGTTGCTAGTATTTCAGCGGTGATGGTAATCCTA
+100831  AGTACTAACGCAATCTATTCAGGTTTATTTTTGATTTGAGTTTTTTTTAACTCAGCTTTG
+100891  TTGTTATTACTTTTAGATTTGGAGTATTTAGCTATAATTTTTATTATAGTTTATGTTGGT
+100951  GCAGTTATGGTTCTTTTTTTA
+;;     G-nad6 ==> end
+100972  TTGTGCGGATACGAGACATTATGGAAAAATTGTTATAAAGTGTATTATGAAAAATTGACT
+101032  AAAATTTAATATTTTAAGGAAAAGCGTTAGATGTCAAATATTCTAATTAAAATCAAGATC
+101092  TAAATTTATATGCAATGTACTTTTAAATTTGAATTGTAAAAGTTTTTATATTTTGTTTTG
+101152  TTTGATATGTTTTAATGTTTGACTAAAGTAATGTTGGGTAAGCATGAAAAGTTTTGTGTT
+101212  ATGTTAAAATTATAGGTTTAGTCTTGCATTTAATATGAAAAAATGTAGTATAGCTAAGAA
+101272  CTACTTATATTGAACAGATTAGAAAAATTTAAAAGGGATTTTATTTAGGAATAAGTTTTT
+101332  ATAAATGTTTAATGTTATCGTGTTGATTTGGGAAAAATTTTTTTCAAAGAAGAAGGTAAA
+101392  ATCATAATTTTTGTAAAAAAGAAT
+;     G-orf688 ==> start
+101416  ATGTTAGTTAATTACAGATACTTTAAAAAGTGAATTAAGATTTTTAGAAATGAAATTTCT
+101476  TTAAATTTTTTATGAAAATTTTTAGGGTTTGTACCTCTGAATATATTAAAAACTCAGATA
+101536  CAGTTTACAAATTATATCAGTTATGACTCTATTGATGCTAAAGCTGATTTAGTTATAGCA
+101596  TTGAGTTGTCTTAAGAAAATTAATGGGAAGTTTAGAAAATTATTTATTCGTTTTATTATT
+101656  GACCCTGAGCTACTGTGGTTAGCCTATATTAATTTAGTGATAGTTGGAGTGAAATGAATT
+101716  TTTAGAAAAACGCGCAAGTTTTTACTATATAGTTTAAGTTGTAAATTTTATTATTTTGAT
+101776  AACTTAAGATTTCTTTTAAGAAAATTAAATATTTATAATGAGAAATTATATTCTGATGAT
+101836  AGGCTTCAAATTACATTGATACAAGAAAGCATTCGGCTATTATTTACAGTTATAATTGGA
+101896  GATTACGTATATTATTTTGGAGGTAATACTTTATTTAAAGTTAAAGATGTAGAAGGTATT
+101956  GGATTTGTATTTGAATATATTCGACGAAATTGTGGTTCTATGCGTTGATTTATTGAGTTT
+102016  GTTTTAAAGAAAAAAAAAATTACCCTGGATATTTTAGTTTTTTTGCAACGTTTACTTAGT
+102076  CTCTATGTAGATGATAGTCAATTTGTAGGTTTTTTATTTAAATTGCTTAAAAACAATATA
+102136  CAAACTGTTAAACTAATTGATAGCTATCAAGTGTTAAAGATCGATTTGTTAGATTGGTTA
+102196  TTATCGAAATTTTATTTTTTAACTTTAGATAATTTTGTTGAGAAATTATTTGTAAAATGC
+102256  ACTAATACTAATTTTTGTATTTTGAATAGTGCACCGCAATACAATTTAATATTTAAATTT
+102316  CCTTTTTTGAATGTAAAAGTTTTTCCAAAATGTAGTAATGGTTGTGTATTATCTTTAAAA
+102376  TATATTCGGTATGGATCTAATTTTTTAATAGGTGTTAGTGATACTTCTAAAAATATTGTG
+102436  GATTGAATTAGTAATATAATACTTAATTATATAAATTCTTTTTTGTATCTTGACCAAATT
+102496  TTAGTTGTAAAAAAAGTTATAATCAATAATTCTCTAATGATTAGACTTTTTGGGATGCGT
+102556  TTTGAAAAATGTCGATATAAAGATTTTGTTAAGAAAGTTAAAATGTATTCTTTGAAAAAT
+102616  AAAATGAATTTAATTTTTTCTAAAATACATTATTTGCGTTATAATCTTGATATAGGGGAT
+102676  AATAAATTTAGATGTGAGCAACATAATATATTTTGTTGAGAGTATAAACAAATAGCATCA
+102736  TATATGTTTCCTGTAAGCATAAGAAAAAAGATTTCTTTTTTTTGTATATATGTGTATTTA
+102796  CGTGATAAAATAAATGATTTTAGTGTAATATTTTGAAATATTAGTGCGTTAAATTACTTT
+102856  GTAATTAATTATGTACCAAGGAAATTGCAAATTCCGTATCGTGTTTTGCTTAGACTGATT
+102916  AAAAATATAACCTGCTCTAGGTATACTGGATTAATAATTAATTACTGTGTTATGTTAAGT
+102976  TATTTATTATGGTTGCAAGAATATAGTAGCGGCGTCTTATCAAGAAATTTAATGCTATAT
+103036  TATTATCCTAATTTTGAAAATCAGTTTATTGTATCTAAAAAGTTTACTGTTCGGTTACTT
+103096  GTAGATACTGCTTTAGTATCGCGTTGGTTATATAGTGTTGGTATTATTAATAAATTTGGT
+103156  TGCCCGCTAGTTAAACGTAAATTGATTTTGTTAGAAGATTTTATCATTGTATTGTATTAT
+103216  CGGAAGTTAGCTTTTAAACTAATAAGATATTATTTATATGCTAATGATTGAGTTAAATTA
+103276  TATAGTATTTTATTTAAGTTAAAAATTTCGTTGATGAAGACTTTAGGGGTTAAATATAAA
+103336  TTGAATATGAATGTAATTAAGCAAATTTATGGAGATTCGATCTATTGTTCATCTTTAGAT
+103396  GGGAAGTTTATATCTTATTTTTTTAAAACAGATTTATATTTATATAAGCGCAAATTTTTG
+103456  ATAAATTTTTTCAGATGAAAACAGTAG
+;     G-orf688 ==> end
+103483  AATATAATGTATAAAATGGTAGTTTAAGAGAGTTAGAGAGAGAGTTAAATTTTTTGTTAG
+103543  TAATTTTCTGAGGAAGTAATATATTG
+;; mfannot:     /group=II
+103569  gagctgt
+;     G-orf132 ==> start
+103576  atgataaaaaattatcatgtacagttttggatgggag
+;; mfannot:
+103613  TTAATTGCTTACCTAAAATC
+;;     G-nad6 ==> start ;; 72,199
+103633  ATTGTAATGATGTTAGACGTTAAATATCAATCTATTAATCTTGAAATGGGTTATTATCAT
+103693  ATTATTGGAGGAATTGTGTTATTATGTTTAATGGTAAAATTTGTAAATATTTTAGTAAAT
+103753  GAATTAATTTTCGAACATGGTTATTTGATGGGGATAAGTGTAGATTATCTAAATTGGTTT
+103813  GATTTAATTGTAGAAGTTGTAAATATTCGTAATATTGGGTTGCATTTGTATAATTATTTT
+103873  TTTATTCCTTTTATAAGTGCTGGGTTAATTCTTTTAGTAGCTATGATTGGGGCTATAAGT
+103933  TTAGTTTTGCCTTCTGAGACCTCGAAC
+;;     G-nad6 ==> end
+103960  TTGAAAAGTTATTAG
+;;     G-nad6 ==> end
+;     G-orf132 ==> end
+103975  TTTTAATTATTAAATAACATAGAAATTGGGAGATTATTATCTTTATTTGTAGAGAGTACT
+104035  GGTACTTATAGTATTTTAAAC
+;     G-trnR(ucu) ==> start
+104056  GCATCTTTAGCTTAATTGGAAAAGCATTGATTT!TCT!AAATCAATAAATATAGGTTCGA
+104114  GTCCTATAAGATGTA
+;     G-trnR(ucu) ==> end
+104129  GTGCAATTTAATTTCAGAATGTAT
+;     G-nad3 <== end
+104153  TTATCATTCTAATGCACCGCGCCCTCATTCGTAGTAAAAACCAATAGTTAATAAGAATAA
+104213  AAATAAGATCAT
+;     G-nad3 <== start
+104225  ATGATAGAATACAGAATAAGAGTTTAAAGTAAGTGCCCATGGAAATAGAAATATGATTTC
+104285  TAGATCAAATATTATGAATAAGATAGCTATGAGATAAAATTGTACAGAAAATGTATGCCT
+104345  TGCATCTCCAAAAGAGTG
+;; mfannot: G-nad3 <== start Def by similarity
+104363  GATAGGTTTTATAAATTTATAATTCTATTT
+;; mfannot:
+104393  accttcctagaaacttaacaagttatttttaaacattaagctt
+;; mfannot:     /group=II(derived)
+104436  TGCATAGAGAAATTTCAAATTGAGGTGTAGAGATTTTAAAATATTAAGGTAGATTTTAAT
+104496  TTTTAGTAAATTTTTGAAATTTAGTATGTTTTTATTTATTTTAATATAAACAATATTTTG
+104556  TGTACGTTAAAAGTATAAAAAATCCATGAATTCATATAACCACTCATATGAATTTGCGAT
+104616  TCTATCTTCTTAGAAAAAGATGATTTAAATTTTTATTTTTACTTTTAAAAAAATTAATTT
+104676  ATAGTTTTGTTAATTATTGGTGTTATTTATTTGACTAAAATAAATAAATAAAAATATTGT
+104736  TTTATTACATATTATATTTTGGTGTATTAGATACATGTTGTATATTTTAGAGGTTATTGT
+104796  AGAAATTGATGATTAGTTTTTTTTAAATTATGCTATAAAAAGATAGAAAACTGCGACACG
+104856  GTCAAAACCACATTCGTATGCTGAGTATTTGTCGAAATTACTTTTACGTTCACCAGCTTG
+104916  AATAGTTAAAAATATAAGAAGTATAGTTATTAATGAATTTATAATTATAAATAATAAAAG
+104976  TGAGTAATATTCAATAAAATTTGTAGGAATTAAAAAATAAGACATGATTTTAAC
+;     G-atp6 <== end
+;     G-atp6-E2 <== end
+105030  TTAATGCGATAAATTGATTGCGTCGTTTAAATATAAGCATATTAGTATTGTAAACACGTA
+105090  AGCTTGTAAAAGTGCTATTCCTAATTCTAGGACAGTTACAGCAATTATTATTAATAGTGG
+105150  GAACATAGCTAAAATATATCATAATCCTCCAAAATTTATCATAGATCATGTAAAGCCAGC
+105210  TATAATTTTTAACAAGGTATGGCCTGACATTATATTGGCAAAAAGTCGAATTGAAAGACT
+105270  AAATACTCTGGTGATGTAGGATATTAATTCAATTGGAACAAGAAGGGGTATAAGTAATGC
+105330  ACTGATACCATTTGGTAGAAAGAATCCAAAAAACTGGAATTTATGTCGTGAAAAAGCTAT
+105390  TAAATTTATTCCAATAAAAAATGTAAGTGCTAAACTAAACGTTACAGAGATGTGGCTTGT
+105450  AATAGTGAAAG
+;     G-atp6-E2 <== start
+;     G-atp6-I1 <== end
+105461  agtaagaatttttttatcttctcgcagaactgtacgtgagtattattactcatacagctc
+105521  tttttagattttatataattatcttataacagtaacgtgagaatactaagttcaatgttt
+105581  ttgaaaaattaactttttaatttttgttaagcttttgcttaatagagagtacttctttat
+105641  tattatttgtgtttatttttatcttgaaatatttga
+;     G-atp6-I1-orf499 <== end
+105677  ttatcttttgcatttaaaagctaatgtacaatataatgatttttttagatgataatcaat
+105737  aagtttttgaatttcttgtatgttatctgtatatttatagtaaaccgataagtttgatgc
+105797  ttgtaacgcatatcatcatattattttactaattggtccatatgatatgactatacaatt
+105857  ttgtggttgtccatttgttgaaagtatacctttatttttccaattttgttttattttttt
+105917  aataggcgcatataattttaaatagctttgttgtttttgtatggctttggttattgaatt
+105977  ttcatagatttttgatagtcaatttttttttattaaaatgcttttatatgttttaatttg
+106037  ttcttctaaaatatgtgttgtagctaagatttttttgggaagtattttgttgatcgctat
+106097  tagcgagtttaatattgttttggagaaaattggttttaatgaattgagattaagggtttt
+106157  aattagtgcagttaggtactcttgtgtatgcagaaaattgagattttttgaatttttaga
+106217  tatagatttatataaactaaatttcaaatttttcttggttttttttaaattttgatgctc
+106277  cgtattattttttattcactttcaatattttgttaatcttgctatatattctgttggatt
+106337  attgtttatatgatttttttttttatgtatgttaactccaagaaaattgacataatgttt
+106397  gtttgcttgggagcatattgggttgattatcgaaattaagggtaatttattttttgaaaa
+106457  tatttttattttattatatataaattttactaagtttaatgatccataaaatccgagtag
+106517  taattcgttgttatatctatgaaattgtattttaaaagcgctatgtatattatagcgatt
+106577  tatatttgataaaaattggtctaaagtaagtaagtatgtattactaagaattgaaaaaag
+106637  attgcttactttgaaaactgaattgctgattgtatctgcgtattctttgattaagatatt
+106697  tataaattgtttatccatgattattttttttagcggtttaaaaagttttataaaatttat
+106757  gctagtaatatttactgttatatttaactttaaaaatcattttatattgtttcatgagtt
+106817  ttttatattgaaaagtatatcatgtggtccaaaatttgtttgtattccaaatgagttttt
+106877  gtggaattttggttctaaaatagctaataaaattatttgtaaagctttatgtttcttaaa
+106937  aagaaatggtggagtatttttttgtcacatttctatgaggattatcaatctaatgaaaat
+106997  tttatttaattttttgtattttccttttttatagagcttagttattcatttatttacaat
+107057  tatgttaatagtttttgttttagtattgaaattgtaatgaattattttttttggacaaaa
+107117  tttaattaattttattatctggttcaaattggggtatatatattttataagatgtttcat
+;     G-atp6-I1-orf499 <== start
+107177  ttttgtttctaaagttttctacttatatattttataattgtgtagaatctcaatgcaacc
+107237  tttcgatttttttgaaaaatctatagaaaggcgctatagtagtttctcatagtttatcat
+107297  aaaatagtttgtattttttaaaaaaggttatgattttatcataaaataaattagaatttt
+107357  atagtattattagatttttttaaaaaaattgctgtgtcaaaaatgcttctgtttttctag
+107417  caattgtaattatcgatttatagttgtgttagaattaactttgtaaggtgaataaaatgt
+107477  tatttaattatgcagagtttatggtaggtttgtttaatcaaccttttagatattctaaac
+107537  ctgtttctttaatatctcttgtatttaagatagtttggttagtaataacgttaacatgtt
+107597  aacaggcagtagccattataatttgtgagaatcatactccgaataagc
+;     G-atp6-I1 <== start /group=II ;; mfannot: splice boundaries uncertain
+;     G-atp6-E1 <== end
+107645  TATAGGGTATCATCCCTAATAGGTTAGTTAATGCTATAGAAGAGAATAGAGAAAAGATTA
+107705  AAGGAAAATATTGTAGGCCTGCTTGTCCTATGTTTTTTTCAATAAGATTTCGGTTAAAAA
+107765  TTAATAATTCTTCGAGAATGGATTGTCAATAAGATGGAATAATTTTATTATTGTAGGTAA
+107825  TAATGTGAAAGGTTAGGAGTATACTTGCAAAAGTGATTATTGCAAATATAGTAGAATTTG
+107885  TAATAGTTAATTGTTGGTTAAAAATCTTAAGATTTGGAAGTAATGCAGTTATTTCGAATT
+107945  GTTCTAAGGGTGAATGGTGTAGCAT
+;     G-atp6-E1 <== start
+;     G-atp6 <== start
+107970  AATATTTATATTTTTTAAGAGGTTGTTGGTGATACTTAAAGTAATTTTATGATGTATACG
+108030  TTATATAGTGAATTAAATATATTTTTTTC
+;     G-rps10 <== end
+108059  CTAAATTTTTGTTAAGATTTTTTGTTTAAATTGTATAGTAAGTGAATGCGGTAGATTTTT
+108119  TGAAGAAAAATTTAATATTTGCATTAGTTTTTGCATGTTATATGATGATATATTAGATAT
+108179  GTATAGTGTGTATTTATAGGTTCGTATTTCTAATTGGGTACGTGCTGTTTTATGGACATG
+108239  TGGTGATTTAAGTATAGTAAATTTTTTGATATATAAAGGTAAATAGGTTTTTGTTATTTG
+108299  TAAATTTAAAATATTATTTTTTTTTAAAAGAAAAGCAAGTAAAGAAAAAAAATGTTTTAT
+108359  TGTGTTTGTATTTAAGCTGTTAGCAATAAATTGAATTGTGGTTTTAAATTTCAT
+;     G-rps10 <== start
+108413  TAGTATTA
+;     G-nad2 <== end
+108421  TTACAGTAAAATACTTAATTCGTGAGTTGGAAGTAATAAAATGTTTGTTGTTGTCAGAAA
+108481  TAAACTGATTAATAGTGATCCAGAAAATATTATTAATATTGCAATAGTACTATCTAAAGT
+108541  ATATTTTTTATAATTTAAATTTATAAATTTTTCGAAATTAATTATTTTGATTAATCGTAT
+108601  ATAATAATAAGTACTAGTTGTACTTGTTAATATACCTAATGCAACTAGAATATAAGAATG
+108661  TAAATCAATTAAGGAAAGAAAAATTTGAAACTTGATAAAGAATCCTCAAAGGGGTGGGAT
+108721  TCCAGCAATTGAAAATAGTAGTAGAATAAATGTGAATTTATACAATGGATGAATTGTGTT
+108781  TGGATTCAATAAGTCTGTTAAGTAGATTAAATTTTTATTTGTGTTTTTTTTAATAAGTAT
+108841  AAGAAGCCCAAAAAAACAGAATATTGTAGTTACATAGATTAAAAGATAAAGAAAAAAAGA
+108901  ATGCAAACCGAGCATAGTTCCAGTTGAAAGTCCCATTAACATATAACCTATATGGCTTAT
+108961  AGAACTATAGGCTAAGAAGCGTTTTAATTTTTTTTGATATAATGTTGCGAAACTTGCAAA
+109021  AATTATGGTTAAAATGGAAGATAATACAAATATTGGGTGTCATAGATTATGGAATTGTAA
+109081  AAATACTGAGAATATGAGTCTGATAAAAATGCTAAAGATAGCTATTTTTGGAATTGTCGA
+109141  GAAAAAAATTGTTATAGATAATGGTGCGCCTTCGTAAACGTCTGGACTTCAAATATGGAA
+109201  AGGTACCGCAGTTAATTTAAGCAAAAAACTACAAAGCAGAAGAATGAATCCTAATTTTAA
+109261  AAGAATTGGGGTGTTTTGAATAATAATTGTTAACTGATATAAATTGCTAAAATTAGTTGA
+109321  ACCTGTAAGACCATATATTAAAGAAATCCCAAAAAGCAGAAAACTAGATGAAAGGGCGCC
+109381  TACTATAAAATATTTTAGACCTCCTTCTAAGGAAAATCTAGAAGTTTTTTTAGAAGCTGT
+109441  TAATATATAAAAACAGAGGCTTTGTAATTCTAAACTTAGGTAAAGAATCACGAAATCATT
+109501  AGCAGAAGTTAATATAAGCATTGCGCTAAGGGAAAGCATCTTTAAGATGATTGATTCAAA
+109561  GTCAGTTGGGAAATTTTGTTTTTCGGAATATTTGATTAAGCTTAAGAAAAAAATTGTGAA
+109621  TAAAATTATTAACAATTTTATGTTAGTTGTATAATTATTTATAACTAAATTTTCATAACA
+109681  AATTGTTTTTGTTATATTTAGGTTATTAAGCATTAGAATAAATCCTAGTATTGTTGTAGT
+109741  TATTACTAAAGTAATAAGATTTTTTAGTATAGATTCAGCATTTATTTTATGCATAGTAAT
+109801  TTGAAATGTGCTTCAGATTAGAAGAAAAATAGTTATGCTTCCGATAAAAATTTCTGGTAA
+109861  AAGAAAATATAGATCTGTATTGAGTTGAGTCAT
+;     G-nad2 <== start
+109894  ATAAGAAAGTTTAATTTATTTAGATTTTGGGTAGATAGTTATTTAATGGGGTATTTATAA
+109954  TAAATGGAGGTGGTAAAGTATTGTAATGGTAAATACATATAAATTTA
+;     G-nad7 ==> start
+;     G-nad7-E1 ==> start
+110001  ATGGAAAAACGATTAATTAAAAGTTTTACAATGAACTTTGGGCCACAGCATCCAGCTGCG
+110061  CAT
+;     G-nad7-E1 ==> end
+;     G-nad7-I1 ==> start /group=II(derived) ;; mfannot: splice boundaries uncertain
+110064  ttacggtagtaaggttggttaattttattgtttatttttgttagcattgtgtaatgaatt
+110124  tttttaagataatgttttttaaaaaattttatatgagaatcaatttataatatatataca
+110184  tgcattctataactttaatttttgattaatttttaatctgatgggtatgtttaagaaaaa
+110244  ttttatatgtttttagaaattagagattaatgcgtgttttaattattattagtaaatata
+110304  atacctagggaaatctgtgatattttattcgataaagaagaaattttaagtaatctatac
+110364  ggaaatttttcagtttttaatattgttgaagaaaattatttagatattaagtttttacga
+110424  cgtagttagcttgttattatataaattttttgatctcttgcagaaaatttgaaaaagtag
+110484  atgattgaaaatgttttaattttttgaagtaaaatttagattaacaagataattcaattt
+110544  agcttagagaatggcaattattaaatcggtattaatttaaagatatttttttattttttc
+110604  gattttaaaatttttctctttaggttgagccgtatattcatggaaattatatttacggtt
+110664  ctttgaaaagggatttctctctatttcagt
+;     G-nad7-I1 ==> end
+;     G-nad7-E2 ==> start
+110694  GGAGTGTTGCGATTAGTTTTAGAATTAGATGGTGAGATTGTAAAGCGAGCTGACCCGCAT
+110754  ATTGGATTATTACACCGGGGTACTGAAAAATTAATAGAGCATAAGCTGTATATACAAGCA
+110814  TTGCCATACTTTGATAGATTGGAT
+;     G-nad7-E2 ==> end
+;     G-nad7-I2 ==> start /group=II(derived)
+110838  gtataagggctttaactaaataatataattgtattgagtcaggatttatagttaaattta
+110898  ttttcgttttttttatatagtataaaaattttatcttaaatgcaacgcactaaatattta
+110958  atttttgcttgtataaagaaccttggtatgattaaataattttataaacaattaagttgg
+111018  ctaatatgttatattttttacattgcagattaaattagtataataatttttatgtaataa
+111078  gtattttataggatctatatgcattaatcaagttaacttactaataatttttgaaaatga
+111138  tatgagtccagtattaaatgaaattatggttaaattataaaataggtatagtattcttgt
+111198  ttataagtggaaagttaatatattaacaagttaaagtttatttataaccttaattggttt
+111258  tatgtaaaggatttttgcagtttatagtgtggtatgtaaaaagatgtgatttactttatt
+111318  aagtaaataagtttacttgttgtaatataaataatatatggtgaaaactagtaaatttaa
+111378  taattatcaaagagccaagtattttgtaaaaaatttgtttggttcggaatgggctaaaaa
+111438  attatgctatacatctaattttgatatataggaaattcaaacagctaccattac
+;     G-nad7-I2 ==> end
+;     G-nad7-E3 ==> start
+111492  TATGTTTCTATGATGGCACAAGAACATGCATTTTCATTAGCTATTGAAAAATTGTTAGGT
+111552  TGTATGATACCACGCCGTGCCCAGTATATTCGTGTTATTTTTTTAGAAATTACAAGAATT
+111612  TTAAAT
+;     G-nad7-E3 ==> end
+;     G-nad7-I3 ==> start /group=II
+111618  gagtaggaagctgttagattagtattttattggctagttattgtttaagtaattaggctt
+111678  tttaatgtatttaaacttaagcctaaactatcgtaactttgtatttcttttgaaagagaa
+111738  atttaagtatagtactcctaattagatagtattttgttccaaaacaaataaaaggaaaga
+111798  ataacttttaagttaatatgtttttattttttgacatattttttattgtgggttgggcgg
+111858  acagtaaaatttaattaattcatgttatatttgtaagtaaacttaaaataggtataggtt
+111918  taatattgaaaagaaaagtttaagttttgattttattataaatgaaagaggtatttggta
+111978  caatttaataaaatacagtaaatatacaatagtgtttttattaccatttttgttagaatt
+112038  tagattaaaaatatataaaatcgttttacttaatagtggtataaattcagaaattttcac
+112098  aaatctttattatatggggatagcgaaattagtgaacaattttttct
+;     G-nad7-I3-orf505 ==> start
+112145  atgtccaaaaatagtttcttgagattgcgtttaaggcatagagagttagttactgattgt
+112205  aattttttttgtcaattatttttggaagcacggcaattgcttagtagtgatttttttcct
+112265  acagaaaagtatagttttagattaaaattaattagagaagtttttaagttgcaaaaaaga
+112325  attgctcagttgggaaatataggaaatacgaatggagcattatttttaataaataaatat
+112385  gtatctcatttatgtgtacgacttttcgtcataggcgcgctaaaaggtagcataagtgtt
+112445  aattttttatttcattttaaaattttagaagtatttgattgcttttacatattaaaatat
+112505  ggttggtttttaattggacttaaatcattatataatgtgaaaaaaatttattttaaaaaa
+112565  gggaatgatagcgtatatagcgttttacttagctctgtttttgacaaaattgcacagcga
+112625  caaattttgattttattagacccattagttaatgcaatttcaaaatttaatcgatatggt
+112685  ttaagtcgtgagcggttttatagacagttggtaaatcactcggattttttttttttaaat
+112745  aaaaattttataaaattgagattattaaaatttgaagttagtaattgatttagtaaaata
+112805  tcacatgtatatttgtataattatttaccttggccgcgtggatataaatatttactagag
+112865  cggtgattgaacccaagcttggcggtaaagataagaaaaaataattgtagtaagatttta
+112925  acacaaggtataatgcaggatttgatacttggtcctattatatttaattttattttaaat
+112985  agtttttttaaatttttattgtttaaattgaattatagaacaatttttgtgaaacattta
+113045  cgtgtgtttattataggaagtattattggggttattactgtttctaattcagttttgagc
+113105  cgtttttattcaaatattattaattatttaaattttagaaaaattgtaaattatgatttt
+113165  ttgaagatgtgttatgttgatttttttcatgtaagaaaatttaattttttaggttgacag
+113225  gcgttttttactcgtagagtttgaataagcgtagcattgagcaaaaattatttgggtttt
+113285  agatatcctttgaagagaagtgttactggtttacttgaatggcgaccttgtttttatcat
+113345  cgtttgggatttaagaaggcaattaagtgggaaatttttaaatctccttataatatacga
+113405  attattatatttgttaaagtatatttaattatgcggaattttgtttggtattatttatat
+113465  atagataattttattatatactttatattgttgtgttgttttgtatctaggtgcttgagg
+113525  agaaatttaaaatataagatgtattgtggtaagagacgattattttctaatttgatttga
+113585  aaagtgtttttcggattagatatggtctttttttataaaatttgaggatactgggtgaga
+113645  atatttttaaaatattaa
+;     G-nad7-I3-orf505 ==> end
+113663  tttggttaataaataagatttttagttaagattagtattagaattttaacatttagtttt
+113723  tgatttttattttaaattgtatgtattttaaaaaatttaaaaatacaaattaaaagaaga
+113783  tattaataaaattaaggggaaataatatatttaagctttagttattaagttaactttggt
+113843  ttagatttaaggcgctatataaaggttttattgtttaaaattgttatatttatatatttt
+113903  tagagatacatattttgaaaaattcggttcatgatgagcctaatacggggcaactcgttt
+113963  gtttggttctgagaagaggaaatttagtacttacttctat
+;     G-nad7-I3 ==> end
+;     G-nad7-E4 ==> start
+114003  CATTTAATGGCATTAACTACTCACGCAATGGATGTGGGGGCGGTAACTCCTTTTTTATGA
+114063  GGATTTG
+;     G-nad7-E4 ==> end
+;     G-nad7-I4 ==> start /group=II(derived)
+114070  gtgtaatttgttttttctatttatataatagttttaatttagcaaaaaatatttaaaaaa
+114130  ttgattttaggattttgtgaatgcagagattagtacgaattttctaatatttatggctta
+114190  tgatacaaatatgtatcgaaattcaaattatttttttgtggttgttatactataaaatat
+114250  agtatttaattttacacaggctgttaaaattaataatgaatgtgttcacttagattaaat
+114310  tagtttatgattagggcataaaaaattacttaaaaattttaaattagcggtctaagctaa
+114370  cgtgattaagtatagaaatttgtgaattaaaaattactattaataaggtatttaggtacg
+114430  aatagcgtgtttttttaattagattaggtattataatgtaatattga
+;     G-nad7-I4-orf511 ==> start
+114477  atggagttaatatttcaattatatgaagtaaaatttgaaattaataaaattaataaaata
+114537  ataggtaattattttagatatttacgatgacctattggattaggtgtaggttgtttgatt
+114597  agaaagatagccatttttattcaatatttgattcgtgtgggttttattgtaaatgatgta
+114657  ttttgtgttaaattgcagagagaatttttttgctcgataatcaatcgacttcttgttgta
+114717  gattacatatcgcattttgtttacagagcattgaagaaaatttttcaatttttaatttga
+114777  gagtgaagatttgaaataattaataaatttaagcaaaataatttatataactatttaaat
+114837  aggaacatttcttcaattttgttttatgaagaggcagcacaagttcttattttaaatact
+114897  acagtatcttgcctagaggtatttactgtatttggattattcattttccaacatattcga
+114957  agagtttgtgtaacaatgaattatttaggtcgttttttggttcctaagctgaacactgtt
+115017  aactactcaattattaaattaaaaattaagtgtattctgaagacatttttttgtaaacag
+115077  ctaaaagggcctgctttaatacttattaaatatttcaaatttattccaaatttatggaag
+115137  agccaaatattttttaaacagtgagagtgttacaattgagtgtctgcattcagtttatca
+115197  agtatgtggaatcagtttttatttgatttggtgttagcagatttcgaatttgtaattaaa
+115257  gaaagaatttttaatttttattatagaaaattttttgggagctatcagaattttaaaagc
+115317  aaatttcaatgcggagtaggtttattttttgttaaatgtgtagatgaaatattgatatgt
+115377  tgtgaaaattatgaagagaaagattggataattggggtattatttgaaattttagtagat
+115437  aaacagttaattatagattttttaagttctaaaattttactaggtcagcgcaatactaat
+115497  tttctttatttaggttttgagattagaaatcatgacgtaagaagtagaaataagtacatt
+115557  agactatgttgtgtaaaattttgaggtgatttagtgattcttccatgccaacgttcggta
+115617  ttgggattgaaatcagagttaagaggggtgttatctaacgttaacgcctcggtttcttct
+115677  ataatttaccgcttgaaccgtattgtttatcaatgaggtatgtattattcatttagtatt
+115737  tcaagtgttttatgtctcttattggacagttttattcattttagggtttggagattttta
+115797  aaacagaaattttctaaaataggtaaaacatatttagcagagcgatttttttttaccggg
+115857  aatctgaaatatcaaacaaattttaagaaatggcattttcatgatgttttatctgaatca
+115917  acgcagaatttattgtttaataataaaatttggtttatctgattagtgtctttaaggcaa
+115977  ttttgttctataaaaagattttactatattcaataa
+;     G-nad7-I4-orf511 ==> end
+116013  atattagagttatttgttggaataaaatttttctattgattttacatattgtaatatgta
+116073  aaaaaagatataatacttaatttgtagcagcaacgcgcataagctgaatactataagaat
+116133  agtttgttcagttttatagggaaaagtttttgcaacaacgatttatcctaat
+;     G-nad7-I4 ==> end
+;     G-nad7-E5 ==> start
+116185  AAGAGCGGGAAAAGTTATTAGAATTTTATGAGCGTGTTTCGGGAGCACGGATGCATGCTA
+116245  ATTATATACGGCCTGGTGGTGTAAATAGGGATTTACCATTAGGATTTTTAGAAGATTTAT
+116305  ATACATTTATTGTACAGTTTGGTTCACGAATTGATGAAATTGAAGAATTGTTAACGTATA
+116365  ATCGTATTTGAAAGCAGCGATTAGTTGATATTGGTATTGTATCTAAAGAATTGGCTTTAG
+116425  ATTGAGGATTTACTGGGGTTTTATTACGAGGATCTGGTGTAGTTTGGGATTTAAGAAAAA
+116485  CGCAACCTTATGAAATTTATAATGAATTATTTTTTGATATTCCAGTTGGTAAAAATGGTG
+116545  ATTGTTATGATGT
+;     G-nad7-E5 ==> end
+;     G-nad7-I5 ==> start ;; mfannot: no intron type identified
+116558  gtggtgtaagaaacgtattatattagagtaaataggttaattaatgtttaaagtgtctgt
+116618  aaaatgtttagcaactataatattaaaatattagtgttataaacaatcaattcgtaagat
+116678  tgaaattgatattttttgatttaaaatatatataataaaaaattattaaattgatcatat
+116738  ttaataaatgtattgtattactgcattagtttattaataagaaatttattaaagtgttaa
+116798  ataaatttaattagataagtagtttagtgtaatcgttatggcatacatagtttacaaaat
+116858  ttttattattacgttatttttgtaaaagaaaagagaattaattgattattcaaagtattt
+116918  agggagatttatattaaagaccttttgattgtaaataggattttgtcttaagattaattt
+116978  agttaaaaaaattatcctagtcgattgtagaatattttaatattaatattaaataagatt
+117038  aattttggtagtataaattgggtttttatttattttttttacgtaatttgtatttctagt
+117098  aaatactaagaataaaatggttagttgtatttatttttaattattaaggataatttgcca
+117158  ttagttagaatattttgctaatagtttgttacgtagtttacaaattgtaaatgccttgta
+117218  aatttattgatagtttagcttaatttgaacgcttttaaaaattttgctgagcacagttat
+117278  tatattatattattttcgtaaagcttaatatattgggaaatatacgttaagtttgaattt
+117338  gaaatcaatttaaagattttaaaataccg
+;     G-nad7-I5 ==> end
+;     G-nad7-E6 ==> start
+117367  TTATCTAGTTCGTATTGGTGAAATGCGTCAAAGTTTAAACATATTAAATCAATGTATTAA
+117427  TGAAATTCCGACTGGTTCAGTTAAATCAGATGATAAAAAATTGATGTCTCCAACAAGAAG
+117487  TGAAATAAAACAATCAATGGAAGCTTTGATACATCATTTTAAATTATATAGTAGAGGGTT
+117547  TGATGTTCCGCAGGGTGAAACATATGTTGGGGTTGAAGCGCCTAAGGGTGAATTT
+;     G-nad7-E6 ==> end
+;     G-nad7-I6 ==> start /group=II(derived) ;; mfannot: splice boundaries uncertain
+117602  ttaagttaatatataattttacatatatttttgcattactatgatttagttaataatata
+117662  ttattataggtttttatttttagtagcttattttccaattaagaataagtaaaaatcaag
+117722  ttattgaacaatattaaatttgaggttatagtttatagtgtatttttatttaggcaatat
+117782  atattgttgttaataagaaaaaagtttttacaaaattttgtaaataatgtataggtcaaa
+117842  gtgatctaatttttgttcgagaaaaaattatatatctatttttttaattaaattcgtgga
+117902  gtttacgtagtttaaaaagttgttgtatatttattgggcaaaaagtacatattaaatagt
+117962  tagctatataaataaaatgtaaattttatttaaaagcaaaacgtatatatgtaaaaaatt
+118022  atgtgatcttattagtataataatagaatattggaaattaaaggattttgttaaaaattt
+118082  tccagttataagaaataacttaaaatgtaagttagattttaaatttaattaaaattttta
+118142  ttaattattttagatgagaagaatggttaaaaaagttagataaacattattgtttcattt
+118202  ctaattgtagttaaacaattataattatatttttttaattttaatttagttgttaaatta
+118262  tgtataataagaaatggaattaagctgtatgtattgtgaattacatgtacagttttataa
+118322  agggttttctactttttagtatagaaattctattttacgt
+;     G-nad7-I6 ==> end
+;     G-nad7-E7 ==> start
+118362  GGAGTTTATTTAGTAGCTGATGGTTCAAATAAACCTTATAGATGCAAGATAAAAGCGCCA
+118422  GGATTTGCGCATCTTCAAGGGTTAAATTTTATGGCTAAAGGACATATGATTGCGGACGTT
+118482  GTTACTATTATTGGTACACAAGATATAGTTTTTTTATGATTTAAGTTTAGCGTTAATTTA
+118542  TTTTATTTTTATTAA
+;     G-nad7-E7 ==> end
+;     G-nad7 ==> end
+118557  AAAAATATTTAAAATTTTAATGAAAATATAATTGTTTAATATAGTGCATTGGAAGTATAT
+118617  TTAAGTAAAATTTCTGAAAAAATTTATATTTAATCGTAGATGTTTTCAAAATTTATTATA
+118677  GGATTTTAAGAAACAAATTTTATATTAATTTATTTTGTATGTAATAAAGTAAGAAGGTAC
+118737  TAGTGTAAAATGCTGAATCGTAATGTGGGTAGGTATGGTATAGACTGTTTAACGTAAAAA
+118797  TGTAAATTTTTTAAAAAAATTTTCTACCCATAAATTCTTATGATATAGAAATATTTTTAG
+118857  AGTTTTTGTGTAATATGTAGAAAATATATAATTAAAATAGCAATAAAAAATGAAATGGAG
+118917  TTTATTTCTATATTGTAATAGGCTACATTTTTTAAGTTTATGAATCGTACAGTAGATTAG
+118977  CACTAAAAAAATTAGTTTTTTAAATTTACTATTAAATTAGGTAATGGTTATGATGAATTG
+119037  GCTAAGATATTATATTAAAATAAATATTGATCAAAAGCTAGTTAAAGTAACCATTTAATA
+119097  TTTTTTTGTTTATTTTTAGAATTTACTGCATCTATATGTATGTATTT
+;; mfannot:     /group=II
+119144  aagctgtatactaatctgattggtatgtgcagttttttaggagga
+;; mfannot:
+119189  TTTAGAAGATTCTATCTTTTGTGGTGAAGTAGATAGGTAGGTTTTGAAAAAATTTATGTT
+119249  GTAGTATTAATATATTAGTTTATACAAATTTCAGTTATTTTTTGGATTGTTATTAGTTAT
+;     G-nad4 ==> start
+;     G-nad4-E1 ==> start
+119309  ATGATTATTTTACCGATTTTAGTTTTGATTTGTGGAATTGTATGTATTAGTTTAATTTCT
+119369  TCAGTGAGGTATATATATATTAAAAAGTTAGCTTTGTTTATTACAATTGCTGTGTTTTAT
+119429  TTATCGCTATTGTTTTGAATATTTTATGTTAAGCAAAGTTTATTTTTTCAATTCATGTTT
+119489  TATAGAGAATGGTTAGTGTTCATGAATATTGATATTATATTTGGTTTAGATGGTATATCA
+119549  ATATTTTTTATTATTTTAACAACTTTTTTATTTCCTATATGTGTATTGTCAAGTTGAAAA
+119609  ATAATTTTAGTAAATGTAAAGGAATTTTTTCTTTTACTTTTATTTTTAGAAAGTTTTTTA
+119669  TTATTTATTTTTTCTACATTAGATTTAATATTATTTTATATTTTCTTCGAGAGTGTGCTG
+119729  ATTCCTATG
+;     G-nad4-E1 ==> end
+;     G-nad4-I1 ==> start /group=II(derived) ;; mfannot: splice boundaries uncertain
+119738  gtggtattataactttcttttctgaatatatttagaaaaaaataaatttcttattgatat
+119798  tgttttttagctaatttgattagataaaatttaagtgattttgaagaatgcatttaagat
+119858  ttttattaattactaaagcaataatttgtaataattgtaagtaatatttttataaaatac
+119918  ttaagcaaaatttctgattagaagttgatataggattttaatgagagtaaatttaaagaa
+119978  ttttcaatatagaaatattagaagggtgacttataaattactattaatttcaataatttt
+120038  tagttgttggaaaagtcattgtttaataatgaatttataaaaatttaaattgattgggaa
+120098  ttgtgtgggaattaatattaaaggaaaaaaattaaagtttttattagagctatataatag
+120158  gaaactattttgtatggtttagaaacagaatttcttgtaataaactttattatttttaag
+120218  tagtttagtgtaaaattttaattctagtttgt
+;     G-nad4-I1 ==> end
+;     G-nad4-E2 ==> start
+120250  TTTTTAATTATAGGAATTTGAGGTTCGCGTGAGTGTAAAATTAAAGCTTCTTATTACTTT
+120310  TTTATGTATACATTACTTGGATCATTGGTTGCTCTTATTGGTATATTAATAATTTTTTTT
+120370  GAAACAGGCACTACGAATTTTTTTATTTTGTTAACTCATAAATTCAGCTTTGAACGGCAA
+120430  TTGTTACTATGAATTATGTTGTTTATTTCATTTGCAGTTAAATTTCCAATAGTTCCTTTT
+120490  CATATTTGGTTGCCAGAAGCTCATGTGGAAGCGCCTACGGTGGGATCTATTATTTTAGCG
+120550  GGGGTTTTATTAAAATTAGGTATTTATGGTATGCTACGTTTTTCAATTTCTTTATTCCCT
+120610  CAAGCCAGTAGTTATTTTACACCTTTTGTATATACAATTTGTATTATTTCCATCCTTTAT
+120670  AGTTCGTTAACAACAATTCGTCAAGTGGATTTAAAACGTATAATAGCGTATTCTTCAGTT
+120730  TCTCATATGAATTTTGGTTTGTTAGGTTTATTTTCTGGTACTTTACATGGTATTATTGGT
+120790  GGTTTAGTTTTATCTATAAGTCACGGTTTTGTGACAAGTGGGTTATTTATTTGCATTGGT
+120850  GTATTATATGATCGTTATCATACTCGTCTGCTAAAGTATTATAGCGGTATTGTGTTAGTA
+120910  ATGCCTGTTTTTTCTGTTTTATTTTTATTTTTTTCGTTAAGTAATTTAGGTATGCCGGGT
+120970  ACAAGTAGTTTTGTAGGTGAATTACTTATTTTGATAGGTACATTTAGCCAAAATAGTATT
+121030  TCAGCTATTTTTGGATCGAGTGGTATTTTGCTTGGTACTTTATATTCAATTTGGTTATAT
+121090  AATAGAGTGTGCTTTGGTAATTTGAAAATACAGGATAATGTATTAGTATATCTAGATATA
+121150  TCAAAACGTGAATGTTTTTGTATTTTTCCATTAGTAGGTTTAGTGTTATTGTTAGGTTTA
+121210  AATTCTAATTTATTTTTAGATTACTTACAGAGTGCAGGTTATATGTTACTTTTTGAATAG
+;     G-nad4-E2 ==> end
+;     G-nad4 ==> end
+121270  TTTATTTTTTACTAGCAGTAAAAAAATTATATTTTTAATTATTATTACATTTATTAAAAT
+121330  AAAAGTTATATGTTTAATATTAGAAAAATTTCGTTTTACGTAGTTTTTTATATCAGTTTA
+121390  TGATAAAAATAATTTTGTCATAATAATACTTTATATTTTAGACTTGACGTTTTTTTTGTG
+121450  TTTCTACATGTAATTTACATGTAAATTAAAGTGTGGTATTTTTTATAAAAAAGATGTATT
+121510  TTTATATAAAATAGTCTTTAAAGTTGTTTATAAGTGTTATG
+;     G-atp9 ==> start ;; mfannot: alternative ATG start pos 121548
+121551  ATGATTTTAGAAAGTGCAAAAGTTATTGGTGCTGGGTTAGCAACGATTGGATTAGCTGGT
+121611  GTAGGTTTGGGTATTGGGACAGTATTTGCGGCATTAATTACAGGAGTAGCTCGTAATCCA
+121671  TCTTTAGTAAATCAGTTATTTACGTATGCGATGTTAGGGTTTGCTTTAACAGAAGCAATA
+121731  GCTTTGTTTGTTTTAATGATTGCTTTTTTATTGCTTTTTGCTTTTTAG
+;     G-atp9 ==> end
+121779  TGTTTACGGCAAAAATATATTAAAGATTAATTTTTTATAATTCTATACAGAAACTGAAGG
+121839  ATCTGTTTTTTGTATAGAAGATAGAGGATTGGGTACTTGAAATATTTA
+;     G-trnD(guc) ==> start
+121887  GGATTAGTAGCTTAATCGGGAAAGCTCCAAATT!GTC!ATTTTGGTAGATGTAGGTTCAA
+121945  GTCCTATCTAATTCG
+;     G-trnD(guc) ==> end
+121960  TT
+;     G-trnC(gca) ==> start
+121962  GATTGGATAACATAACGGTAATGTGTTGAATT!GCA!AATTCATTTTATAGCGGTTCGAT
+122020  TCCGCTTCCAATCT
+;     G-trnC(gca) ==> end
+122034  TGTTAATGTGATTGGTT
+;     G-trnH(gug) ==> start
+122051  GGCGGATATAGCTCAATGGTAGAGTATTAGTTT!GTG!GAGCTGATTGTTATGAGTTCAA
+122109  ATCTCATTATCCGCC
+;     G-trnH(gug) ==> end
+122124  TATTTATTAAGATTTTTT
+;     G-trnV(uac) ==> start
+122142  TGGTAGTTAGCTCAAGTGGTAGAGCATCTCTTT!TAC!ACGGAGGGGGTTGTTGGTTCAA
+122200  ATCCGATACTATCAA
+;     G-trnV(uac) ==> end
+122215  AAAATTTAAGTTTTTTACG
+;     G-rnpB ==> start ;; mfannot: Approximate position
+122234  AAGGAAAATCCTAATGTATTGTTATTTATACTGTAGTCAGTAAATGTAAATTTAAGAAGA
+122294  CTTATTAGAAATAATTAAAATTTATTTTATGTTTTAAATTTATGTAATAAGAATATGCGT
+122354  AAGCTTATCTATTTGTTGTTAAGTCTGCTGAAACAATAGAATATTTATTAATCATTAATT
+122414  TAAGGTTTGGTTTTTTTACAGAATTAGGTTTAT
+;     G-rnpB ==> end
+122447  AAAAATTTTAGTTTACTTAATATAGATAATAAAAGTTGTATTTGGGTGTTTGAAATAGTT
+122507  AAATGAAAATTTATTATATTTGGATTTGGAGTTTCTT
+;;     rns ==> ;; mfannot: start of 5'
+122544  TATAAGAAGGGTTTGATCCTGGCTCAGAATGAATGCTAGAAGTATACATAACACATGCAA
+122604  GTTG
+;;
+122608  GACGAGTAATTATTTTACAAGTAGCGAACGGGTGCGTAATGTGTAAGAATTTGCCTTCTA
+122668  ATTTGGGATAACCGGGTAATGCTGGCTAATACCAAATAATTTTTTTAAAAAGATTGAATC
+122728  GTTAGGAGATAAGCTTACATAGGATTAGGTAGTTGGTAGAGTAATGGTTTACCAAGCCAA
+122788  TGATCCTTAGCTAGTCTGGGAGGATGAATAGCCACATTGAAACTGAGACAAGGTTCAAAC
+122848  TTTTACGGAGGGCAGCAGTGTGGAATATCGGACGTGCGGTTCATCTAATAATTTTATTTC
+122908  GTAGTAAAATTACTTTGAAGTAAAAAAAAATTGTATATGTGTTACCTTTTTTAGGATGTA
+122968  TTGATTTGTACAAACATCAGTACAATTAAATATGAGATTTATATAGAGAAATTTTAATAT
+123028  AAATTGGTATTGTGTATAATAAAGGTTAAAAAATTATAATTAATAGTAATAAAATATATT
+123088  TTATGTGGAATAGAGTTGTTGATAGCAAATCATTGAATGGGAAATACCATAATTTCATAT
+123148  GCAAACTTAAATATGATTAGTAAAAGAGAATGTAATATAAAAATTAATGATAGATATATT
+123208  TGTATGAAGTATGTAATGTAGGATTGTCGCAAGTTCTACATTTTTTTAAGCGTTAAGATA
+123268  CTTCTGGTAGATAAATGAGTTCCAATTAATAAGTAAACACAAAAAGATGAATATTTTGTT
+123328  TAATTCATATATATTAAAAGTTTAGTATGAATTTTAATAAAAAGGATATTCAATATATAG
+123388  TTGATGAAATTAGTTTTCATTATAAAAATATATAAGTGAACATTGTAGTTTATTTAAGTA
+123448  TTCAAAAATTAAAAAAAAGTACTTTAACTTATTTTTATTTAAAGTTTAGATAAAATATAA
+123508  ATAATTTTGGTAAGAAGAACTTTAATATAAAATGCAATATTTTTGAATACGTAAATTTAA
+123568  TTACTGAG
+;; mfannot:     /group=II
+123576  tagctgtataaaaggttacttttatgtacagttccgtaggggaagg
+;; mfannot:
+123622  TTAAATTTACAAATATAACTTTACTCTCTGACAATGAGCGCAAGCTTGATCCAGTAATAC
+123682  TTTATGTGTGATGTGAAGAGTAGGAGACTATTTGTAAAGCACTATCGGTAAAAACGAAAT
+123742  TGACTATATTTACATAAGAAG
+;;     rns ==> ;; mfannot: corr to pos 485-571 of R.americana
+123763  CTCCGGCAAATTTCGTGCCAGCCGCCGCGGTAATACGAAAGGAGCAGGTGTTATTCAGAT
+123823  TAACTGGGCGTAAAGGGCATGTAGACGG
+;;
+123851  TTCATTATGTGTACTATGAGTTACAAAGTATAATTTTGGAAAGTAGTATACACAGCAGAA
+123911  CTTGAGTTGGGTATAGGGTAGCAGAATCTTTAATGTAAAGGTGAAATTTGGTGAAATTAA
+123971  AGAGAATACCAAGGCGAAAGCAGTTACCTATGACGAAAC
+;;     rns ==> ;; mfannot: corr to pos 735-810 of R.americana
+124010  TGACGTTGAGGTGCGAAGGCATGGGTAGCAAATAGGATTAGAGACCCTAGTAGTCCATGC
+124070  AGTAAACGATGAATATT
+;;
+124087  AAATTTTGAAATAATGATTTTCAAAGTTAAAGCTAACGCGT
+;;     rns ==> ;; mfannot: corr to pos 850-965 of R.americana
+124128  CAAATATTCCGCCTGGGGAGTACAATCGCAAGATTGAAACTTAAAGGAATTGACGGGGAT
+124188  CTAAACAAGCGGTGGAACATGTGGTTTAATCCGATGTGCGTTTCGGTAAGAGTGAG
+;;
+124244  TAGGAGGCTCATTGTCTTATTCAGTTTTTATAGTTAAGCTGATTTTTTGGTATATATATA
+124304  TAGATGTAAATTCTGTATCTTTTATGTATAGATAGTTCACCAGAAGCTAAATTTCGGTTT
+124364  AGTTATCCCACCACAAGAGAATAAGTAGGTAGTATTTGTGTGGTAAGCAGGGACTTTAAT
+124424  ATTTAATGTATGCGGTATATTCAAAATACAGTAAAGTGTGAACATAGATTATTAGGAGAG
+124484  AAAATACGTACTATTATTGTAATAGTGAAATTCGTCATAAAAACTTTATTTAGAGATTTT
+124544  TTAAATCGAAAAGTTTAAAGATTTGTATTGATTATTGTGTAAAGTTGGATTACGCAACAT
+124604  GTATAATAACTTTTGGCGTTTATAATAGCTCCAACAATAATTTTAATTGGAATGTATACC
+124664  AAAACGAAATTTTTTCTTCATAGTCTATTCTTTAATTTCTATGGTATTCTACAAGGGGTA
+124724  TGAAGAATTTAGGATAATATAGAATATTAATAAAACCTGTTGTTTTGGAAAAATTAGGTA
+124784  AAATTAAATTATATTTACAAAAATTTTTTATTTTCTGATTTTAGAATTT
+;     G-orf148 ==> start
+124833  ATGTTTTTGTTAAG
+;; mfannot:     /group=II
+124847  tagctgtatgaattggaaaattcatgtatggtttcgaataggcgg
+;; mfannot:
+124892  TCTAAGTTTTTTAGAATATAGTATCGACCTTACCACTACGCGTAAAATCTTACCAGTTTT
+124952  TGAATATTTTA
+;;     rns ==> ;; mfannot: corr to pos 1014-1044 of R.americana
+124963  TACAGGTGTTGCATGGCTGTCGTCAGTTCGTGTTGTGAAG
+;;
+125003  TGTTTGGTTTAGTCCCTATAACGAACGCAATCCCTATCTCTTATTGCTAAAATACTTCTG
+125063  CAAAAGTATTAAGAACTTAGGAGAATCGCTAATAACAAATAAGCTGAAAGTGGGG
+;;     rns ==> ;; mfannot: corr to pos 1190-1252 of R.americana
+125118  GTGACGCCAAGTCGTCATGGCCCTTATAGACTGGGCTACACACGTGTTACAATAATTATT
+125178  ACA
+;;
+125181  ATGAGAAGCAATAATGTAAGTTGGAGCAAAACTCTAAAGGTAATTTTAGTT
+;;     rns ==> ;; mfannot: corr to pos 1305-1411 of R.americana
+125232  CAGATTATTCTCTGTAACTCGAGAATATGAAGTTGAAATCGCAAGTAA
+;     G-orf148 ==> end
+125280  TCGCAGATTAGTATGCTGCGGTGAATATGTTCTTAGATCTTGTACACACCGCCCGTCAC
+;;
+125339  ACCCTGGGAATCGGTTTTATTGTAAACAGATTGTATAACTTAAAGGAGATTGTAAAATAA
+125399  ATTTAGGAGTTCGTCTGTTAGATTAGAAT
+;;
+125428  CGGTGATTGGGGTGAAGTCGTAACAAGGTAGTTGTAGGGGAACCTGCAGCTGGAAGTAAG
+125488  ATATA
+;;     rns ==> ;; mfannot: end of 3'
+125493  AATAACACTCATTTATTATTTTGTATGTATTTTATCG
+;     G-rrn5 ==> start ;; mfannot: complete
+125530  GATATTCTAATAATATATATTGATACTGGATCCCATTTCGAATTCCGGAGTAAAACATAT
+125590  ATATTTCATATATAGCATAAATGTTGTGAAACGTGATTATGGTATT
+;     G-rrn5 ==> end
+125636  TAATGTAG
+;     G-trnF(gaa) ==> start
+125644  GTTTAGATAGCTCAGCGGTAGAGTAAAACACT!GAA!ACTGTTTGTGTCGCTGGTTCAAA
+125702  TCCAGTTCTAAACA
+;     G-trnF(gaa) ==> end
+125716  AAAATTGCTATAAAAAAG
+;     G-trnK(uuu) ==> start
+125734  GAATGTGTAGCTCAAGTGGTAGAGCAGTAGGCT!TTT!AACTTAATGGTTCCGAGTTCAA
+125792  GTCTCGGTACATTCA
+;     G-trnK(uuu) ==> end
+125807  ATTGTATAGGGTTTTAACTCAAATATTTTTTGTGATATGAAACGGCAAAATAAATTTAAC
+125867  AGTTTGAAGTTTATGTATGTATACACTAATGGTTCTATTTTAATTTCTAAAGATTTTTGT
+125927  AAATATAATTTTTTATTAGGTGTGGATATTTTTAACTCAAAGCATTGGTTACGTGTAAGA
+125987  TCAATATTTTTCGAAGGTAAATCAGTGATAAAATTTAAATCAAAATTTTCAAAAATTGGG
+126047  AATATCTAAATTATATAGTAATAAATTCATATAAAATAGAAATATC
+;     G-trnT(ugu) ==> start
+126093  GTATCGTTAGCTTAATTGGTAGAGCATTGATTT!TGT!AGTTCAGAGGTTGTGGGTCCGA
+126151  GTCCCATGCGATACA
+;     G-trnT(ugu) ==> end
+126166  ATTTTTTGGGTTAT
+;     G-trnM(cau)_1 ==> start
+126180  TGTAGTATTGAGTAATTGGTAACTCACTAGATT!CAT!GCTCTAGGAATATTGGTTCAAG
+126238  TCCAATTACTACAA
+;     G-trnM(cau)_1 ==> end
+126252  ATTTAGACTAGAATTGAAGAAGAGAGTAATAA
+;     G-trnM(cau)_2 ==> start
+126284  GGGTTTATAGCTTAATGGTTAAAGCAGACTACT!CAT!AATGGTTTTATTGTAGGTTCGA
+126342  ATCCTACTAGACCCA
+;     G-trnM(cau)_2 ==> end
+126357  TATATGG
+;     G-trnA(ugc) ==> start
+126364  GGGGATGTAGCTTAATGGAAAAGTTCATACTT!TGC!AAGTATGCAGATATCGGTTCGAA
+126422  TCCGGTTGTCTCCA
+;     G-trnA(ugc) ==> end
+126436  AAGTATTTAGAGTGAGT
+;     G-trnR(ucg) ==> start
+126453  GCGTCTATAGCTTAATTGGAAAAGTACCGAACT!TCG!GATTCGTGTTATGAGAGTTCAA
+126511  ATCTTTCTAGACGTA
+;     G-trnR(ucg) ==> end
+126526  TA
+;     G-trnI(gau) ==> start
+126528  AGGCTTATAACTCAATTGGTAGAGTACGCAAGT!GAT!ATTTGTGGAGTTGGTGGTTCAA
+126586  GTCCACTTAGGCCTA
+;     G-trnI(gau) ==> end
+126601  ACATTTTTTAATAAAGATTTATCGTATG
+;     G-trnL(uag) ==> start
+126629  GCCTTTGTGGCGGAATTGGTAGACGCGCTAAACT!TAG!AATTTAGTTTTTTCGGATGTA
+126687  AGAGTTCGAGTCTCTTCAAAGGTA
+;     G-trnL(uag) ==> end
+126711  TAGAAATTGAAAA
+;     G-trnN(guu) ==> start
+126724  TTCCATCTAGCTTAATAGGTAAAGCAATTCACT!GTT!AATGAATGGAGTATAGGTTCGA
+126782  GTCCTATGATGGAAG
+;     G-trnN(guu) ==> end
+;     G-trnY(gua) ==> start
+126797  GAAGGAGTGGCTGAGTGGTTTAAGGCGGTAAACT!GTA!ACTTTACTAATGTTATCATTA
+126855  TCATAGGTTCGAATCCTATCTCCCTCA
+;     G-trnY(gua) ==> end
+126882  AAAGATATTAATGAAGTTAAAAGAA
+;     G-trnE(uuc) ==> start
+126907  GTTCCTTTCGTCTAGTGATTAGGACATTGCCTT!TTC!AGGGTGAGAACGTGGGTTTAAT
+126965  TCCCACAAGGAATA
+;     G-trnE(uuc) ==> end
+126979  ATGTATTGTTATGAATATATTAT
+;     G-trnQ(uug) ==> start
+127002  TGGGATATAGCCAAATGGTAAGGCATTGGTTT!TTG!ACATCATGAGTATAGGTTCGATT
+127060  CCTATTATCCCAA
+;     G-trnQ(uug) ==> end
+127073  AGTTATTCATTTGAAAATCGTATA
+;     G-trnG(ucc) ==> start
+127097  GCGAATATAAATTAATGGTAAATTATTTGTCT!TCC!AAACAGATTTTGAGAGTTCGAGT
+127155  CTCTCTATTCGCA
+;     G-trnG(ucc) ==> end
+127168  AT
+;;     rnl ==> ;; mfannot: 5' +/- 50 nt
+127170  AATATATAACTTAATATTTGCATGTAAAGTATATTTAATGAATACCTTGGTATAACAAAT
+127230  GGTAAGGACGTTTTGAAATGCGAAAAGTCGTGGTGTTAAGTAGAAGATTGTTAAACGCGA
+127290  ATTTCCTTGCGAAGAAATTTATTCTTATAAGAATTATGAAAAAGAATTTAGGGAATTGAA
+127350  ACATCTTAGTACCTAGAGAAAAGAAATCAATCGAGATTCCGAAAGTAGTGGTGAGCGATT
+127410  TCGGATATAGGTTAATTAAATTAGTTTTTATACACTAGGAAATATCTTGAAAGGTATACC
+127470  GTAGAAAGTTGTAGTCTTGTTATTTGGTGTATAGAGATTTATATATTTAAAATATTTAAA
+127530  ACGATTTTCGTGTAGAATTGTTTGAAAATGGGAGGCCCACCTTCCAAACCTAAATATTTG
+127590  TTATAACCGATAGTGTAT
+;;
+127608  AAGTACCGTGAGGGAAAGGTGAAAGAAAACCCATTAGGGAGTGAAAAGAAGTTGAAATTA
+127668  AATATAAAGAAATAATTTAATAATGATTTTATTTTATAATTATTATAAATGTACCTTTTG
+127728  TATAAGTGTTACAAATAAAGTTATTGGGAGAAAGAAGTAGTCGCTGCATTAGATAGGAAA
+127788  TAGAAAAAAAACGTTCATATTGCATTGTATTAATAAATAGTAAATAAAAGAATAAGTTAT
+127848  TAATTAAGAATAGGCTTCCATAGATAAAAGGTTTTAACTACTGAAGTATTAAAATTCTTA
+127908  TACGGTTAAATTAAATTGTTAAAAAATTGGGAGTAAACTTTGATTCTAATTTACTAAAAA
+127968  CCTCTTAATAAATATTTGGGTTTATTCATAATTATATACCTAATTATGAATTAATGAAGA
+128028  GTATTTAGATAATTTTTTAAATAAATACCAAATTAAAGTTTAATTATTTATATTAATTAA
+128088  TTTGAAATTGGCG
+;; mfannot:     /group=II
+128101  gagcttcatgttatgaaatagcatgtgtagttttaggtggg
+;; mfannot:
+128142  GAAAATTTTAATTTTCTATCATAATTGGGTCAGCAAGTTAATAAGGATAGTTTGCTTAAC
+128202  TTTGGTGATAAAAGGGAGGCGTAGCGAAAGCGAGTTTTAAAAAAGCGAAAATTGGATCTT
+128262  TCTTATTAAACCCGAAGCCAAGTGATCTAACCATGATCAAGTTGATATTACTGTGATAGG
+128322  TAATTGAGGACTGAACCCGTATATGTGGCAAAATATTGGGATGAATTGTGGTTTGGAGTG
+128382  AAAGGCTAATCAAACTTGGCAATAGCTGGTTTTCTGCGAAATCTATTTGTGTGCTTAGTG
+128442  CGAATACGCTTATAATGTAAAAGAATTGTAATAATAATATTAATGTAAAAAATATAGATA
+128502  AATTTAATTTATTAAATTTTATATAATATGAATTTCGTCATATTTTTGGTTTTAAACTAG
+128562  AAAAAATATATGAGAGTAAATTCTAGTATAAAAATAATGAATTTTTTAATTGACTTTAAA
+128622  GTTTTTAAACAGAATATTTATTTTACAAATTGTTAAAAATTATTTGTGAATAATATAAAA
+128682  TAAACTATGTTTGTTAATTGTTACCGTAATTCGCGATTTTACAAAGTTAAGAAGTTTATA
+128742  GTATAAAAAAAATATATTTGTTTGTATAAGATAGATATGGAAAATTTAAATTAGTTTTTT
+128802  CGGTAAAAAACATAGATTCATTTAATAATAAACATAAGAGATATATAAACTAAGAGTGTT
+128862  TTTAAAATAAATTTGAAAGAAATTTATAGAAAAATTACACAACGGATCAAATTCATATTT
+128922  TTTCTTTAAAGATTTATAATTAATTTGCGTTTTAAATATAGCATTAAATAATGTATATAC
+128982  TATGTATGGTTTTTGTTATGTAAAAATATTTTGAAATAAGGGAGGTTTTTACAAATTGCG
+129042  TAATTAAAAATATAATCATACTATACGTTTGTTAATTTTATATTTAAAGTGAGAAGCTAG
+129102  GTAATAATAAATTATTATGTGTAGTTTTGAAGTAAAGTTTTCTTAATAGAAATCGATTAT
+129162  AACAGGTAGAGTGTTATATAGTTTATTTTACGGGTAGAGCTCTAGTTATTTGATGGGAGT
+129222  GTAGCAGCTTTACTGAGAATAATTAAACTTCGAATAGTAAATTTTAAGTTATAATAAACA
+129282  GACTTTTGGCGATAAGGTCGAAGGTCAAGAGGGAAACAGCCCAGATTACATGATAAGGTC
+129342  TTAAAATAATTTTTTGAGTGAAAAAGGAAAATTTAGTACTTAAACAATTAAGAGGTAGGC
+129402  TTGGAAGCAGCCATTCTTTAAAGAAATCGTATTAGATCATTAGTTATTCTAGTTTAAATT
+129462  TTTCTAAAATGTATAGAGGCTAAAAAATTTACCGAAGCAGTAAATAAGAAATAATTTCTT
+129522  ATGGTAGCAGAACGTTCCGTAGTTTTTTGAAGGAAAATTGTGAAATTTTTTGCAGAAATC
+129582  GGAAGTGAGGATGCTGATATGAGTAACGAAAAATATAGTAATAATCTATATCGCTGTAAG
+129642  TTTAAGGTTTTCAAAGTATGGGTTAACTACTTTGAGTAACACAGTATCTAAGATAAAAAA
+129702  AGGGTGAAGACTTAAGTTGATGAAGAAAGAAGTTTATATTCTTCAGTAATTTTAGAAAAT
+129762  TAATAGTTATTGTGCGAATTTGGTTTAATTATCTTATCAAGTTTCTCATAAGCTATTCGA
+129822  GAAAAATTCTAAATATTAAAACTGTATTTAAACCGACACTGGTGAACTGGTACGATTATG
+129882  TACTAAAGCGATTGAAAGAATAGTATTGAAGGAACTCGGCAAAATTGTTCTGTGACTTCG
+129942  GTATAAAGAACACCAATCATATTTATATAGGTTTATATTTTGGTTGGTAGCAGAAATAGG
+130002  GGGTAGCGACTGTTTAATAAAAAGTATGATTTGTTATTATGATTCTGTTTACTAATGGTT
+130062  AATTAACATTTTTTTCTTAATACTGTAATAAAAAAATTATGTTAAATTTAGAATTTACAC
+130122  CATTAACTTGCGATGCAGGCATTTTATATAAGAATAATTTAAATATATGTATATTTTAGG
+130182  CGTCTAGAAGGCAGCGTATTTTATGAAAAAAATAGAAATATTAGGTTATATAAATTAAGA
+130242  TGAGAAAATAGCGTAATTATTCTTTTTAGTAATTACGTAAGAATTGTATAATTATTTTTA
+130302  CAATTTTTGTAAGGCGTAGAGAATAATTTTAACTACAAAATGAGATGCATTATTCATAAT
+130362  AAATAAAGTAGTTTTTTAATATATTGTAAGTAGTTGAACAAAGTTGTTTTAGAATTTTGT
+130422  TATATATATAGTTAAAAAATTAAAGAAAATATAAAAATACGATTAAATTTTTAATGTAAT
+130482  TTATATAATTAGTGTTGATTAAATACTCCCCTAGTATTATAATCTTTTTAGATATACAAT
+130542  AGGGAGTTATTAACATTT
+;; mfannot:     /group=II(derived)
+130560  gagctgtatataatgaaaattatatgtacagtttttatagggggaa
+;; mfannot:
+130606  AATTTGAAAAAATTTACCTATCTAAATCACAGGACTCTGCTAAATTGTAAAATGATGTAT
+130666  AGGGTCTGACACCTGCCCAGTGCTGTAAAGTTAAAAATTAGTTGTTTATGCTTCTAATTT
+130726  AATCTCCAGTAAACGGCGGCTGTAACTCTGACGGTCCGTGTGTTTCCGTAATTAAAATAT
+130786  AGTTTAATTAAAATTATATTGAATAAGAATTTATAGTGTGGATTTAGAAAAATTATTTTG
+130846  GTATATAAAATGCAAGAAAATTATTAATTTTGATCAAAGATGGTTTTATACTTAAGTAAT
+130906  TTATAAATTAGAAAAAAATAGATCATTGTTTGAAAATAAAGAACGACTCCAGTTAAGTTT
+130966  CGAATAACAGAGAGTTATACTTTAAAAAATTTATAAATATAGAAAATATAGGGTTTGAAA
+131026  GTTTTTTATTAAAAATTGTGATGTTTTTAAATTGGACTAATTTAAATATGTTTTATAAAG
+131086  ACAATTCGGAATAAAAGTCGAATCTATTTTTTGTCTAAACACTGTAAAAACGGAAATAAC
+131146  ATTTATATTTATTTATTTTTTAATATAAGTATATTATCAATTGAAAGGTAATAAAGATTA
+131206  AAATAGTAAGTATAAACGGAAAGATACTATAAAAATGCTTTTAATTTTTTAAACATGTTC
+131266  ATAGTATTAACTTAATAAAAATTTGAATGTTTTTAGAATGGTTAATAGAAGTTGTATGGT
+131326  AATAATTACCAGGTACAATTTTAATTAGCAAATTATTATTAATGATTTGACTATAATTAA
+131386  GGTAGCGAAATTCCTTGTCTAGTAATTTTAGACCTGCATGAATGGTGTAACGACTTCCCT
+131446  ACTGTCTCCAATACTATTTCAGTGAAATTAGAATATCCGTGAAGATACGGATTATTATAT
+131506  GATTAGACGGAAAGACCCTATGCACCTTTACTAGATTTTTATATTGTTACAAAGACTAAA
+131566  TTGTGTAGAATAGGTGGGATGTTTTTGATCTTTTTTAAAAAGGAAAACGTAAGTGAAATA
+131626  CCACTCGTTTTAGTTCTTTGAACTTACTTATTTTCAATAAGGATAGTGTATATTTGCTAG
+131686  TTTGGCTGGGGCGGCCGCTTCCTAAAGAGTAACGGAGGTGTACAAAGGTAAATTTGATTT
+131746  AATGTTTATTAAATTTTAAGTGTAATGGCAAAATTTGCTTGACTGCGAGACTAACAAGTC
+131806  AAGCAGGGACGTAAGTCGGTCATAATGATCCGGTAATTCTGCGTGGTAAGGTTATCGCTC
+131866  AACGGATAAAAGGTACGCTAGGGATAACAGGCTTATGACCCTCGAGAGTTCTTATCGGCG
+131926  GGGTCGTTTGGCACCTCGATGTCGAGTGTAATTCGCTAATTATCATATATAGGAAATAAT
+131986  AATATTATTTTTTATATTGATGTAATTTTTGTTATGTTTAATGTATTTATTTATTAAATT
+132046  AATTTTTGGTAAACTTTTAGATTCTATCAAATAATTTTTTCCAAGCAATACATTATAATT
+132106  TACTTAGAGTTGAGTTAGGTCTTATTTATGAAGAATATTTCGTTATGAGGTGTATATAAT
+132166  CCGAAAGGGTAGTATGAATTTTTTTATATACATAAACTGCTATTATATTGGCGTTAATAG
+132226  GTTTATAAATTATAATTGGATCGGAATAGAGTAAACAAAACTAAGTATTATAATAGCAAA
+132286  AGAGGTGAATAGACGTTGAATTATATGTTAAAATGTAATTCGCGAAAATGGATTCGATAA
+132346  TATATGTTTCTATTTATGGAAATAGAAGTTACTAGTAATATCGAAAGAAAATTGAAAATT
+132406  TTTTTGTTTTACGAAGCATAAAGTTTTGGATATTGATTTA
+;; mfannot:     /group=II(derived)
+132446  aagctatatagtaagaaattactacgtatagtttggcagtagcagta
+;; mfannot:
+132493  TGATGTTTATATGATATTGACT
+;;
+132515  ATAACCTTTTCACATCCTGGAGCTGAAGAAGGTTCCAAGGGTTCGGTTGTTCGCCGATTA
+132575  AAGTGGAACATGAGTTGGGTTTAGAACGTCGTGAGACAGTTTGGTCCCTATCTGTCATAT
+132635  ACGTTTTAAAACTGAAAAAATTTGTATCTAGTACGAGAGGATCGATATGAATTGGCCGCT
+132695  GGTAAATCAATTATTTTGATATAAAGTATCGTTGAGACGCTACGCCAATTATATATAACT
+132755  GCTGAAGGCATATCAAGCAGGAAGATGATTTTAAGAAGAGTTTTAATTAGTTGTTGAAAC
+132815  AGTTAGTTGGTTATAGATAATGACTTTGATAGGCTACTAGATGTACATAGTGTAAATTAT
+132875  TCAGTCTGGAGTACTAAATAACTAAT
+;;     rnl ==> ;; mfannot: 3' -20/+180
+132901  ATATAATTTATATATACAATTAT
+;     G-trnS(gcu) ==> start
+132924  GGAAAGGTGACTGAGGGGTTGAAGGTGATGGTTT!GCT!AAATCATTATATAAAGTTTTA
+132982  TATCGTGGGTTCGAATCCCATTCTTTCCA
+;     G-trnS(gcu) ==> end
+133011  ATTTAAAATATA
+;     G-trnL(uaa) ==> start
+133023  GCTTACTTGGTGGAATTGGTAGACACGATTGACT!TAA!AATCAATTCTTTAAGAGGTAT
+133081  CGGTTCAATTCCGATAGTAAGTA
+;     G-trnL(uaa) ==> end
+133104  AATTAATTTTAAAATATAAACAAAGGA
+;     G-trnS(uga) ==> start
+133131  GGGCGTATGGCTGAGTGGTTTAAAGCGTTAGTCT!TGA!ACACTAATATGTAAAATTTTT
+133189  ATATCGTGGGTTCGAATCCTGCTACGTCTA
+;     G-trnS(uga) ==> end
+133219  AGGGT

From b3fcd52f2039056907160baf85fd84ac9c5b96be Mon Sep 17 00:00:00 2001
From: Sai Nirmayi Yasa <92786623+sainirmayi@users.noreply.github.com>
Date: Fri, 8 Nov 2024 10:13:55 +0100
Subject: [PATCH 42/42] Fix multiple components (#162)

* output index when only_build_index is true

* fix threads option

* fix argument type and add log

* fix output handling

* fix output arguments and update docker image

* fix log2stderr argument and remove discard reads option

* remove echo

* update bbsplit build index test

* add workdir

* apply suggestions from code review

* accept more than two reference files

* update changelog

* update changelog

* remove indentation

* minor fixes

* fix descriptions

---------

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
---
 CHANGELOG.md                                  |  14 +++++++++
 src/{ => bbmap}/bbmap_bbsplit/config.vsh.yaml |  13 +++++---
 src/{ => bbmap}/bbmap_bbsplit/help.txt        |   0
 src/{ => bbmap}/bbmap_bbsplit/script.sh       |   8 ++---
 src/{ => bbmap}/bbmap_bbsplit/test.sh         |   2 +-
 src/kallisto/kallisto_index/Kallisto          | Bin 2439 -> 0 bytes
 src/kallisto/kallisto_index/script.sh         |   2 +-
 src/kallisto/kallisto_quant/config.vsh.yaml   |   8 ++++-
 src/kallisto/kallisto_quant/script.sh         |   4 +--
 src/rsem/rsem_calculate_expression/script.sh  |  25 ++++++---------
 src/sortmerna/config.vsh.yaml                 |  29 ++++++++----------
 src/sortmerna/script.sh                       |  15 +++------
 src/sortmerna/test.sh                         |   2 +-
 .../umi_tools_extract/config.vsh.yaml         |   7 -----
 src/umi_tools/umi_tools_extract/script.sh     |   9 ------
 15 files changed, 65 insertions(+), 73 deletions(-)
 rename src/{ => bbmap}/bbmap_bbsplit/config.vsh.yaml (94%)
 rename src/{ => bbmap}/bbmap_bbsplit/help.txt (100%)
 rename src/{ => bbmap}/bbmap_bbsplit/script.sh (92%)
 rename src/{ => bbmap}/bbmap_bbsplit/test.sh (99%)
 delete mode 100644 src/kallisto/kallisto_index/Kallisto

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 35aa33b5..1e7481ac 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -30,6 +30,18 @@
 
 * `cutadapt`: Fix the the non-functional `action` parameter (PR #161).
 
+* `bbmap_bbsplit`: Change argument type of `build` to `file` and add output argument `index` (PR #162).
+
+* `kallisto/kallisto_index`: Fix command script to use `--threads` option (PR #162).
+
+* `kallisto/kallisto_quant`: Change type of argument `output_dir` to `file` and add output argument `log` (PR #162).
+
+* `rsem/rsem_calculate_expression`: Fix output handling (PR #162).
+
+* `sortmerna`: Change type of argument `aligned` to `file`; update docker image; accept more than two reference files (PR #162).
+
+* `umi_tools/umi_tools_extract`: Remove `umi_discard_reads` option and change `log2stderr` to input argument (PR #162).
+
 ## MINOR CHANGES
 
 * `agat_convert_bed2gff`: change type of argument `inflate_off` from `boolean_false` to `boolean_true` (PR #160).
@@ -38,6 +50,8 @@
 
 * Upgrade to Viash 0.9.0.
 
+* `bbmap_bbsplit`: Move to namespace `bbmap` (PR #162).
+
 # biobox 0.2.0
 
 ## BREAKING CHANGES
diff --git a/src/bbmap_bbsplit/config.vsh.yaml b/src/bbmap/bbmap_bbsplit/config.vsh.yaml
similarity index 94%
rename from src/bbmap_bbsplit/config.vsh.yaml
rename to src/bbmap/bbmap_bbsplit/config.vsh.yaml
index 61336b35..da2f643a 100644
--- a/src/bbmap_bbsplit/config.vsh.yaml
+++ b/src/bbmap/bbmap_bbsplit/config.vsh.yaml
@@ -30,12 +30,9 @@ argument_groups:
     type: boolean_true
     description: If set, only builds the index. Otherwise, mapping is performed.
   - name: "--build"
-    type: string
+    type: file
     description: |
-      Designate index to use. Corresponds to the number specified when building the index.
-      If building the index, this will be the build's id. If multiple references are indexed
-      in the same directory, each needs a unique build ID. Default: 1.
-    example: "1"
+      Index to be used for mapping.
   - name: "--qin"
     type: string
     description: |
@@ -95,6 +92,12 @@ argument_groups:
 
 - name: "Output"
   arguments:
+  - name: "--index"
+    type: file
+    description: |
+      Location to write the index.
+    direction: output
+    example: BBSplit_index
   - name: "--fastq_1"
     type: file
     description: |
diff --git a/src/bbmap_bbsplit/help.txt b/src/bbmap/bbmap_bbsplit/help.txt
similarity index 100%
rename from src/bbmap_bbsplit/help.txt
rename to src/bbmap/bbmap_bbsplit/help.txt
diff --git a/src/bbmap_bbsplit/script.sh b/src/bbmap/bbmap_bbsplit/script.sh
similarity index 92%
rename from src/bbmap_bbsplit/script.sh
rename to src/bbmap/bbmap_bbsplit/script.sh
index ac8542c9..098c7b55 100755
--- a/src/bbmap_bbsplit/script.sh
+++ b/src/bbmap/bbmap_bbsplit/script.sh
@@ -30,17 +30,17 @@ if [ ! -d "$par_build" ]; then
 fi
 
 if $par_only_build_index; then
-    if [ ${#refs[@]} -gt 1 ]; then
+    if [ "${#refs[@]}" -gt 1 ]; then
         bbsplit.sh \
             --ref_primary="$primary_ref" \
             "${refs[@]}" \
-            path=$par_build
+            path=$par_index
     else
         echo "ERROR: Please specify at least two reference fasta files."
     fi
 else
     IFS=";" read -ra input <<< "$par_input"
-    tmpdir=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXXXX")
+    tmpdir=$(mktemp -d "$meta_temp_dir/$meta_name-XXXXXXXX")
     index_files=''
     if [ -d "$par_build" ]; then
         index_files="path=$par_build"
@@ -51,7 +51,7 @@ else
     fi
 
     extra_args=""
-    if [ -n "$par_refstats" ]; then extra_args+=" --refstats $par_refstats"; fi
+    if [ -f "$par_refstats" ]; then extra_args+=" --refstats $par_refstats"; fi
     if [ -n "$par_ambiguous" ]; then extra_args+=" --ambiguous $par_ambiguous"; fi
     if [ -n "$par_ambiguous2" ]; then extra_args+=" --ambiguous2 $par_ambiguous2"; fi
     if [ -n "$par_minratio" ]; then extra_args+=" --minratio $par_minratio"; fi
diff --git a/src/bbmap_bbsplit/test.sh b/src/bbmap/bbmap_bbsplit/test.sh
similarity index 99%
rename from src/bbmap_bbsplit/test.sh
rename to src/bbmap/bbmap_bbsplit/test.sh
index 1ad7aac2..e0fe00d8 100644
--- a/src/bbmap_bbsplit/test.sh
+++ b/src/bbmap/bbmap_bbsplit/test.sh
@@ -55,7 +55,7 @@ echo ">>> Building BBSplit index"
 "${meta_executable}" \
   --ref "genome.fasta;human.fa;sarscov2.fa" \
   --only_build_index \
-  --build "BBSplit_index" 
+  --index "BBSplit_index" 
 
 echo ">>> Check whether output exists"
 [ ! -d "BBSplit_index" ] && echo "BBSplit index does not exist!" && exit 1
diff --git a/src/kallisto/kallisto_index/Kallisto b/src/kallisto/kallisto_index/Kallisto
deleted file mode 100644
index 3c7b5b2bff962965d99ca3f9a4a6b6af6da1f3f0..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

literal 2439
zcmeHJTTBx{6rFCj(iW7(R4PH@f{iwcTCEBOksY_VO)N&jMkQb@MkU0z5<nES+iDtq
ztcrfC@k>b%pZJItq>6y3i5QJ<W1@)>C02jb;7mKSNlWm{Po|xmnK|d)x%cj5cE^Hf
zo5Ms=gP>q-=Dx`Y&8Tam%b*(*sAYc)aw*sH`%Qmb#irI!wJd#g^%N{jJ942Y*B;7v
zR8myiWp1c$UHLL&=A`AC^o$*g(yPKxE9&k2TL(+#tPc&Aci!%O@N%%KUe}u^mgT!#
zi%Y`WO!oXjbAkQw#f0xZP+kZo&F8eXH{r?boldZcY~7r&DBNCpW#D;g<%>%V>88S*
zn(fP&_poKg&FeY4ul8ovgjcuEYhCLuEry9t<T_<r;)x-6Y*vFQbgQZF#X(b>sbgo;
zfcbNEMVB@;wf0@jF3-`vqIs3Vln>G5z~*-Q*|O7R4?+L_8<&^;{;|Iv=yw^0x?1Fq
zKKH{t-pk+KZ7B%++|bB;H`mVIyPQk@FmSG}@6usnuL2HvP5$1gzh1Rfd<xy4PRpnY
z$lwb>lA=oM8O*p9pU%UxMoXtZa4MiI;e6lN9)Zj9SsczoL4F-nxW<!$M?XHx=`41$
zNSX;JB>7gjS;lL#ITXLZTe7Tm{6y<^@7*BFcmyOu`fe-N>GMM_MNm8n3s15~-A-du
zs>S(M$mE>7gVQsBMnsW@M&}f>b(D#qkcNO}MG=t0Wgt?+7*11VW2Q|{Q<m5`y=8^|
z9+;~?zuCji5Og|U(_<Ox<qoL${owUfYj%<xv>bnyveDt-4O(8?_>}HQ+|nsHK}JD>
zEI}_&#snCR!wU`=ImWmY2!IuUp@YzBuGIbjA?Q;}NCbS6Ej{38{Q(Ed(9~7CHlh~@
z(zu?LfHTHk>o~Hk>ib8~11>`F@%qmr>8X$)4UE=ZAnP=qIJp|ns6M_j(fMdSqjeZP
zKmX@E#Gf*HzUVyzWeLinBtr;M7o$H(mPA>GqA1d9hM{`wuLpH{uWG16io*!3&Oq!~
zY>JwOK3Z%+$HT0KEnpYTsK*f41@5@T5O@KlCdndB3}+_eA<9%le@sX@NTS*o#e2qq
z{mZk6y~(I%5^AViqJ%~m(IWP&+TTT!n9y(~NA!$eA9;u!LWn;?@K-_t>bR9cmu<nq
hRn%Ezn!9QyjNx;|7(NH_$x(BVXNzpCjqI)ke*rf>5}*J8

diff --git a/src/kallisto/kallisto_index/script.sh b/src/kallisto/kallisto_index/script.sh
index 59a5d3de..d1ec98dd 100644
--- a/src/kallisto/kallisto_index/script.sh
+++ b/src/kallisto/kallisto_index/script.sh
@@ -28,7 +28,7 @@ kallisto index \
     ${par_min_size:+--min-size "${par_min_size}"} \
     ${par_ec_max_size:+--ec-max-size "${par_ec_max_size}"} \
     ${par_d_list:+--d-list "${par_d_list}"} \
-    ${meta_cpus:+--cpu "${meta_cpus}"} \
+    ${meta_cpus:+--threads "${meta_cpus}"} \
     ${par_tmp:+--tmp "${par_tmp}"} \
     "${par_input}"
 
diff --git a/src/kallisto/kallisto_quant/config.vsh.yaml b/src/kallisto/kallisto_quant/config.vsh.yaml
index e92ac6b3..c162faf2 100644
--- a/src/kallisto/kallisto_quant/config.vsh.yaml
+++ b/src/kallisto/kallisto_quant/config.vsh.yaml
@@ -32,9 +32,15 @@ argument_groups:
   arguments:
   - name: "--output_dir"
     alternatives: ["-o"]
-    type: string
+    type: file
     description: Directory to write output to.
     required: true
+    direction: output
+  - name: "--log"
+    type: file
+    description: File containing log information from running kallisto quant
+    direction: output
+
 
 - name: "Options"
   arguments:
diff --git a/src/kallisto/kallisto_quant/script.sh b/src/kallisto/kallisto_quant/script.sh
index a7105cd1..ad3b54e2 100644
--- a/src/kallisto/kallisto_quant/script.sh
+++ b/src/kallisto/kallisto_quant/script.sh
@@ -41,6 +41,4 @@ kallisto quant \
     ${par_sd:+--sd "${par_sd}"} \
     ${par_seed:+--seed "${par_seed}"} \
     -o $par_output_dir \
-    ${input[*]}
-
-
+    ${input[*]} 2> >(tee -a $par_log >&2)
diff --git a/src/rsem/rsem_calculate_expression/script.sh b/src/rsem/rsem_calculate_expression/script.sh
index e8c6ce5d..b30b2f37 100644
--- a/src/rsem/rsem_calculate_expression/script.sh
+++ b/src/rsem/rsem_calculate_expression/script.sh
@@ -5,13 +5,6 @@
 
 set -eo pipefail
 
-function clean_up {
-    rm -rf "$tmpdir"
-}
-trap clean_up EXIT
-
-tmpdir=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXXXX")
-
 if [ "$par_strandedness" == 'forward' ]; then
     strandedness='--strandedness forward'
 elif [ "$par_strandedness" == 'reverse' ]; then
@@ -22,14 +15,14 @@ fi
 
 IFS=";" read -ra input <<< $par_input
 
-INDEX=$(find -L $meta_resources_dir/$par_index -name "*.grp" | sed 's/\.grp$//')
+INDEX=$(find -L $par_index -name "*.grp" | sed 's/\.grp$//')
 
 unset_if_false=( par_paired par_quiet par_no_bam_output par_sampling_for_bam par_no_qualities 
                  par_alignments par_bowtie2 par_star par_hisat2_hca par_append_names 
                  par_single_cell_prior par_calc_pme par_calc_ci par_phred64_quals 
                  par_solexa_quals par_star_gzipped_read_file par_star_bzipped_read_file 
                  par_star_output_genome_bam par_estimate_rspd par_keep_intermediate_files 
-                 par_time par_run_pRSEM par_cap_stacked_chipseq_reads par_sort_bam_by_read_name )
+                 par_time par_run_pRSEM par_cap_stacked_chipseq_reads par_sort_bam_by_read_name par_sort_bam_by_coordinate )
 
 for par in ${unset_if_false[@]}; do
     test_val="${!par}"
@@ -60,12 +53,7 @@ rsem-calculate-expression \
     ${par_run_pRSEM:+--run-pRSEM} \
     ${par_cap_stacked_chipseq_reads:+--cap-stacked-chipseq-reads} \
     ${par_sort_bam_by_read_name:+--sort-bam-by-read-name} \
-    ${par_counts_gene:+--counts-gene "$par_counts_gene"} \
-    ${par_counts_transcripts:+--counts-transcripts "$par_counts_transcripts"} \
-    ${par_stat:+--stat "$par_stat"} \
-    ${par_bam_star:+--bam-star "$par_bam_star"} \
-    ${par_bam_genome:+--bam-genome "$par_bam_genome"} \
-    ${par_bam_transcript:+--bam-transcript "$par_bam_transcript"} \
+    ${par_sort_bam_by_coordinate:+--sort-bam-by-coordinate} \
     ${par_fai:+--fai "$par_fai"} \
     ${par_seed:+--seed "$par_seed"} \
     ${par_seed_length:+--seed-length "$par_seed_length"} \
@@ -101,3 +89,10 @@ rsem-calculate-expression \
     $INDEX \
     $par_id
    
+if [[ -f "${par_id}.genes.results" ]]; then mv "${par_id}.genes.results" "$par_counts_gene"; fi
+if [[ -f "${par_id}.isoforms.results" ]]; then mv "${par_id}.isoforms.results" "$par_counts_transcripts"; fi
+if [[ -d "${par_id}.stat" ]]; then mv "${par_id}.stat" "$par_stat"; fi
+if [[ -f "${par_id}.log" ]]; then mv "${par_id}.log" "$par_logs"; fi
+if [[ -f "${par_id}.STAR.genome.bam" ]]; then mv "${par_id}.STAR.genome.bam" "$par_bam_star"; fi
+if [[ -f "${par_id}.genome.bam" ]]; then mv "${par_id}.genome.bam" "$par_bam_genome"; fi
+if [[ -f "${par_id}.transcript.bam" ]]; then mv "${par_id}.transcript.bam" "$par_bam_transcript"; fi
diff --git a/src/sortmerna/config.vsh.yaml b/src/sortmerna/config.vsh.yaml
index 6477660f..a2d1c530 100644
--- a/src/sortmerna/config.vsh.yaml
+++ b/src/sortmerna/config.vsh.yaml
@@ -42,15 +42,17 @@ argument_groups:
     description: Sortmerna log file.
   - name: "--output"
     alternatives: ["--aligned"]
-    type: string
+    type: file
     description: |
       Directory and file prefix for aligned output. The appropriate extension: 
       (fasta|fastq|blast|sam|etc) is automatically added.
       If 'dir' is not specified, the output is created in the WORKDIR/out/.
       If 'pfx' is not specified, the prefix 'aligned' is used.
+    direction: output
   - name: "--other"
-    type: string
-    description: Create Non-aligned reads output file with this path/prefix. Must be used with fastx. 
+    type: file
+    description: Create non-aligned reads output file with this path/prefix. Must be used with fastx.
+    direction: output
 
 - name: "Options"
   arguments:
@@ -91,7 +93,7 @@ argument_groups:
     type: integer
     description: |
       search all alignments having the first INT longest LIS. LIS stands for Longest Increasing Subsequence, it is
-      computed using seeds’ positions to expand hits into longer matches prior to Smith-Waterman alignment. Default: '2'.
+      computed using seeds' positions to expand hits into longer matches prior to Smith-Waterman alignment. Default: '2'.
     example: 2
   - name: "--print_all_reads"
     type: boolean_true
@@ -152,7 +154,7 @@ argument_groups:
   - name: "--N"
     type: integer
     description: |
-      Smith-Waterman penalty for ambiguous letters (N’s) scored as --mismatch. Default: '-1'.\
+      Smith-Waterman penalty for ambiguous letters (N's) scored as --mismatch. Default: '-1'.
     example: -1
   - name: "--a"
     type: integer
@@ -207,7 +209,7 @@ argument_groups:
     - name: "--otu_map"
       type: boolean_true
       description: |
-        Output OTU map (input to QIIME’s make_otu_table.py).
+        Output OTU map (input to QIIME's make_otu_table.py).
 
 - name: "Advanced options"
   arguments:
@@ -226,7 +228,7 @@ argument_groups:
     description: |
       The number (or percentage if followed by %) of nucleotides to add to each edge of the alignment region on the
       reference sequence before performing Smith-Waterman alignment. Default: '4'.
-    example: 4
+    example: "4"
   - name: "--full_search"
     type: boolean_true
     description: |
@@ -263,8 +265,6 @@ argument_groups:
       Maximum number of positions to store for each unique L-mer. Set to 0 to store all positions. Default: '1000'
     example: 1000
   
-  
-
 resources:
   - type: bash_script
     path: script.sh
@@ -276,15 +276,12 @@ test_resources:
   
 engines:
 - type: docker
-  image: ubuntu:22.04
+  image: quay.io/biocontainers/sortmerna:4.3.6--h9ee0642_0
   setup: 
     - type: docker
       run: |
-        apt-get update && \
-        apt-get install -y --no-install-recommends gzip cmake g++ wget && \
-        apt-get clean && \
-        wget --no-check-certificate https://github.com/sortmerna/sortmerna/releases/download/v4.3.6/sortmerna-4.3.6-Linux.sh && \
-        bash sortmerna-4.3.6-Linux.sh --skip-license
+        echo SortMeRNA: `sortmerna --version | sed -n 's/.*version \([0-9]\+\.[0-9]\+\.[0-9]\+\).*/\1/p'`
+
 runners: 
 - type: executable
-- type: nextflow 
\ No newline at end of file
+- type: nextflow
diff --git a/src/sortmerna/script.sh b/src/sortmerna/script.sh
index 8dda3d60..59fc56f1 100755
--- a/src/sortmerna/script.sh
+++ b/src/sortmerna/script.sh
@@ -37,16 +37,11 @@ if [[ ! -z "$par_ribo_database_manifest" ]]; then
 
 elif [[ ! -z "$par_ref" ]]; then
     IFS=";" read -ra ref <<< "$par_ref"
-    # check if length is 2 and par_paired is set to true
-    if [[ "${#ref[@]}" -eq 2 && "$par_paired" == "true" ]]; then
-        refs="--ref ${ref[0]} --ref ${ref[1]}"
-    # check if length is 1 and par_paired is set to false
-    elif [[ "${#ref[@]}" -eq 1 && "$par_paired" == "false" ]]; then
-            refs="--ref $par_ref"      
-    else # if one reference provided but paired is set to true:
-        echo "Two reference fasta files are required for paired-end reads"
-            exit 1
-    fi
+    for i in "${ref[@]}"
+    do
+        refs+="--ref $i "
+    done
+
 else 
     echo "No reference fasta file(s) provided"
     exit 1
diff --git a/src/sortmerna/test.sh b/src/sortmerna/test.sh
index 390b9307..4c5b3e4e 100644
--- a/src/sortmerna/test.sh
+++ b/src/sortmerna/test.sh
@@ -31,7 +31,7 @@ rm -f rRNA_reads_fwd.fq.gz rRNA_reads_rev.fq.gz non_rRNA_reads_fwd.fq.gz non_rRN
 rm -rf kvdb/
 
 ################################################################################
-echo ">>> Testing for paired-end reads and --ref and --paired_out argumens"
+echo ">>> Testing for paired-end reads and --ref and --paired_out arguments"
 "$meta_executable" \
     --output "rRNA_reads" \
     --other "non_rRNA_reads" \
diff --git a/src/umi_tools/umi_tools_extract/config.vsh.yaml b/src/umi_tools/umi_tools_extract/config.vsh.yaml
index b93c8cb9..4b2b5370 100644
--- a/src/umi_tools/umi_tools_extract/config.vsh.yaml
+++ b/src/umi_tools/umi_tools_extract/config.vsh.yaml
@@ -128,12 +128,6 @@ argument_groups:
         Method to use to determine read groups by subsuming those with similar UMIs. All methods start by identifying
         the reads with the same mapping position, but treat similar yet nonidentical UMIs differently. Default: `directional`
       example: "directional"
-    - name: --umi_discard_read
-      type: integer
-      choices: [0, 1, 2]
-      description: |
-        After UMI barcode extraction discard either R1 or R2 by setting this parameter to 1 or 2, respectively. Default: `0`
-      example: 0
 
   - name: Common Options
     arguments:
@@ -144,7 +138,6 @@ argument_groups:
     - name: --log2stderr
       type: boolean_true
       description: Send logging information to stderr.
-      direction: output
     - name: --verbose
       type: integer
       description: Log level. The higher, the more output.
diff --git a/src/umi_tools/umi_tools_extract/script.sh b/src/umi_tools/umi_tools_extract/script.sh
index 4514860e..b9395733 100644
--- a/src/umi_tools/umi_tools_extract/script.sh
+++ b/src/umi_tools/umi_tools_extract/script.sh
@@ -82,12 +82,3 @@ umi_tools extract \
     ${par_log2stderr:+--log2stderr} \
     ${par_verbose:+--verbose "$par_verbose"} \
     ${par_error:+--error "$par_error"}
-
-
-if [ "$par_umi_discard_read" == 1 ]; then
-    # discard read 1
-    rm "$par_read1_out"
-elif [ "$par_umi_discard_read" == 2 ]; then
-    # discard read 2 (-f to bypass file existence check)
-    rm -f "$par_read2_out"
-fi
\ No newline at end of file