HmnTrimmer is a trimmer for reads produced by next-generation sequencing (NGS), designed for common applications such as genomics, transcriptomics, targeted metagenomics and shotgun metagenomics.
conda install -c bioconda hmntrimmer
# From Docker Hub
docker pull hmntrimmer:<VERSION>
# From GitHub
docker pull ghcr.io/guillaume-gricourt/hmntrimmer:<VERSION>
Prerequisites
To use the software on Debian-based systems:
yasm
build-essential
zlib1g-dev
The GCC version used for compilation must be greater than 4 and lower than 9.
To test the software:
python3
To create the statistics report:
With conda:
python3 django matplotlib seaborn packaging
With Ubuntu/Debian using pip:
python3-pip
django matplotlib seaborn packaging
Compile
First, build igzip:
hmndir=./HmnTrimmer
cd ./lib/igzip-042/igzip && make slib0c
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD
Then build HmnTrimmer and run the tests:
make
make test
The software is invoked as:
HmnTrimmer [OPTIONS] [TRIMMERS]
Minimal example:
./HmnTrimmer \
--input-fastq-forward INPUT_FILE \
--output-fastq-forward OUTPUT_FILE \
--length-min 50
Input and output files are specified with the following options:
--input-fastq-forward INPUT_FILE
--input-fastq-reverse INPUT_FILE
--input-fastq-interleaved INPUT_FILE
--output-fastq-forward OUTPUT_FILE
--output-fastq-reverse OUTPUT_FILE
--output-fastq-interleaved OUTPUT_FILE
Discarded sequences can optionally be written to a file with the following option. For paired-end sequencing, the file produced is interleaved.
--output-fastq-discard OUTPUT_FILE
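For readers unfamiliar with the interleaved layout mentioned above, here is a minimal Python sketch of how paired FASTQ records alternate in a single stream (R1 of pair 1, R2 of pair 1, R1 of pair 2, and so on). The helper names `read_fastq` and `interleave` are hypothetical and are not part of HmnTrimmer. The sketch assumes plain 4-line FASTQ records.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, Tuple

# (header, sequence, separator, qualities); a hypothetical representation
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) != 4:
            raise ValueError("truncated FASTQ record")
        yield (chunk[0], chunk[1], chunk[2], chunk[3])


def interleave(
    forward: Iterable[FastqRecord], reverse: Iterable[FastqRecord]
) -> Iterator[FastqRecord]:
    """Alternate forward and reverse records: R1_1, R2_1, R1_2, R2_2, ..."""
    for r1, r2 in zip_longest(forward, reverse):
        if r1 is None or r2 is None:
            raise ValueError("forward and reverse inputs differ in read count")
        yield r1
        yield r2
```

Passing two open file handles through `read_fastq` and then `interleave` yields records in the order an interleaved file stores them.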
Trimmers fall into three categories: quality, length and information.
Information-based trimmers are applied first, then quality-based trimmers, and finally length-based trimmers.
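The fixed ordering matters because a read that passes a length check before quality trimming may be too short afterwards. The following Python sketch illustrates the idea only. The `Read` representation, the `apply_trimmers` function and the toy trimmers in the usage note are hypothetical and do not reflect HmnTrimmer's internal C++ code.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (sequence, per-base Phred qualities); a hypothetical representation
Read = Tuple[str, List[int]]
# A trimmer returns the (possibly shortened) read, or None to discard it
Trimmer = Callable[[Read], Optional[Read]]

# Category order stated in the text: information, then quality, then length
CATEGORY_ORDER = {"information": 0, "quality": 1, "length": 2}


def apply_trimmers(
    read: Read, trimmers: Sequence[Tuple[str, Trimmer]]
) -> Optional[Read]:
    """Apply (category, trimmer) pairs in category order, whatever order they were given in."""
    ordered = sorted(trimmers, key=lambda item: CATEGORY_ORDER[item[0]])
    current: Optional[Read] = read
    for _category, trim in ordered:
        current = trim(current)
        if current is None:  # read discarded, stop early
            return None
    return current
```

For example, with a toy quality trimmer that removes two bases and a length trimmer requiring at least 5 bases, a 6-base read is discarded: the length check runs last and sees only 4 bases, even if the length trimmer was listed first.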
Quality Tail
Trims based on a run of consecutive bases at the end of the read whose quality is below a cutoff.
Up to three parameters: the quality cutoff; optionally, the number of bases below that quality (default: 1 base); and, optionally, the percentage of the original length required to keep a read that was truncated (default: truncated reads are not removed).
Format: <int>:<int>:<int>
--quality-tail STRING
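The description above leaves some details open, so the following Python sketch shows only one plausible reading of it, not HmnTrimmer's actual implementation. It assumes that the trailing run of low-quality bases is removed when it contains at least the requested number of bases, that the percentage is measured against the original read length, and that the optional fields may be omitted from the format string. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_quality_tail(spec: str) -> Tuple[int, int, Optional[int]]:
    """Parse '<int>[:<int>[:<int>]]' into (quality, min_bases, min_percent)."""
    fields = [int(field) for field in spec.split(":")]
    if not 1 <= len(fields) <= 3:
        raise ValueError("expected one to three integers separated by ':'")
    quality = fields[0]
    min_bases = fields[1] if len(fields) > 1 else 1  # documented default: 1 base
    min_percent = fields[2] if len(fields) > 2 else None  # default: never remove
    return quality, min_bases, min_percent


def quality_tail(
    seq: str,
    quals: List[int],
    quality: int,
    min_bases: int = 1,
    min_percent: Optional[int] = None,
) -> Optional[Tuple[str, List[int]]]:
    """Trim the trailing run of bases below `quality`; return None to discard."""
    # Count consecutive low-quality bases, starting from the 3' end
    run = 0
    for q in reversed(quals):
        if q >= quality:
            break
        run += 1
    if run < min_bases:  # run too short to trigger trimming (assumption)
        return seq, quals
    keep = len(seq) - run
    # Discard when the remainder is below min_percent of the original length
    if min_percent is not None and keep * 100 < min_percent * len(seq):
        return None
    return seq[:keep], quals[:keep]
```

Under these assumptions, `quality_tail("ACGTAC", [30, 30, 30, 30, 10, 5], 20)` keeps the first four bases, while adding `min_percent=80` discards the read because only 4 of 6 bases remain.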
Quality Sliding Window
Trims based on a sliding window of bases, moved from the end of the read, whose mean quality is below a minimum.
Two parameters: the minimum mean quality and the window size.
Format: <int>:<int>
--quality-sliding-window STRING
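As an illustration, a sliding-window trim from the 3' end can be sketched in Python as below. This is an assumption about the general technique rather than HmnTrimmer's exact behavior. In particular, the one-base step, the handling of windows shorter than the requested size and the function name are all hypothetical.

```python
from typing import List, Tuple


def quality_sliding_window(
    seq: str, quals: List[int], min_mean: int, window: int
) -> Tuple[str, List[int]]:
    """Drop 3' bases until the last `window` bases have a mean quality >= min_mean."""
    end = len(quals)
    while end > 0:
        start = max(0, end - window)  # window may be shorter near the 5' end
        current = quals[start:end]
        if sum(current) / len(current) >= min_mean:
            break  # first acceptable window: keep everything up to `end`
        end -= 1  # slide one base towards the 5' end
    return seq[:end], quals[:end]
```

With qualities `[30, 30, 30, 30, 10, 10]`, a minimum mean of 25 and a window of 2, the sketch keeps the first four bases. A read whose windows never reach the minimum mean is reduced to an empty sequence, which a length trimmer would then remove.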
Length Min
Minimum length required to keep a read.
--length-min INTEGER
Information Dust
Based on the DUST score, a measure of low-complexity sequence.
--information-dust INTEGER
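For context, the classic DUST score counts how often each nucleotide triplet occurs in a sequence: repetitive, low-complexity sequences score high. The Python sketch below computes the unscaled score over a whole read. How HmnTrimmer scales the score, and how it compares it with the integer passed to the option, is not specified here, so treat this as a sketch of the general measure only. The function name is hypothetical.

```python
from collections import Counter


def dust_score(seq: str) -> float:
    """Unscaled DUST score: sum of c*(c-1)/2 over triplet counts c, divided by (l - 1),
    where l is the number of triplets in the sequence."""
    seq = seq.upper()
    n_triplets = len(seq) - 2
    if n_triplets < 2:  # too short for a meaningful score
        return 0.0
    counts = Counter(seq[i:i + 3] for i in range(n_triplets))
    total = sum(c * (c - 1) / 2 for c in counts.values())
    return total / (n_triplets - 1)
```

A homopolymer such as `"AAAAAAAAAA"` scores 4.0 with this formula, whereas a sequence in which every triplet is distinct scores 0.0.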
Report
Optionally save a report containing various statistics, in JSON format.
--output-report OUTPUT_FILE
Threads
Specify the number of threads to use.
--threads 1..8
Reads batch
Reads are processed in batches. This option sets the batch size.
--reads-batch 100..50000000
Verbose
Log level to use.
--verbose 1..6 (error..trace)
To create the HTML report:
# Clone the repository
git clone git@github.com:guillaume-gricourt/HmnTrimmer.git
# Run
HmnTrimmerReport \
--template-file ./HmnTrimmer/script/template.html \
--input-file JSON_FILE \
--output-file HTML_FILE
Trimming
docker run \
-it \
--rm \
-v $PWD:$PWD \
hmntrimmer:<VERSION> \
--input-fastq-forward $PWD/test/GoldInput/BIG.R1.fastq \
--output-fastq-forward $PWD/test/DockerTest.R1.fastq.gz \
--output-report $PWD/test/DockerTest.json \
--length-min 50
Statistics report with Docker
docker run \
-it \
--rm \
-v $PWD:$PWD \
--entrypoint /opt/HmnTrimmer/script/RenderingReportFile.py \
hmntrimmer:<VERSION> \
--input-file $PWD/test/DockerTest.json \
--output-file $PWD/test/DockerTest.html \
--template-file /opt/HmnTrimmer/script/template.html
- SeqAn - Essential library for working with HTS files and algorithms
- rapidjson - Read/write JSON files efficiently
- spdlog - Nice log manager
- igzip - Very fast deflate algorithm
SemVer is used for versioning.
- Guillaume Gricourt