Improved logging messages.
Improved progress reporting.
- Added option -r / --target-seq-ids. Use it to avoid wasting time annotating every sequence in a target fasta file when some of them are unwanted, or to annotate sequences in a multi-fasta file separately (e.g. with different thresholds) without splitting the fasta file.
- Now, con-hi.py runs samtools depth for each target sequence separately, thus avoiding the creation of large TSV files of coverage values.
- Now, con-hi.py runs samtools depth with option -a instead of -aa. Indeed, -aa means “Output absolutely all positions, including unused ref seqs”, which isn’t helpful.
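The per-sequence invocation described above could look roughly like the sketch below. It is not taken from con-hi's source: the function names `build_depth_cmd` and `depth_for_sequence` are hypothetical, and the BAM path is whatever alignment file you pass in. Only the samtools depth options -a and -r are used.

```python
import subprocess


def build_depth_cmd(bam_path, seq_id):
    """Build a `samtools depth` command restricted to one reference sequence."""
    # -a: output all positions, including those with zero depth
    # -r: report depth only in the given region (here, one whole sequence)
    return ["samtools", "depth", "-a", "-r", seq_id, bam_path]


def depth_for_sequence(bam_path, seq_id):
    """Run samtools depth for one sequence and return its per-base coverage values."""
    result = subprocess.run(
        build_depth_cmd(bam_path, seq_id),
        capture_output=True, text=True, check=True,
    )
    # Each output line is: <sequence id> TAB <1-based position> TAB <depth>
    return [int(line.split("\t")[2]) for line in result.stdout.splitlines() if line]
```

Processing one sequence at a time keeps only that sequence's coverage values in memory, so no large intermediate table is needed.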
Rename “consensus-highlighter” to “con-hi” for the sake of simplicity.
Now, when outputting sequence records, the program prints any generic Biopython warning that arises in a human-friendly way, without extra technical lines.
In other words, instead of this:
BiopythonWarning: Increasing length of locus line to allow long name. This will result in fields that are not in usual positions.
warnings.warn(
the program will print this warning message:
! Warning: Increasing length of locus line to allow long name. This will result in fields that are not in usual positions.
More importantly, merely catching a Biopython warning no longer results in empty output files.
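One way to get this behaviour is to record warnings while the records are written and reprint them afterwards, as in the minimal sketch below. The names `write_records_quietly` and `DemoLibraryWarning` are hypothetical; the latter is only a stand-in for a library warning class such as BiopythonWarning, so the example runs without Biopython installed.

```python
import sys
import warnings


class DemoLibraryWarning(Warning):
    """Hypothetical stand-in for a library warning such as BiopythonWarning."""


def write_records_quietly(write_func, out_stream=sys.stdout):
    """Call write_func() and reprint any warnings it emits as one-line messages."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeated ones
        result = write_func()            # writing is not interrupted by a warning
    for caught_warning in caught:
        out_stream.write("! Warning: {}\n".format(caught_warning.message))
    return result
```

Because the warning is recorded rather than raised as an exception, the writing function runs to completion and the output is not left empty.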
- Fix a bug that would cause the program to terminate if the -C threshold is enabled and a high-coverage region starts at position 0 or ends at position (LENGTH-1) of the reference sequence.
- Now, the program prints its own warning if an output sequence name is too long for the "pretty" GenBank representation, according to the GenBank standard 229.0 (Dec 15, 2018). Previous versions of the program emitted the Biopython warning, which is not very informative: "Increasing length of locus line to allow long name. This will result in fields that are not in usual positions.".
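The boundary case from the first fix above (a high-coverage region touching position 0 or position LENGTH-1) is easy to get wrong in a region scan. The following sketch shows one way to handle both ends; it is an illustration with a hypothetical function name, not con-hi's actual algorithm.

```python
def find_high_coverage_regions(coverages, threshold):
    """Return 0-based, end-inclusive (start, end) regions where coverage > threshold."""
    regions = []
    start = None
    for pos, cov in enumerate(coverages):
        if cov > threshold and start is None:
            start = pos  # a region opens here, possibly at position 0
        elif cov <= threshold and start is not None:
            regions.append((start, pos - 1))
            start = None
    if start is not None:
        # The region was still open at the end: it runs to position LENGTH-1
        regions.append((start, len(coverages) - 1))
    return regions
```

The final `if` is the part that is typically forgotten: without it, a region reaching the last base is silently dropped or mishandled.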
Fix a bug preventing con-hi from parsing samtools version output correctly if samtools is compiled with the flag -ffile-prefix-map. In that case, the samtools version output contains some non-UTF-8 characters.
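A tolerant way to read such output is to decode the raw bytes with a replacement policy before searching for the version, as sketched below. The function name `extract_version` is hypothetical, and the sketch assumes the output contains a line of the form "samtools X.Y" or "samtools X.Y.Z".

```python
import re


def extract_version(raw_output):
    """Pull an X.Y or X.Y.Z version string out of raw bytes printed by samtools."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    text = raw_output.decode("utf-8", errors="replace")
    match = re.search(r"samtools\s+(\d+\.\d+(?:\.\d+)?)", text)
    return match.group(1) if match else None
```

With this approach, stray bytes elsewhere in the output (for example in the compiler flags) cannot cause a UnicodeDecodeError.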
- Add option -l/min-feature-len. It sets the minimum length of an output feature.
- Add option -C/upper-coverage-coefficients. It sets threshold(s) for annotating high-coverage regions. For example, to annotate regions with coverage > 1.7×M, where M is the median coverage, specify -C 1.7. You can specify multiple coefficients: -C 1.5,1.7, in the same way as for option -c.
- Change long option name: -c/coverage-thresholds -> -c/lower-coverage-thresholds.
- Options -c and -C can now be disabled: specify -c off or -C off, and low-coverage or high-coverage regions, respectively, won't be annotated.
- Add option -k/--keep-temp-cov-file. If it is specified, the temporary file coverages.tsv won't be deleted when the program finishes.
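To make the -C semantics above concrete, here is a small sketch of how a value such as 1.5,1.7 or off could be turned into absolute thresholds, and how a minimum feature length could be applied. All function names are hypothetical and the code is an illustration, not con-hi's implementation.

```python
import statistics


def parse_coefficients(option_value):
    """Parse a value like "1.5,1.7" into a sorted list of floats; "off" gives []."""
    if option_value.strip().lower() == "off":
        return []  # the option is disabled: nothing will be annotated
    return sorted(float(item) for item in option_value.split(","))


def upper_thresholds(coverages, coefficients):
    """Scale the median coverage M by each coefficient, giving coefficient x M."""
    median = statistics.median(coverages)
    return [coef * median for coef in coefficients]


def drop_short_regions(regions, min_len):
    """Keep only end-inclusive (start, end) regions at least min_len bases long."""
    return [(start, end) for start, end in regions if end - start + 1 >= min_len]
```

For instance, with a median coverage of 20 and a coefficient of 1.5, bases with coverage above 30 would count as high-coverage.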
- Add recommendation "samtools 1.13 or later is recommended". This is the version in which samtools depth was completely rewritten. Since 1.13, samtools depth calculates coverage more accurately.
- Now con-hi does not crash if the samtools version has the following format: 1.15.1 (three dot-separated numbers). Previously, only two dot-separated numbers were permitted.
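Both points above (accepting two or three version components, and recommending 1.13 or later) can be illustrated with a short sketch. The names `parse_version` and `is_recommended` are hypothetical and do not come from con-hi's source.

```python
import re


def parse_version(version_str):
    """Turn "1.13" or "1.15.1" into a tuple of integers such as (1, 15, 1)."""
    cleaned = version_str.strip()
    if re.fullmatch(r"\d+\.\d+(\.\d+)?", cleaned) is None:
        raise ValueError("unexpected version format: {!r}".format(version_str))
    return tuple(int(part) for part in cleaned.split("."))


def is_recommended(version_str, minimum=(1, 13)):
    """Check whether the version is at least the recommended minimum."""
    # Tuples compare component by component, so (1, 9) < (1, 13),
    # whereas the strings "1.9" and "1.13" would compare the other way round.
    return parse_version(version_str) >= minimum
```

Comparing integer tuples rather than strings is the design choice that matters here, since string comparison would rank 1.9 above 1.13.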
Changes:
- Now con-hi removes its temporary file coverages.tsv, where the coverage value of each base is stored.
- A bug was fixed that would cause the program to write an error message about an unmet dependency to stdout instead of stderr.
Now con-hi adds a comment to output GenBank files. Here is an example of such a comment:
COMMENT ##Coverage-Data-START##
Minimum Coverage :: 0
Average Coverage :: 147.54
Median Coverage :: 70
Maximum Coverage :: 2009
Zero-coverage bases :: 278 bp
##Coverage-Data-END##
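The statistics in such a comment can be derived directly from the per-base coverage values. The sketch below shows one possible way to compute and lay them out; the function names are hypothetical, and the exact number formatting used by con-hi may differ.

```python
import statistics


def coverage_summary(coverages):
    """Compute the summary statistics shown in the coverage comment."""
    return {
        "Minimum Coverage": min(coverages),
        "Average Coverage": round(statistics.mean(coverages), 2),
        "Median Coverage": statistics.median(coverages),
        "Maximum Coverage": max(coverages),
        "Zero-coverage bases": "{} bp".format(sum(1 for c in coverages if c == 0)),
    }


def format_comment(summary):
    """Lay the statistics out between the START and END marker lines."""
    lines = ["##Coverage-Data-START##"]
    lines += ["{} :: {}".format(key, value) for key, value in summary.items()]
    lines.append("##Coverage-Data-END##")
    return "\n".join(lines)
```

Note that the average and the median can differ a lot, as in the example above (147.54 versus 70), when a few regions have very high coverage.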
Now con-hi is compatible with samtools 1.13+.
Now con-hi calculates and prints average coverage.
Removed options -o/--outdir and --prefix.
Added option -o/--outfile. Now con-hi.py writes all GenBank output records to this single output GenBank file.
con-hi no longer piles up coverage features with identical locations. This means that you will not see both "zero coverage" and "coverage < 10" features starting and ending at the same positions.
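A simple way to avoid such stacked features is to keep only one feature per location. The sketch below keeps the first feature seen for each (start, end) pair; which of the duplicates con-hi actually keeps is not stated here, so that choice, like the function name, is an assumption.

```python
def drop_duplicate_locations(features):
    """Keep the first (start, end, label) feature for each location, in order."""
    seen = set()
    unique = []
    for start, end, label in features:
        if (start, end) in seen:
            continue  # a feature with this exact location was already kept
        seen.add((start, end))
        unique.append((start, end, label))
    return unique
```

If the features are generated from the strictest threshold to the loosest, this keeps the most specific label, e.g. "zero coverage" rather than "coverage < 10".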
Added warning messages for the following cases:
- If the program cannot find the ids of sequence(s) from the -f fasta file in the coverage file.
- If the length of a sequence in the -f fasta file is not equal to the number of coverage positions reported by samtools depth and stored in the coverage file.
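The two checks listed above can be sketched as a single pass over the fasta sequences. The function name and the message wording are hypothetical; both arguments are assumed to be dictionaries keyed by sequence id.

```python
def check_coverage_consistency(seq_lengths, cov_position_counts):
    """Return warning messages; both arguments map a sequence id to an integer."""
    messages = []
    for seq_id, length in seq_lengths.items():
        if seq_id not in cov_position_counts:
            # Case 1: the fasta sequence is absent from the coverage file
            messages.append(
                "sequence `{}` is not found in the coverage file".format(seq_id)
            )
        elif cov_position_counts[seq_id] != length:
            # Case 2: the number of coverage positions differs from the length
            messages.append(
                "sequence `{}` is {} bp long, but {} coverage positions are reported"
                .format(seq_id, length, cov_position_counts[seq_id])
            )
    return messages
```

Returning the messages instead of printing them leaves it to the caller to decide where the warnings go.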
Fixed a bug that would cause the program to stumble on a nonexistent directory in PATH while checking dependencies.
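A dependency check that tolerates stale PATH entries might look like the sketch below. The function name `find_in_path` is hypothetical; in practice the standard library's `shutil.which` performs a similar search.

```python
import os


def find_in_path(program, path_value=None):
    """Return the full path of an executable found in PATH, or None if absent."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        # os.path.isdir is False for nonexistent entries, so they are skipped
        if not directory or not os.path.isdir(directory):
            continue
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Checking that each directory exists before looking inside it is what prevents the failure described above.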
And then...
Fixed a bug that caused the program to stumble on lowercase input sequences.
Initial release. Version 1.0.a