Skip to content

Commit

Permalink
Fix wf failing when not a single input read re-aligns against draft c…
Browse files Browse the repository at this point in the history
…onsensus [CW-3078]
  • Loading branch information
julibeg committed Nov 17, 2023
1 parent 7c151b5 commit 2ce4cbe
Show file tree
Hide file tree
Showing 3 changed files with 24 additions and 6 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,10 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [v0.6.2]
### Fixed
- The de-novo QC stage failing when not a single input read re-aligns against the draft consensus.

## [v0.6.1]
### Changed
- More informative warnings for failed samples in the report intro section.
Expand Down
24 changes: 19 additions & 5 deletions bin/workflow_glue/trim_and_qc.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,10 +20,24 @@ def main(args):
args.depth, sep="\t", header=None, names=["ref", "start", "end", "depth"]
)

# calculate the mean depths of the candidate consensus contigs
mean_depths = depths.groupby("ref").apply(
lambda df: df.eval("(end - start) * depth").sum() / df["end"].iloc[-1]
)
with pysam.FastxFile(args.consensus) as f:
fastq = {entry.name: entry for entry in f}

# make sure we got the same seq IDs in flagstat and the FASTQ file
seq_ids = [x for x in flagstat.index if x != "*"]
if set(fastq.keys()) != set(seq_ids):
ValueError("Sequence IDs in FASTQ and in flagstat file don't agree.")

# if all reads in the BAM are unmapped, `mosdepth` annoyingly outputs an empty
# per-base depth file; we need to account for that
if depths.empty:
mean_depths = pd.Series(0, index=seq_ids)
else:
# we got a valid depth file; calculate the mean depths of the candidate
# consensus contigs
mean_depths = depths.groupby("ref").apply(
lambda df: df.eval("(end - start) * depth").sum() / df["end"].iloc[-1]
)

# perform QC checks on each contig
qc_stats = pd.DataFrame(
Expand All @@ -42,7 +56,7 @@ def main(args):
qc_stats.index.name = "contig"
for contig, mean_depth in mean_depths.items():
status = "passed"
contig_len = depths.query("ref == @contig")["end"].iloc[-1]
contig_len = len(fastq[contig].sequence)
fail_reasons = []
if mean_depth < args.minimum_depth:
status = "failed"
Expand Down
2 changes: 1 addition & 1 deletion nextflow.config
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ manifest {
description = 'Amplicon workflow'
mainScript = 'main.nf'
nextflowVersion = '>=23.04.2'
version = 'v0.6.1'
version = 'v0.6.2'
}

epi2melabs {
Expand Down

0 comments on commit 2ce4cbe

Please sign in to comment.