Add option to overwrite reads with nextseq low quality #82

Open
jessrosenfield opened this issue Jun 28, 2019 · 1 comment

jessrosenfield commented Jun 28, 2019

One consequence of the recommended op-orders CGQAW and GAWCQ is that garbage reads may be trimmed to length 0, or to a length shorter than the window size given to the --overwrite-low-quality option. Two failure modes come to mind for paired-end data sequenced on Illumina NextSeq and NovaSeq instruments, where read 2 is completely unusable:

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
################################

and

GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
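
To make the difference between the two cases concrete, here is a minimal sketch that decodes both quality strings. It assumes Phred+33 encoding (the issue does not state the encoding), and `phred33` and `mean_quality` are hypothetical helpers, not part of Atropos.

```python
from typing import List


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores (assumed encoding)."""
    return [ord(c) - 33 for c in qual]


def mean_quality(qual: str) -> float:
    """Mean quality of a read; an empty read is reported as 0.0."""
    scores = phred33(qual)
    return sum(scores) / len(scores) if scores else 0.0


# The two example reads from above: all-N with '#' qualities, all-G with 'F' qualities.
n_qual = "#" * 32
g_qual = "F" * 32

print(mean_quality(n_qual))  # 2.0
print(mean_quality(g_qual))  # 37.0
```

Under this assumption, the all-N read looks low quality to a score-based check, while the all-G read looks high quality even though it is equally unusable.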

The function that overwrites low-quality reads only acts when the read is longer than the window used to measure quality scores. One remedy is to change the --op-order flag so that the "overwrite poor quality reads" step runs first. However, this only overwrites reads in the former case (all Ns with low quality scores), not the latter (all Gs with high quality scores). Two ways to address this are:

  1. Add an option to overwrite reads when the two reads in a pair differ in length. This would let quality trimming run before read overwriting while gracefully handling the case where the low-quality read is shorter than the window required for quality measurement and the high-quality read is longer. (For the cases outlined above, checking for read pairs in which a single read was trimmed entirely is sufficient.)
  2. Allow the --overwrite-low-quality option to treat Gs as low quality. This may not work well, because the option looks at the beginning of the read, which can contain Gs with high Phred scores that come from the DNA template rather than from sequencing artifacts.
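
The two options could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Atropos implementation: `window_mean`, `read_to_overwrite`, and `g_fraction` are hypothetical names, qualities are assumed to be Phred+33, and a read shorter than the window is treated as unmeasurable.

```python
from typing import Optional


def window_mean(qual: str, window: int) -> Optional[float]:
    """Mean Phred+33 quality over the first `window` bases.

    Returns None when the read is too short to measure (assumed behavior).
    """
    if window <= 0 or len(qual) < window:
        return None
    return sum(ord(c) - 33 for c in qual[:window]) / window


def read_to_overwrite(qual1: str, qual2: str, lowq: float, highq: float,
                      window: int) -> Optional[int]:
    """Return 1 or 2 for the read that should be overwritten, else None."""
    m1 = window_mean(qual1, window)
    m2 = window_mean(qual2, window)
    if m1 is None and m2 is None:
        return None
    # Option 1 (proposed): one read is too short to measure, for example
    # trimmed to length 0, while its mate is long enough and high quality.
    if m1 is None:
        return 1 if m2 >= highq else None
    if m2 is None:
        return 2 if m1 >= highq else None
    # Both reads measurable: compare window means against the thresholds.
    if m1 < lowq and m2 >= highq:
        return 1
    if m2 < lowq and m1 >= highq:
        return 2
    return None


def g_fraction(seq: str, window: int) -> float:
    """Option 2 (proposed): fraction of G bases in the first `window` bases."""
    prefix = seq[:window]
    return prefix.count("G") / len(prefix) if prefix else 0.0


print(read_to_overwrite("I" * 40, "", 20, 30, 10))        # 2
print(read_to_overwrite("I" * 40, "F" * 40, 20, 30, 10))  # None
print(g_fraction("G" * 32, 10))                           # 1.0
print(g_fraction("GGGGACGTAC", 10))                       # 0.5
```

The second call shows why a quality-only check misses the all-G read before trimming, and the last call hints at the caveat in option 2: a genuine template sequence can also be G-rich at its start, so any G-fraction threshold would need care.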

jdidion (Owner) commented Jun 29, 2019

Thanks for this report. At first glance, I like option 1, but I'll need to consider it a bit.

In the meantime, a workaround is to write the orphaned reads to a separate file and align them separately.
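
As a rough illustration of that workaround outside of Atropos itself, the sketch below splits already-trimmed read pairs into intact pairs and orphans. The `Read` tuple layout, `split_orphans`, `to_fastq`, and the `min_length` cutoff are all assumptions made for this example; they do not correspond to Atropos options.

```python
from typing import Iterable, List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality); hypothetical layout


def split_orphans(pairs: Iterable[Tuple[Read, Read]], min_length: int = 1
                  ) -> Tuple[List[Tuple[Read, Read]], List[Read]]:
    """Separate pairs where both mates survive trimming from orphaned mates."""
    paired: List[Tuple[Read, Read]] = []
    orphans: List[Read] = []
    for r1, r2 in pairs:
        ok1 = len(r1[1]) >= min_length
        ok2 = len(r2[1]) >= min_length
        if ok1 and ok2:
            paired.append((r1, r2))
        elif ok1:
            orphans.append(r1)  # mate 2 was trimmed away
        elif ok2:
            orphans.append(r2)  # mate 1 was trimmed away
        # If both mates are too short, the pair is dropped.
    return paired, orphans


def to_fastq(read: Read) -> str:
    """Format one read as a four-line FASTQ record."""
    name, seq, qual = read
    return f"@{name}\n{seq}\n+\n{qual}\n"


pairs = [
    (("p1/1", "ACGT", "IIII"), ("p1/2", "TTGA", "IIII")),
    (("p2/1", "ACGT", "IIII"), ("p2/2", "", "")),  # read 2 fully trimmed
]
paired, orphans = split_orphans(pairs)
print(len(paired), len(orphans))  # 1 1
```

The orphan list can then be written with `to_fastq` to its own file and aligned as single-end data.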

jdidion added this to the 1.2 milestone on Jun 29, 2019
jdidion modified the milestones: 1.2, 2.0, 2.1 on Dec 25, 2019