Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aDNA-trim produces invalid merged FASTQ files in some cases #3

Open
shyama-mama opened this issue Mar 28, 2024 · 3 comments
Open

aDNA-trim produces invalid merged FASTQ files in some cases #3

shyama-mama opened this issue Mar 28, 2024 · 3 comments

Comments

@shyama-mama
Copy link

Hi Guys,

I am using aDNA trim as follows:
seqtk mergepe R1.fastq R2.fastq | adna-trim -p aDNA_trim_pe - > aDNA_trim_merged.fastq
The data is a NovaSeq sample pre-processed with FasP to trim polyG tails.

This is the original read
R1

@A00488:28:HJ3THDSXX:2:1146:24189:25441 1:N:0:ACCAACT
TCCAGAGTTATTGCTGTGATACAGGCAGAGATGCTATAACTGAGTTTGTATTCTAGGGGGGGGGGGCCGATGTTAACGGGGAAAAGATAAAAATTTAACTTAATTGATACAGTGATATTAAATACGGACGAGCACACGACTAAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFFFFFFF:,,,,:F,F,:,FF,:FF,,:,F,,FF:,,F,,FF,,:F,:,,,,F,,::,,,F:,,:FF,F,,F,F,F,F,F,::F,

R2

@A00488:28:HJ3THDSXX:2:1146:24189:25441 2:N:0:ACCAACT
CCTGTATCACTAAAGTTACATTATTATCTTTTCCCTGTTAACGTCGGGGGGGGCGGGGGGGGTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
FFFFFFFFFFFFFFFF,,FFFFFFF,FFFFFFFFF,F,,,,:,,,,F,F,:,,,,,,::FFF,::,,,:FF::,:F:FFF:,:FF:FF,FFF:,F:F,F,,::,,,FFF,FF,::,:,:,F:,FFFF,FF,,:F:FFF:F:FF:

Using aDNA trim on the fastq directly does not merge the reads. So I ran FastP to trim the PolyG tail. This is the modified fastq
R1

@A00488:28:HJ3THDSXX:2:1146:24189:25441 1:N:0:ACCAACT
TCCAGAGTTATTGCTGTGATACAGGCAGAGATGCTATAACTGAGTTTGTATTCTAGGGGGGGGGGGCCGATGTTAACGGGGAAAAGATAAAAATTTAACTTAATTGATACAGTGATATTAAATACGGACGAGCACACGACTAAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFFFFFFF:,,,,:F,F,:,FF,:FF,,:,F,,FF:,,F,,FF,,:F,:,,,,F,,::,,,F:,,:FF,F,,F,F,F,F,F,::F,

R2

@A00488:28:HJ3THDSXX:2:1146:24189:25441 2:N:0:ACCAACT
CCTGTATCACTAAAGTTACATTATTATCTTTTCCCTGTTAAC
+
FFFFFFFFFFFFFFFF,,FFFFFFF,FFFFFFFFF,F,,,,:

Using aDNA-trim produces the following invalid read.

@A00488:28:HJ3THDSXX:2:1146:24189:25441_22:*
,F,::F,
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFFFFFFF:,,,,:F,F,F!FFFFFFFFF,FF;FFF;,FFFFF;F-FFFFFFFF;FFFFFFFFFFF:FFFFFFFFFFFF:,,,,:F,F,:,FF,:FF,,:,F,,FF:,,F,,FF,,:F,:,,,,F,,::,,,F:,,:FF,F,,F,F,F,F,F,::F,

The read should be able to be merged to produce a valid read. See FastP read below:

@A00488:28:HJ3THDSXX:2:1146:24189:25441 1:N:0:ACCAACT merged_113_0
TCCAGAGTTATTGCTGTGATACAGGCAGAGATGCTATAACTGAGTTTGTATTCTAGGGGGGGGGGGCCGATGTTAACGGGGAAAAGATAATAATGTAACTTTATTGATACAGG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFFFFFFF:,,,,:F,F,:,FF,:FF,,:,F,FFF:F,F,,FFF,:F,:,,,,FF
@MatthewMah
Copy link

MatthewMah commented Mar 28, 2024

This repository is unsupported. The tool currently performing this merging process for the Reich lab is in https://github.com/DReichLab/ADNA-Tools

It's currently more cumbersome to perform the merge alone because demultiplexing is built into the merge step, but you can run it similar to this:

java -jar adnatools-1.11.6-SNAPSHOT.jar IndexAndBarcodeScreener --i5-indices indices --i7-indices indices --fixed-i5 AAAAAAA --fixed-i7 AAAAAAA -b barcodes -n 1 r1.gz r2.gz output_filename_stem

where indices is a file containing:

AAAAAAA AAAAAAA

and barcodes is an empty file.

Additionally, you may have to tweak the merge parameters because your sequence has 6 mismatches in the overlap region.

@shyama-mama
Copy link
Author

Thanks @MatthewMah! Can you please advice me on how the mismatch-penalty-[max, high and low] are meant to be used. What are some parameters that give sensible outputs.

@MatthewMah
Copy link

The defaults are:
maxPenalty = 3;
mismatchPenaltyHigh = 3;
mismatchPenaltyLow= 1;
mismatchBaseQualityThreshold = 20;
minOverlap = 15;
minMergedLength = 30;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants