Trimming with custom sequences #1174

Open
sarah-buddle opened this issue Dec 13, 2024 · 1 comment

Comments

@sarah-buddle

Issue Report

I have run dorado demux and noticed that some primer/adapter sequences are left at the start of the reads, so I tried running dorado trim to remove them. I have tried this both with the primer sequences supplied via --primer-sequences and without this parameter. In both cases some reads are trimmed, but many reads in the output file (hundreds out of 3195 total) still contain these sequences at the start. Is there a setting I should change so that trimming works for all reads?

Steps to reproduce the issue:

The command I am using is:
dorado trim input.fastq --emit-fastq --primer-sequences primers.fasta > output.fastq

The input file was the output of the dorado demux command (with trimming enabled) run on a previously basecalled fastq file.

Custom primer sequences that are still found in the output:

>seq1
TTCAGACGTGTGCTCTTCCGATCT
>seq2
CCTACACGACGCTCTTCCGATCT
>seq3
CCTACACGACG
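
One way to put a number on "reads that still contain these sequences at the start" is a small counting script. The following is only a minimal sketch, not part of dorado: it assumes an unwrapped 4-line-per-record FASTQ, looks only for exact matches (no mismatches, no reverse complements), and treats "the start" as the first 100 bases. The function names and the `window` parameter are hypothetical, chosen for illustration.

```python
# Primer sequences copied from the list above.
PRIMERS = {
    "seq1": "TTCAGACGTGTGCTCTTCCGATCT",
    "seq2": "CCTACACGACGCTCTTCCGATCT",
    "seq3": "CCTACACGACG",
}


def read_fastq_sequences(handle):
    """Yield the sequence line of each record (assumes 4 lines per record)."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip().upper()


def count_leading_primers(handle, primers=PRIMERS, window=100):
    """Count reads with an exact primer match in the first `window` bases.

    Returns (total_reads, {primer_name: reads_with_match}).
    `window` is an assumed cutoff for what counts as "the start" of a read.
    """
    total = 0
    hits = {name: 0 for name in primers}
    for seq in read_fastq_sequences(handle):
        total += 1
        head = seq[:window]
        for name, primer in primers.items():
            if primer in head:
                hits[name] += 1
    return total, hits
```

For example, `count_leading_primers(open("output.fastq"))` returns the read count and per-primer hit counts for the trimmed file. Because seq3 is a prefix of seq2, any read counted under seq2 is also counted under seq3. Running the same function on the input and output files gives a before/after comparison that could be attached to the report.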

Run environment:

  • Dorado version: 0.8.3+98456f7
  • Dorado command: see above
  • Operating system: CentOS Linux 7 (Core)
  • Hardware (CPUs, Memory, GPUs): CPUs, 10G memory currently (I'm running a small file for testing)
  • Source data type: fastq, previously basecalled from pod5
  • Source data location (on device or networked drive - NFS, etc.): On HPC
  • Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): Test fastq file is 1.3M

Logs

[2024-12-13 16:18:57.672] [info] Running: "trim" "input.fastq" "--primer-sequences" "primers.fasta" "--emit-fastq" "-v"
[2024-12-13 16:18:57.672] [debug] > adapter/primer trimming threads 58, writer threads 6
[2024-12-13 16:18:57.673] [info] - Note: FASTQ output is not recommended as not all data can be preserved.
[2024-12-13 16:18:57.677] [info] > starting adapter/primer trimming
[2024-12-13 16:18:57.727] [debug] Total reads processed: 3195
[2024-12-13 16:18:57.781] [info] > Simplex reads basecalled: 3195
[2024-12-13 16:18:57.781] [info] > finished adapter/primer trimming

@malton-ont
Collaborator

Hi @sarah-buddle,

There's nothing obvious in your command that should stop this from working. Are you able to share the test data file with us?
