
Run Meryl on multiple fastq files #49

Open
NicMAlexandre opened this issue Jul 12, 2024 · 6 comments

@NicMAlexandre

I have a large list of fastq files that will be used for the same database.

Is there a simple way to run the command on a list of files for the same Meryl database, or do I need to concatenate all of them into one large file?

I'm currently trying this:

while read i; do echo "meryl k=21 count $i output Species.meryl"; done >> job.list < Species_fastq.list

@brianwalenz
Member

If your large list isn't too large, you can do:

meryl k=21 count *fastq.gz output Species.meryl

Where 'too large' would generate a complaint about 'line too long' from bash, not meryl.
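To see how close a large glob would come to that limit, the kernel's argument-size cap can be queried directly. A quick illustration (the file names below are made up):

```shell
# The shell's "line too long" / "argument list too long" error comes from
# the kernel's ARG_MAX cap on the total size of an exec()'d command line:
getconf ARG_MAX

# Rough size, in bytes, that a glob expanding to 1000 hypothetical
# file names would contribute to the command line:
printf 'sample%04d.fastq.gz ' $(seq 1 1000) | wc -c
```

If the second number approaches the first, the single-invocation glob will fail and the per-file approach below is the way to go.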

What you're doing in the while loop above would count k-mers in each file individually, but overwrite the results each time. To do it properly, you'd want something like count $i output tmp$i.meryl, then follow up with meryl union tmp*.meryl output all.meryl.
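A sketch of that per-file approach, in the same job-list style as the original command (file names are illustrative, and I use union-sum for the merge, as later in this thread, since it sums the per-file counts; the plain union mentioned above is the maintainer's suggestion):

```shell
# Illustrative sketch; Species_fastq.list holds one fastq path per line.
# The two example read files here are made-up names:
printf '%s\n' reads_A.fastq.gz reads_B.fastq.gz > Species_fastq.list

# Emit one count job per file, each writing its own database
# (so nothing gets overwritten):
while read -r fq; do
  echo "meryl k=21 count $fq output tmp.$fq.meryl"
done < Species_fastq.list > job.list

# After every count job has finished, merge the partial databases:
echo "meryl union-sum tmp.*.meryl output Species.meryl" >> job.list

cat job.list
```

The count jobs are independent, so they can run in parallel; only the final merge has to wait for all of them.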

@NicMAlexandre
Author

NicMAlexandre commented Jul 13, 2024 via email

@damizuka

damizuka commented Oct 9, 2024

Hi, great program!
I'm trying the exact approach above for counting k-mers in multiple fastq files, but when I do meryl union I get the following output:

PROCESSING TREE #1 using 88 threads.
opUnion
  D2_S1_L001_R1_001.fastq.gz.meryl D2_S1_L001_R1_002.fastq.gz.meryl D2_S1_L001_R1_003.fastq.gz.meryl D2_S1_L001_R1_004.fastq.gz.meryl D2_S1_L001_R1_005.fastq.gz.meryl
  D2_S1_L001_R2_001.fastq.gz.meryl D2_S1_L001_R2_002.fastq.gz.meryl D2_S1_L001_R2_003.fastq.gz.meryl D2_S1_L001_R2_004.fastq.gz.meryl D2_S1_L001_R2_005.fastq.gz.meryl
  D2_S1_L002_R1_001.fastq.gz.meryl D2_S1_L002_R1_002.fastq.gz.meryl D2_S1_L002_R1_003.fastq.gz.meryl D2_S1_L002_R1_004.fastq.gz.meryl D2_S1_L002_R1_005.fastq.gz.meryl
  D2_S1_L002_R2_001.fastq.gz.meryl D2_S1_L002_R2_002.fastq.gz.meryl D2_S1_L002_R2_003.fastq.gz.meryl D2_S1_L002_R2_004.fastq.gz.meryl D2_S1_L002_R2_005.fastq.gz.meryl
  D2_S2_L001_R1_001.fastq.gz.meryl D2_S2_L001_R1_002.fastq.gz.meryl D2_S2_L001_R1_003.fastq.gz.meryl D2_S2_L001_R1_004.fastq.gz.meryl
  D2_S2_L001_R2_001.fastq.gz.meryl D2_S2_L001_R2_002.fastq.gz.meryl D2_S2_L001_R2_003.fastq.gz.meryl D2_S2_L001_R2_004.fastq.gz.meryl
  D2_S2_L002_R1_001.fastq.gz.meryl D2_S2_L002_R1_002.fastq.gz.meryl D2_S2_L002_R1_003.fastq.gz.meryl D2_S2_L002_R1_004.fastq.gz.meryl
  D2_S2_L002_R2_001.fastq.gz.meryl D2_S2_L002_R2_002.fastq.gz.meryl D2_S2_L002_R2_003.fastq.gz.meryl D2_S2_L002_R2_004.fastq.gz.meryl
output to all.meryl
Failed to open 'D2_S1_L001_R1_005.fastq.gz.meryl/0x100000.merylData' for reading: Too many open files
Failed to open 'D2_S1_L002_R2_005.fastq.gz.meryl/0x111101.merylData' for reading: Too many open files

How can I solve it?
Best

Dami Gonzalez, Bioinformatics
PhD candidate

@brianwalenz
Member

It looks like it is defaulting to using all CPUs on the machine. Each CPU will open one set of input/output files. The simple solution is to decrease (or explicitly set) the number of CPUs in use with threads=8 (or 4, or 16, or 32, etc.).
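A sketch of the fix (threads=8 and the database names are assumptions; the echo just shows the shape of the command so the snippet runs without meryl installed):

```shell
# Each worker thread opens its own set of input database files, so
# capping the thread pool keeps the total open-file count under the
# per-process limit, which ulimit -n reports:
echo "open-file limit: $(ulimit -n)"

# threads= is a global meryl option; 8 is an illustrative value.
echo "meryl threads=8 union-sum tmp.*.meryl output all.meryl"
```

With 88 threads each holding several database files open, the default limit (often 1024) is easy to exceed; a smaller thread count, or a raised ulimit -n, avoids the "Too many open files" failure.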

@damizuka

damizuka commented Oct 9, 2024

Thanks for the quick response! I'm running meryl again for every fastq; I'll try the new solution and let you know.
Best regards

@damizuka

Hi, the code works fine! I was wondering if it is normal that the resulting output folder from meryl union-sum is the same size as each individual database.
Best regards! :)
