Run Meryl on multiple fastq files #49
Comments
If your large list isn't too large, you can do `meryl k=21 count *fastq.gz output Species.meryl`, where 'too large' would generate a complaint about 'line too long' from bash, not meryl. What you're doing in your while loop would count k-mers in each file individually, but overwrite the results each time. To do this method properly, you'd want something like `count $i output tmp$i.meryl`, then follow it up with `meryl union tmp*.meryl output all.meryl`.
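For concreteness, here is a minimal sketch of the per-file approach as a shell script. It assumes `Species_fastq.list` (the list from the original post below) holds one `fastq.gz` path per line, uses `union-sum` (the action mentioned later in the thread) so the per-file counts are summed, and the `tmp_` names are placeholders:

```bash
#!/usr/bin/env bash
# Sketch: count each fastq into its own database, then merge everything.
# Species_fastq.list is assumed to hold one fastq.gz path per line.
set -euo pipefail

while read -r fq; do
    name=$(basename "$fq" .fastq.gz)
    meryl k=21 count "$fq" output "tmp_${name}.meryl"
done < Species_fastq.list

# union-sum combines the per-file databases, summing counts for shared k-mers.
meryl union-sum tmp_*.meryl output Species.meryl
```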
Thank you!!! This works like a charm.
Best,
Nicolas Alexandre
PhD Candidate, Integrative Biology
Whiteman Lab
University of California - Berkeley
Hi, great program!
Dami Gonzalez, Bioinformatician
Looks like it is defaulting to using all CPUs on the machine; each CPU will open one set of input/output files. The simple solution is to decrease (or explicitly set) the number of CPUs in use.
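As a hedged illustration, recent meryl builds accept a global `threads=` option; the exact option name is an assumption here, so check `meryl --help` on your version:

```bash
# Cap meryl at 4 CPUs so only four sets of input/output files are open at once.
# threads= is assumed from recent meryl releases; verify with `meryl --help`.
meryl threads=4 k=21 count *.fastq.gz output Species.meryl
```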
Thanks for the quick response! I'm running meryl again on every fastq; I'll try the new solution and let you know.
Hi, the code works fine! I was wondering if it is normal that the resulting output folder from `meryl union-sum` is about the same size as each individual database.
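Directory size alone does not say much about what a merged database contains. One way to check, assuming your meryl build provides the `statistics` action (the database names below are placeholders), is to compare k-mer counts before and after the union:

```bash
# Compare one per-file database with the merged database; the merged one
# should report at least as many distinct k-mers as any single input.
meryl statistics tmp_sample1.meryl | head -n 20
meryl statistics Species.meryl | head -n 20
```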
I have a large list of fastq files that will be used for the same database.
Is there a simple way to run the command on a list of files for the same Meryl database, or do I need to concatenate all of them into one large file?
I'm currently trying this:
```
while read i; do echo "meryl k=21 count $i output Species.meryl"; done >> job.list < Species_fastq.list
```
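For completeness, a hedged variant of that job list in which each job writes to its own database (so nothing is overwritten) and a final job merges them, following the fix described in the first comment; the `tmp_` naming is a placeholder:

```bash
# One counting job per fastq, each with a distinct output database,
# followed by a single job that merges the per-file databases.
while read -r i; do
    echo "meryl k=21 count $i output tmp_$(basename "$i").meryl"
done < Species_fastq.list >> job.list
echo "meryl union-sum tmp_*.meryl output Species.meryl" >> job.list
```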