
hawk.out run too long #20

Open
SC-Duan opened this issue May 8, 2020 · 18 comments
@SC-Duan

SC-Duan commented May 8, 2020

Hi,
I have 91 samples, and running "hawk.out 42 49" is taking too long (18 days so far); the hawk_out.txt file is still empty. I set noThread=10. What could be wrong? The read coverage of each sample is about 8x, and the genome size is 2.2 Gb.
Thank you!

@atifrahman
Owner

That is really surprising. For the datasets we have analyzed, hawk takes much less time compared to the time needed for counting k-mers. How long did k-mer counting take?

We ran hawk on ~200 human samples and it took about a day (with 32 threads).

@SC-Duan
Author

SC-Duan commented May 11, 2020

k-mer counting was very fast. Is there some way to diagnose the problem with hawk.out?

@atifrahman
Owner

Can you please share the 'hawk_out.txt' file?

@SC-Duan
Author

SC-Duan commented May 18, 2020

The hawk_out.txt file is empty.

@SonjaKersten

Hi, I have the same issue. I have been running runHawk on Reads_case_sorted.txt and Reads_control_sorted.txt (124 GB each) for 13 days now. The hawk_out.txt file is still empty. What could be wrong?

@robertwhbaldwin

Did someone figure out what the problem was? I may be having the same issue. Thanks - Robert

@robertwhbaldwin

I think that I may have a similar problem. I'm running runHawk with k-mer counts from 50 samples sequenced to ~9x coverage (genome size 2 Gb). Each k-mer file was 20-25 GB.
Anyway, I'm running HAWK on an AWS EC2 instance and have already racked up $200 in charges, so I'd like to know whether I should kill it or let it run. The k-mer counting step took about two days (one hour per sample). runHawk has been going for 23 hours; I expected it to have finished by now. The instance type is m5ad.4xlarge with 16 vCPUs and 64 GB RAM. Thanks - Robert

@atifrahman
Owner

Has anything been written to case_out_wo_bonf.kmerDiff and control_out_wo_bonf.kmerDiff? If not, you can probably kill it.
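A minimal shell check along these lines (file names taken from the comment above; `-s` tests that a file exists and is non-empty) might look like:

```shell
# Check whether runHawk appears to be making progress by testing
# whether its intermediate output files are non-empty.
for f in case_out_wo_bonf.kmerDiff control_out_wo_bonf.kmerDiff; do
  if [ -s "$f" ]; then
    echo "$f: has data ($(wc -c < "$f") bytes)"
  else
    echo "$f: empty or missing"
  fi
done
```

Running this periodically (or under `watch`) shows whether the files are growing.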

@robertwhbaldwin

No, neither file has anything in it. I'll have to stop the job. It would be good to know what the problem was (e.g., too few resources) and how to spot it early on. None of the files initially created by runHawk grew over the course of the run, so some way to check whether things are progressing would be helpful. Any recommendations on compute resources (more threads, faster CPU, more RAM, etc.) would also be welcome, since I could launch a new instance. I should point out that I had to keep the input files on EBS (remote) storage rather than local storage, which would hamper performance, but I don't think that's the issue here. Thanks - Robert

@atifrahman
Owner

Sorry about that!

We'll look into it. We never encountered this on any of the datasets we used. Since the datasets are so large, it's difficult for others to share them with us for debugging, but we'll give it another shot.

@robertwhbaldwin

I tried runHawk again on a different instance, starting over from the beginning and reinstalling all the software. I ran it with 35 threads and ~80 GB RAM, and moved the input sorted k-mer files to local SSD storage. After leaving it to run overnight (~8 hours), I found it still running with no change to the output files. I saved the AMI but will not attempt this again unless the issue is resolved. My input sorted k-mer files were 25 GB each (50 samples). Does that seem too large for a 2 Gb diploid genome at 9x coverage? Let me know if I can help in any way to resolve this issue.

@atifrahman
Owner

Can you please share one or two of the k-mer count files? We can check whether they are in the expected format. When I tried HAWK on ~200 human samples, the total size of the k-mer count files was >5 TB, so yours don't seem unreasonably large.

@robertwhbaldwin

Do you want the whole files?

@atifrahman
Owner

If possible, yes. You can upload them somewhere and share the link by emailing me at [email protected].

@SonjaKersten

I'm also still stuck on the same issue. However, I don't know whether it's the file size or the fact that I'm running it on only two pools from a bulked segregant experiment (two samples). I would appreciate it if you let me know how the issue gets resolved.
Thanks, Sonja

@robertwhbaldwin

For those still dealing with this problem: it turns out my k-mer files had the incorrect format. Check your k-mer files.
I ran the k-mer counting step with an unmodified version of Jellyfish 2 but applied the patch provided by HAWK. My k-mer files were incorrect because the first column contained the k-mer strings. The first column should be a number encoding the k-mer, not the k-mer string itself; the second column should be the count for that k-mer.
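A quick sanity check for this, sketched as a hypothetical helper in Python (assuming only the two-column layout described above: an integer encoding of the k-mer, then an integer count, whitespace-separated):

```python
def check_kmer_file(path, n_lines=1000):
    """Return True if the first n_lines of the file look like
    '<integer> <integer>' rows, i.e. an encoded k-mer and its count.

    A file produced by an unpatched/mismatched Jellyfish may instead
    contain literal k-mer strings like 'ACGTACGT 5', which fails here.
    """
    with open(path) as f:
        for i, line in enumerate(f):
            if i >= n_lines:
                break
            fields = line.split()
            if len(fields) != 2 or not all(s.isdigit() for s in fields):
                return False
    return True
```

Running it on one of the sorted k-mer count files before launching runHawk would catch the format problem described above early, instead of days into the run.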

@robertwhbaldwin

And I'll add that if you install an unmodified version of Jellyfish, you may need to use version 2.2.10 as suggested in the HAWK documentation. I tried applying the patch to a more recent version of Jellyfish 2 and the output was not formatted properly; when I applied it to 2.2.10, the problem was fixed.

@SonjaKersten

Thanks Robert, I will check and try it out.
