Max no. of fastqfiles which can be QC ed. #136

Open
Gokula139 opened this issue Jun 18, 2024 · 9 comments

@Gokula139

Hello,

I am running FastQC on 90 FASTQ files non-interactively. My instance has been running for a long time and does not seem to be stopping. When I monitor the process, CPU and network usage were good for the first 2 hours, and after that they became negligible. I am raising this issue to find the reason behind this. Please help me resolve it.

Thanks.

@s-andrews
Owner

To get a more definitive answer to this I'd need to know:

  • The version of fastqc you were running
  • The command you used to launch the analysis
  • What was in the logs printed out by the program when it ran

Processing 90 files should be fine, and if you have multiple CPUs to throw at the job you can run analyses in parallel by specifying the --threads option when running the program.
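
As a rough sketch of the parallel run described above, assuming `fastqc` is on the `PATH` and the inputs are gzipped FASTQ files. The function name `run_fastqc_parallel`, the `qc_reports` directory, and the log file names are illustrative choices, not part of FastQC:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run FastQC on several files at once.
# Usage: run_fastqc_parallel <threads> <file> [<file> ...]
run_fastqc_parallel() {
    local threads="$1"
    shift
    # FastQC does not create the output directory itself, so make it first.
    mkdir -p qc_reports
    # Keep stdout and stderr so there is something to inspect if the run stalls.
    fastqc --threads "$threads" --outdir qc_reports "$@" \
        > fastqc.log 2> fastqc.err
}
```

For example, `run_fastqc_parallel 4 *.fastq.gz` would process up to four files at a time and leave the program's messages in `fastqc.log` and `fastqc.err`.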

There was a known problem in a previous version of fastqc: if one file out of a large set had a problem, the program wouldn't exit correctly at the end of the processing. It would appear to still be running even though all of the QC reports (apart from the one for the broken input file) had been generated already, so you might have hit this. This problem was fixed in the latest release, though.

When you look in the folder which had the data in it, do you see the HTML files for the individual QC reports?

@Gokula139
Author

  • The version I am using - fastqc_v0.12.1
  • I don't have logs generated in the process.
  • Yes, I can see the HTML files of the individual QC reports.

@s-andrews
Owner

OK, so you have the latest version, which shouldn't have the problem with stalling upon failure. If all of the files are there then something else must be causing the program to stay open, but without being able to see the output it wrote I really have nothing to go on to try to diagnose this, I'm afraid.

@Gokula139
Author

HTML files were not generated for all of the FASTQ files. Only 40 HTML files were created, and after that there was no response from the container. It has been running for a long time and is not stopping.

@s-andrews
Owner

If you've run this as a single-threaded process then you should be able to figure out which file is causing the crash/stall as it would be the next one in the list which didn't get processed. If you can then try to run that file and see what output is generated we can try to track this down. Alternatively if you can share the file which is failing with me then I can run it and see what happens. If the process has been stalled for more than a few minutes with no additional output then there's no point leaving it running so you can kill the process which is there and start again on the problematic file.
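
One way to find the files that never got a report is sketched below. It assumes the reports were written next to the inputs and are named `<base>_fastqc.html`, where `<base>` is the input name without `.gz` and without `.fastq` or `.fq`. The helper name `list_unprocessed` is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list FASTQ files in a directory that have no report yet.
# Assumes reports sit beside the inputs and are named <base>_fastqc.html.
list_unprocessed() {
    local dir="$1" f base
    for f in "$dir"/*.fastq.gz "$dir"/*.fq.gz "$dir"/*.fastq "$dir"/*.fq; do
        [ -e "$f" ] || continue          # skip patterns that matched nothing
        base="$(basename "$f")"
        base="${base%.gz}"               # drop compression suffix, if any
        base="${base%.fastq}"            # drop .fastq, if present
        base="${base%.fq}"               # drop .fq, if present
        [ -e "$dir/${base}_fastqc.html" ] || echo "$f"
    done
}
```

Running `list_unprocessed /path/to/data` prints one file per line. If the files were passed to a single-threaded run in the same order, the first name printed is a reasonable candidate for the file to rerun on its own.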

@Gokula139
Author

Hello,

I am trying to find the logs for this process, but I couldn't find them. Where are the logs written? Directly to the console, or somewhere else? If they go somewhere else, could you please tell me where I can find them?

Thanks

@Gokula139
Author

Could the issue be caused by memory? What is the default memory the JVM uses here? Is there a way to increase it through parameters?

@s-andrews
Owner

If you're using the latest FastQC then it will assign 512MB to each thread you launch. This should be enough for pretty much all libraries, but if you need more then you can increase this using the --mem command line option.

When launching this on a cluster you will need to assign slightly more than 512MB per thread as there will be a memory overhead from the JVM itself. If you assign a few GB that should cause no issues and should be more than enough.
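
The arithmetic behind a cluster memory request can be sketched as follows. The 512MB per thread comes from the comment above; the 1024MB default overhead is only an assumed allowance for the JVM, and the helper name `fastqc_mem_request_mb` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the memory (in MB) to request from a scheduler.
# per_thread_mb defaults to the 512MB mentioned above; overhead_mb is an
# assumed allowance for the JVM itself, not a figure taken from FastQC.
fastqc_mem_request_mb() {
    local threads="$1"
    local per_thread_mb="${2:-512}"
    local overhead_mb="${3:-1024}"
    echo $(( threads * per_thread_mb + overhead_mb ))
}
```

With the defaults, `fastqc_mem_request_mb 4` prints 3072, that is 4 x 512 + 1024, which is in line with the "few GB" suggested above.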

@s-andrews
Owner

> I am trying to find the logs for this process, but I couldn't find them. Where are the logs written? Directly to the console, or somewhere else? If they go somewhere else, could you please tell me where I can find them?

It really depends how you've run this. FastQC itself just writes this information to stdout and stderr, so it will be wherever you are sending that in the way you launch the program. If you just did:

fastqc mydata.fq.gz

Then it will just print to the terminal in which the process was launched. If you did a redirection such as:

fastqc mydata.fq.gz > log.txt 2>errors.txt

Then the output will go into log.txt and errors.txt.

If you've run this on a cluster with something like:

ssub -o log.txt fastqc mydata.fq.gz

Then it will go into the log file you specified. If you are using slurm and don't specify a log file then it goes to a file in your home directory named after the job id, so something like 1234.o or 1234.e.
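
Building on the redirections above, a minimal sketch for tracking down a stalling file is to run the files one at a time with a separate pair of log files each. It assumes `fastqc` is on the `PATH`; the function name `run_one_by_one`, the `logs` directory, and the `.out`/`.err` naming are illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run FastQC on one file at a time, with a log per file.
# Usage: run_one_by_one <file> [<file> ...]
run_one_by_one() {
    local f name status
    mkdir -p logs
    for f in "$@"; do
        name="$(basename "$f")"
        echo "Starting $f"
        # stdout and stderr go to separate files named after the input.
        if fastqc "$f" > "logs/$name.out" 2> "logs/$name.err"; then
            status=0
        else
            status=$?
        fi
        echo "Finished $f (exit status $status)"
    done
}
```

If the run hangs, the last "Starting" line without a matching "Finished" line names the file to look at, and its `.out` and `.err` files hold whatever the program printed before it stopped.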
