
readline() on closed filehandle _IN at METABOLIC-C.pl line 1909. Cannot fork: Cannot allocate memory at /home/hyShen/miniconda3/envs/METABOLIC #193

Open
hyShen-hzau opened this issue Aug 9, 2024 · 6 comments


@hyShen-hzau

hyShen-hzau commented Aug 9, 2024

Describe the bug
After successfully running the test, I cannot run the pipeline on my own data, and the same error is reported repeatedly.

To Reproduce
Steps to reproduce the behavior:

  1. Run `perl METABOLIC-C.pl -in-gn /home/hyShen/Metabolic/Sharp/test1 -r /home/hyShen/Metabolic/test.txt -o result1 -t 60`
  2. The log shows:
     [2024-08-08 22:21:13] The Prodigal annotation is running...
     [2024-08-09 00:12:11] The Prodigal annotation is finished
     readline() on closed filehandle _IN at METABOLIC-C.pl line 1909.
     [2024-08-09 00:23:36] The hmmsearch is running with 60 cpu threads...
     Cannot fork: Cannot allocate memory at /home/hyShen/miniconda3/envs/METABOLIC_v4.0/lib/perl5/site_perl/5.22.0/Parallel/ForkManager.pm line 52.
  3. The same error then repeats:
     The hmmsearch is running with 60 cpu threads...
     Cannot fork: Cannot allocate memory at /home/hyShen/miniconda3/envs/METABOLIC_v4.0/lib/perl5/site_perl/5.22.0/Parallel/ForkManager.pm line 52.
@snpone

snpone commented Aug 10, 2024

When system resources are low, a potentially useful workaround is a small code change. Using search and replace, change the original line, which looks like this:

_run_parallel("$output/tmp_calculate_depth.sh", $i); `rm $output/tmp_calculate_depth.sh`;

to:

system("bash $output/tmp_calculate_depth.sh"); `rm $output/tmp_calculate_depth.sh`;

This can greatly reduce system load.
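If you prefer to automate the edit, a `sed` one-liner along these lines should work. This is a sketch: back up the file first, and note that the substitution only fires if the line in your copy of METABOLIC-C.pl matches this text exactly.

```shell
# Back up, then swap the parallel runner for a plain serial bash call.
cp METABOLIC-C.pl METABOLIC-C.pl.bak
sed -i 's|_run_parallel("$output/tmp_calculate_depth.sh", $i);|system("bash $output/tmp_calculate_depth.sh");|' METABOLIC-C.pl
```

Check the result with `grep -n 'tmp_calculate_depth' METABOLIC-C.pl` before re-running.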

Possible reason:
I'm an experienced bioinformatics worker and have used Perl for a long time. This code is well written overall, but the author may have overlooked something. The issue lies around line 2380, in this function:

sub _run_parallel{
	my $file = $_[0];
	my $cpu_numbers_ = $_[1];
	my @Runs; 
	open ___IN, $file;
	while (<___IN>){
		chomp;
		push @Runs, $_;
	}
	close ___IN;

	my $pm = Parallel::ForkManager->new($cpu_numbers_);
	foreach my $run (@Runs){
		my $pid = $pm->start and next;
		`$run`;
		$pm->finish;
	}
	$pm->wait_all_children;
}

This function takes an initial job count and keeps that many jobs running simultaneously, much like the shell `parallel` command, which in itself is not a problem. The issue is that each sub-command in the generated temporary tmp_xxx.sh scripts also specifies its own thread count, for example `samtools sort -@ 40`. So with `-t 40`, ForkManager runs up to 40 jobs at once, each spawning another 40 samtools threads, and up to 40 x 40 = 1600 threads run at the same time, no matter how many fq files (say, 100) are queued. This can severely impact disk I/O and system load balance, potentially paralyzing the system.
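The over-subscription arithmetic also suggests a gentler fix than going fully serial: cap the per-job thread count so jobs x threads stays at or below the core count. A rough sketch with the numbers from this thread (the `samtools sort` invocation is illustrative):

```shell
# With -t 40, ForkManager keeps 40 jobs running at once; each
# `samtools sort -@ 40` adds 40 more threads, so 40 * 40 = 1600
# threads compete for the machine's 64 cores.
CORES=64
JOBS=40
PER_JOB_THREADS=40
echo "oversubscribed: $(( JOBS * PER_JOB_THREADS )) threads on $CORES cores"
# -> oversubscribed: 1600 threads on 64 cores

# Fix: give each job at most CORES / JOBS threads (at least 1).
SAFE=$(( CORES / JOBS ))
if [ "$SAFE" -lt 1 ]; then SAFE=1; fi
echo "safer: samtools sort -@ $SAFE"
# -> safer: samtools sort -@ 1
```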

@hyShen-hzau
Author

Thanks for your help. I found the two lines "_run_parallel("$output/tmp_calculate_depth.sh", $i); `rm $output/tmp_calculate_depth.sh`;" in METABOLIC-C.pl and replaced them. I am currently testing with a small batch of data. Thanks again for your reply, and I wish you all the best.

@hyShen-hzau
Author

Hello, after replacing the original line, a similar error still occurs. My server has 64 threads. Is it not possible to run the process with 40 threads?
(METABOLIC_v4.0) [hyShen@zhaolabserver METABOLIC]$ perl METABOLIC-C.pl -in-gn /home/hyShen/Metabolic/Sharp/test1 -r /home/hyShen/Metabolic/test.txt -o result1 -t 40
[2024-08-11 11:45:35] The Prodigal annotation is running...
[2024-08-11 13:37:04] The Prodigal annotation is finished
readline() on closed filehandle _IN at METABOLIC-C.pl line 1909.
[2024-08-11 13:48:17] The hmmsearch is running with 40 cpu threads...
Cannot fork: Cannot allocate memory at /home/hyShen/miniconda3/envs/METABOLIC_v4.0/lib/perl5/site_perl/5.22.0/Parallel/ForkManager.pm line 52.

@snpone

snpone commented Aug 11, 2024

It seems your .faa files may be too large, or your input files were damaged.

 readline() on closed filehandle _IN at METABOLIC-C.pl line 1909.

## This means function _get_faa_seq()  failed.

# Store faa file into a hash
	%Total_faa_seq = (%Total_faa_seq, _get_faa_seq($file));
	if ($input_genome_folder){
		%Total_gene_seq = (%Total_gene_seq, _get_gene_seq("$file_name\.gene"));
	}	

Possibly your .faa files are too large to store in RAM.
Here is a tiny script that reports your available RAM every 30 s.
You can run it in a terminal as `bash /PATH/TO/check_memory.sh`, then re-run the .pl script.

If you find that there is always ample remaining memory during the program's execution, but you still receive the same error message, then please check if your input file is corrupted.
check_memory.sh.gz
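A minimal sketch of such a script (my reconstruction, assuming a Linux system where /proc/meminfo is available; the attached file may differ):

```shell
#!/usr/bin/env bash
# Report available RAM every 30 seconds until interrupted (Ctrl-C).
while true; do
    ts=$(date '+[%Y-%m-%d %H:%M:%S]')
    avail=$(awk '/^MemAvailable:/ {printf "%.1f GiB", $2 / 1048576}' /proc/meminfo)
    echo "$ts available memory: $avail"
    sleep 30
done
```

Redirect the output to a log file (`bash check_memory.sh > mem.log &`) if you want a record of memory use over the whole run.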

@hyShen-hzau
Author

hmmsearch takes too long to run (7 GB of data, over 15 h)
Hello, after changing the number of threads, my memory is sufficient. However, this is the third time the hmmsearch phase has taken more than 24 hours, and I only input 7 GB of sequencing data. When I run -test it takes only 1.5 hours, and I checked that the -test data is 5 GB, which is not much different from my data size.

@snpone

snpone commented Aug 12, 2024

This is abnormal. Normally, hmmsearch with 40 threads processing 10 GB per sample, for dozens of samples, takes only about a dozen hours. Since I don't know your work environment, I can only guess it is a system issue. Try the `glances` command or `sudo iotop` to check the system disk load. Normal I/O throughput is around 100 MB/s or higher; if it is consistently below 50 MB/s, another program may be occupying the disk, or it may be a precursor to physical disk failure. There are no other bugs in the program itself. Additionally, if the original input folder contains .fastq.gz files, please decompress them before running the program, which also saves runtime.
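For the decompression step, a one-liner is enough (the path shown is the input folder from this thread; adjust it to yours, and make sure you have the disk space for the uncompressed files):

```shell
# Decompress all .fastq.gz inputs in place before running METABOLIC-C.pl.
gunzip -v /home/hyShen/Metabolic/Sharp/test1/*.fastq.gz
```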
Good luck.
