Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Number of reads in fastq files not detected correctly. #112

Open
Ni-Ar opened this issue Feb 24, 2023 · 6 comments
Open

Number of reads in fastq files not detected correctly. #112

Ni-Ar opened this issue Feb 24, 2023 · 6 comments

Comments

@Ni-Ar
Copy link

Ni-Ar commented Feb 24, 2023

Hi Manu,

If doing wc -l file.fastq you get 12456 /path/to/file.fastq but if doing wc -l file.fastq.gz you get 12345.
This gives an error that doesn't break vast-tool align that looks like this:

[vast align]: VASTDB Version: vastdb.hs2.23.06.20
Argument "123456 /path/to/fil..." isn't numeric in division (/) at vast-tools/bin/RunDBS_1.pl line 432.

The code below shows the problem.

    ### Check for read numbers in R1 and R2
    my $N_fq1;
    my $N_fq2;
    if ($fq1=~/\.gz/){
	$N_fq1=`gzip -dc $fq1 \| wc -l`;
	$N_fq2=`gzip -dc $fq2 \| wc -l`;
    }
    else {
	$N_fq1=`wc -l $fq1`;
	$N_fq2=`wc -l $fq2`;
    }
    chomp($N_fq1);
    chomp($N_fq2);
    unless (defined $trimmed && !defined $fastaOnly){
	$N_fq1=$N_fq1/4;
	$N_fq2=$N_fq2/4;
    }
    else {
	$N_fq1=$N_fq1/2;
	$N_fq2=$N_fq2/2;
    }
    errPrintDie "The number of R1 and R2 reads does not match ($N_fq1 vs $N_fq2)\n" if $N_fq1 != $N_fq2;
}

I think a simple fix would be to pipe the wc -l into a grep and add an if statement to check that $N_fq1 and $N_fq2 are numbers.

my $N_fq1;
my $N_fq2;
if ($fq1=~/\.gz/){
$N_fq1=`gzip -dc $fq1 \| wc -l \| grep -oP "^[0-9]+" \| tr -d " "`;
$N_fq2=`gzip -dc $fq2 \| wc -l \| grep -oP "^[0-9]+" \| tr -d " "`;
  }
  else {
$N_fq1=`wc -l $fq1 \| grep -oP "^[0-9]+" \| tr -d " "`;
$N_fq2=`wc -l $fq2 \| grep -oP "^[0-9]+" \| tr -d " "`;
  }
  if ($N_fq1 =~ /^\d+$/ && $N_fq2 =~ /^\d+$/) {
    # The variables contain integer numbers
    chomp($N_fq1);
    chomp($N_fq2);
    unless (defined $trimmed && !defined $fastaOnly){
     $N_fq1=$N_fq1/4;
     $N_fq2=$N_fq2/4;
   } else {
     $N_fq1=$N_fq1/2;
     $N_fq2=$N_fq2/2;
  }
   errPrintDie "The number of R1 and R2 reads does not match ($N_fq1 vs $N_fq2)\n" if $N_fq1 != $N_fq2;
  } else {
      die "$N_fq1 and $N_fq2 are not integers"
  }

but double check to make sure that's right.

Nicco

@mirimia
Copy link
Contributor

mirimia commented May 29, 2023

Hello,

This would only be a problem in the fastq is compressed but has not .gz at the end, right?

M

@Ni-Ar
Copy link
Author

Ni-Ar commented May 29, 2023

I was getting this error when using uncompressed fastqs as input. I never tried the case of a compressed fastq without the .gz suffix. I guess that would break something else.

@mirimia
Copy link
Contributor

mirimia commented May 29, 2023

I see. I'll add your fix for uncompressed reads (which I have never used as input).

@Ni-Ar
Copy link
Author

Ni-Ar commented May 29, 2023

I mean try it first, not sure my perl code is that good ;)

I sometimes use uncompressed fastqs as it saves time when processing fastqs from fasterq-dump and sending them directly into vast-tools align.

@joaomatos02
Copy link

I'm having the same error with uncompressed files. The output of wc -l is a string with count your/file/path so those variables will never be integers and throws a division error. I think you could do wc -l $fq1 \| awk '{print \$1}' to get the integer only. I guess the final equality test will never fail because both variables are always becoming undef and undef == undef so Perl is happy.

@mirimia
Copy link
Contributor

mirimia commented Jul 15, 2024

Sorry, my bad, I never dealt with this, as we never used uncompressed reads.

I have implemented your solution, Joao, and committed the change. Could you please test it?

For compressed reads, it works already (we ran it a billion times).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants