
NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC (SRR15731653) terminated with an error exit status (1) #1348

Open
Dokmen opened this issue Jul 25, 2024 · 1 comment
Labels
bug Something isn't working

Dokmen commented Jul 25, 2024

Description of the bug

Hi,
I have been using nf-core/rnaseq for a long time, but since updating Nextflow I receive the error messages below. I downloaded my FASTQ files with Aspera and there was no breakage, so I don't understand why this is happening; I have never encountered this error before. I have downloaded the FASTQ files again several times and reinstalled Nextflow with conda, and I get this error every time I run the pipeline. Does this error have anything to do with the new update? I am using an HPC cluster where the maximum number of CPUs is 40 or 56 and the maximum RAM is 190 or 380 GB. I submit the job to the SLURM queue and use 3 nodes. Can you help with this issue?

Command used and terminal output

#!/bin/bash
#SBATCH -N 3
#SBATCH --ntasks-per-node=56
#SBATCH --time=72:00:00
#SBATCH --output=/truba_scratch/dokmen/StarRsem/test.log
#SBATCH --error=/truba_scratch/dokmen/StarRsem/test.err
echo "SLURM_NODELIST $SLURM_NODELIST"
echo "NUMBER OF CORES $SLURM_NTASKS"
echo "SLURM_CPUS_PER_TASK $SLURM_CPUS_PER_TASK"
eval "$(/truba/home/dokmen/miniconda3/bin/conda shell.bash hook)"
conda activate nf-core
export NXF_CLUSTER_SEED=$(shuf -i 0-16777216 -n 1)
export NXF_OPTS="-Xms500M -Xmx2G"
wdir=/truba_scratch/dokmen/StarRsem/
cd "$wdir"
nextflow run nf-core/rnaseq \
    -r 3.14.0 \
    -profile singularity \
    -params-file nf-params.json \
    -c GSE.config
exit

Config file (GSE.config):

params {
    config_profile_name        = 'GSE183533 profile'
    // Upper limits on the resources any single task may request
    max_cpus   = 56
    max_memory = '190.GB'
    max_time   = '72.h'
}

process {
    errorStrategy = { task.exitStatus in [143,137,104,134,139,140,247] ? 'retry' : 'finish' }
    maxRetries    = 2

    // process labels

}


singularity {
  enabled = true
  autoMounts = true
  cacheDir = '/truba/home/dokmen/.singularity'
}

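process {
    // Submit tasks as SLURM jobs; without a valid 'executor' setting Nextflow uses the local executor
    executor = 'slurm'
    scratch  = true
}

executor {
    // queueSize and submitRateLimit are executor-scope settings, not process directives
    submitRateLimit = '10 sec'
    queueSize       = 50
}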

// When using RSEM, remove warning from STAR whilst building tiny indices
process {
    withName: 'RSEM_PREPAREREFERENCE_GENOME' {
        ext.args2 = "--genomeSAindexNbases 7"
    }
}

Output:

executor >  local (5)
[-        ] NFC…REPARE_GENOME:GUNZIP_FASTA | 0 of 1
[-        ] NFC…:PREPARE_GENOME:GUNZIP_GTF | 0 of 1
[-        ] NFC…:PREPARE_GENOME:GTF_FILTER -
[-        ] NFC…ME:GUNZIP_TRANSCRIPT_FASTA | 0 of 1
[-        ] NFC…_TRANSCRIPTS_FASTA_GENCODE -
[-        ] NFC…ENOME:CUSTOM_GETCHROMSIZES -
[-        ] NFC…EM_PREPAREREFERENCE_GENOME -
[-        ] NFCORE_RNASEQ:RNASEQ:CAT_FASTQ -
[7e/0ad9d7] NFC…ALORE:FASTQC (SRR15731644) | 4 of 41, failed: 4
[fb/bc2726] NFC…E:TRIMGALORE (SRR15731658) | 1 of 41, failed: 1
[-        ] NFC…PLE_FQ_SALMON:SALMON_INDEX -
[-        ] NFC…PLE_FQ_SALMON:FQ_SUBSAMPLE -
[-        ] NFC…PLE_FQ_SALMON:SALMON_QUANT -
[-        ] NFC…M:RSEM_CALCULATEEXPRESSION -
[-        ] NFC…ATS_SAMTOOLS:SAMTOOLS_SORT -
[-        ] NFC…TS_SAMTOOLS:SAMTOOLS_INDEX -
[-        ] NFC…TS_SAMTOOLS:SAMTOOLS_STATS -
[-        ] NFC…SAMTOOLS:SAMTOOLS_FLAGSTAT -
[-        ] NFC…SAMTOOLS:SAMTOOLS_IDXSTATS -
[-        ] NFC…IFY_RSEM:RSEM_MERGE_COUNTS -
Plus 25 more processes waiting for tasks…
-[nf-core/rnaseq] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC (SRR15731653)'

Caused by:
  Process `NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC (SRR15731653)` terminated with an error exit status (1)


Command executed:

  printf "%s %s\n" SRR15731653_1.fastq.gz SRR15731653_1.gz SRR15731653_2.fastq.gz SRR15731653_2.gz | while read old_name new_name; do
      [ -f "${new_name}" ] || ln -s $old_name $new_name
  done
  
  fastqc \
      --quiet \
      --threads 6 \
      SRR15731653_1.gz SRR15731653_2.gz
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC":
      fastqc: $( fastqc --version | sed '/FastQC v/!d; s/.*v//' )
  END_VERSIONS

Command exit status:
  1

Command output:
  application/gzip
  application/gzip

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  application/gzip
  application/gzip
  Failed to process file SRR15731653_2.gz
  uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Midline 'F,FFFFFFCCCATGGA868:186:H2G5KDSX2:2:1105:26865:8343/2' didn't start with '+' at 885703
  	at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:179)
  	at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:129)
  	at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:77)
  	at java.base/java.lang.Thread.run(Thread.java:833)
  Failed to process file SRR15731653_1.gz
  uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Midline 'FFFFFACTCTTC111:29957:19617/1' didn't start with '+' at 2218703
  	at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:179)
  	at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:129)
  	at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:77)
  	at java.base/java.lang.Thread.run(Thread.java:833)

Relevant files

.nextflow.log

System information

No response

Dokmen added the bug (Something isn't working) label Jul 25, 2024
MatthiasZepper (Member) commented:

I don't think this is a pipeline error:

  Failed to process file SRR15731653_2.gz
  uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Midline 'F,FFFFFFCCCATGGA868:186:H2G5KDSX2:2:1105:26865:8343/2' didn't start with '+' at 885703
  Failed to process file SRR15731653_1.gz
  uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Midline 'FFFFFACTCTTC111:29957:19617/1' didn't start with '+' at 2218703

To me, it looks as if the input FastQ files are not formatted correctly. The second error in particular looks as if the sequence runs directly into the ID of the next read, without any quality scores.
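
To see what the file actually contains where FastQC gives up, one can print the surrounding lines of the decompressed file. The following is a minimal sketch. show_context is a hypothetical helper, and it assumes that the number after "at" in FastQC's message is roughly a 1-based line number in the decompressed file. Both reported values (885703 and 2218703) leave a remainder of 3 when divided by 4, so they would fall on the '+' line of a four-line record. That is consistent with this reading but does not confirm it.

```bash
#!/usr/bin/env bash
# Print decompressed FASTQ lines around a position reported by FastQC,
# each prefixed with its line number, so a truncated or merged record is visible.
# Assumption: FastQC's "at N" is (roughly) a 1-based line number.
show_context() {
    local fastq_gz=$1     # gzipped FASTQ file
    local line=$2         # position from "didn't start with '+' at N"
    local pad=${3:-8}     # lines of context on either side (default 8)
    local start=$(( line > pad ? line - pad : 1 ))
    local end=$(( line + pad ))
    # awk stops reading once it is past the window; gzip may then exit on
    # SIGPIPE, which does not affect the result because pipefail is not set here.
    gzip -cd -- "$fastq_gz" |
        awk -v s="$start" -v e="$end" 'NR > e { exit } NR >= s { print NR "\t" $0 }'
}

# Example, using the file name and position from the error above:
# show_context SRR15731653_2.fastq.gz 885703
```

Point the function at wherever the downloaded file actually lives. The window size (third argument) can be widened if the damage spans several records.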

Since you have already downloaded the files multiple times, could it be that they were already corrupted at submission? I recommend running some additional data integrity checks first, e.g. with seqfu check or fq lint. You can then, for example, use seqkit sana to fix errors and drop malformed reads.
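
Before reaching for those tools, a quick hand-rolled pass can already locate the first record that breaks the basic four-line FASTQ layout. This is a rough sketch in bash and awk, not a substitute for seqfu or fq. check_fastq is a hypothetical helper, and it assumes one line each for header, sequence, separator and quality, so multi-line FASTQ would be reported as broken. It checks only four things: the header starts with '@', the separator starts with '+', sequence and quality have equal length, and the file does not end mid-record.

```bash
#!/usr/bin/env bash
# Rough structural check of a gzipped FASTQ file: reads records four lines
# at a time and stops at the first one that breaks the basic layout.
set -uo pipefail

check_fastq() {
    local fastq_gz=$1
    # pipefail makes the function fail if gzip itself cannot decompress the file
    gzip -cd -- "$fastq_gz" | awk '
        { rec[(NR - 1) % 4] = $0 }              # rec[0]=header ... rec[3]=quality
        NR % 4 == 0 {
            start = NR - 3                      # line number of this record header
            if (substr(rec[0], 1, 1) != "@")      bad = "header does not start with @"
            else if (substr(rec[2], 1, 1) != "+") bad = "separator line does not start with +"
            else if (length(rec[1]) != length(rec[3])) bad = "sequence and quality lengths differ"
            if (bad != "") { printf "record starting at line %d: %s\n", start, bad; exit 1 }
            n++
        }
        END {
            if (bad != "") exit 1               # error already reported above
            if (NR % 4 != 0) {
                printf "truncated: %d trailing line(s) after last full record\n", NR % 4
                exit 1
            }
            printf "OK: %d records\n", n
        }'
}

# Example: check_fastq SRR15731653_1.fastq.gz && check_fastq SRR15731653_2.fastq.gz
```

A non-zero exit status means either a structural problem was found (and printed) or the gzip stream could not be decompressed. Like FastQC, the check stops at the first error, so a clean result is needed on both mates before the files are worth re-running.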

PS: We also have an #rnaseq channel in the nf-core Slack space where you could get faster help on issues like this.
