Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error during NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT #1104

Closed
alexmascension opened this issue Oct 26, 2023 · 2 comments
Closed
Labels
bug Something isn't working

Comments

@alexmascension
Copy link

Description of the bug

I am running the nfcore/rnaseq pipeline with human samples to do a next step in the recognition of bacterial reads. In this analysis I include several human samples, and two control samples with only bacterial reads. The aim of this step is (in part) to use the unmapped reads for further processing. The problem comes with the controls with bacterial reads that, unexpectedly, have only a few human reads, and I think this step conflicts with quantification by salmon. The error I receive is the following:

ERROR ~ Error executing process > 'NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT (BLACTIS)'

Caused by:
  Process `NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT (BLACTIS)` terminated with an error exit status (1)

Command executed:

  salmon quant \
      --geneMap Homo_sapiens.GRCh38.109.gtf \
      --threads 6 \
      --libType=A \
      --index salmon \
      -1 BLACTIS.subsampled_R1.fastq.gz -2 BLACTIS.subsampled_R2.fastq.gz \
      --skipQuant \
      -o BLACTIS
  
  if [ -f BLACTIS/aux_info/meta_info.json ]; then
      cp BLACTIS/aux_info/meta_info.json "BLACTIS_meta_info.json"
  fi
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT":
      salmon: $(echo $(salmon --version) | sed -e "s/salmon //g")
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  -----------------------------------------
  | Loading contig table | Time = 7.4997 s
  -----------------------------------------
  size = 37303049
  -----------------------------------------
  | Loading contig offsets | Time = 68.5 ms
  -----------------------------------------
  -----------------------------------------
  | Loading reference lengths | Time = 408.96 us
  -----------------------------------------
  -----------------------------------------
  | Loading mphf table | Time = 644.68 ms
  -----------------------------------------
  size = 3784757355
  Number of ones: 37303048
  Number of ones per inventory item: 512
  Inventory entries filled: 72858
  -----------------------------------------
  | Loading contig boundaries | Time = 2.8887 s
  -----------------------------------------
  size = 3784757355
  -----------------------------------------
  | Loading sequence | Time = 459.83 ms
  -----------------------------------------
  size = 2665665915
  -----------------------------------------
  | Loading positions | Time = 4.9999 s
  -----------------------------------------
  size = 3536480116
  -----------------------------------------
  | Loading reference sequence | Time = 549.67 ms
  -----------------------------------------
  -----------------------------------------
  | Loading reference accumulative lengths | Time = 1.0233 ms
  -----------------------------------------
  [2023-10-25 14:40:38.647] [jointLog] [info] done
  [2023-10-25 14:40:38.700] [jointLog] [info] Index contained 252334 targets
  [2023-10-25 14:40:38.745] [jointLog] [info] Number of decoys : 194
  [2023-10-25 14:40:38.745] [jointLog] [info] First decoy index : 252099 

Additionally, it states

hits: 27, hits per frag:  2.74969e-05[2023-10-25 14:40:46.365] [jointLog] [warning] salmon was only able to assign 8 fragments to transcripts in the index, but the minimum number of required assigned fragments (--minAssignedFrags) was 10. This could be indicative of a mismatch between the reference and sample, or a very bad sample.  You can change the --minAssignedFrags parameter to force salmon to quantify with fewer assigned fragments (must have at least 1).

Work dir:
  /data/Proyectos/EVs/work/93/093cbb3b93b8a604fe15259850ed46

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

I don't really know what can I do, aside from removing the bacterial controls. I tried --aligner star_rsem, --skip_qualimap and --skip_pseudo_aligment and the error is the same. Can I run the pipeline without running salmon, or something like that?

Command used and terminal output

nextflow run \
    nf-core/rnaseq \
    -r 3.12.0 \
    -profile docker \
    --input $CWD/samples_rnaseq.csv \
    --outdir $RESULTS_RNASEQ \
    --aligner star_salmon \
    --skip_bbsplit \
    --star_index $DATABASE_DIR/genome/index/star \
    --salmon_index $DATABASE_DIR/genome/index/salmon \
    --extra_salmon_quant_args "--minAssignedFrags 1" \
    --rsem_index $DATABASE_DIR/genome/rsem \
    --save_unaligned \
    --skip_qualimap \
    --skip_pseudo_alignment \
    --max_cpus $NUM_CPUS \
    --max_memory $NUM_RAM \
    --max_time $MAX_TIME \
    --fasta $DATABASE_DIR/genome/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa \
    --gtf $DATABASE_DIR/genome/Homo_sapiens.GRCh38.$VERSION_ENSEMBLE_GENOME.gtf \
    --save_reference

Relevant files

nextflow.log

System information

Nextflow version: 23.04.1.5866
Hardware: Desktop
Executor: local
Container engine: Docker
OS: Ubuntu 22.04
Version of nf-core/rnaseq: v3.12.0

@alexmascension alexmascension added the bug Something isn't working label Oct 26, 2023
@MatthiasZepper
Copy link
Member

MatthiasZepper commented Oct 26, 2023

The reason why your attempts to override --minAssignedFrags 1 via extra_salmon_quant_args fail is, that the pipeline stops much earlier, namely during the infer strandedness step.

To skip that step, you just have to provide the strandedness of your samples upfront in the samplesheet (forward or reverse instead of auto).

If the strandedness is unknown to you and you must run it, then provide a custom config with a process selector

    withName: '.*:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT' {
        ext.args   = '--skipQuant --minAssignedFrags 1'
        }

Apart from this problem specifically, I wonder if the dualrnaseq pipeline wouldn't be more applicable to the research question you are seemingly working on?

@alexmascension
Copy link
Author

Hi @MatthiasZepper , so far adding strandness seems to be doing the trick. I will keep you informed if anything happens.
Regarding the dualrnaseq, seems like a good option, although I need to do a more specific analysis and does not quite fit.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants