sidle - Plugin error from feature-classifier #778

Juke34 · 2024-09-26T14:11:18Z

Description of the bug

Dear authors,

We were trying to use the ampliseq pipeline in sidle mode but get stuck at the NFCORE_AMPLISEQ:AMPLISEQ:SIDLE_WF:SIDLE_DBEXTRACT step.

Looking at the .command.log it is written

QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Plugin error from feature-classifier:

  No matches found

Debug info has been saved to /tmp/qiime2-q2cli-err-4h0f7f9v.log

And when we look at the qiime2-q2cli-err-4h0f7f9v.log log file we have this message:

 Traceback (most recent call last):
  File "/opt/conda/envs/sidle-0.1.0-beta/lib/python3.8/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "<decorator-gen-119>", line 2, in extract_reads
  File "/opt/conda/envs/sidle-0.1.0-beta/lib/python3.8/site-packages/qiime2/sdk/action.py", line 244, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/opt/conda/envs/sidle-0.1.0-beta/lib/python3.8/site-packages/qiime2/sdk/action.py", line 390, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/opt/conda/envs/sidle-0.1.0-beta/lib/python3.8/site-packages/q2_feature_classifier/_cutter.py", line 215, in extract_reads
    raise RuntimeError("No matches found")
RuntimeError: No matches found

Finally if we look at the line 215 of _cutter.py we can see the python code related to the extract_reads function:

def extract_reads(sequences: DNASequencesDirectoryFormat, f_primer: str,
                  r_primer: str, trim_right: int = 0,
                  trunc_len: int = 0, trim_left: int = 0,
                  identity: float = 0.8, min_length: int = 50,
                  max_length: int = 0, n_jobs: int = 1,
                  batch_size: int = 'auto', read_orientation: str = 'both') \
                  -> DNAFASTAFormat:
    """Extract the read selected by a primer or primer pair. Only sequences
    which match the primers at greater than the specified identity are returned

    Parameters
    ----------
    sequences : DNASequencesDirectoryFormat
        An aligned list of skbio.sequence.DNA query sequences
    f_primer : skbio.sequence.DNA
        Forward primer sequence
    r_primer : skbio.sequence.DNA
        Reverse primer sequence
    trim_right : int, optional
        `trim_right` nucleotides are removed from the 3' end if trim_right is
        positive. Applied before trunc_len.
    trunc_len : int, optional
        Read is cut to trunc_len if trunc_len is positive. Applied after
        trim_right.
    trim_left : int, optional
        `trim_left` nucleotides are removed from the 5' end if trim_left is
        positive. Applied after trim_right and trunc_len.
    identity : float, optional
        Minimum combined primer match identity threshold. Default: 0.8
    min_length: int, optional
        Minimum amplicon length. Shorter amplicons are discarded. Default: 50
    max_length: int, optional
        Maximum amplicon length. Longer amplicons are discarded.
    n_jobs: int, optional
        Number of seperate processes to break the task into.
    batch_size: int, optional
        Number of samples to be processed in one batch.
    read_orientation: str, optional
        'Orientation of primers relative to the sequences: "forward" searches '
        'for primer hits in the forward direction, "reverse" searches the '
        'reverse-complement, and "both" searches both directions.'
    Returns
    -------
    q2_types.DNAFASTAFormat
        containing the reads
    """
    if min_length > trunc_len - (trim_left + trim_right) and trunc_len > 0:
        raise ValueError('The minimum length setting is greater than the '
                         'length of the truncated sequences. This will cause '
                         'all sequences to be removed from the dataset. To '
                         'proceed, set '
                         'min_length ≤ trunc_len - (trim_left  + '
                         'trim_right).')

    n_jobs = effective_n_jobs(n_jobs)
    if batch_size == 'auto':
        batch_size = _autotune_reads_per_batch(
            sequences.file.view(DNAFASTAFormat), n_jobs)
    sequences = sequences.file.view(DNAIterator)
    ff = DNAFASTAFormat()
    with open(str(ff), 'a') as fh:
        with Parallel(n_jobs) as parallel:
            for chunk in _chunks(sequences, batch_size):
                amplicons = parallel(delayed(_gen_reads)(sequence, f_primer,
                                                         r_primer,
                                                         trim_right,
                                                         trunc_len,
                                                         trim_left,
                                                         identity,
                                                         min_length,
                                                         max_length,
                                                         read_orientation)
                                     for sequence in chunk)
                for amplicon in amplicons:
                    if amplicon is not None:
                        skbio.write(amplicon, format='fasta', into=fh)
    if os.stat(str(ff)).st_size == 0:
        raise RuntimeError("No matches found")
    return ff

Do you think the error is due to parameters sent to q2-sidle or it is related to a bug in q2-sidle?
In the second case I guess we might be luckier asking directly to the q2-sidle developer repostory.
Any help is very much appreciated. Thx

Command used and terminal output

nextflow run nf-core/ampliseq -r 2.11.0 -profile singularity -c local.config -resume --input_folder test_data/ --multiregion ITS_2.tsv --sidle_ref_taxonomy silva=128 --outdir /scratch/lasica//myGoodResult --skip_dada_taxonomy


ERROR ~ Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:SIDLE_WF:SIDLE_DBEXTRACT (ITS1,500)'

Caused by:
  Process `NFCORE_AMPLISEQ:AMPLISEQ:SIDLE_WF:SIDLE_DBEXTRACT (ITS1,500)` terminated with an error exit status (1)

Command executed:

  # https://q2-sidle.readthedocs.io/en/latest/database_preparation.html#prepare-a-regional-database-for-each-primer-set
  export XDG_CONFIG_HOME="./xdgconfig"
  export MPLCONFIGDIR="./mplconfigdir"
  export NUMBA_CACHE_DIR="./numbacache"

  #extract sequences
  qiime feature-classifier extract-reads \
      --p-n-jobs 6 \
      --i-sequences db_filtered_sequences.qza \
      --p-identity 2 \
      --p-f-primer TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNCGCGCATGGTGGATTCACAATCC \
      --p-r-primer GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNGTTATGCATGAACGTAATGCT \
      --o-reads db_ITS1.qza

  #prepare to be used in alignment
  qiime sidle prepare-extracted-region \
      --p-n-workers 6 \
      --i-sequences db_ITS1.qza \
      --p-region "ITS1" \
      --p-fwd-primer TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNCGCGCATGGTGGATTCACAATCC \
      --p-rev-primer GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNGTTATGCATGAACGTAATGCT \
      --p-trim-length 500 \
      --o-collapsed-kmers db_ITS1_500_kmers.qza \
      --o-kmer-map db_ITS1_500_map.qza

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_AMPLISEQ:AMPLISEQ:SIDLE_WF:SIDLE_DBEXTRACT":
      qiime2: $( qiime --version | sed '1!d;s/.* //' )
      qiime2 plugin sidle: $( qiime sidle --version | sed 's/ (.*//' | sed 's/.*version //' )
      q2-sidle: $( qiime sidle --version | sed 's/.*version //' | sed 's/)//' )
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
  Plugin error from feature-classifier:

    No matches found

  Debug info has been saved to NFCORE_AMPLISEQ:AMPLISEQ:SIDLE_WF:SIDLE_DBEXTRACT

Work dir:
  /scratch/lasica/work/91/ed67dbca90b0e822d062c326c3aeca

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details

Relevant files

No response

System information

No response

The text was updated successfully, but these errors were encountered:

d4straub · 2024-09-27T06:51:49Z

Hi there,
thanks for the detailed report!

Do you think the error is due to parameters sent to q2-sidle or it is related to a bug in q2-sidle?

It seems to me that the chosen primer pairs do not match any sequence in the chosen database. The silva database that you chose by --sidle_ref_taxonomy silva=128 contains 16S sequences, however the parameter --multiregion ITS_2.tsv indicates that you used ITS primers, that seems incompatible and the error message actually seems to make sense.
The only other readily database is GreenGenes2, which is also for 16S, afaik, so you would need to use --sidle_ref_tax_custom which requires appropriate files (maybe easiest to ask is SIDLE devs have those available).

Juke34 · 2024-09-27T15:47:10Z

Thank you for your prompt and wise answer.
You are right we are not using 16S sequences. The provided DB are not compatible for what we want to achieve. We will investigate the --sidle_ref_tax_custom and contact the SIDLE devs.
Thank again.

cheers

Juke34 added the bug Something isn't working label Sep 26, 2024

Juke34 closed this as completed Sep 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sidle - Plugin error from feature-classifier #778

sidle - Plugin error from feature-classifier #778

Juke34 commented Sep 26, 2024

d4straub commented Sep 27, 2024

Juke34 commented Sep 27, 2024

sidle - Plugin error from feature-classifier #778

sidle - Plugin error from feature-classifier #778

Comments

Juke34 commented Sep 26, 2024

Description of the bug

Command used and terminal output

Relevant files

System information

d4straub commented Sep 27, 2024

Juke34 commented Sep 27, 2024