Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

325 mixed model in limma #339

Open
wants to merge 7 commits into
base: dev
Choose a base branch
from
Open

Conversation

KamilMaliszArdigen
Copy link

@KamilMaliszArdigen KamilMaliszArdigen commented Nov 5, 2024

This is a limma module update to provide new logic related to mixed models

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/differentialabundance branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@KamilMaliszArdigen KamilMaliszArdigen linked an issue Nov 5, 2024 that may be closed by this pull request
@KamilMaliszArdigen KamilMaliszArdigen changed the base branch from master to dev November 5, 2024 15:04
@nf-core nf-core deleted a comment from github-actions bot Nov 5, 2024
Copy link

github-actions bot commented Nov 5, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit dc88a9b

+| ✅ 300 tests passed       |+
#| ❔   6 tests were ignored |#
!| ❗   4 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 3.0.2
  • Run at 2024-11-26 09:26:23

@KamilMaliszArdigen
Copy link
Author

Hi @pinin4fjords, This PR is updating limma module with it's latest version. I will be more than happy to provide any additional information if needed.

@KamilMaliszArdigen
Copy link
Author

@pinin4fjords I would really appreciate your input in this topic. Thank you in advance.

README.md Outdated Show resolved Hide resolved
conf/modules.config Outdated Show resolved Hide resolved
@DSchreyer
Copy link

DSchreyer commented Nov 15, 2024

I am getting this error when running a simple comparison. It seems that in FILTER_DIFFTABLE it expects a log2FoldChange column which does not exist.

This is the command:

        nextflow run output/differentialabundance/main.nf \
          --input ${ diffabundance_samplesheet } \
          --matrix ${ diffabundance_counts } \
          --contrasts ${ diffabundance_contrasts } \
          --outdir output \
          --observations_id_col sample \
          --features_id_col geneID \
          -profile rnaseq \
          --differential_use_limma 
Workflow execution completed unsuccessfully
The exit status of the task that caused the workflow execution to fail was: 1

Error executing process > 'NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:FILTER_DIFFTABLE (1)'

Caused by:
  Process `NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:FILTER_DIFFTABLE (1)` terminated with an error exit status (1)


Command executed:

  #!/usr/bin/env python
  
  from math import log2
  from os import path
  import pandas as pd
  import platform
  from sys import exit
  
  # 1. Check that the current logFC/padj is not NA
  # 2. Check that the current logFC is >= threshold (abs does not work, so use a workaround)
  # 3. Check that the current padj is <= threshold
  # If this is true, the row is written to the new file, otherwise not
  if not any("VISIT_V05_vs_V02.limma.results.tsv".endswith(ext) for ext in [".csv", ".tsv", ".txt"]):
      exit("Please provide a .csv, .tsv or .txt file!")
  
  table = pd.read_csv("VISIT_V05_vs_V02.limma.results.tsv", sep=("," if "VISIT_V05_vs_V02.limma.results.tsv".endswith(".csv") else "	"), header=0)
  logFC_threshold = log2(float("2.0"))
  table = table[~table["log2FoldChange"].isna() &
              ~table["padj"].isna() &
              (pd.to_numeric(table["log2FoldChange"], errors='coerce').abs() >= float(logFC_threshold)) &
              (pd.to_numeric(table["padj"], errors='coerce') <= float("0.05"))]
  
  table.to_csv(path.splitext(path.basename("VISIT_V05_vs_V02.limma.results.tsv"))[0]+"_filtered.tsv", sep="	", index=False)
  
  with open('versions.yml', 'a') as version_file:
      version_file.write('"NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:FILTER_DIFFTABLE":' + "\n")
      version_file.write("    pandas: " + str(pd.__version__) + "\n")

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3803, in get_loc
      return self._engine.get_loc(casted_key)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
    File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
  KeyError: 'log2FoldChange'
  
  The above exception was the direct cause of the following exception:
  
  Traceback (most recent call last):
    File ".command.sh", line 18, in <module>
      table = table[~table["log2FoldChange"].isna() &
                     ~~~~~^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 3805, in __getitem__
      indexer = self.columns.get_loc(key)
                ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
      raise KeyError(key) from err
  KeyError: 'log2FoldChange'

@KamilMaliszArdigen
Copy link
Author

KamilMaliszArdigen commented Nov 15, 2024

@DSchreyer There are differences in column names in outputs of deseq2 and limma so we created dedicated profile rnaseq_limma where this is adjusted for next steps. You can see in details what options are set here:

// Differential options

Please try to launch pipeline with following command:

        nextflow run output/differentialabundance/main.nf \
          --input ${ diffabundance_samplesheet } \
          --matrix ${ diffabundance_counts } \
          --contrasts ${ diffabundance_contrasts } \
          --outdir output \
          --observations_id_col sample \
          --features_id_col geneID \
          -profile rnaseq_limma

@DSchreyer
Copy link

@DSchreyer There are differences in column names in outputs of deseq2 and limma so we created dedicated profile rnaseq_limma where this is adjusted for next steps. You can see in details what options are set here:

// Differential options

Please try to launch pipeline with following command:

        nextflow run output/differentialabundance/main.nf \
          --input ${ diffabundance_samplesheet } \
          --matrix ${ diffabundance_counts } \
          --contrasts ${ diffabundance_contrasts } \
          --outdir output \
          --observations_id_col sample \
          --features_id_col geneID \
          -profile rnaseq_limma

@KamilMaliszArdigen Thanks, yes this helped to solve some of the issues. However, I am still getting the following error in the PLOT_EXPLORATORY process with my own dataset.

  In cond_log2_transform_matrix(matrix_data = assay_data[[index]],  :
    NaNs produced
  Fontconfig error: No writable cache directories
  Error in density.default(cond_log2_transform_matrix(plotmatrices[[pm]][,  : 
    'x' contains missing values
  Calls: ggplot_densityplot ... lapply -> FUN -> density -> density -> density.default
  Execution halted

could this come from the issue that we can have negative values in the normalised count matrix ? I also tested it with the test_rnaseq_limma profile which works as expected. However, i checked the normalised count tables from the test data and there are no negative values. Could this be an issue ?

@KamilMaliszArdigen
Copy link
Author

@DSchreyer So main difference is that in rnaseq profile we have exploratory_log2_assays = 'raw,normalised' and in rnaseq_limma this is left empty. I'm not sure what is happening in your data please take a look at the work directory and the LIMMA count table - I expect that there might be negative values present. If this is the case it will be needed to keep exploratory_log2_assays empty at least this is my understanding of the plotting logic.

@KamilMaliszArdigen
Copy link
Author

@nf-core-bot fix linting

@DSchreyer
Copy link

@DSchreyer So main difference is that in rnaseq profile we have exploratory_log2_assays = 'raw,normalised' and in rnaseq_limma this is left empty. I'm not sure what is happening in your data please take a look at the work directory and the LIMMA count table - I expect that there might be negative values present. If this is the case it will be needed to keep exploratory_log2_assays empty at least this is my understanding of the plotting logic.

Hi @KamilMaliszArdigen yes, i checked and there are negative values. When using --limma_use_voom false it skips the plotting which resolves the issue. Do you think that is a good idea or could interfere with the previous steps ?

@KamilMaliszArdigen
Copy link
Author

@DSchreyer So main difference is that in rnaseq profile we have exploratory_log2_assays = 'raw,normalised' and in rnaseq_limma this is left empty. I'm not sure what is happening in your data please take a look at the work directory and the LIMMA count table - I expect that there might be negative values present. If this is the case it will be needed to keep exploratory_log2_assays empty at least this is my understanding of the plotting logic.

Hi @KamilMaliszArdigen yes, i checked and there are negative values. When using --limma_use_voom false it skips the plotting which resolves the issue. Do you think that is a good idea or could interfere with the previous steps ?

Well this will not work as expected voom is responsible for normalisation of rna_seq data. So this will result in not normalised data processing. I recommend to set the --exploratory_log2_assays = '' and than simply during plot generation log2 will not be generated.

Copy link
Member

@pinin4fjords pinin4fjords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this go into the usage docs please?

README.md Outdated Show resolved Hide resolved
conf/modules.config Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mixed-model in Limma
4 participants