perf: improve spia script and report #83

Merged
merged 9 commits into from
Jan 31, 2024
16 changes: 14 additions & 2 deletions workflow/report/spia.rst
Original file line number Diff line number Diff line change
@@ -1,5 +1,17 @@
**Pathway enrichment** performed with SPIA, using the model ``{{ snakemake.config["diffexp"]["models"][snakemake.wildcards.model]["full"] }}``.

The table contains the following columns (also see the `SPIA docs <https://rdrr.io/bioc/SPIA/man/spia.html>`_):
``pSize`` is the number of genes on the pathway; ``NDE`` is the number of DE genes per pathway; ``tA`` is the observed total perturbation accumulation in the pathway; ``pNDE`` is the probability to observe at least NDE genes on the pathway using a hypergeometric model; ``pPERT`` is the probability to observe a total accumulation more extreme than tA only by chance; ``pG`` is the p-value obtained by combining pNDE and pPERT; ``pGFdr`` and ``pGFWER`` are the False Discovery Rate and respectively Bonferroni adjusted global p-values; and the ``Status`` gives the direction in which the pathway is perturbed (``activated`` or ``inhibited``). KEGGLINK gives a web link to the KEGG website that displays the pathway image with the differentially expressed genes highlighted in red.
The table contains the following columns, which have been renamed to more descriptive titles (also see the `SPIA docs <https://rdrr.io/bioc/SPIA/man/spia.html>`_; for renamed columns, the original SPIA column names are given in parentheses):
**Name** of the pathway;
**number of genes on the pathway** (``pSize``);
**number of DE genes per pathway** where DE signifies "differentially expressed" (``NDE``);
**total perturbation accumulation** (``tA``);
**Combined FDR** where FDR signifies "false discovery rate" (``pGFdr``);
**Status** of the pathway, inhibited vs. activated.

The following columns (available from the SPIA output) are hidden in this table in favour of the combined FDR as an overall assessment of the reliability of a pathway's perturbation.
You can access them per pathway by clicking on the leading ``+`` symbol of a row:
**p-value for at least NDE genes** where NDE signifies "number of differentially expressed genes" (``pNDE``);
**p-value to observe a total accumulation** (``pPERT``);
**Combined p-value** (``pG``);
**Combined Bonferroni p-values** (``pGFWER``);
**pathway id** provided by the pathway database used.
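
The renaming described in the report text can be sketched as a simple lookup table. This is a minimal illustration in Python, not the workflow's actual implementation (the renaming itself is not part of the hunks shown here); the helper `rename_columns` and the example row are hypothetical, and only the column names and titles are taken from the report text.

```python
# Original SPIA column names mapped to the descriptive titles listed in the
# report text. Columns without an entry (e.g. "Name", "Status") keep their name.
SPIA_TO_TITLE = {
    "pSize": "number of genes on the pathway",
    "NDE": "number of DE genes per pathway",
    "tA": "total perturbation accumulation",
    "pNDE": "p-value for at least NDE genes",
    "pPERT": "p-value to observe a total accumulation",
    "pG": "Combined p-value",
    "pGFdr": "Combined FDR",
    "pGFWER": "Combined Bonferroni p-values",
}


def rename_columns(row):
    """Return a copy of `row` with SPIA keys replaced by descriptive titles.

    Hypothetical helper for illustration only.
    """
    return {SPIA_TO_TITLE.get(key, key): value for key, value in row.items()}


# Made-up example row; the values carry no meaning.
example = {"Name": "some pathway", "pSize": 40, "NDE": 5, "pGFdr": 0.01, "Status": "Activated"}
renamed = rename_columns(example)
```

Keeping the mapping in one place makes it easier to keep the report text and the datavzrd column keys in sync, since both refer to the descriptive titles.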
161 changes: 24 additions & 137 deletions workflow/resources/datavzrd/spia-template.yaml
@@ -1,181 +1,68 @@
name: ?f"Pathway impact analysis for model {wildcards.model}"
name: ?f"spia pathway impact analysis for model {wildcards.model}"
datasets:
spia_table:
path: ?input.spia_table
offer-excel: true
separator: "\t"
spia_table_activated:
path: ?input.spia_table_activated
offer-excel: true
separator: "\t"
spia_table_inhibited:
path: ?input.spia_table_inhibited
offer-excel: true
separator: "\t"
default-view: spia_table
views:
spia_table:
dataset: spia_table
desc: |
The table contains the following columns pSize is the number of genes on the pathway; NDE is the number of DE genes per pathway; tA is the observed total perturbation accumulation in the pathway; pNDE is the probability to observe at least NDE genes on the pathway using a hypergeometric model; pPERT is the probability to observe a total accumulation more extreme than tA only by chance; pG is the p-value obtained by combining pNDE and pPERT; pGFdr and pGFWER are the False Discovery Rate and respectively Bonferroni adjusted global p-values; and the Status gives the direction in which the pathway is perturbed (activated or inhibited).
?f"spia pathway impact analysis for model {wildcards.model}"
page-size: 25
render-table:
columns:
Name:
display-mode: normal
link-to-url:
reactome:
url: "http://reactome.org/PathwayBrowser/#/{Ids}"
pathway:
?if params.pathway_db == "reactome":
url: "http://reactome.org/PathwayBrowser/#/{pathway id}"
?elif params.pathway_db == "panther":
url: "https://www.pantherdb.org/pathway/pathwayDiagram.jsp?catAccession={pathway id}"
# we should add all the pathway databases that bioconductor-graphite enables (see its `pathwayDatabases()` function)
?else: # not sure what a good fallback would be here
url: "http://reactome.org/PathwayBrowser/#/{pathway id}"
number of genes on the pathway:
plot:
heatmap:
scale: linear
range:
- white
- "#186904"
- "#F7F7F7"
- "#B2182B"
number of DE genes per pathway:
plot:
heatmap:
scale: linear
range:
- white
- "#186904"
- "#F7F7F7"
- "#B2182B"
p-value for at least NDE genes:
display-mode: hidden
display-mode: detail
total perturbation accumulation:
plot:
heatmap:
scale: linear
range:
- "#e6550d"
- "white"
- "#6baed6"
domain:
- -1
- 0
- 1
- "#B2182B"
- "#F7F7F7"
- "#2166AC"
domain-mid: 0
p-value to observe a total accumulation:
display-mode: hidden
display-mode: detail
Combined p-value:
display-mode: hidden
display-mode: detail
Combined FDR:
plot:
bars:
scale: linear
Combined Bonferroni p-values:
display-mode: hidden
display-mode: detail
Status:
plot:
heatmap:
scale: ordinal
color-scheme: accent
Ids:
display-mode: hidden
spia_table_activated:
dataset: spia_table_activated
desc: |
The table (sorted by "Status:Activated") contains the following columns pSize is the number of genes on the pathway; NDE is the number of DE genes per pathway; tA is the observed total perturbation accumulation in the pathway; pNDE is the probability to observe at least NDE genes on the pathway using a hypergeometric model; pPERT is the probability to observe a total accumulation more extreme than tA only by chance; pG is the p-value obtained by combining pNDE and pPERT; pGFdr and pGFWER are the False Discovery Rate and respectively Bonferroni adjusted global p-values; and the Status gives the direction in which the pathway is perturbed (activated).
page-size: 25
render-table:
columns:
Name:
display-mode: normal
link-to-url:
reactome:
url: "http://reactome.org/PathwayBrowser/#/{Ids}"
number of genes on the pathway:
plot:
heatmap:
scale: linear
range:
- white
- "#186904"
number of DE genes per pathway:
plot:
heatmap:
scale: linear
range:
- white
- "#186904"
p-value for at least NDE genes:
display-mode: hidden
total perturbation accumulation:
plot:
heatmap:
scale: linear
range:
- "#e6550d"
- "white"
- "#6baed6"
domain:
- -1
- 0
- 1
p-value to observe a total accumulation:
display-mode: hidden
Combined p-value:
display-mode: hidden
Combined FDR:
plot:
bars:
scale: linear
Combined Bonferroni p-values:
display-mode: hidden
Status:
display-mode: normal
Ids:
display-mode: hidden
spia_table_inhibited:
dataset: spia_table_inhibited
desc: |
The table (sorted by "Status:Inhibited") contains the following columns pSize is the number of genes on the pathway; NDE is the number of DE genes per pathway; tA is the observed total perturbation accumulation in the pathway; pNDE is the probability to observe at least NDE genes on the pathway using a hypergeometric model; pPERT is the probability to observe a total accumulation more extreme than tA only by chance; pG is the p-value obtained by combining pNDE and pPERT; pGFdr and pGFWER are the False Discovery Rate and respectively Bonferroni adjusted global p-values; and the Status gives the direction in which the pathway is perturbed (inhibited).
page-size: 25
render-table:
columns:
Name:
display-mode: normal
link-to-url:
reactome:
url: "http://reactome.org/PathwayBrowser/#/{Ids}"
number of genes on the pathway:
plot:
heatmap:
scale: linear
range:
- white
- "#186904"
number of DE genes per pathway:
plot:
heatmap:
scale: linear
range:
- white
- "#186904"
p-value for at least NDE genes:
display-mode: hidden
total perturbation accumulation:
plot:
heatmap:
scale: linear
range:
- "#e6550d"
- "white"
- "#6baed6"
domain:
- -1
- 0
- 1
p-value to observe a total accumulation:
display-mode: hidden
Combined p-value:
display-mode: hidden
Combined FDR:
plot:
bars:
scale: linear
Combined Bonferroni p-values:
display-mode: hidden
Status:
display-mode: normal
Ids:
display-mode: hidden
pathway id:
display-mode: detail
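
The `?if` / `?elif` / `?else` branch on `params.pathway_db` in the template selects a link target per pathway database and falls back to Reactome. The same decision can be sketched in plain Python. This is only an illustration of the logic, assuming the two URL templates from the hunk above; the function names and the example ID are hypothetical, and the substitution of `{pathway id}` is done by datavzrd in the real report, not by code like this.

```python
# URL templates copied from the template hunk. "{pathway id}" is the
# placeholder that refers to the column of that name.
URL_TEMPLATES = {
    "reactome": "http://reactome.org/PathwayBrowser/#/{pathway id}",
    "panther": "https://www.pantherdb.org/pathway/pathwayDiagram.jsp?catAccession={pathway id}",
}


def pathway_url_template(pathway_db):
    """Pick the URL template for a database, falling back to Reactome.

    The fallback mirrors the `?else` branch of the template.
    """
    return URL_TEMPLATES.get(pathway_db, URL_TEMPLATES["reactome"])


def pathway_url(pathway_db, pathway_id):
    """Fill the placeholder with a concrete ID (illustration only)."""
    # str.replace is used because the placeholder name contains a space.
    return pathway_url_template(pathway_db).replace("{pathway id}", pathway_id)


# "EXAMPLE_ID" is a made-up identifier, not a real pathway accession.
panther_link = pathway_url("panther", "EXAMPLE_ID")
fallback_link = pathway_url("some_other_db", "EXAMPLE_ID")
```

A dictionary lookup like this makes the open question from the template comment visible: any database without its own entry silently gets a Reactome link, which may not resolve for non-Reactome IDs.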
15 changes: 5 additions & 10 deletions workflow/rules/datavzrd.smk
@@ -2,12 +2,12 @@ rule render_datavzrd_config_spia:
input:
template=workflow.source_path("../resources/datavzrd/spia-template.yaml"),
spia_table="results/tables/pathways/{model}.pathways.tsv",
spia_table_activated="results/tables/pathways/{model}.activated-pathways.tsv",
spia_table_inhibited="results/tables/pathways/{model}.inhibited-pathways.tsv",
output:
"results/datavzrd/spia/{model}.yaml",
log:
"logs/yte/render-datavzrd-config-spia/{model}.log",
params:
pathway_db=config["enrichment"]["spia"]["pathway_database"],
template_engine:
"yte"
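
The new `pathway_db` param reads a nested config entry, so the rule fails at parse time if any level of `config["enrichment"]["spia"]["pathway_database"]` is missing. A minimal sketch of that lookup with a more readable error is shown below. The helper `get_pathway_db` and the example config are hypothetical and not part of this PR; only the key path is taken from the rule above.

```python
def get_pathway_db(config):
    """Resolve config["enrichment"]["spia"]["pathway_database"].

    Hypothetical helper: re-raises a KeyError that names the full key path,
    so a missing entry is easier to spot than with a bare nested lookup.
    """
    try:
        return config["enrichment"]["spia"]["pathway_database"]
    except KeyError as err:
        raise KeyError(
            f"missing config key {err} while resolving "
            "enrichment -> spia -> pathway_database"
        ) from err


# Made-up config dictionary for illustration.
example_config = {"enrichment": {"spia": {"pathway_database": "reactome"}}}
pathway_db = get_pathway_db(example_config)
```

In a Snakefile such a helper could be called as `pathway_db=get_pathway_db(config)` in the `params` section, assuming it is defined before the rule.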

@@ -53,8 +53,6 @@ rule spia_datavzrd:
config="results/datavzrd/spia/{model}.yaml",
# files required for rendering the given configs
spia_table="results/tables/pathways/{model}.pathways.tsv",
spia_table_activated="results/tables/pathways/{model}.activated-pathways.tsv",
spia_table_inhibited="results/tables/pathways/{model}.inhibited-pathways.tsv",
output:
report(
directory("results/datavzrd-reports/spia-{model}"),
@@ -67,8 +65,7 @@ rule spia_datavzrd:
log:
"logs/datavzrd-report/spia-{model}/spia-{model}.log",
wrapper:
# "v2.6.0/utils/datavzrd"
"v3.3.5-1-gd73914d/utils/datavzrd"
"v3.3.6/utils/datavzrd"


rule diffexp_datavzrd:
@@ -93,8 +90,7 @@ rule diffexp_datavzrd:
log:
"logs/datavzrd-report/diffexp.{model}/diffexp.{model}.log",
wrapper:
# "v2.6.0/utils/datavzrd"
"v3.3.5-1-gd73914d/utils/datavzrd"
"v3.3.6/utils/datavzrd"


rule go_enrichment_datavzrd:
@@ -121,5 +117,4 @@ rule go_enrichment_datavzrd:
log:
"logs/datavzrd-report/go_enrichment-{model}/go_enrichment-{model}_{gene_fdr}.go_term_fdr_{go_term_fdr}.log",
wrapper:
# "v2.6.0/utils/datavzrd"
"v3.3.5-1-gd73914d/utils/datavzrd"
"v3.3.6/utils/datavzrd"
2 changes: 0 additions & 2 deletions workflow/rules/enrichment.smk
@@ -12,8 +12,6 @@ rule spia:
spia_db="resources/spia-db.rds",
output:
table="results/tables/pathways/{model}.pathways.tsv",
table_activated="results/tables/pathways/{model}.activated-pathways.tsv",
table_inhibited="results/tables/pathways/{model}.inhibited-pathways.tsv",
plots="results/plots/pathways/{model}.spia-perturbation-plots.pdf",
params:
bioc_species_pkg=bioc_species_pkg,