The Open Source ImmGen project (GSE122108) is a collaborative effort devoted to RNAseq profiling of ex vivo sorted mononuclear phagocytes.
GAM-clustering characterizes metabolic variability within a dataset using a novel network-based computational approach that uses cellular transcriptional profiles as proxies. The metabolic network of reactions from the KEGG database is represented as a graph whose vertices correspond to metabolites and whose edges correspond to reactions with expressed genes. Within this graph, the method searches for a set of connected subgraphs, each of which corresponds well to a particular gene expression pattern. The current analysis reveals the major metabolic features associated with different subpopulations and highlights a number of metabolic modules that are specific to individual cell types, tissues of residence, or developmental stages.
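To make the graph representation concrete, here is a minimal base R sketch of a network whose vertices are metabolites and whose edges are reactions, restricted to reactions with expressed genes. The reaction table, gene names, and the `connected_components` helper are hypothetical and are not taken from the repository scripts. The sketch only shows graph construction and connectivity, not the pattern-matching search performed by GAM-clustering.

```r
# Toy reaction table: each row is one reaction edge between two metabolites,
# annotated with the gene encoding its enzyme (all values are made up).
reactions <- data.frame(
  from = c("A", "B", "C", "D", "E"),
  to   = c("B", "C", "D", "E", "F"),
  gene = c("g1", "g2", "g3", "g4", "g5"),
  stringsAsFactors = FALSE
)
expressed <- c("g1", "g2", "g4", "g5")  # g3 is treated as not expressed

# Keep only edges whose gene is expressed.
edges <- reactions[reactions$gene %in% expressed, ]

# Label connected components with a simple breadth-first search
# (hypothetical helper, written for this illustration only).
connected_components <- function(edges) {
  vertices <- unique(c(edges$from, edges$to))
  comp <- setNames(rep(NA_integer_, length(vertices)), vertices)
  current <- 0L
  for (v in vertices) {
    if (!is.na(comp[[v]])) next
    current <- current + 1L
    queue <- v
    while (length(queue) > 0) {
      x <- queue[1]
      queue <- queue[-1]
      if (!is.na(comp[[x]])) next
      comp[[x]] <- current
      # Edges are undirected here: collect neighbours in both directions.
      neighbours <- c(edges$to[edges$from == x], edges$from[edges$to == x])
      queue <- c(queue, neighbours[is.na(comp[neighbours])])
    }
  }
  comp
}

components <- connected_components(edges)
```

Dropping the edge for the unexpressed gene `g3` splits the toy network into two connected pieces, which is the kind of subgraph the method can then compare against expression patterns.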
To explore the data, visit the following links:
Raw counts are processed by the rawDataProcessing.R script, and the output object es.top12k has the following structure:
> load("Data/337_es.top12k.Rda")
> dplyr::glimpse(Biobase::exprs(es.top12k))
## num [1:12000, 1:337] 14.7 12.7 13 12.1 13.4 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:12000] "Actb" "Cst3" "Fth1" "Eef1a1" ...
## ..$ : chr [1:337] "MF.64pLYVEpIIn.Ao.1" "MF.64pLYVEpIIn.Ao.2" "MF.64pLYVEpIIn.Ao.3" ...
> Biobase::exprs(es.top12k)[1:3,1:3]
##      MF.64pLYVEpIIn.Ao.1 MF.64pLYVEpIIn.Ao.2 MF.64pLYVEpIIn.Ao.3
## Actb            14.71612            15.00513            14.76655
## Cst3            12.67268            12.83560            12.58577
## Fth1            12.95467            13.27726            13.06159
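The exact steps of rawDataProcessing.R are not described here. As a rough illustration only, a matrix of this shape could be obtained by log-transforming normalized counts and keeping the most highly expressed genes. The counts-per-million normalization, the ranking by mean expression, and all object names below are assumptions, and the data are simulated.

```r
set.seed(1)
# Simulated raw counts: 50 genes x 6 samples (made-up data).
counts <- matrix(
  rnbinom(50 * 6, mu = 100, size = 1), nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("sample", 1:6))
)

# Assumed normalization: scale each sample to counts per million, then log2.
cpm <- t(t(counts) / colSums(counts)) * 1e6
log_expr <- log2(cpm + 1)

# Keep the n_top most highly expressed genes, ordered by mean expression.
n_top <- 12  # stands in for the 12000 genes kept in es.top12k
top <- order(rowMeans(log_expr), decreasing = TRUE)[seq_len(n_top)]
expr_top <- log_expr[top, ]
```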
The initial patterns are defined by k-means clustering of the gene expression matrix and are then refined iteratively using the network connections (modulesDeriving.R). The final output is a set of specific subnetworks (also called metabolic modules) that reflect the metabolic variability within a given transcriptional dataset. Each metabolic module is a piece of the metabolic network whose genes show a correlated expression pattern across the whole dataset.
The following graph and heatmap show the network and the expression of its constituent genes for module 5, respectively:
The averaged gene expression of all modules is shown in the following summary heatmap:
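Returning to the first step described above, the k-means initialization of patterns can be sketched in base R as follows. This is a minimal illustration on simulated data, not the code of modulesDeriving.R: the number of patterns, the gene-wise centering, and the correlation-based scoring are assumptions, and the network-based refinement is not shown.

```r
set.seed(42)
# Simulated expression: 60 genes x 10 samples, built from 3 underlying patterns.
patterns <- rbind(
  c(rep(2, 5), rep(-2, 5)),
  c(rep(-2, 5), rep(2, 5)),
  rep(c(2, -2), 5)
)
membership <- rep(1:3, each = 20)
expr <- patterns[membership, ] + matrix(rnorm(60 * 10, sd = 0.3), nrow = 60)
rownames(expr) <- paste0("gene", 1:60)

# Center each gene so that clustering reflects the shape of its profile.
expr_centered <- expr - rowMeans(expr)

# Initial patterns: k-means on genes; the cluster centers act as pattern profiles.
km <- kmeans(expr_centered, centers = 3, nstart = 10)
initial_patterns <- km$centers

# Score every gene against every pattern by Pearson correlation.
gene_pattern_cor <- cor(t(expr_centered), t(initial_patterns))
best_pattern <- max.col(gene_pattern_cor)
```

In a refinement loop, scores such as `gene_pattern_cor` could be mapped onto the reaction edges of the network so that connected, well-correlated subgraphs can be sought for each pattern.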
Functional annotation of the obtained modules is based on KEGG and Reactome canonical pathways (modulesAnnotation.R). The following example shows the annotation of module 5 (k is the number of module genes in a particular pathway, K is the total number of genes in that pathway):
> paths <- data.table::fread("Data/m.5.pathways_mod.tsv")
> paths[1:3,]
##           PATHID         pval  k   K         padj                 PATHNAME                               genes
## 1:  R-MMU-191273 1.009604e-48 17  24 1.237774e-45 Cholesterol biosynthesis Hmgcs1 Hmgcr Msmo1 Cyp51 Mvd ...
## 2: R-MMU-8957322 9.801101e-39 17  67 6.008075e-36   Metabolism of steroids Hmgcs1 Hmgcr Msmo1 Cyp51 Mvd ...
## 3:  R-MMU-556833 1.406903e-27 18 395 5.749543e-25     Metabolism of lipids Hmgcs1 Hmgcr Msmo1 Cyp51 Aacs ...
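The statistical test used in modulesAnnotation.R is not stated here. A common way to obtain such p-values from k and K is a hypergeometric over-representation test, sketched below. The `enrichment_pval` helper, the module size `n`, the universe size `N`, and the Benjamini-Hochberg correction are assumptions for illustration, so the numbers will not reproduce the table above.

```r
# Hypergeometric enrichment p-value: probability of observing at least k pathway
# genes among n module genes drawn from a universe of N genes, of which K
# belong to the pathway (hypothetical helper).
enrichment_pval <- function(k, K, n, N) {
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# k and K are taken from the first row of the table above; n (module size)
# and N (universe size) are made-up values for illustration only.
p <- enrichment_pval(k = 17, K = 24, n = 30, N = 12000)

# Multiple-testing correction across pathways; the other two p-values are
# hypothetical, and the choice of the BH method is an assumption.
padj <- p.adjust(c(p, 0.01, 0.2), method = "BH")
```

A small p-value means that the overlap between the module and the pathway is larger than expected if the module genes were drawn at random from the universe.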