
Detailed outline for using the individual functions of cellSight on available data:

Ranojoy edited this page Oct 17, 2024 · 28 revisions

In this vignette, we walk through the detailed steps of using the individual functions available in cellSight. The data used in this example are taken from the NCBI Gene Expression Omnibus (GEO), accession GSE130973. Five samples were examined in this analysis: two from young subjects (y1 = 25 years, y2 = 27 years) and three from older subjects (o1 = 53 years, o2 = 70 years, o3 = 69 years), with the samples in each age group serving as replicates. The study set out to show that fibroblasts play a crucial role in maintaining the structure and function of human skin, with distinct variations across dermal layers. Despite the diverse functions of fibroblasts, a comprehensive analysis of these variations is lacking.

Install and load packages

Install the required Bioconductor packages from the R console before installing cellSight:

# Install BiocManager (if missing) and the required Bioconductor packages
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(c("DelayedMatrixStats", "glmGamPoi", "metap", "multtest"))

To install cellSight using devtools:

# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install cellSight from GitHub
devtools::install_github("omicsEye/cellSight")
# Load cellSight
library(cellSight)

Loading the data from NCBI GEO:

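# Install the required packages if they are missing, then load them
for (pkg in c("R.utils", "Seurat", "httr")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}
library(R.utils)
library(Seurat)
library(httr)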
url <- "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE130973&format=file&file=GSE130973%5Fseurat%5Fanalysis%5Flyko%2Erds%2Egz"
filename <- "GSE130973_seurat_analysis_lyko.rds.gz"

# Define the destination directory (change this to a folder on your machine)
destination_dir <- "C:/Users/ranoj/Desktop/test"
dir.create(destination_dir, showWarnings = FALSE, recursive = TRUE)

# Create the destination path
destination_path <- file.path(destination_dir, filename)

# Download the file and stop if the request failed
response <- GET(url, write_disk(destination_path, overwrite = TRUE))
stop_for_status(response)

# Unzip the file; gunzip() deletes the compressed .gz file by default (remove = TRUE)
gunzip(destination_path, overwrite = TRUE)

# Define the path to the unzipped RDS file
unzipped_file <- sub("\\.gz$", "", destination_path)

# Read the RDS file
seurat_data <- readRDS(unzipped_file)

# Now, 'seurat_data' contains the contents of the unzipped RDS file
# You can work with the Seurat object or data as needed


# Update the downloaded object to the structure expected by the installed Seurat version
integrated_seurat_object <- UpdateSeuratObject(seurat_data)

# Split the integrated Seurat object once into one object per subject ("subj" metadata column);
# the result is a list named by subject ID
individual_seurat_objects <- SplitObject(integrated_seurat_object, split.by = "subj")

# Set the identity of every cell in each per-subject object to a single class
for (subject in names(individual_seurat_objects)) {
  Idents(individual_seurat_objects[[subject]]) <- "SeuratProject"
}

# Access individual Seurat objects by sample ID
# For example, individual_seurat_objects[["Sample1"]], individual_seurat_objects[["Sample2"]], etc.


Running the individual functions in cellSight:

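library(cellSight)
# Generate the QC plots for each sample; the second argument is the folder where the output is written
qc_output <- qc_plots(individual_seurat_objects, "C:/Users/ranoj/Desktop/second_example/")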


Quality Control (QC) plots in single-cell analysis: QC plots are essential for assessing the quality of your scRNA-seq experiment and identifying potential issues that may affect downstream analyses.

Overview of QC plots: cellSight offers several QC plots to help you evaluate different aspects of your single-cell dataset:

  1. Gene Expression Metrics: violin plots display the distribution of gene expression across cells, aiding in the identification of highly variable genes; box plots show the spread and central tendency of gene expression, highlighting potential outliers.
  2. Cell-Level Metrics: scatter plots visualize relationships between important metrics, such as the number of detected genes and the total counts per cell; feature plots display the expression levels of specific genes across all cells, helping to identify potential outliers.
  3. Mitochondrial Content: mitochondrial content plots show the percentage of counts coming from mitochondrial genes in each cell, flagging potentially stressed or low-quality cells.
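
The metrics behind these plots are simple per-cell summaries of the count matrix. Below is a minimal base-R sketch on a small simulated matrix; the data, the gene names and the "MT-" prefix used to flag human mitochondrial genes are assumptions for illustration, and this is not cellSight's implementation.

```r
# Simulated counts: 50 genes (rows) x 20 cells (columns); purely illustrative data
set.seed(42)
gene_names <- c(paste0("MT-", 1:5), paste0("GENE", 1:45))  # assumed naming: "MT-" marks mitochondrial genes
counts <- matrix(rpois(50 * 20, lambda = 2), nrow = 50,
                 dimnames = list(gene_names, paste0("cell", 1:20)))

# Per-cell QC metrics
mt_genes <- grepl("^MT-", rownames(counts))
qc <- data.frame(
  nCount     = colSums(counts),                 # total counts per cell
  nFeature   = colSums(counts > 0),             # number of genes detected per cell
  percent_mt = 100 * colSums(counts[mt_genes, , drop = FALSE]) / colSums(counts)  # % mitochondrial counts
)

# Violin or box plots summarise the distribution of each column, e.g. boxplot(qc$percent_mt);
# scatter plots relate two metrics, e.g. plot(qc$nCount, qc$nFeature)
summary(qc)
```

In a Seurat object, the first two metrics are stored as nCount_RNA and nFeature_RNA, and the mitochondrial percentage is commonly added with PercentageFeatureSet(object, pattern = "^MT-").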

Quality control output


filtered_data <- filtering(individual_seurat_objects, "C:/Users/ranoj/Desktop/second_example/")


The filtering step typically involves the following criteria:

Cell Quality Metrics: Cells may be filtered based on metrics such as the total number of genes detected per cell, the total counts per cell, and the percentage of counts coming from mitochondrial genes. Cells with unusually high or low values for these metrics may indicate poor quality or technical issues.

Gene Expression Thresholds: Cells expressing an insufficient number of genes may be excluded. This helps remove potential ambient RNA contamination or low-quality cells with minimal transcriptional activity.

Mitochondrial Gene Content: A high percentage of counts from mitochondrial genes may indicate that a cell is stressed or damaged. Filtering out such cells improves the overall quality of the dataset.
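
The criteria above amount to boolean masks over per-cell metrics. Here is a minimal base-R sketch, assuming a data frame of per-cell metrics and thresholds borrowed from the widely used Seurat PBMC tutorial (more than 200 and fewer than 2,500 detected genes, under 5% mitochondrial counts). These cut-offs are illustrative assumptions, not cellSight's defaults, and should be tuned for each dataset.

```r
# Hypothetical per-cell metrics for six cells
qc <- data.frame(
  nFeature   = c(150, 900, 1800, 3200, 1200, 2100),
  percent_mt = c(2.0, 3.5, 12.0, 4.0, 1.0, 4.9),
  row.names  = paste0("cell", 1:6)
)

# Illustrative thresholds (assumed; dataset-specific in practice)
min_genes  <- 200   # too few genes: empty droplet, ambient RNA or low-quality cell
max_genes  <- 2500  # unusually many genes: possible doublet
max_mt_pct <- 5     # high mitochondrial fraction: stressed or damaged cell

# Keep only cells that pass all three criteria
keep <- qc$nFeature > min_genes &
        qc$nFeature < max_genes &
        qc$percent_mt < max_mt_pct
filtered_qc <- qc[keep, ]
rownames(filtered_qc)
```

With a Seurat object, the same kind of filter is usually written as subset(object, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5), assuming a percent.mt column has been added beforehand.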

sctransformed_data <- sctransform_integration(filtered_data, "C:/Users/ranoj/Desktop/second_example/")


The function above performs two important tasks: normalization with sctransform and integration of the samples. In single-cell RNA sequencing (scRNA-seq), these processing steps are central to making sense of complex cellular transcriptomes. SCTransform addresses challenges such as high dropout rates and low counts in scRNA-seq datasets, while integration harmonizes datasets from different samples or experimental conditions. Together, these steps improve the precision and interpretability of scRNA-seq analyses.

  1. SCTransform: sctransform is a method for normalizing and transforming single-cell RNA-seq data. It is particularly useful for addressing challenges such as the high dropout rates and low counts inherent in single-cell datasets. It models the counts of each gene with a regularized negative binomial regression on sequencing depth and uses the resulting Pearson residuals as normalized values, which stabilizes the variance across expression levels and makes the data more amenable to downstream analyses such as clustering and differential expression. Applying sctransform to a dataset therefore yields transformed expression values that are better suited to statistical analysis and visualization.

  2. Integration: Integration, in the context of single-cell RNA-seq data, refers to the process of combining or aligning multiple datasets to enable joint analysis. This is often necessary when dealing with data from different batches, experiments, or conditions. Integration methods aim to reduce batch effects and allow for a more accurate comparison of cells across datasets. The goal is to harmonize datasets, making them comparable and facilitating the identification of shared biological signals.
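
The variance-stabilizing idea behind sctransform can be illustrated with Pearson residuals from a negative binomial model. The base-R sketch below uses a simplified "analytic" variant: expected counts are gene total × cell total / grand total, the overdispersion θ is fixed at 100, and residuals are clipped at ±√(number of cells). This is an approximation under those stated assumptions, not the regularized regression that sctransform itself fits, and not cellSight's code.

```r
# Analytic Pearson residuals under a negative binomial model (simplified sctransform idea)
pearson_residuals <- function(counts, theta = 100) {
  counts <- counts[rowSums(counts) > 0, , drop = FALSE]        # drop genes with no counts
  mu <- outer(rowSums(counts), colSums(counts)) / sum(counts)  # expected counts (cells assumed non-empty)
  res <- (counts - mu) / sqrt(mu + mu^2 / theta)               # NB variance: mu + mu^2 / theta
  clip <- sqrt(ncol(counts))
  pmin(pmax(res, -clip), clip)                                 # clip extreme residuals
}

# Simulated data: 30 genes x 40 cells; the last 20 cells are sequenced 4x deeper
set.seed(7)
depth  <- rep(c(1, 4), each = 20)
counts <- matrix(rpois(30 * 40, lambda = 3 * rep(depth, each = 30)), nrow = 30)
res_mat <- pearson_residuals(counts)

# Spread of deep vs shallow cells, before and after the transformation
ratios <- c(raw      = sd(counts[, 21:40]) / sd(counts[, 1:20]),
            residual = sd(res_mat[, 21:40]) / sd(res_mat[, 1:20]))
ratios
```

The raw counts of the deeper-sequenced cells are considerably more variable than those of the shallow cells, whereas the residuals have a similar spread in both groups, which is the variance stabilization described above. On real data, Seurat's SCTransform() function runs the full sctransform model.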

# Assuming 'sctransformed_data' is the integrated Seurat object returned above
# Copy the "age" and "subj" metadata columns into "sample" and "type", which are used to create the plots
sctransformed_data$sample <- sctransformed_data$age
sctransformed_data$type <- sctransformed_data$subj

# Run the function that finds the clusters present in the data
pca_clusters <- pca_clustering(sctransformed_data, "C:/Users/ranoj/Desktop/second_example/")


Clustering output


Tweedieverse is an emerging toolkit for differential expression (DE) analysis that offers a robust way of handling the complexities of gene expression data. Unlike traditional methods that assume normality, Tweedieverse models expression with the Tweedie distribution, which is well suited to the zero-inflated and over-dispersed count data common in transcriptomics. For power parameters between 1 and 2, this distribution combines a discrete component (a point mass at zero) with a continuous component for positive values, which gives a more accurate estimate of gene expression variability. This improves the identification of differentially expressed genes, especially in highly sparse datasets. The flexibility of Tweedieverse also makes it suitable for many types of omics data, supporting comprehensive and nuanced biological interpretation.
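
The mix of exact zeros and continuous positive values comes from the fact that, for a power parameter 1 < p < 2, a Tweedie variable is a compound Poisson-gamma: a Poisson-distributed number of gamma-distributed amounts, so it is exactly zero whenever that number is zero. The base-R simulation below illustrates this structure with made-up parameters (mean mu, dispersion phi, power p); it is a sketch of the distribution, not Tweedieverse code. Fitting Tweedie regression models in R is typically done with a dedicated GLM family, for example tweedie() from the statmod package used with glm().

```r
# Simulate Tweedie(mu, phi, p) values with 1 < p < 2 as a compound Poisson-gamma sum
rtweedie_cpg <- function(n, mu, phi, p) {
  stopifnot(p > 1, p < 2, mu > 0, phi > 0)
  lambda <- mu^(2 - p) / (phi * (2 - p))   # Poisson rate for the number of gamma terms
  shape  <- (2 - p) / (p - 1)              # gamma shape of each term
  scale  <- phi * (p - 1) * mu^(p - 1)     # gamma scale of each term
  n_terms <- rpois(n, lambda)
  y <- numeric(n)                          # exactly zero when no gamma terms are drawn
  pos <- n_terms > 0
  # A sum of k iid Gamma(shape, scale) variables is Gamma(k * shape, scale)
  y[pos] <- rgamma(sum(pos), shape = n_terms[pos] * shape, scale = scale)
  y
}

set.seed(123)
mu <- 2; phi <- 1.5; p <- 1.5
y <- rtweedie_cpg(10000, mu, phi, p)

# Zeros occur with probability exp(-lambda); the mean is mu and the variance phi * mu^p
c(observed_zero_fraction = mean(y == 0),
  expected_zero_fraction = exp(-mu^(2 - p) / (phi * (2 - p))),
  sample_mean = mean(y),
  sample_var  = var(y),
  tweedie_var = phi * mu^p)
```

Moving p toward 1 makes the distribution behave like a (scaled) Poisson, and moving it toward 2 like a gamma; values in between give the combination of exact zeros and positive continuous values that suits sparse expression data.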

# Following the results from the paper, name the clusters based on their marker genes
Idents(pca_clusters) <- pca_clusters$integrated_snn_res.0.6
# new.cluster.ids must contain one name per cluster, in the order of levels(pca_clusters)
new.cluster.ids <- c("Sec-Pap","Mac/DC1","VascEC1","Kerat_diff1","Kerat_diff2","T-cells","Pro-Inf","Kerat_diff3",
                     "Pericytes1","Mesc","SR","Entro","Mac/DC2","Pericytes2","LymphEC","Mac/DC3","Melanocytes")
names(new.cluster.ids) <- levels(pca_clusters)
pca_clusters <- RenameIdents(pca_clusters, new.cluster.ids)
## Add the identity of each cluster to the object as metadata
pca_clusters$Celltype <- Idents(pca_clusters)
## Run Tweedieverse to find the differentially expressed (DE) genes in each cell type.
## Here the metadata variable is "age", which divides the samples into two groups (young and old),
## so Tweedieverse finds the DE genes between the two age groups within each cell type.
tweedie <- de_analysis(pca_clusters, "C:/Users/conference/Desktop/compbio/", imp_var = "age")


Cell communication, or cell signaling, is the process by which cells interact with each other to coordinate a wide range of biological activities, from growth and development to immune responses and tissue repair. This communication occurs through signaling molecules such as proteins, hormones, and neurotransmitters that are secreted by one cell and received by another via receptors on the cell surface. These signals can trigger various responses inside the recipient cell, leading to changes in gene expression, metabolic activity, or behavior. Cell communication is critical for maintaining homeostasis in multicellular organisms, ensuring that different cell types work together harmoniously. Disruptions in these signaling pathways can lead to diseases such as cancer, autoimmune disorders, and developmental abnormalities. Understanding cell communication provides insights into how complex biological systems function and opens avenues for therapeutic interventions.
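
In expression data, this kind of signaling is usually inferred from ligand-receptor pairs: a sender cell type that expresses a ligand and a receiver cell type that expresses the matching receptor. The base-R sketch below scores a single hypothetical ligand-receptor pair by multiplying mean expression levels. The numbers are made up and the product score is a deliberately simple assumption; real tools, including whatever cellcomm_analysis uses internally, apply more elaborate statistics.

```r
# Hypothetical mean expression of one ligand and its receptor in three cell types
ligand_avg   <- c(Fibroblast = 2.0, Macrophage = 0.5, Endothelial = 0.1)
receptor_avg <- c(Fibroblast = 0.2, Macrophage = 1.5, Endothelial = 1.0)

# Score every sender -> receiver pair as the ligand level in the sender
# times the receptor level in the receiver (rows: senders, columns: receivers)
lr_score <- outer(ligand_avg, receptor_avg)
dimnames(lr_score) <- list(sender = names(ligand_avg), receiver = names(receptor_avg))

# Strongest predicted signaling direction for this ligand-receptor pair
idx <- arrayInd(which.max(lr_score), dim(lr_score))
best_pair <- paste(rownames(lr_score)[idx[1]], "->", colnames(lr_score)[idx[2]])
best_pair
```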

cellcomm <- cellcomm_analysis(pca_clusters, "C:/Users/conference/Desktop/compbio/", imp_var = "OLD", species = "human")


Cell communication output

1. Interaction Count Plot (count): This plot shows the number of interactions between different cell types. Each node (circle) represents a specific cell type or cell group, and its size (vertex.weight) reflects the size of that cell population (group size). The edges (lines connecting nodes) represent the interactions between cell types: the thicker the edge, the more interactions there are between the connected cell types. The plot therefore highlights how often cells of different types communicate with one another and helps identify which cell types are most actively involved in cellular communication.

2. Interaction Weight/Strength Plot (weight): This plot shows the strength, or weight, of the interactions between cell types, reflecting the intensity of the signaling between them. The nodes again represent cell types, with sizes reflecting population sizes, while the edges represent interaction strength based on the inferred molecular signaling pathways; thicker edges indicate stronger interactions. This plot shows not just how many interactions occur but how strong or influential the signaling is between cell types. Even if two cell types interact only rarely, those few interactions may carry strong signals, pointing to critical communication pathways.
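
Both plots are aggregations of the same table of inferred ligand-receptor interactions. The terminology (count, weight, vertex.weight) matches the aggregated networks of the CellChat package, which we assume, without having confirmed it from cellSight's source, is what cellcomm_analysis builds on. The base-R sketch below uses a made-up interaction table with placeholder ligand and receptor names and shows how the two matrices differ.

```r
# Hypothetical table of significant ligand-receptor interactions (placeholder gene names)
interactions <- data.frame(
  source   = c("Fibroblast", "Fibroblast", "Fibroblast", "Macrophage", "Endothelial"),
  target   = c("Macrophage", "Macrophage", "Endothelial", "Fibroblast", "Macrophage"),
  ligand   = c("LigA", "LigB", "LigC", "LigD", "LigE"),
  receptor = c("RecA", "RecB", "RecC", "RecD", "RecE"),
  prob     = c(0.10, 0.05, 0.20, 0.02, 0.08)   # hypothetical communication probability
)
cell_types <- c("Fibroblast", "Macrophage", "Endothelial")
src <- factor(interactions$source, levels = cell_types)
tgt <- factor(interactions$target, levels = cell_types)

# "count": number of interactions for each sender -> receiver pair
count_mat <- table(sender = src, receiver = tgt)

# "weight": summed interaction strength for each sender -> receiver pair
weight_mat <- tapply(interactions$prob, list(sender = src, receiver = tgt), sum, default = 0)

# Node sizes (vertex.weight): number of cells per type, from hypothetical annotations
cell_labels <- rep(cell_types, times = c(120, 80, 40))
group_size  <- table(factor(cell_labels, levels = cell_types))

count_mat
weight_mat
```

Here Fibroblast -> Macrophage has the higher count (two interactions), while Fibroblast -> Endothelial has the higher weight from a single, stronger interaction, which is exactly the distinction between the two plots. In CellChat, such matrices are drawn with netVisual_circle(), passing the group sizes through its vertex.weight argument.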
