Skip to content

Latest commit

 

History

History
58 lines (29 loc) · 2.25 KB

File metadata and controls

58 lines (29 loc) · 2.25 KB

Infographics Data visualization on Bioinformatics.

Introduction

This repository contains a simplified analysis of RNA-Seq data with R and Bioconductor. The objective is to show a workflow in a simple way that can serve as the basis for an infographic that will be made with the GRBio group.

Some graphics than you can see in the infographics are below.

Quality control & Exploration

Boxplots and other graphics help to check data distributions. Ideally, one might expect that samples tend to be more similar within groups than between groups. Distinct techniques such as PCA or Hierarchical clustering are used to check this assumption.

Boxplot of log counts

image

MDS plot

The library emojifont was used for this plot

image

Dendogram

image

Data analysis

A statistical test allows selecting significant differentially expressed genes. The volcano plot shows statistical versus biological significance.

Heatmap displays the expressions of selected genes in a grid (genes in rows & samples in columns). The color scale reflects the intensity of gene expression in each sample. On the margins, dendrograms group genes or samples based on the similarity of their gene expression pattern. This is useful for identifying genes that are commonly regulated.

Volcano plot

image

Heatmap

image

Biological interpretation

Genes are annotated in different knowledge databases by terms or categories describing their biological role. The distribution of annotations of selected genes is compared with the distribution of the same annotations in the genome. This allows determining which biological processes might be associated with our gene list. We end up linking those differentially expressed genes that are included in the most represented biological categories using a network plot (bottom left-hand figure)

Dotplot

image

CNET plot

image