Methods Resources

Resources that cover methods and procedures for bioinformatics, statistics, and machine learning.

Bioinformatics

Resource	Description
High Throughput Sequencing - StatQuest	Josh Starmer explains RNASeq, ChIPSeq, DESeq2, PCA, Expression Heatmaps, RPKM/FPKM/TPM and more
RNASeq RPKM vs FPKM vs TPM	Reviews different RNASeq units common in the literature
RNASeq Between Sample Normalization	Why FPKM and TPM aren't comparable between samples and why library normalization is needed
Single-Cell RNASeq Data in R	Tutorial explaining how to use Bioconductor packages to efficiently manipulate single cell data
Broad Single-Cell RNASeq Tutorial	2019 Broad workshop on how to analyze sc-RNAseq data
Computational Genomics with R	Great introduction to different areas of bioinformatics using R
RNA-Seq Differential Expression Walkthrough	Harvard Broad workshop that covers the entire DEG workflow: sample QC, count normalization, DE analysis, visualization, and functional analysis.
Microarray Data Analysis	Workshop covering microarray DE analysis for GEO data using `limma`.
FastQC Plots: Good vs Bad Plots	Tutorial describing how to interpret `FastQC` plots with examples of good and bad data
CoMutPlotter: Visualize Mutation Data	Online tool to visualize mutation heatmaps and include clinical characteristics for a sequencing cohort
RNA Aligners Strandedness Settings	Details the different strandedness settings in RSEM, Salmon, Kallisto, htseq-count, and others
RNASeq Strandedness Explanation	Explains unstranded vs stranded vs reverse stranded and how to use RSeQC to check the strandedness of FASTQ data
Common Clustering Mistakes	Discusses issues with clustering biological data
Should I remove PCR duplicates from RNA-Seq	Why we DO NOT remove PCR duplicates from RNA-Seq
Sequenza Walkthrough	Workshop covering how to use Sequenza and explaining outputs with more detail than the official Sequenza tutorial
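
As a quick companion to the RPKM/FPKM/TPM entries above, here is a minimal sketch of how the two units are usually defined. The function names and the toy counts and lengths are made up for illustration and do not come from any of the linked resources. It only shows the within-sample arithmetic; as the normalization entry notes, neither unit makes samples directly comparable on its own.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    # Divide by length in kb, then by library size in millions.
    return [c / (l / 1e3) / (total / 1e6) for c, l in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / (l / 1e3) for c, l in zip(counts, lengths_bp)]
    scale = sum(rates) / 1e6
    return [r / scale for r in rates]


# Hypothetical toy sample: three genes with made-up counts and lengths.
counts = [100, 200, 300]
lengths_bp = [1000, 2000, 500]

fpkm_values = fpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)

print("FPKM sum:", sum(fpkm_values))  # varies with the sample's composition
print("TPM sum:", sum(tpm_values))    # rescaled to one million by construction
```

In this sketch, TPM is just FPKM rescaled so the values in a sample sum to one million, which is why TPM totals match across samples while FPKM totals do not.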

Statistics

Resource	Description
StatQuest	Khan Academy style videos by Josh Starmer on distributions, parameters, linear models, machine learning, bioinformatics, and R
Statistical Tests Are Linear Models	Common statistical tests such as the t-test and ANOVA are just specialized forms of linear models
Flexible Imputation For Missing Data	Statistical aspects of missing data. Covers MCAR/MAR/MNAR assumptions, complete-case analysis, and MICE imputation methods
Modern Statistics For Modern Biology	Intro statistics for bioinformatics applications. Covers distributions, PCA, hypothesis testing, RNASeq analysis, machine learning, experimental design
Biostatistics For Biomedical Research	Intro biostatistics, regression models, randomized controlled trials (RCTs), observational studies, diagnosis, statistical pitfalls, high dimensional modeling
Statistical Problems to Avoid	Common statistical issues to think about in the design, analysis, and publication of research
Common Statistical Misconceptions	List of statistical misconceptions with sources to disprove each misconception
Regression Modeling Strategies	Comprehensive dive into regression modeling that covers linear, logistic, and ordinal regression as well as survival analysis, with lots of case studies
Communicating Frequentist Results	Proper way to describe treatment effects according to frequentist theory
Bayesian Re-analysis of Toss Up Clinical Trial	How to interpret clinical trials that narrowly miss the p < .05 cutoff, and Bayesian methods to re-analyze such trials
Statistics Glossary	Glossary of Statistical Terms
Observed Power should be avoided	How observed power tends to overestimate a study's true power and mislead researchers
Observed Power Simulation Study	Simulation study showing problems with observed power
Criticisms of ROC curves for medical decision-making	Problems with using ROC curves to evaluate the quality of medical decision-making models
Table 1 P-values in RCTs	Baseline differences should not be tested in RCTs because, under proper randomization, any observed differences are due to chance by construction
Pseudo R^2 Explanation	Intuitive explanation of OLS R^2 that generalizes this intuition to pseudo R^2 measures for binary outcomes
GLMs Explained	Introduction to Generalized Linear Models (GLMs)
Multicollinearity and Omitted Variable Bias	Distinction between collinearity and omitted variable bias
Cox PH Diagnostics	Guide to checking Cox PH regression assumptions
Dealing with Unbalanced Datasets	Why resampling techniques should be avoided when analyzing datasets with unbalanced outcomes
Statistical Rethinking with `brms`	Uses `brms` and `tidyverse` code instead of base R
Interpreting P-Value Histogram	How to interpret results from running several statistical tests before adjusting for multiple comparisons
Correct Confidence Intervals For GLMs	Gavin Simpson's blog shows how to compute confidence intervals for GLMs that obey the constraints of the response variable
Beyond Multiple Linear Regression	Statistics textbook explaining GLMs and multi-level models
Mixed Models with R	Explanation of mixed effects models with R code
Using Mixture Models for Clustering	Tutorial using `mixtools` R package
Applied Statistics for Experimental Biology	Textbook with an in depth look at linear modeling for biological data
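
To make the "Statistical Tests Are Linear Models" entry above concrete, here is a minimal sketch showing that the equal-variance two-sample t statistic matches the slope t statistic from a simple linear regression on a 0/1 group indicator. The helper names and data are invented for illustration and are not taken from the linked resource.

```python
from math import sqrt
from statistics import mean


def pooled_t(a, b):
    """Equal-variance (Student) two-sample t statistic for mean(b) - mean(a)."""
    ma, mb = mean(a), mean(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    sp2 = ss / (len(a) + len(b) - 2)  # pooled variance
    return (mb - ma) / sqrt(sp2 * (1 / len(a) + 1 / len(b)))


def ols_slope_t(x, y):
    """t statistic for the slope in the least-squares fit y = b0 + b1 * x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope / sqrt(rss / (n - 2) / sxx)


# Hypothetical measurements for two groups (made-up numbers).
group_a = [4.1, 5.0, 4.7, 5.3]
group_b = [5.9, 6.4, 5.5, 6.8]

# Encode group membership as a dummy variable: 0 for group A, 1 for group B.
x = [0] * len(group_a) + [1] * len(group_b)
y = group_a + group_b

print("two-sample t:", pooled_t(group_a, group_b))
print("regression slope t:", ols_slope_t(x, y))
```

The two printed values agree up to floating-point error because, with a 0/1 predictor, the fitted slope is the difference in group means and the residual variance is the pooled variance.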

Machine Learning

Resource	Description
ML Resources	Tons of resources on all areas of machine learning including theory, practice, causal inference, DL, NLP, and RL

Visualization

Resource	Description
Data Visualization with ggplot2	Theory and practice of data visualization using `ggplot2`
ggplot2 Categorical Heatmaps	How to plot heatmaps showing categorical variables in `ggplot2`
Dynamite Plots Should Be Avoided	Why dynamite plots should be discarded and replaced with boxplots or violin plots
Colorspace Color Library	More color options with `colorspace` R package
