# Methods Resources

Resources covering methods and procedures for bioinformatics, statistics, machine learning, and visualization.

## Bioinformatics

| Resource | Description |
| --- | --- |
| High Throughput Sequencing - StatQuest | Josh Starmer explains RNA-Seq, ChIP-Seq, DESeq2, PCA, expression heatmaps, RPKM/FPKM/TPM, and more |
| RNASeq RPKM vs FPKM vs TPM | Reviews the RNA-Seq expression units commonly reported in the literature |
| RNASeq Between Sample Normalization | Why FPKM and TPM values aren't comparable between samples, and why between-sample (library) normalization is needed |
| Single-Cell RNASeq Data in R | Tutorial on using Bioconductor packages to efficiently manipulate single-cell data |
| Broad Single-Cell RNASeq Tutorial | 2019 Broad workshop on analyzing scRNA-seq data |
| Computational Genomics with R | Great introduction to different areas of bioinformatics using R |
| RNA-Seq Differential Expression Walkthrough | Harvard Broad workshop covering the entire DEG workflow: sample QC, count normalization, DE analysis, visualization, and functional analysis |
| Microarray Data Analysis | Workshop covering differential expression analysis of GEO microarray data with limma |
| FastQC Plots: Good vs Bad Plots | Tutorial on interpreting FastQC plots, with examples from good and bad data |
| CoMutPlotter: Visualize Mutation Data | Online tool for visualizing mutation heatmaps alongside clinical characteristics for a sequencing cohort |
| RNA Aligners Strandedness Settings | Details the strandedness settings in RSEM, Salmon, kallisto, htseq-count, and other tools |
| RNASeq Strandedness Explanation | Explains unstranded vs stranded vs reverse-stranded libraries and how to use RSeQC to infer the strandedness of sequencing data |
| Common Clustering Mistakes | Discusses common pitfalls when clustering biological data |
| Should I remove PCR duplicates from RNA-Seq? | Why PCR duplicates are generally NOT removed from RNA-Seq data |
| Sequenza Walkthrough | Workshop covering how to use Sequenza, explaining its outputs in more detail than the official Sequenza tutorial |
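The RPKM/FPKM and TPM resources above differ mainly in the order in which the two normalizations are applied. The following is a minimal Python sketch that assumes raw read counts and feature lengths in base pairs. The function names and toy numbers are hypothetical. Real pipelines often use effective transcript lengths instead of raw lengths.

```python
def rpkm(counts, lengths_bp):
    """RPKM: divide by sequencing depth (millions of reads) first, then by length in kb.

    FPKM uses the same formula but counts fragments (read pairs) instead of reads.
    """
    total_millions = sum(counts) / 1e6  # library size in millions of reads
    return [c / total_millions / (l / 1e3) for c, l in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """TPM: divide by length in kb first, then rescale so the sample sums to 1e6."""
    rates = [c / (l / 1e3) for c, l in zip(counts, lengths_bp)]  # reads per kilobase
    scale = sum(rates) / 1e6
    return [r / scale for r in rates]


# Hypothetical toy sample: three genes with raw counts and lengths in bp.
counts = [100, 300, 600]
lengths_bp = [1000, 2000, 4000]

print(rpkm(counts, lengths_bp))
print(tpm(counts, lengths_bp))
print(sum(tpm(counts, lengths_bp)))  # TPM always sums to one million per sample
```

Within a single sample, TPM is just RPKM rescaled to sum to one million. Because every sample has the same total, TPM proportions are easier to compare across samples. Neither unit replaces the between-sample normalization discussed in the resources above.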

## Statistics

| Resource | Description |
| --- | --- |
| StatQuest | Khan Academy-style videos by Josh Starmer on distributions, parameters, linear models, machine learning, bioinformatics, and R |
| Statistical Tests Are Linear Models | Shows that common statistical tests such as the t-test and ANOVA are special cases of linear models |
| Flexible Imputation of Missing Data | Statistical aspects of missing data, covering MCAR/MAR/MNAR assumptions, complete-case analysis, and MICE imputation methods |
| Modern Statistics for Modern Biology | Introductory statistics for bioinformatics applications, covering distributions, PCA, hypothesis testing, RNA-Seq analysis, machine learning, and experimental design |
| Biostatistics for Biomedical Research | Introductory biostatistics covering regression models, randomized controlled trials (RCTs), observational studies, diagnosis, statistical pitfalls, and high-dimensional modeling |
| Statistical Problems to Avoid | Common statistical issues to consider in the design, analysis, and publication of research |
| Common Statistical Misconceptions | List of statistical misconceptions, with sources refuting each one |
| Regression Modeling Strategies | Comprehensive treatment of regression modeling, covering linear, logistic, and ordinal regression as well as survival analysis, with many case studies |
| Communicating Frequentist Results | How to describe treatment effects correctly under frequentist inference |
| Bayesian Re-analysis of Toss Up Clinical Trial | How to interpret clinical trials that narrowly miss the p < 0.05 cutoff, and how to re-analyze them with Bayesian methods |
| Statistics Glossary | Glossary of statistical terms |
| Observed Power should be avoided | How observed (post hoc) power can overestimate a study's true power and mislead researchers |
| Observed Power Simulation Study | Simulation study demonstrating the problems with observed power |
| Criticisms of ROC curves for medical decision-making | Problems with using ROC curves to evaluate medical decision-making models |
| Table 1 P-values in RCTs | Why baseline differences should not be tested in RCTs: under randomization, any baseline differences are due to chance by design |
| Pseudo R^2 Explanation | Intuitive explanation of OLS R^2, and how that intuition generalizes to pseudo R^2 measures for binary outcomes |
| GLMs Explained | Introduction to generalized linear models (GLMs) |
| Multicollinearity and Omitted Variable Bias | The distinction between collinearity and omitted variable bias |
| Cox PH Diagnostics | Guide to checking the assumptions of Cox proportional hazards (PH) regression |
| Dealing with Unbalanced Datasets | Why resampling techniques should be avoided when analyzing datasets with unbalanced outcomes |
| Statistical Rethinking with brms | Re-codes the Statistical Rethinking examples with brms and tidyverse code instead of base R |
| Interpreting P-Value Histograms | How to interpret a histogram of p-values from many statistical tests before adjusting for multiple comparisons |
| Correct Confidence Intervals For GLMs | Gavin Simpson's blog post on computing confidence intervals for GLMs that respect the constraints of the response variable |
| Beyond Multiple Linear Regression | Statistics textbook explaining GLMs and multilevel models |
| Mixed Models with R | Explanation of mixed effects models, with R code |
| Using Mixture Models for Clustering | Tutorial using the mixtools R package |
| Applied Statistics for Experimental Biology | Textbook with an in-depth look at linear modeling for biological data |
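This sketch makes the "statistical tests are linear models" point concrete for one case. It assumes the equal-variance (Student's) two-sample t-test, and the sample values and function names are hypothetical. It computes two things:

- the classic pooled-variance t statistic;
- the t statistic for the slope of an ordinary least squares regression on a 0/1 group indicator.

The two should agree up to floating-point error.

```python
import math
from statistics import mean


def pooled_t(a, b):
    """Classic two-sample Student's t statistic (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    ss = sum((v - m1) ** 2 for v in a) + sum((v - m2) ** 2 for v in b)
    sp2 = ss / (n1 + n2 - 2)  # pooled variance
    return (m2 - m1) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def ols_slope_t(x, y):
    """t statistic for b1 in y = b0 + b1*x + error, fit by ordinary least squares."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)  # residual variance
    return b1 / math.sqrt(sigma2 / sxx)


# Hypothetical measurements from two groups.
group_a = [5.1, 4.9, 5.6, 5.0, 5.3]
group_b = [5.9, 6.2, 5.7, 6.4, 6.0, 5.8]

# Encode group membership as a 0/1 dummy variable and stack the outcomes.
x = [0] * len(group_a) + [1] * len(group_b)
y = group_a + group_b

print(pooled_t(group_a, group_b))
print(ols_slope_t(x, y))  # same value: the t-test is the slope test of this model
```

With a 0/1 predictor, the fitted slope is exactly the difference in group means. The residual variance is exactly the pooled variance, which is why the two statistics coincide. Welch's unequal-variance t-test does not reduce to this plain OLS model.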

## Machine Learning

| Resource | Description |
| --- | --- |
| ML Resources | A large collection of resources covering machine learning theory and practice, causal inference, deep learning (DL), natural language processing (NLP), and reinforcement learning (RL) |

## Visualization

| Resource | Description |
| --- | --- |
| Data Visualization with ggplot2 | Theory and practice of data visualization using ggplot2 |
| ggplot2 Categorical Heatmaps | How to plot heatmaps of categorical variables in ggplot2 |
| Dynamite Plots Should Be Avoided | Why dynamite plots (bar charts with error bars) should be replaced with boxplots or violin plots |
| Colorspace Color Library | Additional color palettes from the colorspace R package |