diff --git a/include/cheat_sheet_plots/cc-by.svg b/include/cheat_sheet_plots/cc-by.svg
new file mode 100644
index 0000000..e44c25f
--- /dev/null
+++ b/include/cheat_sheet_plots/cc-by.svg
@@ -0,0 +1,155 @@
+[SVG markup not preserved; media type: image/svg+xml]
diff --git a/include/cheat_sheet_plots/p1.png b/include/cheat_sheet_plots/p1.png
new file mode 100644
index 0000000..4c528f6
Binary files /dev/null and b/include/cheat_sheet_plots/p1.png differ
diff --git a/include/cheat_sheet_plots/p2_1.png b/include/cheat_sheet_plots/p2_1.png
new file mode 100644
index 0000000..db060f2
Binary files /dev/null and b/include/cheat_sheet_plots/p2_1.png differ
diff --git a/include/cheat_sheet_plots/p2_2.png b/include/cheat_sheet_plots/p2_2.png
new file mode 100644
index 0000000..92a98f5
Binary files /dev/null and b/include/cheat_sheet_plots/p2_2.png differ
diff --git a/include/cheat_sheet_plots/p2_3.png b/include/cheat_sheet_plots/p2_3.png
new file mode 100644
index 0000000..b22fb1d
Binary files /dev/null and b/include/cheat_sheet_plots/p2_3.png differ
diff --git a/include/cheat_sheet_plots/p3.png b/include/cheat_sheet_plots/p3.png
new file mode 100644
index 0000000..15af358
Binary files /dev/null and b/include/cheat_sheet_plots/p3.png differ
diff --git a/include/cheat_sheet_plots/p4.png b/include/cheat_sheet_plots/p4.png
new file mode 100644
index 0000000..f7930db
Binary files /dev/null and b/include/cheat_sheet_plots/p4.png differ
diff --git a/include/cheat_sheet_plots/p5.png b/include/cheat_sheet_plots/p5.png
new file mode 100644
index 0000000..225cd20
Binary files /dev/null and b/include/cheat_sheet_plots/p5.png differ
diff --git a/include/cheat_sheet_plots/p6.png b/include/cheat_sheet_plots/p6.png
new file mode 100644
index 0000000..0bf2370
Binary files /dev/null and b/include/cheat_sheet_plots/p6.png differ
diff --git a/index.Rmd b/index.Rmd
index 9142b05..13412fc 100644
--- a/index.Rmd
+++ b/index.Rmd
@@ -85,7 +85,7 @@ print_df = function(D,
 ```
 
-This document is summarised in the table below. It shows the linear models underlying common parametric and "non-parametric" tests. Formulating all the tests in the same language highlights the many similarities between them. Get it [as an image](linear_tests_cheat_sheet.png) or [as a PDF](linear_tests_cheat_sheet.pdf).
+This document is summarised in the table below. It shows the linear models underlying common parametric and "non-parametric" tests. Formulating all the tests in the same language highlights the many similarities between them. Get it [as an image](linear_tests_cheat_sheet.png), [as a PDF](linear_tests_cheat_sheet.pdf), or [as a web page](linear_tests_cheat_sheet.html).
 
 ***
 
diff --git a/index.html b/index.html
index 3ece0d1..404535f 100644
--- a/index.html
+++ b/index.html
@@ -7,6 +7,7 @@
+
@@ -27,7 +28,7 @@
-
+
@@ -107,7 +108,6 @@
 }
 img {
   max-width:100%;
-  height: auto;
 }
 .tabbed-pane {
   padding-top: 12px;
@@ -181,49 +181,10 @@
 }
-
-
-
-
+
+
+
+
+
+

Common statistical tests are linear models

+

Last updated: 28 June, 2019. Also check out the Python version!

+
+
+

See worked examples and more details in the accompanying notebook:
https://lindeloev.github.io/tests-as-linear

+
+
+ +
+
+
Common name
+
Built-in function in R
+
Equivalent linear model in R
+
Exact?
+
The linear model in words
+
Icon
+ +
+ + Simple regression: lm(y ~ 1 + x) + + + +
+ +
+
y is independent of x
P: One-sample t-test
N: Wilcoxon signed-rank
+

t.test(y)
wilcox.test(y)
+

lm(y ~ 1)
lm(signed_rank(y) ~ 1)
+ +

One number (intercept, i.e., the mean) predicts y.
- (Same, but it predicts the signed rank of y.)
+
p1
+
+ +
+
P: Paired-sample t-test
N: Wilcoxon matched pairs
+
t.test(y1, y2, paired=TRUE)
wilcox.test(y1, y2, paired=TRUE)
+
lm(y2 - y1 ~ 1)
lm(signed_rank(y2 - y1) ~ 1)
+ +
One intercept predicts the pairwise y2 - y1 differences.
- (Same, but it predicts the signed rank of y2 - y1.)
+
p2_1 p2_2 p2_3
+
+ +
+
y ~ continuous x
P: Pearson correlation
N: Spearman correlation
+

cor.test(x, y, method='pearson')
cor.test(x, y, method='spearman')
+

lm(y ~ 1 + x)
lm(rank(y) ~ 1 + rank(x))
+ +

One intercept plus x multiplied by a number (slope) predicts y.
- (Same, but with ranked x and y)
+
p3
+
+ +
+
y ~ discrete x
P: Two-sample t-test
P: Welch’s t-test
N: Mann-Whitney U
+
+
t.test(y1, y2, var.equal=TRUE)
t.test(y1, y2, var.equal=FALSE)
wilcox.test(y1, y2)
+
+
+
lm(y ~ 1 + G2) [A]
gls(y ~ 1 + G2, weights=... [B]) [A]
lm(signed_rank(y) ~ 1 + G2) [A]
+
+ +

An intercept for group 1 (plus a difference if group 2) predicts y.
- (Same, but with one variance per group instead of one common.)
- (Same, but it predicts the signed rank of y.)
+
+
p4
+
+ +
+ + Multiple regression: lm(y~1+x1+x2+...) + + +
+ +
+
P: One-way ANOVA
N: Kruskal-Wallis
+
aov(y ~ group)
kruskal.test(y ~ group)
+
+ lm(y ~ 1 + G2 + G3 + ... + Gn) [A]
lm(rank(y) ~ 1 + G2 + G3 + ... + Gn) [A]
+ +
An intercept for group 1 (plus a difference if group ≠ 1) predicts y.
- (Same, but it predicts the rank of y.)
+
p5
+
+ +
+
P: One-way ANCOVA
+
aov(y ~ group + x)
+
lm(y ~ 1 + G2 + G3 + ... + Gn + x) [A]
+
+
- (Same, but plus a slope on x.)
Note: this is discrete AND continuous. ANCOVAs are ANOVAs with a continuous x.
+
p6
+
+ +
+
P: Two-way ANOVA
+
aov(y ~ group * sex)
+
lm(y ~ 1+G2+G3+...+Gn+
+ S2+S3+...+Sk+
+ G2*S2+G3*S3+...+Gn*Sk)
+
+
Interaction term: changing sex changes the y ~ group parameters.
+ Note: G2 ... Gn are indicators (0 or 1) for each non-intercept level of the group variable, and similarly S2 ... Sk for sex. The first line (with Gi) is the main effect of group, the second (with Si) is the main effect of sex, and the third is the group * sex interaction. For two levels (e.g., male/female), line 2 would just be S2 and line 3 would be S2 multiplied with each Gi. +
+
[Coming]
+
+ +
+
Counts ~ discrete x
N: Chi-square test
+

chisq.test(groupXsex_table)
+
Equivalent log-linear model
glm(y ~ 1+G2+G3+...+Gn+
+ S2+S3+...+Sk+
+ G2*S2+G3*S3+...+Gn*Sk, family=...) [A]
+
+
Interaction term: (Same as Two-way ANOVA.)
Note: Run glm using the following arguments: glm(model, family=poisson())
As a linear model, the Chi-square test is log(y_i) = log(N) + log(alpha_i) + log(beta_j) + log(alpha_i beta_j), where alpha_i and beta_j are proportions. See more info in the accompanying notebook.
+
Same as Two-way ANOVA
+
+ +
+
N: Goodness of fit
+
chisq.test(y)
+
glm(y ~ 1 + G2 + G3 + ... + Gn, family=...) [A]
+
+
(Same as One-way ANOVA; see also the Chi-square note.)
+
1W-ANOVA
+
+
+ +

List of common parametric (P) and non-parametric (N) tests and equivalent linear models. The notation y ~ 1 + x is R shorthand for y = 1·b + a·x, which most of us learned in school. Models in similar colors are highly similar, but really, notice how similar they all are across colors! For non-parametric models, the linear models are reasonable approximations for non-small sample sizes (see the "Exact?" column and click links to see simulations). Other less accurate approximations exist, e.g., Wilcoxon for the sign test and Goodness-of-fit for the binomial test. The signed rank function is signed_rank = function(x) sign(x) * rank(abs(x)). The variables Gi and Si are "dummy coded" indicator variables (either 0 or 1), exploiting the fact that when Δx = 1 between categories, the difference equals the slope. Subscripts (e.g., G2 or y1) indicate different columns in the data. lm requires long-format data for all non-continuous models. All of this is explained in greater detail, with worked examples, at https://lindeloev.github.io/tests-as-linear. + +

+ + + + \ No newline at end of file
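
The equivalences summarised in the cheat-sheet caption can be checked directly in R. The sketch below is not part of the diff. It assumes simulated data (the names `y`, `y1`, `y2`, and `D`, the seed, and the sample sizes are all hypothetical) and uses only base R's `t.test`, `wilcox.test`, and `lm`, plus the `signed_rank` definition quoted in the caption.

```r
# Signed-rank transform, as defined in the cheat-sheet caption
signed_rank = function(x) sign(x) * rank(abs(x))

# Hypothetical simulated data (seed and sample sizes are arbitrary assumptions)
set.seed(42)
y  = rnorm(30, mean = 0.3)
y1 = rnorm(20, mean = 0.0)
y2 = rnorm(20, mean = 0.5)

# One-sample t-test vs. the intercept-only model lm(y ~ 1)
tt_one = t.test(y)
lm_one = summary(lm(y ~ 1))$coefficients
same_t_one = isTRUE(all.equal(unname(tt_one$statistic), lm_one[1, "t value"]))
same_p_one = isTRUE(all.equal(tt_one$p.value, lm_one[1, "Pr(>|t|)"]))

# Two-sample t-test vs. the dummy-coded model lm(y ~ 1 + G2).
# G2 is 0 for group 1 and 1 for group 2 (long-format data).
D = data.frame(y = c(y1, y2), G2 = rep(c(0, 1), each = 20))
tt_two = t.test(y2, y1, var.equal = TRUE)
lm_two = summary(lm(y ~ 1 + G2, data = D))$coefficients
# With delta-x = 1 between groups, the slope is the difference in group means
slope_is_diff = isTRUE(all.equal(lm_two["G2", "Estimate"], mean(y2) - mean(y1)))
same_t_two = isTRUE(all.equal(unname(tt_two$statistic), lm_two["G2", "t value"]))

# Wilcoxon signed-rank vs. lm on signed ranks: an approximation, not an identity
p_wilcox  = wilcox.test(y)$p.value
p_lm_rank = summary(lm(signed_rank(y) ~ 1))$coefficients[1, "Pr(>|t|)"]

print(c(same_t_one, same_p_one, slope_is_diff, same_t_two))
print(c(wilcoxon = p_wilcox, lm_signed_rank = p_lm_rank))
```

The four logical checks should all be `TRUE`, since the parametric tests are exactly these linear models. The two p-values on the last line are expected to be close but not identical, which is what the "Exact?" column is meant to flag.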