---
title: Contrast Coding of Visual Attention Effects
jupyter: julia-1.9
---
```{julia}
#| code-fold: true
using Chain
using DataFrames
using MixedModels
using SMLP2023: dataset
using StatsBase
using StatsModels
import ProgressMeter
ProgressMeter.ijulia_behavior(:clear);
```
# A word of caution {#sec-caution}
For a (quasi-)experimental set of data, there is (or should be) a clear _a priori_ theoretical commitment to specific hypotheses about differences between factor levels and how these differences enter into interactions with other factors. This specification should be used in the first LMM and reported, irrespective of the outcome. If alternative theories lead to alternative _a priori_ contrast specifications, both analyses are justified. If the observed means render the specification completely irrelevant, the comparisons originally planned could still be reported in a supplement.
In this script, we are working through a large number of different contrasts for the same data. The purpose is to introduce both the preprogrammed (“canned”) and the general options to specify hypotheses about main effects and interactions. Obviously, we do not endorse generating a plot of the means and specifying the contrasts accordingly. This is known as the [Texas sharpshooter](https://www.bayesianspectacles.org/origin-of-the-texas-sharpshooter/) fallacy. The link leads to an illustration and brief historical account by Wagenmakers (2018).
Irrespective of how results turn out, there is nothing wrong with specifying a set of post-hoc contrasts to gain a better understanding of what the data are trying to tell us. Of course, in an article or report about the study, the _a priori_ and post-hoc nature of contrast specifications must be made clear. Some kind of alpha-level adjustment (e.g., Bonferroni) may be called for, too. And, of course, there are grey zones.
There is quite a bit of statistical literature on contrasts. Two “local” references are @Brehm2022 and @Schad2020.
For further readings see “Further Readings” in @Schad2020.
# Example data {#sec-data}
We take the `KWDYZ` dataset from @Kliegl2011.
This is an experiment looking at three effects of visual cueing under four different cue-target relations (CTRs).
Two horizontal rectangles are displayed above and below a central fixation point, or they are displayed in vertical orientation to the left and right of the fixation point.
Subjects react to the onset of a small visual target occurring at one of the four ends of the two rectangles.
The target is cued validly on 70% of trials by a brief flash of the corner of the rectangle at which it appears; it is cued invalidly at the three other locations on 10% of trials each.
We specify three contrasts for the four-level factor CTR that are derived from spatial, object-based, and attractor-like features of attention. They map onto sequential differences between appropriately ordered factor levels.
We also have a dataset from a replication and extension of this study [@Kliegl2015].
Both datasets are available in the [R package RePsychLing](https://github.com/dmbates/RePsychLing/tree/master/data/).
# Preprocessing {#sec-preprocessing}
```{julia}
dat1 = DataFrame(dataset(:kwdyz11))

# Cell means, standard deviations, counts, and standard errors of the mean by CTR
cellmeans = combine(
  groupby(dat1, [:CTR]),
  :rt => mean,
  :rt => std,
  :rt => length,
  :rt => (x -> std(x) / sqrt(length(x))) => :rt_semean,
)
```
# Julia contrast options {#sec-contrasts}
We use the same formula for all analyses.
```{julia}
#| output: false
form = @formula rt ~ 1 + CTR + (1 + CTR | Subj)
```
This is the default order of factor levels.
```{julia}
show(StatsModels.levels(dat1.CTR))
```
Controlling the order of levels for contrasts:
1. The kwarg `levels` specifies the order of the levels.
2. The first level is used as the baseline; with the kwarg `base` a different level can be specified, as sketched below.
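For example, a minimal sketch combining both kwargs (the reordering and the choice of `sod` as base are for illustration only, not used in the models below):
```{julia}
contrasts_example = Dict(
  :CTR => DummyCoding(; levels=["val", "sod", "dos", "dod"], base="sod"),
);
```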
## SeqDiffCoding
The `SeqDiffCoding` contrast corresponds to `MASS::contr.sdif()` in R.
The assignment of random factors such as `Subj` to `Grouping()` is necessary when the sample size is very large. We recommend always including it; in this tutorial we do so only in the first example.
```{julia}
m1 = let levels = ["val", "sod", "dos", "dod"]
  contrasts = Dict(
    :CTR => SeqDiffCoding(; levels),
    :Subj => Grouping()
  )
  fit(MixedModel, form, dat1; contrasts)
end
```
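To inspect the contrast matrix that `SeqDiffCoding` generates, one quick sketch (assuming StatsModels' `ContrastsMatrix` constructor, whose `matrix` field holds the contrast matrix with rows corresponding to levels and columns to contrasts):
```{julia}
let levels = ["val", "sod", "dos", "dod"]
  # Build the contrast matrix directly, without fitting a model
  StatsModels.ContrastsMatrix(SeqDiffCoding(; levels), levels).matrix
end
```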
## HypothesisCoding
`HypothesisCoding` is the most general option available. We can implement all “canned” contrasts ourselves. The next example reproduces the test statistics of `SeqDiffCoding`, with a minor modification illustrating the flexibility of going beyond the default version.
```{julia}
m1b = let levels = ["val", "sod", "dos", "dod"]
  contrasts = Dict(
    :CTR => HypothesisCoding(
      [
        -1  1  0  0
         0 -1  1  0
         0  0  1 -1
      ];
      levels,
      labels=["spt", "obj", "grv"],
    ),
  )
  fit(MixedModel, form, dat1; contrasts)
end
```
The difference from the preprogrammed `SeqDiffCoding` is that we changed the direction of the third contrast: the sign of the effect is now positive when the result agrees with theoretical expectation. That is, we subtract the fourth level from the third, not the third from the fourth.
## DummyCoding
This contrast corresponds to `contr.treatment()` in R.
```{julia}
m2 = let
  contrasts = Dict(:CTR => DummyCoding(; base="val"))
  fit(MixedModel, form, dat1; contrasts)
end
```
The `DummyCoding` contrast has the disadvantage that the intercept estimates the mean of the level specified as `base` (by default the first level), not the grand mean (GM).
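As a rough check, the fixed-effect intercept of `m2` should be close to the observed `val` cell mean computed in `cellmeans` above; the two need not match exactly in an LMM, where estimates involve shrinkage and the data may be unbalanced:
```{julia}
# Observed cell mean for "val" vs. the fixed-effect intercept of m2
(only(filter(:CTR => ==("val"), cellmeans).rt_mean), coef(m2)[1])
```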
## YchycaeitCoding
The contrasts returned by `DummyCoding` may be exactly what we want.
Can't we have them, but also have the intercept estimate the GM rather than the mean of the base level? Yes, we can! We call this "You can have your cake and eat it, too" coding (YchycaeitCoding), and we use `HypothesisCoding` to achieve this outcome.
```{julia}
m2b = let levels = ["val", "sod", "dos", "dod"]
  contrasts = Dict(
    :CTR => HypothesisCoding(
      [
        -1 1 0 0
        -1 0 1 0
        -1 0 0 1
      ];
      levels,
      labels=levels[2:end],
    )
  )
  fit(MixedModel, form, dat1; contrasts)
end
```
We can simply relevel the factor or move the column with the -1s to select a different base level. The next model additionally scales the weights to ±1/2.
```{julia}
m2c = let levels = ["val", "sod", "dos", "dod"]
  contrasts = Dict(
    :CTR => HypothesisCoding(
      [
        -1/2 1/2   0   0
        -1/2   0 1/2   0
        -1/2   0   0 1/2
      ];
      levels,
      labels=levels[2:end],
    )
  )
  fit(MixedModel, form, dat1; contrasts)
end
```
With this scaling, each estimate is half of the corresponding pairwise difference; the test statistics are unchanged.
## EffectsCoding
This contrast almost corresponds to `contr.sum()` in R.
```{julia}
m3 = let
  contrasts = Dict(:CTR => EffectsCoding(; base="dod"))
  fit(MixedModel, form, dat1; contrasts)
end
```
The “almost” qualification refers to the fact that `contr.sum()` uses the last factor level as the default base; `EffectsCoding` uses the first level. With `HypothesisCoding` we can spell out these deviations from the grand mean ourselves: in `m3b`, each row (weights -1/4 and +3/4) estimates the difference between one level's mean and the GM; in `m3c`, the weights are doubled (-1/2 and +3/2), which doubles the estimates.
```{julia}
m3b = let levels = ["dod", "val", "sod", "dos"]
  contrasts = Dict(
    :CTR => HypothesisCoding(
      [
        -1/4  3/4 -1/4 -1/4
        -1/4 -1/4  3/4 -1/4
        -1/4 -1/4 -1/4  3/4
      ];
      levels,
      labels=levels[2:end],
    )
  )
  fit(MixedModel, form, dat1; contrasts)
end
```
```{julia}
m3c = let levels = ["dod", "val", "sod", "dos"]
  contrasts = Dict(
    :CTR => HypothesisCoding(
      [
        -1/2  3/2 -1/2 -1/2
        -1/2 -1/2  3/2 -1/2
        -1/2 -1/2 -1/2  3/2
      ];
      levels,
      labels=levels[2:end],
    )
  )
  fit(MixedModel, form, dat1; contrasts)
end
```
## HelmertCoding
`HelmertCoding` codes each level as the difference from the average of the lower levels. With the default order of `CTR` levels we get the following test statistics. These contrasts are orthogonal.
```{julia}
m4 = let
  contrasts = Dict(:CTR => HelmertCoding())
  fit(MixedModel, form, dat1; contrasts)
end
```
```sh
+ HeC1: (2 - 1)/2 # (391 - 358)/2
+ HeC2: (3 - (2+1)/2)/3 # (405 - (391 + 358)/2)/3
+ HeC3: (4 - (3+2+1)/3)/4 # (402 - (405 + 391 + 358)/3)/4
```
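As a sketch, and assuming the default level order shown above is `val`, `sod`, `dos`, `dod`, these quantities can be computed directly from the observed cell means; they will be close to, but not exactly equal to, the fixed-effect estimates of `m4`:
```{julia}
cm = Dict(zip(cellmeans.CTR, cellmeans.rt_mean))  # level => observed mean RT
hec1 = (cm["sod"] - cm["val"]) / 2
hec2 = (cm["dos"] - (cm["sod"] + cm["val"]) / 2) / 3
hec3 = (cm["dod"] - (cm["dos"] + cm["sod"] + cm["val"]) / 3) / 4
(hec1, hec2, hec3)
```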
## Reverse HelmertCoding
`Reverse HelmertCoding` codes each level as the difference from the average of the higher levels. To estimate these effects we simply reverse the order of factor levels. Of course, the contrasts are also orthogonal.
```{julia}
m4b = let levels = reverse(StatsModels.levels(dat1.CTR))
  contrasts = Dict(:CTR => HelmertCoding(; levels))
  fit(MixedModel, form, dat1; contrasts)
end
```
```sh
+ HeC1: (3 - 4)/2 # (405 - 402)/2
+ HeC2: (2 - (3+4)/2)/3 # (391 - (405 + 402)/2)/3
+ HeC3: (1 - (2+3+4)/3)/4 # (358 - (391 + 405 + 402)/3)/4
```
## Anova Coding
Factorial designs (i.e., lab experiments) are traditionally analyzed with analysis of variance. The test statistics of main effects and interactions are based on an orthogonal set of contrasts.
We specify them with `HypothesisCoding`.
### A(2) x B(2)
An A(2) x B(2) design can be recast as an F(4) design with the levels (A1-B1, A1-B2, A2-B1, A2-B2).
The following contrast specification returns estimates for the main effect of A, the main effect of B, and the interaction of A and B.
In a figure with A on the x-axis and the levels of B shown as two lines, the interaction tests the null hypothesis that the two lines are parallel.
A positive coefficient implies overadditivity (lines diverging to the right) and a negative coefficient implies underadditivity (lines converging to the right).
```{julia}
m5 = let levels = ["val", "sod", "dos", "dod"]
  contrasts = Dict(
    :CTR => HypothesisCoding(
      [
        -1 -1 +1 +1  # A
        -1 +1 -1 +1  # B
        +1 -1 -1 +1  # A x B
      ];
      levels,
      labels=["A", "B", "AxB"],
    ),
  )
  fit(MixedModel, form, dat1; contrasts)
end
```
It is also helpful to see the corresponding layout of the four means for the interaction of A and B (i.e., the third contrast):
```
   B1 B2
A1 +1 -1
A2 -1 +1
```
Thus, the interaction tests whether the difference between the means on the main diagonal and the means on the minor diagonal differs from zero.
### A(2) x B(2) x C(2)
Going beyond the four-level factor, it is also helpful to see the corresponding layout of the eight means for the interaction of A, B, and C.
```
      C1            C2
   B1    B2      B1    B2
A1 +1    -1   A1 -1    +1
A2 -1    +1   A2 +1    -1
```
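As a sketch for a hypothetical eight-level factor `ABC` (not part of the present dataset), the full set of seven orthogonal Anova contrasts for an A(2) x B(2) x C(2) design can be specified with `HypothesisCoding`; the last row uses the sign pattern of the layout above:
```{julia}
contrasts_2x2x2 = Dict(
  :ABC => HypothesisCoding(
    [
      -1 -1 -1 -1 +1 +1 +1 +1  # A
      -1 -1 +1 +1 -1 -1 +1 +1  # B
      -1 +1 -1 +1 -1 +1 -1 +1  # C
      +1 +1 -1 -1 -1 -1 +1 +1  # A x B
      +1 -1 +1 -1 -1 +1 -1 +1  # A x C
      +1 -1 -1 +1 +1 -1 -1 +1  # B x C
      +1 -1 -1 +1 -1 +1 +1 -1  # A x B x C
    ];
    levels=["A1B1C1", "A1B1C2", "A1B2C1", "A1B2C2",
            "A2B1C1", "A2B1C2", "A2B2C1", "A2B2C2"],
    labels=["A", "B", "C", "AxB", "AxC", "BxC", "AxBxC"],
  ),
);
```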
### A(2) x B(2) x C(3)
TO BE DONE
## Nested coding
Nested contrasts are often specified as follow-up post-hoc tests for ANOVA interactions. They are orthogonal. We specify them with `HypothesisCoding`.
An A(2) x B(2) design can be recast as an F(4) design with the levels (A1-B1, A1-B2, A2-B1, A2-B2).
The following contrast specification returns an estimate for the main effect of A and the effects of B nested in the two levels of A.
In a figure with A on the x-axis and the levels of B shown as two lines, the second contrast tests whether A1-B1 differs from A1-B2, and the third contrast tests whether A2-B1 differs from A2-B2.
```{julia}
m8 = let levels = ["val", "sod", "dos", "dod"]
  contrasts = Dict(
    :CTR => HypothesisCoding(
      [
        -1 -1 +1 +1
        -1 +1  0  0
         0  0 +1 -1
      ];
      levels,
      labels=["do_so", "spt", "grv"],
    ),
  )
  fit(MixedModel, form, dat1; contrasts)
end
```
The three contrasts for one main effect and two nested contrasts are orthogonal.
There is no test of the interaction (parallelism).
# Other orthogonal contrasts
For factors with more than four levels there are many options for specifying orthogonal contrasts, as long as one proceeds in a strictly hierarchical, top-down fashion.
Suppose you have a factor with seven levels and, for simplicity, ignore shifting of columns.
In this case, you have six options for the first contrast, that is, 6 vs. 1, 5 vs. 2, 4 vs. 3, 3 vs. 4, 2 vs. 5, and 1 vs. 6 levels.
Then you specify orthogonal contrasts within each partition with more than two elements, and so on.
That is, you never specify a contrast that crosses an earlier partition line.
In the following example, after an initial 4 vs. 3 partitioning of levels, we specify Anova-style contrasts for the left partition and Helmert contrasts for the right partition.
```{julia}
contrasts = Dict(
  :CTR => HypothesisCoding(
    [
      -1/4 -1/4 -1/4 -1/4 +1/3 +1/3 +1/3
      -1/2 -1/2 +1/2 +1/2    0    0    0
      -1/2 +1/2 -1/2 +1/2    0    0    0
      +1/2 -1/2 -1/2 +1/2    0    0    0
         0    0    0    0   -1   +1    0
         0    0    0    0 -1/2 -1/2    1
    ];
    levels=["A1", "A2", "A3", "A4", "A5", "A6", "A7"],
    labels=["c567.1234", "B", "C", "BxC", "c6.5", "c7.56"],
  ),
);
```
There are two rules that hold for all orthogonal contrasts:
1. The weights within each row sum to zero.
2. For any pair of rows, the products of weights in the same column sum to zero.
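Both rules can be verified numerically for the hypothesis matrix above; a quick sketch:
```{julia}
hm = [
  -1/4 -1/4 -1/4 -1/4 +1/3 +1/3 +1/3
  -1/2 -1/2 +1/2 +1/2    0    0    0
  -1/2 +1/2 -1/2 +1/2    0    0    0
  +1/2 -1/2 -1/2 +1/2    0    0    0
     0    0    0    0   -1   +1    0
     0    0    0    0 -1/2 -1/2    1
]
# Rule 1: row sums are zero (up to floating-point error for the 1/3 weights).
# Rule 2: hm * hm' has zero off-diagonal entries for mutually orthogonal rows.
(sum(hm; dims=2), hm * hm')
```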
# Appendix: Summary (Dave Kleinschmidt)
[StatsModels](https://juliastats.org/StatsModels.jl/latest/contrasts/)
StatsModels.jl provides a few commonly used contrast coding schemes, some less-commonly used schemes, and structs that allow you to manually specify your own, custom schemes.
## Standard contrasts
The most commonly used contrasts are `DummyCoding` and `EffectsCoding` (which are similar to `contr.treatment()` and `contr.sum()` in R, respectively).
## "Exotic" contrasts (rk: well ...)
We also provide `HelmertCoding` and `SeqDiffCoding` (corresponding to base R's `contr.helmert()` and `MASS::contr.sdif()`).
## Manual contrasts
**ContrastsCoding()**
There are two ways to manually specify contrasts.
First, you can specify them **directly** via `ContrastsCoding`.
If you do, it's good practice to specify the levels corresponding to the rows of the matrix, although they can be omitted, in which case they are inferred from the data.
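A minimal sketch, assuming the `StatsModels.ContrastsCoding` constructor and its `levels` kwarg; the matrix below simply reproduces dummy coding with `val` as base:
```{julia}
custom = [
  0 0 0
  1 0 0
  0 1 0
  0 0 1
]  # rows correspond to levels, columns to contrasts
contrasts_manual = Dict(
  :CTR => StatsModels.ContrastsCoding(custom; levels=["val", "sod", "dos", "dod"]),
);
```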
**HypothesisCoding()**
A better way to specify manual contrasts is via `HypothesisCoding`, where each row of the matrix corresponds to the weights given to the cell means of the levels corresponding to each column (see @Schad2020 for more information).