---
title : "Registered Replication Report on Fischer, Castel, Dodd, and Pratt (2003)"
shorttitle : "Registered Replication Report on Fischer, Castel, Dodd, and Pratt (2003)"
author:
  - name : "**Lincoln J Colling**"
    affiliation : "1"
    corresponding : yes
    address : "School of Psychology, University of Sussex, BN1 9QH, Brighton, United Kingdom"
    email : "[email protected]"
  - name : "**Dénes Szűcs**"
    affiliation : "1"
    corresponding : yes
    address : "Downing Street, CB2 3EB, Cambridge, UK"
    email : "[email protected]"
  - name : "Damiano De Marco"
    affiliation : "1, 12"
  - name : "Krzysztof Cipora"
    affiliation : "2"
  - name : "Rolf Ulrich"
    affiliation : "2"
  - name : "Hans-Christoph Nuerk"
    affiliation : "2"
  - name : "Mojtaba Soltanlou"
    affiliation : "2"
  - name : "Donna Bryce"
    affiliation : "2"
  - name : "Sau-Chin Chen"
    affiliation : "3"
  - name : "Philipp Alexander Schroeder"
    affiliation : "4"
  - name : "Dion T Henare"
    affiliation : "5"
  - name : "Christine K Chrystall"
    affiliation : "5"
  - name : "Paul M Corballis"
    affiliation : "5"
  - name : "Daniel Ansari"
    affiliation : "6"
  - name : "Celia Goffin"
    affiliation : "6"
  - name : "H Moriah Sokolowski"
    affiliation : "6"
  - name : "Peter JB Hancock"
    affiliation : "7"
  - name : "Ailsa E Millen"
    affiliation : "7"
  - name : "Stephen RH Langton"
    affiliation : "7"
  - name : "Kevin J Holmes"
    affiliation : "8"
  - name : "Mark S Saviano"
    affiliation : "8"
  - name : "Tia A Tummino"
    affiliation : "8"
  - name : "Oliver Lindemann"
    affiliation : "9"
  - name : "Rolf A Zwaan"
    affiliation : "9"
  - name : "Jiří Lukavský"
    affiliation : "10"
  - name : "Adéla Becková"
    affiliation : "11"
  - name : "Marek A Vranka"
    affiliation : "11"
  - name : "Simone Cutini"
    affiliation : "12"
  - name : "Irene Cristina Mammarella"
    affiliation : "12"
  - name : "Claudio Mulatti"
    affiliation : "12"
  - name : "Raoul Bell"
    affiliation : "13"
  - name : "Axel Buchner"
    affiliation : "13"
  - name : "Laura Mieth"
    affiliation : "13"
  - name : "Jan Philipp Röer"
    affiliation : "14, 2"
  - name : "Elise Klein"
    affiliation : "15"
  - name : "Stefan Huber"
    affiliation : "15"
  - name : "Korbinian Moeller"
    affiliation : "15,2"
  - name : "Brenda Ocampo"
    affiliation : "16"
  - name : "Juan Lupiáñez"
    affiliation : "17"
  - name : "Javier Ortiz-Tudela"
    affiliation : "17"
  - name : "Juanma De la fuente"
    affiliation : "17"
  - name : "Julio Santiago"
    affiliation : "17"
  - name : "Marc Ouellet"
    affiliation : "17"
  - name : "Edward M Hubbard"
    affiliation : "18"
  - name : "Elizabeth Y Toomarian"
    affiliation : "18"
  - name : "Remo Job"
    affiliation : "19"
  - name : "Barbara Treccani"
    affiliation : "19"
  - name : "Blakeley B McShane"
    affiliation : "20"
affiliation:
  - id : "1"
    institution : "Department of Psychology, University of Cambridge"
  - id : "2"
    institution : "Department of Psychology, University of Tübingen"
  - id : "3"
    institution : "Department of Human Development and Psychology, Tzu-Chi University"
  - id : "4"
    institution : "Department of Psychiatry and Psychotherapy, University of Tübingen"
  - id : "5"
    institution : "School of Psychology, University of Auckland"
  - id : "6"
    institution : "Department of Psychology & Brain and Mind Institute, The University of Western Ontario"
  - id : "7"
    institution : "Psychology, Faculty of Natural Sciences, University of Stirling, Stirling, UK"
  - id : "8"
    institution : "Department of Psychology, Colorado College"
  - id : "9"
    institution : "Department of Psychology, Education & Child Studies, Erasmus University Rotterdam, Netherlands"
  - id : "10"
    institution : "Institute of Psychology of the Czech Academy of Sciences"
  - id : "11"
    institution : "Department of Psychology, Faculty of Arts, Charles University"
  - id : "12"
    institution : "Department of Developmental Psychology, University of Padova"
  - id : "13"
    institution : "Department of Experimental Psychology, Heinrich Heine University Düsseldorf"
  - id : "14"
    institution : "Department of Psychology and Psychotherapy, Witten/Herdecke University"
  - id : "15"
    institution : "Leibniz-Institut für Wissensmedien, Tübingen"
  - id : "16"
    institution : "School of Psychology, The University of Queensland"
  - id : "17"
    institution : "Research Center for Mind, Brain, and Behavior, University of Granada"
  - id : "18"
    institution : "Department of Educational Psychology, University of Wisconsin-Madison"
  - id : "19"
    institution : "Department of Psychology and Cognitive Science, University of Trento"
  - id : "20"
    institution : "Kellogg School of Management, Northwestern University"
abstract: |
  The attentional spatial-numerical association of response codes (Att-SNARC) effect (Fischer, Castel, Dodd, & Pratt, 2003)---the finding that participants are quicker to detect left-side targets when the targets are preceded by small numbers and quicker to detect right-side targets when they are preceded by large numbers---has been used as evidence for *embodied* number representations and to support strong claims about the link between number and space (e.g., a mental number line). We attempted to replicate Experiment 2 of Fischer et al. by collecting data from 1105 participants at 17 labs. Across all 1105 participants and four interstimulus-interval conditions, the proportion of times the effect we observed was positive (i.e., directionally consistent with the original effect) was .50. Further, the effects we observed both within and across labs were minuscule and incompatible with those observed by Fischer et al. Given this, we conclude that we failed to replicate the effect reported by Fischer et al. In addition, our analysis of several participant-level moderators (finger-counting habits, reading and writing direction, handedness, and mathematics fluency and mathematics anxiety) revealed no substantial moderating effects. Our results indicate that the Att-SNARC effect cannot be used as evidence to support strong claims about the link between number and space.
bibliography : ["cited.bib"]
floatsintext : yes
figurelist : no
tablelist : no
footnotelist : no
linenumbers : no
mask : no
draft : no
papersize : "A4"
documentclass : "apa6"
classoption : "man"
biblio-style : "apa"
output:
  papaja::apa6_pdf:
    citation_package: biblatex
    latex_engine: xelatex
    keep_tex: TRUE
    includes:
      after_body: "appendix.tex"
header-includes:
  - \renewcommand\appendixname{Supplementary Results}
  - \usepackage{subcaption}
  - \usepackage{caption}
  - \usepackage{makecell}
  - \DeclareLanguageMapping{english}{english-apa}
  - \DeclareBibliographyCategory{asterisk}
  - \renewbibmacro*{begentry}{\ifcategory{asterisk}{\ensuremath{\ast}}{}}
  - \newcommand*{\nocitemeta}[1]{\nocite{#1}\addtocategory{asterisk}{#1}}
  - \usepackage{fontspec}
  - \setmainfont{Tinos}
  - \usepackage{float}
  - \floatplacement{figure}{htp}
---
```{r setup, include = FALSE}
library("papaja")
library("tidyverse")
library("ReplicationProjectTools")
library("reticulate")
library("glue")
library("pwr")
library("magick")
library("R.matlab")
library("lme4")
library("nlme")
library("forestplot")
library("kableExtra")
#install.packages("kableExtra")
#use_python(system('which python3',intern = T))
#source_python(file = "../get_citation_count.py")
```
```{r analysis-preferences}
#citationCounts = get_citation() # obtain the latest citation counts!
citationCounts.counts = here::here("other_info","cites.txt") %>% read_lines() %>% stringr::str_split(pattern = "'",simplify = T) %>% .[[2]]
citationCounts.date = here::here("other_info","cites.txt") %>% read_lines() %>% stringr::str_split(pattern = "'",simplify = T) %>% .[[4]]
citationCounts = c(citationCounts.counts,citationCounts.date)
# Seed for random number generation
set.seed(42)
knitr::opts_chunk$set(cache.extra = knitr::rand_seed, echo = FALSE, warning = FALSE)
```
```{r message=FALSE, warning=FALSE, include=FALSE, paged.print=FALSE}
require(knitr)
rd <- papaja::printp
readr::read_csv(here::here("data/processed_data/ExclusionsTable.csv")) -> Exclusions.Table
readr::read_csv(here::here("data/processed_data/EyeTrackerTable.csv")) -> EyeTrackerTable
# Do a little preprocessing
# Calculate the number of labs using an eye-tracker
EyeTrackerTable %>% group_by(Node) %>% summarise(LabsWithEye = sum(HasEye)) %>% filter(LabsWithEye > 0) %>% pull(Node) %>% length() -> LabsWithEye
# Calculate the perfect, partial, and any replications
readr::read_csv(here::here("data/meta_data/model1.meta.csv")) %>% mutate(se = sqrt(v)) %>% select(-v) %>% mutate(lower = y - (qnorm(.95) * se), upper = y + (qnorm(.95) * se)) %>% rename(mean = y) %>% mutate(sig = sign(lower) == 1) -> individual.replications
individual.replications %>% select(LabID,DependentVariable,sig) %>% spread(DependentVariable,sig) %>% select(LabID,d250,d500,d750,d1000) %>% mutate(perfect = d250 == FALSE & d500 == TRUE & d750 == TRUE & d1000 == FALSE) -> perfect.replications
perfect.replications %>% filter(perfect == FALSE) %>% mutate(partial = d500 == TRUE | d750 == TRUE) -> partial.replications
individual.replications %>% select(LabID,DependentVariable,sig) %>% spread(DependentVariable,sig) %>% select(LabID,d250,d500,d750,d1000) %>% mutate(any = d250 == TRUE | d500 == TRUE | d750 == TRUE | d1000 == TRUE) -> any.effects
load("manuscript.RData")
# This will need to be generated somewhere
Fischer.estimates = read.csv(file.path(here::here(),"other_info/Original_estimates.csv")) %>% glue_data("{sprintf('%.2f',RT)} ms at the {delay} ms ISI condition")
#Fischer.estimates = read.csv(file.path(here::here(),"other_info/Original_estimates.csv")) %>% glue_data("{sprintf('%.2f',RT)} (90% CI [{sprintf('%.2f',RT - (qnorm(.95) * se))}, {sprintf('%.2f',RT + (qnorm(.95) * se))}]) ms at the {delay} ms ISI condition")
#read.csv(file.path(here::here(),"other_info/all_estimates.csv")) %>% pull(sd) %>% median() %>% ceiling() -> median.sd
```
# Introduction
A foundational issue in cognitive science is the question of how people represent concepts. Classical approaches to cognitive science, exemplified by Fodor's [-@Fodor1975] *language-of-thought hypothesis* and Newell and Simon's [-@Newell1976] *physical-symbol-systems hypothesis*, view representations as abstract or amodal and as distinct from sensorimotor processing. In contrast to these traditional views, a range of other views that go under labels such as *embodied*, *situated*, or *grounded* cognition maintain that representations (a) are intimately linked to sensorimotor processing [see, e.g., @Wilson2002six for an overview], (b) are analog rather than symbolic, and (c) represent by resembling their targets in some sense [e.g., see @Gladzijewski2017; @Williams2018].
One area of research that has provided a wealth of empirical findings valuable for debates about this issue has been numerical cognition. In fact, @Fischer2011em referred to numerical cognition as the "prime example of embodied cognition." In particular, they pointed to tasks examining spatial-numerical associations to make their case.
Researchers have long reasoned that numbers might be represented in a spatially organized manner (Galton, 1880), for example, as a *mental number line* [e.g., @Restle:1970km]. Key support for this notion comes from a series of nine parity-judgment experiments conducted by @Dehaene:1993fc. In their experiments, @Dehaene:1993fc asked participants to judge whether a number was odd or even and reported that responses to large numbers were faster when participants pressed a right-hand key rather than a left-hand key, whereas the opposite was true for small numbers. They labeled this number-magnitude-by-response-side interaction the spatial-numerical association of response codes (SNARC) effect.
In these experiments, there was no standard with which to compare the presented number. Consequently, whether a particular number was responded to more quickly with the left hand or the right hand was not determined by the absolute magnitude of the number, but rather by the relative magnitude of the number within a stimulus set. Thus, the number 5 was responded to more quickly with the left hand when it appeared in a set of numbers ranging from 4 to 9 but more quickly with the right hand when it appeared in a set of numbers ranging from 0 to 5 [e.g., @Dehaene:1993fc; @Fias:1996ms].
@Dehaene:1993fc reported that the effect was dependent on neither the handedness of participants nor the hand used to make the response, but instead depended on the side of space of the response: When participants' hands were crossed, responses to small numbers were quicker with the right hand than with the left [however, see @Woods:2006cp]. Nonetheless, @Dehaene:1993fc did report that the effect was dependent on participants' reading and writing direction. Specifically, although they reported finding the effect in experiments with French participants, who had experience reading and writing from left to right, they also reported failing to find the effect in an experiment with Iranian participants, who had experience reading and writing from right to left (see @Shaki:2009ch and @Zebine). Together, the results from the nine experiments reported in @Dehaene:1993fc were taken to support the idea of a mental number line and the association of numbers of increasing magnitude with the left-to-right axis of external space.
Although the SNARC effect appears to be robust (see @Wood:2008rev and @Toomarian:2018rev for recent reviews), the great range of findings has resulted in debate about mechanism. One such debate concerns whether the SNARC effect is produced by early, response-independent mechanisms or whether processes at the stage of response selection are responsible. According to theories that place the origin of the SNARC effect at an early stage, the mere observation of a number should be sufficient to activate the spatial code because the spatial code is intimately connected to the numerical representation. Consequently, these theories make the strongest claims about the link between number and space. Theories that place the origin of the SNARC effect at the response-selection stage, however, make weaker claims about the connection between number and space. As @PecherBoot:2011 noted, if the response-selection stage gives rise to the SNARC effect, then no underlying spatial-numerical representation need be assumed.
Most recent work has tended to support the notion that the response-selection stage is the locus of the SNARC effect. In particular, Keus and colleagues have used both behavioral [@Keus:2005ho] and psychophysiological [@Keus:2005jh] evidence to argue in favor of a later, response-related origin of the SNARC effect. Further support comes from a computational model that relies on task-dependent conceptual coding of the number at a stage distinct from the numerical representation itself [@Gevers:2006model].
In addition, response-polarity-related accounts break the link between a number, space, and the SNARC effect. For example, @Proctor:2006jv argued that on binary classification tasks, items in the task set are coded as being positive or negative in polarity. Response selection can then be facilitated when there is a structural overlap between the polarity of the item (the number in the case of the SNARC effect) and the response. Thus, perceptual or conceptual overlap between the stimulus and response dimensions is not required for the SNARC effect to occur. In short, the model of @Gevers:2006model and the account of @Proctor:2006jv do not rely on the notion of a mental number line or sensorimotor-linked representations.
A range of empirical findings support these types of accounts. For example, @Santens:2008 reported that SNARC-like effects can be produced when left-right responses are replaced with unimanual close-far responses; small numbers are associated with close responses, and large numbers are associated with far responses. Further, @Landy:2008 reported that verbal "yes" and "no" responses on a parity-judgment task were facilitated by large numbers and small numbers, respectively.
Finally, still other researchers have argued in favor of a working memory account of the SNARC effect. For example, in an experiment reported by @vanDijck:2011kk, participants performed a fruit/vegetable classification task after having been encouraged to store the stimuli as an ordered set in working memory. Specifically, a sequence of fruit and vegetable names was displayed in the center of the computer screen, and participants were tested on the order of the items. Then, in a subsequent classification task, responses to items that had appeared early in the sequence were faster if made with the left hand rather than the right hand, and responses to items that had appeared later in the sequence were faster if made with the right hand rather than the left hand. The authors argued that this working memory account can also explain why SNARC-like effects emerge for other kinds of ordinal sequences, such as months of the year [@Gevers:2003je] or days of the week [@Gevers:2004gj], as well as why spatial-numerical associations can be moderated by giving participants instructions to associate numbers with positions on a clockface (1--5 on the right and 6--10 on the left) rather than on a ruler [1--5 on the left and 6--10 on the right; @Bachtold:1998].
Given that several competing accounts of the SNARC effect exist and that many of these accounts do not require a mental number line, one may doubt whether spatial-numerical associations provide evidence for anything like "embodied" number representations or number representations that are intimately linked with space. However, there is evidence that does support an early, response-independent locus for the SNARC effect and thus does provide support for the notion of a mental number line and spatially linked number representation---the modified version of Posner's [-@Posner] attentional cuing task developed by @Fischer:2003ju. In Fischer et al.'s experiment, participants were asked to press a single response button whenever a lateralized target, a white circle, appeared, regardless of whether it appeared on the left or the right. The target was always preceded by either a small number (1 or 2) or a large number (8 or 9), which was unrelated to the subsequent location of the target. Because the response was not lateralized, response-related effects were not possible. Results from this paradigm were consistent with the SNARC effect, as participants were quicker to detect left-side targets when they were preceded by small numbers and quicker to detect right-side targets when they were preceded by large numbers, at least when the numbers and targets were separated by an interstimulus interval (ISI) between 250 and 1000 ms. This finding---named the attentional SNARC (Att-SNARC) effect---suggests that viewing a number can cue spatial attention either to the left or to the right depending on the magnitude of the number.
Because the Att-SNARC effect is strong evidence in favor of an early, response-independent locus for the mechanism underlying the SNARC effect, the Att-SNARC effect plays a crucially important role in adjudicating debates about the origin of the SNARC effect and the nature of number representations. As a result, Fischer et al.'s original finding has been extremely influential (e.g., cited 746 times according to Google Scholar as of May 15, 2020). However, subsequent attempts to replicate the effect have produced a wide array of results.
@Galfano:2006cu reported a so-called statistically significant effect for left-side targets when the data were aggregated over ISI conditions of 500 and 800 ms and a one-tailed test was employed, estimate = 6 ms, *t*(25) = 1.75, *p* = .046 (reported as *p* = .04). They also reported a statistically significant effect for right-side targets when the data were aggregated over these two ISI conditions and a one-tailed test was employed, but the claimed statistical significance reflected a reporting error, estimate = 5 ms, *t*(25) = 1.59, *p* = .062 (reported as *p* = .04). Although it is possible to obtain a point estimate for each of the ISI conditions with the data aggregated over the left- and right-side targets (500-ms ISI: 8 ms; 800-ms ISI: 4 ms), the corresponding variances and test statistics for these estimates were not reported and cannot be obtained from what was reported.
@Ristic:2006cr reported a statistically significant effect when the data were aggregated over six ISI conditions ranging from 350 to 800 ms and over the left- and right-side targets, estimate = 3.79 ms (unreported; obtained via digitization of the figure), *F*(1, 17) = 5.48, *p* = .032 (reported as *p* \< .05). Although it is possible, via digitization of the figure, to obtain a point estimate for each of the six ISI conditions with the data aggregated over the left- and right-side targets (350-ms ISI: 11.24 ms; 400-ms ISI: 2.81 ms; 500-ms ISI: --1.44 ms; 600-ms ISI: 6.17 ms; 700-ms ISI: 6.05 ms; 800-ms ISI: --2.17 ms), the corresponding variances and test statistics for these estimates were not reported and cannot be obtained from what was reported.
@Dodd:2008dv reported a statistically significant effect when the data were aggregated over three ISI conditions ranging from 250 to 750 ms and over the left- and right-side targets, but the claimed statistical significance reflected a reporting error, estimate = 5.5 ms (unreported), *F*(1, 29) = 4.05, *p* = .054 (reported as *p* \< .05). They also reported statistically significant effects for the 500-ms ISI condition for left-side targets, estimate = 16 ms, *t*(29) = 2.48, *p* = .010 (reported as *p* \< .05), and for right-side targets, estimate = 6 ms, *t*(29) = 2.34, *p* = .013 (reported as *p* \< .05). Although it is possible to obtain a point estimate for each of the three ISI conditions with the data aggregated over the left- and right-side targets (250-ms ISI: 6 ms; 500-ms ISI: 11 ms; 750-ms ISI: --0.5 ms), the variances and test statistics for these estimates were not reported and cannot be obtained from what was reported.
@Salillas2008 reported a so-called statistically nonsignificant effect for a 450-ms ISI condition when the data were aggregated over the left- and right-side targets, estimate = 7.5 ms, *F*(1, 11) = 1.3, *p* = .28 (reported as "ns"). Additionally, @Ranzini2009 reported a statistically nonsignificant effect when the data were aggregated over three ISI conditions ranging from 300 to 500 ms and over the left- and right-side targets, estimate = 3 ms (unreported; obtained via digitization of the figure), *F*(1, 14) = 4.1, *p* = .06. Point estimates and variances and test statistics for such estimates for the three ISI conditions with the data aggregated over the left- and right-side targets were not reported and cannot be obtained from what was reported.
More recently, @vanDijck2014 reported a statistically nonsignificant effect when the data were aggregated over four ISI conditions ranging from 250 to 1000 ms and over the left- and right-side targets, estimate = 1 ms (unreported; obtained via digitization of the figure), reported *F*(1, 42) \< 1.05, reported *p* \> .37. Point estimates and variances and test statistics for such estimates for the four ISI conditions with the data aggregated over the left- and right-side targets were not reported and cannot be obtained from what was reported. In a second experiment, @vanDijck2014 also reported a statistically nonsignificant effect when the data were aggregated over three ISI conditions ranging from 100 to 700 ms and over the left- and right-side targets, estimate = --2.5 ms (unreported; obtained via digitization of the figure), *F*(1, 28) = 2.94, *p* = .097 (no estimates were reported). Point estimates and variances and test statistics for such estimates for the three ISI conditions with the data aggregated over the left- and right-side targets were not reported and cannot be obtained from what was reported.
@Zanolie:2014jr reported a statistically nonsignificant effect when the data were aggregated over four ISI conditions ranging from 250 to 1000 ms and over the left- and right-side targets, estimate = 0.5 ms (unreported; obtained via digitization of the figure), *F*(1, 19) = 0.03, *p* = .863. Although it is possible to obtain a point estimate for each of the four ISI conditions with the data aggregated over the left- and right-side targets (250-ms ISI: --1 ms; 500-ms ISI: 2 ms; 750-ms ISI: 5 ms; 1000-ms ISI: --4 ms), the variances and test statistics for these estimates were not reported and cannot be obtained from what was reported. In a second experiment, @Zanolie:2014jr also reported a statistically nonsignificant effect when the data were aggregated over the same four ISI conditions and over the left- and right-side targets, estimate = --1.5 ms (unreported; obtained via digitization of the figure), *F*(1, 23) = 0.17, *p* = .686. Although it is possible to obtain a point estimate for each of the four ISI conditions with the data aggregated over the left- and right-side targets (250-ms ISI: --2 ms; 500-ms ISI: 5 ms; 750-ms ISI: --3 ms; 1000-ms ISI: --6 ms), the variances and test statistics for these estimates were not reported and cannot be obtained from what was reported.
Finally, @Fattorini2015 reported a statistically nonsignificant effect when the data were aggregated over 500-ms and 700-ms ISI conditions and over the left- and right-side targets, estimate = 2 ms (unreported; obtained via digitization of the figure), *F*(1, 59) = 1.69, *p* = .20. Point estimates and variances and test statistics for such estimates for the two ISI conditions with the data aggregated over the left- and right-side targets were not reported and cannot be obtained from what was reported. In a second experiment, @Fattorini2015 also reported a statistically nonsignificant effect when the data were aggregated over four ISI conditions ranging from 250 to 1000 ms and over the left- and right-side targets, estimate = --1.75 ms (unreported; obtained via digitization of the figure), *F*(1, 31) = 1.5, *p* = .22. Although it is possible to obtain a point estimate for each of the four ISI conditions with the data aggregated over the left- and right-side targets (250-ms ISI: --2 ms; 500-ms ISI: --1 ms; 750-ms ISI: --2 ms; 1000-ms ISI: --2 ms), the variances and test statistics for these estimates were not reported and cannot be obtained from what was reported.
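
The recomputed *p* values quoted in the preceding paragraphs follow from the reported test statistics and degrees of freedom. A minimal sketch of that check in base R is given below; the helper names `p_one_tailed_t` and `p_from_f` are hypothetical and are not part of the analysis scripts for this manuscript, and only a few of the statistics quoted above are included.

```r
# Hypothetical helpers for illustration only.
# Upper-tail (one-tailed) p value for a t statistic.
p_one_tailed_t <- function(t, df) pt(t, df = df, lower.tail = FALSE)

# Upper-tail p value for an F statistic.
p_from_f <- function(f, df1, df2) pf(f, df1 = df1, df2 = df2, lower.tail = FALSE)

# Test statistics exactly as quoted in the text above.
rechecked <- c(
  galfano_left  = p_one_tailed_t(1.75, df = 25),        # text: p = .046
  galfano_right = p_one_tailed_t(1.59, df = 25),        # text: p = .062
  ristic        = p_from_f(5.48, df1 = 1, df2 = 17),    # text: p = .032
  dodd_overall  = p_from_f(4.05, df1 = 1, df2 = 29)     # text: p = .054
)

print(round(rechecked, 3))
```

The comments give the values stated in the text, not output from running the code. The same two helpers can be applied to any of the other *t* or *F* statistics quoted above.
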
A natural approach to assessing these various attempts to replicate the Att-SNARC effect would involve synthesizing the evidence across all published studies of the effect via meta-analysis. This would allow for, among other things, the estimation of an overall average effect size, the heterogeneity in effect sizes across studies, and the effects of potential moderators at the study level or otherwise. However, this approach is complicated because (a) the statistical significance (or nonsignificance) of a study's results typically affects whether or not the study is published, which results in a set of published studies that is not representative, and (b) meta-analytic results are biased when the set of studies analyzed is not representative [@McSBocHan16; @ioannidis:eff]. Instead, the Registered Replication Report (RRR) format pursued in the present study provides an ideal means of assessing the Att-SNARC effect because in an RRR, results from all participating labs are included in the meta-analysis regardless of their statistical significance or nonsignificance. Further, preregistration of the primary hypotheses and statistical analyses mitigates some potential biases.
An additional benefit of the RRR format is that it allows for the investigation of potential moderators not previously considered, which might shed new light on mechanism and perhaps also the wide array of results observed in the various attempts to replicate the Att-SNARC effect. Consequently, in addition to replicating the experimental protocol of Fischer et al., we investigated several variables that could potentially moderate the Att-SNARC effect: finger-counting habits, reading and writing direction, handedness, and mathematics ability and mathematics anxiety (see Fischer [@Fischer:2006er; @Fischer:2008bv], @Fischer:2014kz, @Georges:2016gn, and @Shaki:2009ch for details and conjectures).
Before proceeding, we note that alternative accounts of the effect reported by Fischer et al. have been suggested. These include, for example, accounts based on working memory [@vanDijck2014]. We also note that manipulations that make explicit associations between number and space have been able to produce Att-SNARC-like effects [e.g., @Fattorini2015, experiment 3]. Because these alternative accounts and additional manipulations have theoretical implications for the Att-SNARC effect that differ from those originally proposed and our focus is on the latter, we do not consider them here.
# Disclosures
## Preregistration
This study was preregistered. All relevant documentation is available on the Open Science Framework (OSF) at <https://osf.io/he5za/>.
## Data, materials, and online resources
The data and materials are available on OSF at <https://osf.io/he5za/>. Links to the lab-specific pages of all participating labs are available on OSF at <https://osf.io/7zyxj>. Data and scripts to re-create the manuscript are available on a companion website at <http://git.colling.net.nz/attentional_snarc/>. An archived version of the companion website is available at <https://doi.org/10.5281/zenodo.3738555>.
## Reporting
We report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study.
## Ethical approval
All participating labs obtained ethical approval in accordance with their local requirements, and the research was carried out in accordance with the Declaration of Helsinki.
# Methods
## Sample size
Each participating lab was required to provide a target sample size no smaller than 60 participants and a stopping rule (see the lab-specific pages for details). We chose 60 participants as the minimum because, as required for RRRs, it provides high power (`r round(pwr::pwr.t.test(n = 60, d = .4, type = "one", alternative = "greater")$power,2)`) for a one-tailed test at α = .05, conditional on a hypothetical assumed effect size of 0.4 on the standardized Cohen's *d* scale, about the midpoint of previously published estimates. This value corresponds to a raw effect size of 6 ms assuming a between-participants standard deviation of 15 ms, again about the midpoint of previously published estimates.
Because of time constraints, not all labs were able to reach the minimum target of 60 participants (see Table 1 for the sample size achieved by each lab). However, given the sample sizes actually achieved, and again conditional on an effect size of 0.4 on the standardized Cohen's *d* scale, a statistically significant effect would be expected in 93% of the labs (i.e., about 16). Thus, if 0.4 is a reasonable estimate of the effect size and there are no substantial moderators of the effect, statistically significant effects would be expected not only at the meta-analytic level but also at the level of the individual lab.
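The power reasoning above can be sketched in a few lines. This is an illustrative check only: it uses base R's `power.t.test` rather than the *pwr* call in the text, and `lab_n` is a hypothetical vector of per-lab sample sizes, not the values reported in Table 1.

```r
# Power of a one-sample, one-tailed t test at alpha = .05 for a standardized
# effect size d (default 0.4, the assumed effect size in the text).
power_for_n <- function(n, d = 0.4, alpha = 0.05) {
  power.t.test(n = n, delta = d, sd = 1, sig.level = alpha,
               type = "one.sample", alternative = "one.sided")$power
}

# Power at the minimum target sample size of 60 participants.
power_60 <- power_for_n(60)

# Raw effect implied by d = 0.4 and a between-participants SD of 15 ms.
raw_effect_ms <- 0.4 * 15

# Hypothetical per-lab sample sizes (placeholders, not the values in Table 1).
lab_n <- c(60, 60, 55, 48, 62)

# Expected number of labs with a statistically significant effect:
# the sum of the per-lab power values.
expected_significant <- sum(sapply(lab_n, power_for_n))
```

Summing per-lab power gives the expected count of significant labs, which is the quantity the paragraph above expresses as a percentage of labs.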
## Materials
The participating labs all had (a) a testing station, such as a room or a cubicle, where participants could undertake the experiment without distraction; (b) a computer for presenting stimuli and recording responses; (c) a chin rest or similar device to ensure that participants remained a set distance from the computer monitor; and (d) a tape measure used to calibrate distance from the screen. Five labs also optionally made use of an eye tracker to record participants' eye movements during the attentional-cuing task (see the lab-specific pages for details).
An instruction booklet detailing how to perform the setup and calibration procedure and the finger-counting assessment was provided to the labs. These materials were initially written in English, but each lab conducted the experiment in the predominant language of its locale. Thus, the experiment was also conducted in German, Dutch, Czech, Spanish, Italian, and Chinese. All materials were translated from English into these other languages and then independently back-translated into English to ensure accuracy.
All materials including translations are available on OSF (see <https://osf.io/7zyxj/>). To perform analyses, we used R [Version 3.5.1; @R-base] and the R packages *bindrcpp* [Version 0.2.2; @R-bindrcpp], *checkmate* [Version 1.8.5; @R-checkmate], *dplyr* [Version 0.7.6; @R-dplyr], *forcats* [Version 0.3.0; @R-forcats], *forestplot* [Version 1.7.2; @R-forestplot], *ggplot2* [Version 3.0.0; @R-ggplot2], *glue* [Version 1.3.0; @R-glue], *kableExtra* [Version 0.9.0; @R-kableExtra], *knitr* [Version 1.20; @R-knitr], *lme4* [Version 1.1.18.1; @R-lme4], *magick* [Version 1.9; @R-magick], *magrittr* [Version 1.5; @R-magrittr], *Matrix* [Version 1.2.14; @R-Matrix], *nlme* [Version 3.1.137; @R-nlme], *papaja* [Version 0.1.0.9842; @R-papaja], *purrr* [Version 0.2.5; @R-purrr], *pwr* [Version 1.2.2; @R-pwr], *R.matlab* [Version 3.6.2; @R-R.matlab], *readr* [Version 1.1.1; @R-readr], *reticulate* [Version 1.10; @R-reticulate], *stringr* [Version 1.3.1; @R-stringr], *tibble* [Version 1.4.2; @R-tibble], *tidyr* [Version 0.8.1; @R-tidyr], and *tidyverse* [Version 1.2.1; @R-tidyverse].
## Procedure
We employed an experimental paradigm based on Experiment 2 of @Fischer:2003ju. We chose Experiment 2 over Experiment 1 because Experiment 2 had fewer ISI conditions and because the results were statistically significant in a greater proportion of the conditions. Before starting data collection, each lab performed a monitor calibration procedure using a supplied calibration script. This procedure involved measuring the viewing distance from the computer monitor and the size of standard stimuli presented on the screen (see <https://osf.io/2m4ad/> for details). After participants provided informed consent, they were seated in front of the monitor with their chin placed in a chin rest that was located a fixed distance from the monitor (set during the calibration procedure), and then data collection commenced.
The standard trial structure, which was identical to that of Fischer et al. and did not include timing modifications for the eye tracker (see the Eye-Tracking Protocol subsection for details), is shown in Figure \@ref(fig:Trial). The initial display on each trial consisted of a centrally located white fixation point (0.2° diameter) flanked by two white outline boxes (1° $\times$ 1°), all on a black background. The centers of the boxes were located 5° from the center of the fixation point. This initial display was shown for 500 ms. Next, a digit (1, 2, 8, or 9; height of 0.75°) replaced the fixation point for a fixed duration of 300 ms. After the digit was removed, the fixation point reappeared. Finally, a circular white target (0.7° diameter) appeared in either the left- or the right-side box after a variable duration (250 ms, 500 ms, 750 ms, or 1000 ms) on target trials, and no target appeared on catch trials.
Target trials ended after a response was made or 1000 ms after the onset of the target, whichever came first. Catch trials ended 1000 ms after the digit was removed. Trials advanced automatically, separated by an intertrial interval of 1000 ms.
Participants responded to the appearance of the target by pressing the space bar with their preferred hand. When a participant responded before the target appeared or responded on a catch trial, the trial ended, and the following warning appeared: "Too quick! Please wait until the target appears in a box before pressing SPACE" [English version]. When a participant failed to respond on a target trial, the following warning was presented: "Too slow! Please press SPACE as soon as the target appears." Participants who erred on more than 5% of trials were excluded from analyses.
Participants performed a total of 800 trials (640 target trials and 160 catch trials), split into five blocks of 160 trials each, with 128 target trials and 32 catch trials per block; the trials in each block were evenly divided across the four ISI conditions, four digits, and two target locations, and the order of presentation was random.
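As a rough illustration of the block composition just described, the following sketch builds one block of 160 trials. The column names are hypothetical, and the even split of catch trials across the four digits is an assumption made for illustration; the supplied experiment scripts may construct blocks differently.

```r
# One block: 4 ISIs x 4 digits x 2 target sides x 4 repetitions = 128 target trials.
target_trials <- expand.grid(
  isi_ms = c(250, 500, 750, 1000),
  digit  = c(1, 2, 8, 9),
  side   = c("left", "right"),
  rep    = 1:4,
  stringsAsFactors = FALSE
)
target_trials$type <- "target"

# 32 catch trials per block. Assumption: 8 per digit, with no ISI or target side.
catch_trials <- data.frame(
  isi_ms = NA_real_,
  digit  = rep(c(1, 2, 8, 9), each = 8),
  side   = NA_character_,
  rep    = rep(1:8, times = 4),
  type   = "catch",
  stringsAsFactors = FALSE
)

# Combine and shuffle to obtain a random presentation order.
block <- rbind(target_trials, catch_trials)
block <- block[sample(nrow(block)), ]
```

Five such blocks give the 800 trials per participant (640 target trials and 160 catch trials).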
(ref:Trial) Trial structure for target trials and catch trials. The initial display on each trial consisted of a centrally located white fixation point flanked by two white outline boxes, all on a black background. Next, a digit replaced the fixation point. After the digit was removed, the fixation point reappeared. Finally, a circular white target appeared in either the left- or the right-side box after a variable duration on target trials, and no target appeared on catch trials. Target trials ended after a response was made or 1000 ms after the onset of the target, whichever came first. Catch trials ended 1000 ms after the digit was removed. Trials advanced automatically, separated by an intertrial interval of 1000 ms.
```{r Trial, echo=FALSE, fig.cap="(ref:Trial)", out.width="\\textwidth"}
knitr::include_graphics("trialstructure")
```
## Eye-tracking protocol
Code implementing an eye-tracking protocol using an EyeLink 1000 (SR Research, Ottawa, Ontario, Canada) eye tracker was provided to all labs and is available on GitHub (<https://github.com/ljcolling/FischerRRR-eyetracking>). Of the five labs that optionally made use of an eye tracker, one used a different eye tracker; this lab has provided information regarding deviations from the standard protocol on its lab-specific page. The standard nine-point grid was used for calibration and validation at the start of each block and when required during a block. The start of a trial was triggered after the detection of 500 ms of stable fixation within a 2° box centered on the fixation point. If the system could not detect a stable fixation within a 2000-ms time window, the calibration process was repeated. After the digit was presented, and before the target appeared, the gaze position was monitored, and any deviations outside a 1° box centered on the fixation point were recorded. Any deviations toward the lateral boxes that exceeded 2° resulted in the trial being marked as contaminated. These trials were excluded from primary analyses; however, they were analyzed separately in an attempt to determine any possible effect of eye movements on the results.
## Finger counting
To assess finger-counting fluency, we used a task derived from that developed by @Lucidi:2014gn. Participants were asked to read aloud four sentences while counting the number of syllables in each. Because reading aloud prevents verbalizing counting, most participants needed to resort to finger counting while sounding out the syllables. For each sentence, the experimenter recorded the first finger and first hand the participant used. Although most participants used their fingers for the task, some participants adopted a different strategy. Participants who failed to engage in finger counting after two sentences were prompted to do so. Details of the prompting were recorded in lab logs (see the lab-specific pages for details).
The results from the finger-counting task were used to place participants into five groups: consistent left-starters, consistent right-starters, inconsistent left-starters, inconsistent right-starters, and others. This classification was determined not only by participants' hand choices, but also by how consistently they engaged in finger counting. The consistent left-starters and consistent right-starters included those participants who counted using a hand on all four occasions and started on the same hand on at least three of them. The inconsistent left-starters and inconsistent right-starters included participants who counted using a hand on two or three occasions and started on the same hand on at least two of them. The *other* group included all remaining participants (e.g., those who did not count on their fingers, those who counted on their fingers only once, and those who counted an equal number of times with each hand).
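The grouping rule above can be sketched as a small function. The function name and the input coding (`"L"`, `"R"`, or `NA` for a sentence on which the participant did not count on a hand) are hypothetical and chosen only for illustration; they are not taken from the analysis scripts.

```r
# Classify one participant from the first hand used on each of the four sentences.
# Input: character vector of length 4 with "L", "R", or NA (no finger counting).
classify_starter <- function(first_hand) {
  counted   <- first_hand[!is.na(first_hand)]
  n_counted <- length(counted)
  n_left    <- sum(counted == "L")
  n_right   <- sum(counted == "R")

  if (n_counted == 4 && max(n_left, n_right) >= 3) {
    # Counted on all four occasions, same starting hand on at least three.
    prefix <- "consistent"
  } else if (n_counted %in% 2:3 && max(n_left, n_right) >= 2) {
    # Counted on two or three occasions, same starting hand on at least two.
    prefix <- "inconsistent"
  } else {
    # Never counted, counted once, or used each hand equally often.
    return("other")
  }
  paste(prefix, if (n_left > n_right) "left-starter" else "right-starter")
}

classify_starter(c("L", "L", "R", "L"))  # consistent left-starter
classify_starter(c("R", NA, "R", "L"))   # inconsistent right-starter
classify_starter(c("L", "R", "L", "R"))  # other
```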
## Reading/writing direction
To assess reading and writing direction, we used a simple question asking participants if they had experience with languages that are written exclusively from left to right (e.g., English and German), with languages that are not written exclusively from left to right (e.g., Hebrew), or with languages of both types (see <https://osf.io/dqnkq/> for details). For the Chinese version of this question, participants were asked if they had experience with languages that are usually written horizontally, with languages that are usually written vertically, or with languages of both types (see <https://osf.io/r3fhx/> for details). Responses to this question were used to place participants into two groups: exclusively left-to-right readers-writers and not exclusively left-to-right readers-writers. Participants who selected the first option were placed in the exclusively left-to-right readers-writers group, and all the remaining participants were placed in the not exclusively left-to-right readers-writers group.
## Handedness
To assess handedness, we used Nicholls, Thomas, Loetscher, and Grimshaw's [-@Nicholls:2013ha] 10-item questionnaire. In labs conducting the experiment in a language other than English, the questionnaire was translated, and some questions were replaced with more culturally appropriate versions when required (see <https://osf.io/r3fhx/> for details).
## Mathematics assessment
To assess mathematics fluency, we used the short mathematics assessment employed by @Tibber:2013ho. This test is adapted from the Mathematics Calculation Subtest of the Woodcock-Johnson III Tests of Cognitive Abilities [@Woodcock:1989ww]. It contains 25 multiple-choice mathematics questions requiring addition, subtraction, multiplication, and division. Participants had 30 s to select the response on each trial; the timing was controlled by the computer software. A countdown timer was stationed in the top left of the screen to inform participants of the time remaining. The 25 questions were split into five sets of 5 questions each. Two errors on a single set or errors on consecutive sets terminated the test. The final score was the total number of correct answers.
## Mathematics anxiety
To assess mathematics anxiety, we used the Abbreviated Math Anxiety Scale [AMAS; @Hopko:2003et]. The AMAS contains nine questions that ask participants to rate (on a scale from 1 to 5) how anxious they would feel during particular events, including thinking of an upcoming mathematics test, taking a mathematics examination, and listening to a mathematics lecture. In labs conducting the experiment in a language other than English, the AMAS was translated. The final score was the sum of the individual ratings; possible scores ranged from 9 (low anxiety) to 45 (high anxiety).
## Exit questionnaire
An exit questionnaire that asked participants to describe the purpose of the experiment was used to determine whether they had guessed its purpose. Participants who guessed correctly, as judged by the experimenter, were excluded from primary analyses; however, their data were analyzed separately to determine whether guessing the experiment's purpose moderated the Att-SNARC effect.
## Exclusion criteria
Participants who committed errors on more than 5% of the catch trials, who correctly guessed the purpose of the experiment, or who did not undertake all tasks were excluded from the analysis.
## Analysis
The dependent variables of interest were the congruency effects in the four ISI conditions (i.e., 250 ms, 500 ms, 750 ms, and 1000 ms). The congruency effect was defined as the average difference in response time between congruent and incongruent trials; congruent trials were defined as trials with left-side targets preceded by low digits (1 or 2) and trials with right-side targets preceded by high digits (8 or 9), and incongruent trials were defined as trials with left-side targets preceded by high digits and trials with right-side targets preceded by low digits. A positive value for the congruency effect indicates that participants were faster responding on congruent trials than on incongruent trials, and a negative value indicates the reverse.
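To make the definition concrete, the following sketch computes the congruency effect per ISI condition for a single participant. The function name, column names (`isi_ms`, `digit`, `side`, `rt`), and toy data are hypothetical and serve only to illustrate the definition; they are not taken from the analysis scripts.

```r
# Congruency effect per ISI: mean RT on incongruent trials minus mean RT on
# congruent trials, so positive values mean faster responses on congruent trials.
congruency_effect <- function(trials) {
  low  <- trials$digit %in% c(1, 2)
  left <- trials$side == "left"
  # Congruent: low digit with left target, or high digit with right target.
  congruent <- (low & left) | (!low & !left)
  mean_rt <- tapply(trials$rt, list(trials$isi_ms, congruent), mean)
  mean_rt[, "FALSE"] - mean_rt[, "TRUE"]
}

# Toy data: congruent trials take 300 ms, incongruent trials take 310 ms.
toy <- expand.grid(isi_ms = c(250, 500), digit = c(1, 9),
                   side = c("left", "right"), stringsAsFactors = FALSE)
toy$rt <- ifelse((toy$digit == 1) == (toy$side == "left"), 300, 310)

effects <- congruency_effect(toy)  # 10 ms at each ISI
```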
We analyzed our data via multilevel multivariate meta-analytic models [@McSBoc18]. Such models have at least two advantages over the standard random-effects meta-analytic model. First, they can take account of the dependence between multiple dependent variables (here, the congruency effect in each of the four ISI conditions). Second, rather than assuming a simple two-level structure, with participants nested within labs, they can take account of more complex nesting structures (here, participants nested within moderator groups, such as consistent left-starters, consistent right-starters, etc., and moderator groups nested within labs). In short, the standard approach necessitates treating several variance components as zero, and thereby makes unwarranted independence assumptions.
For each analysis, we considered several simplifications of the equal-allocation multilevel multivariate compound-symmetry specification detailed in @McSBoc18; we also considered an equal-variance version of the single-correlation equal-allocation multilevel multivariate compound-symmetry specification that, in the notation of that article, sets the σ~*d*,*d*~ equal for all dependent variables *d* (i.e., the congruency effect in each of the four ISI conditions). We chose among the six specifications using Akaike's information criterion [AIC; @Aki74].
In analyzing moderators, it is ideal to consider them all jointly within a single model. Unfortunately, data sparsity precluded this. When the moderators were considered jointly, many combinations of them resulted in either zero or very few participants per moderator group in each lab. Indeed, this was also the case for some moderators when considered alone (i.e., reading and writing direction and handedness; see Tables S4 and S6, respectively, in the Supplemental Material). Consequently, we consider each moderator separately.
For models featuring no moderators (Model 1) or discrete moderators (finger counting, reading and writing direction, and handedness; Models 2--4, respectively), for simplicity we analyzed the data at the moderator-group level, as per @McSBoc18, using data from moderator groups not precluded for reasons of data sparsity. For the model featuring continuous moderators (mathematics fluency and mathematics anxiety; Model 5), this was not possible, so we analyzed the data at the participant level using an analogous specification (see the Model 5 subsection for details) and using data from all participants. Our motivation for considering these moderators follows.
### Model 1: No Moderators
@Fischer:2003ju reported a positive congruency effect. The purpose of Model 1 was to assess this reported effect by replicating the analysis performed by @Fischer:2003ju, and consequently this model did not take account of any moderators.
### Model 2: Finger counting
Recent work suggests that spatial-numerical compatibility effects in general [@Fischer:2008bv]---including attentional-cuing effects in response to numbers [@Fischer:2014kz]---might be moderated by finger-counting behavior. Specifically, this work suggests that these effects are stronger among people who start finger counting on the left hand and weaker or possibly even reversed among those who start finger counting on the right hand. The purpose of Model 2 was to assess this possibility, and consequently this model took account of the finger-counting moderator.
This model used only data from participants who consistently engaged in finger counting and consistently started on the same hand, that is, participants categorized as consistent left-starters or consistent right-starters. We restricted the analysis to these two groups principally because if finger-counting behavior has an effect, we would expect it to be most prominent in participants whose finger-counting habits are clear and unambiguous.
### Model 3: Reading/writing direction
Recent work suggests that the congruency effect might be weaker or possibly even reversed among people who have experience with languages that are not read and written exclusively from left to right [@Fischer:2008bv; @Shaki:2009ch]. The purpose of Model 3 was to assess this possibility, and consequently this model took account of the reading-and-writing-direction moderator. Specifically, participants were placed into two groups according to their responses on the reading-writing questionnaire: those who read and wrote exclusively left to right and those who did not.
### Model 4: Handedness
The purpose of Model 4 was to assess whether handedness moderates the congruency effect, and consequently this model took account of the handedness moderator. Specifically, participants were classified as left-handed or right-handed according to their responses on the handedness questionnaire.
### Model 5: Mathematics fluency and mathematics anxiety
Recent work suggests that numerical abilities [@Fischer:2006er] and mathematics anxiety [@Georges:2016gn] may influence the strength of spatial-numerical associations. The purpose of Model 5 was to assess this possibility, and consequently this model jointly took account of both mathematics fluency and mathematics anxiety, as measured by the math test and AMAS, respectively. Specifically, we employed a multilevel model with fixed effects included for the full set of ISI Condition $\times$ Math Test $\times$ AMAS interactions, and random effects included for each participant, for each ISI condition for each lab (with equal variance and zero correlation), and for the math test, the AMAS, and the Math Test $\times$ AMAS interaction for each lab (independently).
### Secondary analyses
The purpose of our secondary analyses was to assess whether insight into the purpose of the experiment or eye movements moderated the congruency effect. Specifically, Model 1 was estimated separately on data from participants who correctly guessed the purpose of the experiment and also separately on data from eye-movement-contaminated trials of participants with contaminated trials in every combination of ISI and congruency condition.
# Results
## Replication operationalization
According to the common definition of replication employed in practice, a subsequent experiment has successfully replicated a prior experiment if the results from the two experiments either (a) both failed to attain statistical significance or (b) both attained statistical significance and were directionally consistent. This definition has been applied analogously in large-scale replication projects such as the present one by comparing the statistical significance (or nonsignificance) of the results from a meta-analysis of the replication studies with the statistical significance (or nonsignificance) of the results from the original study.
However, the null-hypothesis significance-testing paradigm upon which this operationalization of replication is based has been the subject of no small amount of criticism over the decades [@Rozenboom; @Meehl1978; @Cohen1994; @Gelman2003; @McShane2016; @McShane2017], and recent calls to abandon it abound [@Amrhein; @mcshane2019; @Wasserstein; @AmrheinGreendlandMcShane]. Further, recent work discussing alternative statistical paradigms specifically in the context of replication [@CollingSzucs] has called for a better understanding of how statistical inference relates to scientific inference. A key point is that any assessment of whether a theory is supported by data depends on whether the magnitude of the observed effect is consistent with the theory [@Gelman:effect]. Consequently, in assessing replication, we distinguish between *statistical hypotheses* and *scientific hypotheses* and focus on the latter, specifically in light of the scientific hypothesis advanced by @Fischer:2003ju.
## Exclusions
In total, `r LabNames %>% length() %>% english::as.english()` labs contributed data from `r Exclusions.Table$Total %>% sum()` participants; `r Exclusions.Table$Total %>% sum() - Exclusions.Table$Analysed %>% sum()` were excluded as per our exclusion criteria, which left a total of `r Exclusions.Table$Analysed %>% sum()`. See Table \@ref(tab:Exclusions) for details of the total number of participants recruited by each lab, the number included in the analysis, and the number excluded for each reason; the technical-error category includes those participants who were excluded for having incomplete data because of, for example, equipment failure or experimenter error.
`r LabsWithEye %>% english::Words()` labs used an eye tracker for at least some of their participants. Table \@ref(tab:Eyedetail) in the Supplemental Material shows the number of participants in each of these labs tested with an eye tracker, the number of participants whose data were analyzed in our secondary analysis of trials contaminated by eye movement, and the number of such contaminated trials in each combination of ISI condition and congruency condition.
```{r Exclusions, results='asis'}
Exclusions.Table %>%
  mutate_all(linebreak) %>%
  kable(
    "latex",
    escape = FALSE,
    booktabs = TRUE,
    caption = "Total number of participants, number analysed, number excluded for reasons of technical error, number excluded for more than 5\\% catch trial errors, and number excluded for guessing the purpose of the experiment for each lab.",
    digits = 0,
    linesep = "",
    col.names = linebreak(c("Lab","Total\nParticipants","Analysed\nParticipants","Technical\nError","Catch Trial\nError","Guessed\nPurpose"), align = 'c'),
    align = c('l','c','c','c','c','c')
  ) %>%
  kableExtra::kable_styling(latex_options = c("hold_position"))
```
```{r}
model1.fe.stats %>% glue::glue_data("{Estimate} ms (90% CI[{sprintf('%.2f',Estimate - (qnorm(.95) * `Std. Err.`))}, {sprintf('%.2f',Estimate + (qnorm(.95) * `Std. Err.`))}])") ->
model1.report
round((model1.fe.stats$`Std. Err.` * sqrt(model1.fe.stats$`$n$`))[2]) -> obs.sd
dd %>% select(variable,value,SubjectID) %>% group_by(variable) %>% summarise(M = mean(value), SD = sd(value)) %>% mutate(variable = case_when(variable == "d250" ~ "250 ms", variable == "d500" ~ "500 ms", variable == "d750" ~ "750 ms", variable == "d1000" ~ "1000 ms")) %>% glue::glue_data("a mean of {sprintf(fmt = '%.2f', M)} ms and a standard deviation of {sprintf(fmt = '%.2f', SD)} ms at the {variable} ISI condition") -> prelimtext
# Correlations between the congruency effects in the four ISI conditions
dat %>% select(d250,d500,d750,d1000) %>% as.matrix() %>% cor() -> cor.matrix
# Proportions of congruency effects by sign (negative, positive), overall and per ISI condition
sign.props <- function(v) prop.table(table(sign(v), deparse.level = 0))
dd %>% pull(value) %>% sign.props() -> con.total
dd %>% filter(variable == "d250") %>% pull(value) %>% sign.props() -> con.250
dd %>% filter(variable == "d500") %>% pull(value) %>% sign.props() -> con.500
dd %>% filter(variable == "d750") %>% pull(value) %>% sign.props() -> con.750
dd %>% filter(variable == "d1000") %>% pull(value) %>% sign.props() -> con.1000
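# Join a vector as "a, b, and c": `collapse` between the leading items, `final` before the last item
paste0.final <- function(x, collapse = ", ", final = ", and ") {
  n <- length(x)
  if (n < 2) return(paste(x))
  paste(c(paste(x[seq_len(n - 1)], collapse = collapse), x[n]), collapse = final)
}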
props = dd %>% select(value, SubjectID, variable) %>% mutate(value = sign(value)) %>% group_by(SubjectID) %>% summarise(value = sum(value == 1)) %>% group_by(value) %>% summarise(n = n()) %>% mutate(n = n / sum(n)) %>% mutate(n = sprintf(fmt = "%.2f",n)) %>% pull(n)
```
## Preliminary analyses
Across all `r dd %>% pull(SubjectID) %>% unique() %>% length()` participants and four ISI conditions, the congruency effect we observed had a mean of `r dd %>% pull(value) %>% mean() %>% sprintf(fmt = "%.2f")` ms and a standard deviation of `r dd %>% pull(value) %>% sd() %>% sprintf(fmt = "%.2f")` ms. In addition, across all `r dd %>% pull(SubjectID) %>% unique() %>% length()` participants, it had a `r prelimtext %>% .[[1]]`, `r prelimtext %>% .[[2]]`, `r prelimtext %>% .[[3]]`, and `r prelimtext %>% .[[4]]`. Further, across the six possible pairs of ISI conditions, the correlation had a mean of `r cor.matrix[upper.tri(cor.matrix, diag = F)] %>% mean() %>% sprintf(fmt = "%.2f")` (and a mean of `r cor.matrix[upper.tri(cor.matrix, diag = F)] %>% abs() %>% mean() %>% sprintf(fmt = "%.2f")` in magnitude).
Across all `r dd %>% pull(SubjectID) %>% unique() %>% length()` participants and four ISI conditions, the proportion of times the congruency effect we observed was positive was `r con.total[2] %>% unname() %>% sprintf(fmt = "%.2f")`. In addition, across all `r dd %>% pull(SubjectID) %>% unique() %>% length()` participants, this proportion was `r con.250[2] %>% unname() %>% sprintf(fmt = "%.2f")` in the 250-ms ISI condition, `r con.500[2] %>% unname() %>% sprintf(fmt = "%.2f")` in the 500-ms ISI condition, `r con.750[2] %>% unname() %>% sprintf(fmt = "%.2f")` in the 750-ms ISI condition, and `r con.1000[2] %>% unname() %>% sprintf(fmt = "%.2f")` in the 1000-ms ISI condition. Further, the number of ISI conditions with a positive congruency effect was zero for `r props[1]` of the participants, one for `r props[2]` of the participants, two for `r props[3]` of the participants, three for `r props[4]` of the participants, and four for `r props[5]` of the participants. All of these results are compatible with the relevant binomial distribution with probability parameter .5 (i.e., the distribution of the number of heads on tosses of a fair coin).
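For reference, the binomial benchmark mentioned above can be written out directly. This sketch assumes, for illustration, that the sign of the congruency effect is independent across the four ISI conditions within a participant; it computes only the reference distribution and does not touch the observed data.

```r
# Expected share of participants with 0, 1, 2, 3, or 4 positive congruency
# effects if each effect is positive with probability .5, independently.
expected_props <- dbinom(0:4, size = 4, prob = 0.5)
names(expected_props) <- 0:4

# Formatted to two decimals, as the observed proportions are in the text.
sprintf("%.2f", expected_props)
```

These reference values are 1/16, 4/16, 6/16, 4/16, and 1/16 for zero through four positive effects, respectively.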
## Primary analyses
### Model 1: No moderators
The effects we observed both within and across labs were minuscule and incompatible with those observed by Fischer et al. Specifically, Fischer et al. estimated an effect of `r Fischer.estimates %>% paste0.final()`. In contrast, Model 1 estimated effects of `r model1.fe.stats %>% pull(Estimate) %>% sprintf(fmt = "%.2f ms") %>% paste0.final()`, respectively, in the four ISI conditions.
Given these results in tandem with those of our preliminary analyses, we conclude that we failed to replicate the effect reported by @Fischer:2003ju.
The effects we observed were highly consistent across ISI conditions. They were also highly consistent across labs, perhaps surprisingly given a recent report---contrary to both substantive and statistical expectations---of a nontrivial degree of heterogeneity across labs in large-scale replication projects like the present study [@mcshane2019b]. Specifically, we estimated heterogeneity across labs at `r estimates1$EAMMCS[[m1.idx]]$vc$variance %>% sqrt() %>% sprintf(fmt = "%.2f")` ms---nonzero but practically unimportant for many purposes (see Table \@ref(tab:mod1) in the Supplemental Material for details). This result suggests that lab-level moderators are unlikely to have driven our results.
\begin{figure}[H]
\centering
\begin{subfigure}{.6\textwidth}
\includegraphics[]{d250}
\caption{250 ms ISI Condition}
\end{subfigure}
\begin{subfigure}{.5\textwidth}
\includegraphics[]{d500}
\caption{500 ms ISI Condition}
\end{subfigure}
\end{figure}
\begin{figure}[H]
\centering
\ContinuedFloat % continue from previous page
\begin{subfigure}{.5\textwidth}
\includegraphics[]{d750}
\caption{750 ms ISI Condition}
\end{subfigure}
\begin{subfigure}{.5\textwidth}
\includegraphics[]{d1000}
\caption{1000 ms ISI Condition}
\end{subfigure}
\caption{Summary of results from Experiment 2 of Fischer, Castel, Dodd, and Pratt (2003), each lab in the present study, and Model 1. Each panel presents the estimate for a given interstimulus-interval (ISI) condition: (a) 250 ms, (b) 500 ms, (c) 750 ms, and (d) 1000 ms. The squares give the effect observed in each lab in each ISI condition; the size of each square is proportional to the sample size. The horizontal lines give the 90\% confidence interval (CI) in each lab in each ISI condition, and the diamond gives the Model 1 estimate and 90\% CI. Labs are identified by the last name of their first authors (as listed in the appendix); labs that used an eye tracker are marked with an asterisk. The effects observed both within and across labs were minuscule and incompatible with those observed by Fischer et al. (2003). They were also highly consistent both across ISI conditions and across labs; the latter result suggests that lab-level moderators are unlikely to have driven our results.}\label{fig:model1}
\end{figure}
### Model 2: Finger counting
```{r metasum, echo=FALSE, fig.cap="(ref:metasum)", out.width=".8\\textwidth", fig.align="center"}
knitr::include_graphics("meta_summaryv4")
```
Model 2 was estimated on data from `r LS.n` consistent left-starters from 17 labs and `r RS.n` consistent right-starters from 17 labs. We summarize the results from Experiment 2 of @Fischer:2003ju along with results from Models 1 through 4 in Figure \@ref(fig:metasum). Although previous work suggests a stronger congruency effect among left-starters and a weaker or possibly even reversed effect among right-starters, Figure \@ref(fig:metasum) shows that finger counting had no substantial impact on the results. Specifically, the figure shows a minuscule congruency effect for each finger-counting group in each ISI condition and minuscule differences between congruency effects for the two finger-counting groups in each ISI condition (see Tables \@ref(tab:count) and \@ref(tab:mod2) in the Supplemental Material for details).
(ref:metasum) Summary of results from Experiment 2 of Fischer, Castel, Dodd, and Pratt (2003) and Models 1 through 4. Each panel presents the estimates for a given interstimulus-interval condition: from top to bottom, 250 ms, 500 ms, 750 ms, and 1000 ms. The squares give the point estimates, and the horizontal lines give 90% confidence intervals (CIs). The effects observed both within and across labs were minuscule and incompatible with those observed by Fischer et al. (2003). They were also highly consistent across ISI conditions.
### Model 3: Reading/writing direction
```{r}
# Number of labs (nlab) and participants (n) in each reading/writing-direction group
read.csv(file.path(here::here(),"data/meta_data/model3.meta.csv")) %>%
  select(ConditionDescription, LabID, Freq) %>%
  distinct() %>%
  group_by(ConditionDescription) %>%
  summarise(nlab = n(), n = sum(Freq)) %>%
  rename(Con = ConditionDescription) -> Model3.counts
```
Model 3 was estimated on data from `r Model3.counts %>% filter(Con == "LTR") %>% pull(n)` exclusively left-to-right readers-writers from `r Model3.counts %>% filter(Con == "LTR") %>% pull(nlab)` labs and `r Model3.counts %>% filter(Con == "NLR") %>% pull(n)` not exclusively left-to-right readers-writers from `r Model3.counts %>% filter(Con == "NLR") %>% pull(nlab)` labs. Although previous work suggests a weaker or possibly even reversed congruency effect among participants who have experience with languages that are not read and written exclusively from left to right, Figure \@ref(fig:metasum) shows that reading and writing direction had no substantial impact on the results. Specifically, the figure shows a minuscule effect for each reading-and-writing-direction group in each ISI condition and minuscule differences between the congruency effects for the two reading-and-writing-direction groups in each ISI condition (see Tables \@ref(tab:read) and \@ref(tab:mod3) in the Supplemental Material for details).
### Model 4: Handedness
```{r}
# Number of labs (nlab) and participants (n) in each handedness group
read.csv(file.path(here::here(),"data/meta_data/model4.meta.csv")) %>%
  select(ConditionDescription, LabID, Freq) %>%
  distinct() %>%
  group_by(ConditionDescription) %>%
  summarise(nlab = n(), n = sum(Freq)) %>%
  rename(Con = ConditionDescription) -> Model4.counts
```
Model 4 was estimated on data from `r Model4.counts %>% filter(Con == "LH") %>% pull(n)` left-handed participants from `r Model4.counts %>% filter(Con == "LH") %>% pull(nlab)` labs and `r Model4.counts %>% filter(Con == "RH") %>% pull(n)` right-handed participants from `r Model4.counts %>% filter(Con == "RH") %>% pull(nlab)` labs. Figure \@ref(fig:metasum) shows that handedness had no substantial impact on the results. Specifically, the figure shows a minuscule effect for each handedness group in each ISI condition and minuscule differences between the congruency effects for the two handedness groups in each ISI condition (see Tables \@ref(tab:hand) and \@ref(tab:mod4) in the Supplemental Material for details).
### Model 5: Mathematics fluency and mathematics anxiety
```{r}
load(file.path(here::here(),"data/processed_rdata/results5.Rdata"))
```
Model 5 was estimated on data from 1105 participants from 17 labs. Although previous work suggests that mathematics fluency and mathematics anxiety might moderate congruency effects, we observed no substantial moderating effects (see Table \@ref(tab:mod5) in the Supplemental Material for details).
```{r}
# Eye-tracker details: reshape to one row per ISI column and split each cell into analysed and total trial counts
read_csv(here::here("data/processed_data/EyeTrackerDetail.csv")) -> EyeTrackerDetail
EyeTrackerDetail %>% gather(ISI,N,-Lab,-Subjects,-Analysed,-`Trial Type`) %>% separate(N,c("Analysed.Trials","Total.Trials"),extra = "drop") -> EyeTrackerDetail.seperate
# Totals reported in the text: analysed trials, analysed participants, and contributing labs
EyeTracker.Trials = EyeTrackerDetail.seperate$Analysed.Trials %>% as.numeric() %>% sum()
EyeTracker.Subj = EyeTrackerDetail.seperate %>% group_by(Lab) %>% summarise(Analysed = unique(Analysed)) %>% pull(Analysed) %>% sum()
EyeTracker.Labs = EyeTrackerDetail.seperate %>% pull(Lab) %>% unique() %>% length()
# Participants who correctly guessed the purpose of the experiment, and the number of labs they came from
read_csv(here::here("data/meta_data/model1b.meta.csv")) -> GuessTable
GuessTable %>% group_by(LabID) %>% summarise(Freq = unique(Freq)) %>% pull(Freq) %>% sum() -> n.guess
GuessTable %>% pull(LabID) %>% unique() %>% length() -> n.guess.labs
```
## Secondary analyses
Model 1 was estimated separately on data from `r n.guess` participants (from `r n.guess.labs %>% english::words()` labs) who correctly guessed the purpose of the experiment and also separately on data from `r EyeTracker.Trials` eye-movement-contaminated trials of `r EyeTracker.Subj` participants (from `r EyeTracker.Labs %>% english::words()` labs) with contaminated trials in every combination of ISI and congruency condition. These analyses yielded no results of substantive interest (see the Supplemental Material for details).
# Discussion
The Att-SNARC effect has been used to argue for an early, response-independent, and automatic origin of the SNARC effect. If the SNARC effect is produced by early mechanisms, this would provide good evidence for embodied number representations and support strong claims about the link between number and space (e.g., a mental number line).
We attempted to replicate Experiment 2 of @Fischer:2003ju by collecting data from 1105 participants at 17 labs. Across these 1105 participants and four ISI conditions, the proportion of times the congruency effect we observed was positive was .50. Further, the effects we observed both within and across labs were minuscule and incompatible with those observed by Fischer et al. Given this, we conclude that we failed to replicate the effect reported by Fischer et al.
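As a rough illustration of how a sign-proportion summary like the one above could be computed, the following sketch uses a small, made-up data frame. The object `effects`, its column names, and all values are hypothetical and are not taken from our data or analysis code; we also assume here that the congruency effect is coded as incongruent minus congruent mean response time in milliseconds.

```r
# Hypothetical toy data: one row per participant x ISI condition.
# Names and values are illustrative assumptions, not the study data.
effects <- data.frame(
  participant = rep(1:4, each = 2),
  isi = rep(c(250, 500), times = 4),
  congruency_effect = c(3.1, -2.4, -0.8, 1.9, 5.0, -4.2, -1.5, 0.6)
)

# Proportion of participant-by-ISI cells with a positive congruency effect.
# The comparison yields TRUE/FALSE, and mean() of a logical vector is the
# share of TRUE values.
prop_positive <- mean(effects$congruency_effect > 0)
prop_positive
```

A proportion close to .50 is what one would expect if the sign of the congruency effect were essentially arbitrary across participants and ISI conditions.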
The effects we observed were highly consistent both across ISI conditions and across labs; the latter result suggests that lab-level moderators are unlikely to have driven our results. In addition, our analyses of several participant-level moderators (finger-counting habits, reading and writing direction, handedness, and mathematics fluency and mathematics anxiety) revealed no substantial moderating effects.
We conclude with two important points. First, on the basis of the common definition of replication employed in practice, one might object that we did in fact successfully replicate @Fischer:2003ju, at least in the 500-ms ISI condition. In response, we argue that this objection illustrates one major flaw of that definition: Our result in the 500-ms ISI condition is manifestly incompatible with the analogous result of @Fischer:2003ju. In addition, we view a difference of about 1 ms, even if "real," as too small for any neurally or psychologically plausible mechanism---particularly one constrained to operate only within a narrow time window of 500 ms after the stimulus. That said, we recognize that some such mechanism could be subject to an arbitrarily large attenuation factor in any particular experimental paradigm, such as that of Fischer et al., and that potential new paradigms could reveal an effect. Nonetheless, even if such paradigms are forthcoming, we maintain on the basis of our results that the paradigm of Fischer et al. provides no evidence of such a mechanism.
Second, we note several limitations of the present study. First and foremost, although our results demonstrate that the Att-SNARC effect cannot be used as evidence to support the strong claims about the link between number and space discussed earlier, our results do not refute such accounts. Specifically, although one might, on the basis of our results, prefer accounts of the SNARC effect that do not imply a mental number line, the evidence for and against different claims about the SNARC effect must be viewed in its entirety. The Att-SNARC effect provides only one such piece of evidence---albeit a particularly strong and valuable one.
A further set of limitations relates to our sample of participants. Our sample was recruited primarily from North America, Europe, and Australasia. Consequently, participants who read and wrote exclusively from left to right are overrepresented in our data. As reading and writing direction has been shown to strongly moderate spatial-numerical associations, it would have been preferable to have more participants with experience with languages that are not read and written exclusively from left to right. In addition, data sparsity precluded considering all moderators jointly in a single model, and thus we considered each moderator separately.
Finally, the finger-counting assessment we employed did not contain an explicit instruction to engage in finger counting. As a result, some participants employed finger counting inconsistently, and they were therefore excluded from the Model 2 analysis.
# Acknowledgements
LJC and DS are funded by the James S. McDonnell Foundation 21st Century Science Initiative in Understanding Human Cognition (grant number 220020370; received by DS). We acknowledge the help of the original authors, in particular Martin Fischer and Jay Pratt. We also note that this project would not have been possible without editor Alex Holcombe's patient and thoughtful help at every step of the process.
# Author contributions
L. J. Colling and D. Szűcs proposed the study. L. J. Colling programmed the experiments. L. J. Colling and B. B. McShane developed the analysis plan and conducted the analyses. L. J. Colling wrote an initial manuscript. L. J. Colling and B. B. McShane wrote revised and final manuscripts. All authors critically reviewed the manuscript by providing comments, feedback, and edits at all stages of writing, and all authors approved the final manuscript. All authors were involved in data collection. Authors from the contributing labs provided translated materials where required (see the appendix).
```{r}
papaja::render_appendix("appendix.Rmd")
```
# References