\documentclass[]{article}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\else % if luatex or xelatex
\ifxetex
\usepackage{mathspec}
\else
\usepackage{fontspec}
\fi
\defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
\fi
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
% use microtype if available
\IfFileExists{microtype.sty}{%
\usepackage{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}
\hypersetup{unicode=true,
pdftitle={Response to Pachter's Review},
pdfauthor={Joshua Paik and Igor Rivin},
pdfborder={0 0 0},
breaklinks=true}
\urlstyle{same} % don't use monospace font for urls
\usepackage{color}
\usepackage{fancyvrb}
\newcommand{\VerbBar}{|}
\newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
% Add ',fontsize=\small' for more characters per line
\usepackage{framed}
\definecolor{shadecolor}{RGB}{248,248,248}
\newenvironment{Shaded}{\begin{snugshade}}{\end{snugshade}}
\newcommand{\AlertTok}[1]{\textcolor[rgb]{0.94,0.16,0.16}{#1}}
\newcommand{\AnnotationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\AttributeTok}[1]{\textcolor[rgb]{0.77,0.63,0.00}{#1}}
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
\newcommand{\BuiltInTok}[1]{#1}
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{#1}}}
\newcommand{\CommentVarTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\ConstantTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\ControlFlowTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{#1}}}
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{#1}}
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
\newcommand{\DocumentationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{0.64,0.00,0.00}{\textbf{#1}}}
\newcommand{\ExtensionTok}[1]{#1}
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\ImportTok}[1]{#1}
\newcommand{\InformationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{#1}}}
\newcommand{\NormalTok}[1]{#1}
\newcommand{\OperatorTok}[1]{\textcolor[rgb]{0.81,0.36,0.00}{\textbf{#1}}}
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{#1}}
\newcommand{\PreprocessorTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{#1}}}
\newcommand{\RegionMarkerTok}[1]{#1}
\newcommand{\SpecialCharTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\SpecialStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\VariableTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\VerbatimStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\WarningTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\usepackage{graphicx,grffile}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{0}
% Redefines (sub)paragraphs to behave more like sections
\ifx\paragraph\undefined\else
\let\oldparagraph\paragraph
\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\fi
\ifx\subparagraph\undefined\else
\let\oldsubparagraph\subparagraph
\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
\fi
%%% Use protect on footnotes to avoid problems with footnotes in titles
\let\rmarkdownfootnote\footnote%
\def\footnote{\protect\rmarkdownfootnote}
%%% Change title format to be more compact
\usepackage{titling}
% Create subtitle command for use in maketitle
\providecommand{\subtitle}[1]{
\posttitle{
\begin{center}\large#1\end{center}
}
}
\setlength{\droptitle}{-2em}
\title{Response to Pachter's Review}
\pretitle{\vspace{\droptitle}\centering\huge}
\posttitle{\par}
\author{Joshua Paik and Igor Rivin}
\preauthor{\centering\large\emph}
\postauthor{\par}
\predate{\centering\large\emph}
\postdate{\par}
\date{1/22/2020, Last Update: 1/28/2020 6:00 P.M. G.M.T.}
\begin{document}
\maketitle
{
\setcounter{tocdepth}{2}
\tableofcontents
}
\begin{quote}
\emph{``The value of preprints is in their ability to accelerate
research via the rapid dissemination of methods and discoveries.''} -
\href{https://liorpachter.wordpress.com/2019/10/21/zero-data-rna-seq/}{Lior
Pachter}
\end{quote}
\hypertarget{introduction}{%
\subsection{Introduction}\label{introduction}}
Lior Pachter published a
\href{https://liorpachter.wordpress.com/2020/01/17/diversity-matters/}{review}
of our \href{https://arxiv.org/pdf/2001.00670.pdf}{paper} claiming that
age was the greatest contributor to the observed difference in citations
between signers of Letters A, B, and C. Any characterization of our
paper which says we do not account for age is \emph{false}. In this
response, we clarify a few points in our paper, repeat the relevant
analyses, and show that citations per year is an age-agnostic metric for
comparing mathematicians. We also go to some length to address
potential objections to this new analysis and show that they are
false as well. \textbf{To clarify, when comparing mean and median
citations per year amongst R1 Math Professors, A \textless{} B
\textless{} C.} This result still stands, as does the rest of our
analysis, after some revisions to our data. Finally, we will clearly
demonstrate Professor Pachter's mistake, namely parameter tuning
(hypertuning) of his arbitrary age cutoff to achieve a result which
supports his incorrect interpretation.

We appreciate Professor Pachter's review; it gives us a chance to make
our analysis stronger. We note that a revision was already in the
works and that in a normal review process we would have had three months
to respond. However, as our character and ability as scientists were
attacked, we thought it appropriate to reply as quickly as possible.
\hypertarget{corrections-and-clarifications}{%
\subsection{Corrections and
Clarifications}\label{corrections-and-clarifications}}
We would like to thank Pachter for finding the bug in our appendix which
pushes the mean Google Scholar citations of B further away from A. We
agree that the sentence - ``while this is not optimal, a quick sample
size calculation shows that one needs 303 samples or 21\% of the data to
produce statistics at a 95\% confidence level and a 5\% confidence
interval.'' - is ridiculous.

We should explain exactly how the data collection took place. We used
the \href{https://pypi.org/project/scholarly/}{scholarly API} to
initially collect our Google Scholar citations data. Two issues arose:
the scraper did not reliably distinguish between people with common
names, and some older mathematicians (like Cheeger or Gromov) do not
have Google Scholar citations. To assure data
quality, we manually checked the Google Scholar citations of every
single letter signer, comparing publications when necessary.

However, the empirical difference in citations was staggering and we
could predict an objection: more professors from R2 (teaching-focused)
universities signed A, which could have pushed the average down. We had
already spent so much effort collecting Google Scholar citations that we
chose to collect MathSciNet data only on R1 Math Professors,
which is why Professor Pachter did not have MathSciNet citations in our
dataset. This choice was not stated explicitly enough in our first
version. Let us look at the NaN values of those who are full math
professors at R1 universities.
\includegraphics{index_files/figure-latex/unnamed-chunk-3-1.pdf}
One sees that 17.34\% of the MathSciNet citations data is missing. It
appears there was some sort of systematic but unintentional error in the
data collection from MathSciNet. We report 3 NaNs on A and B, 3 on A
only, 0 on B and C, 50 on B only, and 0 on C only. We manually checked
the missing data and found that all but
\href{https://www.math.arizona.edu/~civil/}{Marta Civil} and
\href{https://science.iupui.edu/people/watt-jeffrey}{Jeffrey X Watt}
(who are math educators) have MathSciNet entries. The remaining
omissions are fixed and we visually check for NaNs again.

\includegraphics{index_files/figure-latex/unnamed-chunk-6-1.pdf}

65 of the 323 entries are empty for AMS citations per year. While visually
the NaNs appear uniform, we will impose a stricter significance level, say
2\%, to assess the difference in AMS citations per year.

Now that we are comparing apples to apples, we repeat the main
analyses.
\hypertarget{the-main-result-of-paik-rivin-r1-math-professors-citations-and-citations-per-year}{%
\subsection{The Main Result of Paik-Rivin: R1 Math Professors Citations
and Citations per
Year}\label{the-main-result-of-paik-rivin-r1-math-professors-citations-and-citations-per-year}}
We will compare the mean number of citations and citations per year
(that is, citations divided by the years elapsed since completion of the
PhD) between signers of
Letters A, B, and C. We will validate the significance of the difference
between signers using a
\href{https://en.wikipedia.org/wiki/Resampling_(statistics)\#Permutation_tests}{permutation
test}.

A permutation test is a non-parametric means of assessing the
significance of a difference between two populations. Throughout this
section, we will be
comparing the mean citations of two populations, X and Y. We will work
under the assumption that our null hypothesis is
\(H_0: \mu(X) = \mu(Y)\), and our alternative is
\(H_1: \mu(X) < \mu(Y)\).

A permutation test works as follows. Let \(X\) and \(Y\) be our relevant
populations, of size \(n_X\) and \(n_Y\). We would like to know whether
the observed difference in means could plausibly be due to
chance. We record the observed difference in means as
\(\delta = \mu(X) - \mu(Y)\). We then take the union of our two
populations, \(Z = X \cup Y\), and randomly partition \(Z\) into two new
sets \(A\) and \(B\) (resampled sets, not to be confused with the
letters), where \(|A| = n_X\) and \(|B| = n_Y\). We store
\(\mu(A)-\mu(B)\) and repeat the process \(n=10,000\) times, which
induces a distribution \(D\) of differences under the null hypothesis.
The induced p-value is the fraction of these differences that are at
least as extreme as the observed one,
\(p = |\{d \in D : d\leq \delta\}|/n\).
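
To make the counting explicit, here is a small worked version of the
formula above. The symbol \(k\) is shorthand introduced only for this
illustration: the number of resampled differences that fall at or below
the observed one.
\[
k = \bigl|\{d \in D : d \leq \delta\}\bigr|, \qquad p = \frac{k}{n}.
\]
With \(n = 10{,}000\) repetitions, \(k = 16\) would give
\(p = 16/10{,}000 = 0.0016\), and \(k = 0\) would give \(p = 0\). A
reported value of \(0\) is therefore best read as ``smaller than
\(1/n\)'' rather than as exactly zero. A common alternative estimator,
not the one defined above, is \((k+1)/(n+1)\), which can never be
exactly zero.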
\hypertarget{mathscinet-citations-for-r1-math-professors}{%
\subsubsection{MathSciNet Citations for R1 Math
Professors}\label{mathscinet-citations-for-r1-math-professors}}
\includegraphics{index_files/figure-latex/unnamed-chunk-10-1.pdf}
The mean number of citations for signers of letter A is 397 and the
median is 261. The mean number of citations for signers of letter B is
1435 and the median is 915. The mean number of citations for signers of
letter C is 2177 and the median is 1353.
The three hypotheses we would like to assess are:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
\(H_0: \mu(A) = \mu(B), H_1: \mu(A) < \mu(B)\)
\item
\(H_0: \mu(B) = \mu(C), H_1: \mu(B) < \mu(C)\)
\item
\(H_0: \mu(A) = \mu(C), H_1: \mu(A) < \mu(C)\)
\end{enumerate}
The induced p-value for hypothesis 1 is 0 (no permuted difference was as
small as the observed one). The induced p-value for
hypothesis 2 is 0.0016. The induced p-value for hypothesis 3 is 0. Hence
we reject all three null hypotheses in favor of the alternatives and
conclude that \(\mu(A) < \mu(B) < \mu(C)\).
\hypertarget{mathscinet-citations-per-year-for-r1-math-professors}{%
\subsubsection{MathSciNet Citations per Year for R1 Math
Professors}\label{mathscinet-citations-per-year-for-r1-math-professors}}
Of course, we considered the fact that citations grow with age, so we
calculated citations per year. There may be objections to this - one
could hypothesize that citations per year grow with age - but we will
soon thoroughly reject this claim.
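
For concreteness, the metric can be written as follows. The symbols
\(c_i\) and \(a_i\) are notation introduced here for illustration: the
total MathSciNet citations of signer \(i\) and the years elapsed since
that signer's PhD.
\[
\mathrm{citperyear}_i = \frac{c_i}{a_i}.
\]
For example, a hypothetical signer with \(c_i = 900\) citations and
\(a_i = 30\) years since the PhD has \(900/30 = 30\) citations per year,
the same value as a hypothetical signer with \(300\) citations and
\(10\) years since the PhD.
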
\includegraphics{index_files/figure-latex/unnamed-chunk-13-1.pdf}
The mean number of citations per year for signers of letter A is 16 and
the median is 11. The mean number of citations per year for signers of
letter B is 42 and the median is 27. The mean number of citations per
year for signers of letter C is 55 and the median is 42.
The three hypotheses we would like to assess are:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
\(H_0: \mu(A_{citperyear}) = \mu(B_{citperyear}), H_1: \mu(A_{citperyear}) < \mu(B_{citperyear})\)
\item
\(H_0: \mu(B_{citperyear}) = \mu(C_{citperyear}), H_1: \mu(B_{citperyear}) < \mu(C_{citperyear})\)
\item
\(H_0: \mu(A_{citperyear}) = \mu(C_{citperyear}), H_1: \mu(A_{citperyear}) < \mu(C_{citperyear})\)
\end{enumerate}
The induced p-value for hypothesis 1 is 0. The induced p-value for
hypothesis 2 is 0.07788. The induced p-value for hypothesis 3 is 0.
Hence we reject null hypotheses 1 and 3, fail to reject null hypothesis
2 at the 2\% significance level, and conclude that
\(\mu(A_{citperyear}) < \mu(B_{citperyear}) \leq \mu(C_{citperyear})\).
\hypertarget{there-is-no-evidence-that-citations-per-year-grows-with-age}{%
\subsection{There is no evidence that Citations per Year grows with
age}\label{there-is-no-evidence-that-citations-per-year-grows-with-age}}
Let us check whether there is a relationship between age and citations
per year in our limited dataset.
\includegraphics{index_files/figure-latex/unnamed-chunk-18-1.pdf}
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{confint}\NormalTok{(linearmodel1)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## 2.5 % 97.5 %
## (Intercept) 3.7459618 40.351608
## df$age 0.1149979 1.120528
\end{verbatim}
The slope of the regression line is slightly positive (0.6178, 95\%
Confidence Interval = (0.115, 1.12)), but the \(R^2\) value (Adjusted =
0.01662) is tragically low: age explains roughly 2\% of the variance in
citations per year. So there is really no meaningful correlation. However,
one could object that we do not have enough data (n = 258) to conclude
that there is no correlation between citations per year and age. We know
this, but thought it would be more appropriate to analyze this in a
separate paper. However, as noted above, our honor and ability as
scientists were attacked so\ldots{}
\hypertarget{presenting-citations-data-on-every-r1-math-professor-with-mathscinet-citations}{%
\subsubsection{Presenting citations data on every R1 Math Professor with
MathSciNet
citations}\label{presenting-citations-data-on-every-r1-math-professor-with-mathscinet-citations}}
(plus the Institute for Advanced Study and UC Merced)

We manually collected the citations and year of first publication of
every R1 full math professor by consulting
\href{https://en.wikipedia.org/wiki/List_of_research_universities_in_the_United_States}{Wikipedia},
going to the relevant faculty pages and then collecting MathSciNet
citations. We then anonymized the data. The 2787 professors we collected
data on are in line with
\href{http://www.ams.org/profession/data/annual-survey/2016dp-tableDF1.pdf?fbclid=IwAR1mgI0qSEs5nCGquqye741_0lZU-ez7dlcJ3wZYhDtJUswhH1SX7yeiiak}{data}
collected by the AMS, after taking into account that about half of
universities in the US are classified R2. Great lengths were taken to
assess the accuracy of this data, including correlating publications,
PhD years, etc. Of course errors in data collection, especially manual
typing errors, happen, but by no means are these errors systematic.
\textbf{Exercise 1: Pick your favorite R1 institution, go to MathSciNet,
and check how similar our data is to what you determined.}
\textbf{Exercise 2: Determine every university without a female
professor. We will note that the University of Colorado - Denver has a
very strong female professor, but she does not have MathSciNet
citations. There should be at least one surprise (and a few non
surprises).}
Many aspects of this dataset can, should, and will be analyzed. For now,
the following will suffice.
\hypertarget{citations-per-year-vs-age}{%
\subsubsection{Citations per Year vs
Age}\label{citations-per-year-vs-age}}
We plot Citations per Year vs Age for all R1 math professors. We fit a
linear regression model and output a 95\% confidence interval for the
slope.

\includegraphics{index_files/figure-latex/unnamed-chunk-23-1.pdf}
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{confint}\NormalTok{(linearmodel2)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## 2.5 % 97.5 %
## (Intercept) 10.1588495 18.33147
## allR1$age 0.2515584 0.48675
\end{verbatim}
So while visually it appears that there is no correlation between age
and citations per year, one may object and say, the slope is positive!
This leads to the following question.

\textbf{Question: To what power must we raise age to get zero within the
confidence interval of the slope?}

We object to this question, because its implication is:
by how much should we discount the accomplishments of those who are
older? Nevertheless, we proceed.
\includegraphics{index_files/figure-latex/unnamed-chunk-26-1.pdf}
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{confint}\NormalTok{(linearmodel3)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## 2.5 % 97.5 %
## (Intercept) 6.715962422 9.56734674
## allR1$age -0.004656401 0.07740065
\end{verbatim}
It seems raising age to the power 1.3 will do the trick: the confidence
interval for the slope now contains zero.

We will repeat the permutation test comparing citations per
year\^{}1.3.
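
To see what the exponent does, the adjusted metric can be written as
below. The symbols \(c_i\) (citations) and \(a_i\) (age in years) are
notation introduced here for illustration, and the sketch assumes the
adjustment divides citations by \(a_i^{1.3}\).
\[
\frac{c_i}{a_i^{1.3}} = \frac{c_i}{a_i}\cdot\frac{1}{a_i^{0.3}},
\qquad 10^{0.3} \approx 2.00, \qquad 30^{0.3} \approx 2.77.
\]
So citations per year are divided by about \(2.00\) at age 10 and by
about \(2.77\) at age 30: relative to a 10-year career, a 30-year career
is handicapped by a factor of \(3^{0.3} \approx 1.39\).
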
\hypertarget{citations-per-year-adjusting-for-fitted-handicap-on-age}{%
\subsubsection{Citations per Year adjusting for fitted handicap on
age}\label{citations-per-year-adjusting-for-fitted-handicap-on-age}}
\includegraphics{index_files/figure-latex/unnamed-chunk-28-1.pdf}
The mean number of citations per year\^{}1.3 for signers of letter A is
6 and the median is 4. The mean number of citations per year\^{}1.3 for
signers of letter B is 15 and the median is 9. The mean number of
citations per year\^{}1.3 for signers of letter C is 19 and the median
is 15.
The three hypotheses we would like to assess are:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
\(H_0: \mu(A_{citperyear^{1.3}}) = \mu(B_{citperyear^{1.3}}), H_1: \mu(A_{citperyear^{1.3}}) < \mu(B_{citperyear^{1.3}})\)
\item
\(H_0: \mu(B_{citperyear^{1.3}}) = \mu(C_{citperyear^{1.3}}), H_1: \mu(B_{citperyear^{1.3}}) < \mu(C_{citperyear^{1.3}})\)
\item
\(H_0: \mu(A_{citperyear^{1.3}}) = \mu(C_{citperyear^{1.3}}), H_1: \mu(A_{citperyear^{1.3}}) < \mu(C_{citperyear^{1.3}})\)
\end{enumerate}
The induced p-value for hypothesis 1 is 0. The induced p-value for
hypothesis 2 is 0.0974. The induced p-value for hypothesis 3 is 0.0002.
Hence we fail to reject null hypothesis 2 at a 2\% significance level and
reject null hypotheses 1 and 3 in favor of the alternatives. We conclude
that after adjusting for age \(\mu(A) < \mu(B) \leq \mu(C)\).
\hypertarget{one-more-check-that-age-is-irrelevant-when-comparing-citations}{%
\subsection{One more check that age is irrelevant when comparing
citations}\label{one-more-check-that-age-is-irrelevant-when-comparing-citations}}
This method was suggested by a friend as a final check to eliminate any
question that age was the greatest confounder. We want to show that
\(\mu(A) < \mu(B \cup C)\). We randomly sample a population of 20
from A, called \(X\). For each member \(x \in X\), we find every
person from B and C who is within a four-year age interval of \(x\),
randomly sample one of them, and thereby induce a new population \(Y\).
Then we compare the means by storing \(\mu(X)-\mu(Y)\). We repeat this
1,000 times and
plot a histogram of the induced values. If 0 is within this new
distribution, then maybe there is a chance, a slim one given the
above, that age is in fact a confounder. If the distribution is
primarily negative, then \(\mu(X) < \mu(Y)\). Otherwise
\(\mu(X) > \mu(Y)\).
We perform this analysis with both AMS citations and Google Scholar
citations.
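
The procedure above can be written compactly. This is a sketch: the
window half-width \(w\), the candidate set \(M(x)\) and the index \(t\)
are notation introduced here, and whether the four year interval means
\(w = 4\) or \(w = 2\) is left open. It also assumes that every \(M(x)\)
is non-empty and that the draw from it is uniform.
\[
M(x) = \{\, y \in B \cup C : |\mathrm{age}(y) - \mathrm{age}(x)| \leq w \,\},
\qquad
d_t = \frac{1}{20}\sum_{x \in X_t} \bigl(\mathrm{cit}(x) - \mathrm{cit}(y_x)\bigr),
\quad t = 1, \dots, 1000,
\]
where \(X_t\) is the \(t\)-th random sample of 20 signers of A and
\(y_x\) is drawn at random from \(M(x)\). The quantity of interest is
the fraction of the \(d_t\) that are greater than or equal to zero; a
fraction near zero supports \(\mu(A) < \mu(B \cup C)\) among similarly
aged signers.
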
\includegraphics{index_files/figure-latex/unnamed-chunk-31-1.pdf}
When comparing MathSciNet citations with this age-matched randomization
test, we see that none of the induced distribution is greater than or
equal to zero. So when comparing similarly aged apples to apples,
\(\mu(A) < \mu(B\cup C)\).
We perform the same analysis with Google Scholar citations.
\includegraphics{index_files/figure-latex/unnamed-chunk-33-1.pdf}
When comparing Google Scholar citations with this age-matched
randomization test, we see that 18.1\% of the induced distribution is
greater than or equal to zero. So when comparing similarly aged apples
to apples, it is inconclusive whether \(\mu(A) < \mu(B\cup C)\). Of
course, we wonder whether this is actually due to Lior Pachter.

\includegraphics{index_files/figure-latex/unnamed-chunk-35-1.pdf}

When comparing Google Scholar citations, removing Pachter, with this
age-matched randomization test, we see that 2.7\% of the induced
distribution is greater than or equal to zero. So when comparing
similarly aged apples to apples, it indeed seems that
\(\mu(A) < \mu(B\cup C)\).
\hypertarget{pachters-magic-trick-hypertuning}{%
\subsection{Pachter's Magic Trick:
Hypertuning}\label{pachters-magic-trick-hypertuning}}
A note about Pachter's final, ``damning'' (it is not) figure. He chose
a cutoff of age 36, and compared the average Google Scholar citations of
letter signers. He finds that with this cutoff, the mean
citations of A is greater than that of B. We found this choice of 36 to be
curious and somewhat arbitrary. It smelled like parameter tuning, but we
wanted to investigate.

We plot the average citations per year and mark the 36 (PhD) age cutoff
with a vertical line.
\includegraphics{index_files/figure-latex/unnamed-chunk-37-1.pdf}
The maximum age since PhD of a letter signer of A is 49. If he were to
cut off his comparison at that point, clearly \(C>A\). If he were to
cut off his comparison at 38, \(C>A\). Were he to place the cutoff any
further left than 36, he would be accused of being biased.
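
One way to make the dependence on the cutoff explicit is the following
sketch. The notation is introduced here for illustration: \(S_L(t)\) is
the set of signers of letter \(L\) whose age since PhD is at most \(t\),
and we assume the cutoff keeps signers at or below \(t\).
\[
m_L(t) = \frac{1}{|S_L(t)|}\sum_{i \in S_L(t)} \mathrm{cit}_i,
\qquad L \in \{A, B, C\}.
\]
Pachter's figure corresponds to the single value \(t = 36\). A
comparison that does not depend on tuning should show the same ordering
of \(m_A(t)\), \(m_B(t)\) and \(m_C(t)\) across a range of \(t\), not
only at one chosen value.
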
Notice the spike at age 21. This is caused by Lior Pachter. What would
happen if we removed Pachter?
\includegraphics{index_files/figure-latex/unnamed-chunk-39-1.pdf}
Perhaps removing only Lior is itself a confounder. So we instead remove
the top five mathematicians from C.
\includegraphics{index_files/figure-latex/unnamed-chunk-40-1.pdf}
So it is clear that Pachter's analysis was some sort of magic trick,
potentially a thought experiment, and a misleading one. It is highly
unlikely that a tenured and respected expert in computation and
statistics did not know the above result, especially when a student who,
he suggests, should take an introductory statistics course immediately
spotted it.
One may suspect that he purposefully chose his 36 cutoff to try to
undermine our results.
\hypertarget{tier-rankings}{%
\subsection{Tier Rankings}\label{tier-rankings}}
In our Excel sheet (which we understand is the bane of
reproducibility), and through the magic of pivot tables, we rank R1
departments by average department citations per year (since
first publication).
The top 11 departments using this ranking are:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
Princeton
\item
Institute for Advanced Study
\item
Harvard
\item
Stanford
\item
University of Chicago
\item
University of California - Los Angeles
\item
Massachusetts Institute of Technology
\item
Columbia University
\item
New York University
\item
University of Miami
\item
University of California - Berkeley
\end{enumerate}
We calculate the average citations per year since PhD of letters A, B,
and C, and compare them to our ranked list.
The average MathSciNet citations per year (PhD age) is:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
For letter A - 15.98726
\item
For letter B - 41.86467
\item
For letter C - 55.3615
\end{enumerate}
Temple has an average citations per year of 12.33, so we retract our
claim that letter A is comparable to Temple. It is closer to the
University of Massachusetts - Amherst which has an average citations per
year of 16.17. By
\href{https://www.usnews.com/best-graduate-schools/top-science-schools/mathematics-rankings}{US
News}, University of Massachusetts - Amherst's Math Department has a
rank of 55. Rutgers has an average citations per year of 35.01, so we
retract our claim that letter B is comparable to Rutgers. It is closer
to the University of Minnesota, which has an average citations per year of
42.07 and a US News Ranking of 19. For Letter C, we claimed that it was
another tier higher - indeed it is closer to the University of Chicago,
which has an average citations per year of 56.27, ranked 6 by US News.
An astute observer would notice we are not exactly comparing apples to
apples. Presumably one's first publication could be before one finishes
their PhD. So even with the boost, the order amongst letter signers
stands.
\hypertarget{discussion-and-conclusion}{%
\subsection{Discussion and Conclusion}\label{discussion-and-conclusion}}
We have debunked the claim that age is the confounder for the difference
in citations and citations per year between signers of Letters A, B, and
C. Indeed, taken as a whole, the least meritorious mathematicians signed
letter A, whereas the more meritorious signed letters B and C, with
merit judged by citations. If one were not willing to believe citations
impose even a small order on merit, one could replace citations with
Fields medals, AMS Fellowships, or many other metrics.

In this analysis, we have addressed most of the criticisms in Pachter's
review, acknowledging our errors when pointed out, while rejecting his
false claim that age was the greatest confounder. The only one we have not
addressed is his point that, ``several p-values are computed and
reported without any multiple testing correction.'' After consultation
with a respected statistician, we do not see what the issue is. We
reported every p-value and he is welcome to change the set.seed in our
code, which he applauded as easily reproducible.
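
For readers who do want a correction, a simple and conservative choice
is Bonferroni, which divides the significance level by the number of
tests. This is an illustration only, not a correction that the analysis
above applied; \(\alpha\) and \(m\) are the usual symbols for the
significance level and the number of tests.
\[
\alpha_{\text{adj}} = \frac{\alpha}{m}, \qquad
\frac{0.02}{3} \approx 0.0067, \qquad
\frac{0.02}{9} \approx 0.0022.
\]
With three permutation tests per family, or nine overall, at the 2\%
level, the p-values reported above that fell below 2\% (0, 0.0016 and
0.0002) remain below the adjusted thresholds, and those that did not
(0.07788 and 0.0974) are unaffected, so the conclusions would be
unchanged under this correction.
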
We conclude by reiterating our thanks to Pachter. We truly appreciated
your review.
\hypertarget{data-and-code}{%
\subsection{Data and Code}\label{data-and-code}}
All code and data used for this report are available at
\url{https://github.com/joshp112358/Response-to-Pachter}.
\hypertarget{references}{%
\subsection{References}\label{references}}
Lior Pachter's Blog Post - Diversity Matters - January 17, 2020 -
\url{https://liorpachter.wordpress.com/2020/01/17/diversity-matters/}

Chad Topaz's Paper - Version 10 -
\url{https://osf.io/preprints/socarxiv/fa4zb/}

Our original Paper - Version 1 -
\url{https://arxiv.org/pdf/2001.00670.pdf}

In Preparation - A Citations Analysis of R1 Math Departments by Joshua
Paik and Igor Rivin
\hypertarget{miscellany}{%
\subsection{Miscellany}\label{miscellany}}
There seem to be some squabbles in the comments of Pachter's blog over
whether the paper is Paik-Rivin or Rivin-Paik. In mathematics, we follow
the Hardy-Littlewood rule, namely all authors are first authors and we
list authors alphabetically.
\end{document}