---
title: |
  Trading social status for genetics in marriage markets:
  evidence from Great Britain and Norway
author: |
  Abdel Abdellaoui\thanks{Department of Psychiatry, Amsterdam UMC, University
  of Amsterdam, Amsterdam, The Netherlands. Email: [email protected]},
  Oana Borcan\thanks{School of Economics, University of East Anglia, Norwich,
  UK. Email: [email protected]},
  Pierre-André Chiappori\thanks{Department of Economics, Columbia University,
  New York. Email: [email protected]},\
  David Hugh-Jones\thanks{Corresponding author. Email: [email protected]},
  Fartein Ask Torvik\thanks{Norwegian Institute of Public Health, Oslo.
  Email: [email protected]} &
  Eivind Ystrøm\thanks{Norwegian Institute of Public Health, Oslo.
  Email: [email protected]}
abstract: |
  Under social-genetic assortative mating (SGAM), socio-economic status (SES) and
  genetically inherited traits are both assets in marriage markets, become
  associated in spouse pairs, and are passed together to future generations.
  This gives a new explanation for persistent intergenerational inequality and
  "genes-SES gradients" -- observed genetic differences between high- and
  low-SES people. We model SGAM and test for it in two large surveys from Great
  Britain and Norway. Spouses of earlier-born siblings have genetics predicting
  more education. This effect is mediated by individuals' own education and
  income. Under SGAM, shocks to SES are reflected in the DNA of subsequent
  generations, and the distribution of genetic variants in society is endogenous
  to economic institutions.
  \par\textbf{Keywords:} Assortative mating, MoBa, UK Biobank.
date: "`r Sys.Date()`"
output:
  bookdown::pdf_document2:
    toc: false
    latex_engine: xelatex
    number_sections: false
    keep_tex: true
editor_options:
  chunk_output_type: console
  markdown:
    wrap: 72
bibliography: bibliography.bib
mainfont: Times
fontsize: 12pt
linkcolor: blue
header-includes:
  - \usepackage{subfig}
  - \captionsetup[subfloat]{labelformat=empty}
  - \captionsetup[figure]{width=5in}
  - \usepackage{setspace}\onehalfspacing
  - \usepackage{amsmath}
  - \usepackage{amsthm}
  - \usepackage{placeins}
  - \usepackage{etoc}
  - \newtheorem{prop}{Proposition}
  - \newtheorem{claim}{Claim}
---
```{r setup, include = FALSE}
library(robomit)
library(Formula)
library(car)
library(drake)
library(Hmisc)
library(tibble)
library(dplyr)
library(magrittr)
library(AER)
library(ggplot2)
library(huxtable)
library(broom)
library(fixest)
library(santoku)
library(purrr)
library(scales) # percent should override santoku
library(systemfit)
library(nlWaldTest)
set.seed(27101975)
regression_subset <- function (data) {
  data %>%
    filter(
      n_sibs.x >= 2,
      n_sibs.x <= 6,
      ! is.na(birth_order.x),
      ! is.na(height.x),
      ! is.na(fluid_iq.x),
      ! is.na(university.x),
      ! is.na(bmi.x),
      ! is.na(sr_health.x)
      # we don't demand first_job_pay.x is not NA, that would
      # shrink the N by a lot
    )
}

calc_prop_shared_children <- function (mf_pairs) {
  drake::loadd(parent_child)
  # all pairs where at least 1 parent has a genetic child in the sample
  # note that this creates multiple parent-child relationships
  mf_w_parent <- mf_pairs %>%
    left_join(parent_child, by = c("ID.m" = "parent_id"),
              relationship = "many-to-many") %>%
    left_join(parent_child, by = c("ID.f" = "parent_id"),
              relationship = "many-to-many",
              suffix = c(".m", ".f")) %>%
    filter(! is.na(child_id.m) | ! is.na(child_id.f))
  n_one_has_kid <- length(unique(mf_w_parent$ID.m)) # using ID.f gives same
  mf_w_both_parents <- mf_w_parent %>%
    filter(child_id.m == child_id.f) # NAs excluded
  n_both_same_kid <- length(unique(mf_w_both_parents$ID.m))
  return(list(one = n_one_has_kid, both = n_both_same_kid))
}

pretty <- function (n, digits = 2, ...) {
  formatC(n, digits = digits, big.mark = ",", ...)
}

update_with_birth_order_dummies <- function (fml) {
  update(fml, . ~ . - birth_order.x + factor(birth_order.x))
}

convert_moba_for_huxreg <- function (tidy, glance) {
  huxtable::tidy_replace(tidy, tidied = tidy, glance = glance)
}

knitr::opts_chunk$set(echo = FALSE)
knitr::knit_hooks$set(
  inline = function (x) {
    if (is.numeric(x)) x <- as.character(round(x, getOption("digits")))
    x <- gsub("-", "\u2212", x)
    paste(as.character(x), collapse = ", ")
  }
)
options(huxtable.long_minus = TRUE)
theme_set(theme_minimal())
drake::loadd(mf_pairs)
drake::loadd(mf_pairs_twice)
drake::loadd(famhist)
drake::loadd(resid_scores)
famhist %<>% left_join(resid_scores, by = "f.eid")
rm(resid_scores)
famhist$EA3 <- famhist$EA3_excl_23andMe_UK_resid
mf_pairs_reg <- regression_subset(mf_pairs_twice)
mf_pairs_reg <- mf_pairs_reg %>%
  group_by(female.x, YOB.x) %>%
  mutate(
    first_job_pay.x = c(scale(first_job_pay.x))
  )

my_note <- "{stars}. Standard errors clustered by spouse pair in parentheses."
# for p values
my_stars <- c(`***` = 0.001, `**` = 0.01, `*` = 0.05, `+` = 0.10)
# my_stars <- NULL

# basic formulae
fml_bo_psea <- list()
fml_bo_psea[[1]] <- EA3.y ~ birth_order.x | factor(n_sibs.x)
fml_bo_psea[[2]] <- EA3.y ~ university.x + birth_order.x | factor(n_sibs.x)
fml_bo_psea[[3]] <- EA3.y ~ first_job_pay.x + birth_order.x | factor(n_sibs.x)
fml_bo_psea[[4]] <- EA3.y ~ first_job_pay.x + university.x + birth_order.x |
                      factor(n_sibs.x)
fml_bo_psea <- lapply(fml_bo_psea, Formula::as.Formula)

# common controls
fml_bo_psea <- lapply(fml_bo_psea, update,
                      . ~ . + EA3.x + par_age_birth.x)
fml_bo_psea <- lapply(fml_bo_psea, update,
                      . ~ . | . + factor(YOB.x) + factor(birth_mon.x))

# alternative mediators
fml_bo_psea[-1] <- lapply(fml_bo_psea[-1], update,
                          . ~ . + fluid_iq.x + height.x + bmi.x + sr_health.x)

# basic tidy args
tidy_args <- list(se = "cluster", cluster = "couple_id", conf.int = FALSE)
# tidy_args <- list(se = "hetero", conf.int = FALSE)

reg_coefs <- c(
  "Birth order" = "birth_order.x",
  "University" = "university.xTRUE",
  "Income" = "first_job_pay.x",
  "Fluid IQ" = "fluid_iq.x",
  "Height" = "height.x",
  "BMI" = "bmi.x",
  "Self-reported health" = "sr_health.x",
  "Own PSEA" = "EA3.x",
  "Parents' age at birth" = "par_age_birth.x"
)

moba_reg_coefs <- c(
  "Birth order" = "parity",
  "University" = "university",
  "Income" = "incomez",
  "Height" = "height",
  "BMI" = "bmi",
  "Own PSEA" = "eapgsresid",
  "Parents' age at birth" = "parentalage"
)
# load moba results
load("moba-results/tradinggenetics_moba_v04.Rdata")
```
```{r TODO-NOTES, eval = FALSE}
# * TODO - May 2024
# - push model on differences between meritocracy and SGAM so as to answer
# "why should economists care", R2.
# - rewrite previous literature sect, esp on genes and endogeneity, to
# note that G/E correlation is widespread and people think about it, and
# to focus on our "genes on the left" contribution.
# - add refs to Houmark et al and to Rustichini et al; read and also Brumpton
# et al nature comms
# - personality as a mediator?
# - clarity on the model in terms of dimensions and why SES is not
# - read
* Notes on how to submit over EM
- change \symbf to \bm throughout, add \usepackage{bm}
- edit Loic Yengo and Kare XXX in within-tex-file autocreated bibliography
to use TeX symbols not unicode, because TeX is stupid and is for stupid
people
* TODO - April 2023
- work on integrating the model with the theory, esp. now we have two societies.
Can we estimate $a$? And plug in estimates of $\theta$ so as to calculate
the long-run genes-SES correlation? (And we could also estimate this
within our data...)
- do something about Norway over time?
- see also narrow TODO list in the Norway data section
# TODO - Apr 2022
* Principal components, own correlation with birth order? Just to check. Cf. Abdel's earlier critique of GAM papers.
* Versions of first stage with sibling FE? I did this, they are the right
sign except for BMI, but never significant and with big SEs. Maybe doesn't
add much.
* Extension with a ne b: does correlation increase in difference between a and b?
DONE
* Oana, do analysis of missing mediator and how close it would have to correlate
with univ. attendance to make our results insignificant
* What to do with the Chiappori et al style matching model? Ask Pierre
DONE
* Comp statics of long run, using Imp Fun Thm;
solve comp stats and do graphs
* Rework conclusion esp re $\theta$, $\gamma$
DONE
* $E[x'_2 | x_2]$ and $E[x'_2 | x_1, x_2]$ in prop-gamma,
to show that $\sigma$ is a confound while $\gamma$ is a mediator?
* Connect theory to Oana's empirics
* G-E interactions? We mention in intro; could just mention it as a potential
extension
DONE
# * Why missing all the first job data? - because not everyone does the
# online followup. We could use "current job" which might be endogenous
# to your spouse; then do first job as a robustness check.
#
#
# * maybe - consider alternative exogenous shocks to income. For example, some
# professions are more "cyclical" than others wrt recessions. If we could do
# predicted income at age 21-25 from business cycle X profession, that might
# count as exogenous. (Could use an independent source to estimate evolution
# of incomes, e.g. GHS or BHPS)
#
# * Maybe: number of elder *brothers*? (We don't have this info but we have
# total number of brothers, so we can interact this with the birth order
# effect)
#
# Check prop 2. (Bob question: if there's noise, then shouldn't
# correlation tend to a limit less than 1? And if so, then at that limit,
# parental correlation should equal child correlation).
# - I think the claim doesn't contradict this. But do double-check!
# TODO: We may have to control, rather than residualize, on PCs
# to get accurate p-values, as our N is not huge by the end (or is 2000
# enough not to worry)
# Connect the model to existing theoretical models of matching
# (e.g. Gale-Shapley, Becker 1974, econometrics of matching). DHJ.
# Robustness: look at all 33 polygenic scores for families of size 3. Rerun
# regressions excluding family size of 3. Appendix. DHJ.
# For future: consider gender asymmetries - e.g. does male SES matter more?
# Might need some other "good genes" measures e.g. waist-hip ratio.
# Maybe full set of birth order dummies within family size
# but maybe only for family size 2-4 or so?
#
# * NB: why in mf_pairs_twice do we see differences in own EA3 with birth_order?
# - just chance? it's only true for families of size 3...
# NOTES on economics literature
# Areas/topics
# * Econometrics of matching
# - Choo & Siow 2006
# - Chiappori et al 'fatter attraction'
# - Quite simple regressions as a result of their basic framework
# - Run regressions on each of the other person's characteristics;
# in the linear version of their framework, coefficients should all be
# proportional
# - I'm not sure these guys are our key targets though, more likely some
# people we might have to "appease"...!
# * Inheritance of inequality
# * Assortative mating
# - a mechanism for inequality
# - also related to family economics
# - this might be a good way to set up the paper:
# - "We bring together two explanations of persistent inequality. (1) Genetics
# and (2) assortative mating."
# * Getting comments from Greg Clark would be very useful.
# * Maybe from Samuel Bowles too?
# * They have a Science article on IG Wealth Transmission
# * This special issue of Current Anthropology is important:
# https://www.journals.uchicago.edu/toc/ca/2010/51/1
# - they distinguish "material", "somatic" (inc genetic, but also e.g.
# embodied knowledge), and "relational" wealth
# - In their response to comments, Mulder et al. acknowledge that maybe
# one kind of wealth can help you acquire other kinds, and this is an
# important area of future research.
# * Fernandez et al. (Love and Money)
# - inequality increases marital sorting because it increases the benefits
# to matching with another skilled type;
# - they confirm the correlation cross-country
# - they also predict sorting lowers GDP (not sure how this works)
# - there is a link with macroeconomics and the demographic transition in
# this literature
# * Eika et al. 2019
# - Empirics of assortative mating in US from 1940s: it's increased at
# the bottom but gone down at the top
# - Important for inequality but changes have led to little net increase
# in inequality. (A simple accounting framework when you compare "what
# if people had matched randomly into households")
# - The motivation "assortative mating is increasing inequality" is
# a sufficient one, for this paper - it's a big issue.
# * Doepke and Tertilt "families in macroeconomics"
# - parents' fertility decisions are important for econ growth
# - as is skill formation
# - I think this is for another paper - e.g. what happens when the rich
# have more children?
# * Gould, Moav and Simhon 2008
# - to read
# - relates end of polygyny to econ growth
# - as above, maybe for another paper
#
# PEOPLE you could talk to
# * Doepke
# * Clark
# * Fernandez
# * Bowles
```
\normalem
# Introduction
How families are formed, and transmit traits and assets to their
offspring, is crucial for understanding inequality and social structure.
Assortative mating in marriage markets can increase inequality between
families [@breen2011educational; @greenwood2014marry] and contribute to
its persistence across generations, which is surprisingly high
[@clark2015intergenerational; @solon2018we]. Wealthy families pass on
advantages to their children through both genetic inheritance and
environmental influence [@Rimfeld_2018; @bjorklund2006origins;
@sacerdote2011nature].
This paper examines a plausible aspect of marriage markets: both social
status and genetics contribute to a person’s attractiveness, and as a
result, they may become associated in subsequent generations.[^1] For
example, suppose that wealth, intelligence and health are advantages in
a potential spouse. Then wealthy people are more likely to marry
intelligent or healthy people, and their children will inherit both
wealth and genetic variants associated with intelligence or health. We
call this mechanism social-genetic assortative mating (SGAM). SGAM may
be an important channel for the transmission of inequality. It creates a
genetic advantage for privileged families, which may help to explain the
long-run persistence of inequality. At the same time, this advantage is
not a fact of biology, but is endogenous to the social structure.
Indeed, under SGAM, environmental shocks to a person’s social status may
be reflected in the genetics of his or her children.
[^1]: *Social status* refers to characteristics that an individual
possesses in virtue of their social position. For example, my wealth
is a fact about me that holds in virtue of my relationship to
certain social institutions (bank deposits, title deeds et cetera).
Other examples include caste, class, income, and educational
qualifications. *Socio-economic status* (SES) is a specific type of
social status which exists in economically stratified societies,
covering variables like educational attainment, occupational class,
income and wealth [e.g. @white1982relation].
Below, we first write down a theory where attractiveness in the marriage
market is a function of both socio-economic status (SES) and genetic
variants. We show that social-genetic assortative mating in one
generation increases the correlation between SES and genetic variants in
the offspring generation. This result provides a new explanation of
*genes-SES gradients* -- systematic genetic differences between high-
and low-SES people [@belsky2018genetic; @Rimfeld_2018;
@bjorklund2006origins]. The dominant existing explanation for these
gradients is meritocratic social mobility: if a genetic variant predicts
success in the labour market, then it will become associated with high
SES and will be inherited in high-SES families. Under meritocracy,
genes cause SES; under SGAM, causality goes both ways, from genes to SES
and vice versa. Also, the size of genes-SES gradients depends on
economic institutions. Under institutions which increase
intergenerational mobility, like high inheritance tax rates,
genes-SES gradients become weaker. On the other hand, an increase in meritocracy
can make them stronger. SGAM also interacts with economic institutions to
determine the level of socioeconomic inequality.
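
To make the first result concrete, the following is a minimal simulation
sketch, not the model developed below. It assumes that genetic scores and SES
are independent standard normal variables in the parent generation, that
attractiveness is a weighted sum of the two with a hypothetical weight `a`,
that spouses match perfectly on attractiveness rank, and that children receive
the parental average of each trait plus independent noise. SES is not caused by
genes here, so there is no meritocratic channel.

```r
set.seed(1)
n <- 20000  # number of couples (hypothetical)
a <- 0.5    # weight on genetics in attractiveness (hypothetical)

# Parent generation: genetic score g and SES s are independent by construction
sim_parents <- function(n) data.frame(g = rnorm(n), s = rnorm(n))
men   <- sim_parents(n)
women <- sim_parents(n)

# Attractiveness is a weighted sum of genetics and SES
attract <- function(d) a * d$g + (1 - a) * d$s

# Perfect assortative matching: the i-th ranked man marries the i-th ranked woman
men   <- men[order(attract(men)), ]
women <- women[order(attract(women)), ]

# Children get the parental mean of each trait plus independent noise;
# child SES does not depend on child genetics (no meritocracy)
child_g <- (men$g + women$g) / 2 + rnorm(n, sd = sqrt(0.5))
child_s <- (men$s + women$s) / 2 + rnorm(n, sd = sqrt(0.5))

cor_parents  <- cor(c(men$g, women$g), c(men$s, women$s))
cor_children <- cor(child_g, child_s)
```

Under these assumptions `cor_parents` is close to zero by construction, while
`cor_children` is positive: matching on a combined index pairs high-SES people
with high-score spouses, and their children inherit both.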
Next, using data on spouse pairs from two large genetically-informed
surveys in Great Britain and Norway, we test the hypothesis that a
person’s higher social status attracts spouses with genetic variants
predicting greater educational attainment. Our genetic measure, the
polygenic score for educational attainment (PSEA), derives from
large-scale genome-wide association studies [@lee2018gene;
@okbay2022polygenic]. PSEA reflects a bundle of polygenic effects on
underlying traits, including intelligence, personality, and physical and
mental health [@demange2021investigating]. PSEA predicts, and causes,
educational attainment itself, as well as intelligence and labour market
outcomes. It is already known that humans mate assortatively on PSEA
[@hugh2016assortative; @robinson2017genetic; @torvik2022modeling], which
makes it a likely candidate for detecting SGAM.
The endogeneity of socio-economic status is the main challenge in
identifying the effect of SES on the spouse’s genetic endowment. For
instance, people with high educational qualifications tend to also have
high PSEA, and as mentioned above, they may take partners based on
genetic similarity. Indeed, recent studies show strong assortative
mating on PSEA, much more than we would expect if spouses matched only
on observed measures of educational attainment [@okbay2022polygenic]. To
isolate the causal link from own SES to partner's genes, we use a shock
to SES which is independent of own genetics. Specifically, we use a
person's *birth order*. Earlier-born children receive higher parental
investment and have better life outcomes, including measures of SES such
as educational attainment and occupational status [@black2011older;
@booth2009birth; @Lindahl_2008]. At the same time, the facts of biology,
in particular the so-called "lottery of meiosis", guarantee that
siblings' birth order is independent of their genetic endowments.[^2]
Because birth order could affect partner choice through both SES and
non-SES mechanisms, we run a mediation analysis similar to
@heckman2013understanding, decomposing the treatment effect into effects
of measured and unmeasured mediating variables. Specifically, we
estimate a reduced-form model with spouse polygenic scores for
educational attainment (PSEA) as the dependent variable, and own birth
order as the main independent variable. We then add to the model
measures of own socio-economic status, including university attendance
and income. Under certain assumptions, these variables can be
interpreted as mediating the effect of birth order on spouse genetics.
[^2]: Although @muslimova2020dynamic find that PSEA and birth order
*interact* to produce human capital.
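
The logic of this mediation analysis can be sketched on simulated data. The
sketch below is an illustration under stated assumptions, not the paper's
specification: the variable names, effect sizes and use of plain `lm()` are
hypothetical, whereas the actual regressions include family-size, birth-year
and birth-month fixed effects and standard errors clustered by spouse pair.
Birth order is drawn independently of own PSEA, lowers the probability of
university attendance, and affects spouse PSEA only through university.

```r
set.seed(2)
n <- 50000  # hypothetical sample size

# Birth order is independent of own genetics (the "lottery of meiosis")
birth_order <- sample(1:3, n, replace = TRUE)
own_psea    <- rnorm(n)

# Later-born children are less likely to attend university (assumed effect)
university <- rbinom(n, 1, plogis(0.5 - 0.6 * (birth_order - 1) + 0.5 * own_psea))

# Spouse PSEA depends on own university attendance and own PSEA,
# but not directly on birth order (full mediation, by assumption)
spouse_psea <- 0.5 * university + 0.2 * own_psea + rnorm(n)

# Reduced form: birth order and controls only
reduced  <- lm(spouse_psea ~ birth_order + own_psea)
# Mediated model: add the SES mediator
mediated <- lm(spouse_psea ~ birth_order + university + own_psea)

b_reduced  <- coef(reduced)[["birth_order"]]
b_mediated <- coef(mediated)[["birth_order"]]

# Crude share of the birth-order effect accounted for by the mediator
share_mediated <- 1 - b_mediated / b_reduced
```

Here the birth-order coefficient is negative in the reduced form and shrinks
towards zero once `university` is added, which is the pattern a fully mediated
effect would produce. `share_mediated` is a crude summary and inherits the
assumptions noted above.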
In both Great Britain and Norway, later-born children have spouses with
significantly lower PSEA in the reduced-form regressions. When we add
mediators, including university attendance and/or income, the effect of
birth order shrinks substantially, becoming insignificant in Great
Britain, while the SES mediators significantly increase the spouse’s
PSEA. The results are robust to the inclusion of several controls,
including non-SES mediators, and a rich set of own genetic traits. Thus,
SES appears to mediate the effect of birth order on spouse genetics. The
effects of individual mediators differ between the two countries. While
university attendance explains more than a third of the effects of birth
order in both Britain and Norway, income explains about 10% of the
effects in Britain but has little or no independent effect in Norway.
Although our main focus is on testing the basic mechanism of SGAM, this
is suggestive evidence that in a more egalitarian society, some forms of
SES are less important to the marriage market, with long-run
implications for genes-SES gradients.
Both economists and geneticists study assortative mating. The economics
literature has typically focused on educational similarities [e.g.
@pencavel1998assortative; @chiappori2017partner] or social class or
caste [e.g. @abramitzky2011marrying; @banerjee2013marry], but also
sorting based on age, physical traits and ethnicity
[@hitsch2010matching]. Some papers have studied substitution between
different traits.[^3] For instance, @chiappori2012fatter showed that
individuals trade off BMI for partners’ income or education.
[^3]: @oreffice2010anthropometry show that height and BMI are associated
with spouse earnings. @dupuy2014personality find spouse matching on
multiple independent dimensions, including education, height, BMI
and personality. @chiappori2021analyzing analyse matching on
multiple characteristics and show that a three-dimensional matching
model fits their data.
In genetics, @halsey1958genetics showed that social mobility combined with
assortative mating might increase the association between genetics and
social class. @cloninger1979multifactorial model genetic and cultural
transmission, where assortative mating is based directly on phenotype
and culture is transmitted from parents. Assortative mating, modeled
simply as a correlation coefficient, leads culture and genetics to
be associated in offspring. @heath1985resolving, following earlier
papers [@rao1976resolution; @rao1979path], introduce "social homogamy",
i.e. assortative mating by social background. @otto1995genetic extend
assortative mating to include both phenotypic and social homogamy.
More recently, interest in these topics has been revived by empirical
findings from genomics. "Direct" effects of individual genetic variants, estimated by
within-family studies, are different from "indirect" effects, i.e. associations
found in the whole sample, and direct effects of polygenic scores can be smaller
than population-wide associations [@howe2022within; @young2022mendelian].
Also, parental alleles which are *not* transmitted to the child correlate with
child outcomes [@kong2018nature]. Both these phenomena could be explained
by confounding from gene-environment correlation, or by assortative mating
[@young2023estimation; @nivard2024more]. Lastly, correlations between spouses'
polygenic scores for education are higher than can be explained by assortative
mating on measured phenotypic education alone [@okbay2022polygenic;
@robinson2017genetic; @torvik2022modeling]. To address this, several recent
papers have estimated structural models of assortative mating in family data
[@eaves1999comparing; @torvik2022modeling; @collado2023estimating;
@rustichini2023educational]. Because both cultural and genetic inheritance
proceed from parents to children, it can be hard to differentiate
them. For example, @collado2023estimating derive extremely low estimates
for heritability of education, within a model in which all genetic similarity
between spouses is driven by matching either on the measured phenotype, or
on a shared cultural factor; whereas @torvik2022modeling estimate partner
correlation between "true" polygenic scores for education of 0.37, and
heritability above 50%, in a model where environment is shared between siblings
but not across generations.[^okbay] In this context, we think it is worth
taking a different approach. We cleanly identify separate environmental and
genetic contributions to assortative mating: environmental contributions using
birth order, genetic contributions by comparing polygenic scores within
siblings.[^tbl-rev]
[^okbay]: As @okbay2022polygenic put it: "Because the parameters of a general
biometric model cannot be separately identified from a small number of
phenotypic correlations among different types of relatives, researchers
typically have to assume that some of the parameters equal zero in order to
estimate other parameters."
[^tbl-rev]: See Table \@ref(tab:tbl-reversed-moba) in the appendix.
SGAM has consequences for inequality and social mobility. Long-run estimates of
intergenerational persistence of wealth and status are higher than
would be predicted from parent-child correlations [@clark2015intergenerational;
@barone2021intergenerational; @solon2018we], and distant relatives in the same
generation are also more similar than parent-child and spousal correlations
would predict [@collado2023estimating]. @clark2023inheritance argues that this
can be explained by an underlying process where unobserved genetic variation
determines wealth. This requires a high degree of assortative mating. Our model
shows that genetics may itself be a mediator for the transmission of SES, via
"trading" in marriage markets. We also show how different social and economic
institutions can affect that process. When SES is highly transmissible across
generations, this increases the long-run association between SES and genetics.
If so, institutional reforms that increase *intergenerational mobility*, like
mass education or inheritance taxation, may affect not only economic but genetic
inequality. Conversely, an increase in *economic meritocracy* increases the
long-run association between SES and genetics,[^4] posing the problem raised by
@young1958rise and more recently @markovits2019meritocracy: meritocracy may be
self-limiting or even self-undermining.
[^4]: See Proposition \@ref(prop-gamma).
In terms of cross-sectional inequality, the conventional wisdom is that it is
increased by assortative mating on SES [@fernandez2001sorting;
@breen2011educational; @greenwood2014marry]. But that depends on what else people
assort on. As we show below[^ineq], if the same genes that are relevant in marriage
markets also affect economic outcomes, then an increase in the role of genes
vis-à-vis SES in marriage markets may increase economic inequality: it makes
households more unequal in genetics, and these are passed on to their children
with high reliability.
[^ineq]: See Figure \@ref(fig:pic-heritability-inequality).
SGAM can also explain a large body of evidence for cross-sectional
associations between genetics and social status. For example: from twin
studies, the heritability of occupational class and educational
attainment, i.e. the proportion of variance explained by genetic
differences between individuals, is around 50% [@Tambs_1989].
Genome-wide Complex Trait Analysis (GCTA) shows that the family
socio-economic status of 2-year-old children can be predicted from their
genes [@Trzaskowski_2014]. Children born into higher-income families
have more genetic variants predicting educational attainment
[@belsky2018genetic]. Adoption studies show that both post-birth
environment and pre-birth conditions (genetics and prenatal environment)
contribute to the transmission of wealth and human capital [e.g.
@bjorklund2006origins]. There is also a genes-SES gradient in genetic
predictors of health. DNA-derived scores predicting several health
outcomes are associated with regional economic deprivation
[@abdellaoui2019genetic]. The correlation between education and health
may be mediated by shared genetic causes [@amin2015schooling;
@boardman2015can]. Family SES correlates with several health-related
polygenic scores [@selzam2019comparing], and genetic variants associated
with SES may explain the genetic correlations between many mental health
outcomes [@marees2021genetic].
SGAM shows how marriage markets can lead high SES to be associated with
different genetic variants, i.e. it can explain genes-SES gradients. The
standard explanation for these gradients is returns to human capital in
labour markets, also known as meritocratic mobility. Higher-ability
parents reap higher market returns, and they may then pass both higher
socio-economic status and their genes to their children, leading to an
association between the two [@belsky2018genetic].[^belsky] This mechanism depends
on the level of meritocracy in social institutions
[@branigan2013variation; @Heath_1985]: in a society where social status
was ascribed rather than earned, it could not take effect. Indeed, after
the fall of communism in Estonia, the heritability of SES increased,
presumably because post-communist society allowed higher returns to
talent [@Rimfeld_2018]. By contrast, SGAM does not require meritocracy.
Even when social status is entirely ascribed, it can still become
associated with certain genetic variants, so long as their associated
phenotypes are prized assets in marriage markets. Since meritocracy is
historically rare, while assortative mating is universal, this suggests
that genes-SES gradients are likely to be historically widespread.

[^belsky]: @belsky2018genetic offer three reasons for the association between
education-linked genetic variants and SES, but do not consider SGAM.

Lastly, we contribute to a literature in economics that examines the
relationship between genetic and economic variables.
@benjamin2011promises and @benjamin2024social are reviews. Several
recent papers use polygenic scores, in particular polygenic scores for
educational attainment [e.g. @barth2020genetic; @papageorge2020genes;
@ronda2020family]. @barban2021effect use PSEA as an instrument for
education in a marital matching model. These papers, like much of the
behavior genetics literature, take genetic endowments as exogenous and
examine how they affect individual outcomes, perhaps in interaction with
the environment. We take a different approach by putting genetics on the
left-hand side of the estimating equation. Assortative mating and cultural
inheritance are social processes, so we think there are good prospects for
social scientists to contribute to understanding how genetic variants get
distributed in society – what geneticists call "stratification" and "dynastic
effects".

The observations behind SGAM are not new. That status and physical
attractiveness assort in marriage markets is a commonplace and a
perennial theme of literature. In the Iliad, powerful leaders fight over
the beautiful slave-girl Briseis. In Jane Austen’s novels, wealth,
attractiveness and “virtue” all make a good match. @marx1844economic
wrote “the effect of ugliness, its repelling power, is destroyed by
money.” The literature on mate preference from evolutionary psychology
[@buss1986preferences; @buss1989sex; @buss2019mate] confirms that
attractive mate characteristics include aspects of social status (“high
earning capacity,” “professional status”) as well as traits that are
partly under genetic influence (“intelligent,” “tall,” “kind,”
“physically attractive”). Despite this, to our knowledge, few papers have examined
the socio-economic consequences of assortative
mating between SES and genetics.[^papers] In particular, we are the first to show
how SGAM interacts with institutional variables to affect economic inequality,
mobility and associations between genes and SES, and the first to cleanly
identify an environmental effect on spouse genetics.

[^papers]: Specifically, @halsey1958genetics and @rustichini2023educational.

```{r literature-notes, eval = FALSE}
# Notes
# Fernandez et al 2005
# - skills are caused by investment in human capital
# - sorting is inefficient since it increases inequality of parents' capital
# and some people will be credit-constrained.
#
# Marry Your Like (Greenwood 2014)
# - simple accounting methodology
# - ass mating has increased and has contributed to increase in inequality in the
# US, 1960 to 2005
#
# Eika 2019
# - assortative mating has decreased among highly educated, increased among low
# educated, 1960s-2010s. Overall, increase only until the 1980s
# - changes in ass mating barely move time trends in household income inequality
# - findings for US + 4 European countries
# Clark 201x - surname groups show higher intergen transmission of wealth than
# individual parent-child pairs. Also grandparents matter independently.
# One explanation: underlying "social status" or "social competence" which
# is measured with error by wealth.
# Solon 2018 - reviews evidence on long-run transmissibility. Not
# much evidence for Clark's "0.7". Many alternative explanations.
# Becker-Tomes 1979?
#
# Natural/cultural selection models
# * Otto et al.
#   - derive models with cultural and genetic inheritance, plus
#     A.M. on the phenotype or on the underlying cultural type;
#   - nb one thing we maybe add is a microfoundation in terms of indiv. choice..
#   - metaanalysis of various parent/child correlations on IQ variation
#   - so, no research design ish elements...
# * Rao, Morton and Yee 1976
#   - path analysis allowing for spousal correlations, both "E-E" and "G-E"
#   - then fit parameters, again no research design;
#   - I think there's room to argue that we add something simply by
#     the use of the exog. variation
# * Cloninger et al. 1979
#   - path analysis; cultural transmission can vary btw mothers and fathers
#   - no microfoundations
# * Fisher 1918
#   - mentions ass. mating though not the cultural kind
#   - but does dist btw mating on phenotype and on underlying genetics
# Others to look at:
# * Beauchamp et al. 2011
#   - looks at ass. mating w.r.t. height/intelligence correlation
#   - cross trait a.m. but both genetic
# * Vinkhuyzen et al. 2010
#   - A.M. of phenotypic/social types
#   - effects on heritability of intelligence estimates
#   - good for a sense of "state of the art"
# * Heath and Eaves 1985
#   - how to distinguish between phenotypic and social A.M.
#   - a "classic"
```

# Model

```{r model-section, child="model-section.Rmd"}
```

# Data and methods

The central insight in our model is that higher SES and good genes
assort in the marriage market. We wish to test this directly, i.e. to
test whether $0 < a < 1$ in the attractiveness equation $$
i(x) = a x_1 + (1-a) x_2
$$ where $x_2$ is social status and $x_1$ is genetic endowment. Consider
the effect of a change in $x_2$ holding $x_1$ constant. If $a = 1$ then
this will not change $i(x)$ and therefore will not change the expected
characteristics of the spouse. So, if we regress spouse's $x_1$ on own
$x_2$, and reject the null of no effect, we can reject $a = 1$.[^8]

[^8]: Conceivably, if $a = 0$ but there is a pre-existing correlation
between $x_1$ and $x_2$ in the population, then an increase in own
$x_2$ will increase spouse's expected $x_2$ and therefore spouse's
expected $x_1$, even though the latter does not enter the
attractiveness equation. We can separately test the null that
$a = 0$ by regressing spouse's $x_2$ on own $x_1$, holding own $x_2$
constant. Existing work has already linked own genetics to spouse's
SES, e.g. education, so we focus on the other direction and treat
this direction as a robustness check below.
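
The logic of this test can be illustrated with a small simulation. The sketch below is not part of our estimation: it assumes independent standard normal $x_1$ and $x_2$, rank-based matching on a noisily observed index, and hypothetical names (`simulate_match`, `noise_sd`). Under these assumptions, the coefficient on own $x_2$ should be close to zero when $a = 1$ and clearly positive when $0 < a < 1$.

```r
# Illustrative simulation only; all names and parameter values are assumptions.
set.seed(1)

simulate_match <- function(a, n = 5000, noise_sd = 0.5) {
  # x1: genetic endowment, x2: social status (drawn independently here)
  men   <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
  women <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

  # Attractiveness index i(x) = a * x1 + (1 - a) * x2, observed with noise
  index <- function(d) a * d$x1 + (1 - a) * d$x2 + rnorm(n, sd = noise_sd)

  # Positive assortative matching: pair the k-th ranked man and woman
  men   <- men[order(index(men)), ]
  women <- women[order(index(women)), ]
  pairs <- data.frame(own_x1 = men$x1, own_x2 = men$x2,
                      spouse_x1 = women$x1)

  # Effect of own x2 on spouse's x1, holding own x1 constant
  coef(lm(spouse_x1 ~ own_x2 + own_x1, data = pairs))[["own_x2"]]
}

b_genes_only <- simulate_match(a = 1)    # status plays no role in matching
b_mixed      <- simulate_match(a = 0.5)  # both genes and status matter
```
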
We use data from two sources: Great Britain and Norway. This allows us
to check our basic result in two different societies, and also to make
(tentative) comparisons between them. Our Great Britain data comes from
the UK Biobank, a study of about 500,000 individuals born between 1935
and 1970 [@bycroft2018uk]. The Biobank contains information on
respondents' genetics, derived from DNA microarrays, along with
questionnaire data on health and social outcomes. The Biobank does not
contain explicit information on spouse pairs. We categorize respondents
as pairs if they had the same home postcode on at least one
occasion;[^9] both reported the same homeownership/renting status,
length of time at the address, and number of children; attended the same
UK Biobank assessment center on the same day; both reported living with
their spouse ("husband, wife or partner"); and consisted of one male and
one female. We also eliminate all pairs where either spouse appeared
more than once in the data. This leaves a total of
`r pretty(nrow(mf_pairs))` pairs.[^10]
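
The matching rules above can be sketched on toy data as follows. This is a simplified illustration rather than our actual matching code: the column names are hypothetical (they are not UK Biobank field names), and postcode is treated as a single value instead of being compared across occasions.

```r
# Toy data; column names are hypothetical, not the UK Biobank field names.
people <- data.frame(
  id            = 1:6,
  sex           = c("M", "F", "M", "F", "M", "F"),
  postcode      = c("A", "A", "B", "B", "B", "C"),
  centre_day    = c("c1_d1", "c1_d1", "c2_d5", "c2_d5", "c2_d5", "c3_d9"),
  tenure        = c("own", "own", "rent", "rent", "rent", "own"),
  years_at_addr = c(10, 10, 3, 3, 3, 7),
  n_children    = c(2, 2, 0, 0, 0, 1),
  with_partner  = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

keys <- c("postcode", "centre_day", "tenure", "years_at_addr", "n_children")

# Only respondents who report living with a spouse or partner are eligible
eligible <- people[people$with_partner, ]
males    <- eligible[eligible$sex == "M", c("id", keys)]
females  <- eligible[eligible$sex == "F", c("id", keys)]

# Candidate pairs: one male and one female agreeing on every key
cand <- merge(males, females, by = keys, suffixes = c(".m", ".f"))

# Drop every candidate pair involving a person who appears more than once
dup_m <- cand$id.m[duplicated(cand$id.m)]
dup_f <- cand$id.f[duplicated(cand$id.f)]
pairs <- cand[!(cand$id.m %in% dup_m) & !(cand$id.f %in% dup_f), ]
```

In this toy example, respondents 3 and 5 both match respondent 4, so all three are dropped, leaving only the unambiguous pair.
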
[^9]: A typical UK postcode contains about 15 properties.

[^10]: In the appendix, we test the validity of our matching process by
counting the proportion of pairs who had a shared genetic child, in
a subsample of the data. We also check whether any misidentified
pairs might have biased our results, by constructing a dataset of
"known fake pairs".

Our Norway data comes from the Norway Mother, Father and Child Cohort
Study (MoBa), a population-based study of pregnant women and their
partners and children [@magnus2016cohort; @paltiel2014biobank].
Participants were recruited from all over Norway between 1999 and 2008, and
41% of women consented to participation. In this paper, we use about 100,000
genotyped individuals and about 45,000 genotyped spouse pairs. The
Norway data has some advantages over UK Biobank, including higher
participation, larger sample size, and spouse pairs which are known
rather than inferred. On the other hand it is missing some variables,
including IQ measures and self-reported health.

Our key dependent variable is spouse's *Polygenic Score for Educational
Attainment* (PSEA). A polygenic score is a DNA-derived summary measure
of genetic risk or propensity for a particular outcome, created by
summing the small effects of many common genetic variants, known as Single
Nucleotide Polymorphisms (SNPs). We focus on PSEA rather than other
polygenic scores for two reasons. First, educational attainment plays a
key role in human mate search. People are attracted to educated
potential partners [@buss1986preferences; @belot2013dating]; spouse
pairs often have similar levels of educational attainment, as well as
similar PSEA [@vandenberg1972assortative; @schwartz2005trends;
@greenwood2014marry; @hugh2016assortative; @torvik2022modeling]. Second,
PSEA predicts a set of important socioeconomic variables, including not
only education but also social and geographic mobility, IQ, future
income and wealth [@belsky2016genetics; @barth2020genetic;
@papageorge2020genes].[^11]

[^11]: See @papageorge2020genes for a detailed discussion of polygenic
scores aimed at economists.

```{r calc-psea-check}
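# Raw association between own PSEA (EA3) and university attendance
mod_own_psea <- feols(university ~ EA3, data = famhist, notes = FALSE)
# Same regression with sibling-group fixed effects (within-siblings estimate)
mod_own_psea_within <- feols(university ~ EA3 | sib_group, data = famhist,
                             notes = FALSE)
# Sample sizes and R-squared reported in the text
n_own_psea <- glance(mod_own_psea)[["nobs"]]
n_own_psea_within <- glance(mod_own_psea_within)[["nobs"]]
r2_own_psea <- glance(mod_own_psea)[["r.squared"]]
r2_own_psea_moba <- 0.08136 # calculated by Fartein
# Coefficient tables: EA3 is row 2 in the raw model (row 1 is the intercept)
# and row 1 in the fixed-effects model, which has no intercept row
mod_own_psea <- tidy(mod_own_psea)
mod_own_psea_within <- tidy(mod_own_psea_within)
eff_own_psea <- mod_own_psea[[2, "estimate"]]
pval_own_psea <- mod_own_psea[[2, "p.value"]]
eff_own_psea_within <- mod_own_psea_within[[1, "estimate"]]
pval_own_psea_within <- mod_own_psea_within[[1, "p.value"]]
# The text states p < 2e-16 for both estimates; stop if that no longer holds
stopifnot(pval_own_psea < 2e-16)
stopifnot(pval_own_psea_within < 2e-16)
# Number of observations with a non-missing first-job pay estimate
n_with_job_codes <- sum(! is.na(mf_pairs_reg$first_job_pay.x))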
```

PSEA in the UK was calculated using per-SNP summary statistics from
@lee2018gene, re-estimated excluding UK Biobank participants; in Norway,
using statistics from @okbay2022polygenic. The score was normalized to
have mean 0 and variance 1. Because polygenic scores are created from
estimates of many small effects, they contain a large amount of noise
relative to the true best estimator that could be derived from genetic
data. For instance, PSEA explains only 11–13% of variance in educational
attainment [@lee2018gene], whereas the true proportion explained by
genetic variation -- the heritability -- is estimated from twin studies
to be about 40% [@branigan2013variation]. Also, polygenic scores are no
more guaranteed to be causal than any other independent variable. For
example, social stratification by ancestry may lead genes to be
associated with educational attainment even if they play no causal role
[@selzam2019comparing].
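
To make the construction concrete, the following sketch builds a polygenic score from simulated data and shows how noise in the per-SNP weights lowers the variance explained. It is an illustration under strong assumptions, not our scoring pipeline: genotypes are simulated with independent SNPs, and the effect sizes, noise level and names (`true_beta`, `est_beta`, `polygenic_score`) are arbitrary.

```r
# Illustrative simulation only; effect sizes and noise levels are assumptions.
set.seed(42)
n_people <- 2000
n_snps   <- 500

# Genotypes coded as 0, 1 or 2 copies of the effect allele (independent SNPs)
maf  <- runif(n_snps, 0.05, 0.5)
geno <- sapply(maf, function(p) rbinom(n_people, size = 2, prob = p))

true_beta <- rnorm(n_snps, sd = 0.05)               # unobserved true effects
est_beta  <- true_beta + rnorm(n_snps, sd = 0.05)   # noisy GWAS estimates

# Polygenic score: weighted sum of allele counts, scaled to mean 0, variance 1
polygenic_score <- function(beta) as.numeric(scale(geno %*% beta))
pgs_true <- polygenic_score(true_beta)
pgs_est  <- polygenic_score(est_beta)

# Outcome with 40% of its variance due to the true genetic score
outcome <- sqrt(0.4) * pgs_true + rnorm(n_people, sd = sqrt(0.6))

r2_true <- cor(outcome, pgs_true)^2   # variance explained by the ideal score
r2_est  <- cor(outcome, pgs_est)^2    # variance explained by the noisy score
```
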
Despite these points, PSEA has non-trivial estimated effects on
educational attainment. PSEA correlates with measures of education,
including university attendance and years of full-time education. Effect
sizes are smaller but still non-trivial in within-siblings regressions
[@lee2018gene], where they can be interpreted as causal, since genetic
variation across siblings is guaranteed to be random by the biological
mechanism involved -- the "lottery of meiosis" (see below). We recheck
these facts within the UK Biobank sample. In a simple linear regression
(N = `r pretty(n_own_psea)`) of university attendance on PSEA, a
one-standard-deviation increase in PSEA was associated with a
`r pretty(eff_own_psea*100)` percentage point increase in the
probability of university attendance ($p < 2 \times 10^{-16}$). In a
within-siblings regression among genetic full siblings (N =
`r pretty(n_own_psea_within)`), the increase was
`r pretty(eff_own_psea_within * 100)` percentage points ($p < 2 \times
10^{-16}$). This suggests that about half of the raw correlation of PSEA
with university attendance is down to environmental confounds like
parental nurture, while the remainder is causal [cf. @lee2018gene].
Still, the causal effect remains substantial: for a rough comparison,
the (ITT) effect on college attendance of the Moving To Opportunity
experiment in the US was 2.5 percentage points [@chetty2016effects].
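
The difference between the pooled and within-siblings estimates can be illustrated with simulated data. The sketch below is separate from the regressions reported above and rests on assumed values: a continuous outcome, two siblings per family, a family environment (`nurture`) correlated with the parental mean score, and a true causal effect of 0.1. Demeaning by family is used here as a simple stand-in for family fixed effects.

```r
# Illustrative simulation only; all parameter values are assumptions.
set.seed(7)
n_fam  <- 3000
family <- rep(seq_len(n_fam), each = 2)   # two siblings per family

# Parental mean score, and a family environment correlated with it
parent_mean <- rnorm(n_fam)
nurture     <- 0.1 * parent_mean + rnorm(n_fam, sd = 0.1)

# Each sibling's score: parental mean plus an independent meiosis draw
pgs <- parent_mean[family] + rnorm(2 * n_fam)

causal_effect <- 0.1
outcome <- causal_effect * pgs + nurture[family] + rnorm(2 * n_fam, sd = 0.3)

# Pooled regression mixes the causal effect with the family environment
b_pooled <- coef(lm(outcome ~ pgs))[["pgs"]]

# Within-siblings regression: demean both variables by family
pgs_w     <- pgs - ave(pgs, family)
outcome_w <- outcome - ave(outcome, family)
b_within  <- coef(lm(outcome_w ~ pgs_w))[["pgs_w"]]
```
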
We use two measures of socio-economic status: income, and university
attendance. Income is a direct measure of SES. University attendance is
a predictor of income over the whole life course, and a form of SES in
itself. The MoBa data includes both university attendance and income. UK
Biobank includes university attendance, but only has a direct measure of
current household income, which is inappropriate for our purposes
because it includes income from both spouses and is measured after
marriage. Instead, we estimate income in the respondent's first job, by
matching the job's Standard Occupational Classification (SOC) code with
average earnings by SOC from @ONS2007ASHE. Job codes are only available
for a subset of respondents. We convert income to a z-score within each
group of respondents with the same gender and year of birth.
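
As a sketch of this step, the code below attaches average earnings to each respondent by occupation code and then standardizes within gender and birth-year cells. The data, the placeholder occupation codes and the column names (`soc`, `first_job_pay`, `pay_z`) are made up for illustration and are not taken from our data.

```r
# Hypothetical respondent data and earnings table (made-up numbers and codes)
respondents <- data.frame(
  id  = 1:6,
  sex = c("F", "F", "F", "M", "M", "M"),
  yob = c(1950, 1950, 1950, 1950, 1950, 1950),
  soc = c("soc_a", "soc_b", "soc_c", "soc_a", "soc_b", "soc_c"),
  stringsAsFactors = FALSE
)
soc_pay <- data.frame(
  soc           = c("soc_a", "soc_b", "soc_c"),
  first_job_pay = c(34000, 18000, 12000),
  stringsAsFactors = FALSE
)

# Attach average earnings for the occupation code of the first job
respondents <- merge(respondents, soc_pay, by = "soc", all.x = TRUE)

# Standardize within each sex-by-birth-year cell
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
respondents$pay_z <- ave(respondents$first_job_pay,
                         respondents$sex, respondents$yob, FUN = z_score)
```
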
Figure \@ref(fig:pic-basic-corr) illustrates the core idea of SGAM
within the UK Biobank data. The X axis shows a measure of one partner's
socio-economic status: university attendance or income. The Y axis plots
the other partner's mean PSEA. Both males and females who went to
university had spouses with higher PSEA. So did males and females with
higher income in their first job. Since DNA is inherited, these people's
children will also have higher PSEA.[^12]

[^12]: Figure \@ref(fig:pic-basic-corr-moba) in the appendix shows the
same plot for the MoBa sample.

```{r pic-basic-corr, fig.cap = "Spouse polygenic score for educational attainment (PSEA) against own university attendance and own income in first job (Great Britain). Lines show 95\\% confidence intervals. PSEA is normalized to have mean 0 and variance 1. Income is estimated from the respondent's first job, as the average income of the SOC job code.", fig.subcap = rep("", 4), fig.align = "center", fig.width = 3, fig.height = 3, fig.ncol = 2}
pic_ss <- stat_summary(fun.data = mean_cl_normal, na.rm = TRUE)
pic_cc <- coord_cartesian(ylim = c(-0.15, 0.25))
pic_theme <- theme(axis.title = element_text(size = 10),
panel.grid.minor.x = element_blank(),
panel.grid.major.x = element_blank())
mf_pairs %>%
filter(! is.na(university.m)) %>%
mutate(University = ifelse(university.m, "Yes", "No")) %>%
ggplot(aes(University, EA3.f)) +
pic_ss +
pic_cc +
pic_theme +
labs(x = "Male spouse university attendance", y = "Female spouse PSEA")
mf_pairs %>%
filter(! is.na(university.f)) %>%
mutate(University = ifelse(university.f, "Yes", "No")) %>%
ggplot(aes(University, EA3.m)) +
pic_ss +
pic_cc +
pic_theme +
labs(x = "Female spouse university attendance", y = "Male spouse PSEA")
pic_ss <- stat_summary(
fun.data = mean_se,
fun.args = list(mult = 1.96),
na.rm = TRUE
)
pic_cc <- coord_cartesian(ylim = c(-0.15, 0.25))
mf_pairs %>%
filter(! is.na(first_job_pay.m)) %>%
mutate(Income = santoku::chop_deciles(first_job_pay.m, labels = 1:10)) %>%
ggplot(aes(Income, EA3.f)) +
pic_ss +
pic_theme +
labs(x = "Male spouse income decile", y = "Female spouse PSEA")
mf_pairs %>%
filter(! is.na(first_job_pay.f)) %>%
mutate(Income = santoku::chop_deciles(first_job_pay.f, labels = 1:10)) %>%
ggplot(aes(Income, EA3.m)) +
pic_ss +
pic_theme +
labs(x = "Female spouse income decile", y = "Male spouse PSEA")
```

These plots do not prove that SGAM is taking place. Since an
individual's own PSEA correlates with both their educational attainment
and their income, both figures could be a result of genetic assortative
mating (GAM) alone [@hugh2016assortative]. Indeed, recent studies show
much higher levels of GAM than could be explained by matching on the
observed education phenotype alone [@okbay2022polygenic]. So, to
demonstrate SGAM, we need a source of social status which is exogenous
to genetics. Also, the link between social status and spouse genetics is
likely to be noisy, for three reasons: first, polygenic scores contain a
large amount of error, as discussed above; second, causal mechanisms
behind variation in social status are likely to be noisy; third, to
paraphrase @shakespeare1595midsummer, the spouse matching process is
highly unpredictable. So, we need a large N to give us sufficient power.
This rules out time-limited shocks such as changes to the school leaving
age [@Davies_2018].

We use *birth order*. It is known that earlier-born children receive
more parental care and have better life outcomes, including measures of
SES such as educational attainment and occupational status
[@Lindahl_2008; @booth2009birth; @black2011older].[^13] On the other
hand, all full siblings have the same *ex ante* expected genetic
endowment from their parents, irrespective of their birth order. This is
guaranteed by the biological mechanism of meiosis, which ensures that,
at each locus, each parent passes on one of their two alleles to the
child, chosen with 50% probability independently for each child [@mendel1865experiments;
@lawlor2008mendelian]. For example, siblings' expected polygenic score
is equal to the mean of their parents' polygenic scores.[^14] We can
therefore use birth order as a "shock" to social status. "Shock" is in
quotes because we do not claim that birth order is exogenous to all
other variables. For example, it naturally correlates with parental age,
and it may also correlate with household SES at the time of birth. We
only claim that birth order is exogenous to genetic variation.
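
The claim that birth order is unrelated to expected genetic endowment can be checked in a stylized simulation of meiosis. The sketch below is illustrative only: it assumes unlinked loci, allele frequencies of 0.5, arbitrary per-SNP weights and hypothetical function names (`haplotype`, `gamete`, `score`). Under these assumptions, the mean deviation of each child's score from the parental mean should be close to zero for first-borns and second-borns alike.

```r
# Illustrative simulation only; unlinked loci and arbitrary weights are assumed.
set.seed(3)
n_fam  <- 2000
n_snps <- 200
beta   <- rnorm(n_snps)   # arbitrary per-SNP weights

# Each parent carries two haplotypes of 0/1 alleles
haplotype <- function() matrix(rbinom(n_fam * n_snps, 1, 0.5), n_fam, n_snps)
mother <- list(haplotype(), haplotype())
father <- list(haplotype(), haplotype())

# Meiosis: at each locus, pass on one of the parent's two alleles at random
gamete <- function(parent) {
  pick_first <- matrix(runif(n_fam * n_snps) < 0.5, n_fam, n_snps)
  ifelse(pick_first, parent[[1]], parent[[2]])
}
child <- function() gamete(mother) + gamete(father)

score <- function(genotype) as.numeric(genotype %*% beta)
midparent <- (score(mother[[1]] + mother[[2]]) +
              score(father[[1]] + father[[2]])) / 2

first_born  <- score(child())
second_born <- score(child())

# Mean deviation from the midparent score, in units of the population SD
pop_sd     <- sd(c(first_born, second_born))
dev_first  <- mean(first_born  - midparent) / pop_sd
dev_second <- mean(second_born - midparent) / pop_sd
```
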
[^13]: Earlier work was ambiguous on the effects of birth order [e.g.
@hauser1985birth; @hanushek1992trade]. However, this work often used
unrepresentative samples and/or did not control for family size or
parental age. More recent work improves on this and shows clear
birth order effects. @kantarevic2006birth show that parental age is
an important confound for birth order. @black2005more show
substantial birth order effects in the whole Norwegian population,
even in a family fixed-effects specification, and after controlling
for mother's age. @booth2009birth examine UK families, controlling