This repository contains the data and code used in the manuscript "Estimating the change in soccer's home advantage during the Covid-19 pandemic using bivariate Poisson regression" by Luke Benz and Michael Lopez. A pre-print of our manuscript is available here.
- league_info.csv: csv of general information on the 17 European leagues used in this analysis. Most important are `restart_date` (the date the league returned to play following the pause of its season due to the Covid-19 pandemic) and `fbref_league_id`, the league's unique id on Football Reference.
- fbref_scraper.R: R script for scraping game-level statistics for each league from Football Reference.
- fbref_data/: Folder containing all data used for this project. Each of the 17 leagues has its own folder, containing 5 csv files of game level statistics for games played that year.
- models/cards/: R scripts for fitting yellow card models (Model (4) in paper)
- stan/cards/: Stan files for fitting yellow card models (Model (4) in paper)
- models/goals/: R scripts for fitting goal models (Model (3) in paper)
- stan/goals/: Stan files for fitting goal models (Model (3) in paper)
- models/empirical_baselines.R: Saves models/empirical_baselines.csv, which supplies the empirical Bayes priors for the versions of Models (3) and (4) with non-zero correlation.
`.rds` files of posterior draws are available for the goals and yellow card models for each league.
- bvp_goals_no_corr/: Folder of posterior draw `.rds` objects for Model (3) in paper
- bvp_goals_lambda_3/: Folder of posterior draw `.rds` objects for Model (4) in paper
`.rds` files of model objects are available for the goals and yellow card models for each league.
- bvp_goals_no_corr/: Folder of `.rds` model objects for Model (3) in paper
- bvp_goals_lambda_3/: Folder of `.rds` model objects for Model (4) in paper
- simulations/simulation.R: R script for running simulations described in Section (4).
- simulations/biv_pois.stan: Stan file for running simulations described in Section (4).
- simulations/paired_comp.stan: Stan file for running simulations described in Section (4).
- simulations/sim_files/: Folder of saved simulation results (see v2_sims/ for the most up-to-date results, which correspond to version 2 of the paper).
- paper_figures/: R scripts to produce all figures and tables we present in the manuscript.
- eda/: Folder of old or draft versions of analysis.
- helpers.R: R script of useful helper functions
To replicate the entire model-fitting process, run:
- models/goals/bvp_goals_no_corr.R: Model (3) presented in our manuscript.
- models/cards/bvp_yc_lambda3.R: Model (4) presented in our manuscript.
Each of these scripts takes the following steps:
1. Creates model-specific directories for saving both model objects and posterior `.rds` files.
2. Reads in data for a specific league from the fbref_data/ folder, via the `read_league_csvs()` helper function in helpers.R.
3. Filters data to relevant games.
4. Prepares data for use by Stan.
5. Sources the corresponding Stan file and fits the model.
6. Saves the league-specific model `.rds` object into the model-specific folder.
7. Saves the posterior draws `.rds` object into the model-specific folder.
8. Repeats steps 2-7 for each of the 17 leagues.
Model (4), our yellow card model which assumes correlation > 0, relies on empirical baselines as priors from a version of the model fit with no correlation.
To fully reproduce the results in our paper, run the following scripts in order:
1. models/goals/bvp_goals_no_corr.R: Model (3) presented in our manuscript.
2. models/cards/bvp_yc_no_corr.R: For yellow card priors when fitting Model (4).
3. models/empirical_baselines.R: Extracts posterior means from the no-correlation versions of the models for use as priors when fitting the versions with correlation.
4. models/cards/bvp_yc_lambda3.R: Model (4) presented in our manuscript.
5. models/goals/bvp_goals_lambda3.R: Version of Model (3) with correlation; not presented in manuscript.
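As a minimal sketch, the run order above could be scripted in R along the following lines. The helper `run_in_order()` and the `pipeline` vector are hypothetical names introduced for illustration and are not part of this repository. The sketch also assumes that the working directory is the repository root and that each script can be run with `source()`.

```r
# Hypothetical helper (not part of this repository): run scripts one at a
# time, in the order given. Nothing runs if any listed file is missing.
run_in_order <- function(scripts, runner = source) {
  not_found <- scripts[!file.exists(scripts)]
  if (length(not_found) > 0) {
    stop("Script(s) not found: ", paste(not_found, collapse = ", "))
  }
  for (script in scripts) {
    message("Running ", script)
    runner(script)
  }
  invisible(scripts)
}

# Script paths as listed above, assumed relative to the repository root.
pipeline <- c(
  "models/goals/bvp_goals_no_corr.R",
  "models/cards/bvp_yc_no_corr.R",
  "models/empirical_baselines.R",
  "models/cards/bvp_yc_lambda3.R",
  "models/goals/bvp_goals_lambda3.R"
)
```

Calling `run_in_order(pipeline)` from the repository root would then run the five scripts in sequence. Checking for missing files up front means a wrong working directory is caught before any model fitting starts.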
Note that the `rstan` package is required to work with model objects and/or run the modeling scripts. For assistance installing Stan, please refer to the official documentation.
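For working with the saved objects, one possible approach is to read every `.rds` file in a folder into a named list using base R's `readRDS()`. The helper `read_rds_dir()` below is a hypothetical name introduced for illustration, not part of this repository, and the folder passed to it should be whichever directory the modeling scripts wrote to. Loading `rstan` beforehand (`library(rstan)`) lets the Stan model objects print and summarize properly.

```r
# Hypothetical helper (not part of this repository): read every .rds file
# in a folder into a list named by file name (without the extension).
read_rds_dir <- function(dir) {
  paths <- list.files(dir, pattern = "\\.rds$", full.names = TRUE)
  objs <- lapply(paths, readRDS)
  names(objs) <- tools::file_path_sans_ext(basename(paths))
  objs
}
```

The result is one list element per file, so an individual league's object can be pulled out by name.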