This pipeline was built for the Peter et al. 2019 manuscript, which applies EEMS to a number of human populations and compares the results to PCA on the same datasets. The pipeline shared here includes a workflow for comparison between several additional methods (listed below).
As some of the data used requires permission, we are not free to redistribute it. To re-generate all figures from the paper, it will be necessary to
- acquire access to all data and create the master data set as described in the merge-pipeline
- change paths in `config/config.json` to reflect your working environment
- run `snakemake all`
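
Changing the paths in `config/config.json` can be done by hand, but if many entries share a common directory prefix, a small script may be convenient. The following is a minimal sketch, not part of the pipeline: the helper names `rewrite_prefix` and `update_config`, the example keys, and the example paths are all hypothetical, and it assumes the config is plain JSON with paths stored as string values.

```python
import json
from pathlib import Path


def rewrite_prefix(node, old_prefix, new_prefix):
    """Recursively replace old_prefix at the start of every string value."""
    if isinstance(node, dict):
        return {key: rewrite_prefix(value, old_prefix, new_prefix)
                for key, value in node.items()}
    if isinstance(node, list):
        return [rewrite_prefix(item, old_prefix, new_prefix) for item in node]
    if isinstance(node, str) and node.startswith(old_prefix):
        return new_prefix + node[len(old_prefix):]
    # Numbers, booleans, None, and non-matching strings are left untouched.
    return node


def update_config(path, old_prefix, new_prefix):
    """Load a JSON config, rewrite path prefixes, and write it back in place."""
    config_path = Path(path)
    config = json.loads(config_path.read_text())
    updated = rewrite_prefix(config, old_prefix, new_prefix)
    config_path.write_text(json.dumps(updated, indent=2) + "\n")
    return updated


# Hypothetical in-memory example; these keys and paths are placeholders,
# not the actual contents of config/config.json.
example = {"data": {"plink_prefix": "/old/home/data/master"}, "threads": 4}
print(rewrite_prefix(example, "/old/home", "/my/workdir"))
```

Under these assumptions, calling `update_config("config/config.json", "/old/home", "/my/workdir")` would rewrite the file in place, so keeping a backup copy first is advisable.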
Genotypic data is stored in plink format.
Metadata/location data is stored using the PopGenStructures data format, with some minor (recommended) changes.
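
To illustrate what the plink format for genotypic data looks like on the sample side, here is a minimal sketch that parses a `.fam` file. It assumes the binary plink fileset (`.bed`/`.bim`/`.fam`), which this document does not state explicitly. The `read_fam` helper and the sample IDs are hypothetical; the six-column layout is the standard `.fam` layout.

```python
import io

# Standard six columns of a PLINK .fam file.
FAM_COLUMNS = (
    "family_id", "individual_id", "father_id", "mother_id", "sex", "phenotype"
)


def read_fam(handle):
    """Parse whitespace-delimited .fam lines into a list of dicts."""
    records = []
    for line_number, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(FAM_COLUMNS):
            raise ValueError(
                f"line {line_number}: expected {len(FAM_COLUMNS)} fields, "
                f"got {len(fields)}"
            )
        records.append(dict(zip(FAM_COLUMNS, fields)))
    return records


# Made-up sample lines, used only to demonstrate the parser.
sample = io.StringIO("POP1 ind001 0 0 1 -9\nPOP2 ind002 0 0 2 -9\n")
records = read_fam(sample)
print([record["individual_id"] for record in records])
```

With a real file, the same function can be called as `read_fam(open("some_prefix.fam"))`, where `some_prefix` stands in for whatever fileset prefix your configuration points to.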
The pipeline is implemented in Snakemake, with Python for most data wrangling and R for most plotting.