This example runs through the process of: 0. Getting the data
- Converting plink data to mixPainter (chromopainter) format
- Running ADMIXTURE
- Running mixPainter in a way that allows comparisons to the ADMIXTURE run
- Getting the data out of mixPainter and into the badMIXTURE R package.
From a linux or mac terminal, copy the example data from our repository:
## Get the data
## For linux:
alias mywget=wget
## For mac OSX:
if [ `uname` == "Darwin" ]; then
alias mywget="curl -O"
fi
files="Recent_admix.bim Marginalisation_admix.bim Remnants_admix.bim Remnants_admix.fam Recent_admix.fam Marginalisation_admix.fam Marginalisation_admix.bed Recent_admix.bed Remnants_admix.bed"
for file in $files; do
mywget https://people.maths.bris.ac.uk/~madjl/finestructure/badmixture/$file
done
## Get the scripts to process the data
git clone https://github.com/danjlawson/badMIXTUREexample.git
## Note that this stage requires git. You technically don't need it; you can download the scripts manually if you would prefer. put them in a folder called "badMIXTUREexample" under your working directory for all the paths to work correctly.
Follow the instructions at https://github.com/danjlawson/badMIXTURE. Specifically:
install.packages("devtools")
library(devtools)
install_github("danjlawson/badMIXTURE")
## These are the tools we need. You can either put them all in your path, or update the variables with their full path
plink="plink1.9" # Available from https://www.cog-genomics.org/plink2
plink2chromopainter="plink2chromopainter.pl" # included in the finestructure download: https://people.maths.bris.ac.uk/~madjl/finestructure/finestructure.html
convertrecfile="convertrecfile.pl" # included in the finestructure download
makeuniformrecfile="makeuniformrecfile.pl" # included in the finestructure download
admixture="admixture" # available from https://www.genetics.ucla.edu/software/admixture/download.html
./doprocess.sh # This generates calls to "convert.sh" that will process the three datasets.
## AT THIS POINT YOU SHOULD READ AND CHECK convert.sh!! Understanding that is the point of this exercise!
## In particular, if you just run the scripts, they will use 8 cores of your machine for several days. You might not want this!
The included script "convert.sh" explains how we did everything for the paper. The process is:
- Run plink to convert the file to ped/map 1a. (Optionally, prune the data down to fewer SNPs)
- Run plink2chromopainter.pl to convert to chromopainter format
- (Optionally, create a recombination map with either makeuniformrecfile.pl or convertrecfile.pl)
- Run mixPainter
Each of these steps is a single command.
## Additional information
ADMIXTURE takes 147m using 8 cores to process the complete dataset. mixPainter takes 1962m using 8 cores. They scale similarly with the number of SNPs (linearly) and the number of individuals (mixPainter is quadratic).