To infer multiple-wave admixture by fitting ALD using a p-spectrum
iMAAPs estimates the timing of multiple-wave population admixture. It is currently designed to infer two-way, multiple-wave admixture from admixture-induced linkage disequilibrium (LD). The software accepts genotype data, haplotype data, and data re-coded according to admixture ancestry.
iMAAPs V1.0.0
Infer Multiple-wave Admixture by fitting Admixture LD using Polynomial spectrum
For any technical questions, please contact {yingzhou&yuankai}@picb.ac.cn
iMAAPs is currently developed for Linux environments. The source code is available at http://www.picb.ac.cn/PGG/resource.php
To compile iMAAPs, GSL, FFTW and OpenMP must be installed properly. For details, see:
GSL http://www.gnu.org/software/gsl/
FFTW http://www.fftw.org/
OpenMP http://www.openmp.org/
Then change to the main directory of iMAAPs and run "make" to build the executable file iMAAPs.
The basic command is:
./iMAAPs -p [parameter file]
A simple example:
./iMAAPs -p adm.par > adm.log
The parameter file specifies the inputs and parameters. Run “./iMAAPs -P” or “./iMAAPs -help para” for a brief description of the parameter file.
The list file (*.list) names all the input genotype files and SNP annotation files. These files are separated by chromosome, one line per chromosome.
1st column: Chromosome ID
2nd column: Genotype file
3rd column: SNP file
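As an illustration of the three-column layout, the following Python sketch builds the lines of a list file for a set of chromosomes. The file name patterns (chr1.geno, chr1.snp) and the tab separator are assumptions made for this example only; iMAAPs does not prescribe them here.

```python
def build_list_lines(chromosomes,
                     geno_pattern="chr{chrom}.geno",
                     snp_pattern="chr{chrom}.snp"):
    """Return one list-file line per chromosome.

    Columns follow the order described above: chromosome ID,
    genotype file, SNP file. The name patterns and the tab
    separator are hypothetical choices for this sketch.
    """
    lines = []
    for chrom in chromosomes:
        geno_file = geno_pattern.format(chrom=chrom)
        snp_file = snp_pattern.format(chrom=chrom)
        lines.append("\t".join([str(chrom), geno_file, snp_file]))
    return lines


# Example: write a list file covering chromosomes 1 to 3.
example_lines = build_list_lines(range(1, 4))
list_file_text = "\n".join(example_lines) + "\n"
```

The resulting text can be written to a file such as ADM.list and referenced from the parameter file.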
The genotype file is in EIGENSTRAT format. Each column represents one sample individual and each row represents one SNP. Genotypes are coded as 0, 1 and 2, and 9 denotes missing. Markers with more than two states should be removed before analysis. Any character other than 0, 1 and 2 is treated as missing by the program without any warning.
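Because unexpected characters are silently treated as missing, it can help to scan a genotype file before running iMAAPs. The Python sketch below is not part of iMAAPs; it assumes one SNP per row with one character per individual and no separators, and it reports rows that contain characters other than 0, 1, 2 or 9.

```python
def scan_genotype_rows(rows):
    """Check genotype rows (one SNP per row, one character per individual).

    Returns (n_individuals, flagged), where flagged is a list of
    (row_number, count) pairs for rows containing characters other
    than 0, 1, 2 or the missing code 9. Raises ValueError if rows
    differ in length.
    """
    n_individuals = None
    flagged = []
    for row_number, row in enumerate(rows, start=1):
        row = row.strip()
        if not row:
            continue  # ignore blank lines
        if n_individuals is None:
            n_individuals = len(row)
        elif len(row) != n_individuals:
            raise ValueError(
                f"row {row_number} has {len(row)} genotypes, "
                f"expected {n_individuals}"
            )
        unexpected = sum(1 for ch in row if ch not in "0129")
        if unexpected:
            flagged.append((row_number, unexpected))
    return n_individuals, flagged


# Example with three SNPs and four individuals; row 2 contains an "N".
n_ind, bad_rows = scan_genotype_rows(["0129", "01N2", "2210"])
```

To scan a real file, pass an open file handle, for example scan_genotype_rows(open("chr1.geno")), where chr1.geno is a placeholder name.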
Each SNP file contains the SNP information for one chromosome.
1st column: Chromosome. Use X for the X chromosome.
2nd column: SNP name
3rd column: Genetic position (in Morgans)
4th column: Physical position (in bp)
5th and 6th columns: (Optional) reference alleles and variant alleles
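The column layout above can be checked with a short script. The Python sketch below is an illustration only, assuming whitespace-separated columns; it reads the lines of one SNP file and returns the chromosome, the number of SNPs and the genetic span in Morgans. A span far larger than a few Morgans may indicate positions given in centiMorgans.

```python
def genetic_span(snp_lines):
    """Summarize one SNP file: (chromosome, n_snps, span_in_morgans).

    Expects at least four whitespace-separated columns per line:
    chromosome, SNP name, genetic position (Morgans), physical
    position (bp). Columns 5 and 6 are optional and ignored here.
    """
    chromosome = None
    positions = []
    for line_number, line in enumerate(snp_lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) < 4:
            raise ValueError(f"line {line_number}: expected at least 4 columns")
        if chromosome is None:
            chromosome = fields[0]
        elif fields[0] != chromosome:
            raise ValueError(f"line {line_number}: more than one chromosome in file")
        positions.append(float(fields[2]))
    if not positions:
        raise ValueError("no SNP records found")
    return chromosome, len(positions), max(positions) - min(positions)


# Example with made-up SNP records.
example_snps = [
    "1 rs1 0.010 1000 A G",
    "1 rs2 0.025 2500 C T",
    "1 rs3 0.400 40000",
]
chrom, n_snps, span = genetic_span(example_snps)
```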
The time file contains the time points (in generations) used to test admixture signals. The default time file is recommended for regular use.
The individual file describes the basic information of the samples used.
1st column: Individual ID
2nd column: Gender
3rd column: Population label
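A quick way to confirm that the labels given for Admpop and Refpops actually occur in the individual file is sketched below in Python. This helper is hypothetical and not part of iMAAPs; it assumes whitespace-separated columns, and the labels ADM, REF1 and REF2 are made up for the example.

```python
from collections import Counter


def count_by_population(ind_lines):
    """Count individuals per population label (3rd column)."""
    counts = Counter()
    for line in ind_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        counts[fields[2]] += 1
    return counts


def missing_labels(counts, admpop, refpops):
    """Return the requested labels that have no individuals."""
    return [label for label in [admpop, *refpops] if counts.get(label, 0) == 0]


# Example with made-up individuals and labels.
example_ind = ["id1 M ADM", "id2 F REF1", "id3 F ADM"]
pop_counts = count_by_population(example_ind)
absent = missing_labels(pop_counts, "ADM", ["REF1", "REF2"])
```

In this example REF2 is reported as absent, which would signal a mismatch between the parameter file and the individual file.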
Basic setting:
Inputfilelist List of input files. Default: -
Indfile Individual notation. Default: -
Admpop Label for admixed population. Default: -
Refpops Labels for reference populations. Default: -
Timefile Time file for testing the admixture signals. Default: -
Outdir Output directory. Default: ./
Advanced setting:
Jackknife Whether to apply the jackknife test. Default: off
"1", "on" or "ON" turns it on.
"0", "off" or "OFF" turns it off.
Mindis Minimum distance between two markers for WLD bins (in Morgans). Default: 0.0002
Maxdis Maximum distance between two markers for WLD bins (in Morgans). Default: 0.3
Binsize Bin size of WLD bins (in Morgans). Default: 0.0002
Iter Number of iterations used to reach the final fitting results. Default: 100000
Num_threads Number of threads used globally. Default: 1
Num_threads_wld Number of threads used for WLD calculation. Default: equal to num_threads
iMAAPs produces two main output files: *.rawld and *.ad. *.rawld records the calculated weighted LD. *.ad records the fitting results for the weighted LD, both for the overall fit and for each jackknife replicate. Each row corresponds to one time point and lists the exponential coefficients at that time point. An R script is provided to summarize these data.
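To give an intuition for what the exponential coefficients mean, the Python sketch below evaluates a weighted LD curve as a sum of exponential terms, one per time point. The functional form used here, coefficient * exp(-generation * distance) with distance in Morgans, is an assumption made for illustration; it does not parse the *.ad file, and the coefficient values are made up.

```python
import math


def weighted_ld_curve(coefficients, distances):
    """Evaluate a sum-of-exponentials curve.

    coefficients: dict mapping a time point (generations) to its
                  exponential coefficient.
    distances:    genetic distances in Morgans.
    Assumed model: sum over time points of c * exp(-t * d).
    """
    return [
        sum(c * math.exp(-generation * d) for generation, c in coefficients.items())
        for d in distances
    ]


# Example: a constant background term plus one admixture signal
# at 50 generations (made-up coefficients).
example_curve = weighted_ld_curve({0: 0.5, 50: 1.0}, [0.0, 0.02])
```

Under this assumed form, a time point of 0 contributes a constant offset, while larger time points contribute terms that decay faster with distance.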
Assuming iMAAPs and the related packages have been installed successfully, change to the example directory and run:
../iMAAPs/iMAAPs -p ADM.par > ADM.log
This produces the output files ADM.rawld and ADM.ad.
install.packages("iMAAPsmry_0.1.tar.gz", repos=NULL, type="source");
library(iMAAPsmry);
data(ad_inp)  # or: ad_inp<-read.csv(file="../example/ADM.ad",header=T,sep="\t");
data(rawld_inp)  # or: rawld_inp<-read.csv(file="../example/ADM.rawld",header=T,sep="\t");
x<-iMAAPsmry(ad_inp=ad_inp,rawld_inp=rawld_inp,denoise_cut=0.99, num_chr=10);
#summary spectrum
> x$sum_spectrum
gen mag
[1,] 0 0.0017808938
[2,] 47 0.0078082779
[3,] 48 0.0166346230
[4,] 49 0.0213023051
[5,] 50 0.0241563812
[6,] 51 0.0281666768
[7,] 52 0.0283945524
[8,] 53 0.0246115749
[9,] 54 0.0195652641
[10,] 55 0.0140609798
[11,] 56 0.0093997124
[12,] 57 0.0003601793
#estimated admixture time
> x$time
median SD pvalue mag mag_sd prop
[1,] 0.00000 0.000000 1.152493e-08 0.001780894 0.0003210443 0.1919001
[2,] 51.48246 2.394118 2.834765e-08 0.028394552 0.0063921496 0.8080999
The parameter file (*.par) contains the parameters used to run iMAAPs:
Timefile lists the time points for testing the admixture signals. We strongly suggest using the default time file without modification unless you are familiar with the underlying algorithm.
Mindis controls the starting distance of the weighted LD used for exponential fitting. This parameter should be set according to the marker density. A small Mindis requires dense markers.
Maxdis controls the upper bound of the distance between two markers used for exponential fitting. Maxdis should be smaller than the genetic length of the shortest chromosome used for inference.
Binsize controls the step size used when averaging the weighted LD. A small Binsize requires dense markers.
Iter controls the number of iterations used to obtain accurate fitting results. We suggest using the default value.
Jackknife is used for calculating the significance of the admixture signals. For a quick scan of the data, you can turn Jackknife off.
Num_threads controls the number of threads used globally. This value should be no larger than the number of chromosomes used for inference.
Num_threads_wld controls the number of threads used for calculating weighted LD. It is provided for memory control when dealing with NGS data. Unless you have large memory (>128G for each fitting process), you should set it to a small value, say 3.
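The relationship between Mindis, Maxdis and Binsize can be checked before a run. The Python sketch below is illustrative and not part of iMAAPs: it assumes bins start at Mindis and advance in steps of Binsize up to Maxdis (the exact binning used internally may differ), and it lists chromosomes whose genetic length does not exceed Maxdis. The chromosome lengths in the example are made up.

```python
import math


def count_bins(mindis=0.0002, maxdis=0.3, binsize=0.0002):
    """Number of complete bins between mindis and maxdis (Morgans).

    Assumes bins start at mindis and have width binsize. A small
    tolerance guards against floating-point rounding.
    """
    if binsize <= 0 or maxdis <= mindis:
        raise ValueError("need binsize > 0 and maxdis > mindis")
    return math.floor((maxdis - mindis) / binsize + 1e-9)


def chromosomes_shorter_than_maxdis(genetic_lengths, maxdis=0.3):
    """Return chromosome IDs whose genetic length (Morgans) is <= maxdis."""
    return sorted(
        chrom for chrom, length in genetic_lengths.items() if length <= maxdis
    )


# Example with the default parameters and made-up chromosome lengths.
n_bins = count_bins()
too_short = chromosomes_shorter_than_maxdis({"1": 2.8, "21": 0.25, "22": 0.7})
```

If any chromosome is listed as too short, either lower Maxdis or exclude that chromosome from the input list.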
If you use iMAAPs in your research, please cite:
Zhou Y, Yuan K, Yu Y, Ni X, Xie P, Xing EP, Xu S. Inference of multiple-wave population admixture by modeling decay of linkage disequilibrium with polynomial functions. Heredity (Edinb). 2017 May;118(5):503-510. doi: 10.1038/hdy.2017.5. Epub 2017 Feb 15. PMID: 28198814; PMCID: PMC5564381.