Skip to content

Latest commit

 

History

History
34 lines (22 loc) · 2.78 KB

README.md

File metadata and controls

34 lines (22 loc) · 2.78 KB

two taxon asymmetry

This repository contains Mathematica and python notebooks associated with Mackintosh and Setter (2024), as well as a python script to estimate Am from a VCF file.

The Mathematica notebook shows how the generating function of genealogical branch lengths (Lohse et al. 2011) can be used to obtain (both marginal and conditional) expectations for the length of external branches, and therefore Am.

The python notebooks use coalescent simulations with msprime (Baumdicker et al. 2021) to estimate Am for arbitrary demographic histories, block sizes and recombination rates. Notebook 1 lays out the general approach, while notebooks 2-4 simulate specific demographic histories. Notebook 5 explores the non-monotone behaviour of Am under a model of migration with asymmetric population sizes.

The script estimate_Am.py can be used to estimate Am from a (filtered) VCF file. The VCF does not have to have a tabix index, but the script is much faster if there is one.

[Options]
    -v, --vcf <STR>                             VCF file
    -a, --a_samples <STR>                       A comma delimited list of samples from population A, e.g. sample_1,sample_2,sample_3
    -b, --b_samples <STR>                       A comma delimited list of samples from population B
    -j, --jackknife_blocksize <INT>             Number of SNPs in each block (aim for >20 blocks) [default: 100_000]
    -r, --rollover <STR>                        Allow jackknife blocks to roll over between sequences [default: False]
    -s, --span <INT>                            Look for het' mutations within this many bases of a hetAB
    -h, --help                                  Show this message

An example command would be:

estimate_Am.py -v heliconius_20220202.vcf.gz -a ros.CJ2071.m -b chi.CJ565.m -j 1_000_000 -s 8 > heliconius_Am_8.out

Note that a given value of -s corresponds to block_size = (2 * value) + 1. For example, searching 4 bases either side of a hetAB mutation is equivalent to a block size of 9 bases. It is always recommended to calculate Am across a range of block sizes. This can be done using a loop, although this is much slower than submiting commands in parallel.

for i in 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768; do estimate_Am.py -v heliconius_20220202.vcf.gz -a ros.CJ2071.m -b chi.CJ565.m -j 1_000_000 -s $i; done | grep "Final" > heliconius_Am.out

The script requires the modules docopt, scikit-allel and numpy, as well as tabix from htslib. These can all be installed via conda.

conda install conda-forge::docopt conda-forge::scikit-allel anaconda::numpy bioconda::tabix