two taxon asymmetry

This repository contains Mathematica and python notebooks associated with Mackintosh and Setter (2024), as well as a python script to estimate A_m from a VCF file.

The Mathematica notebook shows how the generating function of genealogical branch lengths (Lohse et al. 2011) can be used to obtain (both marginal and conditional) expectations for the length of external branches, and therefore A_m.

The python notebooks use coalescent simulations with msprime (Baumdicker et al. 2021) to estimate A_m for arbitrary demographic histories, block sizes and recombination rates. Notebook 1 lays out the general approach, while notebooks 2-4 simulate specific demographic histories. Notebook 5 explores the non-monotone behaviour of A_m under a model of migration with asymmetric population sizes.

The script estimate_Am.py can be used to estimate A_m from a (filtered) VCF file. The VCF does not have to have a tabix index, but the script is much faster if there is one.

[Options]
    -v, --vcf <STR>                             VCF file
    -a, --a_samples <STR>                       A comma delimited list of samples from population A, e.g. sample_1,sample_2,sample_3
    -b, --b_samples <STR>                       A comma delimited list of samples from population B
    -j, --jackknife_blocksize <INT>             Number of SNPs in each block (aim for >20 blocks) [default: 100_000]
    -r, --rollover <STR>                        Allow jackknife blocks to roll over between sequences [default: False]
    -s, --span <INT>                            Look for het' mutations within this many bases of a hetAB
    -h, --help                                  Show this message

An example command would be:

estimate_Am.py -v heliconius_20220202.vcf.gz -a ros.CJ2071.m -b chi.CJ565.m -j 1_000_000 -s 8 > heliconius_Am_8.out

Note that a given value of -s corresponds to block_size = (2 * value) + 1. For example, searching 4 bases either side of a hetAB mutation is equivalent to a block size of 9 bases. It is always recommended to calculate A_m across a range of block sizes. This can be done using a loop, although this is much slower than submiting commands in parallel.

for i in 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768; do estimate_Am.py -v heliconius_20220202.vcf.gz -a ros.CJ2071.m -b chi.CJ565.m -j 1_000_000 -s $i; done | grep "Final" > heliconius_Am.out

The script requires the modules docopt, scikit-allel and numpy, as well as tabix from htslib. These can all be installed via conda.

conda install conda-forge::docopt conda-forge::scikit-allel anaconda::numpy bioconda::tabix

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

two taxon asymmetry

Files

README.md

Latest commit

History

README.md

File metadata and controls

two taxon asymmetry