Liu and Fu's StairwayPlot program modified by Dan Schrider to estimate a mispolarization parameter.
This repository contains the code for a slightly modified version Liu and Fu's StairwayPlot program for inferring population size histories [1]. This program takes as input an unfolded site frequency spectrum, and assumes the ancestral state for each polymorphism is known. However, there is always some uncertainty surrounding the ancestral state. This version of StairwayPlot estimates the value of one additional parameter, the probability that the inferred ancestral state is actually the derived allele. In practice, this probability can be quite small, while still substantially skewing the site frequency spectrum, particularly inflating the number of high frequency derived alleles. By estimating a mispolarization parameter, and modifying the likelihood function accordingly, we can mitigate the influence of mispolarization on the inferred population size history.
In practice, you can use this program in the same manner as the original. However, there are a few changes to the output. First, there will be two additional output lines created by the Stairway_plot_output_summary_commandline program (you must use this rather than the GUI version, which I may have mistakenly incldued in the repository). The first is a header line (always "pMisPol_median\tpMisPol_2.5%\tpMisPol_97.5%"), and the second gives the median, 2.5th percentile, and 97.5th percentile estimates of the mispolarization parameter. These two lines show up just before the lines that give the population size estimates for each time point (i.e. right before the header line that begins with "mutation_per_site"). Of course, the .addTheta files are modified as well, so that Stairway_plot_output_summary_commandline can read in the mispolarization rate estimates. But this doesn't matter unless you plan on looking at these intermediates. In particular, the only change to these files is the addition of the current mispolarization probability parameter estimate as the last field in each of the lines following the "fitness trail:" header (including the "final model:" line, which of course contains the final estimate of this parameter along with each of the other model parameters).
If you use this thing, be sure to cite Lui and Fu (reference given below), but also give the link to this repository so readers can find this version of the code. Maybe someone else would want to use it?
See readme02.txt (unmodified from the original program) for more information
[1] Liu X, Fu YX. Exploring population size changes using SNP frequency spectra. Nat Genet. 2015. 47:555-559. doi: 10.1038/ng.3254.