When is it appropriate to use add_mass_migration()? #1911
-
Hello again! I am interested in studying the coalescent properties of introgression, which for me involves comparing simulated results vs analytical solutions. However, I just noticed a small quirk that I didn't initially think about. In the legacy version of import msprime
def one_pulse(
seq_len,
Ne_P1,
Ne_P2,
Ne_P3,
Tp3,
Tp2,
Tgf,
mu,
gamma,
r=None,
num_seq=1,
random_seed=None,
):
"""
Simulates three populations ((P1, P2), P3) where the only
demographic events are population divergences and one pulse of
introgression. Sequence length is in base pairs and both mutation and
recombination rate are in base pairs per generation. This function returns
a tree sequence file.
"""
# Define simulation parameters
samp_size = num_seq # Number of sequences
prob_introgress = gamma # Probability of a lineage introgressing
# Define samples
samples = [msprime.Sample(population=0,time=0)]*samp_size # P1
samples.extend([msprime.Sample(population=1,time=0)]*samp_size) # P2
samples.extend([msprime.Sample(population=2,time=0)]*samp_size) # P3
# Configure populations with a constant population size
population_configurations = [
msprime.PopulationConfiguration(initial_size=Ne_P1),
msprime.PopulationConfiguration(initial_size=Ne_P2),
msprime.PopulationConfiguration(initial_size=Ne_P3),
]
# Define demographic events
demographic_events = [
msprime.MassMigration(
time=Tgf, source=1, destination=2, proportion=prob_introgress, # Introgression event
),
msprime.MassMigration(
time=Tp2, source=1, destination=0, proportion=1.0, # P1 and P2 merge
),
msprime.MassMigration(
time=Tp3, source=2, destination=0, proportion=1.0, # P12 and P3 merge
),
]
# Simulate in msprime
ts = msprime.simulate(
samples=samples,
length=seq_len,
recombination_rate=r,
population_configurations=population_configurations,
demographic_events=demographic_events,
random_seed=random_seed,
)
return ts When "migrating" over to the new import msprime
def IUA_pop_split(
N,
f,
Tgf,
Tp2,
Tp3,
donor='P3',
recipient='P2',
r=None,
reps=None,
sequence_length=None,
random_seed=None,
):
"""
Simulates three lineages ((P1, P2), P3) where the only
demographic events are population divergences and one pulse
of introgression. This function returns a tree sequence file.
"""
# Intialize the demographic model.
demography = msprime.Demography()
# We assume constant and equal effective population sizes for
# all lineages.
demography.add_population(name='P1', initial_size=N)
demography.add_population(name='P2', initial_size=N)
demography.add_population(name='P3', initial_size=N)
demography.add_population(name='P12', initial_size=N)
demography.add_population(name='P123', initial_size=N)
# Introgression from P3 to P2 with a probaility f at time Tgf.
demography.add_mass_migration(
time=Tgf, source=recipient, dest=donor, proportion=f,
)
# P1 and P2 merge into P12
demography.add_population_split(
time=Tp2, derived=['P1', 'P2'], ancestral='P12'
)
# P12 and P3 merge into P123
demography.add_population_split(
time=Tp3, derived=['P12', 'P3'], ancestral='P123'
)
# Simulate!
ts = msprime.sim_ancestry(
samples=[
msprime.SampleSet(1, ploidy=1, population='P1'),
msprime.SampleSet(1, ploidy=1, population='P2'),
msprime.SampleSet(1, ploidy=1, population='P3'),
],
recombination_rate=r,
demography=demography,
random_seed=random_seed,
num_replicates=reps,
)
# As a sanity check let's also print the demograpghy debugger.
#print(demography.debug())
return ts Then I came across @jeromekelleher and Konrad Lohse's introgression example in the import msprime
# Define an IUA model of introgression.
def IUA_migration(
N,
f,
Tgf,
Tp2,
Tp3,
donor='P3',
recipient='P2',
r=None,
reps=None,
sequence_length=None,
random_seed=None,
):
"""
Simulates three lineages ((P1, P2), P3) where the only
demographic events are population divergences and one pulse
of introgression. This function returns a tree sequence file.
"""
# Intialize the demographic model.
demography = msprime.Demography()
# We assume constant and equal effective population sizes for
# all lineages.
demography.add_population(name='P1', initial_size=N)
demography.add_population(name='P2', initial_size=N)
demography.add_population(name='P3', initial_size=N)
# Introgression from P3 to P2 with a probaility f at time Tgf.
demography.add_mass_migration(
time=Tgf, source=recipient, dest=donor, proportion=f,
)
# P1 and P2 merge into P12
demography.add_mass_migration(
time=Tp2, source='P2', dest='P1', proportion=1,
)
# P12 and P3 merge into P123
demography.add_mass_migration(
time=Tp3, source='P3', dest='P1', proportion=1,
)
# Simulate!
ts = msprime.sim_ancestry(
samples=[
msprime.SampleSet(1, ploidy=1, population='P1'),
msprime.SampleSet(1, ploidy=1, population='P2'),
msprime.SampleSet(1, ploidy=1, population='P3'),
],
recombination_rate=r,
demography=demography,
random_seed=random_seed,
num_replicates=reps,
)
# As a sanity check let's also print the demograpghy debugger.
#print(demography.debug())
return ts So I then planned to compare the two models by calculating the average internal branch length out of 1,000,000 replicates that would generate an ABBA or BABA site and compare the two models against the analytical solutions provided by Durand et al, 2011. This process required a bit more code, so for brevity, I will just show the results but I am happy to share the code and output files if need be! Tables: f represents the introgression proportion, E[X] represents the analytical solutions, Mig (X) represents simulations using the
Lastly, I made a plot to see if there were any stark differences. So this is a long-winded way to ask when studying introgression does it make more sense to model it using the Thanks in advance and I apologize in advance if anything I said was unclear. |
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 1 reply
-
Update: After writing my first post I realized I should probably also benchmark the model written in the legacy version of
Since the model using Thanks again! |
Beta Was this translation helpful? Give feedback.
-
The "mass migration" is a power-user feature @David-Peede, and can be used to imitate the effects population split and admixture events. It's appropriate to use mass migrations whenever you're modelling something that isn't a population split or admixture, so the introgression situation is one of these occasions. I'm guessing the tskit tutorial doesn't use population splits because it was ported from the old code as simply as possible, and we didn't think of using the population split (PR welcome if you'd like to update it!) As an aside, have you considered using Demes to construct your model?There is excellent tooling available for visualising models in demes and it's quite a bit more readable than the msprime notation. |
Beta Was this translation helpful? Give feedback.
The "mass migration" is a power-user feature @David-Peede, and can be used to imitate the effects population split and admixture events. It's appropriate to use mass migrations whenever you're modelling something that isn't a population split or admixture, so the introgression situation is one of these occasions. I'm guessing the tskit tutorial doesn't use population splits because it was ported from the old code as simply as possible, and we didn't think of using the population split (PR welcome if you'd like to update it!)
As an aside, have you considered using Demes to construct your model?There is excellent tooling available for visualising models in demes and it's quite a bit more re…