Skip to content

Epidemiological modeling framework for pathogen population genetics and evolution.

License

Notifications You must be signed in to change notification settings

pablocarderam/opqua

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Opqua

DOI

opqua (opkua, upkua) [Chibcha/muysccubun]

I. noun. ailment, disease, illness

II. noun. cause, reason [for which something occurs]

Taken from D. F. GĂłmez Aldana's muysca-spanish dictionary.

Contents

About

Opqua is an epidemiological modeling framework for pathogen population genetics and evolution.

Opqua stochastically simulates pathogens with distinct, evolving genotypes that spread through populations of hosts which can have specific immune profiles.

Opqua is a useful tool to test out scenarios, explore hypotheses, make predictions, and teach about the relationship between pathogen evolution and epidemiology.

Among other things, Opqua can model

  • host-host, vector-borne, and vertical transmission
  • pathogen evolution through mutation, recombination, and/or reassortment
  • host recovery, death, and birth
  • metapopulations with complex structure and demographic interactions
  • interventions and events altering demographic, ecological, or evolutionary parameters
  • treatment and immunization of hosts or vectors
  • influence of pathogen genome sequences on transmission and evolution, as well as host demographic dynamics
  • intra- and inter-host competition and evolution of pathogen strains across user-specified adaptive landscapes

Check out the changelog file for information on recent updates.

Opqua has been used in-depth to study pathogen evolution across fitness valleys. Check out the peer-reviewed preprint on biorXiv, now peer-reviewed here.

Opqua is developed by Pablo Cárdenas.

The first publication using Opqua was created in collaboration with Vladimir Corredor and Mauricio Santos-Vega. Follow their science antics on BlueSky @pcr-guy or Twitter at @pcr_guy and @msantosvega.

Opqua is available on PyPI and is distributed under an MIT License.

Example Plots

These are some of the plots Opqua is able to produce, but you can output the raw simulation data yourself to make your own analyses and plots. These are all taken from the examples in the examples/tutorials folder—try them out yourself! See the [Requirements and Installation](#Requirements and Installation) and Usage sections for more details.

Population genetic composition plots for pathogens

An optimal pathogen genome arises through de novo mutation and outcompetes all others through intra-host competition. See fitness_function_mutation_example.py in the examples/tutorials/evolution folder. Fitness_function_mutation_example composition plot

An optimal pathogen genome arises through independent reassortment of chromosomes or genome segments, outcompeting all others through increases in transmissibility and intra-host competition. See transmissibility_function_reassortment_example.py in the examples/tutorials/evolution folder. Similar code can be used to achieve genetic recombination by setting num_crossover_host to greater than 0. Transmissibility_function_reassortment_example composition plot

Host/vector compartment plots

A population with natural birth and death dynamics shows the effects of a pathogen. "Dead" denotes deaths caused by pathogen infection. See vector-borne_birth-death_example.py in the examples/tutorials/vital_dynamics folder. Vector-borne_birth-death_example compartments plot

Plots of a host/vector compartment across different populations in a metapopulation

Pathogens spread through a network of interconnected populations of hosts. Lines denote infected pathogens. See metapopulations_migration_example.py in the examples/tutorials/metapopulations folder. Metapopulations_migration_example populations plot

Host/vector compartment plots

A population undergoes different interventions, including changes in epidemiological parameters and vaccination. "Recovered" denotes immunized, uninfected hosts. See intervention_examples.py in the examples/tutorials/interventions folder. Intervention_examples compartments plot

Pathogen phylogenies

Phylogenies can be computed for pathogen genomes that emerge throughout the simulation. See fitness_function_mutation_example.py in the examples/tutorials/evolution folder. Fitness_function_mutation_example clustermap plot

Mutant fitness landscapes

Mutant genotypes connect to each other across fitness landscapes according to their rate of emerging and establishing in intrahost evolution. See landscape_example.py in the examples/tutorials/landscapes folder. Mutation network screenshot plot

For advanced examples (including multiple parameter sweeps), check out this separate repository (preprint on biorXiv, now peer-reviewed here).

Requirements and Installation

Opqua runs on Python. A good place to get the latest version it if you don't have it is Anaconda.

Opqua is available on PyPI to install through pip, as explained below.

If you haven't yet, install pip:

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py

Install Opqua by running

pip install opqua

The pip installer should take care of installing the necessary packages. However, for reference, the versions of the packages used for opqua's development are saved in requirements.txt

Usage

To run any Opqua model (including the tutorials in the examples/tutorials folder), save the model as a .py file and execute from the console using python my_model.py.

You may also run the models from a notebook environment such as Jupyter or an integrated development environment (IDE) such as Spyder, both available through Anaconda.

Minimal example

The simplest model you can make using Opqua looks like this:

# This simulates a pathogen with genome "AAAAAAAAAA" spreading in a single
# population of 100 hosts, 20 of which are initially infected, under example
# preset conditions for host-host transmission.

from opqua.model import Model

my_model = Model()
my_model.newSetup('my_setup', preset='host-host')
my_model.newPopulation('my_population', 'my_setup', num_hosts=100)
my_model.addPathogensToHosts( 'my_population',{'AAAAAAAAAA':20} )
my_model.run(0,100)
data = my_model.saveToDataFrame('my_model.csv')
graph = my_model.compartmentPlot('my_model.png', data)

For more example usage, have a look at the examples folder. For an overview of how Opqua models work, check out the Materials and Methods section on the manuscript here. A summarized description is shown below in the How Does Opqua Work? section. For more information on the details of each function, head over to the Model Documentation section.

How Does Opqua Work?

Basic concepts

Opqua models are composed of populations containing hosts and/or vectors, which themselves may be infected by a number of pathogens with different genomes.

A genome is represented as a string of characters. All genomes must be of the same length (a set number of loci), and each position within the genome can have one of a number of different characters specified by the user (corresponding to different alleles). Different loci in the genome may have different possible alleles available to them. Genomes may be composed of separate chromosomes, separated by the "/" character, which is reserved for this purpose.

Each population may have its own unique parameters dictating the events that happen inside of it, including how pathogens are spread between its hosts and vectors.

Events

There are different kinds of events that may occur to hosts and vectors in a population:

  • contact between an infectious host/vector and another host/vector in the same population (intra-population contact) or in a different population ("population contact")
  • migration of a host/vector from one population to another
  • recovery of an infected host/vector
  • birth of a new host/vector from an existing host/vector
  • death of a host/vector due to pathogen infection or by "natural" causes
  • mutation of a pathogen in an infected host/vector
  • recombination of two pathogens in an infected host/vector

Events illustration

The likelihood of each event occurring is determined by the population's parameters (explained in the newSetup function documentation) and the number of infected and healthy hosts and/or vectors in the population(s) involved. Crucially, it is also determined by the genome sequences of the pathogens infecting those hosts and vectors. The user may specify arbitrary functions to evaluate how a genome sequence affects any of the above kinds of rates. This is once again done through arguments of the newSetup function. As an example, a specific genome sequence may result in increased transmission within populations but decreased migration of infected hosts, or increased mutation rates. These custom functions may be different across populations, resulting in different adaptive landscapes within different populations.

Contacts within and between populations may happen by any combination of host-host, host-vector, and/or vector-host routes, depending on the populations' parameters. When a contact occurs, each pathogen genome present in the infecting host/vector may be transferred to the receiving host/vector as long as one "infectious unit" is inoculated. The number of infectious units inoculated is randomly distributed based on a Poisson probability distribution. The mean of this distribution is set by the receiving host/vector's population parameters, and is multiplied by the fraction of total intra-host fitness of each pathogen genome. For instance, consider the mean inoculum size for a host in a given population is 10 units and the infecting host/vector has two pathogens with fitnesses of 0.3 and 0.7, respectively. This would make the means of the Poisson distributions used to generate random infections for each pathogen equal to 3 and 7, respectively.

Inter-population contacts occur via the same mechanism as intra-population contacts, with the distinction that the two populations must be linked in a compatible way. As an example, if a vector-borne model with two separate populations is to allow vectors from Population A to contact hosts in Population B, then the contact rate of vectors in Population A and the contact rate of hosts in Population B must both be greater than zero. Migration of hosts/vectors from one population to another depends on a single rate defining the frequency of vector/host transport events from a given population to another. Therefore, Population A would have a specific migration rate dictating transport to Population B, and Population B would have a separate rate governing transport towards A.

Recovery of an infected host or vector results in all pathogens being removed from the host/vector. Additionally, the host/vector may optionally gain protection from pathogens that contain specific genome sequences present in the genomes of the pathogens it recovered from, representing immune memory. The user may specify a population parameter delimiting the contiguous loci in the genome that are saved on the recovered host/vector as "protection sequences". Pathogens containing any of the host/vector's protection sequences will not be able to infect the host/vector.

Births result in a new host/vector that may optionally inherit its parent's protection sequences. Additionally, a parent may optionally infect its offspring at birth following a Poisson sampling process equivalent to the one described for other contact events above. Deaths of existing hosts/vectors can occur both naturally or due to infection mortality. Only deaths due to infection are tracked and recorded in the model's history.

De novo mutation of a pathogen in a given host/vector results in a single locus within a pathogen's genome being randomly assigned a new allele from the possible alleles at that position. Recombination of two pathogens in a given host/vector creates two new genomes based on the independent segregation of chromosomes (or reassortment of genome segments, depending on the field) from the two parent genomes. In addition, there may be a Poisson-distributed random number of crossover events between homologous parent chromosomes. Recombination by crossover event will result in all the loci in the chromosome on one side of the crossover event location being inherited from one of the parents, while the remainder of the chromosome is inherited from the other parent. The locations of crossover events are distributed throughout the genome following a uniform random distribution.

Interventions

Furthermore, the user may specify changes in model behavior at specific timepoints during the simulation. These changes are known as "interventions". Interventions can include any kind of manipulation to populations in the model, including:

  • adding new populations
  • changing a population's event parameters and adaptive landscape functions
  • linking and unlinking populations through migration or inter-population contact
  • adding and removing hosts and vectors to a population

Interventions can also include actions that involve specific hosts or vectors in a given population, such as:

  • adding pathogens with specific genomes to a host/vector
  • removing all protection sequences from some hosts/vectors in a population
  • applying a "treatment" in a population that cures some of its hosts/vectors of pathogens
  • applying a "vaccine" in a population that protects some of its hosts/vectors from pathogens

For these kinds of interventions involving specific pathogens in a population, the user may choose to apply them to a randomly-sampled fraction of hosts/vectors in a population, or to a specific group of individuals. This is useful when simulating consecutive interventions on the same specific group within a population. A single model may contain multiple groups of individuals and the same individual may be a member of multiple different groups. Individuals remain in the same group even if they migrate away from the population they were chosen in.

When a host/vector is given a "treatment", it removes all pathogens within the host/vector that do not contain a collection of "resistance sequences". A treatment may have multiple resistance sequences. A pathogen must contain all of these within its genome in order to avoid being removed. On the other hand, applying a vaccine consists of adding a specific protection sequence to hosts/vectors, which behaves as explained above for recovered hosts/vectors when they acquire immune protection, if the model allows it.

Simulation

Models are simulated using an implementation of the Gillespie algorithm (optionally with tau-leaping approximation) in which the rates of different kinds of events across different populations are computed with each population's parameters and current state, and are then stored in a matrix. In addition, each population has host and vector matrices containing coefficients that represent the contribution of each host and vector, respectively, to the rates in the master model rate matrix. Each coefficient is dependent on the genomes of the pathogens infecting its corresponding vector or host. Whenever an event occurs, the corresponding entries in the population matrix are updated, and the master rate matrix is recomputed based on this information.

Simulation illustration

Tau-leaping is achieved by setting a time step threshold under which the step size defaults to a larger fixed step size. The number of events of each type occurring in every population of the model within this time span is then calculated as a random value from a Poisson distribution with mean equal to the corresponding rate. All events for this time span are then executed. The choice of the time step threshold and the minimum step size determines the accuracy of the approximation. By default, the minimum step size is set to be the minimum of all intrahost growth threshold times for hosts and vectors across all model populations, or the minimum mean time before transmission. This ensures that the likelihood of multiple events occurring within the same host in the same tau-leap is low. The threshold time is lower by a factor of the number of event types across all populations to account for the fact that this tau-leaping approximation only becomes less intensive if the number of events in the leap is larger than the number of types of events. This threshold is excessively stringent but provides a good bound.

The model's state at any given time comprises all populations, their hosts and vectors, and the pathogen genomes infecting each of these. A copy of the model's state is saved at every time point, or at intermittent intervals throughout the course of the simulation. A random sample of hosts and/or vectors may be saved instead of the entire model as a means of reducing memory footprint.

Output

The output of a model can be saved in multiple ways. The model state at each saved timepoint may be output in a single, raw pandas DataFrame, and saved as a tabular file. Other data output types include counts of pathogen genomes or protection sequences for the model, as well as time of first emergence for each pathogen genome and genome distance matrices for every timepoint sampled. The user can also create different kinds of plots to visualize the results. These include:

  • plots of the number of hosts and/or vectors in different epidemiological compartments (naive, infected, recovered, and dead) across simulation time
  • plots of the number of individuals in a compartment for different populations
  • plots of the genomic composition of the pathogen population over time
  • phylogenies of pathogen genomes

Users can also use the data output formats to make their own custom plots.

Model Documentation

All usage is handled through the Opqua Model class. The Model class contains populations, setups, and interventions to be used in simulation. It also contains groups of hosts/vectors for manipulations and stores model history as snapshots for specific time points.

To use it, import the class as

from opqua.model import Model

You can find a detailed account of everything Model does in the Model attributes and Model class methods list sections.

Model class attributes

  • populations -- dictionary with keys=population IDs, values=Population objects
  • setups -- dictionary with keys=setup IDs, values=Setup objects
  • interventions -- contains model interventions in the order they will occur
  • groups -- dictionary with keys=group IDs, values=lists of hosts/vectors
  • history -- dictionary with keys=time values, values=Model objects that are snapshots of Model at that timepoint
  • global_trackers -- dictionary keeping track of some global indicators over all the course of the simulation
  • custom_condition_trackers -- dictionary with keys=ID of custom condition, values=functions that take a Model object as argument and return True or False; every time True is returned by a function in custom_condition_trackers, the simulation time will be stored under the corresponding ID inside global_trackers['custom_condition']
  • t_var -- variable that tracks time in simulations

The dictionary global_trackers contains the following keys:

  • num_events: dictionary with the number of each kind of event in the simulation
  • last_event_time: time point at which the last event in the simulation happened
  • genomes_seen: list of all unique genomes that have appeared in the simulation
  • custom_conditions: dictionary with keys=ID of custom condition, values=lists of times; every time True is returned by a function in custom_condition_trackers, the simulation time will be stored under the corresponding ID inside global_trackers['custom_condition']

The dictionary num_events inside of global_trackers contains the following keys:

  • MIGRATE_HOST
  • MIGRATE_VECTOR
  • POPULATION_CONTACT_HOST_HOST
  • POPULATION_CONTACT_HOST_VECTOR
  • POPULATION_CONTACT_VECTOR_HOST
  • CONTACT_HOST_HOST
  • CONTACT_HOST_VECTOR
  • CONTACT_VECTOR_HOST
  • RECOVER_HOST
  • RECOVER_VECTOR
  • MUTATE_HOST
  • MUTATE_VECTOR
  • RECOMBINE_HOST
  • RECOMBINE_VECTOR
  • KILL_HOST
  • KILL_VECTOR
  • DIE_HOST
  • DIE_VECTOR
  • BIRTH_HOST
  • BIRTH_VECTOR

KILL_HOST and KILL_VECTOR denote death due to infection, whereas DIE_HOST and DIE_VECTOR denote death by natural means.

Model class methods list

Model initialization and simulation

  • setRandomSeed -- set random seed for numpy random number generator
  • newSetup -- creates a new Setup, save it in setups dict under given name
  • saveSetup -- saves Setup parameters to given file location as a CSV file
  • loadSetup -- loads Setup parameters from CSV file at given location
  • newIntervention -- creates a new intervention executed during simulation
  • run -- simulates model for a specified length of time
  • runReplicates -- simulate replicates of a model, save only end results
  • runParamSweep -- simulate parameter sweep with a model, save only end results
  • copyState -- copies a slimmed-down representation of model state
  • deepCopy -- copies current model with inner references

Data Output and Plotting

  • saveToDataFrame -- saves status of model to data frame, writes to file
  • getCompositionData -- create dataframe with counts for pathogen genomes or resistance
  • getPathogens -- creates data frame with counts for all pathogen genomes
  • getProtections -- creates data frame with counts for all protection sequences
  • populationsPlot -- plots aggregated totals per population across time
  • compartmentPlot -- plots number of naive, infected, recovered, dead hosts/vectors vs time
  • compositionPlot -- plots counts for pathogen genomes or resistance vs. time
  • clustermap -- plots heatmap and dendrogram of all pathogens in given data
  • pathogenDistanceHistory -- calculates pairwise distances for pathogen genomes at different times
  • getGenomeTimes -- create DataFrame with times genomes first appeared during simulation
  • visualizeMutationNetwork -- creates interactive visualization of mutation network

Model interventions

Make and connect populations:

Manipulate hosts and vectors in population:

Modify population parameters:

  • setSetup -- assigns a given set of parameters to this population

Compute a fitness landscape:

Utility:

Preset fitness functions

  • peakLandscape -- evaluates genome numeric phenotype by decreasing with distance from optimal sequence
  • valleyLandscape -- evaluates genome numeric phenotype by increasing with distance from worst sequence

Detailed Model method list

setRandomSeed

setRandomSeed(seed)

Set random seed for numpy random number generator.

Arguments:

  • seed -- int for the random seed to be passed to numpy (int)

Model

Model()

Class constructor; create a new Model object.

newSetup

newSetup(name, preset=None, **kwargs)

Create a new Setup, save it in setups dict under given name.

Two preset setups exist: "vector-borne" and "host-host". You may select one of the preset setups with the preset keyword argument and then modify individual parameters with additional keyword arguments, without having to specify all of them. You may also not select a preset setup.

"host-host":

num_loci = 10
possible_alleles = 'ATCG'
allele_groups_host = #LIST:
allele_groups_vector = #LIST:
max_depth_host = 0
max_depth_vector = 0
peak_pathogen_population_host = 100
steady_pathogen_population_host = 100
peak_pathogen_population_vector = 100
steady_pathogen_population_vector = 100
population_threshold_host = 0
population_threshold_vector = 0
selection_threshold_host = 100
selection_threshold_vector = 100
generation_time_host = 1
max_generation_size_host = 10
generation_time_vector = 1
max_generation_size_vector = 10
max_generations_survival_host = 30
max_generations_survival_vector = 30
fitnessHost = (lambda g: 1)
contactHost = (lambda g: 1)
receiveContactHost = (lambda g: 1)
mortalityHost = (lambda g: 1)
natalityHost = (lambda g: 1)
recoveryHost = (lambda g: 1)
migrationHost = (lambda g: 1)
populationContactHost = (lambda g: 1)
receivePopulationContactHost = (lambda g: 1)
mutationHost = (lambda g: 1)
recombinationHost = (lambda g: 1)
fitnessVector = (lambda g: 1)
contactVector = (lambda g: 1)
receiveContactVector = (lambda g: 1)
mortalityVector = (lambda g: 1)
natalityVector = (lambda g: 1)
recoveryVector = (lambda g: 1)
migrationVector = (lambda g: 1)
populationContactVector = (lambda g: 1)
receivePopulationContactVector = (lambda g: 1)
mutationVector = (lambda g: 1)
recombinationVector = (lambda g: 1)
contact_rate_host_vector = 0
transmission_efficiency_host_vector = 0
transmission_efficiency_vector_host = 0
contact_rate_host_host = 2e-1
transmission_efficiency_host_host = 1
mean_inoculum_host = 1e1
mean_inoculum_vector = 0
variance_inoculum_host = 3
variance_inoculum_vector = 0
recovery_rate_host = 1e-1
recovery_rate_vector = 0
mortality_rate_host = 0
mortality_rate_vector = 0
recombine_in_host = 1e-4
recombine_in_vector = 0
num_crossover_host = 1
num_crossover_vector = 0
mutate_in_host = 1e-6
mutate_in_vector = 0
death_rate_host = 0
death_rate_vector = 0
birth_rate_host = 0
birth_rate_vector = 0
vertical_transmission_host = 0
vertical_transmission_vector = 0
inherit_protection_host = 0
inherit_protection_vector = 0
protection_upon_recovery_host = None
protection_upon_recovery_vector = None

"vector-borne":

num_loci = 10
possible_alleles = 'ATCG'
allele_groups_host = #LIST:
allele_groups_vector = #LIST:
max_depth_host = 0
max_depth_vector = 0
peak_pathogen_population_host = 100
steady_pathogen_population_host = 100
peak_pathogen_population_vector = 100
steady_pathogen_population_vector = 100
population_threshold_host = 0
population_threshold_vector = 0
selection_threshold_host = 100
selection_threshold_vector = 100
generation_time_host = 1
max_generation_size_host = 10
generation_time_vector = 1
max_generation_size_vector = 10
max_generations_survival_host = 30
max_generations_survival_vector = 30
fitnessHost = (lambda g: 1)
contactHost = (lambda g: 1)
receiveContactHost = (lambda g: 1)
mortalityHost = (lambda g: 1)
natalityHost = (lambda g: 1)
recoveryHost = (lambda g: 1)
migrationHost = (lambda g: 1)
populationContactHost = (lambda g: 1)
receivePopulationContactHost = (lambda g: 1)
mutationHost = (lambda g: 1)
recombinationHost = (lambda g: 1)
fitnessVector = (lambda g: 1)
contactVector = (lambda g: 1)
receiveContactVector = (lambda g: 1)
mortalityVector = (lambda g: 1)
natalityVector = (lambda g: 1)
recoveryVector = (lambda g: 1)
migrationVector = (lambda g: 1)
populationContactVector = (lambda g: 1)
receivePopulationContactVector = (lambda g: 1)
mutationVector = (lambda g: 1)
recombinationVector = (lambda g: 1)
contact_rate_host_vector = 2e-1
transmission_efficiency_host_vector = 1
transmission_efficiency_vector_host = 1
contact_rate_host_host = 0
transmission_efficiency_host_host = 0
mean_inoculum_host = 3
mean_inoculum_vector = 1
variance_inoculum_host = 3
variance_inoculum_vector = 1
recovery_rate_host = 1e-1
recovery_rate_vector = 1e-1
mortality_rate_host = 0
mortality_rate_vector = 0
recombine_in_host = 0
recombine_in_vector = 1e-4
num_crossover_host = 0
num_crossover_vector = 1
mutate_in_host = 1e-6
mutate_in_vector = 0
death_rate_host = 0
death_rate_vector = 0
birth_rate_host = 0
birth_rate_vector = 0
vertical_transmission_host = 0
vertical_transmission_vector = 0
inherit_protection_host = 0
inherit_protection_vector = 0
protection_upon_recovery_host = None
protection_upon_recovery_vector = None

Arguments:

  • name -- name of setup to be used as a key in model setups dictionary

Keyword arguments:

  • preset -- preset setup to be used: "vector-borne" or "host-host", if None, must define all other keyword arguments (default None; None or String)
  • **kwargs -- setup parameters and values, which may include the following:
  • num_loci -- length of each pathogen genome string (int > 0)
  • possible_alleles -- set of possible characters in all genome string, or at each position in genome string; to specify lists in parameter files, use prefix #LIST: (String or list of Strings with num_loci elements)
  • allele_groups -- relevant alleles affecting fitness, each element contains a list of strings, each string contains a group of alleles that all have equivalent fitness behavior; if single list provided, then those groups are used for all loci; to specify lists in parameter files, use prefix #LIST: (list of lists of Strings)
  • max_depth_host -- maximum number of mutations considered when evaluating establishment rates in hosts (integer >0)
  • max_depth_vector -- maximum number of mutations considered when evaluating establishment rates in vectors (integer >0)
  • peak_pathogen_population_host -- peak intrahost pathogen population (integer >0)
  • steady_pathogen_population_host -- semi steady-state intrahost pathogen population (integer >0)
  • peak_pathogen_population_vector -- peak intrahost pathogen population (integer >0)
  • steady_pathogen_population_vector -- semi steady-state intrahost pathogen population (integer >0)
  • population_threshold_host -- any intrahost variant that still drifts over this threshold is assumed to always be under drift; =1/selection_threshold (number >0)
  • population_threshold_vector -- any intravector variant that still drifts over this threshold is assumed to always be under drift; =1/selection_threshold (number >0)
  • selection_threshold_host -- any intrahost variant with a selection coefficient under this threshold is assumed to always be under drift; =1/selection_threshold (number >0)
  • selection_threshold_vector -- any intravector variant with a selection coefficient under this threshold is assumed to always be under drift; =1/selection_threshold (number >0)
  • generation_time_host -- pathogen replication cycle time in hosts (number >0)
  • generation_time_vector -- pathogen replication cycle time in vectors (number >0)
  • max_generation_size_host -- maximum growth within hosts, in units of pathogens per replication cycle (number>1)
  • max_generation_size_vector -- maximum growth within vectors, in units of pathogens per replication cycle (number>1)
  • max_generations_survival_host -- number of generations used to compute rates and probabilities of genotype emergence in hosts (integer)
  • max_generations_survival_vector -- number of generations used to compute rates and probabilities of genotype emergence in vectors (integer)
  • fitnessHost -- function that evaluates relative fitness in head-to-head competition for different genomes within the same host (function object, takes a String argument and returns a number >= 0)
  • contactHost -- function that returns coefficient modifying probability of a given host being chosen to be the infector in a contact event, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • receiveContactHost -- function that returns coefficient modifying probability of a given host being chosen to be the infected in a contact event, based on genome sequence of pathogen
  • mortalityHost -- function that returns coefficient modifying death rate for a given host, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • natalityHost -- function that returns coefficient modifying birth rate for a given host, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • recoveryHost -- function that returns coefficient modifying recovery rate for a given host based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • migrationHost -- function that returns coefficient modifying migration rate for a given host based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • populationContactHost -- function that returns coefficient modifying population contact rate for a given host based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • receivePopulationContactHost -- function that returns coefficient modifying probability of a given host being chosen to be the infected in a population contact event, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • mutationHost -- function that returns coefficient modifying mutation rate for a given host based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • recombinationHost -- function that returns coefficient modifying recombination rate for a given host based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • fitnessVector -- function that evaluates relative fitness in head-to- head competition for different genomes within the same vector (function object, takes a String argument and returns a number >= 0)
  • contactVector -- function that returns coefficient modifying probability of a given vector being chosen to be the infector in a contact event, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • receiveContactVector -- function that returns coefficient modifying probability of a given vector being chosen to be the infected in a contact event, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • mortalityVector -- function that returns coefficient modifying death rate for a given vector, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • natalityVector -- function that returns coefficient modifying birth rate for a given vector, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • recoveryVector -- function that returns coefficient modifying recovery rate for a given vector based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • migrationVector -- function that returns coefficient modifying migration rate for a given vector based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • populationContactVector -- function that returns coefficient modifying population contact rate for a given vector based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • receivePopulationContactVector -- function that returns coefficient modifying probability of a given vector being chosen to be the infected in a population contact event, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • mutationVector -- function that returns coefficient modifying mutation rate for a given vector based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • recombinationVector -- function that returns coefficient modifying recombination rate for a given vector based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
  • contact_rate_host_vector -- ("biting") rate of host-vector contact events, not necessarily transmission, assumes constant population density; events/(vector*time) (number >= 0)
  • transmission_efficiency_host_vector -- fraction of host-vector contacts that result in successful transmission
  • transmission_efficiency_vector_host -- fraction of vector-host contacts that result in successful transmission
  • contact_rate_host_host -- rate of host-host contact events, not necessarily transmission, assumes constant population density; events/time (number >= 0)
  • transmission_efficiency_host_host -- fraction of host-host contacts that result in successful transmission
  • mean_inoculum_host -- mean number of pathogens that are transmitted from a vector or host into a new host during a contact event (int >= 0)
  • mean_inoculum_vector -- mean number of pathogens that are transmitted from a host to a vector during a contact event (int >= 0)
  • variance_inoculum_host -- variance in number of pathogens that are transmitted from a host/vector to a host during a contact event (num >=0)
  • variance_inoculum_vector -- variance in number of pathogens that are transmitted from a host to a vector during a contact (num >=0)
  • recovery_rate_host -- rate at which hosts clear all pathogens; 1/time (number >= 0)
  • recovery_rate_vector -- rate at which vectors clear all pathogens; 1/time (number >= 0)
  • recovery_rate_vector -- rate at which vectors clear all pathogens; 1/time (number >= 0)
  • mortality_rate_host -- rate at which infected hosts die from disease; 1/time (number >= 0)
  • mortality_rate_vector -- rate at which infected vectors die from disease; 1/time (number >= 0)
  • recombine_in_host -- rate at which recombination occurs in host; events/time (number >= 0)
  • recombine_in_vector -- rate at which recombination occurs in vector; events/time (number >= 0)
  • num_crossover_host -- mean of a Poisson distribution modeling the number of crossover events of host recombination events (number >= 0)
  • num_crossover_vector -- mean of a Poisson distribution modeling the number of crossover events of vector recombination events (number >= 0)
  • mutate_in_host -- rate at which mutation occurs in host; events/generation (number >= 0)
  • mutate_in_vector -- rate at which mutation occurs in vector; events/generation (number >= 0)
  • death_rate_host -- natural host death rate; 1/time (number >= 0)
  • death_rate_vector -- natural vector death rate; 1/time (number >= 0)
  • birth_rate_host -- infected host birth rate; 1/time (number >= 0)
  • birth_rate_vector -- infected vector birth rate; 1/time (number >= 0)
  • vertical_transmission_host -- probability that a host is infected by its parent at birth (number 0-1)
  • vertical_transmission_vector -- probability that a vector is infected by its parent at birth (number 0-1)
  • inherit_protection_host -- probability that a host inherits all protection sequences from its parent (number 0-1)
  • inherit_protection_vector -- probability that a vector inherits all protection sequences from its parent (number 0-1)
  • protection_upon_recovery_host -- defines indexes in genome string that define substring to be added to host protection sequences after recovery (None or array-like of length 2 with int 0-num_loci)
  • protection_upon_recovery_vector -- defines indexes in genome string that define substring to be added to vector protection sequences after recovery (None or array-like of length 2 with int 0-num_loci)

saveSetup

saveSetup(setup_id, save_to_file)

Saves Setup parameters to given file location as a CSV file. Functions (e.g. fitness functions) cannot be saved in this format.

Arguments:

  • setup_id -- name of setup used as a key in setups dictionary
  • save_to_file -- file path and name to save parameters under (String)

loadSetup

loadSetup(setup_id, file, preset=None)

Loads Setup parameters from CSV file at given location.

Arguments:

  • setup_id -- name of setup to be used as a key in setups dictionary
  • file -- file path to CSV file with parameters (String)

Keyword arguments:

  • preset -- if using preset parameters, 'host-host' or 'vector-borne' (String, default None)
  • **kwargs -- setup parameters and values

newIntervention

newIntervention(time, method_name, args)

Create a new intervention to be carried out at a specific time.

Arguments:

  • time -- time at which intervention will take place (number)
  • method_name -- intervention to be carried out, must correspond to the name of a method of the Model object (String)
  • args -- contains arguments for function in positinal order (array-like)

addCustomConditionTracker

addCustomConditionTracker(condition_id, trackerFunction)

Add a function to track occurrences of custom events in simulation.

Adds function trackerFunction to dictionary custom_condition_trackers under key condition_id. Function trackerFunction will be executed at every event in the simulation. Every time True is returned, the simulation time will be stored under the corresponding condition_id key inside global_trackers['custom_condition']

Arguments:

  • condition_id -- ID of this specific condition (String)
  • trackerFunction -- function that take a Model object as argument and returns True or False; (Function)

run

run(
t0,tf,method='approximated',time_sampling=0,host_sampling=0,vector_sampling=0,
skip_uninfected=False)

Simulate model for a specified time between two time points.

Simulates a time series using the Gillespie algorithm.

Saves a dictionary containing model state history, with keys=times and values=Model objects with model snapshot at that time point under this model's history attribute.

Arguments:

  • t0 -- initial time point to start simulation at (number)
  • tf -- initial time point to end simulation at (number)

Keyword arguments:

  • method -- algorithm to be used; default is approximated solver ('approximated' or 'exact'; default 'approximated')
  • dt_leap -- time leap size used to simulate bursts; if None, set to minimum growth threshold time across all populations (number, default None)
  • dt_thre -- time threshold below which bursts are used; if None, set to dt_leap (number, default None)
  • time_sampling -- how many events to skip before saving a snapshot of the system state (saves all by default), if <0, saves only final state (int, default 0)
  • host_sampling -- how many hosts to skip before saving one in a snapshot of the system state (saves all by default) (int, default 0)
  • vector_sampling -- how many vectors to skip before saving one in a
    snapshot of the system state (saves all by default) (int, default 0)
  • skip_uninfected -- whether to save only infected hosts/vectors and record the number of uninfected host/vectors instead (Boolean, default False)

runReplicates

runReplicates(
t0,tf,replicates,method='approximated',host_sampling=0,vector_sampling=0,
skip_uninfected=False,**kwargs)

Simulate replicates of a model, save only end results.

Simulates replicates of a time series using a variation of the exact Gillespie algorithm (can also use the tau-leaping method).

Saves a dictionary containing model end state state, with keys=times and values=Model objects with model snapshot. The time is the final timepoint.

Arguments:

  • t0 -- initial time point to start simulation at (number >= 0)
  • tf -- initial time point to end simulation at (number >= 0)
  • replicates -- how many replicates to simulate (int >= 1)

Keyword arguments:

  • method -- algorithm to be used; default is approximated solver ('approximated' or 'exact'; default 'approximated')
  • dt_leap -- time leap size used to simulate bursts; if None, set to minimum growth threshold time across all populations (number, default None)
  • dt_thre -- time threshold below which bursts are used; if None, set to dt_leap (number, default None)
  • host_sampling -- how many hosts to skip before saving one in a snapshot of the system state (saves all by default) (int >= 0, default 0)
  • vector_sampling -- how many vectors to skip before saving one in a snapshot of the system state (saves all by default) (int >= 0, default 0)
  • skip_uninfected -- whether to save only infected hosts/vectors and record the number of uninfected host/vectors instead (Boolean, default False)
  • **kwargs -- additional arguents for joblib multiprocessing

Returns: List of Model objects with the final snapshots

runParamSweep

runParamSweep(
t0,tf,setup_id,
param_sweep_dic={},pop_ids_param_sweep=[],
host_population_size_sweep={}, vector_population_size_sweep={},
host_migration_sweep_dic={}, vector_migration_sweep_dic={},
host_host_population_contact_sweep_dic={},
host_vector_population_contact_sweep_dic={},
vector_host_population_contact_sweep_dic={},
replicates=1,method='approximated',host_sampling=0,vector_sampling=0,
skip_uninfected=False,n_cores=0,
**kwargs)

Simulate a parameter sweep with a model, save only end results.

Simulates replicates of a time series using a variation of the exact Gillespie algorithm (can also use the tau-leaping method).

Saves a dictionary containing model end state state, with keys=times and values=Model objects with model snapshot. The time is the final timepoint.

Arguments:

  • t0 -- initial time point to start simulation at (number >= 0)
  • tf -- initial time point to end simulation at (number >= 0)
  • setup_id -- ID of setup to be assigned (String)

Keyword Arguments:

  • param_sweep_dic -- dictionary with keys=parameter names (attributes of Setup), values=list of values for parameter (list, class of elements depends on parameter)
  • host_population_size_sweep -- dictionary with keys=population IDs (Strings), values=list of values with host population sizes (must be greater than original size set for each population, list of numbers)
  • vector_population_size_sweep -- dictionary with keys=population IDs (Strings), values=list of values with vector population sizes (must be greater than original size set for each population, list of numbers)
  • host_migration_sweep_dic -- dictionary with keys=population IDs of origin and destination, separated by a colon ';' (Strings), values=list of values (list of numbers)
  • vector_migration_sweep_dic -- dictionary with keys=population IDs of origin and destination, separated by a colon ';' (Strings), values=list of values (list of numbers)
  • host_host_population_contact_sweep_dic -- dictionary with keys=population IDs of origin and destination, separated by a colon ';' (Strings), values=list of values (list of numbers)
  • host_vector_population_contact_sweep_dic -- dictionary with keys=population IDs of origin and destination, separated by a colon ';' (Strings), values=list of values (list of numbers)
  • vector_host_population_contact_sweep_dic -- dictionary with keys=population IDs of origin and destination, separated by a colon ';' (Strings), values=list of values (list of numbers)
  • replicates -- how many replicates to simulate (int >= 1)
  • method -- algorithm to be used; default is approximated solver (can be either 'approximated' or 'exact')
  • dt_leap -- time leap size used to simulate bursts; if None, set to minimum growth threshold time across all populations (number, default None)
  • dt_thre -- time threshold below which bursts are used; if None, set to dt_leap (number, default None)
  • host_sampling -- how many hosts to skip before saving one in a snapshot of the system state (saves all by default) (int >= 0, default 0)
  • vector_sampling -- how many vectors to skip before saving one in a snapshot of the system state (saves all by default) (int >= 0, default 0)
  • skip_uninfected -- whether to save only infected hosts/vectors and record the number of uninfected host/vectors instead (Boolean, default False)
  • n_cores -- number of cores to parallelize file export across, if 0, all cores available are used (default 0; int >= 0)
  • **kwargs -- additional arguents for joblib multiprocessing

Returns:

  • DataFrame with parameter combinations, list of Model objects with the final snapshots

copyState

copyState(host_sampling=0,vector_sampling=0,skip_uninfected=False)

Returns a slimmed-down representation of the current model state.

Keyword arguments:

  • host_sampling -- how many hosts to skip before saving one in a snapshot of the system state (saves all by default) (int >= 0, default 0)
  • vector_sampling -- how many vectors to skip before saving one in a snapshot of the system state (saves all by default) (int >= 0, default 0)
  • skip_uninfected -- whether to save only infected hosts/vectors and record the number of uninfected host/vectors instead (Boolean, default False)

Returns: Model object with current population host and vector lists.

deepCopy

deepCopy()

Returns a full copy of the current model with inner references.

Returns: copied Model object

saveToDataFrame

saveToDataFrame(save_to_file,n_cores=0,**kwargs)

Save status of model to dataframe, write to file location given.

Creates a pandas Dataframe in long format with the given model history, with one host or vector per simulation time in each row, and columns:

  • Time - simulation time of entry
  • Population - ID of this host/vector's population
  • Organism - host/vector
  • ID - ID of host/vector
  • Pathogens - all genomes present in this host/vector separated by ;
  • Protection - all genomes present in this host/vector separated by ;
  • Alive - whether host/vector is alive at this time, True/False

Arguments:

  • save_to_file -- file path and name to save model data under (String)

Keyword Arguments:

  • n_cores -- number of cores to parallelize file export across, if 0, all cores available are used (default 0; int)
  • **kwargs -- additional arguents for joblib multiprocessing

Returns:

  • pandas dataframe with model history as described above

getCompositionData

getCompositionData(
    data=None, populations=[], type_of_composition='Pathogens',
    hosts=True, vectors=False, num_top_sequences=-1,
    track_specific_sequences=[], genomic_positions=[],
    count_individuals_based_on_model=None, save_data_to_file="", n_cores=0,
    **kwargs)

Create dataframe with counts for pathogen genomes or resistance.

Creates a pandas Dataframe with dynamics of the pathogen strains or protection sequences across selected populations in the model, with one time point in each row and columns for pathogen genomes or protection sequences.

Of note: sum of totals for all sequences in one time point does not necessarily equal the number of infected hosts and/or vectors, given multiple infections in the same host/vector are counted separately.

Keyword Arguments:

  • data -- dataframe with model history as produced by saveToDf function; if None, computes this dataframe and saves it under 'raw_data_'+save_data_to_file (DataFrame, default None)
  • populations -- IDs of populations to include in analysis; if empty, uses all populations in model (default empty list; list of Strings)
  • type_of_composition -- field of data to count totals of, can be either 'Pathogens' or 'Protection' (default 'Pathogens'; String)
  • hosts -- whether to count hosts (default True, Boolean)
  • vectors -- whether to count vectors (default False, Boolean)
  • num_top_sequences -- how many sequences to count separately and include as columns, remainder will be counted under column "Other"; if <0, includes all genomes in model (default -1; int)
  • track_specific_sequences -- contains specific sequences to have as a separate column if not part of the top num_top_sequences sequences (default empty list; list of Strings)
  • genomic_positions -- list in which each element is a list with loci positions to extract (e.g. genomic_positions=[ [0,3], [5,6] ] extracts positions 0, 1, 2, and 5 from each genome); if empty, takes full genomes(default empty list; list of lists of int)
  • count_individuals_based_on_model -- Model object with populations and fitness functions used to evaluate the most fit pathogen genome in each host/vector in order to count only a single pathogen per host/vector, asopposed to all pathogens within each host/vector; if None, counts all pathogens (default None; None or Model)
  • save_data_to_file -- file path and name to save model data under, no saving occurs if empty string (default ''; String)
  • n_cores -- number of cores to parallelize processing across, if 0, all cores available are used (default 0; int)
  • **kwargs -- additional arguents for joblib multiprocessing

Returns:

  • pandas dataframe with model sequence composition dynamics as described above

getPathogens

getPathogens(dat, save_to_file="")

Create Dataframe with counts for all pathogen genomes in data.

Returns sorted pandas Dataframe with counts for occurrences of all pathogen genomes in data passed.

Arguments:

  • data -- dataframe with model history as produced by saveToDf function

Keyword Arguments:

  • save_to_file -- file path and name to save model data under, no saving occurs if empty string (default ''; String)

Returns:

  • pandas dataframe with Series as described above

getProtections

getProtections(dat, save_to_file="")

Create Dataframe with counts for all protection sequences in data.

Returns sorted pandas Dataframe with counts for occurrences of all protection sequences in data passed.

Arguments:

  • data -- dataframe with model history as produced by saveToDf function

Keyword Arguments:

  • save_to_file -- file path and name to save model data under, no saving
  • occurs if empty string (default ''; String)

Returns:

  • pandas dataframe with Series as described above

populationsPlot

populationsPlot(
file_name, data, compartment='Infected',
hosts=True, vectors=False, num_top_populations=7,
track_specific_populations=[], save_data_to_file="",
x_label='Time', y_label='Hosts', figsize=(8, 4), dpi=200,
palette=CB_PALETTE, stacked=False)

Create plot with aggregated totals per population across time.

Creates a line or stacked line plot with dynamics of a compartment across populations in the model, with one line for each population.

A host or vector is considered part of the recovered compartment if it has protection sequences of any kind and is not infected.

Arguments:

  • file_name -- file path, name, and extension to save plot under (String)
  • data -- dataframe with model history as produced by saveToDf function (DataFrame)

Keyword Arguments:

  • compartment -- subset of hosts/vectors to count totals of, can be either 'Naive','Infected','Recovered', or 'Dead' (default 'Infected'; String)
  • hosts -- whether to count hosts (default True, Boolean)
  • vectors -- whether to count vectors (default False, Boolean)
  • num_top_populations -- how many populations to count separately and include as columns, remainder will be counted under column "Other"; if <0, includes all populations in model (default 7; int)
  • track_specific_populations -- contains IDs of specific populations to have as a separate column if not part of the top num_top_populations populations (list of Strings)
  • save_data_to_file -- file path and name to save model plot data under, no saving occurs if empty string (default ''; String)
  • x_label -- X axis title (default 'Time', String)
  • y_label -- Y axis title (default 'Hosts', String)
  • legend_title -- legend title (default 'Population', String)
  • legend_values -- labels for each trace, if empty list, uses population IDs (default empty list, list of Strings)
  • figsize -- dimensions of figure (default (8,4), array-like of two ints)
  • dpi -- figure resolution (default 200, int)
  • palette -- color palette to use for traces (default CB_PALETTE, list of color Strings)
  • stacked -- whether to draw a regular line plot or a stacked one (default False, Boolean)

Returns:

  • axis object for plot with model population dynamics as described above

compartmentPlot

compartmentPlot(
file_name, data, populations=[], hosts=True, vectors=False,
save_data_to_file="", x_label='Time', y_label='Hosts',
figsize=(8, 4), dpi=200, palette=CB_PALETTE, stacked=False)

Create plot with number of naive, infected, recovered, dead hosts/vectors vs. time.

Creates a line or stacked line plot with dynamics of all compartments (naive, infected, recovered, dead) across selected populations in the model, with one line for each compartment.

A host or vector is considered recovered if it has protection sequences of any kind and is not infected.

Arguments:

  • file_name -- file path, name, and extension to save plot under (String)
  • data -- dataframe with model history as produced by saveToDf function (DataFrame)

Keyword Arguments:

  • populations -- IDs of populations to include in analysis; if empty, uses
  • all populations in model (default empty list; list of Strings)
  • hosts -- whether to count hosts (default True, Boolean)
  • vectors -- whether to count vectors (default False, Boolean)
  • save_data_to_file -- file path and name to save model data under, no saving
  • occurs if empty string (default ''; String)
  • x_label -- X axis title (default 'Time', String)
  • y_label -- Y axis title (default 'Hosts', String)
  • legend_title -- legend title (default 'Population', String)
  • legend_values -- labels for each trace, if empty list, uses population IDs (default empty list, list of Strings)
  • figsize -- dimensions of figure (default (8,4), array-like of two ints)
  • dpi -- figure resolution (default 200, int)
  • palette -- color palette to use for traces (default CB_PALETTE, list of color Strings)
  • stacked -- whether to draw a regular line plot or a stacked one (default False, Boolean)

Returns:

  • axis object for plot with model compartment dynamics as described above

compositionPlot

compositionPlot(
file_name, data, composition_dataframe=None, populations=[],
type_of_composition='Pathogens', hosts=True, vectors=False,
num_top_sequences=7, track_specific_sequences=[],
genomic_positions=[], count_individuals_based_on_model=None,
remove_legend=False, population_fraction=False,
save_data_to_file="", x_label='Time', y_label='Infections',
legend_title='Genotype', legend_values=[],
figsize=(8, 4), dpi=200, palette=CB_PALETTE, stacked=True,
**kwargs)

Create plot with counts for pathogen genomes or resistance vs. time.

Creates a line or stacked line plot with dynamics of the pathogen strains or protection sequences across selected populations in the model, with one line for each pathogen genome or protection sequence being shown.s

Of note: sum of totals for all sequences in one time point does not necessarily equal the number of infected hosts and/or vectors, given multiple infections in the same host/vector are counted separately.

Arguments:

  • file_name -- file path, name, and extension to save plot under (String)
  • data -- dataframe with model history as produced by saveToDf function

Keyword Arguments:

  • composition_dataframe -- output of compositionDf() if already computed (Pandas DataFrame, None by default)
  • populations -- IDs of populations to include in analysis; if empty, uses all populations in model (default empty list; list of Strings)
  • type_of_composition -- field of data to count totals of, can be either 'Pathogens' or 'Protection' (default 'Pathogens'; String)
  • hosts -- whether to count hosts (default True, Boolean)
  • vectors -- whether to count vectors (default False, Boolean)
  • num_top_sequences -- how many sequences to count separately and include as columns, remainder will be counted under column "Other"; if <0, includes all genomes in model (default 7; int)
  • track_specific_sequences -- contains specific sequences to have as a separate column if not part of the top num_top_sequences sequences (list of Strings)
  • genomic_positions -- list in which each element is a list with loci positions to extract (e.g. genomic_positions=[ [0,3], [5,6] ] extracts positions 0, 1, 2, and 5 from each genome); if empty, takes full genomes (default empty list; list of lists of int)
  • count_individuals_based_on_model -- Model object with populations and fitness functions used to evaluate the most fit pathogen genome in each host/vector in order to count only a single pathogen per host/vector, as opposed to all pathogens within each host/vector; if None, counts all pathogens (default None; None or Model)
  • save_data_to_file -- file path and name to save model data under, no saving occurs if empty string (default ''; String)
  • x_label -- X axis title (default 'Time', String)
  • y_label -- Y axis title (default 'Hosts', String)
  • legend_title -- legend title (default 'Population', String)
  • legend_values -- labels for each trace, if empty list, uses population IDs (default empty list, list of Strings)
  • figsize -- dimensions of figure (default (8,4), array-like of two ints)
  • dpi -- figure resolution (default 200, int)
  • palette -- color palette to use for traces (default CB_PALETTE, list of color Strings)
  • stacked -- whether to draw a regular line plot instead of a stacked one (default False, Boolean).
  • remove_legend -- whether to print the sequences on the figure legend instead of printing them on a separate csv file (default True; Boolean)
  • population_fraction -- whether to graph fractions of pathogen population instead of pathogen counts (default False, Boolean)
  • **kwargs -- additional arguents for joblib multiprocessing

Returns:

  • axis object for plot with model sequence composition dynamics as described

clustermap

clustermap(file_name, data, num_top_sequences=-1,
track_specific_sequences=[], seq_names=[], n_cores=0, method='weighted',
metric='euclidean',save_data_to_file="", legend_title='Distance',
legend_values=[], figsize=(10,10), dpi=200, color_map=DEF_CMAP)

Create a heatmap and dendrogram for pathogen genomes in data passed.

Arguments:

  • file_name -- file path, name, and extension to save plot under (String)
  • data -- dataframe with model history as produced by saveToDf function

Keyword arguments:

  • num_top_sequences -- how many sequences to include in matrix; if <0, includes all genomes in data passed (default -1; int)
  • track_specific_sequences -- contains specific sequences to include in matrixif not part of the top num_top_sequences sequences (default empty list; list of Strings)
  • seq_names -- list with names to be used for sequence labels in matrix must be of same length as number of sequences to be displayed; if empty, uses sequences themselves (default empty list; list of Strings)
  • n_cores -- number of cores to parallelize distance compute across, if 0, all cores available are used (default 0; int)
  • method -- clustering algorithm to use with seaborn clustermap (default 'weighted'; String)
  • metric -- distance metric to use with seaborn clustermap (default 'euclidean'; String)
  • save_data_to_file -- file path and name to save model data under, no saving occurs if empty string (default ''; String)
  • legend_title -- legend title (default 'Distance', String)
  • figsize -- dimensions of figure (default (8,4), array-like of two ints)
  • dpi -- figure resolution (default 200, int)
  • color_map -- color map to use for traces (default DEF_CMAP, cmap object)

Returns:

  • figure object for plot with heatmap and dendrogram as described

pathogenDistanceHistory

pathogenDistanceHistory(data, samples=-1, num_top_sequences=-1,
track_specific_sequences=[], seq_names=[], n_cores=0, save_to_file='')

Create a long-format dataframe with pairwise distances for pathogen genomes in data passed for different time points.

Arguments: data -- dataframe with model history as produced by saveToDf function

Keyword Arguments:

  • samples -- how many timepoints to uniformly sample from the total timecourse; if <0, takes all timepoints (default -1; int)
  • num_top_sequences -- how many sequences to include in matrix; if <0, includes all genomes in data passed (default -1; int)
  • track_specific_sequences -- contains specific sequences to include in matrixif not part of the top num_top_sequences sequences (default empty list; list of Strings)
  • seq_names -- list with names to be used for sequence labels in matrix must be of same length as number of sequences to be displayed; if empty, uses sequences themselves (default empty list; list of Strings)
  • n_cores -- number of cores to parallelize distance compute across, if 0, all cores available are used (default 0; int)
  • method -- clustering algorithm to use with seaborn clustermap (default 'weighted'; String)
  • metric -- distance metric to use with seaborn clustermap (default 'euclidean'; String)
  • save_data_to_file -- file path and name to save model data under, no saving occurs if empty string (default ''; String)

Returns:

  • long-format Pandas dataframe with pairwise distances for pathogen genomes in data passed for different time points.

getGenomeTimes

getGenomeTimes(
      data, samples=-1, num_top_sequences=-1, track_specific_sequences=[],
      seq_names=[], n_cores=0, save_to_file='')

Create DataFrame with times genomes first appeared during simulation.

Arguments:

  • data -- dataframe with model history as produced by saveToDf function

Keyword arguments:

  • samples -- how many timepoints to uniformly sample from the total timecourse; if <0, takes all timepoints (default 1; int)
  • save_to_file -- file path and name to save model data under, no saving occurs if empty string (default ''; String)
  • n_cores -- number of cores to parallelize across, if 0, all cores available are used (default 0; int)

Returns:

  • pandas dataframe with genomes and times as described above

visualizeMutationNetwork

visualizeMutationNetwork(setup_id,landscape_id,file_name,
      toggle_physics=True,
      node_color='rgba(215,140,10,1)',
      peak_border_color='rgba(150,100,10,1)',
      edge_color='rgba(215,190,150,1)', show_labels=True)

Create a network visualization for pathogen genomes in landscape

Arguments:

  • setup_id -- ID of setup with associated parameters (String)
  • landscape_id -- ID of landscape (String)
  • file_name -- file path and name to save html graph under (String)
  • toggle_physics -- whether graph moves (Boolean)
  • node_color -- node color (String)
  • peak_border_color -- color of borders on peak nodes (String)
  • edge_color -- edge color (String)
  • show_labels -- whether to show genomes on nodes (Boolean)

newPopulation

newPopulation(id, setup_id, num_hosts=100, num_vectors=100)

Create a new Population object with setup parameters.

If population ID is already in use, appends _2 to it

Arguments:

  • id -- unique identifier for this population in the model (String)
  • setup_id -- setup object with parameters for this population (Setup)

Keyword Arguments:

  • num_hosts -- number of hosts to initialize population with (default 100; int)
  • num_vectors -- number of hosts to initialize population with (default 100; int)

linkPopulationsHostMigration

linkPopulationsHostMigration(pop1_id, pop2_id, rate)

Set host migration rate from one population towards another.

Arguments:

  • pop1_id -- origin population for which migration rate will be specified (String)
  • pop1_id -- destination population for which migration rate will be specified (String)
  • rate -- migration rate from one population to the neighbor; events/time (number >= 0)

linkPopulationsVectorMigration

linkPopulationsVectorMigration(pop1_id, pop2_id, rate)

Set vector migration rate from one population towards another.

Arguments:

  • pop1_id -- origin population for which migration rate will be specified (String)
  • pop1_id -- destination population for which migration rate will be specified (String)
  • rate -- migration rate from one population to the neighbor; events/time (number >= 0)

linkPopulationsHostHostContact

linkPopulationsHostHostContact(pop1_id, pop2_id, rate)

Set host-host inter-population contact rate from one population towards another.

Arguments:

  • pop1_id -- origin population for which migration rate will be specified (String)
  • pop1_id -- destination population for which migration rate will be specified (String)
  • rate -- migration rate from one population to the neighbor; events/time (number >= 0)

linkPopulationsHostVectorContact

linkPopulationsHostVectorContact(pop1_id, pop2_id, rate)

Set host-vector inter-population contact rate from one population to another.

Arguments:

  • pop1_id -- origin population for which migration rate will be specified (String)
  • pop1_id -- destination population for which migration rate will be specified (String)
  • rate -- migration rate from one population to the neighbor; events/time (number >= 0)

linkPopulationsVectorHostContact

linkPopulationsVectorHostContact(pop1_id, pop2_id, rate)

Set vector-host inter-population contact rate from one population to another.

Arguments:

  • pop1_id -- origin population for which migration rate will be specified (String)
  • pop1_id -- destination population for which migration rate will be specified (String)
  • rate -- migration rate from one population to the neighbor; events/time (number >= 0)

createInterconnectedPopulations

createInterconnectedPopulations(
      num_populations, id_prefix, setup_id,
      host_migration_rate=0, vector_migration_rate=0,
      host_host_contact_rate=0,
      host_vector_contact_rate=0, vector_host_contact_rate=0,
      num_hosts=100, num_vectors=100)

Create new populations, link all of them to each other.

All populations in this cluster are linked with the same migration rate, starting number of hosts and vectors, and setup parameters. Their IDs are numbered onto prefix given as 'id_prefix_0', 'id_prefix_1', 'id_prefix_2', etc.

Arguments:

  • num_populations -- number of populations to be created (int)
  • id_prefix -- prefix for IDs to be used for this population in the model, (String)
  • setup_id -- setup object with parameters for all populations (Setup)

Keyword arguments:

  • host_migration_rate -- host migration rate between populations; events/time (default 0; number >= 0)
  • vector_migration_rate -- vector migration rate between populations; events/time (default 0; number >= 0)
  • host_host_contact_rate -- host-host inter-population contact rate between populations; events/time (default 0; number >= 0)
  • host_vector_contact_rate -- host-vector inter-population contact rate between populations; events/time (default 0; number >= 0)
  • vector_host_contact_rate -- vector-host inter-population contact rate between populations; events/time (default 0; number >= 0)
  • num_hosts -- number of hosts to initialize population with (default 100; int)
  • num_vectors -- number of hosts to initialize population with (default 100; int)

newHostGroup

newHostGroup(pop_id, group_id, num_hosts, healthy=False)

Return a list of random (healthy or any) hosts in population.

Arguments:

  • pop_id -- ID of population to be modified (String)
  • group_id -- ID to call this group by (String)
  • num_vectors -- number of vectors to be sampled randomly (int)

Keyword Arguments:

  • healthy -- whether to sample healthy hosts only (default True; Boolean)

Returns:

  • list containing sampled hosts

newVectorGroup

newVectorGroup(pop_id, group_id, num_vectors, healthy=False)

Return a list of random (healthy or any) vectors in population.

Arguments:

  • pop_id -- ID of population to be modified (String)
  • group_id -- ID to call this group by (String)
  • num_vectors -- number of vectors to be sampled randomly (int)

Keyword Arguments:

  • healthy -- whether to sample healthy vectors only (default True; Boolean)

Returns:

  • list containing sampled vectors

addHosts

addHosts(pop_id, num_hosts)

Add a number of healthy hosts to population, return list with them.

Arguments:

  • pop_id -- ID of population to be modified (String)
  • num_hosts -- number of hosts to be added (int)

Returns:

  • list containing new hosts

addVectors

addVectors(pop_id, num_vectors)

Add a number of healthy vectors to population, return list with them.

Arguments:

  • pop_id -- ID of population to be modified (String)
  • num_vectors -- number of vectors to be added (int)

Returns:

  • list containing new vectors

removeHosts

removeHosts(pop_id, num_hosts_or_list)

Remove a number of specified or random hosts from population.

Arguments:

  • pop_id -- ID of population to be modified (String)
  • num_hosts_or_list -- number of hosts to be sampled randomly for removal or list of hosts to be removed, must be hosts in this population (int or list of Hosts)

removeVectors

removeVectors(pop_id, num_vectors_or_list)

Remove a number of specified or random vectors from population.

Arguments:

  • pop_id -- ID of population to be modified (String)
  • num_vectors_or_list -- number of vectors to be sampled randomly for removal or list of vectors to be removed, must be vectors in this population (int or list of Vectors)

addPathogensToHosts

addPathogensToHosts(pop_id, genomes_numbers, group_id="")

Add specified pathogens to random hosts, optionally from a list.

Arguments:

  • pop_id -- ID of population to be modified (String)
  • genomes_numbers -- dictionary containing pathogen genomes to add as keys and number of hosts each one will be added to as values (dict with keys=Strings, values=int)

Keyword Arguments:

  • group_id -- ID of group to sample hosts to sample from, if empty, samples from whole population (default empty String; String)

addPathogensToVectors

addPathogensToVectors(pop_id, genomes_numbers, group_id="")

Add specified pathogens to random vectors, optionally from a list.

Arguments:

  • pop_id -- ID of population to be modified (String)
  • genomes_numbers -- dictionary containing pathogen genomes to add as keys and number of vectors each one will be added to as values (dict with keys=Strings, values=int)

Keyword Arguments:

  • group_id -- ID of group to sample vectors to sample from, if empty, samples from whole population (default empty String; String)

treatHosts

treatHosts(pop_id, frac_hosts, resistance_seqs, group_id="")

Treat random fraction of infected hosts against some infection.

Removes all infections with genotypes susceptible to given treatment. Pathogens are removed if they are missing at least one of the sequences in resistance_seqs from their genome. Removes this organism from population infected list and adds to healthy list if appropriate.

Arguments:

  • pop_id -- ID of population to be modified (String)
  • frac_hosts -- fraction of hosts considered to be randomly selected (number between 0 and 1)
  • resistance_seqs -- contains sequences required for treatment resistance (list of Strings)

Keyword Arguments:

  • group_id -- ID of group to sample hosts to sample from, if empty, samples from whole population (default empty String; String)

treatVectors

treatVectors(pop_id, frac_vectors, resistance_seqs, group_id="")

Treat random fraction of infected vectors against some infection.

Removes all infections with genotypes susceptible to given treatment. Pathogens are removed if they are missing at least one of the sequences in resistance_seqs from their genome. Removes this organism from population infected list and adds to healthy list if appropriate.

Arguments:

  • pop_id -- ID of population to be modified (String)
  • frac_vectors -- fraction of vectors considered to be randomly selected (number between 0 and 1)
  • resistance_seqs -- contains sequences required for treatment resistance (list of Strings)

Keyword Arguments:

  • group_id -- ID of group to sample vectors to sample from, if empty, samples from whole population (default empty String; String)

protectHosts

protectHosts(pop_id, frac_hosts, protection_sequence, group_id="")

Protect a random fraction of infected hosts against some infection.

Adds protection sequence specified to a random fraction of the hosts specified. Does not cure them if they are already infected.

Arguments:

  • pop_id -- ID of population to be modified (String)
  • frac_hosts -- fraction of hosts considered to be randomly selected (number between 0 and 1)
  • protection_sequence -- sequence against which to protect (String)

Keyword Arguments:

  • group_id -- ID of group to sample hosts to sample from, if empty, samples from whole population (default empty String; String)

protectVectors

protectVectors(pop_id, frac_vectors, protection_sequence, group_id="")

Protect a random fraction of infected vectors against some infection.

Adds protection sequence specified to a random fraction of the vectors specified. Does not cure them if they are already infected.

Arguments:

  • pop_id -- ID of population to be modified (String)
  • frac_vectors -- fraction of vectors considered to be randomly selected (number between 0 and 1)
  • protection_sequence -- sequence against which to protect (String)

Keyword Arguments:

  • group_id -- ID of group to sample vectors to sample from, if empty, samples from whole population (default empty String; String)

wipeProtectionHosts

wipeProtectionHosts(pop_id, group_id="")

Removes all protection sequences from hosts.

Arguments:

  • pop_id -- ID of population to be modified (String)

Keyword Arguments:

  • group_id -- ID of group to sample hosts to sample from, if empty, takes whole population (default empty String; String)

wipeProtectionVectors

wipeProtectionVectors(pop_id, group_id="")

Removes all protection sequences from vectors.

Arguments:

  • pop_id -- ID of population to be modified (String)

Keyword Arguments:

  • group_id -- ID of group to sample vectors to sample from, if empty, takes whole population (default empty String; String)

setSetup

setSetup(pop_id, setup_id)

Assign parameters stored in Setup object to this population.

Arguments:

  • pop_id -- ID of population to be modified (String)
  • setup_id -- ID of setup to be assigned (String)

newLandscape

newLandscape(setup_id, landscape_id, fitnessFunc=None,
          mutate=None, generation_time=None,
          population_threshold=None, selection_threshold=None,
          max_depth=None, allele_groups=None)

Create a new Landscape

Arguments:

  • setup_id -- ID of setup with associated parameters (String)
  • landscape_id -- ID of landscape (String)

Keyword arguments:

  • fitnessFunc -- fitness function used to evaluate genomes (function taking a genome for argument and returning a fitness value >0, default None)
  • mutate -- mutation rate per generation (number>0, default None)
  • generation_time -- time between pathogen generations (number>0, default None)
  • population_threshold -- pathogen threshold under which drift is assumed to dominate (number >1, default None)
  • selection_threshold -- selection coefficient threshold under which drift is assumed to dominate; related to population_threshold (number >1, default None)
  • max_depth -- max number of mutations considered when evaluating establishment rates (integer >0, default None)
  • allele_groups -- relevant alleles affecting fitness, each element contains a list of strings, each string contains a group of alleles that all have equivalent fitness behavior (list of lists of Strings)

mapLandscape

mapLandscape(setup_id,landscape_id,seed_genomes)

Maps and evaluates relevant mutations given object parameters

Saves result in landscape's mutation_network property.

Arguments:

  • setup_id -- ID of setup with associated parameters (String)
  • landscape_id -- ID of landscape (String)
  • seed_genomes -- genome or list of genomes used as background for mutations (String or list of Strings)

saveLandscape

saveLandscape(setup_id,landscape_id,save_to_file)

Saves mutation network and fitness values stored in landscape

CSV format has the following columns:

  • Genome: reduced genome
  • Neighbors: list of neighboring reduced genomes, separated by semicolons
  • Rates: list of corresponding establishment rates for neighbors, separated by semicolons
  • Sum_rates: number with sum of all rates in previous list

Arguments:

  • setup_id -- ID of setup with associated parameters (String)
  • landscape_id -- ID of landscape (String)
  • save_to_file -- file path and name to save model data under (String)

loadLandscape

loadLandscape(setup_id,landscape_id,file)

Loads mutation network and fitness from file path

CSV format has the following columns:

  • Genome: reduced genome
  • Neighbors: list of neighboring reduced genomes, separated by semicolons
  • Rates: list of corresponding establishment rates for neighbors, separated by semicolons
  • Sum_rates: number with sum of all rates in previous list

Arguments:

  • file -- file path and name to save model data under (String)

customModelFunction

customModelFunction(function)

Returns output of given function, passing this model as a parameter.

Arguments:

  • function -- function to be evaluated; must take a Model object as the only parameter (function)

Returns:

  • Output of function passed as parameter

peakLandscape

peakLandscape(genome, peak_genome, min_value)

Evaluate a genome's numerical phenotype by decreasing with distance from optimal seq.

Originally meant as a purifying selection fitness function based on exponential decay of fitness as genomes move away from the optimal sequence. Distance is measured as percent Hamming distance from an optimal genome sequence.

Can be used to evaluate mortality as well as transmissibility.

Arguments:

  • genome -- the genome to be evaluated (String)
  • peak_genome -- the genome sequence to measure distance against, has value of 1 (String)
  • min_value -- minimum value at maximum distance from optimal genome (number > 0)

Returns:

  • value of genome (number)

valleyLandscape

valleyLandscape(genome, worst_genome, min_fitness)

Evaluate a genome's numerical phenotype by increasing with distance from worst seq.

Originally meant as a disruptive selection fitness function based on exponential decay of fitness as genomes move closer to the worst possible sequence. Distance is measured as percent Hamming distance from the worst possible genome sequence.

Can be used to evaluate mortality as well as transmissibility.

Arguments:

  • genome -- the genome to be evaluated (String)
  • valley_genome -- the genome sequence to measure distance against, has value of min_value (String)
  • min_value -- fitness value of worst possible genome (number > 0)

Returns:

  • value of genome (number)

About

Epidemiological modeling framework for pathogen population genetics and evolution.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published