Distribution of genomes
The distribution determines the relative abundance of each genome in a sample. A log normal distribution is always used to draw the genome abundances for the first sample. The configuration file of the pipeline has five options that influence the distribution.
- log_mu is the mean used for the log normal distribution.
- log_sigma controls the spread around the mean of the log normal distribution.
- gauss_mu is the mean used for the normal (gauss) distribution.
- gauss_sigma controls the spread around the mean of the normal (gauss) distribution.
- mode selects how abundances change from one sample to the next. It only matters if more than one sample is simulated.
There are four possible options for mode:
'differential', 'replicates', 'timeseries_normal', 'timeseries_lognormal'
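As a minimal sketch of how these five options could be read and validated, the snippet below parses an illustrative configuration with Python's standard `configparser`. The section name `community0`, the helper `read_distribution_options`, and all values are assumptions made for this example, not taken from the pipeline's documentation.

```python
import configparser

# Illustrative configuration text. The section name and every value here
# are assumptions for this example only.
EXAMPLE_CONFIG = """
[community0]
log_mu = 1
log_sigma = 2
gauss_mu = 0
gauss_sigma = 1
mode = differential
"""

# The four mode names listed in the text above.
VALID_MODES = {
    "differential",
    "replicates",
    "timeseries_normal",
    "timeseries_lognormal",
}


def read_distribution_options(text, section="community0"):
    """Parse the five distribution options from INI-style text (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    options = parser[section]
    mode = options.get("mode")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "log_mu": options.getfloat("log_mu"),
        "log_sigma": options.getfloat("log_sigma"),
        "gauss_mu": options.getfloat("gauss_mu"),
        "gauss_sigma": options.getfloat("gauss_sigma"),
        "mode": mode,
    }


settings = read_distribution_options(EXAMPLE_CONFIG)
print(settings)
```

Rejecting an unknown mode at load time, as done here, surfaces a typo in the configuration before any abundances are drawn.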
replicates
Given a mean and a standard deviation, the distribution of each sample is drawn independently from a log normal distribution.
differential
Using the values of the initial log normal distribution as a basis, each consecutive sample is the initial distribution with a gauss distribution added to it.
timeseries_normal
The initial sample's distribution is drawn from a log normal distribution. For each consecutive sample, a gauss distribution is added to the distribution of the previous sample.
timeseries_lognormal
Given a mean and a standard deviation, the initial sample's distribution is drawn from a log normal distribution. For each consecutive sample, a log normal distribution is added to the distribution of the previous sample and the sum is divided by 2. The division by two is done to achieve a smoother transition from one sample to the next.
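The four behaviours described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline's implementation: the function names are hypothetical, the pairing of each description with a mode name is inferred from the mode names, and negative values that can result from adding a gauss distribution are left unhandled here.

```python
import random

VALID_MODES = (
    "replicates",
    "differential",
    "timeseries_normal",
    "timeseries_lognormal",
)


def draw_lognormal(n_genomes, log_mu, log_sigma, rng):
    """Draw one log normal value per genome."""
    return [rng.lognormvariate(log_mu, log_sigma) for _ in range(n_genomes)]


def simulate_abundances(mode, n_genomes, n_samples,
                        log_mu, log_sigma, gauss_mu, gauss_sigma, seed=None):
    """Return a list of samples, each a list of per-genome abundances."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    rng = random.Random(seed)
    # The first sample is always drawn from a log normal distribution.
    first = draw_lognormal(n_genomes, log_mu, log_sigma, rng)
    samples = [first]
    for _ in range(n_samples - 1):
        previous = samples[-1]
        if mode == "replicates":
            # Every sample is an independent log normal draw.
            current = draw_lognormal(n_genomes, log_mu, log_sigma, rng)
        elif mode == "differential":
            # Gauss noise is added to the initial sample each time.
            current = [a + rng.gauss(gauss_mu, gauss_sigma) for a in first]
        elif mode == "timeseries_normal":
            # Gauss noise is added to the previous sample.
            current = [a + rng.gauss(gauss_mu, gauss_sigma) for a in previous]
        else:  # timeseries_lognormal
            # A log normal draw is added to the previous sample, then halved.
            current = [(a + rng.lognormvariate(log_mu, log_sigma)) / 2
                       for a in previous]
        samples.append(current)
    return samples


example = simulate_abundances("timeseries_lognormal", n_genomes=5, n_samples=3,
                              log_mu=1, log_sigma=2, gauss_mu=0, gauss_sigma=1,
                              seed=42)
for index, sample in enumerate(example):
    print(index, [round(value, 3) for value in sample])
```

The only difference between differential and timeseries_normal in this sketch is the base the noise is added to: the first sample in one case, the previous sample in the other.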
Since the initial values are drawn from a log normal distribution, many genomes have a low abundance. If a gauss distribution is added, there is a chance that a genome ends up with a negative abundance. Without a safeguard, many genomes would die off over time, and the last sample of a time series using gauss would have far fewer genomes than the first. Extinction is therefore prevented: if an abundance turns negative, it is set to a low number close to zero.
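The safeguard described above amounts to replacing a negative result with a small positive floor. The sketch below illustrates the idea. The floor value `MIN_ABUNDANCE`, the function name, and the choice to treat an abundance of exactly zero like a negative one are assumptions for this example. The text only says "a low number close to zero".

```python
import random

# Hypothetical floor; the text does not state the actual value used.
MIN_ABUNDANCE = 1e-3


def add_gauss_without_extinction(abundances, gauss_mu, gauss_sigma, rng,
                                 floor=MIN_ABUNDANCE):
    """Add gauss noise to each abundance, replacing non-positive results by `floor`."""
    updated = []
    for abundance in abundances:
        value = abundance + rng.gauss(gauss_mu, gauss_sigma)
        # Assumption: zero is treated like a negative value, since a genome
        # with abundance zero would also be extinct.
        updated.append(value if value > 0 else floor)
    return updated


rng = random.Random(0)
low_abundances = [0.01, 0.02, 5.0]
next_sample = add_gauss_without_extinction(low_abundances, 0, 1, rng)
print(next_sample)
```

Because every genome keeps a positive abundance, the number of genomes stays constant across all samples of a time series.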