Strain Simulation
If more genomes are requested than are available, artificial strains will be generated using sgEvolver. In the configuration file, every community has the options genomes_total and genomes_real. The difference between the two is the number of artificial strains that will be generated.
The aim is to have up to 10 strains of a genome, the original included. The number of artificial strains for a genome is drawn from a geometric distribution (p=0.3).
This is done for randomly chosen genomes until the given maximum number of genomes of a sample is reached.
In the rare case that a number greater than 9 is drawn, it is lowered to 9.
This way the total number of strains of a genome is at most 10, the original strain included.
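The drawing procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual implementation: the function names `draw_geometric` and `draw_strain_counts` are hypothetical, the geometric distribution is assumed to start at 1, and the last draw is assumed to be truncated so that the total matches genomes_total exactly.

```python
import random


def draw_geometric(rng, p):
    """Number of Bernoulli(p) trials up to and including the first success (1, 2, ...)."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k


def draw_strain_counts(genomes_total, genomes_real, p=0.3, max_artificial=9, seed=None):
    """Return the number of strains per real genome, the original strain included."""
    if genomes_total < genomes_real:
        raise ValueError("genomes_total must not be smaller than genomes_real")
    rng = random.Random(seed)
    counts = [1] * genomes_real              # every real genome counts as one strain
    remaining = genomes_total - genomes_real  # artificial strains still to assign
    order = list(range(genomes_real))
    rng.shuffle(order)                        # visit genomes in random order
    for idx in order:
        if remaining == 0:
            break
        # Cap the draw at 9 artificial strains (assumption: also at what is left).
        k = min(draw_geometric(rng, p), max_artificial, remaining)
        counts[idx] += k
        remaining -= k
    if remaining > 0:
        raise ValueError("not enough real genomes for the requested total")
    return sorted(counts)


if __name__ == "__main__":
    print(draw_strain_counts(genomes_total=75, genomes_real=30, seed=1))
```

Genomes that are never visited keep a count of 1, meaning no artificial strains are simulated for them.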
To generate artificial strains, the tool sgEvolver, developed by Aaron Darling, is used.
For any given strain, artificial strains are simulated based on a distance tree in Newick format.
The file is given in the strain_simulation_template option of the configuration file. The number of leaves determines the number of strains simulated. The distance to the root determines how strongly an artificial strain will be evolved. The default template simulates 40 artificial strains for each given strain.
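To illustrate how the leaves and root distances of such a tree relate to the simulated strains, here is a minimal sketch that reads a simple Newick string and reports the distance of every leaf to the root. The function `leaf_distances` and the example tree are hypothetical and not taken from the default template; the parser assumes named leaves, no quoted labels, and no whitespace between tokens.

```python
def leaf_distances(newick):
    """Return {leaf name: summed branch length from the root} for a simple Newick string."""
    text = newick.strip().rstrip(";")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ":,()":
            pos += 1
        return text[start:pos].strip()

    def read_length():
        nonlocal pos
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            return float(text[start:pos])
        return 0.0  # missing branch length is treated as zero

    def parse_subtree():
        """Return (leaf distances relative to this node, branch length above this node)."""
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip '('
            leaves = {}
            while True:
                child_leaves, child_length = parse_subtree()
                for name, dist in child_leaves.items():
                    leaves[name] = dist + child_length
                if pos < len(text) and text[pos] == ",":
                    pos += 1
                else:
                    break
            pos += 1  # skip ')'
            read_label()  # optional internal node label, ignored here
            return leaves, read_length()
        name = read_label()
        return {name: 0.0}, read_length()

    leaves, _root_length = parse_subtree()
    return leaves


if __name__ == "__main__":
    # Hypothetical three-leaf tree: three strains would be simulated from it.
    for name, dist in leaf_distances("((A:0.1,B:0.2):0.05,C:0.3);").items():
        print(name, round(dist, 3))
```

In this example tree, leaf C is farthest from the root, so the corresponding artificial strain would be evolved most strongly.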
A GFF-formatted file with annotated genes is required so that gene regions can be handled differently from other regions. For the strain simulation to work, the genome_to_id.tsv and genome_to_gff.tsv files have to contain absolute paths to the genome and GFF files.
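A quick way to check the absolute-path requirement before running the simulation might look like the sketch below. The helper `relative_paths` is hypothetical, and it assumes that each file is tab-separated with the path in the second column; adjust `path_column` if your files are laid out differently.

```python
import csv
import os


def relative_paths(tsv_path, path_column=1):
    """Return all entries in the given column that are not absolute paths."""
    bad = []
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) <= path_column:
                continue  # skip empty or malformed lines
            if not os.path.isabs(row[path_column]):
                bad.append(row[path_column])
    return bad


if __name__ == "__main__":
    for name in ("genome_to_id.tsv", "genome_to_gff.tsv"):
        if os.path.exists(name):
            print(name, relative_paths(name))
```

An empty list for both files means every listed path is absolute.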
genomes_total=75
genomes_real=30
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2,
3, 3, 3, 3,
4, 4,
5, 5,
6,
7, 7]
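In this example, 30 real genomes are used and 45 artificial strains are generated, for a total of 75. Each number in the list is the number of strains of one genome, the original included.
The 13 entries of '1' mean that 13 genomes will have no strains simulated. The other 17 genomes will have simulated strains made from them.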
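The numbers in this example can be checked with a few lines of Python. The variable names are only illustrative.

```python
# Strains per genome from the example above, the original strain included.
strain_counts = [1] * 13 + [2] * 6 + [3] * 4 + [4] * 2 + [5] * 2 + [6] + [7] * 2

genomes_real = len(strain_counts)              # one entry per real genome
genomes_total = sum(strain_counts)             # real genomes plus artificial strains
artificial = genomes_total - genomes_real      # strains generated by the simulation
with_strains = sum(1 for c in strain_counts if c > 1)

print(genomes_real, genomes_total, artificial, with_strains)  # 30 75 45 17
```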