Empirical Homoresidue shows unexpected extreme dN/dS values. #1
Comments
These are some entries from the results showing extreme values of omega for the
|
Shown below is a fragment from an alignment of deduplicated sequences sampled from fit=0.01025 repetition=3 generation=20000 that has an omega value of 56.5. As expected from the homoresidue nature of the starting population, almost all amino acid positions are lysine (Lys/K). Non-synonymous substitutions are scattered infrequently across the genome. What's really strange is that all the nucleotide substitutions are A->C. |
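One way to confirm that every substitution in an alignment is A->C is to tabulate the substitution spectrum against a reference sequence. The following is a minimal sketch, not part of the project's pipeline; the function name `substitution_spectrum` and the toy sequences are made up for illustration, and the reference is assumed to be the all-lysine starting genome.

```python
from collections import Counter


def substitution_spectrum(reference, sequences):
    """Count (ref_base, alt_base) pairs at every mismatching alignment column."""
    counts = Counter()
    ref = reference.upper()
    for seq in sequences:
        if len(seq) != len(ref):
            raise ValueError("sequences must be aligned to the reference length")
        for ref_base, alt_base in zip(ref, seq.upper()):
            # Skip matches and anything that is not an unambiguous base (gaps, N).
            if ref_base != alt_base and ref_base in "ACGT" and alt_base in "ACGT":
                counts[(ref_base, alt_base)] += 1
    return counts


# Toy data, NOT taken from the simulation output: an all-lysine (AAA) reference.
reference = "AAAAAAAAAAAA"
sequences = ["AAACAAAAAAAA", "AAAAAAACAAAA", "ACAAAAAAAAAA"]

spectrum = substitution_spectrum(reference, sequences)
for (ref_base, alt_base), n in sorted(spectrum.items()):
    print(f"{ref_base}->{alt_base}: {n}")
```

A spectrum with a single populated cell, as in this toy example, would be the signature of the behavior described above; an unbiased mutator should populate all twelve cells over enough generations.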
The problem is that SANTA always generates A -> C substitutions and no others. I think this is a bug.
Background
Nucleotide mutators in SANTA can be configured with a rate bias matrix to influence the selection of substitution mutations based on the existing nucleotide, e.g.
However for the configuration used for this dN/dS project, the nucleotide mutator is configured without a
When this form of configuration is used without a rate bias matrix, the probability of a transversion, stored in
That is why we only see adenine-to-cytosine substitutions. |
After modifying the config file to generate a full spectrum of mutations (and modifying the default behavior in SANTA), the analysis of dN/dS ratios in a simulated population still shows anomalies. The first is the extreme running time of
Running time Anomalies
The most extreme runtime came from a single sample,
This sample also produced an extreme dN/dS ratio of
This sample appears to have a normal number of unique sequences, about the same as other samples.
The codon usage within the sample also looks consistent with other samples.
Extreme dN/dS values
Be very suspicious of dN/dS values of
Next Step
Rerun the simulation with an amino acid that has more than 2 possible codons. |
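To pick a replacement amino acid, it helps to list how many codons encode each one under the standard genetic code. The sketch below builds the standard table from the usual compact 64-character encoding (bases in TCAG order) and reports the amino acids with more than two codons; the variable names are illustrative only.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in the conventional compact form: the first base
# varies slowest and the third fastest, with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons_for = defaultdict(list)
all_codons = ("".join(triplet) for triplet in product(BASES, repeat=3))
for codon, amino_acid in zip(all_codons, AMINO_ACIDS):
    codons_for[amino_acid].append(codon)

# Amino acids (excluding stop, '*') encoded by more than two codons.
candidates = sorted(
    aa for aa, codons in codons_for.items() if aa != "*" and len(codons) > 2
)

print("Lysine (K):", codons_for["K"])
print("More than 2 codons:", candidates)
```

Lysine has only two codons (AAA and AAG), so a lysine homoresidue genome offers exactly one synonymous alternative per site, and it is a third-position A<->G transition.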
Reconfigured the simulation to start with all Arginine and to favor Arginine at a
To encourage retention of synonymous mutations, we start off the population with genomes containing only one of the 4-way synonymous codons for Arginine, CGT. The fitness function applied across all 156 sites on the genome favors any of the six codons that code for Arginine, while all the others are considered uniformly less favored.
Results of dN/dS calculation
Codeml did not exhibit the extremely long running times seen with the previous configuration, but a majority of dN/dS values are still extreme/failing.
It seems highly unusual that all the samples collected at 5000 generations exhibit meaningless dN/dS ratios but none of the samples collected at other time points do.
Run more fitness values (aside from just fit_0.0105) and make some plots of these values for comparison to the previous configuration. |
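The Arginine setup can be sanity-checked by classifying each codon of a sampled genome relative to the CGT starting codon. This is a rough sketch under stated assumptions: the helper names are hypothetical, the genome is assumed to be in frame from position 0, and anything that is not an Arginine codon (including stop codons) is counted as non-synonymous. It does not reproduce SANTA's fitness function.

```python
from collections import Counter

ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}  # all six Arginine codons
START_CODON = "CGT"  # the starting codon named in the text


def split_codons(genome):
    """Split an in-frame nucleotide string into a list of codons."""
    if len(genome) % 3 != 0:
        raise ValueError("genome length must be a multiple of 3")
    return [genome[i:i + 3] for i in range(0, len(genome), 3)]


def classify_codons(genome):
    """Count codons that are unchanged, synonymous, or non-synonymous vs. CGT."""
    counts = Counter(unchanged=0, synonymous=0, nonsynonymous=0)
    for codon in split_codons(genome.upper()):
        if codon == START_CODON:
            counts["unchanged"] += 1
        elif codon in ARG_CODONS:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


# Toy genome (hypothetical, 4 codons): CGT, CGA, AGG, CTT.
example = classify_codons("CGTCGAAGGCTT")
print(dict(example))
```

Note that from CGT only the third-position changes (CGC, CGA, CGG) are synonymous in a single step; reaching AGA or AGG requires changes at both the first and third positions.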
Does anything jump out just from staring at the extreme-value alignments? |
Adding link to plot of codon usage vs. ML dN/dS ratio as calculated by codeml for the homoresidue empirical fitness constraint. Re-running the simulation with a 100x higher mutation rate to see how that affects the statistics. Simulation is running; this will take several hours to complete.
Still running as of 1:45pm. Looks like it is working on some of the high mutation rate samples, which is good because these are the ones I want to see. |
Bumping the mutation rate had the effect of increasing the number of unique sequences in the samples taken at 5000 and 20000 generations. Now there are ~100 unique sequences, and I'm going to put a filter step in to cut down the number of unique sequences sent to codeml. Re-running now Tue Oct 11 16:53:44 PDT 2016
Fixed a bug in the non-selective SANTA sim; running to completion now. |
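A filter step of this kind could be as simple as deduplicating and then randomly subsampling down to a cap. The sketch below is one possible approach, not the filter actually used in this project; the function name, the `max_unique` cap, and the fixed seed are all assumptions made for illustration.

```python
import random


def dedupe_and_subsample(sequences, max_unique, seed=0):
    """Return at most max_unique distinct sequences, keeping first-seen order."""
    if max_unique < 0:
        raise ValueError("max_unique must be non-negative")
    # dict.fromkeys removes duplicates while preserving insertion order.
    unique = list(dict.fromkeys(sequences))
    if len(unique) <= max_unique:
        return unique
    # Seeded RNG so the same sample always yields the same subset.
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(unique)), max_unique))
    return [unique[i] for i in keep]


# Toy input (hypothetical): five sequences, three of them distinct.
sample = ["CGTCGT", "CGTCGA", "CGTCGT", "CGCCGT", "CGTCGA"]
filtered = dedupe_and_subsample(sample, max_unique=2)
print(filtered)
```

Random subsampling keeps the filter unbiased with respect to sequence content, but it discards frequency information; weighting by abundance would be a different design choice.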
This is a new interactive plot of dN/dS. It has some graphical issues, and I realized it is based on bogus simulations (the homoresidue sequence was not initialized correctly). It is just for reference, to show what an interactive dN/dS plot would look like. Mostly this was done to get familiar with plotly. New simulations are running now... |
Updated plot of dnds under two different mutation rates can be found at, |
The Empirical Homoresidue model (scatter column furthest right) for high- and middle-fitness values exhibits some unexpectedly extreme dN/dS values. Check out the sampled sequences from those points to understand why these occur.