You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
My lab currently uses this TN93 implementation both by itself and as part of HIVTrace. We have a need for a pure python implementation, which I've been working on. I'm comparing the output from my implementation to the output from this implementation to ensure I'm producing consistent results. As I've been looking at the results produced when using the SKIP match mode I'm seeing a difference in the values produced when dealing with a sequence that has ambiguities.
I looked deeper into the final matrix of counts, and I'm finding that the ambiguities seem to be being counted rather than skipped. I want to make sure I have the behavior correct - should the SKIP mode pass over ambiguous nucleotides the same way it passes over gaps?
I have a small example file to demonstrate what I mean (I had to add a .txt suffix, but it's a standard fasta file). The file has two 240nt sequences in it - the first has no ambiguous nucleotides, and the second has seven ambiguous nucleotides.
When I run tn93 on this file (setting the ambiguity mode to SKIP and setting the reporting threshold value to 1) I get a distance of 0.0190 and these counts:
A C G T
A 89.5 0 1 0.5
C 0 37 0.5 0.5
G 1 0.5 55.5 0
T 0 0.5 0 53.5
The counts for this sum to 240 - the full length of the sequence.
When I run the python version that I've developed I'm getting a much lower distance (0.0043) and I'm getting these counts:
A C G T
A 89 0 1 0
C 0 36 0 0
G 1 0 54 0
T 0 0 0 53
This sums to 233, which is the number I expect - the full sequence minus the seven positions where there were ambiguous nucleotides. Should there be fractional counts when SKIP is selected? Since it doesn't try any resolutions I thought the counts would all be integers.
This may have been an issue related to #24 although I am not sure. The current tn93 version, with some debugging text added yields the following results...
Thanks for getting back to me so quickly. I've had a chance to look at it, and I can get it to produce the expected value and counts.
What I was doing wrong was passing the ambiguity mode in ALL CAPS. When I tried again using the lower case then it works correctly. The results I'm seeing must be the software not recognizing the mode and using "subset" and treating the all caps word as the nucleotides to resolve.
I really appreciate the assistance, I'll go ahead and close this.
Hello,
My lab currently uses this TN93 implementation both by itself and as part of HIVTrace. We have a need for a pure python implementation, which I've been working on. I'm comparing the output from my implementation to the output from this implementation to ensure I'm producing consistent results. As I've been looking at the results produced when using the SKIP match mode I'm seeing a difference in the values produced when dealing with a sequence that has ambiguities.
I looked deeper into the final matrix of counts, and I'm finding that the ambiguities seem to be being counted rather than skipped. I want to make sure I have the behavior correct - should the SKIP mode pass over ambiguous nucleotides the same way it passes over gaps?
I have a small example file to demonstrate what I mean (I had to add a .txt suffix, but it's a standard fasta file). The file has two 240nt sequences in it - the first has no ambiguous nucleotides, and the second has seven ambiguous nucleotides.
When I run tn93 on this file (setting the ambiguity mode to SKIP and setting the reporting threshold value to 1) I get a distance of 0.0190 and these counts:
The counts for this sum to 240 - the full length of the sequence.
When I run the python version that I've developed I'm getting a much lower distance (0.0043) and I'm getting these counts:
This sums to 233, which is the number I expect - the full sequence minus the seven positions where there were ambiguous nucleotides. Should there be fractional counts when SKIP is selected? Since it doesn't try any resolutions I thought the counts would all be integers.
Thanks very much!
short_example_seqs.fasta.txt
The text was updated successfully, but these errors were encountered: