Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about results when using SKIP as the match mode #27

Closed
reagank opened this issue Sep 12, 2022 · 2 comments
Closed

Question about results when using SKIP as the match mode #27

reagank opened this issue Sep 12, 2022 · 2 comments

Comments

@reagank
Copy link

reagank commented Sep 12, 2022

Hello,

My lab currently uses this TN93 implementation both by itself and as part of HIVTrace. We have a need for a pure python implementation, which I've been working on. I'm comparing the output from my implementation to the output from this implementation to ensure I'm producing consistent results. As I've been looking at the results produced when using the SKIP match mode I'm seeing a difference in the values produced when dealing with a sequence that has ambiguities.

I looked deeper into the final matrix of counts, and I'm finding that the ambiguities seem to be being counted rather than skipped. I want to make sure I have the behavior correct - should the SKIP mode pass over ambiguous nucleotides the same way it passes over gaps?

I have a small example file to demonstrate what I mean (I had to add a .txt suffix, but it's a standard fasta file). The file has two 240nt sequences in it - the first has no ambiguous nucleotides, and the second has seven ambiguous nucleotides.

When I run tn93 on this file (setting the ambiguity mode to SKIP and setting the reporting threshold value to 1) I get a distance of 0.0190 and these counts:

       A     C     G    T
A   89.5     0      1     0.5 
C      0     37     0.5   0.5 
G      1     0.5    55.5   0 
T      0     0.5    0     53.5

The counts for this sum to 240 - the full length of the sequence.

When I run the python version that I've developed I'm getting a much lower distance (0.0043) and I'm getting these counts:

     A     C     G    T
A    89     0     1     0 
C    0     36     0     0 
G    1      0     54    0 
T    0      0     0    53

This sums to 233, which is the number I expect - the full sequence minus the seven positions where there were ambiguous nucleotides. Should there be fractional counts when SKIP is selected? Since it doesn't try any resolutions I thought the counts would all be integers.

Thanks very much!

short_example_seqs.fasta.txt

@spond
Copy link
Member

spond commented Sep 12, 2022

Dear @reagank,

This may have been an issue related to #24 although I am not sure. The current tn93 version, with some debugging text added yields the following results...

./tn93 -q -a skip ~/Downloads/short_example_seqs.fasta.txt 

(0,0) = 89
(0,1) = 0
(0,2) = 1
(0,3) = 0
(1,0) = 0
(1,1) = 36
(1,2) = 0
(1,3) = 0
(2,0) = 0
(2,1) = 0
(2,2) = 54
(2,3) = 0
(3,0) = 0
(3,1) = 0
(3,2) = 0
(3,3) = 53

which seems correct since a direct examination of character by character differences returns, so only one difference (A=>G) should be counted.

C=>S
C=>K
G=>M
A=>G
A=>W
G=>R
G=>S
T=>Y

Would you try with v1.0.9 and see if the issue is still there?

Best,
Sergei

@reagank
Copy link
Author

reagank commented Sep 13, 2022

Hi Sergei,

Thanks for getting back to me so quickly. I've had a chance to look at it, and I can get it to produce the expected value and counts.

What I was doing wrong was passing the ambiguity mode in ALL CAPS. When I tried again using the lower case then it works correctly. The results I'm seeing must be the software not recognizing the mode and using "subset" and treating the all caps word as the nucleotides to resolve.

I really appreciate the assistance, I'll go ahead and close this.

@reagank reagank closed this as completed Sep 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants