Invalid character results in wrong error message ("All sequences must have the same length") #19
Comments
Actually, yes, it seems as though the
I would suggest perhaps having a more descriptive error message (e.g. " |
Dear @niemasd, The acceptable list of characters is shown at Line 31 in e98a046
And they are indeed IUPAC-based. I agree that a more descriptive error message might be in order (to suggest that the user look for non-IUPAC letters), but the current assumption is that most FASTA files are going to have some non-sequence characters (e.g. newlines, spaces, etc.).
Best,
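To illustrate the behaviour discussed in this thread, here is a minimal Python sketch of a reader that silently skips characters outside an allowed alphabet, plus a length check whose error message names the skipped characters. This is not tn93's actual code: the `ALLOWED` set, `read_sequence`, and `check_lengths` are hypothetical names, and the real list of accepted characters is the one at the line referenced above.

```python
# Hypothetical alphabet: standard IUPAC nucleotide codes plus gap and "?".
# The set tn93 really accepts may differ (assumption for illustration only).
ALLOWED = set("ACGTURYSWKMBDHVN-?")


def read_sequence(raw):
    """Keep allowed characters; return (sequence, skipped non-whitespace characters)."""
    kept, skipped = [], []
    for ch in raw:
        if ch.upper() in ALLOWED:
            kept.append(ch)
        elif not ch.isspace():
            # Whitespace is expected in FASTA input; anything else is suspicious.
            skipped.append(ch)
    return "".join(kept), skipped


def check_lengths(records):
    """Return the common sequence length, or raise ValueError with a hint."""
    expected = None
    skipped_by_name = {}
    for name, raw in records:
        seq, skipped = read_sequence(raw)
        if skipped:
            skipped_by_name[name] = sorted(set(skipped))
        if expected is None:
            expected = len(seq)
        elif len(seq) != expected:
            message = (
                "All sequences must have the same length: "
                f"'{name}' has {len(seq)}, expected {expected}"
            )
            if skipped_by_name:
                message += f"; unrecognized characters were skipped: {skipped_by_name}"
            raise ValueError(message)
    return expected
```

With a reader like this, a sequence containing a single `I` comes out one character short, and the error message points at the skipped letter instead of only reporting the length mismatch.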
I have one sequence (`hCoV_19_Norway_1539_2020_EPI_ISL_417487`) that `tn93` keeps thinking has one fewer character than it actually has (or at least seems to have). I have attached a minimal working example below: example.txt
I tried to run `tn93` as follows: `cat example.aln | tn93 -l 1 -t 1`
But I get the following error message:
However, I tried checking it in Python (`lines[3]` is the problematic sequence):
Excluding the newline character after every line (which is included in the lengths printed by the above code), each sequence has exactly 29811 characters.
The only weird character I see in the problematic sequence is `I`, which doesn't seem to be a standard IUPAC character. Thoughts?
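A check along these lines can be sketched in Python: parse the FASTA text, then report each record's length together with any character outside the standard IUPAC nucleotide codes. The helper names (`parse_fasta`, `report`) and the toy `demo` input are made up for illustration, and the allowed set is an assumption that may not match what tn93 accepts.

```python
# Standard IUPAC nucleotide codes plus the gap character.
# Assumption: tn93's own accepted list may include more or fewer symbols.
ALLOWED = set("ACGTURYSWKMBDHVN") | {"-"}


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def report(text):
    """Return (header, length, [(position, character), ...]) for each record."""
    rows = []
    for header, seq in parse_fasta(text):
        bad = [(i, ch) for i, ch in enumerate(seq) if ch.upper() not in ALLOWED]
        rows.append((header, len(seq), bad))
    return rows


# Toy input standing in for the attached alignment (not the real data).
demo = ">a\nACGT\n>b\nACIT\n"
rows = report(demo)
for header, length, bad in rows:
    print(header, length, bad)
```

On the toy input both records have the same raw length, and the second one is flagged for an `I` at position 2, which is the kind of discrepancy described above.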