uppercase option #14

Open
hyphaltip opened this issue Oct 5, 2023 · 1 comment

Comments

@hyphaltip
Member

While the lowercase sequences in an alignment match are useful output from HMMER, they don't mean much for the phylogenetic analyses. Tools like VeryFastTree ignore lowercase characters, so I think it would be better to have an option that forces uppercase sequences when writing concatenated or single alignments. Alternatively, make forced uppercase the default (the common use case) and allow the user to turn it off.

Command: VeryFastTree -double-precision concat_alignments.aa.mfa
VeryFastTree Version 4.0.3 (OpenMP, SSE) Double precision with SSE3
Alignment: concat_alignments.aa.mfa
Amino acid distances: BLOSUM45 Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jones-Taylor-Thorton, CAT approximation with 20 rate categories
Ignored unknown character a (seen 15053031 times)
Ignored unknown character c (seen 696310 times)
Ignored unknown character d (seen 8021126 times)
Ignored unknown character e (seen 8620113 times)
Ignored unknown character f (seen 2958267 times)
Ignored unknown character g (seen 9532367 times)
Ignored unknown character h (seen 2355402 times)
Ignored unknown character i (seen 3133116 times)
Ignored unknown character k (seen 5589872 times)
Ignored unknown character l (seen 9940350 times)
Ignored unknown character m (seen 1777397 times)
Ignored unknown character n (seen 2948421 times)
Ignored unknown character p (seen 8696418 times)
Ignored unknown character q (seen 4590872 times)
Ignored unknown character r (seen 7790859 times)
Ignored unknown character s (seen 10591681 times)
Ignored unknown character t (seen 6746813 times)
Ignored unknown character v (seen 6743874 times)
Ignored unknown character w (seen 1089347 times)
Ignored unknown character y (seen 2028882 times)
@chtsai0105
Collaborator

chtsai0105 commented Oct 5, 2023

This is already handled in the phylotree module: all sequences are forced to uppercase before being sent to VeryFastTree.

for line in f.read().splitlines():
    if not line.startswith(">"):
        line = line.upper()

But we can definitely do that earlier, when writing the MSA results. I just wasn't sure whether the lowercase was meaningful, so I simply preserved it.
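As a rough idea of what doing this at output time could look like, here is a minimal sketch. The function name `write_fasta` and the `uppercase` parameter are hypothetical and not part of the existing code; it only assumes alignments are available as (header, sequence) pairs.

```python
import io


def write_fasta(records, handle, uppercase=True):
    """Write (header, sequence) pairs to `handle` in FASTA format.

    `uppercase` is a hypothetical option: when True (the assumed default),
    sequence characters are uppercased; header lines are never modified.
    """
    for header, seq in records:
        if uppercase:
            seq = seq.upper()
        handle.write(f">{header}\n{seq}\n")


# Example with made-up sequences: lowercase residues become uppercase,
# gap characters and headers are left as they are.
records = [("sp_A", "MKv-la"), ("sp_B", "mkV-LA")]
buf = io.StringIO()
write_fasta(records, buf)
print(buf.getvalue())
```

Passing `uppercase=False` would keep the HMMER lowercase columns for anyone who wants them, which matches the "default on, but can be turned off" suggestion above.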
