Trouble guessing alphabet when first proteins are low complexity #24

Open
hyphaltip opened this issue Sep 3, 2024 · 1 comment

@hyphaltip
Member

Is there a way to force PHYling/Easel to treat the input sequences as a particular alphabet type?

This protein set:

more ../input/Allomyces_macrogynus_ATCC_38327.proteins.fa
>F5BE82F7_000001-T1 F5BE82F7_000001
DSYDSYGYGHDDGKKGDDYGHDSYDSYGGYGHDDKKDSYGYDSYDSYGYGHDKKDDYDSYDSYDSYGYGHDKKDDYDSYD
SYDSYGYGYDKKDDYDSYNSYDSYGYGHDDKKDDYYGHDSYDSYGYGYDGKDDYYGHDSYDSYGYGYDGKDDYYGHDSYD
SYGYGYDNKDDYDSYDSYDSYGYGHDDKKDDYYGHDSYD
>F5BE82F7_000002-T1 F5BE82F7_000002
MTNSYDYDSYDSYGYGHDDGKKGDDYGHDSYDSYGGYGHDDKKDSYGYDSYDSYGYGHDKKDDYDSYDSYDSYGYGHDKK
DDYDSYDSYDSYGYGYDKKDDYDSYNSYDSYGYDSYDSYGYGHDDKKDDYYGHDSYDSYGYGHDDKDDYYGHDSYDSYGY
GYDKKDDYNSYDSYDSYGYGHDDKKDDYYGHDSYDSYGYGYDNKDDYYGHDSYDSYGYGYDNKDDYYGHDSYDSYGYGYD
KKDDYYGHDSYDSYGYGYDSYDHGYGHGHC
>F5BE82F7_000003-T1 F5BE82F7_000003
MTTWLLAVLSLVNTVAALFFLFSVPGRVLVIGQLVLVPLILLVTARAAWKLSPASWKQQHGSASAAASPSANSVPRGLRV
FGLDPAAIAVPTSDPAGGASSHRLARTASSASTTSTGSAPRQRHSIERTALGSRRSQSLGPRRIPTRESLASTSDLVRGM
QVLSVDTNGARSRNATKLDGNQQWRTTLTSIVLVLNAALQVTLTGLTLAAVFGPWLSQSDVIDNVEFARLGHVNATHATI
TVRLRPDRVPAPGADLAITFRANNGMTEWQRVDRTIRTSSDTDFTFAVHLPDLDPATEYEYRIAPAMGNDGTAWLTGRVR
TFPRFADALDTTKPLFTFVAGSCVKPSTPWATETGIRGFRVLADQVKPDLLLFLGDFIYADVPWWFPPSLATYRWHYRFT
YSVNETRRLLATTPSYLSMTCDHEFSNNWDNGEQFPFPVASQAYDEYLGNGNPRSYGATTQYYHFTLGPACFFVADLRRY
RTAPDAENATILGAQQWADLEAWFQTPNCAWRIVAASVPVTNNWAIDKDTWVGYPRDRKRLLDLAHRATGTTVIVSGDRH
AVGIQRLKQYGEVVELSISPISQFYSPIPFYGLFVKDEDRDEVIFEKSMGNVQLGVFNVFANRIAFRLLDGEGVEQFKYD
IYQRKL

fails with this error:

PHYling ERROR Could not determine alphabet of file: PosixPath('FILE-XYZ.fa')
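
A possible explanation, offered as an assumption and not checked against Easel's guessing code: every letter in the first two records (C, D, G, H, K, M, N, S, T, Y) is also a valid IUPAC nucleotide code, so composition alone cannot separate protein from DNA until the third record. The Python sketch below only tests that observation on the first line of records 1 and 3. The helper name `protein_only_letters` is made up for illustration and this is not Easel's algorithm.

```python
# IUPAC nucleotide codes, including the degenerate symbols (R, Y, S, W, ...).
IUPAC_NUCLEOTIDE = set("ACGTURYSWKMBDHVN")


def protein_only_letters(seq):
    """Return the letters in seq that are NOT valid IUPAC nucleotide codes.

    An empty result means the sequence could be read as (degenerate) DNA,
    so it gives no composition-based evidence that the file is protein.
    """
    return {c for c in seq.upper() if c.isalpha()} - IUPAC_NUCLEOTIDE


# First line of record 1 and of record 3 from the file above.
first = "DSYDSYGYGHDDGKKGDDYGHDSYDSYGGYGHDDKKDSYGYDSYDSYGYGHDKKDDYDSYDSYDSYGYGHDKKDDYDSYD"
third = "MTTWLLAVLSLVNTVAALFFLFSVPGRVLVIGQLVLVPLILLVTARAAWKLSPASWKQQHGSASAAASPSANSVPRGLRV"

print("record 1:", sorted(protein_only_letters(first)))
print("record 3:", sorted(protein_only_letters(third)))
```

Record 1 yields no protein-only letters, while record 3 contains several (L, for example), which fits the observation that the guess only becomes possible further into the file.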
@hyphaltip
Member Author

I can work around this by moving these low-complexity sequences a few records down in the FASTA file. It would be nice if Easel allowed reading a few more sequences before making its guess about the alphabet type.
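
The manual workaround could be scripted roughly as follows. This is a minimal standard-library sketch, not part of PHYling or Easel: it assumes a record is "ambiguous" when all of its letters are also valid IUPAC nucleotide codes (my guess at why the detection fails), and it moves those records to the end while keeping the original order otherwise. The function names `read_fasta`, `ambiguous_last` and `write_fasta` are hypothetical.

```python
import io

# IUPAC nucleotide codes, including the degenerate symbols.
IUPAC_NUCLEOTIDE = set("ACGTURYSWKMBDHVN")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_ambiguous(seq):
    """True if every letter in seq is also a valid IUPAC nucleotide code."""
    return not ({c for c in seq.upper() if c.isalpha()} - IUPAC_NUCLEOTIDE)


def ambiguous_last(records):
    """Stable reorder: clearly-protein records first, ambiguous ones last."""
    records = list(records)
    clear = [r for r in records if not is_ambiguous(r[1])]
    ambiguous = [r for r in records if is_ambiguous(r[1])]
    return clear + ambiguous


def write_fasta(records, out, width=80):
    """Write records to a file-like object, wrapping sequences at width."""
    for header, seq in records:
        out.write(header + "\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


# Small in-memory demo; for a real file, pass open file handles instead.
demo = """>low1
DSYDSYGYGHDD
>real1
MTTWLLAVLSLV
"""
out = io.StringIO()
write_fasta(ambiguous_last(read_fasta(demo.splitlines())), out)
print(out.getvalue())
```

Nothing is dropped or edited, only reordered, so downstream results should be unaffected as long as record order does not matter.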
