Python: Add support for arbitrary sequences of hashable objects #128
This is a workaround until Martinsos#90 is implemented. If either the query or the target contains non-ASCII values, the sequences are mapped into an ASCII alphabet and the resulting byte sequences are used for the alignment.
f3d27be to 08b8e0a
@jbaiter thanks, this is very cool :)!
I gave it a detailed look, and it looks pretty good!
There are a few things I hope we can improve slightly before merging. I left comments, please check them out.
- Refactor input mapping code into separate function
- Allow no more than 256 unique values (was: 255)
- Also map additional equalities if query or target need mapping
- Update docstrings
- Fix code style
ddc5975 to 2478821
Thanks for the thorough review!
Thanks for the changes, I think we are almost there. I left a few more comments, and that should be it. Thanks for the patience :).
Great, LGTM!
I will merge it now and publish the new version of the Python package in a couple of days, when I find some time.
Thanks a lot :)!
This implements @Martinsos' suggestion from #79 to add support for sequences of arbitrary hashable objects in the Python bindings. If either the query or the target contains non-ASCII values, the sequences are mapped into an ASCII alphabet and the resulting byte sequences are used for the alignment.
One limitation that we can't get around at the moment is that the query and target sequence together must not contain more than 256 unique values.
It certainly is not the ideal way to go about this, but it should serve as an acceptable workaround for a lot of use cases until #90 is implemented.
This should help with #123, #114, #109, #104, #89 and #79.
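For readers who want a concrete picture of the mapping described above, here is a minimal sketch in plain Python. It is not the code from this PR: the helper name `map_to_bytes`, its signature, and the choice to skip equality pairs whose items never occur in either sequence are all assumptions made for illustration. It only shows the general idea of assigning each unique item a byte value and enforcing the 256-value limit.

```python
from itertools import chain
from typing import Dict, Hashable, Iterable, List, Optional, Sequence, Tuple


def map_to_bytes(
    query: Sequence[Hashable],
    target: Sequence[Hashable],
    additional_equalities: Optional[Iterable[Tuple[Hashable, Hashable]]] = None,
) -> Tuple[bytes, bytes, List[Tuple[int, int]]]:
    """Hypothetical helper: map hashable items onto byte values 0..255."""
    alphabet: Dict[Hashable, int] = {}
    for item in chain(query, target):
        if item not in alphabet:
            # A single byte can only distinguish 256 values.
            if len(alphabet) == 256:
                raise ValueError(
                    "query and target together contain more than 256 unique values"
                )
            alphabet[item] = len(alphabet)

    mapped_query = bytes(alphabet[item] for item in query)
    mapped_target = bytes(alphabet[item] for item in target)

    # Assumption: pairs that mention an item absent from both sequences
    # cannot affect the alignment, so this sketch simply drops them.
    mapped_equalities = [
        (alphabet[a], alphabet[b])
        for a, b in (additional_equalities or [])
        if a in alphabet and b in alphabet
    ]
    return mapped_query, mapped_target, mapped_equalities


# Example: word-level sequences instead of characters.
q, t, eq = map_to_bytes(
    ["hello", "world"], ["hello", "there"], [("world", "there")]
)
# q == b"\x00\x01", t == b"\x00\x02", eq == [(1, 2)]
```

The mapped byte sequences, together with the mapped equalities, are what would then be handed to the aligner in place of the original objects.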