Hi,
Thank you for developing and releasing Tranception :)
While testing this model, I noticed that MSA_processing in tranception/utils/msa_utils.py becomes a bottleneck when the MSA is large.
This PR adds a Fast_MSA_processing class that is faster 🔥 at the cost of higher memory usage.
For example, for an MSA with 21k sequences:
Instead of the sequence-by-sequence comparisons in the original code, I parallelize the calculation.
For a big MSA, doing all the comparisons at once is not possible, so I use sub_arrays to split the calculation.
I am sure there is a better/faster way to do this calculation, but this method worked for me.
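
To illustrate the idea of splitting the comparisons into sub-arrays, here is a minimal NumPy sketch. It is not the code in this PR: the function name `chunked_sequence_weights`, the `theta` and `chunk_size` parameters, and the input layout (a flattened one-hot array where gap positions are all zeros) are assumptions made for the example. Each sequence gets a weight of 1 / (number of sequences whose identity to it exceeds 1 - theta).

```python
import numpy as np


def chunked_sequence_weights(one_hot, theta=0.2, chunk_size=1024):
    """Illustrative sketch (hypothetical helper, not the PR's implementation).

    one_hot: (n_seqs, seq_len * alphabet_size) array of 0/1 values,
             with gap positions encoded as all zeros (assumed layout).
    Returns one weight per sequence; all-gap sequences get weight 0.
    """
    x = np.asarray(one_hot, dtype=np.float32)
    n_seqs = x.shape[0]

    # Number of non-gap positions in each sequence (row-wise dot product).
    non_gap = np.einsum("ij,ij->i", x, x)

    weights = np.zeros(n_seqs, dtype=np.float64)
    for start in range(0, n_seqs, chunk_size):
        stop = min(start + chunk_size, n_seqs)

        # Matching positions between this block and every sequence:
        # shape (stop - start, n_seqs). This is the memory-heavy step,
        # so chunk_size bounds the peak memory use.
        matches = x[start:stop] @ x.T

        block_non_gap = non_gap[start:stop]
        valid = block_non_gap > 0

        # Fraction of the block sequence's non-gap positions that match.
        # The denominator is clamped to 1 only to avoid dividing by zero;
        # all-gap rows are zeroed out below.
        identity = matches / np.maximum(block_non_gap, 1.0)[:, None]

        # Count neighbours above the identity threshold (includes self).
        neighbours = (identity > 1.0 - theta).sum(axis=1)

        weights[start:stop] = np.where(
            valid, 1.0 / np.maximum(neighbours, 1), 0.0
        )
    return weights
```

A smaller `chunk_size` lowers peak memory (each block needs a `chunk_size x n_seqs` float array) at the cost of more loop iterations. The result should not depend on the chunk size.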
Do not hesitate to reach out if you have any questions.
Best wishes,
Raphaël