add faster msa preprocessing #22

Open: wants to merge 2 commits into main


RaphaelBouvet

Hi,
Thank you for developing and releasing Tranception :)

While testing this model, I noticed that the MSA_processing class in tranception/utils/msa_utils.py becomes the bottleneck when the MSA is large.

This PR adds a Fast_MSA_processing class that is much faster 🔥 at the cost of higher memory usage.
For example, for an MSA with 21k sequences (roughly a 36x speed-up):

    fast processing: 13 sec
    base processing: 472 sec

Instead of the sequence-by-sequence comparisons done in the original code, I parallelize the calculation.
For large MSAs, doing all the comparisons at once does not fit in memory, so I split the calculation into sub-arrays.
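A minimal sketch of the idea, assuming the EVE-style weighting that MSA_processing appears to use (not re-checked line by line against msa_utils.py): sequences are one-hot encoded with gaps as all-zero vectors, and a sequence's weight is 1 / (number of sequences whose matches with it, divided by its own non-gap length, exceed 1 - theta), with theta = 0.2. The names one_hot_flat, weights_loop, weights_chunked and the chunk_size parameter are hypothetical, not the PR's actual code. weights_loop mirrors the sequence-by-sequence approach; weights_chunked computes chunk_size rows of the similarity matrix per matrix product, so only a (chunk_size, N) block is held in memory at a time.

```python
import numpy as np

# Assumption: 20 standard amino acids; gaps and unknown symbols become all-zero vectors.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_flat(msa, alphabet=ALPHABET):
    """One-hot encode aligned sequences into an (N, L * A) float32 matrix."""
    index = {aa: k for k, aa in enumerate(alphabet)}
    n, length = len(msa), len(msa[0])
    x = np.zeros((n, length, len(alphabet)), dtype=np.float32)
    for i, seq in enumerate(msa):
        for j, aa in enumerate(seq):
            k = index.get(aa)
            if k is not None:  # gaps ('-', '.') and unknown residues stay zero
                x[i, j, k] = 1.0
    return x.reshape(n, -1)


def weights_loop(x, theta=0.2):
    """Slow reference (hypothetical): one matrix-vector product per sequence."""
    weights = np.zeros(len(x))
    for i, seq in enumerate(x):
        n_pos = seq @ seq  # number of non-gap positions in sequence i
        if n_pos > 0:
            identity = (x @ seq) / n_pos  # matches with every sequence / own length
            weights[i] = 1.0 / np.sum(identity > 1 - theta)
    return weights


def weights_chunked(x, theta=0.2, chunk_size=1024):
    """Same weights (hypothetical helper), chunk_size rows per matrix-matrix product."""
    n = len(x)
    weights = np.zeros(n)
    n_pos = np.einsum("ij,ij->i", x, x)  # non-gap positions of every sequence
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        matches = x[start:stop] @ x.T  # (chunk, N) block: memory grows with chunk_size
        denom = n_pos[start:stop, None]
        identity = np.divide(matches, denom, out=np.zeros_like(matches), where=denom > 0)
        n_neighbors = np.sum(identity > 1 - theta, axis=1)  # includes the sequence itself
        # All-gap sequences get weight 0: a convention chosen here, not taken from the PR.
        weights[start:stop] = np.where(n_pos[start:stop] > 0,
                                       1.0 / np.maximum(n_neighbors, 1), 0.0)
    return weights


msa = ["ACDEFG", "ACDEFH", "ACDKLM", "ACDKLN", "------", "ACDEF-"]
x = one_hot_flat(msa)
w = weights_chunked(x, chunk_size=4)  # two chunks: rows 0-3, then rows 4-5
assert np.array_equal(w, weights_loop(x))  # w holds 1/3, 1/3, 1/2, 1/2, 0, 1/3
```

In this sketch the speed-up comes from replacing N matrix-vector products with about N / chunk_size matrix-matrix products, and chunk_size is the knob that trades speed for peak memory.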

  • In my tests, the resulting weights are identical to the pre-released weights.
  • Memory usage can be tuned manually by changing the sub-array size (perhaps this could be set automatically based on the user's available RAM).
  • The code might not work if the MSA contains empty sequences (not tested).
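For the RAM point above, one option is to derive the sub-array size from a memory budget. A rough sketch, assuming float32 entries and about three (chunk_size, N) temporaries alive per step (similarity block, normalized identities, comparison mask), which over-counts the boolean mask; chunk_size_for_budget is a hypothetical helper, not part of the PR. For scale, a full 21k x 21k float32 similarity matrix alone would need 21,000² x 4 bytes ≈ 1.76 GB. The budget itself could come from something like psutil.virtual_memory().available if psutil is installed; that is left out here to keep the example dependency-free.

```python
def chunk_size_for_budget(n_sequences: int, memory_budget_bytes: int,
                          bytes_per_entry: int = 4, n_temporaries: int = 3) -> int:
    """Rows per chunk so that the (chunk, N) temporaries stay within the budget.

    Conservative estimate: n_temporaries arrays of bytes_per_entry bytes each
    (float32 by default). Always returns a value between 1 and n_sequences.
    """
    bytes_per_row = n_sequences * bytes_per_entry * n_temporaries
    return max(1, min(n_sequences, memory_budget_bytes // bytes_per_row))


# Example: 21k sequences with a 1 GiB budget.
print(chunk_size_for_budget(21_000, 2**30))  # → 4260
```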

I am sure there is a better/faster way to do this calculation, but this method worked for me.

Do not hesitate to reach out if you have any questions,
Best wishes,
Raphaël
