Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up anndata writing speed after define_clonotype_clusters #556

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

felixpetschko
Copy link
Collaborator

The writing speed of the anndata object after the define_clonotype_clusters function is too slow which is currently one of the biggest bottlenecks in scirpy's analysis pipeline. The reason for that is the clonotype_id -> cell_ids mapping (dict[str, np.ndarray[str]) that gets stored to the anndata.uns attribute. Now I implemented a version where this mapping has the datatype dict[str, list[str]) and gets converted to a json object before storing it. This is quite fast and is very similar to the previous implementation such that only minor changes in the rest of the code were necessary.

@felixpetschko
Copy link
Collaborator Author

I made some measurements on my laptop:

image
In the first image we can see why we have a problem. Storing the anndata object takes ~28 times longer than the define_clonotype_clusters function itself.

Bild_2024-09-18_194502953
With the json approach the storage time can be reduced drastically and for around 100k cells it takes around the same execution time as the function itself.

Bild_2024-09-18_194533408
Here we see the speedup of the new approach.

@grst
Copy link
Collaborator

grst commented Sep 20, 2024

Maybe this can be fixed in anndata directly. There's no good reason for this to be slow. I'll open a ticket.

If they cant or dont want to fix it, then we can go with this workaround.

@grst
Copy link
Collaborator

grst commented Sep 21, 2024

scverse/anndata#1684

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants