
(fix): cache arrays in BaseCompressedSparseDataset #1744

Open: wants to merge 31 commits into main

@ilan-gold (Contributor) commented Nov 8, 2024

  • Noticed while creating the screenshot tutorial for the announcement
  • Tests added
  • Release note added (or unnecessary)


codecov bot commented Nov 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.55%. Comparing base (d61e09c) to head (1dcf7ad).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1744      +/-   ##
==========================================
- Coverage   87.01%   84.55%   -2.46%     
==========================================
  Files          40       40              
  Lines        6059     6075      +16     
==========================================
- Hits         5272     5137     -135     
- Misses        787      938     +151     
Files with missing lines Coverage Δ
src/anndata/_core/sparse_dataset.py 93.50% <100.00%> (+0.48%) ⬆️
src/anndata/_io/specs/lazy_methods.py 100.00% <100.00%> (ø)

... and 8 files with indirect coverage changes


@ilan-gold ilan-gold marked this pull request as ready for review November 8, 2024 11:19
@ilan-gold ilan-gold marked this pull request as draft November 8, 2024 11:19
Base automatically changed from ig/fix_chunking to main November 8, 2024 17:45
@ilan-gold ilan-gold marked this pull request as ready for review November 11, 2024 14:28
@flying-sheep (Member) left a comment

Does what it promises and has nice tests.

However, I think Isaac intentionally cached only the longest array. I forget the reason, though.

Sure, we now have much bigger data, so if there was a trade-off, things may be different now. But it may be worth checking.

Review thread on src/anndata/tests/helpers.py (outdated, resolved)
@ilan-gold (Contributor, Author)

Quoting @flying-sheep: "However I think Isaac only cached the longest array intentionally. I forgot the reason though."

Previously we only cached indptr, as an np.ndarray; i.e., we read it into memory because it has to be read in anyway every time you make an access.

As for the other arrays, I don't think there was a good reason; I think we just wanted to start with indptr. But I posted on Zulip to check.
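A minimal sketch of the caching pattern discussed above, assuming a group-like mapping with `data`, `indices` and `indptr` entries (as in an h5py or zarr group). The class `HypotheticalCSRDataset`, its `row` method and its `lookups` counter are illustrative only and are not anndata's actual `BaseCompressedSparseDataset` API. `indptr` is read fully into memory once because every row access needs it, while `data` and `indices` are cached only as handles and sliced on demand.

```python
from functools import cached_property

import numpy as np


class HypotheticalCSRDataset:
    """Hypothetical stand-in for a backed CSR matrix (not anndata's class).

    `group` is assumed to map "data", "indices" and "indptr" to array-like
    objects, as an h5py.Group or zarr.Group would. Plain NumPy arrays are
    used in the example below.
    """

    def __init__(self, group, shape):
        self.group = group
        self.shape = shape
        self.lookups = 0  # counts how often the underlying group is accessed

    def _get(self, key):
        self.lookups += 1
        return self.group[key]

    @cached_property
    def _indptr(self) -> np.ndarray:
        # indptr is needed for every access, so materialize it once in memory.
        return np.asarray(self._get("indptr"))

    @cached_property
    def _data(self):
        # Cache only the array handle; nothing is sliced or copied here.
        return self._get("data")

    @cached_property
    def _indices(self):
        # Same as _data: keep the handle, read slices on demand.
        return self._get("indices")

    def row(self, i: int) -> np.ndarray:
        # indptr[i]:indptr[i + 1] bounds the stored entries of row i.
        start, stop = self._indptr[i], self._indptr[i + 1]
        out = np.zeros(self.shape[1], dtype=self._data.dtype)
        out[self._indices[start:stop]] = self._data[start:stop]
        return out


group = {
    "data": np.array([1.0, 2.0, 3.0]),
    "indices": np.array([0, 2, 1]),
    "indptr": np.array([0, 2, 2, 3]),
}
ds = HypotheticalCSRDataset(group, shape=(3, 3))
print(ds.row(0))    # [1. 0. 2.]
print(ds.row(2))    # [0. 3. 0.]
print(ds.lookups)   # 3: each array was looked up only once
```

With every array cached, repeated row accesses no longer go back to the group; only the `data` and `indices` slices for the requested row are read.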

@ilan-gold (Contributor, Author)

(I was the one who started doing the caching)

@flying-sheep (Member)

Hm, since these arrays are storage classes anyway and probably don't cost any noticeable amount of memory, I don't foresee a problem.
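To illustrate why caching a storage-backed array handle should be cheap, here is a sketch that uses `numpy.memmap` as a stand-in for an on-disk array. This is an assumption for illustration: anndata's backed arrays are h5py or zarr objects, not memmaps. Opening the handle maps the file without reading it into memory up front; only an explicitly requested slice is copied into a regular array.

```python
import os
import tempfile

import numpy as np

n = 1_000_000  # ~8 MB of float64 values

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")

    # Write the values to disk through a writable memmap, then release it.
    writer = np.memmap(path, dtype=np.float64, mode="w+", shape=(n,))
    writer[:] = np.arange(n, dtype=np.float64)
    writer.flush()
    del writer

    # "Caching" the handle: the file is mapped, not read in up front.
    handle = np.memmap(path, dtype=np.float64, mode="r", shape=(n,))
    is_memmap = isinstance(handle, np.memmap)

    # Only the requested slice is copied into a plain in-memory ndarray.
    chunk = np.array(handle[10:15])
    del handle  # drop the mapping before the temporary directory is removed

print(is_memmap, chunk)  # True [10. 11. 12. 13. 14.]
```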

@ilan-gold ilan-gold modified the milestones: 0.11.1, 0.11.2 Nov 12, 2024
@ilan-gold (Contributor, Author)

Let's review this in the absence of an answer.

@ilan-gold (Contributor, Author)

(Or rather, give approval, since you already seem to have reviewed it.)
