Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(feat): raising errors where backed is not supported #3048

Merged
merged 40 commits into from
May 21, 2024

Conversation

ilan-gold
Copy link
Contributor

@ilan-gold ilan-gold commented May 8, 2024

The idea here is to raise errors where I have checked that things currently don't work, regardless of the reason why, and do not make any attempt to fix this problem. Once scverse/anndata#1469 is merged, we can make concrete recommendations for how to handle out-of-core data.

I think a decorator could work but we would have to check the type in the decorator like (instead of relying on current checks like in filter_genes):

if isinstance(arg1, AnnData) and arg1.isbacked:
    raise NotImplementedErrror(...)

But then there is something like log1p where we quasi-support backed via this chunked kwarg, which would no really fit the above paradigm.

Nonetheless, I think I need to go one-by-one through the functions to check what we support and don't.

Separately, we may want to drop support where it exists already (which from my searching, is only obs_df and var_df and then subsample_counts).

  • Release notes not necessary because:

Copy link

codecov bot commented May 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.87%. Comparing base (23c20bc) to head (b0ca228).
Report is 47 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3048      +/-   ##
==========================================
+ Coverage   75.80%   75.87%   +0.06%     
==========================================
  Files         110      110              
  Lines       12502    12533      +31     
==========================================
+ Hits         9477     9509      +32     
+ Misses       3025     3024       -1     
Files Coverage Δ
scanpy/_utils/__init__.py 75.05% <100.00%> (+0.48%) ⬆️
scanpy/preprocessing/_pca.py 92.93% <100.00%> (+0.15%) ⬆️
scanpy/preprocessing/_scale.py 91.96% <100.00%> (+0.07%) ⬆️
scanpy/preprocessing/_simple.py 87.95% <100.00%> (+0.38%) ⬆️
scanpy/tools/_dendrogram.py 86.95% <100.00%> (+0.28%) ⬆️
scanpy/tools/_ingest.py 77.33% <100.00%> (+0.10%) ⬆️
scanpy/tools/_rank_genes_groups.py 94.33% <100.00%> (+0.01%) ⬆️
scanpy/tools/_score_genes.py 85.54% <100.00%> (+0.35%) ⬆️
scanpy/tools/_tsne.py 93.18% <100.00%> (+0.15%) ⬆️

... and 1 file with indirect coverage changes

@ilan-gold ilan-gold added this to the 1.10.2 milestone May 8, 2024
@ilan-gold ilan-gold self-assigned this May 8, 2024
@ilan-gold
Copy link
Contributor Author

@flying-sheep What do you think here? If the plan looks good for this subset of functions, I'd expand it.

@flying-sheep
Copy link
Member

I like the idea! Better error messages, and getting our modalities a bit under control is a great goal as well!

@ilan-gold
Copy link
Contributor Author

Looking at this again, now that I have gone through everything, I think we actually need to check types directly and shouldn't rely on isbacked because it is possible to do something like adata.layers['foo'] = sparse_dataset(g_layer) and this should also error our with a helpful message.

@flying-sheep
Copy link
Member

If that’s actually supported, we need to rethink isbacked anyway.

@ilan-gold
Copy link
Contributor Author

ilan-gold commented May 14, 2024

If that’s actually supported

If what is actually supported? sparse_dataset is exported from experimental and in any case, does being more exhaustive hurt? I think we have been telling people to use sparse_dataset if it suits them.

@ilan-gold
Copy link
Contributor Author

ilan-gold commented May 15, 2024

  1. How do we xfail stuff from dev? https://dev.azure.com/scverse/scanpy/_build/results?buildId=6692&view=logs&j=cb4d9293-b492-5d67-02b0-e6a595893958&t=99aeec2e-a40e-57fc-1ab3-27c1a626c3e0&l=108 It looks like the UMAP package via pynndescent is using something that has been removed (np.infty) in an upcoming release of numpy
  2. Codecov, I think, is outright wrong aklthough that might have to do with the failing dev test

@ilan-gold
Copy link
Contributor Author

@flying-sheep flying-sheep self-requested a review May 16, 2024 08:44
@flying-sheep
Copy link
Member

How do we xfail stuff from dev?

pytest.mark.xfail takes a condition:

xfail_if_dev_tests = pytest.mark.xfail(
    os.environ.get("DEPENDENCIES_VERSION", "latest") == "pre-release",
    reason="...",
)

@xfail_if_dev_tests
def test_xzy(): ...

You probably need to change the tests so it makes the CI variable visible as an env variable, I’m not an Azure expert so I don’t know if it already is.

Codecov, I think, is outright wrong aklthough that might have to do with the failing dev test

Yeah, maybe, let’s see once everything passes.

I’m also OK with lowering the percentage, I just set it to 75% to have some indication if codecov is broken or working. (Before it would report 20% for a PR and there would be no visual indication that that’s a problem)

Copy link
Member

@flying-sheep flying-sheep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

@ilan-gold
Copy link
Contributor Author

@ilan-gold ilan-gold merged commit 3ba3f46 into main May 21, 2024
14 checks passed
@ilan-gold ilan-gold deleted the ig/backed_not_implemented branch May 21, 2024 08:19
meeseeksmachine pushed a commit to meeseeksmachine/scanpy that referenced this pull request May 21, 2024
flying-sheep added a commit that referenced this pull request Jun 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AxisError when calculating QC metrics on backed data Out of Memory filtering returns TypeError
2 participants