Support for Awkward Arrays in AnnData's X #1235
Hi! Thanks for the feature request. I think that’s feasible, but I need to discuss this with @ivirshup and @ilan-gold. We need to formalize what the supported array types in all of anndata’s fields are.
I had hoped that this would eventually be solved with #244. Back in the PR that introduced awkward arrays, we decided against implementing them in X (for now) because it would have required duplicating a lot of custom code. Checking the constraints on X is already a huge mess, and adding the checks for awkward arrays would make it worse. Personally, I'd suggest you set …
This would mean that people who load complex EHR data will have an "empty" object. Yeah, everything is in a layer, but one needs to either always pass the layer argument when working with it or copy it to … It would also deviate from the rest of the scverse workflows, where the working data is usually in … I want everything in layers, but scverse is not there yet.
In scirpy, … It of course depends on your interface, but at least in the scirpy case, only very advanced users would want to interact with the awkward array directly. All others access it only through scirpy API calls (including a …
And why repeat the old mistake for new packages?
I would suggest you try working with it in … I would be interested in hearing how this goes.
Because it builds upon scanpy, which assumes that it works with …
Please describe your wishes and possible alternatives to achieve the desired result.
I'm thrilled to see that AnnData now supports awkward arrays. This feature has been incredibly useful. I'd like to ask whether there are plans to extend this support to AnnData's X. Implementing this would significantly benefit our ongoing work on ehrapy 2.0 (https://github.com/theislab/ehrapy) and EHRData.
To explain further: in our current use of AnnData with ehrapy, each patient is represented as a row with several variables. However, as shown in the figure below, some of the variables cannot fit into the current X (a NumPy array) because they are lists of lists or lists of dicts. Users still expect to process these data, for example by computing statistics (min/max/avg) or performing imputation. We therefore don't want to store these variables in .layers, .obsm, or .varm, because doing so is not user-friendly and makes it harder to integrate the data into computational workflows.
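To make the request concrete, here is a minimal sketch in plain Python of the per-patient processing described above. It assumes a hypothetical ragged variable (`heart_rate`, a made-up example, not from ehrapy) stored as a list of lists with one inner list per patient (row), and uses simple mean imputation as an illustrative choice, not ehrapy's actual method. With awkward arrays, the same per-row reductions would be written with reducers such as `ak.min(arr, axis=1)` and `ak.max(arr, axis=1)`.

```python
from statistics import mean
from typing import Optional

# Hypothetical ragged variable: one list of measurements per patient (row).
# The third patient has no recorded values, which imputation must handle.
heart_rate = [
    [72.0, 75.0, 71.0],
    [88.0, 90.0],
    [],
    [65.0],
]


def row_stats(values: list[float]) -> dict[str, Optional[float]]:
    """Per-patient min/max/avg; None when the patient has no measurements."""
    if not values:
        return {"min": None, "max": None, "avg": None}
    return {"min": min(values), "max": max(values), "avg": mean(values)}


def impute_mean(rows: list[list[float]]) -> list[list[float]]:
    """Illustrative imputation: replace each empty row with a single value,
    the mean of the per-patient averages of all non-empty rows."""
    observed = [mean(r) for r in rows if r]
    fill = mean(observed)
    return [list(r) if r else [fill] for r in rows]


stats = [row_stats(r) for r in heart_rate]
imputed = impute_mean(heart_rate)
```

Because each row has a different length, none of this fits a rectangular NumPy X, which is exactly why a ragged container such as an awkward array is being requested for X.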
Is there an estimated timeline for when we might expect this feature? Thanks for your continuous efforts in improving AnnData!