first attempt to support awkward arrays #647

Merged: 129 commits, Feb 7, 2023
Changes from 83 commits
Commits
5604eac
first attempt to support awkward arrays
giovp Nov 14, 2021
7dbe908
remove comments
giovp Nov 14, 2021
c0bbf5a
better comment
giovp Nov 14, 2021
0281324
add type to gen_adata
giovp Nov 15, 2021
624a529
first attempt at concat
giovp Nov 15, 2021
05c6c75
remove comment
giovp Nov 15, 2021
3d359de
add outer concat
giovp Nov 15, 2021
9bf0cb9
add awkward to test dep
giovp Nov 15, 2021
974040c
add awk arr to data gen
giovp Nov 15, 2021
13c4d59
fix test base
giovp Nov 16, 2021
74ae9e3
init test for concat
giovp Nov 16, 2021
1d0e629
fix concatenate tests
giovp Nov 19, 2021
aeba549
create mock class for awkward array
giovp Nov 25, 2021
88a5c83
remove space
giovp Nov 25, 2021
15b3d1a
import ak when needed
giovp Nov 28, 2021
7e6beaa
relative import of awk array
giovp Nov 28, 2021
77d5b6c
fix optional dep import
giovp Nov 29, 2021
4aa3d26
resolve conflicts
giovp Nov 29, 2021
3670d8b
Merge branch 'master' into val_shape
giovp Nov 29, 2021
72977c8
Merge branch 'master' into val_shape
giovp Feb 10, 2022
06032b2
merge and pre-commits
giovp Mar 3, 2022
1704aa7
fix merge
giovp Mar 3, 2022
ccc28c2
draft IO for akward arrays
giovp Mar 4, 2022
e524389
add awkward to docs and save form to attrs
giovp Mar 4, 2022
bd2f28d
Merge remote-tracking branch 'origin/master' into val_shape
grst Jul 19, 2022
a928198
Update dependencies
grst Jul 19, 2022
fee56ee
Update dim_len
grst Jul 19, 2022
0775e53
ignore vscode directory
grst Jul 19, 2022
4b89a9b
Validate that awkward arrays align to axes
grst Jul 19, 2022
9d56157
Fix reindexing during merge
grst Jul 19, 2022
d14de3e
fix lint
grst Jul 19, 2022
4d62e7e
remove duplicate import
grst Jul 19, 2022
c4c1b3f
Test different types of awkward arrays in different slots
grst Jul 19, 2022
339bce8
Better function to generate awkward arrays
grst Jul 19, 2022
012de5e
Better dim_len for awkward arrays
grst Jul 19, 2022
7884598
Working out how to best check the dim_len
grst Jul 19, 2022
e16ae35
Only accept awkward arrays that are "regular" in the aligned dimension
grst Jul 20, 2022
0bced2f
Merge remote-tracking branch 'origin/master' into val_shape
grst Aug 12, 2022
588b6af
Switch to v2 API
grst Aug 12, 2022
e687e19
WIP rewrite awkward array generation
grst Aug 12, 2022
41b1423
Improve awkward array generation and dim_len check
grst Aug 12, 2022
fa8a386
Switch to new awkward array generation in all tests
grst Aug 12, 2022
733937a
Fix test_transpose
grst Aug 12, 2022
ed532a2
Fix/workaround more tests
grst Aug 12, 2022
e5706c3
Add test for setting anndata slots to awkward arrays
grst Aug 13, 2022
fceab1b
enable tests for 3d ragged array in layers
grst Aug 13, 2022
ef0637a
Cleanup
grst Aug 13, 2022
06608e9
Fix that X could not be set when creating AnnData object from scratch.
grst Aug 13, 2022
3c46363
Remove code to make awkward array regular after merge.
grst Aug 19, 2022
285e3b3
Do not explicitly copy awkward arrays
grst Aug 29, 2022
1dc93a6
Merge branch 'master' into val_shape
grst Aug 29, 2022
7fa65dd
Implement transposing awkward arrays
grst Aug 29, 2022
32c44cf
Add docs stub and update type hints
grst Aug 29, 2022
5e1c1da
Fix: dtype not available during merge if both X are awkward
grst Aug 29, 2022
08154f7
Fix IO
ivirshup Aug 29, 2022
71f0471
Request pre-release version of awkward
grst Aug 30, 2022
2e66409
Exclude awkward layer in loom tests
grst Aug 30, 2022
6cdcaa0
Merge branch 'master' into val_shape
grst Aug 30, 2022
741af1c
Pull in only changes relevant to obsm/varm
grst Aug 30, 2022
c027669
Merge branch 'val_shape' of github.com:scverse/anndata into val_shape
grst Aug 30, 2022
7f7ebb6
Update tests
grst Aug 30, 2022
771b2ab
Fix type hints
grst Aug 30, 2022
4922603
Update error message in algined mapping
grst Aug 31, 2022
c5c5335
Use compat module to support both awkward v1.9rc and 2.x
grst Aug 31, 2022
8e7a725
Merge branch 'master' into val_shape
grst Aug 31, 2022
c3ccf2f
restructure tests
grst Aug 31, 2022
a8e1648
Add tests for copies and view
grst Aug 31, 2022
c9a6417
Remove unused imoport
grst Aug 31, 2022
d836999
Fix how actual shape is computed in aligned mapping
grst Sep 2, 2022
ed95d8f
Attempt to support views with ak.behavior
grst Sep 2, 2022
f7edc67
Use shallow copy
grst Sep 2, 2022
5a1d056
Add dim_len_awkward function including tests
grst Sep 3, 2022
83effad
Test that assigning an awkward v1 arrays fails
grst Sep 3, 2022
21a4b5f
Add stub for element-wise IO tests
grst Sep 3, 2022
4ff5851
Restructur dim_len_awkward
grst Sep 4, 2022
2c59b19
Add more test cases for awkward IO
grst Sep 4, 2022
988579e
WIP add tests for concatenating AwkArrays with missing values
grst Sep 4, 2022
504cae1
Fix AwkwardArrayView
grst Sep 4, 2022
4dc0826
Simplify awkward array view code
grst Sep 4, 2022
3ab5646
Use None to remove name from awkward array
grst Sep 5, 2022
371f66e
Mark test_no_awkward_v1 as xfail for uns
grst Sep 5, 2022
d2eaf66
Add test for categorical arrays
grst Sep 6, 2022
7b57167
Update docs/fileformat-prose.rst
grst Sep 6, 2022
b3678b6
Update anndata/_core/aligned_mapping.py
grst Sep 6, 2022
d523c89
Update anndata/tests/helpers.py
grst Sep 6, 2022
222998d
Update awkward tests to use assert_equal with exact=True
grst Sep 6, 2022
3fc9817
Bump required version
ivirshup Sep 8, 2022
6a6657b
Update categorical syntax, add new categorical test
ivirshup Sep 8, 2022
7ac4a0c
Start concat tests for awkward
ivirshup Sep 13, 2022
c016725
Merge branch 'master' into val_shape
ivirshup Sep 13, 2022
0340151
Add release notes
grst Sep 26, 2022
02365a6
Add testcases for dim_len with awkward arrays of strings
grst Sep 26, 2022
c20cc31
Fix dim_len for arrays of strings
grst Sep 26, 2022
8421ee6
Merge branch 'master' into val_shape
giovp Dec 14, 2022
2d024f1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 14, 2022
ddefdcf
Merge branch 'master' into val_shape
grst Jan 2, 2023
50a8dc3
Awkward v2 fixes
grst Jan 2, 2023
fe27b74
Exclude awkward arrays from fill_value concat test
grst Jan 2, 2023
2aed5b6
fix flake8
grst Jan 2, 2023
746ba7d
Merge branch 'master' into val_shape
ivirshup Jan 23, 2023
a589820
Add IO testcase for AIRR data
grst Jan 27, 2023
c26db5b
Merge branch 'master' into val_shape
ivirshup Jan 30, 2023
cd1a451
Fix link
ivirshup Jan 30, 2023
9b2ff61
Get inner join working for concatenation
ivirshup Jan 30, 2023
7637fe3
Merge branch 'master' into val_shape
ivirshup Jan 30, 2023
52a804a
Bump some concatenation cases to a later PR
ivirshup Jan 31, 2023
75e7526
Generate empty arrays for outer join
ivirshup Feb 1, 2023
d3d1d26
Raise NotImplementedError when creating a view of an awkward array wi…
grst Feb 1, 2023
77e3953
Add warning when setting awkward array in aligned mapping
grst Feb 1, 2023
536f729
Merge branch 'master' into val_shape
grst Feb 1, 2023
cfe200e
Get much more of concatenation 'working'
ivirshup Feb 1, 2023
d6d35bd
Merge branch 'master' into val_shape
ivirshup Feb 1, 2023
cf4ad03
Use warning instead of logging
ivirshup Feb 1, 2023
46d553f
extend todo comment about views
grst Feb 2, 2023
e8eeb54
Fix IO, and to_memory for views of awkward arrays
ivirshup Feb 2, 2023
5ab0708
Removed a number of test cases that we're not targeting
ivirshup Feb 2, 2023
5b39691
Implement outer indexing on axis 0 of an awkward array
ivirshup Feb 2, 2023
45a9958
Fix gen_awkward when one of the dimensions has size 0
ivirshup Feb 2, 2023
94aa4ef
Fix equality function for awkward arrays. Was throwing an error when …
ivirshup Feb 2, 2023
99853d5
Modify outer concatenation test to accept current behaviour of awkwar…
ivirshup Feb 2, 2023
cd2abdd
Merge branch 'master' into val_shape
ivirshup Feb 2, 2023
96bfe31
Add tests for mixed type concatenation with awkward arrays
ivirshup Feb 2, 2023
4a6d119
Add warning about outer joins
ivirshup Feb 3, 2023
4243ccc
Call ak._util.arrays_approx_equal instead of rolling our own
ivirshup Feb 3, 2023
5ad915a
update awkward to 2.0.7 (unfortunately: errors)
ivirshup Feb 6, 2023
07246cc
remove unnecessary checks from AwkwardArrayView
grst Feb 6, 2023
fb137af
Workaround scikit-hep/awkward#2209
ivirshup Feb 7, 2023
6e32637
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 7, 2023
3883bb0
Removed extra layer of nesting from on-disk format for awkward arrays
ivirshup Feb 7, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -27,4 +27,5 @@ test.h5ad

# IDEs
/.idea/
/.vscode/

31 changes: 22 additions & 9 deletions anndata/_core/aligned_mapping.py
@@ -1,5 +1,6 @@
from abc import ABC, abstractmethod
from collections import abc as cabc
from copy import copy
from typing import Union, Optional, Type, ClassVar, TypeVar # Special types
from typing import Iterator, Mapping, Sequence # ABCs
from typing import Tuple, List, Dict # Generic base types
@@ -8,11 +9,12 @@
import pandas as pd
from scipy.sparse import spmatrix

from ..utils import deprecated, ensure_df_homogeneous
from ..utils import deprecated, ensure_df_homogeneous, dim_len
from . import raw, anndata
from .views import as_view
from .access import ElementRef
from .index import _subset
from anndata.compat import AwkArray


OneDIdx = Union[Sequence[int], Sequence[bool], slice]
@@ -47,14 +49,22 @@ def _ipython_key_completions_(self) -> List[str]:
def _validate_value(self, val: V, key: str) -> V:
"""Raises an error if value is invalid"""
for i, axis in enumerate(self.axes):
if self.parent.shape[axis] != val.shape[i]:
if self.parent.shape[axis] != dim_len(val, i):
right_shape = tuple(self.parent.shape[a] for a in self.axes)
raise ValueError(
f"Value passed for key {key!r} is of incorrect shape. "
f"Values of {self.attrname} must match dimensions "
f"{self.axes} of parent. Value had shape {val.shape} while "
f"it should have had {right_shape}."
)
actual_shape = tuple(dim_len(val, a) for a, _ in enumerate(self.axes))
if actual_shape[i] is None and isinstance(val, AwkArray):
raise ValueError(
f"The AwkwardArray is of variable length in dimension {i}. "
f"Try ak.to_regular(array, {i}) before including the array in AnnData"
)
else:
raise ValueError(
f"Value passed for key {key!r} is of incorrect shape. "
f"Values of {self.attrname} must match dimensions "
f"{self.axes} of parent. Value had shape {actual_shape} while "
f"it should have had {right_shape}."
)

if not self._allow_df and isinstance(val, pd.DataFrame):
name = self.attrname.title().rstrip("s")
val = ensure_df_homogeneous(val, f"{name} {key!r}")
@@ -84,7 +94,10 @@ def parent(self) -> Union["anndata.AnnData", "raw.Raw"]:
def copy(self):
d = self._actual_class(self.parent, self._axis)
for k, v in self.items():
d[k] = v.copy()
if isinstance(v, AwkArray):
d[k] = copy(v)
else:
d[k] = v.copy()
return d

def _view(self, parent: "anndata.AnnData", subset_idx: I):
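
The shape check in `_validate_value` relies on `dim_len` reporting `None` for a dimension whose length varies between rows. A minimal sketch of that idea on plain nested Python lists is given here; `dim_len_nested` is a hypothetical helper written for illustration and is not anndata's `dim_len`.

```python
from typing import Optional, Sequence


def dim_len_nested(data: Sequence, axis: int) -> Optional[int]:
    """Return the length of `data` along `axis`, or None if that axis is ragged.

    Hypothetical illustration on nested lists; empty input on an inner
    axis is also reported as None because no length can be determined.
    """
    if axis == 0:
        return len(data)
    # Collect the length every row reports for the remaining axes.
    lengths = {dim_len_nested(row, axis - 1) for row in data}
    # Exactly one distinct value means all rows agree (it may itself be None).
    if len(lengths) == 1:
        return lengths.pop()
    return None


regular = [[1, 2], [3, 4], [5, 6]]
ragged = [[1, 2, 3], [4]]
print(dim_len_nested(regular, 0), dim_len_nested(regular, 1))
print(dim_len_nested(ragged, 1))
```

A caller can then compare the result against the parent's shape and treat `None` as "variable length, cannot be aligned to this axis".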
7 changes: 4 additions & 3 deletions anndata/_core/anndata.py
@@ -45,7 +45,7 @@
)
from .sparse_dataset import SparseDataset
from .. import utils
from ..utils import convert_to_dict, ensure_df_homogeneous
from ..utils import convert_to_dict, ensure_df_homogeneous, dim_len
from ..logging import anndata_logger as logger
from ..compat import (
ZarrArray,
@@ -56,6 +56,7 @@
_move_adj_mtx,
_overloaded_uns,
OverloadedDict,
AwkArray,
)


@@ -1852,7 +1853,7 @@ def _check_dimensions(self, key=None):
if "obsm" in key:
obsm = self._obsm
if (
not all([o.shape[0] == self._n_obs for o in obsm.values()])
not all([dim_len(o, 0) == self._n_obs for o in obsm.values()])
and len(obsm.dim_names) != self._n_obs
):
raise ValueError(
@@ -1862,7 +1863,7 @@ def _check_dimensions(self, key=None):
if "varm" in key:
varm = self._varm
if (
not all([v.shape[0] == self._n_vars for v in varm.values()])
not all([dim_len(v, 0) == self._n_vars for v in varm.values()])
and len(varm.dim_names) != self._n_vars
):
raise ValueError(
9 changes: 8 additions & 1 deletion anndata/_core/index.py
@@ -7,7 +7,7 @@
import numpy as np
import pandas as pd
from scipy.sparse import spmatrix, issparse

from ..compat import AwkArray

Index1D = Union[slice, int, str, np.int64, np.ndarray]
Index = Union[Index1D, Tuple[Index1D, Index1D], spmatrix]
@@ -140,6 +140,13 @@ def _subset_df(df: pd.DataFrame, subset_idx: Index):
return df.iloc[subset_idx]


@_subset.register(AwkArray)
def _subset_awkarray(a: AwkArray, subset_idx: Index):
if all(isinstance(x, cabc.Iterable) for x in subset_idx):
subset_idx = np.ix_(*subset_idx)
return a[subset_idx]


# Registration for SparseDataset occurs in sparse_dataset.py
@_subset.register(h5py.Dataset)
def _subset_dataset(d, subset_idx):
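
The `_subset` registration above follows the `functools.singledispatch` pattern: one generic function, with a type-specific handler registered per container type. A self-contained sketch of the pattern is shown here. The `Ragged` class and the `subset` function are hypothetical stand-ins, not anndata or awkward APIs.

```python
from functools import singledispatch


class Ragged:
    """Hypothetical stand-in for an awkward array: rows of varying length."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]


@singledispatch
def subset(obj, row_idx):
    # Fallback for types without a registered handler.
    raise NotImplementedError(f"no subsetting rule for {type(obj).__name__}")


@subset.register(list)
def _subset_list(obj, row_idx):
    return [obj[i] for i in row_idx]


@subset.register(Ragged)
def _subset_ragged(obj, row_idx):
    # Select whole rows; the inner (variable-length) axis is left untouched.
    return Ragged(obj.rows[i] for i in row_idx)


picked = subset(Ragged([[1], [2, 3], []]), [1, 2])
print(picked.rows)
```

Adding support for a new array type then only requires one more `register` call, which is how the awkward handler slots in next to the DataFrame and sparse handlers.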
62 changes: 59 additions & 3 deletions anndata/_core/merge.py
@@ -17,8 +17,8 @@
from scipy.sparse import spmatrix

from .anndata import AnnData
from ..compat import Literal
from ..utils import asarray
from ..compat import Literal, AwkArray
from ..utils import asarray, dim_len

T = TypeVar("T")

@@ -128,6 +128,16 @@ def equal_sparse(a, b) -> bool:
return False


@equal.register(AwkArray)
def equal_awkward(a, b) -> bool:
from ..compat import awkward as ak

if dim_len(a, 0) == dim_len(b, 0):
return ak.all(a == b)
else:
return False


def as_sparse(x):
if not isinstance(x, sparse.spmatrix):
return sparse.csr_matrix(x)
@@ -341,12 +351,14 @@ def apply(self, el, *, axis, fill_value=None):

Missing values are to be replaced with `fill_value`.
"""
if self.no_change and (el.shape[axis] == len(self.old_idx)):
if self.no_change and (dim_len(el, axis) == len(self.old_idx)):
return el
if isinstance(el, pd.DataFrame):
return self._apply_to_df(el, axis=axis, fill_value=fill_value)
elif isinstance(el, sparse.spmatrix):
return self._apply_to_sparse(el, axis=axis, fill_value=fill_value)
elif isinstance(el, AwkArray):
return self._apply_to_awkward(el, axis=axis, fill_value=fill_value)
else:
return self._apply_to_array(el, axis=axis, fill_value=fill_value)

@@ -424,6 +436,21 @@ def _apply_to_sparse(self, el: spmatrix, *, axis, fill_value=None) -> spmatrix:

return out

def _apply_to_awkward(self, el: AwkArray, *, axis, fill_value=None):
if dim_len(el, axis) is None:
# Do not reindex variable-length dimensions
return el
else:
indexer = self.old_idx.get_indexer(self.new_idx)
if -1 in indexer:
raise NotImplementedError(
"Outer join operations are currently not supported with AwkwardArrays"
)
if axis == 0:
return el[indexer]
if axis == 1:
return el[:, indexer]
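
`_apply_to_awkward` reindexes by asking the old index where each new label sits and refuses to continue if any label is missing (position `-1`). The sketch below mimics that logic with plain lists. `get_indexer` here is a hypothetical pure-Python imitation of the behaviour of `pandas.Index.get_indexer`, and `reindex_rows` is an illustrative name, not part of anndata.

```python
def get_indexer(old_labels, new_labels):
    """For each new label, return its position in old_labels, or -1 if absent."""
    position = {label: i for i, label in enumerate(old_labels)}
    return [position.get(label, -1) for label in new_labels]


def reindex_rows(rows, old_labels, new_labels):
    """Reorder/select rows so they follow new_labels (inner-join style only)."""
    indexer = get_indexer(old_labels, new_labels)
    if -1 in indexer:
        # A missing label would need a fill value, i.e. an outer join.
        raise NotImplementedError(
            "labels missing from the source are not supported in this sketch"
        )
    return [rows[i] for i in indexer]


rows = [[1], [2, 3], []]
print(reindex_rows(rows, ["a", "b", "c"], ["c", "a"]))
```

The `-1` check is what separates the supported case (every requested label exists) from the outer-join case, where rows would have to be invented.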


def merge_indices(
inds: Iterable[pd.Index], join: Literal["inner", "outer"]
@@ -490,6 +517,19 @@ def concat_arrays(arrays, reindexers, axis=0, index=None, fill_value=None):
)
df.index = index
return df
elif any(isinstance(a, AwkArray) for a in arrays):
from ..compat import awkward as ak

if not all(
isinstance(a, AwkArray) or a is MissingVal or 0 in a.shape for a in arrays
):
raise NotImplementedError(
"Cannot concatenate an AwkwardArray with other array types."
)

return ak.concatenate(
[f(a, axis=1 - axis) for f, a in zip(reindexers, arrays)], axis=axis
)
elif any(isinstance(a, sparse.spmatrix) for a in arrays):
sparse_stack = (sparse.vstack, sparse.hstack)[axis]
return sparse_stack(
@@ -535,6 +575,14 @@ def gen_inner_reindexers(els, new_index, axis: Literal[0, 1] = 0):
lambda x, y: x.intersection(y), (df_indices(el) for el in els)
)
reindexers = [Reindexer(df_indices(el), common_ind) for el in els]
elif any(isinstance(el, AwkArray) for el in els if not_missing(el)):
if not all(isinstance(el, AwkArray) for el in els if not_missing(el)):
raise NotImplementedError(
"Cannot concatenate an AwkwardArray with other array types."
)
# do not reindex awkward arrays
# TODO unintended behaviour?
reindexers = [lambda *args, **kwargs: args[0] for _ in els]
Contributor

What do these reindexers do during merge and what happens if we don't reindex?

Member

Are you asking what Reindexers do generally, or specifically what the ones defined here do?

Contributor

These ones specifically.

I think they are relevant for subsetting columns if obsm/varm is a data frame? In that case I think it would indeed be the right thing to do nothing for awkward arrays, as the second axis is not aligned to anything and could be of variable length.

Member

> I think they are relevant for subsetting columns if obsm/varm is a data frame?

Yes. It's for taking the intersection of the available columns, so we are just concatenating those for an inner join.

For non-labelled arrays (ndarrays and sparse) we take minimum of the column axis size.

> I think it would indeed be the right thing to do nothing for awkward arrays, as the second axis is not aligned to anything and could be of variable length.

That's where I get a little confused, since it depends on the dimensionality of the awkward array.

I think an inner join would take the intersection along all axes apart from the one we're concatenating along.

For example, let's say we have:

awk_a = ak.Array([
    {"a": [1, 2, 3], "b": [1, 2]},
    {"a": [4, 5], "b": [3, 4]},
    {"a": [6], "b": [5]},
])


awk_b = ak.Array([
    {"a": [1, 2, 3]},
    {"a": [4, 5]},
    {"a": [6]},
])

For an inner join I think we take the intersection of keys from the Record dimension, dropping the "b" key from the record type.

For an outer join, I think I would expect an array where b is an optional type.

However, I see that Awkward Array makes a union type here, so I'm not too sure what's right.
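
The inner/outer semantics proposed in this comment can be sketched with plain lists of dicts standing in for record arrays. This is only an illustration of the proposal: `concat_records` and `field_names` are hypothetical helpers, and the result is not what `ak.concatenate` returns (per the comment, that produces a union type).

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def field_names(records: List[Record]) -> set:
    """Union of the keys that appear in any record."""
    names = set()
    for rec in records:
        names.update(rec)
    return names


def concat_records(a: List[Record], b: List[Record], join: str = "inner") -> List[Record]:
    """Concatenate record lists, keeping shared fields (inner) or all fields (outer)."""
    if join == "inner":
        keep = field_names(a) & field_names(b)
    elif join == "outer":
        keep = field_names(a) | field_names(b)
    else:
        raise ValueError(f"unknown join: {join!r}")
    ordered = sorted(keep)
    # Missing fields become None, mirroring an optional type in the outer case.
    return [{k: rec.get(k) for k in ordered} for rec in a + b]


awk_a = [
    {"a": [1, 2, 3], "b": [1, 2]},
    {"a": [4, 5], "b": [3, 4]},
    {"a": [6], "b": [5]},
]
awk_b = [{"a": [1, 2, 3]}, {"a": [4, 5]}, {"a": [6]}]

print(concat_records(awk_a, awk_b, join="inner")[0])
print(concat_records(awk_a, awk_b, join="outer")[3])
```

With the inner join the `"b"` field is dropped everywhere; with the outer join the records from `awk_b` carry `"b": None`.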

Contributor (grst, Aug 29, 2022)

This would only work for a RecordType, what would you do for a ListType?

awk_a = ak.Array([
    [1, 2, 3],
    [1, 2, 3, 4],
])

awk_b = ak.Array([
    [],
    [1, 2],
    [1, 2, 3, 4, 5, 6],
])

Also, even when you are using a RecordType, unlike for a dataframe, there's no need that every record has the same columns. So I would keep it simple and just concatenate them.

Member

> what would you do for a ListType?

A record type has a fixed set of keys, while a list type is inherently unsized. I'd consider ListTypes equivalent, while RecordTypes are only aligned if the keys are the same.

I sorta see it as we have already defined inner (intersection) and outer (union) for Record and Regular dimensions. I think they don't really have a meaning for List dimensions, so don't do anything.

> Also, even when you are using a RecordType, unlike for a dataframe, there's no need that every record has the same columns. So I would keep it simple and just concatenate them.

This is one option. That join has no meaning for awkward arrays.

But, I think there is quite a lot of usefulness to the inner join. We are dropping fields which aren't represented for all samples. Do you actually want to always keep those around?

I think it also becomes confusing in the case where one of the dimensions is Regular. Now we can have different Regular dimensions per row. What does this mean for concatenating values in X?

Contributor

I'm a bit afraid that this ends up being a rabbit hole, as the arrays can be arbitrarily nested. If you have a record type that's embedded in a variable-length list type in the 6th dimension, would you still want to attempt an inner/outer join? There might also be nasty UnionType combinations (technically it is possible to have both records and lists in the same ListType).

Could we get away with not performing a merge for now, and implement a merging strategy later (if requested) in a backwards-compatible manner?

Member

> Could we get away with not performing a merge for now, and implement a merging strategy later (if requested) in a backwards-compatible manner?

I'm not sure how we could do this in a backwards compatible way, since the current logic isn't a subset of the intersection. If you want to punt on this I think we can say awkward array support is experimental and the concatenation logic will change in the future.


I don't think the arbitrarily nested types would be too difficult to work with. We could just iterate through the levels applying the same logic at each level.

UnionTypes however, I'm not so sure how to deal with.

Contributor

I suggest we go with

> Also, even when you are using a RecordType, unlike for a dataframe, there's no need that every record has the same columns. So I would keep it simple and just concatenate them.

> This is one option. That join has no meaning for awkward arrays.

for now.

We can create an issue to track adding support for inner joins of awkward arrays (or at least various special cases thereof, such as RecordTypes in the second dimension). Since we decided on releasing this as experimental, I wouldn't worry too much about backwards compatibility.
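
The "do not reindex, just concatenate" behaviour settled on in this thread can be sketched as follows. Plain nested lists stand in for awkward arrays, and `identity_reindexer` and `concat_ragged` are hypothetical names used only for this illustration.

```python
def identity_reindexer(el, *args, **kwargs):
    """Reindexer that leaves the element unchanged, whatever axis is requested."""
    return el


def concat_ragged(arrays):
    """Concatenate ragged arrays along axis 0 without touching the inner axis."""
    reindexers = [identity_reindexer for _ in arrays]
    out = []
    for reindex, arr in zip(reindexers, arrays):
        out.extend(reindex(arr, axis=1))
    return out


awk_a = [[1, 2, 3], [1, 2, 3, 4]]
awk_b = [[], [1, 2], [1, 2, 3, 4, 5, 6]]
print(concat_ragged([awk_a, awk_b]))
```

Every row keeps its own length, so nothing is dropped or padded; the cost is that `join="inner"` and `join="outer"` give the same result for these arrays.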

else:
min_ind = min(el.shape[alt_axis] for el in els)
reindexers = [
@@ -552,6 +600,14 @@ def gen_outer_reindexers(els, shapes, new_index: pd.Index, *, axis=0):
else (lambda x: pd.DataFrame(index=range(shape)))
for el, shape in zip(els, shapes)
]
elif any(isinstance(el, AwkArray) for el in els if not_missing(el)):
if not all(isinstance(el, AwkArray) for el in els if not_missing(el)):
raise NotImplementedError(
"Cannot concatenate an AwkwardArray with other array types."
)
# do not reindex awkward arrays
# TODO unintended behaviour?
reindexers = [lambda *args, **kwargs: args[0] for _ in els]
else:
# if fill_value is None:
# fill_value = default_fill_value(els)
67 changes: 65 additions & 2 deletions anndata/_core/views.py
@@ -1,5 +1,6 @@
from contextlib import contextmanager
from copy import deepcopy
from copy import copy, deepcopy
from enum import Enum
from functools import reduce, singledispatch, wraps
from typing import Any, KeysView, Optional, Sequence, Tuple
import warnings
@@ -11,7 +12,7 @@

from anndata._warnings import ImplicitModificationWarning
from .access import ElementRef
from ..compat import ZappyArray
from ..compat import ZappyArray, AwkArray


class _SetItemMixin:
@@ -157,6 +158,68 @@ def as_view_zappy(z, view_args):
return z


try:
from ..compat import awkward as ak
import weakref

# Registry to store weak references from AwkwardArrayViews to their parent AnnData container
_registry = weakref.WeakValueDictionary()
_PARAM_NAME = "_view_args"

class AwkwardArrayView(_ViewMixin, AwkArray):
@property
def _view_args(self):
"""Override _view_args to retrieve the values from awkward arrays parameters.

Awkward arrays cannot be subclassed like other python objects. Instead subclasses need
to be attached as "behavior". These "behaviors" cannot take any additional parameters (as we do
for other data types to store `_view_args`). Therefore, we need to store `_view_args` using awkward's
parameter mechanism. These parameters need to be json-serializable, which is why we can't store
ElementRef directly, but need to replace the reference to the parent AnnDataView container with a weak
reference.
"""
parent_key, attrname, keys = self.layout.parameter(_PARAM_NAME)
if parent_key is None or attrname is None or keys is None:
raise KeyError(
"AwkwardArrayView does not hold reference to original AnnData object."
)
else:
try:
parent = _registry[parent_key]
except KeyError:
raise KeyError(
"AwkwardArrayView has invalid reference to original AnnData object."
)
else:
return ElementRef(parent, attrname, keys)

def __copy__(self) -> AwkArray:
"""
Turn the AwkwardArrayView into an actual AwkwardArray with no special behavior.

Need to override __copy__ instead of `.copy()` as awkward arrays don't implement `.copy()`
and are copied using python's standard copy mechanism in `aligned_mapping.py`.
"""
array = self
# makes a shallow copy and removes the reference to the original AnnData object
array = ak.with_parameter(self, _PARAM_NAME, None)
array = ak.with_name(array, None)
return array

@as_view.register(AwkArray)
def as_view_awkarray(array, view_args):
parent, attrname, keys = view_args
parent_key = f"target-{id(parent)}"
_registry[parent_key] = parent
array = ak.with_parameter(array, _PARAM_NAME, (parent_key, attrname, keys))
return ak.with_name(array, name="AwkwardArrayView")

ak.behavior["*", "AwkwardArrayView"] = AwkwardArrayView

except ImportError:
pass


def _resolve_idxs(old, new, adata):
t = tuple(_resolve_idx(old[i], new[i], adata.shape[i]) for i in (0, 1))
return t
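
The `AwkwardArrayView` code above stores only a string key in the array's parameters and resolves it to the parent through a `weakref.WeakValueDictionary`. The sketch below shows that registry pattern in isolation, using only the standard library. `Parent`, `make_view_params` and `resolve_parent` are hypothetical names for illustration and do not exist in anndata.

```python
import json
import weakref


class Parent:
    """Hypothetical container standing in for the AnnData view."""

    def __init__(self, name):
        self.name = name


# Maps string keys to parents without keeping the parents alive.
_registry = weakref.WeakValueDictionary()


def make_view_params(parent, attrname, keys):
    """Register the parent and return JSON-serializable view parameters."""
    parent_key = f"target-{id(parent)}"
    _registry[parent_key] = parent
    return [parent_key, attrname, list(keys)]


def resolve_parent(params):
    """Turn stored parameters back into (parent, attrname, keys)."""
    parent_key, attrname, keys = params
    try:
        parent = _registry[parent_key]
    except KeyError:
        raise KeyError("view has an invalid reference to its parent") from None
    return parent, attrname, tuple(keys)


owner = Parent("adata")
params = make_view_params(owner, "obsm", ("ragged",))
print(json.dumps(params))
print(resolve_parent(params)[0].name)
```

Because the registry holds weak references, an entry disappears once the parent is garbage collected, and a later lookup raises `KeyError` instead of keeping the parent alive through the view.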
30 changes: 30 additions & 0 deletions anndata/_io/specs/methods.py
@@ -29,6 +29,7 @@
)
from anndata._io.utils import report_write_key_on_error, check_key, H5PY_V3
from anndata._warnings import OldFormatWarning
from anndata.compat import AwkArray

from .registry import (
_REGISTRY,
@@ -481,6 +482,35 @@ def read_sparse_partial(elem, *, items=None, indices=(slice(None), slice(None)))
return SparseDataset(elem)[indices]


#################
# Awkward array #
#################


@_REGISTRY.register_write(H5Group, AwkArray, IOSpec("awkward-array", "0.1.0"))
@_REGISTRY.register_write(ZarrGroup, AwkArray, IOSpec("awkward-array", "0.1.0"))
def write_awkward(f, k, v, dataset_kwargs=MappingProxyType({})):
from anndata.compat import awkward as ak

group = f.create_group(k)
form, length, container = ak.to_buffers(ak.packed(v))
group.attrs["length"] = length
group.attrs["form"] = form.to_json()
write_elem(group, "container", container, dataset_kwargs=dataset_kwargs)


@_REGISTRY.register_read(H5Group, IOSpec("awkward-array", "0.1.0"))
@_REGISTRY.register_read(ZarrGroup, IOSpec("awkward-array", "0.1.0"))
def read_awkward(elem):
from anndata.compat import awkward as ak

form = _read_attr(elem.attrs, "form")
length = _read_attr(elem.attrs, "length")
container = read_elem(elem["container"])

return ak.from_buffers(form, length, container)
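
The writer and reader above split an array into a `form` (structure description), a `length`, and a `container` of flat buffers. A simplified, pure-Python analogy for a list of integer lists is sketched here. `ragged_to_buffers` and `ragged_from_buffers` are hypothetical helpers, and the form string is an invented placeholder rather than awkward's real form JSON.

```python
import json


def ragged_to_buffers(rows):
    """Flatten a list of lists into (form, length, container)."""
    offsets = [0]
    content = []
    for row in rows:
        content.extend(row)
        offsets.append(len(content))  # where the next row starts
    form = json.dumps({"kind": "list-of-int"})  # placeholder description
    container = {"offsets": offsets, "content": content}
    return form, len(rows), container


def ragged_from_buffers(form, length, container):
    """Rebuild the list of lists from the flat buffers."""
    if json.loads(form)["kind"] != "list-of-int":
        raise ValueError("unsupported form in this sketch")
    offsets, content = container["offsets"], container["content"]
    return [content[offsets[i] : offsets[i + 1]] for i in range(length)]


form, length, container = ragged_to_buffers([[1, 2, 3], [], [4, 5]])
print(container)
print(ragged_from_buffers(form, length, container))
```

Storing flat buffers plus a small description is what lets the on-disk format reuse ordinary array writers for the container while keeping the nesting information in attributes.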


##############
# DataFrames #
##############