first attempt to support awkward arrays #647

Merged
merged 129 commits into from
Feb 7, 2023
Changes from 124 commits
5604eac
first attempt to support awkward arrays
giovp Nov 14, 2021
7dbe908
remove comments
giovp Nov 14, 2021
c0bbf5a
better comment
giovp Nov 14, 2021
0281324
add type to gen_adata
giovp Nov 15, 2021
624a529
first attempt at concat
giovp Nov 15, 2021
05c6c75
remove comment
giovp Nov 15, 2021
3d359de
add outer concat
giovp Nov 15, 2021
9bf0cb9
add awkward to test dep
giovp Nov 15, 2021
974040c
add awk arr to data gen
giovp Nov 15, 2021
13c4d59
fix test base
giovp Nov 16, 2021
74ae9e3
init test for concat
giovp Nov 16, 2021
1d0e629
fix concatenate tests
giovp Nov 19, 2021
aeba549
create mock class for awkward array
giovp Nov 25, 2021
88a5c83
remove space
giovp Nov 25, 2021
15b3d1a
import ak when needed
giovp Nov 28, 2021
7e6beaa
relative import of awk array
giovp Nov 28, 2021
77d5b6c
fix optional dep import
giovp Nov 29, 2021
4aa3d26
resolve conflicts
giovp Nov 29, 2021
3670d8b
Merge branch 'master' into val_shape
giovp Nov 29, 2021
72977c8
Merge branch 'master' into val_shape
giovp Feb 10, 2022
06032b2
merge and pre-commits
giovp Mar 3, 2022
1704aa7
fix merge
giovp Mar 3, 2022
ccc28c2
draft IO for akward arrays
giovp Mar 4, 2022
e524389
add awkward to docs and save form to attrs
giovp Mar 4, 2022
bd2f28d
Merge remote-tracking branch 'origin/master' into val_shape
grst Jul 19, 2022
a928198
Update dependencies
grst Jul 19, 2022
fee56ee
Update dim_len
grst Jul 19, 2022
0775e53
ignore vscode directory
grst Jul 19, 2022
4b89a9b
Validate that awkward arrays align to axes
grst Jul 19, 2022
9d56157
Fix reindexing during merge
grst Jul 19, 2022
d14de3e
fix lint
grst Jul 19, 2022
4d62e7e
remove duplicate import
grst Jul 19, 2022
c4c1b3f
Test different types of awkward arrays in different slots
grst Jul 19, 2022
339bce8
Better function to generate awkward arrays
grst Jul 19, 2022
012de5e
Better dim_len for awkward arrays
grst Jul 19, 2022
7884598
Working out how to best check the dim_len
grst Jul 19, 2022
e16ae35
Only accept awkward arrays that are "regular" in the aligned dimension
grst Jul 20, 2022
0bced2f
Merge remote-tracking branch 'origin/master' into val_shape
grst Aug 12, 2022
588b6af
Switch to v2 API
grst Aug 12, 2022
e687e19
WIP rewrite awkward array generation
grst Aug 12, 2022
41b1423
Improve awkward array generation and dim_len check
grst Aug 12, 2022
fa8a386
Switch to new awkward array generation in all tests
grst Aug 12, 2022
733937a
Fix test_transpose
grst Aug 12, 2022
ed532a2
Fix/workaround more tests
grst Aug 12, 2022
e5706c3
Add test for setting anndata slots to awkward arrays
grst Aug 13, 2022
fceab1b
enable tests for 3d ragged array in layers
grst Aug 13, 2022
ef0637a
Cleanup
grst Aug 13, 2022
06608e9
Fix that X could not be set when creating AnnData object from scratch.
grst Aug 13, 2022
3c46363
Remove code to make awkward array regular after merge.
grst Aug 19, 2022
285e3b3
Do not explicitly copy awkward arrays
grst Aug 29, 2022
1dc93a6
Merge branch 'master' into val_shape
grst Aug 29, 2022
7fa65dd
Implement transposing awkward arrays
grst Aug 29, 2022
32c44cf
Add docs stub and update type hints
grst Aug 29, 2022
5e1c1da
Fix: dtype not available during merge if both X are awkward
grst Aug 29, 2022
08154f7
Fix IO
ivirshup Aug 29, 2022
71f0471
Request pre-release version of awkward
grst Aug 30, 2022
2e66409
Exclude awkward layer in loom tests
grst Aug 30, 2022
6cdcaa0
Merge branch 'master' into val_shape
grst Aug 30, 2022
741af1c
Pull in only changes relevant to obsm/varm
grst Aug 30, 2022
c027669
Merge branch 'val_shape' of github.com:scverse/anndata into val_shape
grst Aug 30, 2022
7f7ebb6
Update tests
grst Aug 30, 2022
771b2ab
Fix type hints
grst Aug 30, 2022
4922603
Update error message in algined mapping
grst Aug 31, 2022
c5c5335
Use compat module to support both awkward v1.9rc and 2.x
grst Aug 31, 2022
8e7a725
Merge branch 'master' into val_shape
grst Aug 31, 2022
c3ccf2f
restructure tests
grst Aug 31, 2022
a8e1648
Add tests for copies and view
grst Aug 31, 2022
c9a6417
Remove unused imoport
grst Aug 31, 2022
d836999
Fix how actual shape is computed in aligned mapping
grst Sep 2, 2022
ed95d8f
Attempt to support views with ak.behavior
grst Sep 2, 2022
f7edc67
Use shallow copy
grst Sep 2, 2022
5a1d056
Add dim_len_awkward function including tests
grst Sep 3, 2022
83effad
Test that assigning an awkward v1 arrays fails
grst Sep 3, 2022
21a4b5f
Add stub for element-wise IO tests
grst Sep 3, 2022
4ff5851
Restructur dim_len_awkward
grst Sep 4, 2022
2c59b19
Add more test cases for awkward IO
grst Sep 4, 2022
988579e
WIP add tests for concatenating AwkArrays with missing values
grst Sep 4, 2022
504cae1
Fix AwkwardArrayView
grst Sep 4, 2022
4dc0826
Simplify awkward array view code
grst Sep 4, 2022
3ab5646
Use None to remove name from awkward array
grst Sep 5, 2022
371f66e
Mark test_no_awkward_v1 as xfail for uns
grst Sep 5, 2022
d2eaf66
Add test for categorical arrays
grst Sep 6, 2022
7b57167
Update docs/fileformat-prose.rst
grst Sep 6, 2022
b3678b6
Update anndata/_core/aligned_mapping.py
grst Sep 6, 2022
d523c89
Update anndata/tests/helpers.py
grst Sep 6, 2022
222998d
Update awkward tests to use assert_equal with exact=True
grst Sep 6, 2022
3fc9817
Bump required version
ivirshup Sep 8, 2022
6a6657b
Update categorical syntax, add new categorical test
ivirshup Sep 8, 2022
7ac4a0c
Start concat tests for awkward
ivirshup Sep 13, 2022
c016725
Merge branch 'master' into val_shape
ivirshup Sep 13, 2022
0340151
Add release notes
grst Sep 26, 2022
02365a6
Add testcases for dim_len with awkward arrays of strings
grst Sep 26, 2022
c20cc31
Fix dim_len for arrays of strings
grst Sep 26, 2022
8421ee6
Merge branch 'master' into val_shape
giovp Dec 14, 2022
2d024f1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 14, 2022
ddefdcf
Merge branch 'master' into val_shape
grst Jan 2, 2023
50a8dc3
Awkward v2 fixes
grst Jan 2, 2023
fe27b74
Exclude awkward arrays from fill_value concat test
grst Jan 2, 2023
2aed5b6
fix flake8
grst Jan 2, 2023
746ba7d
Merge branch 'master' into val_shape
ivirshup Jan 23, 2023
a589820
Add IO testcase for AIRR data
grst Jan 27, 2023
c26db5b
Merge branch 'master' into val_shape
ivirshup Jan 30, 2023
cd1a451
Fix link
ivirshup Jan 30, 2023
9b2ff61
Get inner join working for concatenation
ivirshup Jan 30, 2023
7637fe3
Merge branch 'master' into val_shape
ivirshup Jan 30, 2023
52a804a
Bump some concatenation cases to a later PR
ivirshup Jan 31, 2023
75e7526
Generate empty arrays for outer join
ivirshup Feb 1, 2023
d3d1d26
Raise NotImplementedError when creating a view of an awkward array wi…
grst Feb 1, 2023
77e3953
Add warning when setting awkward array in aligned mapping
grst Feb 1, 2023
536f729
Merge branch 'master' into val_shape
grst Feb 1, 2023
cfe200e
Get much more of concatenation 'working'
ivirshup Feb 1, 2023
d6d35bd
Merge branch 'master' into val_shape
ivirshup Feb 1, 2023
cf4ad03
Use warning instead of logging
ivirshup Feb 1, 2023
46d553f
extend todo comment about views
grst Feb 2, 2023
e8eeb54
Fix IO, and to_memory for views of awkward arrays
ivirshup Feb 2, 2023
5ab0708
Removed a number of test cases that we're not targeting
ivirshup Feb 2, 2023
5b39691
Implement outer indexing on axis 0 of an awkward array
ivirshup Feb 2, 2023
45a9958
Fix gen_awkward when one of the dimensions has size 0
ivirshup Feb 2, 2023
94aa4ef
Fix equality function for awkward arrays. Was throwing an error when …
ivirshup Feb 2, 2023
99853d5
Modify outer concatenation test to accept current behaviour of awkwar…
ivirshup Feb 2, 2023
cd2abdd
Merge branch 'master' into val_shape
ivirshup Feb 2, 2023
96bfe31
Add tests for mixed type concatenation with awkward arrays
ivirshup Feb 2, 2023
4a6d119
Add warning about outer joins
ivirshup Feb 3, 2023
4243ccc
Call ak._util.arrays_approx_equal instead of rolling our own
ivirshup Feb 3, 2023
5ad915a
update awkward to 2.0.7 (unfortunately: errors)
ivirshup Feb 6, 2023
07246cc
remove unnecessary checks from AwkwardArrayView
grst Feb 6, 2023
fb137af
Workaround scikit-hep/awkward#2209
ivirshup Feb 7, 2023
6e32637
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 7, 2023
3883bb0
Removed extra layer of nesting from on-disk format for awkward arrays
ivirshup Feb 7, 2023
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -27,4 +27,5 @@ test.h5ad

# IDEs
/.idea/
/.vscode/

7 changes: 6 additions & 1 deletion anndata/__init__.py
Expand Up @@ -18,7 +18,12 @@
read_mtx,
read_zarr,
)
from ._warnings import OldFormatWarning, WriteWarning, ImplicitModificationWarning
from ._warnings import (
OldFormatWarning,
WriteWarning,
ImplicitModificationWarning,
ExperimentalFeatureWarning,
)

# backwards compat / shortcut for default format
from ._io import read_h5ad as read
48 changes: 39 additions & 9 deletions anndata/_core/aligned_mapping.py
@@ -1,18 +1,22 @@
from abc import ABC, abstractmethod
from collections import abc as cabc
from copy import copy
from typing import Union, Optional, Type, ClassVar, TypeVar # Special types
from typing import Iterator, Mapping, Sequence # ABCs
from typing import Tuple, List, Dict # Generic base types
import warnings

import numpy as np
import pandas as pd
from scipy.sparse import spmatrix

from ..utils import deprecated, ensure_df_homogeneous
from ..utils import deprecated, ensure_df_homogeneous, dim_len
from . import raw, anndata
from .views import as_view
from .access import ElementRef
from .index import _subset
from anndata.compat import AwkArray
from anndata._warnings import ExperimentalFeatureWarning


OneDIdx = Union[Sequence[int], Sequence[bool], slice]
Expand Down Expand Up @@ -46,15 +50,37 @@ def _ipython_key_completions_(self) -> List[str]:

def _validate_value(self, val: V, key: str) -> V:
"""Raises an error if value is invalid"""
if isinstance(val, AwkArray):
warnings.warn(
"Support for Awkward Arrays is currently experimental. "
"Behavior may change in the future. Please report any issues you may encounter!",
ExperimentalFeatureWarning,
# stacklevel=3,
)
# Prevent from showing up every time an awkward array is used
# You'd think `once` works, but it doesn't at the repl and in notebooks
warnings.filterwarnings(
"ignore",
category=ExperimentalFeatureWarning,
message="Support for Awkward Arrays is currently experimental.*",
)
for i, axis in enumerate(self.axes):
if self.parent.shape[axis] != val.shape[i]:
if self.parent.shape[axis] != dim_len(val, i):
right_shape = tuple(self.parent.shape[a] for a in self.axes)
raise ValueError(
f"Value passed for key {key!r} is of incorrect shape. "
f"Values of {self.attrname} must match dimensions "
f"{self.axes} of parent. Value had shape {val.shape} while "
f"it should have had {right_shape}."
)
actual_shape = tuple(dim_len(val, a) for a, _ in enumerate(self.axes))
if actual_shape[i] is None and isinstance(val, AwkArray):
raise ValueError(
f"The AwkwardArray is of variable length in dimension {i}. "
f"Try ak.to_regular(array, {i}) before including the array in AnnData."
)
else:
raise ValueError(
f"Value passed for key {key!r} is of incorrect shape. "
f"Values of {self.attrname} must match dimensions "
f"{self.axes} of parent. Value had shape {actual_shape} while "
f"it should have had {right_shape}."
)

if not self._allow_df and isinstance(val, pd.DataFrame):
name = self.attrname.title().rstrip("s")
val = ensure_df_homogeneous(val, f"{name} {key!r}")
Expand Down Expand Up @@ -84,7 +110,11 @@ def parent(self) -> Union["anndata.AnnData", "raw.Raw"]:
def copy(self):
d = self._actual_class(self.parent, self._axis)
for k, v in self.items():
d[k] = v.copy()
if isinstance(v, AwkArray):
# Shallow copy since awkward array buffers are immutable
d[k] = copy(v)
else:
d[k] = v.copy()
return d

def _view(self, parent: "anndata.AnnData", subset_idx: I):
Expand Down
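The warn-then-filter pattern in `_validate_value` above can be tried in isolation. The following is a minimal sketch using only the standard library; the `ExperimentalFeatureWarning` class here is a local stand-in for anndata's class of the same name, and `warn_experimental_once` and `count_emitted` are hypothetical helper names that do not appear in the PR.

```python
import warnings


class ExperimentalFeatureWarning(Warning):
    """Local stand-in for anndata's warning class of the same name."""


def warn_experimental_once() -> None:
    # Emit the warning first ...
    warnings.warn(
        "Support for Awkward Arrays is currently experimental. "
        "Behavior may change in the future.",
        ExperimentalFeatureWarning,
        stacklevel=2,
    )
    # ... then install an "ignore" filter so later calls stay silent.
    # The message argument is a regex matched against the start of the text.
    warnings.filterwarnings(
        "ignore",
        category=ExperimentalFeatureWarning,
        message="Support for Awkward Arrays is currently experimental.*",
    )


def count_emitted(n_calls: int) -> int:
    # Record warnings in a scratch context so global filters are restored after.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        for _ in range(n_calls):
            warn_experimental_once()
    return len(caught)
```

Because `filterwarnings` inserts its rule at the front of the filter list by default, the "ignore" rule takes precedence over whatever filter let the first warning through, which is why this approach does not depend on the `once` action.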
7 changes: 4 additions & 3 deletions anndata/_core/anndata.py
Expand Up @@ -45,7 +45,7 @@
)
from .sparse_dataset import SparseDataset
from .. import utils
from ..utils import convert_to_dict, ensure_df_homogeneous
from ..utils import convert_to_dict, ensure_df_homogeneous, dim_len
from ..logging import anndata_logger as logger
from ..compat import (
ZarrArray,
Expand All @@ -55,6 +55,7 @@
_move_adj_mtx,
_overloaded_uns,
OverloadedDict,
AwkArray,
)


Expand Down Expand Up @@ -1861,7 +1862,7 @@ def _check_dimensions(self, key=None):
if "obsm" in key:
obsm = self._obsm
if (
not all([o.shape[0] == self._n_obs for o in obsm.values()])
not all([dim_len(o, 0) == self._n_obs for o in obsm.values()])
and len(obsm.dim_names) != self._n_obs
):
raise ValueError(
Expand All @@ -1871,7 +1872,7 @@ def _check_dimensions(self, key=None):
if "varm" in key:
varm = self._varm
if (
not all([v.shape[0] == self._n_vars for v in varm.values()])
not all([dim_len(v, 0) == self._n_vars for v in varm.values()])
and len(varm.dim_names) != self._n_vars
):
raise ValueError(
Expand Down
12 changes: 11 additions & 1 deletion anndata/_core/file_backing.py
Expand Up @@ -8,7 +8,7 @@

from . import anndata
from .sparse_dataset import SparseDataset
from ..compat import ZarrArray, DaskArray
from ..compat import ZarrArray, DaskArray, AwkArray


class AnnDataFileManager:
Expand Down Expand Up @@ -123,3 +123,13 @@ def _(x, copy=True):
@to_memory.register(Mapping)
def _(x: Mapping, copy=True):
return {k: to_memory(v, copy=copy) for k, v in x.items()}


@to_memory.register(AwkArray)
def _(x, copy=True):
from copy import copy as shallow_copy  # alias: a bare import would shadow the `copy` flag

if copy:
return shallow_copy(x)
else:
return x
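A `to_memory` handler of this shape has to keep the boolean `copy` flag and the standard-library copy function under different names. The sketch below shows one way to do that with `functools.singledispatch`; `FakeAwkArray` is a hypothetical stand-in class, not anndata's `AwkArray`, and the fallback handler is an assumption made so the example is self-contained.

```python
from copy import copy as shallow_copy
from functools import singledispatch


class FakeAwkArray:
    """Hypothetical stand-in for an awkward array; not part of anndata."""

    def __init__(self, data):
        self.data = data


@singledispatch
def to_memory(x, copy=True):
    # Fallback for types with no registered handler.
    raise NotImplementedError(f"No to_memory handler for {type(x)}")


@to_memory.register(FakeAwkArray)
def _(x, copy=True):
    # `copy` is the boolean flag; the function is reachable as `shallow_copy`.
    return shallow_copy(x) if copy else x
```

With the alias in place, `copy=False` returns the same object, while `copy=True` returns a new wrapper that still shares the underlying data, matching the shallow-copy intent stated in the diff.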
9 changes: 8 additions & 1 deletion anndata/_core/index.py
Expand Up @@ -7,7 +7,7 @@
import numpy as np
import pandas as pd
from scipy.sparse import spmatrix, issparse
from ..compat import DaskArray, Index, Index1D
from ..compat import AwkArray, DaskArray, Index, Index1D


def _normalize_indices(
Expand Down Expand Up @@ -145,6 +145,13 @@ def _subset_df(df: pd.DataFrame, subset_idx: Index):
return df.iloc[subset_idx]


@_subset.register(AwkArray)
def _subset_awkarray(a: AwkArray, subset_idx: Index):
if all(isinstance(x, cabc.Iterable) for x in subset_idx):
subset_idx = np.ix_(*subset_idx)
return a[subset_idx]


# Registration for SparseDataset occurs in sparse_dataset.py
@_subset.register(h5py.Dataset)
def _subset_dataset(d, subset_idx):
Expand Down
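The `np.ix_` branch in `_subset_awkarray` turns per-axis index sequences into an outer (row by column) selection rather than a pointwise one. A small sketch of the difference, using a plain NumPy array in place of an awkward array; the `subset` function name and the sample data are illustrative assumptions.

```python
from collections import abc as cabc

import numpy as np


def subset(a, subset_idx):
    # Same branch as in the diff: sequences on every axis -> outer product.
    if all(isinstance(x, cabc.Iterable) for x in subset_idx):
        subset_idx = np.ix_(*subset_idx)
    return a[subset_idx]


a = np.arange(12).reshape(3, 4)
rows, cols = [0, 2], [1, 3]

outer = subset(a, (rows, cols))          # every selected row x every selected column
pointwise = a[rows, cols]                # pairs (0, 1) and (2, 3) only
mixed = subset(a, (slice(0, 2), cols))   # a slice is not Iterable, so no np.ix_
```

Here `outer` has shape `(2, 2)`, while plain fancy indexing in `pointwise` yields only two elements. When one axis is a slice, the `all(...)` check fails and the index tuple is passed through unchanged.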
88 changes: 81 additions & 7 deletions anndata/_core/merge.py
Expand Up @@ -18,7 +18,7 @@
Literal,
)
import typing
from warnings import warn
from warnings import warn, filterwarnings

from natsort import natsorted
import numpy as np
Expand All @@ -27,9 +27,10 @@
from scipy.sparse import spmatrix

from .anndata import AnnData
from ..utils import asarray
from ..compat import DaskArray
from ..compat import AwkArray, DaskArray
from ..utils import asarray, dim_len
from .index import _subset, make_slice
from anndata._warnings import ExperimentalFeatureWarning

T = TypeVar("T")

Expand Down Expand Up @@ -154,6 +155,13 @@ def equal_sparse(a, b) -> bool:
return False


@equal.register(AwkArray)
def equal_awkward(a, b) -> bool:
from ..compat import awkward as ak

return ak._util.arrays_approx_equal(a, b)


def as_sparse(x):
if not isinstance(x, sparse.spmatrix):
return sparse.csr_matrix(x)
Expand Down Expand Up @@ -366,12 +374,14 @@ def apply(self, el, *, axis, fill_value=None):

Missing values are to be replaced with `fill_value`.
"""
if self.no_change and (el.shape[axis] == len(self.old_idx)):
if self.no_change and (dim_len(el, axis) == len(self.old_idx)):
return el
if isinstance(el, pd.DataFrame):
return self._apply_to_df(el, axis=axis, fill_value=fill_value)
elif isinstance(el, sparse.spmatrix):
return self._apply_to_sparse(el, axis=axis, fill_value=fill_value)
elif isinstance(el, AwkArray):
return self._apply_to_awkward(el, axis=axis, fill_value=fill_value)
elif isinstance(el, DaskArray):
return self._apply_to_dask_array(el, axis=axis, fill_value=fill_value)
else:
Expand Down Expand Up @@ -468,6 +478,22 @@ def _apply_to_sparse(self, el: spmatrix, *, axis, fill_value=None) -> spmatrix:

return out

def _apply_to_awkward(self, el: AwkArray, *, axis, fill_value=None):
import awkward as ak

if self.no_change:
return el
elif axis == 1: # Indexing by field
if self.new_idx.isin(self.old_idx).all(): # inner join
return el[self.new_idx]
else: # outer join
# TODO: this code isn't actually hit, we should refactor
raise Exception("This should be unreachable, please open an issue.")
else:
if len(self.new_idx) > len(self.old_idx):
el = ak.pad_none(el, 1, axis=axis) # axis == 0
return el[self.old_idx.get_indexer(self.new_idx)]


def merge_indices(
inds: Iterable[pd.Index], join: Literal["inner", "outer"]
Expand Down Expand Up @@ -534,6 +560,17 @@ def concat_arrays(arrays, reindexers, axis=0, index=None, fill_value=None):
)
df.index = index
return df
elif any(isinstance(a, AwkArray) for a in arrays):
from ..compat import awkward as ak

if not all(
isinstance(a, AwkArray) or a is MissingVal or 0 in a.shape for a in arrays
):
raise NotImplementedError(
"Cannot concatenate an AwkwardArray with other array types."
)

return ak.concatenate([f(a) for f, a in zip(reindexers, arrays)], axis=axis)
elif any(isinstance(a, sparse.spmatrix) for a in arrays):
sparse_stack = (sparse.vstack, sparse.hstack)[axis]
return sparse_stack(
Expand Down Expand Up @@ -579,6 +616,15 @@ def gen_inner_reindexers(els, new_index, axis: Literal[0, 1] = 0):
lambda x, y: x.intersection(y), (df_indices(el) for el in els)
)
reindexers = [Reindexer(df_indices(el), common_ind) for el in els]
elif any(isinstance(el, AwkArray) for el in els if not_missing(el)):
if not all(isinstance(el, AwkArray) for el in els if not_missing(el)):
raise NotImplementedError(
"Cannot concatenate an AwkwardArray with other array types."
)
common_keys = intersect_keys(el.fields for el in els)
reindexers = [
Reindexer(pd.Index(el.fields), pd.Index(list(common_keys))) for el in els
]
else:
min_ind = min(el.shape[alt_axis] for el in els)
reindexers = [
Expand All @@ -596,10 +642,38 @@ def gen_outer_reindexers(els, shapes, new_index: pd.Index, *, axis=0):
else (lambda x: pd.DataFrame(index=range(shape)))
for el, shape in zip(els, shapes)
]
else:
# if fill_value is None:
# fill_value = default_fill_value(els)
elif any(isinstance(el, AwkArray) for el in els if not_missing(el)):
import awkward as ak

if not all(isinstance(el, AwkArray) for el in els if not_missing(el)):
raise NotImplementedError(
"Cannot concatenate an AwkwardArray with other array types."
)
warn(
"Outer joins on awkward.Arrays will have different return values in the future. "
"For details, and to offer input, please see:\n\n\t"
"https://github.com/scverse/anndata/issues/898",
ExperimentalFeatureWarning,
)
filterwarnings(
"ignore",
category=ExperimentalFeatureWarning,
message=r"Outer joins on awkward.Arrays will have different return values.*",
)
# all_keys = union_keys(el.fields for el in els if not_missing(el))
reindexers = []
for el in els:
if not_missing(el):
reindexers.append(lambda x: x)
else:
reindexers.append(
lambda x: ak.pad_none(
ak.Array([]),
len(x),
0,
)
)
else:
max_col = max(el.shape[1] for el in els if not_missing(el))
orig_cols = [el.shape[1] if not_missing(el) else 0 for el in els]
reindexers = [
Expand Down
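`_apply_to_awkward` in the merge.py diff above leans on two pandas `Index` operations: `isin(...).all()` to detect an inner join, and `get_indexer` to map new labels to old positions. The sketch below shows what those calls return on small, made-up indices; the label names are assumptions for illustration only.

```python
import pandas as pd

old_idx = pd.Index(["cell_a", "cell_b", "cell_c"])
new_idx = pd.Index(["cell_c", "cell_a", "cell_d"])

# Position of each new label in the old index; -1 marks labels that are absent.
positions = old_idx.get_indexer(new_idx)

# True only when every new label already exists in the old index (inner join).
is_inner = bool(new_idx.isin(old_idx).all())
subset_is_inner = bool(pd.Index(["cell_b"]).isin(old_idx).all())
```

Since `cell_d` is not in `old_idx`, `is_inner` is false here, which corresponds to the outer-join path in the reindexer.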