
first attempt to support awkward arrays #647

Merged
merged 129 commits into from
Feb 7, 2023
Changes from 3 commits
Commits
5604eac
first attempt to support awkward arrays
giovp Nov 14, 2021
7dbe908
remove comments
giovp Nov 14, 2021
c0bbf5a
better comment
giovp Nov 14, 2021
0281324
add type to gen_adata
giovp Nov 15, 2021
624a529
first attempt at concat
giovp Nov 15, 2021
05c6c75
remove comment
giovp Nov 15, 2021
3d359de
add outer concat
giovp Nov 15, 2021
9bf0cb9
add awkward to test dep
giovp Nov 15, 2021
974040c
add awk arr to data gen
giovp Nov 15, 2021
13c4d59
fix test base
giovp Nov 16, 2021
74ae9e3
init test for concat
giovp Nov 16, 2021
1d0e629
fix concatenate tests
giovp Nov 19, 2021
aeba549
create mock class for awkward array
giovp Nov 25, 2021
88a5c83
remove space
giovp Nov 25, 2021
15b3d1a
import ak when needed
giovp Nov 28, 2021
7e6beaa
relative import of awk array
giovp Nov 28, 2021
77d5b6c
fix optional dep import
giovp Nov 29, 2021
4aa3d26
resolve conflicts
giovp Nov 29, 2021
3670d8b
Merge branch 'master' into val_shape
giovp Nov 29, 2021
72977c8
Merge branch 'master' into val_shape
giovp Feb 10, 2022
06032b2
merge and pre-commits
giovp Mar 3, 2022
1704aa7
fix merge
giovp Mar 3, 2022
ccc28c2
draft IO for awkward arrays
giovp Mar 4, 2022
e524389
add awkward to docs and save form to attrs
giovp Mar 4, 2022
bd2f28d
Merge remote-tracking branch 'origin/master' into val_shape
grst Jul 19, 2022
a928198
Update dependencies
grst Jul 19, 2022
fee56ee
Update dim_len
grst Jul 19, 2022
0775e53
ignore vscode directory
grst Jul 19, 2022
4b89a9b
Validate that awkward arrays align to axes
grst Jul 19, 2022
9d56157
Fix reindexing during merge
grst Jul 19, 2022
d14de3e
fix lint
grst Jul 19, 2022
4d62e7e
remove duplicate import
grst Jul 19, 2022
c4c1b3f
Test different types of awkward arrays in different slots
grst Jul 19, 2022
339bce8
Better function to generate awkward arrays
grst Jul 19, 2022
012de5e
Better dim_len for awkward arrays
grst Jul 19, 2022
7884598
Working out how to best check the dim_len
grst Jul 19, 2022
e16ae35
Only accept awkward arrays that are "regular" in the aligned dimension
grst Jul 20, 2022
0bced2f
Merge remote-tracking branch 'origin/master' into val_shape
grst Aug 12, 2022
588b6af
Switch to v2 API
grst Aug 12, 2022
e687e19
WIP rewrite awkward array generation
grst Aug 12, 2022
41b1423
Improve awkward array generation and dim_len check
grst Aug 12, 2022
fa8a386
Switch to new awkward array generation in all tests
grst Aug 12, 2022
733937a
Fix test_transpose
grst Aug 12, 2022
ed532a2
Fix/workaround more tests
grst Aug 12, 2022
e5706c3
Add test for setting anndata slots to awkward arrays
grst Aug 13, 2022
fceab1b
enable tests for 3d ragged array in layers
grst Aug 13, 2022
ef0637a
Cleanup
grst Aug 13, 2022
06608e9
Fix that X could not be set when creating AnnData object from scratch.
grst Aug 13, 2022
3c46363
Remove code to make awkward array regular after merge.
grst Aug 19, 2022
285e3b3
Do not explicitly copy awkward arrays
grst Aug 29, 2022
1dc93a6
Merge branch 'master' into val_shape
grst Aug 29, 2022
7fa65dd
Implement transposing awkward arrays
grst Aug 29, 2022
32c44cf
Add docs stub and update type hints
grst Aug 29, 2022
5e1c1da
Fix: dtype not available during merge if both X are awkward
grst Aug 29, 2022
08154f7
Fix IO
ivirshup Aug 29, 2022
71f0471
Request pre-release version of awkward
grst Aug 30, 2022
2e66409
Exclude awkward layer in loom tests
grst Aug 30, 2022
6cdcaa0
Merge branch 'master' into val_shape
grst Aug 30, 2022
741af1c
Pull in only changes relevant to obsm/varm
grst Aug 30, 2022
c027669
Merge branch 'val_shape' of github.com:scverse/anndata into val_shape
grst Aug 30, 2022
7f7ebb6
Update tests
grst Aug 30, 2022
771b2ab
Fix type hints
grst Aug 30, 2022
4922603
Update error message in aligned mapping
grst Aug 31, 2022
c5c5335
Use compat module to support both awkward v1.9rc and 2.x
grst Aug 31, 2022
8e7a725
Merge branch 'master' into val_shape
grst Aug 31, 2022
c3ccf2f
restructure tests
grst Aug 31, 2022
a8e1648
Add tests for copies and view
grst Aug 31, 2022
c9a6417
Remove unused import
grst Aug 31, 2022
d836999
Fix how actual shape is computed in aligned mapping
grst Sep 2, 2022
ed95d8f
Attempt to support views with ak.behavior
grst Sep 2, 2022
f7edc67
Use shallow copy
grst Sep 2, 2022
5a1d056
Add dim_len_awkward function including tests
grst Sep 3, 2022
83effad
Test that assigning an awkward v1 array fails
grst Sep 3, 2022
21a4b5f
Add stub for element-wise IO tests
grst Sep 3, 2022
4ff5851
Restructure dim_len_awkward
grst Sep 4, 2022
2c59b19
Add more test cases for awkward IO
grst Sep 4, 2022
988579e
WIP add tests for concatenating AwkArrays with missing values
grst Sep 4, 2022
504cae1
Fix AwkwardArrayView
grst Sep 4, 2022
4dc0826
Simplify awkward array view code
grst Sep 4, 2022
3ab5646
Use None to remove name from awkward array
grst Sep 5, 2022
371f66e
Mark test_no_awkward_v1 as xfail for uns
grst Sep 5, 2022
d2eaf66
Add test for categorical arrays
grst Sep 6, 2022
7b57167
Update docs/fileformat-prose.rst
grst Sep 6, 2022
b3678b6
Update anndata/_core/aligned_mapping.py
grst Sep 6, 2022
d523c89
Update anndata/tests/helpers.py
grst Sep 6, 2022
222998d
Update awkward tests to use assert_equal with exact=True
grst Sep 6, 2022
3fc9817
Bump required version
ivirshup Sep 8, 2022
6a6657b
Update categorical syntax, add new categorical test
ivirshup Sep 8, 2022
7ac4a0c
Start concat tests for awkward
ivirshup Sep 13, 2022
c016725
Merge branch 'master' into val_shape
ivirshup Sep 13, 2022
0340151
Add release notes
grst Sep 26, 2022
02365a6
Add testcases for dim_len with awkward arrays of strings
grst Sep 26, 2022
c20cc31
Fix dim_len for arrays of strings
grst Sep 26, 2022
8421ee6
Merge branch 'master' into val_shape
giovp Dec 14, 2022
2d024f1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 14, 2022
ddefdcf
Merge branch 'master' into val_shape
grst Jan 2, 2023
50a8dc3
Awkward v2 fixes
grst Jan 2, 2023
fe27b74
Exclude awkward arrays from fill_value concat test
grst Jan 2, 2023
2aed5b6
fix flake8
grst Jan 2, 2023
746ba7d
Merge branch 'master' into val_shape
ivirshup Jan 23, 2023
a589820
Add IO testcase for AIRR data
grst Jan 27, 2023
c26db5b
Merge branch 'master' into val_shape
ivirshup Jan 30, 2023
cd1a451
Fix link
ivirshup Jan 30, 2023
9b2ff61
Get inner join working for concatenation
ivirshup Jan 30, 2023
7637fe3
Merge branch 'master' into val_shape
ivirshup Jan 30, 2023
52a804a
Bump some concatenation cases to a later PR
ivirshup Jan 31, 2023
75e7526
Generate empty arrays for outer join
ivirshup Feb 1, 2023
d3d1d26
Raise NotImplementedError when creating a view of an awkward array wi…
grst Feb 1, 2023
77e3953
Add warning when setting awkward array in aligned mapping
grst Feb 1, 2023
536f729
Merge branch 'master' into val_shape
grst Feb 1, 2023
cfe200e
Get much more of concatenation 'working'
ivirshup Feb 1, 2023
d6d35bd
Merge branch 'master' into val_shape
ivirshup Feb 1, 2023
cf4ad03
Use warning instead of logging
ivirshup Feb 1, 2023
46d553f
extend todo comment about views
grst Feb 2, 2023
e8eeb54
Fix IO, and to_memory for views of awkward arrays
ivirshup Feb 2, 2023
5ab0708
Removed a number of test cases that we're not targeting
ivirshup Feb 2, 2023
5b39691
Implement outer indexing on axis 0 of an awkward array
ivirshup Feb 2, 2023
45a9958
Fix gen_awkward when one of the dimensions has size 0
ivirshup Feb 2, 2023
94aa4ef
Fix equality function for awkward arrays. Was throwing an error when …
ivirshup Feb 2, 2023
99853d5
Modify outer concatenation test to accept current behaviour of awkwar…
ivirshup Feb 2, 2023
cd2abdd
Merge branch 'master' into val_shape
ivirshup Feb 2, 2023
96bfe31
Add tests for mixed type concatenation with awkward arrays
ivirshup Feb 2, 2023
4a6d119
Add warning about outer joins
ivirshup Feb 3, 2023
4243ccc
Call ak._util.arrays_approx_equal instead of rolling our own
ivirshup Feb 3, 2023
5ad915a
update awkward to 2.0.7 (unfortunately: errors)
ivirshup Feb 6, 2023
07246cc
remove unnecessary checks from AwkwardArrayView
grst Feb 6, 2023
fb137af
Workaround scikit-hep/awkward#2209
ivirshup Feb 7, 2023
6e32637
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 7, 2023
3883bb0
Removed extra layer of nesting from on-disk format for awkward arrays
ivirshup Feb 7, 2023
25 changes: 24 additions & 1 deletion anndata/_core/aligned_mapping.py
@@ -3,6 +3,17 @@
 from typing import Union, Optional, Type, ClassVar, TypeVar # Special types
 from typing import Iterator, Mapping, Sequence # ABCs
 from typing import Tuple, List, Dict # Generic base types
+from functools import singledispatch
+
+import warnings
+
+try:
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore", UserWarning)
+        import awkward as ak
+except ImportError:
+    ak = None
+
 
 import numpy as np
 import pandas as pd
@@ -47,7 +58,7 @@ def _ipython_key_completions_(self) -> List[str]:
     def _validate_value(self, val: V, key: str) -> V:
         """Raises an error if value is invalid"""
         for i, axis in enumerate(self.axes):
-            if self.parent.shape[axis] != val.shape[i]:
+            if self.parent.shape[axis] != dim_len(val, i): # val.shape[i]:
                 right_shape = tuple(self.parent.shape[a] for a in self.axes)
                 raise ValueError(
                     f"Value passed for key {key!r} is of incorrect shape. "
@@ -349,3 +360,15 @@ def __init__(
 
 PairwiseArraysBase._view_class = PairwiseArraysView
 PairwiseArraysBase._actual_class = PairwiseArrays
+
+
+@singledispatch
+def dim_len(x, dim):
+    return x.shape[dim]
+
+
+@dim_len.register(ak.Array)
+def dim_len_array(x, dim):
+    if dim != 0:
+        raise IndexError()
+    return len(x)
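`dim_len` relies on `functools.singledispatch`: the default reads `x.shape[dim]`, while the `ak.Array` handler answers only for axis 0, where `len()` is well defined. In this early version the decorator `@dim_len.register(ak.Array)` is evaluated at import time, so if the optional import failed and `ak` is `None`, importing the module raises `AttributeError`. A minimal sketch that guards the registration; `RaggedRecords` is a hypothetical stand-in so the example runs without awkward installed:

```python
from functools import singledispatch

try:
    import awkward as ak  # optional dependency
except ImportError:
    ak = None


@singledispatch
def dim_len(x, dim):
    """Default: the size of axis `dim` comes from the object's shape."""
    return x.shape[dim]


class RaggedRecords(list):
    """Hypothetical ragged container: only axis 0 has a fixed length."""


@dim_len.register(RaggedRecords)
def _dim_len_ragged(x, dim):
    if dim != 0:
        # Inner axes are variable-length, so there is no single size to report.
        raise IndexError(f"axis {dim} has no fixed length")
    return len(x)


# Register the awkward handler only if the optional import succeeded;
# evaluating `ak.Array` while `ak` is None would raise AttributeError.
if ak is not None:

    @dim_len.register(ak.Array)
    def _dim_len_awkward(x, dim):
        if dim != 0:
            raise IndexError(f"axis {dim} has no fixed length")
        return len(x)
```

With the guard, the module imports cleanly whether or not awkward is installed, and dense values keep using the `shape`-based default.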
2 changes: 1 addition & 1 deletion anndata/_core/anndata.py
@@ -1846,7 +1846,7 @@ def _check_dimensions(self, key=None):
         if "obsm" in key:
             obsm = self._obsm
             if (
-                not all([o.shape[0] == self._n_obs for o in obsm.values()])
+                not all([len(o) == self._n_obs for o in obsm.values()])
                 and len(obsm.dim_names) != self._n_obs
             ):
                 raise ValueError(
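Switching from `o.shape[0]` to `len(o)` lets the check accept objects that only define `__len__`, but `len()` is not a drop-in replacement everywhere: SciPy sparse matrices, which are common in `obsm`, raise `TypeError` on `len()`. A minimal sketch of a more tolerant check, assuming it is acceptable to prefer `.shape[0]` and fall back to `len()` only when there is no `shape`; `n_rows` and `check_obsm_rows` are hypothetical names, not anndata API:

```python
from typing import Any, Mapping


def n_rows(value: Any) -> int:
    """Size along axis 0: prefer `.shape[0]`, fall back to `len()`."""
    shape = getattr(value, "shape", None)
    if shape is not None:
        return shape[0]
    # Ragged containers without a fixed-size shape still define __len__.
    return len(value)


def check_obsm_rows(obsm: Mapping[str, Any], n_obs: int) -> None:
    """Raise ValueError if any entry's row count differs from `n_obs`."""
    wrong = {key: n_rows(val) for key, val in obsm.items() if n_rows(val) != n_obs}
    if wrong:
        raise ValueError(f"expected {n_obs} rows, got {wrong}")
```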
15 changes: 15 additions & 0 deletions anndata/_core/index.py
@@ -8,6 +8,14 @@
 import pandas as pd
 from scipy.sparse import spmatrix, issparse
 
+import warnings
+
+try:
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore", UserWarning)
+        import awkward as ak
+except ImportError:
+    ak = None
 
 Index1D = Union[slice, int, str, np.int64, np.ndarray]
 Index = Union[Index1D, Tuple[Index1D, Index1D], spmatrix]
@@ -140,6 +148,13 @@ def _subset_df(df: pd.DataFrame, subset_idx: Index):
     return df.iloc[subset_idx]
 
 
+@_subset.register(ak.Array)
+def _subset_awkarray(a: ak.Array, subset_idx: Index):
+    if all(isinstance(x, cabc.Iterable) for x in subset_idx):
+        subset_idx = np.ix_(*subset_idx)
+    return a[subset_idx]
+
+
 # Registration for SparseDataset occurs in sparse_dataset.py
 @_subset.register(h5py.Dataset)
 def _subset_dataset(d, subset_idx):
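`_subset_awkarray` wraps a tuple of iterable indexers in `np.ix_` because, under NumPy's advanced-indexing rules, two index arrays are paired element-wise instead of selecting every combination of rows and columns, which is what subsetting an AnnData object requires. A dependency-free sketch of the two behaviors on a nested list, assuming awkward follows NumPy here (as the `np.ix_` call implies); both helper names are hypothetical:

```python
from typing import List, Sequence


def pointwise_index(data: List[list], rows: Sequence[int], cols: Sequence[int]) -> list:
    """Index arrays paired element-wise (simplified: equal lengths, no broadcasting)."""
    if len(rows) != len(cols):
        raise IndexError("index arrays must have the same length")
    return [data[i][j] for i, j in zip(rows, cols)]


def outer_index(data: List[list], rows: Sequence[int], cols: Sequence[int]) -> List[list]:
    """Every selected row restricted to every selected column, as `np.ix_` sets up."""
    return [[data[i][j] for j in cols] for i in rows]


grid = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(pointwise_index(grid, [0, 2], [0, 2]))  # [0, 8]
print(outer_index(grid, [0, 2], [0, 2]))  # [[0, 2], [6, 8]]
```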
20 changes: 20 additions & 0 deletions anndata/_core/views.py
@@ -12,6 +12,15 @@
 from ..logging import anndata_logger as logger
 from ..compat import ZappyArray
 
+import warnings
+
+try:
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore", UserWarning)
+        import awkward as ak
+except ImportError:
+    ak = None
+
 
 class _SetItemMixin:
     """\
@@ -112,6 +121,12 @@ def drop(self, *args, inplace: bool = False, **kw):
             df.drop(*args, inplace=True, **kw)
 
 
+class AwkwardArrayView(_ViewMixin, ak.Array):
+    def copy(self, order: str = "C") -> np.ndarray:
+        # we want a copy of an awkward array
+        return ak.copy(self)
+
+
 @singledispatch
 def as_view(obj, view_args):
     raise NotImplementedError(f"No view type has been registered for {type(obj)}")
@@ -149,6 +164,11 @@ def as_view_zappy(z, view_args):
     return z
 
 
+@as_view.register(ak.Array)
+def as_view_awkarray(array, view_args):
+    return AwkwardArrayView(array, view_args=view_args)

Contributor:

@ivirshup, I have no clue what's going on here:

• as_view_awkarray is called correctly when accessing, e.g., adata.obsm["awk"] on a view of adata, but
• it returns an ak.Array, not an instance of AwkwardArrayView;
• therefore, the result lacks the __setitem__ inherited from _ViewMixin and doesn't copy on modification.
>>> from anndata import AnnData
>>> import anndata
>>> import numpy as np
>>> import awkward._v2 as ak

>>> adata = AnnData(np.full((4,4), 1))
>>> adata.obsm["awk"] = ak.Array([{"a": 2}] * 4)
>>> v = adata[:3,:3]
>>> v
View of AnnData object with n_obs × n_vars = 3 × 3
   obsm: 'awk'

>>> v.obsm['awk']
<Array [{a: 2}, {a: 2}, {a: 2}] type='3 * {a: int64}'>

>>> isinstance(v.obsm['awk'], anndata._core.views.AwkwardArrayView)
False

Member:

Yeah, this is weird. I think it has to do with the behavior attribute. This is also discussed here:

It looks like they don't do subclassing of arrays, but have a different system which is described here: https://awkward-array.readthedocs.io/en/latest/ak.behavior.html
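A toy, pure-Python model of the symptom in the REPL session above, assuming (as this reply suggests) that awkward chooses an array's concrete class from its behavior mapping rather than keeping the Python subclass that was instantiated. `LibArray`, `ArrayView`, and the `name` lookup are hypothetical; the real mechanism is the one documented under `ak.behavior`:

```python
class LibArray:
    """Toy array type that picks its own concrete class from a lookup table."""

    # Shared name -> class mapping, playing the role of a behavior dict.
    behavior: dict = {}

    def __init__(self, data, name=None):
        self.data = list(data)
        # The class comes from the table, not from the class the caller
        # instantiated, so a plain subclass is silently discarded.
        self.__class__ = self.behavior.get(name, LibArray)


class ArrayView(LibArray):
    """Toy view subclass, standing in for AwkwardArrayView."""


view = ArrayView([1, 2, 3])
print(type(view).__name__)  # LibArray

# Registering the subclass under a name is what makes it stick.
LibArray.behavior["view"] = ArrayView
print(type(LibArray([1, 2, 3], name="view")).__name__)  # ArrayView
```

In this model, subclassing alone can never produce a view; the class has to be registered and selected by name, which matches the question raised next about how to carry `view_args` through that system.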

Contributor:

I tried that, but I'm unsure what the best way is to pass the view_args.

Contributor:

@ivirshup, I implemented the approach with weakref, as discussed in scikit-hep/awkward#1177. test_view (in test_awkward.py) now passes. Please LMK what you think.
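A stdlib-only sketch of copy-on-write combined with a weak parent reference, assuming the goal is the behavior described at the top of this thread (copy on modification, as the `__setitem__` inherited from `_ViewMixin` does for other view types). The view holds only a weak reference, so it never keeps its parent alive. `Parent` and `ListView` are hypothetical, and this is not the implementation merged in this PR:

```python
import warnings
import weakref


class Parent:
    """Hypothetical owner of aligned elements (stands in for an AnnData object)."""

    def __init__(self):
        self.obsm = {}


class ListView(list):
    """Hypothetical copy-on-write view of `parent.obsm[key]`."""

    def __init__(self, data, parent, key):
        super().__init__(data)
        # Weak reference: the view must not keep its parent alive.
        self._parent_ref = weakref.ref(parent)
        self._key = key

    def __setitem__(self, idx, value):
        parent = self._parent_ref()
        if parent is None:
            raise RuntimeError("the parent of this view no longer exists")
        warnings.warn("Modifying a view: writing a copy into the parent", UserWarning)
        actual = list(self)  # materialize a real copy; the view itself is unchanged
        actual[idx] = value
        parent.obsm[self._key] = actual
```

After `view[0] = 10`, the parent's entry holds the modified copy while the view still shows the original values, mirroring how a modified AnnData view hands changes to an actual object instead of mutating shared data.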

+
+
 def _resolve_idxs(old, new, adata):
     t = tuple(_resolve_idx(old[i], new[i], adata.shape[i]) for i in (0, 1))
     return t