concat_on_disk merge strategies are untested/not implemented #1505

Open
ilan-gold opened this issue May 22, 2024 · 0 comments

Please describe your wishes and possible alternatives to achieve the desired result.

We should implement the `merge` strategies so that they work properly. I'm not sure whether this counts as a bug, since `concat_on_disk` is experimental, but reading through the original PR I don't see any discussion of `merge` or any tests for it.
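For context, the in-memory `anndata.concat` documents four `merge` strategies: `"same"` keeps elements that are the same in each of the objects, `"unique"` keeps elements for which there is only one possible value, `"first"` keeps the first element seen, and `"only"` keeps elements that show up in only one of the objects. The sketch below models those rules on plain dicts of scalar values. `merge_mappings` is a hypothetical helper for illustration, not anndata's implementation, which also has to deal with arrays, dataframes and nested elements.

```python
from collections.abc import Mapping
from typing import Any


def merge_mappings(ds: list[Mapping[str, Any]], strategy: str) -> dict[str, Any]:
    """Hypothetical, simplified model of the documented merge strategies.

    Values are compared with `==`; arrays, dataframes and nesting are ignored.
    """
    # Union of keys, preserving first-seen order
    all_keys: list[str] = []
    for d in ds:
        for k in d:
            if k not in all_keys:
                all_keys.append(k)

    out: dict[str, Any] = {}
    for k in all_keys:
        values = [d[k] for d in ds if k in d]
        if strategy == "same":
            # Present in every object and equal everywhere
            if len(values) == len(ds) and all(v == values[0] for v in values):
                out[k] = values[0]
        elif strategy == "unique":
            # Only one possible value among the objects that have the key
            if all(v == values[0] for v in values):
                out[k] = values[0]
        elif strategy == "first":
            # First value seen for this key
            out[k] = values[0]
        elif strategy == "only":
            # Key shows up in exactly one object
            if len(values) == 1:
                out[k] = values[0]
        else:
            raise ValueError(f"unknown merge strategy: {strategy!r}")
    return out


a = {"gene_type": "protein_coding", "batch": "A", "only_in_a": 1}
b = {"gene_type": "protein_coding", "batch": "B"}

print(merge_mappings([a, b], "same"))    # {'gene_type': 'protein_coding'}
print(merge_mappings([a, b], "unique"))  # {'gene_type': 'protein_coding', 'only_in_a': 1}
print(merge_mappings([a, b], "first"))   # {'gene_type': 'protein_coding', 'batch': 'A', 'only_in_a': 1}
print(merge_mappings([a, b], "only"))    # {'only_in_a': 1}
```

A test for `concat_on_disk` could compare its output against these expectations (or against in-memory `anndata.concat`) for each strategy.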

Here's an MVCE for `merge="first"` with otherwise default arguments. Adding a `merge_type` argument to the current test suite for `concat_on_disk` reveals the full list of problems:

```python
import anndata as ad
import numpy as np
import pandas as pd
from scipy import sparse

from anndata.tests.helpers import gen_adata

# Restrict the generated obsm/varm/layers element types
GEN_ADATA_OOC_CONCAT_ARGS = dict(
    obsm_types=(
        sparse.csr_matrix,
        np.ndarray,
        pd.DataFrame,
    ),
    varm_types=(sparse.csr_matrix, np.ndarray, pd.DataFrame),
    layers_types=(sparse.spmatrix, np.ndarray, pd.DataFrame),
)

# Two random AnnData objects with different shapes, written to disk
adata_1 = gen_adata((100, 200), **GEN_ADATA_OOC_CONCAT_ARGS)
adata_2 = gen_adata((50, 60), **GEN_ADATA_OOC_CONCAT_ARGS)
adata_1.write_h5ad("test_1.h5ad")
adata_2.write_h5ad("test_2.h5ad")

# Default axis and join; only `merge` is set
ad.experimental.concat_on_disk(
    ["test_1.h5ad", "test_2.h5ad"], "merged.h5ad", merge="first"
)
```

raises:

```
IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
Error raised while writing key 'var_cat' of <class 'h5py._hl.group.Group'> to /var
```

Here's the full list of `test_anndatas_with_reindex` cases that fail when `merge` is tested:

Errors + tests
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-10-unique] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-h5ad-10-same] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-h5ad-100000000-unique] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-100000000-same] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-10-same] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-h5ad-100000000-unique] - AssertionError: DataFrame are different
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-h5ad-100000000-same] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-h5ad-10-same] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-h5ad-10-same] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-h5ad-100000000-same] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-zarr-100000000-same] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-zarr-10-unique] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-zarr-100000000-unique] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-h5ad-10-same] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-h5ad-100000000-same] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-zarr-100000000-unique] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-zarr-10-unique] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-inner-h5ad-100000000-unique] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
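The failures above cluster by merge strategy and join. A small stdlib sketch like the following can tabulate them from the pytest output. It assumes the parametrization id order is `axis-array_type-join-file_format-max_loaded_elems-merge`, which is inferred from the ids in this list rather than confirmed against the test file; `summarize` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed id field order, inferred from the ids in the failure list
FIELDS = ("axis", "array_type", "join", "file_format", "max_loaded_elems", "merge")
LINE_RE = re.compile(
    r"FAILED \S+::test_anndatas_with_reindex\[(?P<id>[^\]]+)\] - (?P<error>.+)"
)


def summarize(log: str) -> Counter:
    """Count failing cases per (merge, join, exception class name)."""
    counts: Counter = Counter()
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a FAILED line for this test
        params = dict(zip(FIELDS, m.group("id").split("-")))
        # "anndata._io.specs.registry.IORegistryError: ..." -> "IORegistryError"
        exc = m.group("error").split(":", 1)[0].rsplit(".", 1)[-1]
        counts[(params["merge"], params["join"], exc)] += 1
    return counts


sample = """\
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-10-unique] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
"""

for key, n in sorted(summarize(sample).items()):
    print(key, n)
```

Applied to the full list, this grouping shows that every `first` failure is an `IORegistryError` with `inner` join or a `TypeError` with `outer` join, while all `same` and `unique` failures are `AssertionError`s with `inner` join.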