Add path parameter to write_zarr method #1548

Open
antoinegaston opened this issue Jul 1, 2024 · 8 comments

Comments

@antoinegaston

Please describe your wishes and possible alternatives to achieve the desired result.

This feature would allow writing an AnnData object to a specific path within a Zarr store. It requires only minor changes:

First, in anndata/_io/zarr.py:

def write_zarr(
    store: MutableMapping | str | Path,
    adata: AnnData,
    path: str | None = None,
    chunks=None,
    **ds_kwargs,
) -> None:
    if isinstance(store, Path):
        store = str(store)
    adata.strings_to_categoricals()
    if adata.raw is not None:
        adata.strings_to_categoricals(adata.raw.var)
    # TODO: Use spec writing system for this
    f = zarr.open(store, mode="w")
    f.attrs.setdefault("encoding-type", "anndata")
    f.attrs.setdefault("encoding-version", "0.1.0")

    def callback(func, s, k, elem, dataset_kwargs, iospec):
        if chunks is not None and not isinstance(elem, sparse.spmatrix) and k == "/X":
            func(s, k, elem, dataset_kwargs=dict(chunks=chunks, **dataset_kwargs))
        else:
            func(s, k, elem, dataset_kwargs=dataset_kwargs)

    write_dispatched(f, f"/{path}", adata, callback=callback, dataset_kwargs=ds_kwargs)

In anndata/_core/anndata.py:

class AnnData(metaclass=utils.DeprecationMixinMeta):
    ...
    def write_zarr(
        self,
        store: MutableMapping | PathLike,
        path: str | None = None,
        chunks: bool | int | tuple[int, ...] | None = None,
    ):
        """\
        Write a hierarchical Zarr array store.

        Parameters
        ----------
        store
            The filename, a :class:`~typing.MutableMapping`, or a Zarr storage class.
        path
            Path within the store at which to write the data.
        chunks
            Chunk shape.
        """
        from .._io import write_zarr

        write_zarr(store, self, path=path, chunks=chunks)

And finally, add a small test to test_readwrite.py:

def test_zarr_path(tmp_path):
    zarr_pth = Path(tmp_path) / "test.zarr"
    adata = gen_adata((100, 100), X_type=np.array)
    adata.write_zarr(zarr_pth, path="test")

    from_zarr = ad.read_zarr(zarr_pth / "test")
    assert_equal(from_zarr, adata)
@ilan-gold
Contributor

Hi @antoinegaston, could you elaborate a bit on your use case? From what I can see, this seems like quite an unsafe operation.

f = zarr.open(store, mode="w")
f.attrs.setdefault("encoding-type", "anndata")
f.attrs.setdefault("encoding-version", "0.1.0")

Here you open the store and encode the anndata version/type at the root.

write_dispatched(f, f"/{path}", adata, callback=callback, dataset_kwargs=ds_kwargs)

Then you write out to a different location? How would you read this back in? Just want to understand! Like, why not just pass in the store at the location you want it?
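The concern above can be illustrated with a small toy model. This is only a sketch under stated assumptions: a plain dict stands in for the store, group attributes are kept as JSON under a `.zattrs` key as in the Zarr v2 layout, and the helpers `attrs_key`, `write_attrs` and `read_attrs` are hypothetical names, not zarr or anndata functions. It models only the two `attrs.setdefault` calls, not whatever metadata `write_dispatched` writes itself.

```python
import json


def attrs_key(group_path: str) -> str:
    """Key under which a group's attributes live (Zarr v2 style '.zattrs')."""
    return f"{group_path}/.zattrs" if group_path else ".zattrs"


def write_attrs(store: dict, group_path: str, attrs: dict) -> None:
    """Hypothetical helper: store attributes as JSON bytes."""
    store[attrs_key(group_path)] = json.dumps(attrs).encode()


def read_attrs(store: dict, group_path: str) -> dict:
    """Hypothetical helper: return a group's attributes, or {} if none exist."""
    raw = store.get(attrs_key(group_path))
    return json.loads(raw) if raw is not None else {}


# A plain dict stands in for the store.
store: dict[str, bytes] = {}

# Metadata is written at the root of the store ...
write_attrs(store, "", {"encoding-type": "anndata", "encoding-version": "0.1.0"})
# ... while the data goes under the sub-path "test" (placeholder chunk).
store["test/X/0.0"] = b"placeholder"

root_attrs = read_attrs(store, "")      # carries the encoding metadata
sub_attrs = read_attrs(store, "test")   # a reader opening "test" finds none
```

In this model, a reader that opens the store at `test` sees no encoding metadata, because it was recorded one level up.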

@ilan-gold ilan-gold self-assigned this Jul 1, 2024
@antoinegaston
Author

Hello @ilan-gold, thank you for your comment. Indeed, I forgot to pass the path to the path parameter of zarr.open:

f = zarr.open(store, mode="w", path=path)

To give you more context about the use case: we have a Zarr store in which we store not only AnnData objects but other things as well, so we wanted to be able to do so without having to create multiple stores targeting the different subpaths. We want to keep the flexibility to choose the kind of store, though, without multiplying the number of parameters passed to our processing function. The idea is simply to pass the path parameter of zarr.open through the write_zarr method.

@ilan-gold
Contributor

@antoinegaston But I believe you can pass in a store of your own into write_zarr as things stand, no? So you could use fsspec to create a store at a location and then pass that in?

@antoinegaston
Author

Yes, you are right. The issue is that in our case we have a global store that is an ABSStore, and we create a root group in it, in which we create some other groups and where we want to write our AnnData object as a group as well. The store behind all those groups is still the global one, and you cannot specify the path directly to write_zarr because it's a path within remote storage. Tell me if anything is unclear.

@ilan-gold
Contributor

ilan-gold commented Jul 1, 2024

Does something like this (not exactly, perhaps)

# Combine the original store path with the sub-path
new_path = f"{original_store.path}/{sub_path}"

# Open a new ABSStore at the sub-path
new_store = ABSStore(container_name=new_path)

not work?

A short example would be clarifying.

@antoinegaston
Author

It does the trick indeed, but it's not always an ABSStore; it depends on the type of the global parent store. It can be a DirectoryStore as well in some situations.
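To make the store-agnostic requirement concrete, here is a minimal sketch of a key-prefixing view over any `MutableMapping`. `PrefixedStore` is a hypothetical helper, not part of zarr or anndata, and a dict stands in for the global store. Whether a given Zarr version accepts such a wrapper as a store is an assumption that would need checking.

```python
from collections.abc import Iterator, MutableMapping


class PrefixedStore(MutableMapping):
    """Hypothetical helper: a view of `base` restricted to keys under `prefix`."""

    def __init__(self, base: MutableMapping, prefix: str) -> None:
        self._base = base
        # Normalise so that "root/adata" and "/root/adata/" behave the same.
        self._prefix = prefix.strip("/") + "/"

    def __getitem__(self, key: str) -> bytes:
        return self._base[self._prefix + key]

    def __setitem__(self, key: str, value: bytes) -> None:
        self._base[self._prefix + key] = value

    def __delitem__(self, key: str) -> None:
        del self._base[self._prefix + key]

    def __iter__(self) -> Iterator[str]:
        # Yield only keys under the prefix, with the prefix stripped.
        n = len(self._prefix)
        return (k[n:] for k in self._base if k.startswith(self._prefix))

    def __len__(self) -> int:
        return sum(1 for _ in self)


# A dict stands in for the global store, which already holds other content.
global_store: dict[str, bytes] = {"other/.zgroup": b"{}"}

# Writes through the view land under the sub-path, whatever the base type is.
sub = PrefixedStore(global_store, "root/adata")
sub[".zattrs"] = b'{"encoding-type": "anndata"}'
```

The wrapper never needs to know the concrete type of the parent store, which is the property the `path` parameter is meant to provide.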

@ilan-gold
Contributor

# Combine the original store path with the sub-path
new_path = f"{original_store.path}/{sub_path}"

# Open a new store of the same type as the original at the sub-path
new_store = type(original_store)(container_name=new_path)

Or similar. I don't know, I don't think adding an argument here makes sense. I think the solution here would be to allow passing a zarr.Group, if that doesn't already work (which it very well might, since I think zarr.open is idempotent).

@ivirshup
Member

ivirshup commented Jul 23, 2024

@antoinegaston, could you use anndata.experimental.write_elem here? E.g.

import anndata as ad
import zarr
from anndata.experimental import write_elem
adata = ad.AnnData(...)
z = zarr.open("store.zarr")
write_elem(z, "whatever/path/you/want/", adata)
