Dask Concatenation Should Impute With Dask Array, not Numpy #1777

Open
3 tasks done
ilan-gold opened this issue Nov 22, 2024 · 0 comments

ilan-gold commented Nov 22, 2024

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the master branch of anndata.

Report

Code:

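import anndata as ad
import dask.array as da

shape = (10, 100)
# adata1 has a dask-backed layer "foo"; adata2 has no such layer
adata1 = ad.AnnData(layers={'foo': da.random.random(shape)})
adata2 = ad.AnnData(X=da.random.random(shape))
# An outer join must impute "foo" for adata2; check which array type comes back
type(ad.concat([adata1, adata2], join="outer").layers['foo'])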

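Internally, this calls missing_element, which fills in the missing layer with a NumPy array rather than a dask array. A comment in the source already notes that this handling could be improved: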

# Handling of missing values here is hacky for dataframes
# We should probably just handle missing elements for all types
result[k] = concat_arrays(
    [
        el if not_missing(el) else missing_element(n, axis=axis)
        for el, n in zip(els, ns)
    ],
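
A rough sketch of what a dask-aware fill could look like, in case it helps the discussion. This is not anndata's implementation: `missing_element_like` is a hypothetical helper, and its `els` parameter, the NaN fill value, and the output shape are all assumptions made for illustration. The idea is only that the placeholder is built lazily with `dask.array.full` whenever any of the present elements is a dask array.

```python
import dask.array as da
import numpy as np


def missing_element_like(n, els, axis=0, fill_value=np.nan):
    """Hypothetical helper (not part of anndata): build a fill block.

    Returns a dask array if any present element is a dask array,
    otherwise a NumPy array. Missing elements are represented as None.
    """
    present = [el for el in els if el is not None]
    # Assumption: the size of the other axis matches the first present element.
    other = present[0].shape[1 - axis]
    shape = (n, other) if axis == 0 else (other, n)
    if any(isinstance(el, da.Array) for el in present):
        # Lazy fill: nothing is materialized until compute() is called.
        return da.full(shape, fill_value, dtype=float)
    return np.full(shape, fill_value, dtype=float)


# Usage, mirroring the list comprehension in the snippet above
els = [da.random.random((10, 100)), None]
ns = [10, 10]
blocks = [
    el if el is not None else missing_element_like(n, els, axis=0)
    for el, n in zip(els, ns)
]
result = da.concatenate(blocks, axis=0)
print(type(result), result.shape)
```

With something along these lines, the concatenated result stays a dask array and the imputed block is never held in memory as a full NumPy array.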

Versions

-----
anndata             0.11.1.dev12+gd61e09c8
session_info        1.0.0
-----
asciitree           NA
cloudpickle         3.1.0
cython_runtime      NA
dask                2024.11.2
dateutil            2.9.0.post0
exceptiongroup      1.2.2
h5py                3.12.1
importlib_metadata  NA
natsort             8.4.0
numcodecs           0.13.1
numpy               2.1.3
packaging           24.2
pandas              2.2.3
psutil              6.1.0
pytz                2024.2
scipy               1.14.1
six                 1.16.0
tlz                 1.0.0
toolz               1.0.0
yaml                6.0.2
zarr                2.18.3
zipp                NA
zoneinfo            NA
-----
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
Linux-5.14.0-427.37.1.el9_4.x86_64-x86_64-with-glibc2.34
-----
Session information updated at 2024-11-22 18:06