Add parameter for more resilient concat_on_disk #1602

Open
3 tasks done
DingWB opened this issue Aug 21, 2024 · 4 comments

Comments

@DingWB

DingWB commented Aug 21, 2024

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the master branch of anndata.

Report

When I concatenate two AnnData files using the following code:

anndata.experimental.concat_on_disk(adata_path_list, raw_adata_path)

I got an error:

Traceback (most recent call last):
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/bin/pym3c", line 8, in <module>
    sys.exit(main())
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/pym3c/__init__.py", line 47, in main
    fire.Fire({
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/pym3c/adata.py", line 103, in merge_adatas
    anndata.experimental.concat_on_disk(adata_path_list, raw_adata_path)
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/anndata/experimental/merge.py", line 650, in concat_on_disk
    _write_concat_mappings(
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/anndata/experimental/merge.py", line 258, in _write_concat_mappings
    _write_concat_sequence(
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/anndata/experimental/merge.py", line 354, in _write_concat_sequence
    _write_concat_arrays(
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/anndata/experimental/merge.py", line 310, in _write_concat_arrays
    write_concat_dense(
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/anndata/experimental/merge.py", line 176, in write_concat_dense
    res = da.concatenate(
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/dask/array/core.py", line 4293, in concatenate
    raise ValueError("Shapes do not align: %s", [x.shape for x in seq2])
ValueError: ('Shapes do not align: %s', [(369626, 50), (135426, 63)])

This happens because obsm['X_pca'] has a different shape in the two AnnData objects, but I only need to concatenate X; I don't need obsm or varm.
Could you please add a parameter that lets me skip obsm (or varm) and concatenate only X?
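The error itself follows from the usual concatenation rule: stacking along the observation axis requires every other dimension to match, so obsm['X_pca'] with 50 columns in one file and 63 in the other cannot be combined. A minimal pure-Python sketch of that check, assuming 2-D shapes; the helper `concat_shape` is hypothetical (not anndata or dask API) and only mirrors the alignment rule behind the "Shapes do not align" message:

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def concat_shape(shapes: Sequence[Shape], axis: int = 0) -> Shape:
    """Hypothetical helper: shape of concatenating arrays along `axis`.

    Every dimension except `axis` must be identical, otherwise ValueError.
    """
    if not shapes:
        raise ValueError("need at least one shape")
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError(f"Shapes do not align: {list(shapes)}")
        for dim, (a, b) in enumerate(zip(first, shape)):
            if dim != axis and a != b:
                raise ValueError(f"Shapes do not align: {list(shapes)}")
    total = sum(shape[axis] for shape in shapes)
    return first[:axis] + (total,) + first[axis + 1:]


# obsm['X_pca'] from the report: 50 vs. 63 PCs, so axis-0 concatenation fails.
try:
    concat_shape([(369626, 50), (135426, 63)])
except ValueError as err:
    print(err)

# With matching column counts, the rows simply add up.
print(concat_shape([(369626, 50), (135426, 50)]))  # (505052, 50)
```

X itself concatenates fine as long as both files share the same var dimension, which is why only the obsm entry blocks the merge here.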

Versions

0.10.8
@flying-sheep changed the title from “Error for concat_on_disk” to “Add option to run concat_on_disk only on X/layers” on Aug 23, 2024
@flying-sheep
Member

flying-sheep commented Aug 23, 2024

Hmm, since this function exists explicitly to handle big files, I think it wouldn’t be fair to say “just make new files without these parts”, so this is a reasonable request.

Regarding the API: we already have the pairwise option, but its goal is to discourage merging graphs that are semantically unwise to merge, not arrays that are topologically unmergeable, so we shouldn’t add a parameter for each attr.

Therefore I think we have two options:

  • specify a list of attrs to include or skip, or use ref paths (compare Ref paths #342) for more fine-grained control
  • add an on_error parameter that can take:
    • a function that handles the error on the fly, given info about what we failed to concatenate (X, obsm['a'], …). It can re-raise, or return a value that the attr/entry is then set to (see footnote 1)
    • "warn": emits a warning; if every single axis-wise merge failed (uns can always be merged), it still raises an error
    • "raise": the default; raises an error on any merge failure

If one wants to ignore errors completely, they could specify on_error=lambda e, **kw: None or manually suppress the warnings. This is intentionally unwieldy, because warnings are a good thing.

Footnotes

  1. I’d think the signature would be:

    Parameters (all except error are keyword-only):

    • error: The error instance
    • attr: AnnData attribute we failed to concat
    • key: String key if attr is e.g. obsm or layers, None if attr is e.g. X or obs
    • maybe more? e.g. a list of the elements to be concatenated?

    Returns:

    • a value if the user wants to default to something/handle something themselves
    • None if we don’t want to set the thing
    • raise an error (e.g. the original one) if the user wants to forward the error.
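The proposed callback signature could be written down as a `typing.Protocol`. This is only a sketch of the proposal: `OnErrorHandler` and `skip_obsm_varm` are hypothetical names, and anndata 0.10.8 has no such parameter. The example handler skips failed obsm/varm entries (the case from this issue) and forwards every other error.

```python
from typing import Any, Optional, Protocol


class OnErrorHandler(Protocol):
    """Proposed callback: `error` is positional, everything else keyword-only."""

    def __call__(
        self, error: Exception, *, attr: str, key: Optional[str], **kwargs: Any
    ) -> Any:
        """Return a replacement value, return None to leave the entry unset, or raise."""
        ...


def skip_obsm_varm(
    error: Exception, *, attr: str, key: Optional[str], **kwargs: Any
) -> Any:
    # Drop entries such as obsm['X_pca'] whose shapes differ between inputs.
    if attr in ("obsm", "varm"):
        return None
    # X, layers, obs, ...: forward the original error unchanged.
    raise error


handler: OnErrorHandler = skip_obsm_varm
```

Accepting `**kwargs` leaves room for the "maybe more?" arguments without breaking handlers written against the minimal signature.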

@flying-sheep changed the title from “Add option to run concat_on_disk only on X/layers” to “Add parameter for more resilient concat_on_disk” on Aug 23, 2024
@DingWB
Author

DingWB commented Aug 23, 2024

Hi @flying-sheep ,

I think you misunderstood my meaning. I have two adata files, both have obsm['X_pca'], but the shapes are different: (369626, 50) and (135426, 63). So I got an error.

Is there a way to skip the concat of obsm?

@flying-sheep
Member

I understood that perfectly.

No, there isn’t; that’s why I’m brainstorming solutions.

@DingWB
Author

DingWB commented Aug 23, 2024

OK. Thanks.

@ilan-gold set the milestone to 0.12.0 on Aug 26, 2024