Debug h5ad saving #580

Merged: 23 commits, merged Nov 16, 2023
Sichao25 (Collaborator) commented Sep 29, 2023

Some parts of adata still contain data types that cannot be written to h5ad. These include:

  • fate output -> converted to a different format when saving, converted back when loading
  • kmc object -> converted to a different format when saving, converted back when loading
  • umap object -> no longer saved; it is reconstructed from its parameters when needed
  • velocity parameters -> converted to a compatible data type
  • cell phase genes -> converted to a compatible data type
  • init cells -> converted to a compatible data type
  • fixed points and nullclines -> converted to compatible data types; nullclines are replaced by NCx and NCy

In the future, we will use export_h5ad and import_h5ad to save and load processed data.
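The "convert when saving, convert back when loading" strategy listed above can be illustrated with a pair of functions that swap a non-serializable object in uns for a plain dictionary of its attributes, then rebuild it after loading. This is a minimal sketch, not Dynamo's implementation: ToyKMC, export_object, and import_object are hypothetical names, and a plain dict stands in for adata.uns.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Type


@dataclass
class ToyKMC:
    """Hypothetical stand-in for an object that h5ad cannot store directly."""
    P: List[List[float]] = field(default_factory=list)
    eignum: int = 0


def export_object(uns: Dict[str, Any], key: str) -> None:
    """Before saving: swap uns[key] for a plain dict stored under f"{key}_params"."""
    obj = uns.pop(key)
    uns[f"{key}_params"] = asdict(obj)


def import_object(uns: Dict[str, Any], key: str, cls: Type) -> None:
    """After loading: rebuild the object from its parameters and drop the dict."""
    params = uns.pop(f"{key}_params")
    uns[key] = cls(**params)


uns = {"kmc": ToyKMC(P=[[0.9, 0.1], [0.2, 0.8]], eignum=2)}
export_object(uns, "kmc")          # uns now holds only serializable values
print(sorted(uns))                 # ['kmc_params']
import_object(uns, "kmc", ToyKMC)  # the object is back for further analysis
print(uns["kmc"].eignum)           # 2
```

Keeping the object in memory during analysis and converting it only at save time matches the trade-off discussed for kmc later in the thread.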

codecov-commenter commented Sep 29, 2023

Codecov Report

Attention: 79 lines in your changes are missing coverage. Please review.

Comparison is base (e5e9017) 22.51% compared to head (8f3cc0f) 22.13%.
Report is 139 commits behind head on master.

File | Patch % | Lines
dynamo/data_io.py | 15.38% | 33 Missing ⚠️
dynamo/utils.py | 17.64% | 14 Missing ⚠️
dynamo/tools/connectivity.py | 13.33% | 13 Missing ⚠️
dynamo/tools/cell_velocities.py | 22.22% | 7 Missing ⚠️
dynamo/tools/markers.py | 0.00% | 4 Missing ⚠️
dynamo/plot/topography.py | 0.00% | 3 Missing ⚠️
dynamo/prediction/fate.py | 40.00% | 3 Missing ⚠️
dynamo/tools/utils.py | 50.00% | 1 Missing ⚠️
dynamo/vectorfield/topography.py | 0.00% | 1 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##           master     #580      +/-   ##
==========================================
- Coverage   22.51%   22.13%   -0.38%     
==========================================
  Files         165      166       +1     
  Lines       26991    28043    +1052     
==========================================
+ Hits         6077     6208     +131     
- Misses      20914    21835     +921     


Sichao25 marked this pull request as ready for review on September 29, 2023 18:02.

Xiaojieqiu (Collaborator) left a comment:

Thanks Sichao, I left some comments. We can briefly discuss them.

Comment on lines +357 to +370
def export_kmc(adata: AnnData) -> None:
    """Save the parameters of kmc and delete the kmc object from anndata."""
    kmc = adata.uns["kmc"]
    adata.uns["kmc_params"] = {
        "P": kmc.P,
        "Idx": kmc.Idx,
        "eignum": kmc.eignum,
        "D": kmc.D,
        "U": kmc.U,
        "W": kmc.W,
        "W_inv": kmc.W_inv,
        "Kd": kmc.Kd,
    }
    adata.uns.pop("kmc")
Collaborator (reviewer):

Why not add this where kmc is saved to adata.uns?

Collaborator (author):

This way, we can keep the kmc object while running the Dynamo analysis; whenever we need it, we simply read it from adata.uns.

The UMAP object uses a different saving strategy: its parameters are saved instead of the object itself. My idea is that we can avoid creating a UMAP object in the pipeline even when we run UMAP dimension reduction. Unless the user wants to perform an inverse transform in a specific analysis such as fate, they don't need this UMAP instance. In contrast, creating the KMC object is unavoidable if the user enables the kmc method in tl.cell_velocities. Since the KMC object has already been created, we can keep it for later use until the data are saved to h5ad.
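The "save parameters, build only when needed" idea for UMAP can be sketched as a small holder that stores the constructor parameters (a plain dict that h5ad can store) and creates the model on first use. This is a hedged illustration: LazyModel and fake_umap are hypothetical, and fake_umap stands in for an expensive constructor rather than the real umap-learn API.

```python
from typing import Any, Callable, Dict, List, Optional


class LazyModel:
    """Hypothetical holder: keeps constructor parameters, builds the model on demand."""

    def __init__(self, factory: Callable[..., Any], params: Dict[str, Any]):
        self.factory = factory
        self.params = dict(params)          # plain dict: safe to store in adata.uns
        self._model: Optional[Any] = None   # runtime-only, never serialized

    def get(self) -> Any:
        # Build the model the first time it is requested, then reuse it.
        if self._model is None:
            self._model = self.factory(**self.params)
        return self._model


builds: List[Dict[str, Any]] = []


def fake_umap(**kwargs: Any) -> Any:
    """Stand-in for an expensive constructor such as fitting a UMAP model."""
    builds.append(kwargs)
    return ("umap", kwargs["n_neighbors"])


lazy = LazyModel(fake_umap, {"n_neighbors": 30, "min_dist": 0.1})
lazy.get()
lazy.get()
print(len(builds))  # 1: built once, and only because it was requested
```

A pipeline that never asks for an inverse transform would never call get(), so no model object is created or needs to be serialized.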

dynamo/data_io.py (outdated, resolved)
Comment on lines +268 to +270
NCx, NCy = (
    [vecfld_dict["NCx"][index] for index in vecfld_dict["NCx"]],
    [vecfld_dict["NCy"][index] for index in vecfld_dict["NCy"]],
Collaborator (reviewer):

What will the behavior be here? Will this reorder the x/y coordinates of the nullclines?
[vecfld_dict["NCx"][index] for index in vecfld_dict["NCx"]],
[vecfld_dict["NCy"][index] for index in vecfld_dict["NCy"]],

Collaborator (author):

This reads the nullclines from the dictionary into lists of x and y coordinates, and the order should be preserved. Here is a reference indicating that regular dictionaries have kept their items in insertion order since Python 3.6 (an implementation detail of CPython 3.6 that became a language guarantee in Python 3.7).
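A minimal sketch of the nullcline round trip, assuming each nullcline is a ragged list of (x, y) points keyed by string index, as the vecfld_dict["NCx"][index] access suggests. The helper names are hypothetical. In memory, iterating the dict preserves insertion order; this sketch sorts by int(key) as a defensive variant, because a storage backend may return keys in a different order (HDF5 groups, for instance, may list names alphabetically, so "10" would come before "2").

```python
from typing import Dict, List, Sequence

Nullcline = List[List[float]]  # one curve: a list of [x, y] points


def nullclines_to_dict(curves: Sequence[Nullcline]) -> Dict[str, Nullcline]:
    """Store ragged curves under string keys "0", "1", ... so h5ad can hold them."""
    return {str(i): curve for i, curve in enumerate(curves)}


def nullclines_from_dict(d: Dict[str, Nullcline]) -> List[Nullcline]:
    """Rebuild the list, ordering keys numerically rather than trusting dict order."""
    return [d[index] for index in sorted(d, key=int)]


NCx = [[[0.0, 1.0], [0.5, 0.7]], [[1.0, 0.0]]]  # two curves of different lengths
restored = nullclines_from_dict(nullclines_to_dict(NCx))
print(restored == NCx)  # True
```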

negative_sample_rate=params["umap_kwargs"]["negative_sample_rate"],
init_pos=params["umap_kwargs"]["init_pos"],
random_state=params["umap_kwargs"]["random_state"],
umap_kwargs=params["umap_kwargs"],
Collaborator (reviewer):

umap_kwargs=params["umap_kwargs"]
Will this lead to duplicated arguments being passed to construct_mapper_umap?

Collaborator (author):

params["umap_kwargs"] may contain duplicate arguments, but this should not raise an error because construct_mapper_umap merges them with update_dict.
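Why duplicates inside umap_kwargs are harmless can be shown with a small, hypothetical constructor that takes extra options as a single dict parameter and merges them itself. The exact merge rules of Dynamo's update_dict are not shown in this thread, so update_params and build_mapper below are illustrative stand-ins, not Dynamo's functions. Python only raises a TypeError when the same keyword reaches a function twice, for example through **-unpacking next to an explicit keyword.

```python
from typing import Any, Dict, Optional


def update_params(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical merge helper: values from overrides win and nothing raises."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def build_mapper(n_neighbors: int = 15, random_state: int = 0,
                 umap_kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical constructor that receives extra options as one dict."""
    explicit = {"n_neighbors": n_neighbors, "random_state": random_state}
    return update_params(explicit, umap_kwargs or {})


params = {"umap_kwargs": {"n_neighbors": 30, "random_state": 0, "min_dist": 0.1}}
# The same keys appear as keyword arguments and inside umap_kwargs; since
# umap_kwargs is one dict parameter (not **kwargs), no TypeError occurs.
cfg = build_mapper(n_neighbors=params["umap_kwargs"]["n_neighbors"],
                   random_state=params["umap_kwargs"]["random_state"],
                   umap_kwargs=params["umap_kwargs"])
print(cfg)  # {'n_neighbors': 30, 'random_state': 0, 'min_dist': 0.1}

# By contrast, unpacking the same key next to an explicit keyword fails.
try:
    build_mapper(n_neighbors=30, **{"n_neighbors": 30})
except TypeError as err:
    print("TypeError:", err)
```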

"average": average,
"t": t,
"prediction": prediction,
# "VecFld": VecFld,
"VecFld_true": VecFld_true,
# "VecFld_true": VecFld_true,
Collaborator (reviewer):

VecFld_true is the ground-truth vector field.

Collaborator (author):

Will we need this in the pipeline?

Comment on lines 603 to +604
"init_states": init_states,
"init_cells": init_cells,
"init_cells": list(init_cells),
Collaborator (reviewer):

Why is list() applied only to init_cells and not to init_states?

Collaborator (author):

init_cells can be an Index, so we convert it to a list, whereas init_states is an array, which is already a more compatible data type.

dynamo/preprocessing/cell_cycle.py (outdated, resolved)
dynamo/preprocessing/pca.py (outdated, resolved)
Xiaojieqiu merged commit 8456e3b into aristoteleo:master on Nov 16, 2023 (5 checks passed).
3 participants