Add conditional variance #712

Open: wants to merge 6 commits into base: main

Conversation

LeonStadelmann (Collaborator) commented Jun 11, 2024

Added function compute_variance() as well as a test, similar to compute_entropy().

MUCDK self-requested a review June 14, 2024 06:52
latent_space_selection:
Key or Keys which specifies the latent or feature space used for computing the conditional variance.
A single key has to be a latent space in :attr:`~anndata.AnnData.obsm` or
a gene in :attr:`~anndata.AnnData.var_names`.
Collaborator

Write "a feature in ..." instead of "a gene in ...", because we might also store proteins, ATAC, etc.

Key or Keys which specifies the latent or feature space used for computing the conditional variance.
A single key has to be a latent space in :attr:`~anndata.AnnData.obsm` or
a gene in :attr:`~anndata.AnnData.var_names`.
A set of keys has to be a subset of genes in :attr:`~anndata.AnnData.var_names`.
Collaborator

The type hint says list, not set.

source: K,
target: K,
forward: bool = True,
latent_space_selection: Union[str, list[str]] = "X_pca",
Collaborator

Let's not call it latent space; it can also be a raw space, e.g. gene space. Also, the types are not clear.

    mask = [var_name in latent_space_selection for var_name in self.adata.var_names]
    latent_space = self.adata[:, mask].X.toarray()
else:
    raise KeyError("Unknown latent space selection.")
Collaborator

When we raise a KeyError, the message should say which key was wrong.
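A minimal sketch of this suggestion, using a plain dictionary as a stand-in for `adata.obsm`. The helper name `get_space` is hypothetical and not part of the PR; the point is only that the offending key (and, where cheap, the available keys) appears in the message.

```python
def get_space(spaces: dict, key: str):
    """Return spaces[key]; otherwise raise a KeyError that names the bad key."""
    if key not in spaces:
        raise KeyError(f"Unknown key {key!r}; available keys: {sorted(spaces)}.")
    return spaces[key]


# Stand-in data; a real call would look the key up in adata.obsm instead.
spaces = {"X_pca": [[0.1, 0.2]], "X_umap": [[1.0, 2.0]]}
try:
    get_space(spaces, "X_pcaa")
except KeyError as err:
    message = err.args[0]  # the message now contains the misspelled key
```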

Comment on lines +762 to +776
filter_value = source if forward else target
opposite_filter_value = target if forward else source

if isinstance(latent_space_selection, str):
    if latent_space_selection in self.adata.obsm:
        latent_space = self.adata.obsm[latent_space_selection]
    elif latent_space_selection in self.adata.var_names:
        latent_space = self.adata[:, latent_space_selection in self.adata.var_names].X.toarray()
    else:
        raise KeyError("Gene/Latent space not found.")
elif type(latent_space_selection) in [list, np.ndarray]:
    mask = [var_name in latent_space_selection for var_name in self.adata.var_names]
    latent_space = self.adata[:, mask].X.toarray()
else:
    raise KeyError("Unknown latent space selection.")
Collaborator

Let's make this a function (nested within the function).
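One possible shape for such a helper, sketched without AnnData so it runs on its own. The name `resolve_features` and its arguments are hypothetical: `obsm` is a dict standing in for `adata.obsm`, `var_names` a list standing in for `adata.var_names`, and `X` a dense array standing in for `adata.X` (the PR code calls `.toarray()`, which suggests a sparse matrix there). Unlike the code under review, this sketch selects a single feature by its column position, returns list selections in the order given rather than in `var_names` order, and raises `TypeError` for an unsupported argument type; these are choices of the sketch, not of the PR.

```python
from typing import Sequence, Union

import numpy as np


def resolve_features(
    obsm: dict,
    var_names: Sequence[str],
    X: np.ndarray,
    key: Union[str, Sequence[str]],
) -> np.ndarray:
    """Return the (n_obs, n_features) array selected by `key`."""
    names = list(var_names)
    if isinstance(key, str):
        if key in obsm:  # an embedding such as "X_pca"
            return np.asarray(obsm[key])
        if key in names:  # a single feature, kept two-dimensional
            return X[:, [names.index(key)]]
        raise KeyError(f"{key!r} is neither an embedding key nor a feature name.")
    if isinstance(key, (list, tuple, np.ndarray)):
        missing = [k for k in key if k not in names]
        if missing:
            raise KeyError(f"Features not found: {missing}.")
        return X[:, [names.index(k) for k in key]]
    raise TypeError(f"Expected a string or a list of strings, got {type(key).__name__}.")


# Stand-in data for illustration only.
X = np.arange(6.0).reshape(2, 3)
obsm = {"X_pca": np.ones((2, 2))}
var_names = ["g0", "g1", "g2"]
single = resolve_features(obsm, var_names, X, "g1")
subset = resolve_features(obsm, var_names, X, ["g2", "g0"])
```

Inside `compute_variance()` the same logic could be defined as a nested function that closes over `self.adata`, which is what the comment asks for.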

)

cond_var = []
for i in range(cond_dists.shape[1]): # type: ignore[union-attr]
Collaborator

Can we vectorize this?
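The loop body is not shown in this excerpt, so the following is a sketch under assumptions: each column of `cond_dists` is taken to be a normalized probability vector over cells, and the conditional variance is taken to be the total variance of the features under that distribution, i.e. E[||x||^2] - ||E[x]||^2. Under those assumptions the per-column loop reduces to two matrix products. The function name `conditional_variance` is hypothetical.

```python
import numpy as np


def conditional_variance(cond_dists: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Total variance of `features` under each column of `cond_dists`.

    cond_dists: (n_cells, n_batch), each column sums to 1 (assumed).
    features:   (n_cells, n_dims).
    Returns an (n_batch,) array.
    """
    mean = cond_dists.T @ features  # (n_batch, n_dims)
    second_moment = cond_dists.T @ (features**2).sum(axis=1)  # (n_batch,)
    var = second_moment - (mean**2).sum(axis=1)
    return np.maximum(var, 0.0)  # clip tiny negatives caused by round-off


rng = np.random.default_rng(0)
features = rng.normal(size=(5, 3))
cond_dists = rng.random(size=(5, 4))
cond_dists /= cond_dists.sum(axis=0, keepdims=True)

# Reference: the same quantity computed one column at a time.
looped = np.array(
    [p @ ((features - p @ features) ** 2).sum(axis=1) for p in cond_dists.T]
)
vectorized = conditional_variance(cond_dists, features)
```

The moment-based form can lose precision when the variance is small relative to the squared mean, which is why the sketch clips at zero; the centered form used in the reference loop avoids that at the cost of the loop.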


batch_size = batch_size if batch_size is not None else len(df)
func = self.push if forward else self.pull
for batch in range(0, len(df), batch_size):
Collaborator

Why do we actually do this? :)

batch_size=batch_size,
)
if key_added is None:
assert isinstance(out, pd.DataFrame)
Collaborator

Check for some properties, e.g. no NaN values and non-negativity.
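A small sketch of what such checks could look like. The helper name `check_variance_values` is hypothetical; it accepts anything `np.asarray` can convert, so in the test it could be fed the values of the returned DataFrame (the column name is not shown in this excerpt, so it is left out here).

```python
import numpy as np


def check_variance_values(values) -> None:
    """Assert basic sanity properties of computed variances."""
    arr = np.asarray(values, dtype=float)
    assert arr.size > 0, "expected at least one value"
    assert not np.isnan(arr).any(), "variance contains NaN"
    assert (arr >= 0).all(), "variance must be non-negative"


# Passes silently for valid input.
check_variance_values([0.0, 0.3, 1.2])
```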

@pytest.mark.parametrize("key_added", [None, "test"])
@pytest.mark.parametrize("batch_size", [None, 2])
@pytest.mark.parametrize("latent_space_selection", ["X_pca", "KLF12", ["KLF12", "Dlip3", "Dref"]])
def test_compute_variance_pipeline(
Collaborator

Also check that an error is raised for wrong attributes.
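A sketch of such a negative test using `pytest.raises`. The function `select_space` is a hypothetical stand-in for the key lookup inside `compute_variance()`; a real test would call the method on the problem fixture instead. The `match` argument also verifies that the bad key appears in the error message.

```python
import pytest


def select_space(available, key):
    """Hypothetical stand-in for the key lookup inside compute_variance."""
    if key not in available:
        raise KeyError(f"Unknown key {key!r}.")
    return key


@pytest.mark.parametrize("bad_key", ["X_missing", "not_a_gene"])
def test_unknown_key_raises(bad_key):
    # The test fails if no KeyError is raised or the key is not in the message.
    with pytest.raises(KeyError, match=bad_key):
        select_space({"X_pca", "KLF12"}, bad_key)
```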

2 participants