Pull/push in a batch-wise fashion #592

Open
Marius1311 opened this issue Aug 11, 2023 · 3 comments

@Marius1311
Collaborator

Marius1311 commented Aug 11, 2023

Hi there! For a simple TemporalProblem, I've held out some genes from the embedding computation (a simple PCA) and computed the coupling. I would now like to use the coupling to predict the expression of the held-out genes (at either t_1 or t_2; both are possible) as a means of validation. However, when I call tp.push(source=8.0, target=8.5, data=gexp_sc, scale_by_marginals=True), where gexp_sc is the gene expression matrix of the held-out genes on the source cells, my kernel dies. I assume that's because the matrix multiplication is carried out in a dense formulation, all at once. Is it possible to do this in a batch-wise fashion, i.e., by loading only small chunks of the coupling into memory at a time?
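The batch-wise idea in the question can be sketched in plain NumPy. This is a minimal illustration, not moscot's implementation: it assumes an entropic coupling that can be rebuilt block by block from dual potentials `f`, `g` and the source/target embeddings `x`, `y` under a squared-Euclidean cost. The helper names `coupling_block` and `push_batched` are hypothetical. Because P^T X = sum over source chunks of P[chunk]^T X[chunk], only a few rows of the coupling ever need to be in memory.

```python
import numpy as np


def coupling_block(f_blk, g, x_blk, y, eps):
    """Rows of an entropic coupling, P_ij = exp((f_i + g_j - C_ij) / eps).

    Hypothetical setup: squared-Euclidean cost C_ij = ||x_i - y_j||^2
    on the embeddings x (source) and y (target).
    """
    # Expand ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y to avoid a 3-D temporary.
    cost = (
        (x_blk**2).sum(axis=1)[:, None]
        + (y**2).sum(axis=1)[None, :]
        - 2.0 * x_blk @ y.T
    )
    return np.exp((f_blk[:, None] + g[None, :] - cost) / eps)


def push_batched(f, g, x, y, eps, data, batch_size=1024):
    """Compute P^T @ data while holding at most batch_size rows of P in memory.

    f, g: dual potentials (lengths n_src, n_tgt); data: dense (n_src, n_genes).
    Returns an (n_tgt, n_genes) array.
    """
    out = np.zeros((y.shape[0], data.shape[1]))
    for start in range(0, x.shape[0], batch_size):
        stop = start + batch_size
        p_blk = coupling_block(f[start:stop], g, x[start:stop], y, eps)
        # Each chunk of source cells contributes P[chunk]^T @ data[chunk].
        out += p_blk.T @ data[start:stop]
    return out
```

The result does not depend on `batch_size`; only peak memory does (roughly `batch_size × n_tgt` floats per block). Under these assumptions, the chunk size used for push could therefore be chosen independently of how, or whether, the problem was batched when solving.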

@giovp
Member

giovp commented Aug 11, 2023

Hi @Marius1311! I think #559 is related and suggests some possible solutions; let us know if any of them work!

@Marius1311
Collaborator Author

Great, thanks @giovp! I guess this is also related to #569.

A solution that works for me is specifying batch_size=x in the problem's solve method, even though batching isn't actually required to solve the problem, as it's quite small. This seems to make downstream computations batched as well, so I can run

```python
out = tp.push(source=8.0, target=8.5, data=gexp_src, scale_by_marginals=True, return_all=True, key_added=None)
```

now without any issues. However, this is a bit clumsy: it requires me to solve the problem in a (slower) batch-wise fashion, even though I could solve it in offline mode. I think it would therefore be nice to decouple the two batch sizes, so that a problem can be solved with one batch size and pull/push can be run downstream with another.

@Marius1311
Collaborator Author

Marius1311 commented Aug 15, 2023

Sorry, partly unrelated: if I want to impute gene expression at the target from the source, do I need scale_by_marginals? Intuitively, I would say no, as all I want is Y = P^T X, where P is the coupling, X is the known gene expression in the source, and Y is the unknown gene expression in the target. So I just want this matrix multiplication, with no additional scaling.
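To make the distinction concrete, here is a small NumPy sketch contrasting the raw product Y = P^T X with a marginal-normalized variant, the barycentric projection Y_j = sum_i P_ij X_i / sum_i P_ij. The function names are hypothetical, and whether moscot's scale_by_marginals corresponds to this normalization is an assumption not confirmed here. The point it illustrates: the columns of P sum to the target marginals rather than to 1, so the raw product is a weighted sum, not a weighted average, of source expression.

```python
import numpy as np


def push_raw(P, X):
    """Plain matrix product Y = P^T X, with no rescaling."""
    return P.T @ X


def push_barycentric(P, X):
    """Barycentric projection: Y_j = sum_i P_ij X_i / sum_i P_ij.

    Each target cell receives a weighted *average* of source expression,
    with weights given by its column of the coupling.
    """
    col_mass = P.sum(axis=0)  # target marginal b = P^T 1
    return (P.T @ X) / col_mass[:, None]


# Toy coupling: 2 source x 2 target cells, total mass 1.
P = np.array([[0.25, 0.25],
              [0.00, 0.50]])
X = np.array([[2.0], [4.0]])  # one gene, measured on the source cells

print(push_raw(P, X))          # values 0.5 and 2.5: weighted sums
print(push_barycentric(P, X))  # values 2.0 and ~3.333: averages of 2.0 and 4.0
```

With uniform target marginals (b_j = 1/m for m target cells), the two differ only by the constant factor m. This matters for absolute expression values but not for, e.g., correlations with measured expression.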
