Pull/push in a batch-wise fashion #592

Marius1311 · 2023-08-11T12:06:44Z

Hi there! For a simple TemporalProblem, I've held out some genes from the embedding computation (simple PCA) and computed the coupling. I would now like to use the coupling to predict expression values of the held-out genes (at either t_1 or t_2, both are possible) as a means of validation. However, when calling tp.push(source=8.0, target=8.5, data=gexp_sc, scale_by_marginals=True), where gexp_sc is the gene expression matrix of the held-out genes on the source cells, my kernel dies. I assume that's because the matrix multiplication is carried out in a dense formulation, all at once. Is it possible to do this in a batch-wise fashion, i.e. by loading only small chunks of the coupling into memory at a time?

giovp · 2023-08-11T15:19:58Z

Hi @Marius1311! I think #559 is related and there are some possible solutions there. Let us know if it works!

Marius1311 · 2023-08-15T11:55:26Z

Great, thanks @giovp! I guess this is also related to #569.

A solution that works for me is specifying batch_size=x in the problem's solve method, even though that isn't actually required to solve the problem, as it's quite small. That seems to imply that downstream computations are also batched: I can run

out = tp.push(source=8.0, target=8.5, data=gexp_src, scale_by_marginals=True, return_all=True, key_added=None)

now fine without any issues. However, this is a bit clumsy: it requires me to solve the problem in a (slower) batch-wise fashion, even though I could solve it in offline mode. I think it would be nice to decouple the two batch sizes, so that a problem can be solved using one batch size while pull/push downstream use another.
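
To make the request concrete, here is a minimal NumPy sketch of a push that only ever holds one block of the coupling in memory. It is not moscot's or OTT's implementation. It assumes an entropic solution with dual potentials f and g, a squared Euclidean cost, and a coupling of the form P_ij = exp((f_i + g_j - C_ij) / epsilon); the function names and the batch_size argument are hypothetical.

```python
import numpy as np


def coupling_block(x_src, y_tgt, f, g, epsilon):
    """Materialise only the rows of P that belong to the given source cells.

    Assumes P_ij = exp((f_i + g_j - C_ij) / epsilon) with squared Euclidean cost.
    """
    cost = (
        (x_src**2).sum(axis=1)[:, None]
        + (y_tgt**2).sum(axis=1)[None, :]
        - 2.0 * x_src @ y_tgt.T
    )
    return np.exp((f[:, None] + g[None, :] - cost) / epsilon)


def push_batched(x_src, y_tgt, f, g, epsilon, data, batch_size):
    """Compute Y = P^T X by accumulating over blocks of source rows.

    Peak memory for the coupling is batch_size x n_target instead of
    n_source x n_target, and batch_size is independent of how f and g
    were obtained.
    """
    out = np.zeros((y_tgt.shape[0], data.shape[1]))
    for start in range(0, x_src.shape[0], batch_size):
        stop = start + batch_size
        block = coupling_block(x_src[start:stop], y_tgt, f[start:stop], g, epsilon)
        out += block.T @ data[start:stop]
    return out
```

Because the batch size only enters the accumulation loop, the same potentials could be computed in offline mode and then pushed with any chunk size that fits in memory.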

Marius1311 · 2023-08-15T11:58:56Z

Sorry, partly unrelated: if I want to impute gene expression at the target using the source, would I have to use scale_by_marginals? Intuitively, I would say no, as all I want is Y = P^T X, where P is the coupling, X is the known gene expression in the source, and Y is my unknown gene expression in the target. So I just want this matrix multiplication, with no additional scaling.
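
For reference, a small NumPy sketch of how the plain product differs from a normalised one. Each column j of P sums to the mass of target cell j, so the rows of Y = P^T X are weighted sums scaled by that mass; dividing by the column sums turns them into weighted averages of source expression. Whether scale_by_marginals performs exactly this normalisation is not confirmed in this thread, and the function names below are hypothetical.

```python
import numpy as np


def push_raw(coupling, data):
    """Plain product Y = P^T X; row j is scaled by the mass of target cell j."""
    return coupling.T @ data


def push_averaged(coupling, data):
    """Divide each row of P^T X by the column sum of P.

    Row j is then a weighted average of source expression, on the same
    scale as the input data.
    """
    col_mass = coupling.sum(axis=0)
    return (coupling.T @ data) / col_mass[:, None]
```

With uniform target marginals the two results differ only by a constant factor (the number of target cells), so correlation-based validation is unaffected, while comparisons on the original expression scale are not.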

Marius1311 assigned michalk8 and MUCDK Aug 11, 2023
