
51 reduce memory burden of pipeline #95

Merged
merged 24 commits into dev on Sep 18, 2024

Conversation

geoffwoollard (Collaborator)

#51

I incorporated a test validating the numerical identity of the output, so I am confident there are no errors on that side.

Previously, batch normalization and masking were applied to all the gt maps up front. Since only one map is loaded at a time now, this is done on the fly inside the local distance computation.
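
A minimal sketch of what that on-the-fly step could look like (hypothetical helper name; the actual function in the repo may differ):

```python
import torch

def normalize_and_mask(volume: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Mask a single gt map and normalize it to zero mean / unit std
    # right before it enters the local distance computation.
    v = volume * mask
    return (v - v.mean()) / v.std()
```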

@geoffwoollard linked an issue Sep 11, 2024 that may be closed by this pull request
@geoffwoollard self-assigned this Sep 11, 2024
@geoffwoollard added the enhancement (New feature or request) label Sep 11, 2024
@geoffwoollard changed the base branch from main to dev September 11, 2024 00:59
@DSilva27 (Collaborator) left a comment


I would change how the low-memory handling works; doing things in a for loop seems really unnecessary. Here is what I would do:

@DSilva27 (Collaborator) commented Sep 16, 2024

The metrics should not care whether the input is in low-memory mode or not; they should only care about volumes1, volumes2, and some arguments if necessary. The whole low-memory machinery should come in at the pipeline level. Here is what I would do:

  1. Implement functions that normalize volumes the way you want; they should take either a stack of volumes or a single volume. Don't do these things inside an if-statement.

  2. Load a chunk of the gt volumes, run all the preprocessing you need, and then compute the submatrix of the full distance matrix for that chunk of volumes. This should use the distance matrices you already have implemented. Populate the full distance matrix on the fly. This way you can use a vmap to compute each submatrix and then a simple stack operation to get the full matrix (see the sketch below).

I would also create a low_memory_pipeline and a regular-memory pipeline. All those if-statements make the code very messy, and I think it would be easy to miss something being wrong.
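
A minimal sketch of the chunked approach, assuming torch.vmap from PyTorch 2.x; load_gt_chunk is a hypothetical loader standing in for the existing dataset/preprocessing code, and the plain L2 distance stands in for the metrics already implemented:

```python
import torch

def chunked_distance_matrix(load_gt_chunk, n_gt, chunk_size, submission_vols):
    # Build the full (n_gt x n_submission) distance matrix one chunk of gt
    # volumes at a time, so only `chunk_size` gt maps are ever in memory.
    per_volume_dist = torch.vmap(
        lambda v: torch.linalg.norm(v - submission_vols, dim=-1)
    )
    blocks = []
    for start in range(0, n_gt, chunk_size):
        # load_gt_chunk(start, stop) returns preprocessed (masked, normalized)
        # gt volumes for [start, stop) as a (chunk, n_voxels) tensor.
        gt_chunk = load_gt_chunk(start, min(start + chunk_size, n_gt))
        blocks.append(per_volume_dist(gt_chunk))  # (chunk, n_submission) submatrix
    return torch.cat(blocks, dim=0)  # concatenate submatrices into the full matrix
```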

@geoffwoollard (Collaborator, Author)

Yes, good point! I forgot that __getitem__ takes care of indexing into sub-batches.

I did the masking and normalization in the sub-batch. This avoids reading each map into memory twice (once for normalization/masking, and again right before the m2m distance call).

There is still some if-low-memory-mode control flow, to reduce code duplication.
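
A rough sketch of that pattern (hypothetical class and argument names; the actual dataset code may be organized differently):

```python
import torch
from torch.utils.data import Dataset

class GTSubBatchDataset(Dataset):
    # Each item is one sub-batch of gt maps, masked and normalized as it is
    # read, so every map passes through memory only once.
    def __init__(self, maps, mask, batch_size):
        self.maps = maps              # array-like (n_maps, n_voxels), e.g. a memory-mapped .npy
        self.mask = mask              # (n_voxels,) mask tensor
        self.batch_size = batch_size

    def __len__(self):
        return (len(self.maps) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, idx):
        start = idx * self.batch_size
        stop = min(start + self.batch_size, len(self.maps))
        batch = torch.tensor(self.maps[start:stop], dtype=torch.float32)  # copy only this sub-batch into RAM
        batch = batch * self.mask                                         # masking on the fly
        mean = batch.mean(dim=1, keepdim=True)
        std = batch.std(dim=1, keepdim=True)
        return (batch - mean) / std                                       # normalization on the fly
```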

@geoffwoollard (Collaborator, Author)

Also, the gt maps should now be a .npy file, not a .pt file.

I hope that torch.from_numpy(np.load(fname.npy)) is not much slower than torch.load(fname.pt) for the big 160 GB file.
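
One option that might be worth checking (an assumption on my part, nothing measured here): np.load accepts mmap_mode, which maps the file instead of reading all ~160 GB up front, so only the slices that are actually indexed get paged into RAM:

```python
import numpy as np
import torch

# Memory-map the gt maps rather than loading the whole file at once.
# (Hypothetical filename; whether this ends up faster or slower than
# torch.load on the 160 GB file would need to be measured.)
gt_maps = np.load("gt_maps.npy", mmap_mode="r")

# Materialize one chunk as a torch tensor only when it is needed.
chunk = torch.from_numpy(np.ascontiguousarray(gt_maps[0:100]))
```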

@DSilva27 merged commit faf7e29 into dev Sep 18, 2024
6 checks passed