51 reduce memory burden of pipeline #56

DSilva27 · 2024-07-17T17:35:17Z

Work in progress. So far, I have improved the way data is loaded for the SVD pipeline. We still need to fix the memory usage of the map_to_map pipeline. The solution seems to be to implement an iterative approach that uses PyTorch DataLoaders and computes the matrix little by little.
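A minimal sketch of what such an iterative approach could look like, assuming "the matrix" is a pairwise distance matrix between two sets of maps. The function name `pairwise_l2_in_batches`, the squared-L2 metric, and the toy shapes are illustrative assumptions, not the code in this PR:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def pairwise_l2_in_batches(maps_a, maps_b, batch_size=2):
    """Fill an (n_a, n_b) squared-L2 distance matrix one block of rows at a time.

    Hypothetical helper: only `batch_size` maps from `maps_a` are processed
    per step, so the intermediate work stays small.
    """
    n_a, n_b = maps_a.shape[0], maps_b.shape[0]
    dist = torch.empty(n_a, n_b)
    loader = DataLoader(TensorDataset(maps_a), batch_size=batch_size, shuffle=False)

    row = 0
    for (batch,) in loader:
        # Flatten each 3D map to a vector and compare against all reference maps.
        block = torch.cdist(batch.flatten(1), maps_b.flatten(1)) ** 2
        dist[row : row + batch.shape[0]] = block
        row += batch.shape[0]
    return dist


# Toy data: 5 and 3 random "maps" on a 4x4x4 grid (assumed shapes).
torch.manual_seed(0)
maps_a = torch.randn(5, 4, 4, 4)
maps_b = torch.randn(3, 4, 4, 4)
dist = pairwise_l2_in_batches(maps_a, maps_b, batch_size=2)
```

In a real pipeline the `TensorDataset` would presumably be replaced by a dataset that reads maps from disk on demand, so that only one batch is resident in memory at a time.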

The solution implemented requires the maps to be in .npy format, as torch does not have a memory map method for loading .pt files. I found this to be a better option than saving the maps as a batch.
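For illustration, memory-mapped loading of a .npy file with NumPy might look like the sketch below. The file name `maps.npy` and the array shape are made up for the example; the actual loader in this PR may differ:

```python
import os
import tempfile

import numpy as np

# Write a small stack of toy "maps" to a .npy file (assumed shape: 4 maps of 2x2x2).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "maps.npy")
maps = np.arange(4 * 2 * 2 * 2, dtype=np.float32).reshape(4, 2, 2, 2)
np.save(path, maps)

# mmap_mode="r" returns a read-only np.memmap: data is read from disk
# only when a slice is actually accessed.
lazy = np.load(path, mmap_mode="r")

# Materialize just the first two maps in memory.
first_two = np.asarray(lazy[:2])
```

Slicing the memory-mapped array reads only the requested maps, which is what makes it convenient to wrap in a dataset that serves one map (or one batch) at a time.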

geoffwoollard · 2024-08-05T13:28:09Z

I'm getting an import error in CI:
https://github.com/flatironinstitute/Cryo-EM-Heterogeneity-Challenge-1/actions/runs/10249495802/job/28353002463

Run pytest tests/test_preprocessing.py
============================= test session starts ==============================
platform linux -- Python 3.10.14, pytest-8.3.2, pluggy-1.5.0
rootdir: /home/runner/work/Cryo-EM-Heterogeneity-Challenge-1/Cryo-EM-Heterogeneity-Challenge-1
configfile: pyproject.toml
plugins: anyio-4.4.0
collected 0 items / 1 error

==================================== ERRORS ====================================
_________________ ERROR collecting tests/test_preprocessing.py _________________
ImportError while importing test module '/home/runner/work/Cryo-EM-Heterogeneity-Challenge-1/Cryo-EM-Heterogeneity-Challenge-1/tests/test_preprocessing.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_preprocessing.py:2: in <module>
    from cryo_challenge._commands import run_preprocessing
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/cryo_challenge/_commands/run_preprocessing.py:7: in <module>
    from .._preprocessing.preprocessing_pipeline import preprocess_submissions
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/cryo_challenge/_preprocessing/__init__.py:1: in <module>
    from .preprocessing_pipeline import preprocess_submissions as preprocess_submissions
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/cryo_challenge/_preprocessing/preprocessing_pipeline.py:6: in <module>
    from .align_utils import align_submission, center_submission, threshold_submissions
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/cryo_challenge/_preprocessing/align_utils.py:4: in <module>
    from aspire.volume import Volume
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/aspire/volume/__init__.py:1: in <module>
    from .symmetry_groups import (
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/aspire/volume/symmetry_groups.py:6: in <module>
    from aspire.utils import Rotation
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/aspire/utils/__init__.py:71: in <module>
    from .resolution_estimation import FourierRingCorrelation, FourierShellCorrelation
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/aspire/utils/resolution_estimation.py:12: in <module>
    from aspire.numeric import fft
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/aspire/numeric/__init__.py:5: in <module>
    from .complex_pca.complex_pca import ComplexPCA
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/aspire/numeric/complex_pca/complex_pca.py:18: in <module>
    from .validation import check_array
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/aspire/numeric/complex_pca/validation.py:18: in <module>
    from numpy.core.numeric import ComplexWarning
E   ImportError: cannot import name 'ComplexWarning' from 'numpy.core.numeric' (/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/numpy/core/numeric.py)
------------------------------- Captured stdout --------------------------------
2024-08-05 13:16:54,925 INFO [matplotlib.font_manager] Failed to extract font properties from /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf: In FT2Font: Can not load face (unknown file format; error code 0x2)
2024-08-05 13:16:55,314 INFO [matplotlib.font_manager] generated new fontManager
=========================== short test summary info ============================
ERROR tests/test_preprocessing.py
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
=============================== 1 error in 6.65s ===============================
Error: Process completed with exit code 2.

This test passes locally. I'm using Python 3.8.17.
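One plausible explanation, not confirmed in this thread: the CI job on Python 3.10 may have installed NumPy 2.x, where `ComplexWarning` is no longer importable from `numpy.core.numeric`, while a Python 3.8 environment can only install older NumPy releases (as far as I know, NumPy 1.25 and later require Python 3.9+). The failing import lives in aspire, not in this repository, so the snippet below is only a sketch of a version-tolerant import, not a proposed patch:

```python
import warnings

try:
    # NumPy >= 1.25 exposes the warning class here.
    from numpy.exceptions import ComplexWarning
except ImportError:
    # Older NumPy releases: fall back to the location used in the traceback.
    from numpy.core.numeric import ComplexWarning

# Example use: silence the warning when deliberately discarding imaginary parts.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ComplexWarning)
    pass  # code that casts complex arrays to real would go here
```

If this is indeed the cause, pinning `numpy<2` in the test environment would be another way to check it.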

geoffwoollard · 2024-08-06T13:07:43Z

@DSilva27 I suggest you rebase onto dev and then merge this in.

DSilva27 added 7 commits July 16, 2024 15:35

implement memory friendly loader for svd pipeline

d494cce

Merge branch 'dev' into 51-reduce-memory-burden-of-pipeline

ba067b4

update gitignore

implement subset of volumes used for svd

5a51663

Merge branch 'dev' into 51-reduce-memory-burden-of-pipeline

f0814ee

update config file to reproduce presentation plots

49c0b00

update tutorial

4804865

update config for svd to make it obvious that reference files should be npy

531d3e6

DSilva27 mentioned this pull request Jul 17, 2024

Update config_svd.yaml: path_to_reference #50

Merged

DSilva27 and others added 2 commits July 17, 2024 14:01

Merge branch 'dev' into 51-reduce-memory-burden-of-pipeline

74211bf

download numpy from osf

2b65642

geoffwoollard assigned DSilva27 Aug 5, 2024

geoffwoollard added the enhancement label Aug 5, 2024

geoffwoollard added 5 commits August 5, 2024 09:30

only python 3.8

d928dcd

only python 3.9

4b99ef0

only python 3.10

0a6f17b

3.8,3.9,3.10,3.11' with fail fast false

8a80c4f

3.8,3.9,3.10,3.11' with fail fast false

c389cee

geoffwoollard self-requested a review August 5, 2024 14:45

geoffwoollard approved these changes Aug 5, 2024

DSilva27 deleted the branch dev August 6, 2024 20:42

DSilva27 closed this Aug 6, 2024

DSilva27 deleted the 51-reduce-memory-burden-of-pipeline branch August 6, 2024 20:50

DSilva27 restored the 51-reduce-memory-burden-of-pipeline branch August 6, 2024 20:50
