
Memory footprint optimization #18

Open
MattF-NSIDC opened this issue Sep 29, 2023 · 5 comments
Labels: enhancement (New feature or request)

Comments

@MattF-NSIDC (Member)

Currently, a >6GB numpy array is allocated, and if you don't have enough free memory, you're in trouble ;)

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 6.32 GiB for an array with shape (332, 316, 8089) and data type int64
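For reference, the allocation size follows directly from the shape and dtype, so if the stored values actually fit a narrower integer type, downcasting alone would shrink the array proportionally. A quick back-of-the-envelope check (whether something like int16 is safe depends on the data's real value range, which I haven't verified):

import numpy as np

shape = (332, 316, 8089)

# int64, as currently allocated:
print(np.prod(shape) * np.dtype(np.int64).itemsize / 2**30)  # ~6.32 GiB

# int16, *if* the values fit (unverified assumption):
print(np.prod(shape) * np.dtype(np.int16).itemsize / 2**30)  # ~1.58 GiB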
MattF-NSIDC added the enhancement (New feature or request) label on Sep 29, 2023
@MattF-NSIDC (Member, Author) commented Sep 29, 2023

Oof! Even after allocating 8GB, I can't build the database. At the end, when the progress bar shows:

100.0% 8089 of 8089

I get:

Killed

And in dmesg:

[ 2710.204256] Out of memory: Killed process 9393 (python) total-vm:13642760kB, anon-rss:7756332kB, file-rss:4kB, shmem-rss:0kB, UID:1000 pgtables:19420kB oom_score_adj:0

I don't yet know how much memory I'll need, but our private cloud system is fairly memory-limited. We either need to make sure our system admins are OK with this, or we need to optimize (one idea sketched below). Alternatively, if the really large memory footprint is only a problem when initializing the database, we can pre-seed that file after generating it on a personal machine.
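One sketch of what "optimize" could look like, assuming the build loop can fill the array one time step at a time (load_slice below is a hypothetical stand-in for however we actually read each step): write into a memory-mapped .npy instead of an in-RAM array, so the OS pages data to disk rather than keeping all ~6.3 GiB resident.

import numpy as np

def load_slice(i):
    # Hypothetical stand-in for the real per-time-step reader.
    return np.zeros((332, 316), dtype=np.int64)

# Write-mode memory map: the array lives on disk and the OS pages it in
# and out, so resident memory stays small even for the full ~6.3 GiB file.
data = np.lib.format.open_memmap(
    "v3_1979-present_raw.npy",  # plain .npy rather than pickle; memmap needs a flat layout
    mode="w+",
    dtype=np.int64,
    shape=(332, 316, 8089),
)
for i in range(8089):
    data[:, :, i] = load_slice(i)
data.flush()

Whether this fits depends on how the pickle is consumed downstream, since a plain .npy changes the on-disk format.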

@MattF-NSIDC (Member, Author) commented Sep 29, 2023

v3_1979-present_raw.pickle written.

Woo!

I succeeded with 12GB of memory and 2GB of swap; there was heavy swapping during the write :) I'd suggest 16GB to avoid swapping to disk.
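To put a firm number on the peak rather than eyeballing swap, one option (standard library only; on Linux, ru_maxrss is reported in kibibytes) is to print the process's peak RSS at the end of the run:

import resource

# Peak resident set size of this process; on Linux ru_maxrss is in KiB.
peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS: {peak_kib / 2**20:.2f} GiB")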

@MattF-NSIDC (Member, Author) commented Sep 29, 2023

The database update process also requires around 12GB.

EDIT: The database update process fails (oom killer 🔪 ) with 12GB. 😭

@MattF-NSIDC (Member, Author) commented Sep 29, 2023

It peaks around 13.1 GB when creating the pickle database file. 16GB is cutting it close, but should last at least a couple of years. I haven't done the math beyond the rough sketch below.
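A rough version of that math, assuming the time axis grows by one int64 grid per day and that peak usage keeps scaling at roughly today's ~2x ratio of peak (~13.1 GB) to array size (~6.3 GiB):

grid_bytes = 332 * 316 * 8            # one int64 time step
per_year = grid_bytes * 365 / 2**30   # raw array growth in GiB/year
print(f"array: +{per_year:.2f} GiB/yr; peak at ~2x: +{2 * per_year:.2f} GiB/yr")

On those assumptions the peak grows by well under 1 GiB per year, so the headroom on a 16GB machine should indeed last a few years.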

@MattF-NSIDC (Member, Author)

16GB isn't enough to run update_data.py. I don't think we can get more than that. I'm going to ask.
