reduce memory burden of pipeline #51
Figured out a way to fix this issue combining the ...
The big GT flat file is used in the map-to-map step. Let's see if that unit test can pass... For the distances implemented so far, the computation is parallelizable, so technically only one map needs to be accessible at a time. The program holds all the files in memory only to speed up the computation.
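A minimal sketch of that one-map-at-a-time structure (the distance function and loader callables here are hypothetical stand-ins, not the pipeline's real API):

```python
import numpy as np
from typing import Callable

def l2_distance(gt_map: np.ndarray, submitted_map: np.ndarray) -> float:
    """Stand-in for one of the implemented map-to-map distances."""
    return float(np.linalg.norm(gt_map - submitted_map))

def map_to_map_streaming(
    n_gt: int,
    n_sub: int,
    load_gt: Callable[[int], np.ndarray],
    load_sub: Callable[[int], np.ndarray],
) -> np.ndarray:
    """Fill the distance matrix one (i, j) pair at a time.

    Each entry only needs the current GT map and the current submitted map,
    so nothing forces the whole flat file to be resident in memory.
    """
    dists = np.empty((n_gt, n_sub))
    for i in range(n_gt):
        gt_i = load_gt(i)  # only this GT map is held in RAM for row i
        for j in range(n_sub):
            dists[i, j] = l2_distance(gt_i, load_sub(j))
    return dists
```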
The GT volume flat file also takes a long time to read in (15-30 min), which is inconvenient for development: if there is a bug further down the pipeline, it takes a long time before you hit it.
I plan to write the preprocessed submissions to .npz format instead of .pt, and then use a numpy-based memmap. This would also resolve #79. Links for reference:
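A rough sketch of the memmap idea, assuming the preprocessed stack is written as an uncompressed .npy file (np.load cannot memory-map members of a zipped .npz directly); the file name and shapes are made up:

```python
import numpy as np

# Hypothetical file name; in the pipeline this would be the preprocessed volume stack.
path = "preprocessed_volumes.npy"

# One-time write: save the stacked volumes as an uncompressed .npy file.
# Shapes here are tiny stand-ins for the real (n_maps, D, D, D) arrays.
np.save(path, np.random.rand(10, 32, 32, 32).astype(np.float32))

# Read side: open the file as a memmap. Only the slices that are actually
# indexed get pulled from disk, so a single map can be accessed without
# reading the whole flat file into memory.
volumes = np.load(path, mmap_mode="r")
one_map = np.array(volumes[3])  # materialize just this map in RAM
print(one_map.shape, volumes.dtype)
```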
I don't see a point in using memmap for the submission-aligned .pt files. The volumes are stored under the 'volume' key; when that key is accessed, all submitted maps (all indices) are loaded into memory, and a single map can't be indexed out.
The volumes would have to be saved as a .npz flat file, which could then be indexed into. @DSilva27 made the point that they are not that big (~7 GB?), so there is not much to gain.
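For reference, a sketch of what that .npz flat-file route could look like, with one array per map so a single index can be read back lazily; the key names and shapes are hypothetical:

```python
import numpy as np

# Stand-in for the aligned volumes currently stored under the 'volume' key of the .pt file.
volumes = np.random.rand(5, 32, 32, 32).astype(np.float32)

# Write one named array per map; NpzFile only reads a member when its key is accessed.
np.savez("aligned_volumes.npz", **{f"map_{i}": v for i, v in enumerate(volumes)})

with np.load("aligned_volumes.npz") as f:
    one_map = f["map_3"]  # only this member is read from the archive
print(one_map.shape)
```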
The especially large memory burden comes from the GT flat files (160 GB). Perhaps include a smaller version with some averaging...
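One possible way to build such a smaller GT file is block averaging; the factor and shapes below are just illustrative, and the real pipeline might prefer Fourier cropping or another downsampling filter:

```python
import numpy as np

def block_average(volume: np.ndarray, factor: int = 2) -> np.ndarray:
    """Downsample a cubic volume by averaging non-overlapping factor^3 blocks."""
    d = volume.shape[0]
    assert d % factor == 0, "volume side must be divisible by the block factor"
    return volume.reshape(
        d // factor, factor, d // factor, factor, d // factor, factor
    ).mean(axis=(1, 3, 5))

vol = np.random.rand(64, 64, 64).astype(np.float32)  # stand-in for one GT map
small = block_average(vol, factor=4)
print(small.shape)  # (16, 16, 16), i.e. 64x fewer voxels per map
```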
It would also be good to benchmark memory usage.
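A quick way to get coarse numbers is to log the process RSS around the expensive steps (this uses the third-party psutil package, which is an assumption about available dependencies):

```python
import os
import psutil  # third-party; `pip install psutil`

def rss_gb() -> float:
    """Resident set size of the current process, in GB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1e9

print(f"before load: {rss_gb():.2f} GB")
# ... load the GT flat file / run the map-to-map step here ...
print(f"after load:  {rss_gb():.2f} GB")
```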