-
Hi @BradGreig, these are great ideas. In the end, the only real way to optimize for both small and large runs (21cmMC and pure 21cmFAST, respectively) is to have manual flags, probably in the C code. I'd be very happy for such flags to be included wherever they're useful, and for memory to be saved on the C side wherever appropriate.

As for #220, it won't be granular enough to do what you're suggesting, though I fundamentally agree that it would be a great idea. We could slate it for v4. One option would be to keep the current overall structure in terms of the "steps" involved, but within the initial conditions, for example, have two or three calls to C functions, giving Python a chance to free/allocate memory in between. Another very real option is to move all freeing/allocation into C. This is now a possibility (since the halo field work). I don't think it has any real drawbacks from the Python side, and it means the allocation can be fully optimized within C. Even with that, I'd fully support granularizing the C functions :-)
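To make the second option concrete, here's a minimal sketch of what fully C-side allocation with granular stages could look like. Everything here (struct layout, function names) is a hypothetical stand-in, not the actual 21cmFAST API:

```c
#include <stdlib.h>

/* Hypothetical stand-in for the real parameter/box structs. */
typedef struct { int DIM; float *hires_density; /* ... */ } ICBoxes;

/* Each stage allocates its own scratch and frees it before returning,
 * so peak memory is set by the largest single stage rather than by the
 * initial-conditions step as a whole. */
static int compute_density_ic(ICBoxes *boxes)
{
    size_t n = (size_t)boxes->DIM * boxes->DIM * boxes->DIM;
    float *scratch = malloc(n * sizeof *scratch); /* stage-local only */
    if (!scratch) return 1;
    /* ... generate boxes->hires_density using scratch ... */
    free(scratch); /* released before the velocity stage ever runs */
    return 0;
}

static int compute_velocity_ic(ICBoxes *boxes) { (void)boxes; /* same pattern */ return 0; }
static int compute_2lpt_ic(ICBoxes *boxes)     { (void)boxes; /* same pattern */ return 0; }

int initial_conditions(ICBoxes *boxes)
{
    /* The driver (whether Python or C) just sequences the stages;
     * nothing stage-local survives between calls. */
    return compute_density_ic(boxes)
        || compute_velocity_ic(boxes)
        || compute_2lpt_ic(boxes);
}
```

Whether the driver lives in Python or C, the effect is the same: the granularity of the stages, not the step, sets the memory high-water mark.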
-
Hey @steven-murray, I have been thinking about memory usage quite a lot these last couple of days, and have a question (this may be something for future releases, though). How granular will what you are working on in #220 be? Primarily I am thinking about minimising memory for large DIM/HII_DIM runs. I'm curious about whether we can decompose functions into smaller pieces. For example, in the initial conditions we keep high-res versions of the density, 3 velocity, and 3 2LPT boxes on the Python side, with another 8 boxes in C. Would it be possible to do just the density, dump everything to file, then bring back what we need for the velocities, dump everything again, and then do the 2LPT stuff?
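Roughly, I'm picturing a spill/restore pattern like the sketch below (the file layout, names, and error handling are all just placeholders):

```c
#include <stdio.h>

/* Spill a box to disk so its memory can be freed, then read it back
 * only when a later stage (velocities, 2LPT) actually needs it. */
static int spill_box(const char *path, const float *box, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (!f) return 1;
    size_t nw = fwrite(box, sizeof *box, n, f);
    fclose(f);
    return nw == n ? 0 : 1;
}

static int restore_box(const char *path, float *box, size_t n)
{
    FILE *f = fopen(path, "rb");
    if (!f) return 1;
    size_t nr = fread(box, sizeof *box, n, f);
    fclose(f);
    return nr == n ? 0 : 1;
}

/* Usage sketch (hypothetical box name and size):
 *   spill_box("hires_density.bin", hires_density, n_hires);
 *   free(hires_density);                  // memory back for velocities
 *   ...
 *   hires_density = malloc(n_hires * sizeof *hires_density);
 *   restore_box("hires_density.bin", hires_density, n_hires);        */
```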
Another example would be the ionisation box with mini-halos. If we could separate out the calculation per radius, we would only need to keep the Fcoll boxes for the current (and previous) radius in memory, rather than for all filtering scales. Clearly, this would significantly reduce the memory usage of any run.
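Schematically (hypothetical names throughout), the per-radius loop would rotate two Fcoll buffers instead of holding one per filtering scale:

```c
#define NUM_FILTER_STEPS 40 /* illustrative */

/* Hypothetical helpers, assumed to exist elsewhere in the C code. */
void filter_box_at_R(const float *delta, float *delta_filtered, double R);
void compute_fcoll(const float *delta_filtered, float *fcoll);
void update_xH(const float *fcoll_curr, const float *fcoll_prev, float *xH);

void ionize_per_radius(const float *delta, float *xH,
                       float *fcoll_curr, float *fcoll_prev,
                       float *delta_filtered, const double *R_values)
{
    /* Walk from the largest radius down, as the excursion-set loop does. */
    for (int R_ct = NUM_FILTER_STEPS - 1; R_ct >= 0; R_ct--) {
        filter_box_at_R(delta, delta_filtered, R_values[R_ct]);
        compute_fcoll(delta_filtered, fcoll_curr);
        update_xH(fcoll_curr, fcoll_prev, xH);
        /* Rotate: this radius becomes "previous" for the next one,
         * so only two Fcoll boxes are ever resident. */
        float *tmp = fcoll_prev; fcoll_prev = fcoll_curr; fcoll_curr = tmp;
    }
}
```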
However, I suspect that this is a future thing as it'll break the API in a significant way.
In the meantime, I have created another branch in which I will try to minimise some memory, via a flag that enables memory-minimising behaviour. It may cost a little in performance (possibly not; it will need to be tested), but it'll reduce the overall memory footprint. This branch wouldn't break any API or interfere with what you have in the coming PR, as it'll be within the C code only.
An example of the usefulness of this is the spin temperature calculation, where we keep ~40 boxes for the filtered density. We could instead do 40 extra FFTs and reduce this memory (FFTs aren't too expensive with FFTW wisdom anyway). I did it this way for MCMC'ing, where memory isn't an issue for small boxes, but it becomes problematic for large DIM/HII_DIM, as you'll fill the memory sooner than you need to. Also, in the initial conditions we have 6 components for the 2LPT calculation, but you only need 3 at any one time. Further, we could store these temporarily in hires_vx_2LPT etc. from the Python class, and not keep any of these 2LPT component boxes in C.
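As a sketch of the spin-temperature example above (hypothetical names; assuming single-precision FFTW): with the memory flag on, each radius redoes one inverse FFT from a cached unfiltered k-space box instead of keeping its own filtered real-space box resident:

```c
#include <string.h>
#include <fftw3.h>

/* With the memory-minimising flag on, only one filtered box is ever
 * resident: each radius copies the cached (unfiltered) k-space density,
 * filters it, and pays one extra inverse FFT -- cheap once FFTW wisdom
 * is in place -- instead of caching ~40 real-space boxes. */
void filtered_density_at_R(const fftwf_complex *delta_k, /* unfiltered, cached */
                           fftwf_complex *scratch_k,     /* reused every radius */
                           float *delta_R,               /* single output box */
                           double R, int dim)
{
    size_t n_k = (size_t)dim * dim * (dim / 2 + 1);
    memcpy(scratch_k, delta_k, n_k * sizeof *scratch_k);
    /* apply_kspace_filter(scratch_k, R, dim);  -- hypothetical filter */
    fftwf_plan plan = fftwf_plan_dft_c2r_3d(dim, dim, dim,
                                            scratch_k, delta_R, FFTW_ESTIMATE);
    fftwf_execute(plan);      /* the "extra" FFT traded for ~40 boxes */
    fftwf_destroy_plan(plan);
}
```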