Performance losses when using delayed start since version 2.4 #1340
-
Thank you for this detailed Issue report, @julien-temple. This is certainly something to look into! The major difference between 2.3.0 and 2.4.1 is the move from NetCDF to zarr output. While that seems not directly related to delayed starts, it would be good to confirm that it has nothing to do with this change. Could you try running a simulation without output (so with no output file written), to check whether the slowdown is related to the zarr writing?
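A minimal, self-contained sketch of such a no-output test (idealised still flow; the particle numbers and 4-month release window are taken from the report below, everything else is illustrative; adapt to your own fieldset and kernels):

```python
import numpy as np
from datetime import timedelta
from parcels import FieldSet, ParticleSet, JITParticle, AdvectionRK4

# Idealised, stationary flow just to keep the sketch self-contained
dims = {"lon": np.linspace(0.0, 1.0, 16), "lat": np.linspace(0.0, 1.0, 16)}
data = {"U": np.zeros((16, 16)), "V": np.zeros((16, 16))}
fieldset = FieldSet.from_data(data, dims, mesh="flat")

# Delayed release: 5000 particles spread over ~4 months
n = 5000
pset = ParticleSet(
    fieldset=fieldset,
    pclass=JITParticle,
    lon=np.full(n, 0.5),
    lat=np.full(n, 0.5),
    time=np.linspace(0.0, 120 * 86400.0, n),
)

# No output_file argument, so nothing is written to zarr
pset.execute(AdvectionRK4, runtime=timedelta(days=1000), dt=timedelta(hours=1))
```

If this run is fast again, the slowdown is in the output writing rather than in the particle integration itself.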
-
I have an idea. I would guess that once you launch your particles, they live forever and do not die, and that you keep launching particles for the entire duration of your run. I also wonder how many particles are in your first particle release?

In the Parcels output, the data is stored as (trajectory, observation) pairs, and the first observation written is from the time of particle release. Zarr is written as a series of files, each containing some of the data -- a chunk. The chunk size in the trajectory dimension is, by default, the number of particles started in the first particle release; the chunk size in the observation dimension is a choice you can make.

If you are launching particles continuously and they never die, the total number of particle observations in your run grows as the square of the run time. If your output is as I described, the chunk size in trajectories will be small, and if you keep releasing particles regularly, you might have to update all chunks at every output step, so the IO time will also grow as the square of the run length -- which is unfortunate. If this is true, I would expect the speed to degrade rapidly (quadratically) as the particle tracking run continues.

If I am right, you might want to manually decrease the chunk size in trajectory and increase it in observation; you can find how to do this in issue #1316. It could also help significantly to have the particles die after a fixed age, if you don't need their tracks to extend forever.

If you load the output into zarr, what does it report the chunk size to be? Try something like the sketch below. I think a careful reading of the discussion at #1316 might also be useful.

Jamie
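A minimal sketch of the chunk-size check (the path `output.zarr` and the variable `lon` are assumptions; use your own output file and any variable in it):

```python
import xarray as xr

ds = xr.open_zarr("output.zarr")          # Parcels zarr output (assumed path)
print(ds["lon"].shape)                    # (n_trajectories, n_observations)
print(ds["lon"].encoding.get("chunks"))   # on-disk chunk shape, e.g. (first_release_size, obs_chunk)
```

And a sketch of the "die after a fixed age" suggestion, assuming a 60-day lifetime and the usual Parcels kernel pattern (names like `AgingParticle` and `max_age` are illustrative):

```python
from parcels import JITParticle, Variable

class AgingParticle(JITParticle):
    # per-particle age in seconds, updated by the kernel below
    age = Variable("age", initial=0.0)

def AgeAndDelete(particle, fieldset, time):
    particle.age += particle.dt
    if particle.age > fieldset.max_age:
        particle.delete()

# Typical wiring (sketch):
# fieldset.add_constant("max_age", 60 * 86400)   # 60 days, in seconds
# pset.execute(pset.Kernel(AdvectionRK4) + pset.Kernel(AgeAndDelete), ...)
```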
-
@julien-temple Ok, a (1,1) chunk size will be the worst possible choice. Essentially, at the end of the run you have to open about 4 million files on each output step (1000 x 1000 chunks times 4 variables). (1,1000) is not a great choice either (though it is much better), since you will effectively have to open all chunks of all the data for each time step. Again, I wonder whether choosing the initial particle release and the observation chunking so that you get something more balanced, like (10,100) or (100,10), might not be much better. You can always throw away some of the initial particles; an extra 100 particles will not do anything tragic, and will help your run time.

@erikvansebille I am sorry I have not had time to dive into the coding; life has gotten in the way. I hope to get you the other thing I promised soon. But I think this suggests that figuring out why the trajectory chunk size cannot be set to an arbitrary value is important. This also becomes an issue when running on a system with many cores, where the initial release size per core can be small. I might have time to look at the end of next week.
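To make the file-count trade-off concrete, a rough back-of-the-envelope sketch (plain Python; the 1000 x 1000 x 4-variable numbers are taken from the comment above, and it ignores compression and whatever re-writing zarr does internally):

```python
import math

def chunk_files(n_traj, n_obs, chunk_traj, chunk_obs, n_vars=4):
    """Rough bookkeeping for zarr output chunked as (chunk_traj, chunk_obs)."""
    total = math.ceil(n_traj / chunk_traj) * math.ceil(n_obs / chunk_obs) * n_vars
    # Chunks containing the newest observation column, i.e. roughly what must
    # be touched when one more output step is appended.
    touched_per_step = math.ceil(n_traj / chunk_traj) * n_vars
    return total, touched_per_step

for chunks in [(1, 1), (1, 1000), (10, 100), (100, 10)]:
    total, touched = chunk_files(1000, 1000, *chunks)
    print(f"chunks={chunks}: ~{total:,} files in total, ~{touched:,} touched per output step")
```

With (1,1) chunks you end up with millions of tiny files, while larger, more balanced chunks keep both the total file count and the number of chunks touched per output step manageable.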
-
Thank you @erikvansebille and @JamiePringle for all your responses.

Thanks again,
Julien
-
Hello,
I am using Parcels to simulate the dispersal of juvenile sea turtles. I am opening this discussion to report significant performance losses I have been facing since the 2.4 update:
I have used Parcels 2.3.0 with Python 3.6 for several months now. To give an idea of the performance, a typical run of 1000 days with 5000 particles (no parallel computation) usually takes around 20 minutes.
However, after the update to version 2.4.1 (Python 3.11), the identical run is much slower, with an estimated time of more than 24 hours (the estimated time was actually not converging, and I had to stop the process to prevent my computer from freezing).
After several experiments (changing the Parcels and Python versions, modifying the kernels used and the particle class), I figured out that the problem occurs when using a delayed start (particle releases are spread over 4 months). This also produces the following warning, several times during the run:
`<__array_function__ internals>:200: RuntimeWarning: invalid value encountered in cast`
If I make all the particles start at the same time, performance seems back to normal, and the previous warning is no longer displayed.
So I was hoping someone has already faced the same situation and might have an idea of how to fix this performance issue.
Regards,
Julien