Performance losses when using delayed start since version 2.4 #1340
-
Thank you for this detailed Issue report, @julien-temple. This is certainly something to look into! The major difference between 2.3.0 and 2.4.1 is the move from NetCDF to zarr output. While that seems not directly related to delayed starts, it would be good to confirm that it has nothing to do with this change. Could you try running a simulation without output (so with no output file written), to check whether the slowdown is related to the zarr writing?
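A minimal, self-contained sketch of such a no-output test (idealised still flow; the particle numbers and 4-month release window are taken from the report below, everything else is illustrative; adapt to your own fieldset and kernels):

```python
import numpy as np
from datetime import timedelta
from parcels import FieldSet, ParticleSet, JITParticle, AdvectionRK4

# Idealised, stationary flow just to keep the sketch self-contained
dims = {"lon": np.linspace(0.0, 1.0, 16), "lat": np.linspace(0.0, 1.0, 16)}
data = {"U": np.zeros((16, 16)), "V": np.zeros((16, 16))}
fieldset = FieldSet.from_data(data, dims, mesh="flat")

# Delayed release: 5000 particles spread over ~4 months
n = 5000
pset = ParticleSet(
    fieldset=fieldset,
    pclass=JITParticle,
    lon=np.full(n, 0.5),
    lat=np.full(n, 0.5),
    time=np.linspace(0.0, 120 * 86400.0, n),
)

# No output_file argument, so nothing is written to zarr
pset.execute(AdvectionRK4, runtime=timedelta(days=1000), dt=timedelta(hours=1))
```

If this run is fast again, the slowdown is in the output writing rather than in the particle integration itself.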
-
I have an idea. I would guess that once you launch your particles, they live forever and do not die, and that you keep launching particles for the entire duration of your run. I also wonder how many particles are in your first particle release?

In the Parcels output, the data is stored as (trajectory, observation) pairs, and the first observation written is from the time of particle release. Zarr is written as a series of files, each containing some of the data -- a chunk. The chunk size in the trajectory dimension is, by default, the number of particles started in the first particle release; the chunk size in the observation dimension is a choice you can make.

If you are launching particles continuously and they never die, the total number of particle observations in your run grows as the square of the run time. If your output is as I described, the chunk size in trajectories will be small, and if you keep releasing particles regularly, you might have to update all chunks at every output step, so the IO time will also grow as the square of the run length -- which is unfortunate. If this is true, I would expect the speed to degrade rapidly (quadratically) as the particle tracking run continues.

If I am right, you might want to manually decrease the chunk size in trajectory and increase it in observation; you can find how to do this in issue #1316. It could also help significantly to have the particles die after a fixed age, if you don't need their tracks to extend forever.

If you load the output into zarr, what does it report the chunk size to be? Try something like the sketch below. I think a careful reading of the discussion at #1316 might also be useful.

Jamie
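A minimal sketch of the chunk-size check (the path `output.zarr` and the variable `lon` are assumptions; use your own output file and any variable in it):

```python
import xarray as xr

ds = xr.open_zarr("output.zarr")          # Parcels zarr output (assumed path)
print(ds["lon"].shape)                    # (n_trajectories, n_observations)
print(ds["lon"].encoding.get("chunks"))   # on-disk chunk shape, e.g. (first_release_size, obs_chunk)
```

And a sketch of the "die after a fixed age" suggestion, assuming a 60-day lifetime and the usual Parcels kernel pattern (names like `AgingParticle` and `max_age` are illustrative):

```python
from parcels import JITParticle, Variable

class AgingParticle(JITParticle):
    # per-particle age in seconds, updated by the kernel below
    age = Variable("age", initial=0.0)

def AgeAndDelete(particle, fieldset, time):
    particle.age += particle.dt
    if particle.age > fieldset.max_age:
        particle.delete()

# Typical wiring (sketch):
# fieldset.add_constant("max_age", 60 * 86400)   # 60 days, in seconds
# pset.execute(pset.Kernel(AdvectionRK4) + pset.Kernel(AgeAndDelete), ...)
```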
-
@julien-temple Ok, a (1,1) chunk size will be the worst possible choice. Essentially, at the end of the run you have to open about 4 million files on each output step (1000 x 1000 chunks times 4 variables). (1,1000) is not a great choice either (though it is much better), since you will effectively have to open all chunks of all the data for each time step. Again, I wonder whether choosing the initial particle release and the observation chunking so that you get something more balanced, like (10,100) or (100,10), might not be much better. You can always throw away some of the initial particles; an extra 100 particles will not do anything tragic, and will help your run time.

@erikvansebille I am sorry I have not had time to dive into the coding; life has gotten in the way. I hope to get you the other thing I promised soon. But I think this suggests that figuring out why the trajectory chunk size cannot be set to an arbitrary value is important. This also becomes an issue when running on a system with many cores, where the initial release size per core can be small. I might have time to look at the end of next week.
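To make the file-count trade-off concrete, a rough back-of-the-envelope sketch (plain Python; the 1000 x 1000 x 4-variable numbers are taken from the comment above, and it ignores compression and whatever re-writing zarr does internally):

```python
import math

def chunk_files(n_traj, n_obs, chunk_traj, chunk_obs, n_vars=4):
    """Rough bookkeeping for zarr output chunked as (chunk_traj, chunk_obs)."""
    total = math.ceil(n_traj / chunk_traj) * math.ceil(n_obs / chunk_obs) * n_vars
    # Chunks containing the newest observation column, i.e. roughly what must
    # be touched when one more output step is appended.
    touched_per_step = math.ceil(n_traj / chunk_traj) * n_vars
    return total, touched_per_step

for chunks in [(1, 1), (1, 1000), (10, 100), (100, 10)]:
    total, touched = chunk_files(1000, 1000, *chunks)
    print(f"chunks={chunks}: ~{total:,} files in total, ~{touched:,} touched per output step")
```

With (1,1) chunks you end up with millions of tiny files, while larger, more balanced chunks keep both the total file count and the number of chunks touched per output step manageable.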
-
Thank you @erikvansebille and @JamiePringle for all your responses.

Thanks again,
Julien
-
Hello,
I am using Parcels to simulate the dispersal of juvenile sea turtles. I am opening this discussion to report significant performance losses I have been facing since the 2.4 update:
I have used Parcels 2.3.0 with Python 3.6 for several months now. To give an idea of the performance, a typical run of 1000 days with 5000 particles (no parallel computation) usually takes around 20 minutes.
However, after the update to version 2.4.1 (Python 3.11), the identical run is much slower, with an estimated time of more than 24 hours (the estimated time was actually not converging, and I had to stop the process to prevent my computer from freezing).
After several experiments (changing the Parcels and Python versions, modifying the kernels used and the particle class), I figured out that the problem occurs when using a delayed start (particle releases are spread over 4 months). This also produces the following warning, several times during the run:
`<__array_function__ internals>:200: RuntimeWarning: invalid value encountered in cast`
If I make all the particles start at the same time, performance seems back to normal, and the previous warning is no longer displayed.
So I was hoping someone has already faced the same situation and might have an idea of how to fix this performance issue.
Regards,
Julien