ENH: Parallel mode for monte-carlo simulations #619

Open
wants to merge 69 commits into
base: develop
from

Conversation

brunosorban
Collaborator

@brunosorban brunosorban commented Jun 9, 2024

This pull request adds the option to run simulations in parallel to the MonteCarlo class. The feature uses a context manager named MonteCarloManager to centralize all workers and shared objects, ensuring proper termination of the sub-processes.

A second feature is the ability to export (close to) all simulation inputs and outputs to an .h5 file. The file can be viewed with HDFView (or similar software). Since this is a less conventional file format, a method to read it and a structure to post-process multiple simulations were also added under rocketpy/stochastic/post_processing. A cache handles the data manipulation and returns a 3D numpy array with all simulations, whose shape is (simulation_index, time_index, column). The column axis is reserved for vector data, where x, y and z, for example, may be available under the same dataset. For example, cache.read_inputs('motors/thrust_source') returns both time and thrust.
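
To make the (simulation_index, time_index, column) layout concrete, here is a minimal NumPy sketch. It does not use the PR's cache or any RocketPy API; the thrust values and array sizes are made-up placeholders, and the only assumption taken from the description is the axis order.

```python
import numpy as np

# Hypothetical data: 3 simulations, each with 4 samples of (time, thrust).
n_sims, n_times = 3, 4
time = np.linspace(0.0, 3.0, n_times)
simulations = [
    np.column_stack((time, 100.0 * (i + 1) * np.ones(n_times)))
    for i in range(n_sims)
]

# Stack into the (simulation_index, time_index, column) layout.
data = np.stack(simulations)

# Column 0 is time and column 1 is thrust in this made-up example.
thrust_of_sim_1 = data[1, :, 1]            # thrust curve of the second simulation
mean_thrust = data[:, :, 1].mean(axis=0)   # mean over simulations at each time step
```

With this layout, statistics across simulations are a reduction over axis 0, while a single flight is a plain 2D slice.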

Pull request type

  • Code changes (bugfix, features)

Checklist

  • Tests for the changes have been added (if needed)
  • Docs have been reviewed and added / updated
  • Lint (black rocketpy/ tests/) has passed locally
  • All tests (pytest tests -m slow --runslow) have passed locally
  • CHANGELOG.md has been updated (if relevant)

Current behavior

Currently, Monte Carlo simulations must run serially and all outputs are written to a txt file.

New behavior

The montecarlo simulations may now be executed in parallel and all outputs may be exported to a txt or an h5 file, saving some key data or everything.

Breaking change

  • Yes
  • No

Additional information

None

@brunosorban brunosorban requested a review from phmbressan June 9, 2024 13:27
@brunosorban brunosorban requested a review from a team as a code owner June 9, 2024 13:27
@brunosorban brunosorban changed the title Parallel mode for monte-carlo simulations ENH: Parallel mode for monte-carlo simulations Jun 9, 2024
@brunosorban
Collaborator Author

brunosorban commented Jun 9, 2024

Benchmark results, obtained on a machine with 6 cores (12 threads).

[Figure: workers_performance]

Collaborator

@phmbressan phmbressan left a comment

Amazing feature, as the results show the MonteCarlo class has great potential for parallelization.

The only blocking issue I see with this PR is the serialization code. It still does not support all of rocketpy's features and requires a lot of maintenance and updates on our end.

Do you see any other option for performing the serialization of inputs?

@Gui-FernandesBR
Member

Amazing feature, as the results show the MonteCarlo class has great potential for parallelization.

The only blocking issue I see with this PR is the serialization code. It still does not support all of rocketpy's features and requires a lot of maintenance and updates on our end.

Do you see any other option for performing the serialization of inputs?

@phmbressan we should make all the classes json serializable, it's an open issue at #522 . In the meantime, maybe we could still use the _encoders module to serialize inputs.

I agree with you that implementing flight class serialization within this PR may create maintenance issues for us. The simplest solution would be to delete the flightv1_serializer (and similar) functions.

Member

@Gui-FernandesBR Gui-FernandesBR left a comment

@phmbressan really good modifications to this PR. Great work.

Before merging, please run 1000 simulations so the example is better illustrated in the documentation.

@phmbressan
Collaborator

phmbressan commented Aug 23, 2024

I have pushed a fix for the file-writing issue on Windows (more precisely, when processes use the spawn start method). I tested it on a Windows machine and it ran correctly, but I invite reviewers to also test it on different OS configurations.

Issues solved by this PR:

  • MonteCarlo simulations have a parallel mode;
  • Both the simulation execution and data saving are executed in parallel (producer - consumer);
  • There are performance gains on large simulations;
  • The serial simulations can be executed in the same fashion and the outputs of both modes are compatible.
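
The producer-consumer split mentioned above can be sketched as follows. This is an illustration only, not the PR's code: it uses threads and `queue.Queue` for brevity where the PR uses processes, and the names `producer`, `consumer`, `run` and the fake "apogee" result are all hypothetical.

```python
import queue
import threading


def producer(indices, results):
    """Run a (placeholder) simulation per index and queue its result."""
    for idx in indices:
        results.put({"index": idx, "apogee": 1000.0 + idx})


def consumer(results, n_expected, sink):
    """Drain the queue and 'save' each result (here: append to a list)."""
    for _ in range(n_expected):
        sink.append(results.get())


def run(n_sims=8, n_producers=2):
    results = queue.Queue()
    sink = []
    # Split the simulation indices between the producers.
    chunks = [range(i, n_sims, n_producers) for i in range(n_producers)]
    saver = threading.Thread(target=consumer, args=(results, n_sims, sink))
    workers = [
        threading.Thread(target=producer, args=(chunk, results))
        for chunk in chunks
    ]
    saver.start()
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    saver.join()
    # Results arrive in arbitrary order, so sort by index before returning.
    return sorted(sink, key=lambda record: record["index"])


records = run()
```

Because a single consumer owns the output, producers never write to the file concurrently; the price is that records arrive out of order and need an index to be matched up later.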

Points of Improvement:

  • Soft interrupts of parallel simulations (e.g. an exception or Ctrl-C) are only effective on Linux. Spawned processes (Windows) are currently hard-stopped.
  • On Windows, the Jupyter notebook will not show the status update prints (running the simulations in a terminal is fine). This seems to be an OS-level standard output difference that is not easily solved.

Some of these points could become issues of the repository. Stating them here for proper PR documentation.

Future Considerations:

  • From Python 3.14 onward, fork is no longer the default start method on Linux (the default becomes forkserver; Windows and macOS already default to spawn). We could keep fork as the RocketPy start method on Linux if the change hurts performance too much;
  • The Python GIL should become removable some years from now (PEP 703). This could bring performance benefits, since threads are generally faster to start than processes.
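
Pinning the start method explicitly, rather than relying on the platform default, could look like the sketch below. `pick_context` is a hypothetical helper, not part of RocketPy; it only uses the standard `multiprocessing.get_all_start_methods` and `multiprocessing.get_context` calls.

```python
import multiprocessing as mp


def pick_context(preferred="fork"):
    """Return a multiprocessing context, using `preferred` when available.

    Falls back to the platform default, which is the first entry returned
    by get_all_start_methods().
    """
    available = mp.get_all_start_methods()
    method = preferred if preferred in available else available[0]
    return mp.get_context(method)


ctx = pick_context("fork")
# Processes and queues created from ctx (ctx.Process, ctx.Queue) use the
# chosen start method without changing the global default.
```

Using a context object keeps the choice local to the library, so user code that calls `multiprocessing.set_start_method` elsewhere is not affected.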

@Gui-FernandesBR
Member

@phmbressan I like the way this PR was refactored. Many thanks for your effort.

Please fix the pylint errors and solve all the open conversations in this PR so we can approve and merge it onto develop!

Optionally, try to rebase the PR to get the latest commits from develop.

Comment on lines 292 to 296
if n_workers is None or n_workers > os.cpu_count():
    n_workers = os.cpu_count()

if n_workers < 2:
    raise ValueError("Number of workers must be at least 2 for parallel mode.")
Member

We should print the number of workers being used with _SimMonitor.reprint here.
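
A self-contained sketch of the worker-count check, with the reporting suggested here, might look like this. `resolve_n_workers` and its `available` parameter are hypothetical names; a plain `print` stands in for `_SimMonitor.reprint`, whose signature is not shown in this thread. The sketch also guards against `os.cpu_count()` returning `None`, which the standard library allows.

```python
import os


def resolve_n_workers(n_workers=None, available=None):
    """Clamp the requested worker count to the available CPUs."""
    if available is None:
        # os.cpu_count() may return None when the count is undetermined.
        available = os.cpu_count() or 1
    if n_workers is None or n_workers > available:
        n_workers = available
    if n_workers < 2:
        raise ValueError("Number of workers must be at least 2 for parallel mode.")
    print(f"Running Monte Carlo simulation with {n_workers} workers.")
    return n_workers


n = resolve_n_workers(16, available=8)
```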


sim_consumer.start()

for seed in seeds:
Member

Very minor, but I think the consumer should start after the producers

Comment on lines 332 to 337
    )
    processes.append(sim_producer)

for sim_producer in processes:
    sim_producer.start()

Member

There is an extra for loop here

Suggested change
-     )
-     processes.append(sim_producer)
- for sim_producer in processes:
-     sim_producer.start()
+     )
+     processes.append(sim_producer)
+     sim_producer.start()

Comment on lines 385 to 391
while sim_monitor.keep_simulating():
    sim_idx = sim_monitor.increment() - 1

    self.environment._set_stochastic(seed)
    self.rocket._set_stochastic(seed)
    self.flight._set_stochastic(seed)

Member

Every single iteration needs to be re-seeded?

If this was done before the while loop, wouldn't it be enough?
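
One way to reason about this question is sketched below with NumPy only. It is not how `_set_stochastic` works; it assumes each worker owns an independent generator derived once from a root seed, in which case nothing needs to be re-seeded inside the loop. `make_worker_rngs` is a hypothetical helper.

```python
import numpy as np


def make_worker_rngs(seed, n_workers):
    """Derive one independent generator per worker from a single root seed."""
    children = np.random.SeedSequence(seed).spawn(n_workers)
    return [np.random.default_rng(child) for child in children]


# Seed once, before any simulation loop starts.
rngs = make_worker_rngs(seed=42, n_workers=3)

# Each worker then keeps drawing from its own stream on every iteration.
draws = [rng.normal(size=5) for rng in rngs]
```

The streams differ between workers yet are reproducible from the root seed, so repeated runs with the same seed and worker count give the same samples.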

@@ -253,114 +491,52 @@ def __run_single_simulation(self, input_file, output_file):
]
for item in d.items()
)
inputs_dict["idx"] = sim_idx
Member

Suggested change
- inputs_dict["idx"] = sim_idx
+ inputs_dict["index"] = sim_idx

For clarity on the files

Comment on lines 496 to 499
outputs_dict = {
    export_item: getattr(monte_carlo_flight, export_item)
    for export_item in self.export_list
}
Member

Suggested change
- outputs_dict = {
-     export_item: getattr(monte_carlo_flight, export_item)
-     for export_item in self.export_list
- }
+ outputs_dict = {
+     export_item: getattr(monte_carlo_flight, export_item)
+     for export_item in self.export_list
+ }
+ outputs_dict["index"] = sim_idx

Really useful to have index on both input and output
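
As an illustration of why a shared index helps: in parallel mode records can be written out of order, and the index is what lets inputs and outputs be paired again. The records below are made-up placeholders, not the real file contents.

```python
# Hypothetical records, as they might be read back from the input and
# output files; the outputs arrived out of order.
inputs = [
    {"index": 0, "mass": 15.1},
    {"index": 1, "mass": 14.8},
    {"index": 2, "mass": 15.4},
]
outputs = [
    {"index": 2, "apogee": 3050.0},
    {"index": 0, "apogee": 3010.0},
    {"index": 1, "apogee": 2990.0},
]

# Look outputs up by index, then merge each input with its matching output.
outputs_by_index = {record["index"]: record for record in outputs}
paired = [{**inp, **outputs_by_index[inp["index"]]} for inp in inputs]
```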

@Gui-FernandesBR
Member

Converted to draft until you solve the remaining issues, especially the random number generation problem, @phmbressan.

@Gui-FernandesBR Gui-FernandesBR linked an issue Dec 8, 2024 that may be closed by this pull request
17 tasks
@phmbressan phmbressan force-pushed the enh/parallel_montecarlo branch from 53ba8ed to 00d9d02 Compare December 16, 2024 21:17
@Lucas-Prates
Contributor

I believe this PR is ready again for another round of review. These are the changes since the previous review:

  1. @phmbressan has done some great work simplifying and optimizing even further the parallel structure, and a sim_consumer process is no longer needed;
  2. @phmbressan and I fixed the random number generator bug. The solution consisted of resetting all stochastic structures inside the StochasticRocket, along with their positions. The simplest solution we found, without changing things that go directly to either Rocket or Flight, is implemented in the methods _set_stochastic and __reset_components of StochasticRocket, so please take a closer look at both;
  3. a very minor fix in some of the methods of Components; just make sure that they make sense.

Overall, it seems that the time per iteration is even faster now, at least by my local measurements. @phmbressan might want to complement the information provided here, he knows this PR much better than I do!

Please, make sure to take a careful look at the Monte Carlo .input file to check that there is indeed no dependency on the generated random variables.

@Lucas-Prates Lucas-Prates marked this pull request as ready for review December 18, 2024 13:17
@Lucas-Prates
Contributor

Lucas-Prates commented Dec 18, 2024

Another important issue: I currently cannot interrupt the MonteCarlo.simulate method smoothly when it is run in parallel; all attempts lead to killing the notebook 😨! It would be great to check whether the same happens on your own machines.

Member

This is a basic class, can we add unit tests to cover the modified lines?

Member

looks odd to have this file in the git tree... I think we should rebase the branch to develop

Labels
Enhancement: New feature or request, including adjustments in current codes
Monte Carlo: Monte Carlo and related contents
Projects
Status: Next Version
Development

Successfully merging this pull request may close these issues.

ENH: Monte Carlo Analysis Enhancements
5 participants