Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetching suite experiements for proper behavior of IPlatform#wait_till_done_progress (Fix: 1653) #1654

Closed
wants to merge 1 commit into from

Conversation

ckirkman-IDM
Copy link
Contributor

@ckirkman-IDM ckirkman-IDM commented Oct 12, 2021

See #1653 for details. This has worked for me without error (proper waiting).

This fix in a new deploy will be necessary for non-development usage of emodpy-hiv scenario running (post calibration) and for proper demo'ing of the code.

@ZDu-IDM
Copy link
Collaborator

ZDu-IDM commented Oct 12, 2021

Same question as yesterday, do you have a simple code to repro this issue? Basically, I would like to learn what happens before this line:

 platform.wait_till_done_progress(item=suite)

Q: Where and how this suite gets created and how experiments are created and associated with this suite? A simple code will answer all my questions, thanks!.

@ckirkman-IDM
Copy link
Contributor Author

Same question as yesterday, do you have a simple code to repro this issue? Basically, I would like to learn what happens before this line:

 platform.wait_till_done_progress(item=suite)

Q: Where and how this suite gets created and how experiments are created and associated with this suite? A simple code will answer all my questions, thanks!.

Unfortunately I have no simple repro to provide. My code generating the error has dev dependencies and is ... a bit annoying to set up. BUT:

Any examples in idmtools-land that does the following should trigger the issue (non-waiting):

  • create a Suite()
  • create some experiments of sims in this suite
  • wait on the suite, not the experiments:
  • platform.wait_till_done_progress(item=suite)

An approximation of my code that does this:

platform = Platform('Calculon')
suite = Suite(name=suite_name)
platform.create_items(suite)
experiments = []
for each experiment_to_make:
    experiment = Experiment(name, aSimulationBuilder, parent_id=suite.id)
    platform.create_items([experiment])
    experiments.append(experiment)
platform.run_items(experiments)
platform.wait_till_done(item=suite)

@ZDu-IDM
Copy link
Collaborator

ZDu-IDM commented Oct 15, 2021

@ckirkman-IDM Thanks for the workflow logical details and this is exactly what I would like to see.

In order to call and make it to work

  platform.wait_till_done(item=suite)

user needs to make sure suite.experiments contains all the experiments they add.

The code you showed:

  experiment = Experiment(name, aSimulationBuilder, parent_id=suite.id)

doesn't add the experiment to the suite!

You may need to add one more line of code below:

  experiment = Experiment(name, aSimulationBuilder)
  experiment.suite = suite

the last line above will add the experiment to suite automatically! It should work as expected.

Please give it a try and see how it works.

@ckirkman-IDM
Copy link
Contributor Author

ckirkman-IDM commented Oct 18, 2021

Hey Zhaowei, I tried removing my one line fix and adding the experiment.suite = suite line, but I get an infinite recursion error if I do so, only my one-line fix seems to work:

  File "C:\Users\ckirkman\environments\emodpy-hiv-cleaninstall\lib\site-packages\idmtools\utils\hashing.py", line 119, in save
    Pickler.save(self, obj)
  File "c:\python37\lib\pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "c:\python37\lib\pickle.py", line 774, in save_tuple
    save(element)
  File "C:\Users\ckirkman\environments\emodpy-hiv-cleaninstall\lib\site-packages\idmtools\utils\hashing.py", line 98, in save
    elif isinstance(obj, ExperimentParentIterator):
  File "c:\python37\lib\abc.py", line 139, in __instancecheck__
    return _abc_instancecheck(cls, instance)
RecursionError: maximum recursion depth exceeded while calling a Python object

I am guessing the issue is when it tries to save the experiments, which have a suite, which has experiments, which has a suite, ... (not knowing when to stop adding parents/children or something)

@ZDu-IDM
Copy link
Collaborator

ZDu-IDM commented Oct 19, 2021

Hi @ckirkman-IDM Thanks for testing it even though it is failed at moment.

We need to debug to figure out where/how it happened (anything wrong with idmtools or user code) when we have time later...

For now, we can add experiment manually like following,

experiment = Experiment(name, SimulationBuilder, parent_id=suite.id)
suite.experiments.append(experiment)

It should work, please give it a try, thanks!

@shchen-idmod
Copy link
Collaborator

Can you change to merge to dev not master?

@shchen-idmod
Copy link
Collaborator

shchen-idmod commented Oct 19, 2021

I do not think you should add this line to there. it is not going to help for suite waiting on experiments to finish which already did in next line of code.

I tested my code like this:

suite = Suite(name='Idm Suite')
platform.create_items(suite)
# add experiment to suite
suite.add_experiment(exp)
suite.add_experiment(exp2)
# you can also add the experiment to a suite like this:
# exp.suite = suite
# exp2.suite = suite

# run suite
platform.run_items(suite)  <-- this will call platform.run_items(experiments) for each experiment
platform.wait_till_done(item=suite)

If I understand correctly, one should never be able to add an experiment to a suite in a valid way if the experiment is complete (there should be an error somewhere during commissioning).

@ckirkman-IDM
Copy link
Contributor Author

Hi @ckirkman-IDM Thanks for testing it even though it is failed at moment.

We need to debug to figure out where/how it happened (anything wrong with idmtools or user code) when we have time later...

For now, we can add experiment manually like following,

experiment = Experiment(name, SimulationBuilder, parent_id=suite.id)
suite.experiments.append(experiment)

It should work, please give it a try, thanks!

This works without my proposed platform change. However, I do think iplatform should understand and handle cases where the full objects of experiments (children) might not have been updated from current state on the actual platform. If not, then the code does not behave as expected and has a too-loosely defined user API for how to build and wait on things properly.

@ckirkman-IDM
Copy link
Contributor Author

Can you change to merge to dev not master?

I'll see if the comments can be moved to the attached issue, and once done, I will close the PR. When something emerges from the issue, we can re-PR to dev.

@ckirkman-IDM
Copy link
Contributor Author

I do not think you should add this line to there. it is not going to help for suite waiting on experiments to finish which already did in next line of code.

I tested my code like this:

suite = Suite(name='Idm Suite')
platform.create_items(suite)
# add experiment to suite
suite.add_experiment(exp)
suite.add_experiment(exp2)
# you can also add the experiment to a suite like this:
# exp.suite = suite
# exp2.suite = suite

# run suite
platform.run_items(suite)  <-- this will call platform.run_items(experiments) for each experiment
platform.wait_till_done(item=suite)

If I understand correctly, one should never be able to add an experiment to a suite in a valid way if the experiment is complete (there should be an error somewhere during commissioning).

Wow, how I ended up editing your comment instead of reply-to... interesting, but restored. Now for my comment:

I don't see how this should be a limitation unless experiments belonging to suites are inherently run-based (maybe a user wants to reorganize experiments post-run, but the runs are still good). Even then, shouldn't the code just commission 0 sims if you add a completed exp to a suite? Or is this a comps limitation percolating back to idmtools?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants