[Question] How do I correctly manually reset the episode on a rollout_end? #1961

Open
4 tasks done
npit opened this issue Jul 4, 2024 · 2 comments
Labels
question Further information is requested

Comments

@npit
Contributor

npit commented Jul 4, 2024

❓ Question

Hello,

I am modifying an environment at selected training milestones, at the end of rollouts.
After these modifications, I want any episode that was cut short when the rollout ended to be discarded, and the next rollout to begin with a fresh episode on a reset environment.
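The post does not say how the milestones are detected. One possible approach, sketched here under assumptions, is to compare the timestep count at the previous rollout end with the current one. The counter typically advances in chunks of n_steps * n_envs, so it rarely lands exactly on a milestone value; checking whether a milestone was crossed avoids missing it. The function name crossed_milestone and the milestone values are hypothetical. Inside a callback, the previous and current counts could presumably come from the callback's num_timesteps attribute, stored at each rollout end.

```python
import bisect
from typing import Sequence


def crossed_milestone(prev_timesteps: int, curr_timesteps: int,
                      milestones: Sequence[int]) -> bool:
    """Return True if some milestone m satisfies prev_timesteps < m <= curr_timesteps.

    `milestones` must be sorted in ascending order (hypothetical helper).
    """
    # index of the first milestone strictly greater than prev_timesteps
    i = bisect.bisect_right(milestones, prev_timesteps)
    return i < len(milestones) and milestones[i] <= curr_timesteps


milestones = [10_000, 50_000]  # hypothetical milestone timesteps
print(crossed_milestone(8192, 10240, milestones))   # True
print(crossed_milestone(10240, 12288, milestones))  # False
```

Using a strict lower bound means a milestone that was exactly reached at the previous rollout end is not triggered a second time.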

I'm assuming I only need to overwrite the model's self._last_xxx variables, since the next rollout collection starts from those cached values.
Is that correct? Will the following callback method suffice?

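    import numpy as np
    from stable_baselines3.common.callbacks import BaseCallback

    class ResetOnRolloutEnd(BaseCallback):  # illustrative name
        def _on_step(self) -> bool:
            return True  # required abstract method; keep training
        def _on_rollout_end(self) -> None:
            self.model._last_obs = self.training_env.reset()
            # replicate initializations from BaseAlgorithm._setup_learn
            self.model._last_episode_starts = np.ones((self.training_env.num_envs,), dtype=bool)
            if self.model._vec_normalize_env is not None:
                self.model._last_original_obs = self.model._vec_normalize_env.get_original_obs()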

Do you have any recommendations?
Thanks!
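The question rests on the assumption that each rollout resumes from cached state (the observation and episode-start flag left over from the previous rollout) rather than from a fresh reset. The toy model below illustrates that reasoning. It is not Stable-Baselines3 code: ToyEnv and ToyCollector are hypothetical stand-ins with a single environment that auto-resets on termination, loosely like a VecEnv. It shows that a new rollout continues the truncated episode unless both the environment and the cached values are reset.

```python
from typing import List, Tuple


class ToyEnv:
    """Hypothetical single environment: each episode lasts `horizon` steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self) -> Tuple[int, bool]:
        self.t += 1
        done = self.t >= self.horizon
        if done:
            # auto-reset on termination, loosely like a vectorized env
            self.t = 0
        return self.t, done


class ToyCollector:
    """Hypothetical stand-in for the cached state of an on-policy rollout loop."""

    def __init__(self, env: ToyEnv) -> None:
        self.env = env
        self.last_obs = env.reset()
        self.last_episode_start = True

    def collect_rollout(self, n_steps: int) -> List[Tuple[int, bool]]:
        rollout = []
        for _ in range(n_steps):
            # every step starts from the cached observation, so a new rollout
            # continues wherever the previous one stopped
            rollout.append((self.last_obs, self.last_episode_start))
            self.last_obs, self.last_episode_start = self.env.step()
        return rollout

    def reset_on_rollout_end(self) -> None:
        # same idea as the callback: reset the env AND overwrite the cache
        self.last_obs = self.env.reset()
        self.last_episode_start = True


collector = ToyCollector(ToyEnv(horizon=5))
first = collector.collect_rollout(3)   # [(0, True), (1, False), (2, False)]
collector.reset_on_rollout_end()
second = collector.collect_rollout(2)  # [(0, True), (1, False)]
```

Without the call to reset_on_rollout_end, the second rollout would begin at observation 3 with the episode-start flag False, i.e. in the middle of the unfinished episode.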

Checklist

@npit npit added the question Further information is requested label Jul 4, 2024
@qgallouedec
Collaborator

In your environment, is the number of timesteps per episode fixed?

@npit
Contributor Author

npit commented Jul 8, 2024

In your environment, is the number of timesteps per episode fixed?

It is not. Episodes have a maximum number of iterations, but they can terminate earlier when a condition is met.
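The episode-length question presumably matters because, with fixed-length episodes whose length divides the rollout length, every rollout would end exactly at an episode boundary and no manual reset would be needed. With conditional termination, rollout ends generally fall mid-episode. The sketch below counts such cuts under stated assumptions: a single environment, episodes played back to back, no manual resets, and hypothetical episode lengths chosen for illustration.

```python
from itertools import accumulate
from typing import List, Sequence


def rollouts_ending_mid_episode(episode_lengths: Sequence[int], n_steps: int) -> List[int]:
    """Return the 1-based indices of rollouts whose last step falls inside an episode.

    Assumes one environment, episodes played back to back, and no manual resets.
    """
    episode_ends = set(accumulate(episode_lengths))  # cumulative step counts
    total_steps = sum(episode_lengths)
    return [
        k
        for k in range(1, total_steps // n_steps + 1)
        if k * n_steps not in episode_ends
    ]


# Fixed length 4 with n_steps=8: every rollout ends at an episode boundary.
print(rollouts_ending_mid_episode([4] * 6, n_steps=8))          # []
# Variable lengths (conditional termination): rollouts 1 and 2 cut an episode.
print(rollouts_ending_mid_episode([3, 7, 2, 9, 3], n_steps=8))  # [1, 2]
```

Note that even fixed-length episodes get cut when their length does not divide n_steps, e.g. length 3 with n_steps=8.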
