[Question] Does the environment restart from scratch when reloading model ? #1643
Comments
Hello Flo =),
I think your problem lies in the env: the CartPole env doesn't have a fixed number of timesteps per episode, so you would need to seed it halfway through to get the same result. Here is an example with Pendulum (where the number of timesteps per episode is fixed, so we can stop the data collection exactly after an episode and not in the middle of one). This is actually a duplicate of #435 (comment), I think (and the following might work with any env, not only ones with a fixed number of timesteps per episode).

```python
import os

import gymnasium as gym

from stable_baselines3 import PPO

# Each episode is always 200 steps with Pendulum
# we choose total_timesteps = 200 * k
# and n_steps = 200 * k
env_id = "Pendulum-v1"
seed = 42
n_timesteps_per_episode = 200
total_timesteps = 12 * n_timesteps_per_episode

kwargs = dict(
    n_steps=200,
    batch_size=100,
    n_epochs=1,
    policy_kwargs=dict(
        net_arch=[64],
    ),
)

tensorboard_log = "./results/"
os.makedirs(tensorboard_log, exist_ok=True)


def create_env():
    return gym.make(env_id, render_mode="rgb_array")


obs = create_env().reset(seed=0)[0]


def full_train():
    env = create_env()
    model = PPO(
        "MlpPolicy",
        env,
        tensorboard_log=tensorboard_log,
        seed=seed,
        **kwargs,
    )
    model.learn(total_timesteps, progress_bar=True)
    # Seed the env halfway to have the same behavior as when loading
    # a checkpoint
    model.set_env(create_env())
    model.set_random_seed(seed)
    model.learn(total_timesteps, progress_bar=True, reset_num_timesteps=False)
    print(model.predict(obs))
    print(model.predict(obs, deterministic=True))


def train_first_part():
    env = create_env()
    model = PPO(
        "MlpPolicy",
        env,
        tensorboard_log=tensorboard_log,
        seed=seed,
        **kwargs,
    )
    model.learn(total_timesteps, progress_bar=True)
    model.save("./results/checkpoint")


def train_second_part():
    env = create_env()
    model = PPO.load("./results/checkpoint")
    model.set_env(env)
    model.set_random_seed(seed)
    model.learn(total_timesteps, progress_bar=True, reset_num_timesteps=False)
    print(model.predict(obs))
    print(model.predict(obs, deterministic=True))


if __name__ == "__main__":
    full_train()
    train_first_part()
    train_second_part()
```
EDIT: you can take a look at #597 on why this is needed.
PS: you can use ```python fences to have code highlighting in markdown ;)
Hi Antonin :D Thanks for the reply and sorry for the delay! In the example you provided, `full_train` is already divided into two parts. So what I understand is that a single training run of 2*n timesteps (with no reseeding in the middle) cannot be exactly equivalent to two successive n-timestep trainings (in the case where the environment is recreated between the two).
PS: not sure I get the importance of `set_env`. I thought it was a function that checks the environment and potentially vectorizes it.
yes, but if you do quantitative experiments, this should not be an issue.
oh, my answer to that one apparently got lost...
and see `stable_baselines3/common/base_class.py`, lines 505 to 508 at commit `1cd6ae4`.
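For readers without the inline snippet: below is a simplified sketch of what `set_env` roughly does when you hand a freshly created env to a loaded model, reconstructed from memory rather than copied from the lines referenced above; `set_env_sketch` is just an illustrative name, not an SB3 function.

```python
from stable_baselines3.common.utils import check_for_correct_spaces


def set_env_sketch(model, env):
    # Simplified illustration only, not the actual SB3 source:
    # make sure the new env matches the spaces the loaded policy expects
    check_for_correct_spaces(env, model.observation_space, model.action_space)
    # wrap into a VecEnv (and Monitor) if it is not one already
    env = model._wrap_env(env, model.verbose)
    # dropping the cached observation forces env.reset() at the start
    # of the next learn() call
    model._last_obs = None
    model.n_envs = env.num_envs
    model.env = env
```

In other words, calling `set_env` is not only a spaces check: it also makes sure the next `learn()` starts from a freshly reset (and, if reseeded, reproducible) environment.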
❓ Question
Hello,
I have a question about saving and loading a model, and more specifically about the environment.
When reloading a saved model from a checkpoint and continuing training, the environment restarts from the beginning. If a random seed is given to the model during training, this seed is saved and used again when training resumes. As a result, the environment restarts from this random seed, so the same episodes are seen again when training is restarted.
I also noticed that the random seed of the algorithm is reset when reloading a model. In other words, when loading a model, the randomness is reset to the initial random seed.
So, when doing the whole training at once, I don't get the same results as when I stop and restart it (I'm using PPO, so there is no replay buffer), as in issue #326.
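To illustrate the point about the seed being saved and restored, here is a minimal check (a sketch only, assuming a checkpoint was previously saved at the example path used below):

```python
from stable_baselines3 import PPO

# The seed given at construction time (PPO(..., seed=...)) is stored with the model,
# so load() brings it back and re-seeds the RNGs from that same value.
model = PPO.load("./results/checkpoint")  # example path, adjust to your checkpoint
print(model.seed)  # prints the seed the model was originally created with

# Re-applying it explicitly makes the resumed run start from the same RNG state:
model.set_random_seed(model.seed)
```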
Am I missing something? Is there a way to restart the environment from where it was when the checkpoint was saved?
I provide a minimal code example below, where CartPole is first trained for 50k steps in one go, and then the training is split into two runs of 25k steps each. The training is identical during the first 25k steps and then diverges (see the image below; I display the policy gradient loss because the divergence is very clear there, but the reward curve also diverges from 25k steps onwards).
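For reference, a rough sketch of the kind of script described above (not the original code, which is not included here; the env id, seed, paths and timestep counts are placeholders):

```python
import os

import gymnasium as gym

from stable_baselines3 import PPO

os.makedirs("./results", exist_ok=True)


def make_model():
    # same seed and env for both runs so they can be compared
    return PPO("MlpPolicy", gym.make("CartPole-v1"), seed=0)


# Run 1: 50k steps in one go
model = make_model()
model.learn(50_000)

# Run 2: 25k steps, save a checkpoint, reload it, then train 25k more
model = make_model()
model.learn(25_000)
model.save("./results/cartpole_25k")

model = PPO.load("./results/cartpole_25k")
model.set_env(gym.make("CartPole-v1"))
model.learn(25_000, reset_num_timesteps=False)
```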