Inconsistent actions between train and inference on Mario #36

takuma-yoneda · 2019-03-15T01:39:39Z

I trained a policy for the Mario environment with
python train.py --default --env-id mario --noReward
and observed a fairly high external reward during training:

[2019-03-15 01:24:17,798] True Game terminating: env_episode_reward=0.648666666667 episode_length=669
Episode finished. Sum of shaped rewards: 0.00. Length: 669. Bonus: 4.1677.

However, when I run the policy through inference.py with the following command:
python inference.py --env-id SuperMarioBros-1-1-v0 --default --log-dir ../mario/train
the agent keeps trying to go left, which makes me think that the action spaces used for training and for inference are inconsistent (somehow swapped).
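
Note that the two commands use different --env-id values (mario for training, SuperMarioBros-1-1-v0 for inference), so the action ordering may differ between them. One way to check the "swapped actions" hypothesis is to compare the ordered action names of both environments and build an index remap. The sketch below is only an illustration: the two action lists and the helper names are hypothetical placeholders, not taken from this repository, and the real orderings would have to be read from each environment.

```python
# Hypothetical action orderings, for illustration only. The real lists must be
# obtained from the training and inference environments themselves.
TRAIN_ACTIONS = ["noop", "left", "right", "jump"]
INFER_ACTIONS = ["noop", "right", "left", "jump"]


def build_remap(train_actions, infer_actions):
    """Return a list where entry i is the inference index of training action i."""
    infer_index = {name: i for i, name in enumerate(infer_actions)}
    missing = [name for name in train_actions if name not in infer_index]
    if missing:
        raise ValueError(f"actions missing at inference: {missing}")
    return [infer_index[name] for name in train_actions]


def mismatched(train_actions, infer_actions):
    """List (train_index, name, infer_index) for every action whose index differs."""
    remap = build_remap(train_actions, infer_actions)
    return [
        (i, name, remap[i])
        for i, name in enumerate(train_actions)
        if remap[i] != i
    ]


if __name__ == "__main__":
    for train_idx, name, infer_idx in mismatched(TRAIN_ACTIONS, INFER_ACTIONS):
        print(f"{name}: train index {train_idx}, inference index {infer_idx}")
```

If the mismatch list is non-empty, passing each action chosen by the policy through the remap before stepping the inference environment would be one possible workaround, assuming both environments expose the same set of actions.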

Is there a way to fix it?
