`AutoResetWrapper` does not, in fact, call `env.reset()` #278

Theo-Cheynel · 2022-12-19T13:15:36Z

Hi,

The AutoResetWrapper (`class AutoResetWrapper(brax_env.Wrapper):`, line 123 at commit 5ef4f65) is supposedly a way to have the environment's reset function called when done is set to 1. However, it does not call the environment's reset method: instead, it reuses the qp that was stored in state.info['first_qp'] by the last external call to reset.

This assumes that the call to env.reset will always return the same value. However, in the example environments provided, there is some randomness introduced during the reset, which is not captured by the current AutoResetWrapper.
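To make the behaviour described above concrete, here is a minimal sketch in plain JAX. It is not brax's code: `toy_reset` and `auto_reset` are hypothetical stand-ins, and the state is reduced to a single position vector instead of a full qp.

```python
import jax
import jax.numpy as jnp

def toy_reset(rng):
    # Hypothetical stand-in for env.reset: a randomized initial position.
    return jax.random.uniform(rng, (3,), minval=-0.1, maxval=0.1)

def auto_reset(pos, first_pos, done):
    # Restore the cached initial state when done is set; toy_reset is never called.
    return jnp.where(done, first_pos, pos)

# One external reset; its result is cached (the role played by first_qp).
first_pos = toy_reset(jax.random.PRNGKey(0))

# Two separate episode ends both restore exactly the same cached state.
pos_a = auto_reset(jnp.ones(3), first_pos, jnp.array(True))
pos_b = auto_reset(jnp.full(3, 2.0), first_pos, jnp.array(True))
```

Both `pos_a` and `pos_b` equal `first_pos`, so the randomness in `toy_reset` is only sampled once per external reset.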

I think the expected behaviour is for env.reset to be called when needed, although I guess this might make things really slow. Could anything be done to fix this?

Thanks

btaba (Maintainer) · 2022-12-19T22:10:53Z

Hi @Theo-Cheynel! The intention of AutoResetWrapper is indeed to use first_qp rather than call env.reset, for the performance reasons you mention (the whole thing needs to be jitted). This works in practice with a large batch of environments.
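One way to read the batch-size remark, sketched under the assumption that each environment in the batch caches its own initial state: the diversity then comes from the batch rather than from repeated resets. The names `toy_reset` and `auto_reset` and the batch size are illustrative, not brax's API.

```python
import jax
import jax.numpy as jnp

def toy_reset(rng):
    # Hypothetical stand-in for env.reset: a randomized initial position.
    return jax.random.uniform(rng, (3,), minval=-0.1, maxval=0.1)

batch_size = 1024
keys = jax.random.split(jax.random.PRNGKey(0), batch_size)
# One cached start state per environment, shape (1024, 3).
first_pos = jax.vmap(toy_reset)(keys)

@jax.jit
def auto_reset(pos, first_pos, done):
    # Broadcast the per-environment done flag over the state dimension.
    return jnp.where(done[:, None], first_pos, pos)

pos = jnp.ones((batch_size, 3))
done = jnp.arange(batch_size) % 2 == 0  # pretend every other env finished
new_pos = auto_reset(pos, first_pos, done)
```

Each finished environment returns to its own start state, so a large batch still covers many different initial conditions even though no reset is re-sampled.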

Theo-Cheynel (Author) · 2022-12-20T09:27:33Z

Thanks, that makes sense! I think this has an impact on the training of my models though, since I have to use a moderate batch size in order to stay within my GPU's VRAM. Do you know what could be done to provide more randomness without impacting training performance too much?

2 replies

btaba Dec 20, 2022
Maintainer

Hi @Theo-Cheynel! You could try increasing the noise scale within your fixed batch size, or try running on multiple devices (the trainers run a pmap). Another idea is to call a real env.reset on the host periodically. Lastly, if the noise is simple enough to implement, you could also try modifying first_qp on the fly and see how that impacts performance. Let us know what you wind up doing!
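The last suggestion, modifying the cached start state on the fly, could look roughly like the sketch below. It assumes simple additive Gaussian noise and a PRNG key that is threaded through the step function; `noisy_auto_reset` and `noise_scale` are hypothetical names, and a real qp would need noise that keeps the physics state valid.

```python
import jax
import jax.numpy as jnp

def noisy_auto_reset(pos, first_pos, done, rng, noise_scale=0.05):
    # Hypothetical variant: perturb the cached start state before restoring it.
    noise = noise_scale * jax.random.normal(rng, first_pos.shape)
    return jnp.where(done[:, None], first_pos + noise, pos)

rng = jax.random.PRNGKey(0)
first_pos = jnp.zeros((4, 3))
pos = jnp.ones((4, 3))
done = jnp.array([True, False, True, False])

# A fresh key per call gives a different start state on every auto-reset.
rng, key_a, key_b = jax.random.split(rng, 3)
reset_a = jax.jit(noisy_auto_reset)(pos, first_pos, done, key_a)
reset_b = jax.jit(noisy_auto_reset)(pos, first_pos, done, key_b)
```

Because the noise is generated inside the jitted function, this stays on device; the cost is one extra random draw per step.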

Theo-Cheynel Dec 20, 2022
Author

Thanks! For now I figured I'd simply call a true reset() every once in a while. In the near future I'll see if I can integrate the noise inside where_done, as it sounds like a better way to get a different starting qp every time.
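The "true reset every once in a while" approach might be sketched as follows. Everything here is an assumption for illustration: `toy_reset` stands in for env.reset, `refresh_every` is an arbitrary period, and the training step itself is elided.

```python
import jax
import jax.numpy as jnp

def toy_reset(rng):
    # Hypothetical stand-in for env.reset: a randomized initial position.
    return jax.random.uniform(rng, (3,), minval=-0.1, maxval=0.1)

batched_reset = jax.jit(jax.vmap(toy_reset))

def refresh_first_pos(rng, batch_size):
    # Draw a fresh batch of start states with a real (toy) reset.
    keys = jax.random.split(rng, batch_size)
    return batched_reset(keys)

rng = jax.random.PRNGKey(0)
refresh_every = 10  # hypothetical period, in training iterations
history = []
first_pos = None
for it in range(30):
    if it % refresh_every == 0:
        # Host-side reset: re-sample the cached start states.
        rng, sub = jax.random.split(rng)
        first_pos = refresh_first_pos(sub, batch_size=8)
        history.append(first_pos)
    # ... run the jitted rollout / training step using first_pos here ...
```

The reset cost is paid only once every `refresh_every` iterations, while the auto-reset in between keeps reusing the most recent cached states.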
