About this package

This package implements recurrent behavior cloning (BC) and is compatible with the Imitation library.

An expert dataset collected from a human can be used, whether or not it includes a recurrent state (such as lstm_state or gru_state).
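One plausible way to handle both kinds of dataset is to fall back to a zero hidden state when a sample carries no recorded recurrent state. The sketch below only illustrates that idea. It is not taken from this package: the function name `resolve_hidden_state`, the dict-based sample layout, and the `hidden_size` argument are all hypothetical.

```python
from typing import Dict, List, Sequence


def resolve_hidden_state(
    sample: Dict[str, Sequence[float]],
    hidden_size: int,
    key: str = "gru_state",
) -> List[float]:
    """Return the recorded recurrent state, or zeros if the sample has none.

    Hypothetical helper: the sample layout and the key name are assumptions.
    """
    state = sample.get(key)
    if state is None:
        # Human demonstrations usually carry no recurrent state,
        # so start from an all-zero hidden state.
        return [0.0] * hidden_size
    if len(state) != hidden_size:
        raise ValueError(
            f"expected {hidden_size} hidden units, got {len(state)}"
        )
    return list(state)


# A human-collected sample without recurrent state.
human_sample = {"obs": [0.1, 0.2], "act": [0.0]}
# A sample recorded from a recurrent agent, with its GRU state.
agent_sample = {"obs": [0.1, 0.2], "act": [0.0], "gru_state": [0.3, -0.1]}

print(resolve_hidden_state(human_sample, hidden_size=2))
print(resolve_hidden_state(agent_sample, hidden_size=2))
```

With a fallback like this, the same training loop could consume human and agent data without a separate code path.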

Training

python3 train_gru_bc.py

BC loss (ent_weight = 1e-3, l2_weight = 0.0)
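To show how the two weights enter the loss, here is a minimal pure-Python sketch. It assumes the loss has the form used by the Imitation library's behavior cloning: negative log-likelihood, minus ent_weight times the policy entropy, plus l2_weight times half the sum of squared parameters. For simplicity it uses a discrete action distribution. BipedalWalker has continuous actions, where the log-probability and entropy would come from the policy's action distribution instead. The function name `bc_loss` and its arguments are illustrative, not this package's API.

```python
import math
from typing import Sequence


def bc_loss(
    action_probs: Sequence[Sequence[float]],
    expert_actions: Sequence[int],
    params: Sequence[float],
    ent_weight: float = 1e-3,
    l2_weight: float = 0.0,
) -> float:
    """Illustrative BC loss for a batch of discrete action distributions."""
    n = len(expert_actions)
    # Mean negative log-likelihood of the expert's actions under the policy.
    neglogp = -sum(
        math.log(probs[a]) for probs, a in zip(action_probs, expert_actions)
    ) / n
    # Mean entropy of the policy's action distributions.
    entropy = sum(
        -sum(q * math.log(q) for q in probs if q > 0.0)
        for probs in action_probs
    ) / n
    # Half the sum of squared parameters (L2 regularizer).
    l2_norm = sum(w * w for w in params) / 2.0
    return neglogp - ent_weight * entropy + l2_weight * l2_norm


# One sample, uniform policy over two actions, expert chose action 0.
print(bc_loss([[0.5, 0.5]], [0], params=[1.0, 2.0]))
```

With l2_weight = 0.0 the regularizer drops out, so only the likelihood term and a small entropy bonus remain.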

PyTorch == 1.12.1

Stable-baselines3 == 2.0.0

Sb3-contrib == 2.0.0

Imitation == 1.0.0

RecurrentRLHF (Preference-based RL with a recurrent reward model)

GRU_AC (Actor-Critic or Proximal Policy Optimization with GRU)

BipedalWalker policy hyperparameters [git repo]

GRU BC reference [git repo] [paper]