[ICML 2024] Reward Shaping for Reinforcement Learning with An Assistant Reward Agent

The code for the proposed Reinforcement Learning with an Assistant Reward Agent (ReLara) algorithm.

[Paper Link]

The framework of the ReLara algorithm is shown below:

The framework of the ReLara algorithm.

ReLara involves two agents: a policy agent (PA) and a reward agent (RA). The PA learns a policy that maximizes the augmented reward $r^E + \beta r^S$, where $r^E$ is the sparse reward from the environment, while the RA learns to generate the dense shaped reward $r^S$.
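
To make the augmented reward concrete, here is a minimal, self-contained sketch of how $r^E$ and $r^S$ are combined; the function and variable names are illustrative and are not the repository's actual API.

```python
# Minimal sketch of ReLara's reward augmentation (illustrative names only,
# not the repository's actual API).

def augmented_reward(r_env, r_proposed, beta=0.2):
    """Combine the sparse environment reward r^E with the dense reward r^S
    proposed by the reward agent: r = r^E + beta * r^S."""
    return r_env + beta * r_proposed

# Example: a sparse environment reward of 0.0 and a proposed reward of 0.5
# give the policy agent an augmented reward of 0.1 (with beta = 0.2).
print(augmented_reward(r_env=0.0, r_proposed=0.5))  # 0.1
```

In the full algorithm the reward agent supplies $r^S$ at every step, so the policy agent receives a dense learning signal even when the environment reward is sparse.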

Requirements

  • This code has been tested with:
    pytorch==2.0.1+cu117
  • Install all required packages:
    pip3 install -r requirements.txt

Run the ReLara Algorithm

Run the following command to train ReLara on the task specified by <Task ID>:

python run-ReLara.py --env-id <Task ID>

All sparse-reward environments evaluated in our paper are listed below, followed by an example command:


  • Mujoco-Sparse:
    • MyMujoco/Ant-Height-Sparse: the AntStand task.
    • MyMujoco/Ant-Speed-Sparse: the AntSpeed task.
    • MyMujoco/Ant-Far-Sparse: the AntFar task.
    • MyMujoco/Ant-Very-Far-Sparse: the AntVeryFar task.
    • MyMujoco/Walker2d-Keep-Sparse: the WalkerKeep task.
    • MyMujoco/Humanoid-Keep-Sparse: the HumanKeep task.
    • MyMujoco/HumanoidStandup-Sparse: the HumanStand task.
  • Robotics-Sparse:
    • MyFetchRobot/Reach-Jnt-Sparse-v0: the RobotReach task.
    • MyFetchRobot/Push-Jnt-Sparse-v0: the RobotPush task.
  • Classic control:
    • MountainCarContinuous-v0: the MountainCar task.
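
For example, to train ReLara on the MountainCar task:

python run-ReLara.py --env-id MountainCarContinuous-v0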

All hyper-parameters are set to default values in the code; you can change them by adding arguments on the command line. All available arguments are listed below, followed by an example invocation:

--exp-name: the name of the experiment, used to tag the TensorBoard logs and saved models.

--env-id: the task ID.
--seed: the random seed.
--cuda: the CUDA device index (default 0); the code falls back to CPU if CUDA is unavailable.
--gamma: the discount factor.

--total-timesteps: the total number of timesteps to train the agent.
--pa-learning-starts: the burn-in steps of the policy agent.
--ra-learning-starts: the burn-in steps of the reward agent.

--proposed-reward-scale: the scale of the proposed reward (default 1).
--beta: the weight of the proposed reward (default 0.2).

--pa-buffer-size: the replay buffer size of the policy agent.
--pa-rb-optimize-memory: whether to optimize the replay-buffer memory usage of the policy agent.
--pa-batch-size: the batch size of the policy agent.
--ra-buffer-size: the replay buffer size of the reward agent.
--ra-rb-optimize-memory: whether to optimize the replay-buffer memory usage of the reward agent.
--ra-batch-size: the batch size of the reward agent.

--pa-actor-lr: the learning rate of the policy agent's actor.
--pa-critic-lr: the learning rate of the policy agent's critic.
--pa-alpha-lr: the learning rate of the policy agent's entropy coefficient (alpha).
--ra-actor-lr: the learning rate of the reward agent's actor.
--ra-critic-lr: the learning rate of the reward agent's critic.
--ra-alpha-lr: the learning rate of the reward agent's entropy coefficient (alpha).

--pa-policy-frequency: the policy update frequency of the policy agent.
--pa-target-frequency: the target network update frequency of the policy agent.
--pa-tau: the soft-update coefficient (tau) of the policy agent.
--ra-policy-frequency: the policy update frequency of the reward agent.
--ra-target-frequency: the target network update frequency of the reward agent.
--ra-tau: the soft-update coefficient (tau) of the reward agent.

--pa-alpha: the entropy coefficient (alpha) of the policy agent.
--pa-alpha-autotune: whether to automatically tune the policy agent's alpha.
--ra-alpha: the entropy coefficient (alpha) of the reward agent.
--ra-alpha-autotune: whether to automatically tune the reward agent's alpha.

--write-frequency: the frequency of writing TensorBoard logs.
--save-frequency: the frequency of saving model checkpoints.
--save-folder: the folder in which to save the model.
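
As an illustration, the command below overrides a few of these arguments; the specific values are arbitrary examples rather than recommended settings:

python run-ReLara.py --env-id MyMujoco/Ant-Height-Sparse --seed 1 --total-timesteps 1000000 --beta 0.2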

Comparative Evaluation

The comparison of ReLara with several baselines, including ROSA (Mguni et al., 2023), ExploRS (Devidze et al., 2022), SAC (Haarnoja et al., 2018), TD3 (Fujimoto et al., 2018), RND (Burda et al., 2018) and PPO (Schulman et al., 2017), is shown below:

Comparison of the learning performance of ReLara with the baselines.
