You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This issue is there to allow to coordinate who is running what and see a more or less live update of the performances being uploaded to openrlbenchmark.
For all deterministic environments, we push the learning rate to 1.0 and exploration rate higher since it's all about exploring fast in these cases. Our deterministic envs:
This issue is there to allow to coordinate who is running what and see a more or less live update of the performances being uploaded to openrlbenchmark.
See all runs: openrlbenchmark
How to help?
Mark your name on an algo/env combination and state the runs as you make them.
Run command with benchmark script:
Deterministic envs
For all deterministic environments, we push the learning rate to 1.0 and exploration rate higher since it's all about exploring fast in these cases. Our deterministic envs:
deep-sea-treasure-v0
deep-sea-treasure-concave-v0
four-room-v0
fruit-tree-v0
Multi-policy
✅ CAPQL
--env-id mo-lunar-lander-continuous-v2 --num-timesteps 50000 --ref-point -110 -400 -100 -100 --init-hyperparams "alpha:0.2"
10/10--env-id mo-halfcheetah-v4 --num-timesteps 200000 --ref-point -100 -100 --init-hyperparams "alpha:0.2"
10/10--env-id mo-hopper-2d-v4 --num-timesteps 200000 --ref-point -100 -100 --init-hyperparams "alpha:0.2"
10/10--env-id mo-hopper-v4 --num-timesteps 200000 --ref-point -100 -100 -100 --init-hyperparams "alpha:0.2"
10/10✅ GPI-LS continuous
--algo gpi_ls_continuous
--env-id mo-lunar-lander-continuous-v2 --num-timesteps 200000 --ref-point -110 -400 -100 -100 --init-hyperparams "per:False"
10/10--env-id mo-halfcheetah-v4 --num-timesteps 200000 --ref-point -100 -100 --init-hyperparams "per:False"
10/10--env-id mo-hopper-2d-v4 --num-timesteps 200000 --ref-point -100 -100 --init-hyperparams "per:False"
10/10--env-id mo-hopper-v4 --num-timesteps 200000 --ref-point -100 -100 -100 --init-hyperparams "per:False"
10/10✅ GPI-PD continuous
--algo gpi_pd_continuous
--env-id mo-lunar-lander-continuous-v2 --num-timesteps 200000 --ref-point -110 -400 -100 -100
10/10--env-id mo-halfcheetah-v4 --num-timesteps 100000 --ref-point -100 -100
10/10--env-id mo-hopper-2d-v4 --num-timesteps 100000 --ref-point -100 -100
10/10--env-id mo-hopper-v4 --num-timesteps 100000 --ref-point -100 -100 -100
10/10✅ GPI-LS discrete
--algo gpi_ls_discrete
--env-id mo-mountaincar-v0 --num-timesteps 200000 --ref-point -200 -200 -200 --init-hyperparams "per:False" "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10"
10/10--env-id mo-lunar-lander-v2 --num-timesteps 200000 --ref-point -101 -1001 -101 -101 --init-hyperparams "per:False" "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10"
10/10--env-id minecart-v0 --num-timesteps 200000 --gamma 0.98 --ref-point -1 -1 -200 --init-hyperparams "per:False" "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10"
10/10--env-id mo-highway-fast-v0 --num-timesteps 200000 --ref-point -1 -1 -40 --init-hyperparams "per:False" "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10"
10/10--env-id mo-reacher-v4 --num-timesteps 200000 --ref-point -50 -50 -50 -50 --init-hyperparams "per:False" "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10"
10/10✅ GPI-PD discrete
--algo gpi_pd_discrete
--env-id mo-mountaincar-v0 --num-timesteps 50000 --ref-point -200 -200 -200
10/10--env-id mo-lunar-lander-v2 --num-timesteps 200000 --ref-point -101 -1001 -101 -101 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10"
10/10--env-id minecart-v0 --num-timesteps 200000 --gamma 0.98 --ref-point -1 -1 -200 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10"
10/10--env-id mo-highway-fast-v0 --num-timesteps 100000 --ref-point -1 -1 -40
10/10--env-id mo-reacher-v4 --num-timesteps 200000 --ref-point -50 -50 -50 -50 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10"
10/10✅ Envelope
--algo envelope
--env-id mo-mountaincar-v0 --num-timesteps 1000000 --ref-point -200 -200 -200 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:500000"
10/10--env-id mo-lunar-lander-v2 --num-timesteps 1000000 --ref-point -101 -1001 -101 -101 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:500000"
10/10--env-id minecart-v0 --gamma 0.98 --num-timesteps 1000000 --ref-point -1 -1 -200 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:500000"
10/10--env-id mo-highway-fast-v0 --num-timesteps 1000000 --ref-point -1 -1 -40 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:500000"
10/10--env-id mo-reacher-v4 --num-timesteps 1000000 --ref-point -50 -50 -50 -50 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:500000"
10/10✅ PGMORL
--algo pgmorl
--env-id mo-mountaincarcontinuous-v0 --num-timesteps 3000000 --ref-point -110 -110
10/10--env-id mo-halfcheetah-v4 --num-timesteps 5000000 --ref-point -100 -100
10/10--env-id mo-hopper-2d-v4 --num-timesteps 5000000 --ref-point -100 -100
10/10PCN
--algo pcn
--env-id mo-mountaincar-v0 --init-hyperparams "scaling_factor:np.array([...])
0/10--env-id mo-lunar-lander-v2 --init-hyperparams "scaling_factor:np.array([...])
0/10--algo pcn --env-id minecart-v0 --gamma 0.98 --ref-point -1 -1 -200 --num-timesteps 10000000 --auto-tag True --wandb-entity openrlbenchmark --seed 0 --init-hyperparams "scaling_factor:np.array([1, 1, 0.1, 0.1])" --train-hyperparams "max_return:1.5"
0/10✅ PQL (deterministic envs)
--algo pql
--env-id deep-sea-treasure-v0 --num-timesteps 200000 --ref-point 0 -50 --init-hyperparams "ref_point:np.array([0, -50])"
10/10 (deterministic env)--env-id deep-sea-treasure-concave-v0 --num-timesteps 200000 --ref-point 0 -50 --init-hyperparams "ref_point:np.array([0, -50])"
10/10 (deterministic env)--env-id fruit-tree-v0 --num-timesteps 150000 --ref-point -1 -1 -1 -1 -1 -1 --init-hyperparams "ref_point:np.array([-1, -1, -1, -1, -1, -1])"
10/10 (deterministic env)✅ GPI-LS tabular
--algo gpi-ls --init-hyperparams "use_gpi_policy:True"
--env-id deep-sea-treasure-v0 --num-timesteps 400000 --gamma 0.99 --ref-point 0 -50 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" "use_gpi_policy:True" --train-hyperparams "timesteps_per_iteration:int(1e4)"
10/10 (deterministic env)--env-id deep-sea-treasure-concave-v0 --num-timesteps 400000 --gamma 0.99 --ref-point 0 -50 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" "use_gpi_policy:True" --train-hyperparams "timesteps_per_iteration:int(1e4)"
10/10 (deterministic env)--env-id resource-gathering-v0 --num-timesteps 1000000 --ref-point -1 -1 -2 --init-hyperparams "use_gpi_policy:True" --train-hyperparams "timesteps_per_iteration:int(1e4)" "num_eval_episodes_for_front:20"
10/10--env-id fruit-tree-v0 --num-timesteps 400000 --gamma 0.99 --ref-point -1 -1 -1 -1 -1 -1 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" "use_gpi_policy:True" --train-hyperparams "timesteps_per_iteration:int(1e4)"
10/10 (deterministic env)--env-id four-room-v0 --num-timesteps 400000 --ref-point -1 -1 -1 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" "use_gpi_policy:True" --train-hyperparams "timesteps_per_iteration:int(1e4)"
10/10 (deterministic env)✅ MPMOQL
--algo mpmoql
--env-id deep-sea-treasure-v0 --num-timesteps 1000000 --gamma 0.99 --ref-point 0 -50 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" --train-hyperparams "timesteps_per_iteration:int(1e4)"
10/10 (deterministic env)--env-id deep-sea-treasure-concave-v0 --num-timesteps 1000000 --gamma 0.99 --ref-point 0 -50 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" --train-hyperparams "timesteps_per_iteration:int(1e4)"
10/10 (deterministic env)--env-id resource-gathering-v0 --num-timesteps 1000000 --ref-point -1 -1 -2 --train-hyperparams "timesteps_per_iteration:int(1e4)" "num_eval_episodes_for_front:20"
10/10--env-id fruit-tree-v0 --num-timesteps 1000000 --gamma 0.99 --ref-point -1 -1 -1 -1 -1 -1 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" --train-hyperparams "timesteps_per_iteration:int(1e4)"
10/10 (deterministic env)--env-id four-room-v0 --num-timesteps 1000000 --ref-point -1 -1 -1 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" --train-hyperparams "timesteps_per_iteration:int(1e4)"
10/10 (deterministic env)✅ OLS
--algo ols --init-hyperparams "weight_selection_algo:'ols'" "epsilon_ols:0.0"
--env-id deep-sea-treasure-v0 --num-timesteps 1000000 --gamma 0.99 --ref-point 0 -50 --init-hyperparams "learning_rate:1.0" "weight_selection_algo:'ols'" "epsilon_ols:0.0" "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" --train-hyperparams "timesteps_per_iteration:int(1e4)"
10/10 (deterministic env)--env-id deep-sea-treasure-concave-v0 --num-timesteps 1000000 --gamma 0.99 --ref-point 0 -50 --init-hyperparams "learning_rate:1.0" "weight_selection_algo:'ols'" "epsilon_ols:0.0" "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" --train-hyperparams "timesteps_per_iteration:int(1e4)"
10/10 (deterministic env)--env-id resource-gathering-v0 --num-timesteps 1000000 --ref-point -1 -1 -2 --train-hyperparams "timesteps_per_iteration:int(1e4)" "num_eval_episodes_for_front:20"
10/10--env-id fruit-tree-v0 --num-timesteps 1000000 --ref-point -1 -1 -1 -1 -1 -1 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" "weight_selection_algo:'ols'" "epsilon_ols:0.0" --train-hyperparams "timesteps_per_iteration:int(1e4)"
10/10 (deterministic env)--env-id four-room-v0 resource-gathering-v0 --num-timesteps 1000000 --ref-point -1 -1 -1 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" "weight_selection_algo:'ols'" "epsilon_ols:0.0" --train-hyperparams "timesteps_per_iteration:int(1e4)"
10/10 (deterministic env)Single-policy
MOQL
deep-sea-treasure-v0
0/10deep-sea-treasure-concave-v0
0/10resource-gathering-v0
fruit-tree-v0
0/10four-room-v0
0/10EUPG
deep-sea-treasure-concave-v0
0/10fishwood-v0
0/10The text was updated successfully, but these errors were encountered: