Performance report issue tracker #43

Open
49 of 60 tasks
ffelten opened this issue Mar 20, 2023 · 0 comments

This issue coordinates who is running which experiments and gives a roughly live view of the results being uploaded to openrlbenchmark.

See all runs: openrlbenchmark

How to help?

Put your name next to an algorithm/environment combination and report each run as you complete it.

Run the benchmark script with:

python benchmark/launch_experiment.py --algo <ALGO> --env-id <ENV_ID> --num-timesteps 1000000 --gamma 0.99 --ref-point ... --auto-tag True --wandb-entity openrlbenchmark --seed <0 to 9> --init-hyperparams ... --train-hyperparams ...
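Since each combination needs seeds 0 to 9, it can help to generate the ten command lines instead of typing them. The sketch below uses only the flags from the command above; `build_command` is a hypothetical helper, not part of the benchmark script, and the Envelope/minecart-v0 values are copied from the Envelope entry further down.

```python
import shlex


# Hypothetical helper: assembles the launch_experiment.py command shown above
# for one algorithm/environment pair and one seed. Only flags from this tracker are used.
def build_command(algo, env_id, num_timesteps, ref_point, seed,
                  gamma=0.99, init_hyperparams=(), train_hyperparams=()):
    cmd = [
        "python", "benchmark/launch_experiment.py",
        "--algo", algo,
        "--env-id", env_id,
        "--num-timesteps", str(num_timesteps),
        "--gamma", str(gamma),
        "--ref-point", *[str(x) for x in ref_point],
        "--auto-tag", "True",
        "--wandb-entity", "openrlbenchmark",
        "--seed", str(seed),
    ]
    # Each hyperparameter is passed as its own "key:value" argument.
    if init_hyperparams:
        cmd += ["--init-hyperparams", *init_hyperparams]
    if train_hyperparams:
        cmd += ["--train-hyperparams", *train_hyperparams]
    return cmd


if __name__ == "__main__":
    # Seeds 0..9 correspond to the 10 runs counted as "10/10" in this tracker.
    for seed in range(10):
        cmd = build_command(
            "envelope", "minecart-v0", 1_000_000, [-1, -1, -200], seed, gamma=0.98,
            init_hyperparams=["initial_epsilon:1.0", "final_epsilon:0.05",
                              "epsilon_decay_steps:500000"],
        )
        print(shlex.join(cmd))
```

Each returned list can also be passed directly to `subprocess.run(cmd, check=True)` to launch the runs one after another.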

Deterministic envs

For all deterministic environments, we set the learning rate to 1.0 and use a higher exploration rate, since in these cases what matters is exploring quickly. Our deterministic environments are:

  • deep-sea-treasure-v0
  • deep-sea-treasure-concave-v0
  • four-room-v0
  • fruit-tree-v0
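Why a learning rate of 1.0 is reasonable here: when transitions and rewards are deterministic, the update target for a given state/action does not vary between visits, so there is nothing to average and the update can simply overwrite the old estimate. The sketch below assumes a tabular multi-objective Q-learning update with linear scalarization (vector-valued Q, weight vector `w` picking the greedy next action); `mo_q_update` is a hypothetical illustration, not the code of any algorithm listed here.

```python
import numpy as np


# Sketch (assumption): tabular multi-objective Q-learning with linear scalarization.
# q has shape (n_states, n_actions, n_objectives); w weights the objectives.
def mo_q_update(q, s, a, reward, s_next, w, learning_rate, gamma=0.99, done=False):
    """Update the vector Q[s, a] in place and return it."""
    if done:
        target = reward
    else:
        # Greedy next action under the scalarized values w . Q[s_next, a'].
        a_next = int(np.argmax(q[s_next] @ w))
        target = reward + gamma * q[s_next, a_next]
    # With learning_rate=1.0 this line sets Q[s, a] exactly to the target.
    q[s, a] += learning_rate * (target - q[s, a])
    return q[s, a]
```

With `learning_rate=1.0`, a single update already stores the full (deterministic) target; with a smaller rate such as 0.1, the same information has to be seen many times before the estimate converges, which slows down exploration-driven learning.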

Multi-policy

✅ CAPQL

  • --env-id mo-lunar-lander-continuous-v2 --num-timesteps 50000 --ref-point -110 -400 -100 -100 --init-hyperparams "alpha:0.2" 10/10
  • --env-id mo-halfcheetah-v4 --num-timesteps 200000 --ref-point -100 -100 --init-hyperparams "alpha:0.2" 10/10
  • --env-id mo-hopper-2d-v4 --num-timesteps 200000 --ref-point -100 -100 --init-hyperparams "alpha:0.2" 10/10
  • --env-id mo-hopper-v4 --num-timesteps 200000 --ref-point -100 -100 -100 --init-hyperparams "alpha:0.2" 10/10
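The `--init-hyperparams` and `--train-hyperparams` arguments used throughout this tracker are `key:value` strings whose values look like Python expressions (`0.2`, `False`, `int(1e4)`, `'ols'`, `np.array([...])`). How `launch_experiment.py` actually parses them is not shown here; the following hypothetical `parse_hyperparams` helper, assuming `eval`-style parsing, only illustrates the format.

```python
import numpy as np

# Names the expressions may use; builtins are disabled on purpose.
_ALLOWED = {"__builtins__": {}, "np": np, "int": int, "float": float}


# Hypothetical parser for "key:value" strings such as "alpha:0.2" or
# "scaling_factor:np.array([1, 1, 0.1, 0.1])". eval() runs arbitrary code,
# so this is only suitable for trusted, hand-written inputs.
def parse_hyperparams(items):
    params = {}
    for item in items:
        key, sep, value = item.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"expected key:value, got {item!r}")
        params[key.strip()] = eval(value, dict(_ALLOWED))
    return params
```

Splitting on the first colon only keeps values that themselves contain colons intact, and evaluating the value is what lets array- and expression-valued entries like `int(1e4)` become real Python objects.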

✅ GPI-LS continuous

--algo gpi_ls_continuous

  • --env-id mo-lunar-lander-continuous-v2 --num-timesteps 200000 --ref-point -110 -400 -100 -100 --init-hyperparams "per:False" 10/10
  • --env-id mo-halfcheetah-v4 --num-timesteps 200000 --ref-point -100 -100 --init-hyperparams "per:False" 10/10
  • --env-id mo-hopper-2d-v4 --num-timesteps 200000 --ref-point -100 -100 --init-hyperparams "per:False" 10/10
  • --env-id mo-hopper-v4 --num-timesteps 200000 --ref-point -100 -100 -100 --init-hyperparams "per:False" 10/10

✅ GPI-PD continuous

--algo gpi_pd_continuous

  • --env-id mo-lunar-lander-continuous-v2 --num-timesteps 200000 --ref-point -110 -400 -100 -100 10/10
  • --env-id mo-halfcheetah-v4 --num-timesteps 100000 --ref-point -100 -100 10/10
  • --env-id mo-hopper-2d-v4 --num-timesteps 100000 --ref-point -100 -100 10/10
  • --env-id mo-hopper-v4 --num-timesteps 100000 --ref-point -100 -100 -100 10/10

✅ GPI-LS discrete

--algo gpi_ls_discrete

  • --env-id mo-mountaincar-v0 --num-timesteps 200000 --ref-point -200 -200 -200 --init-hyperparams "per:False" "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10" 10/10
  • --env-id mo-lunar-lander-v2 --num-timesteps 200000 --ref-point -101 -1001 -101 -101 --init-hyperparams "per:False" "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10" 10/10
  • --env-id minecart-v0 --num-timesteps 200000 --gamma 0.98 --ref-point -1 -1 -200 --init-hyperparams "per:False" "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10" 10/10
  • --env-id mo-highway-fast-v0 --num-timesteps 200000 --ref-point -1 -1 -40 --init-hyperparams "per:False" "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10" 10/10
  • --env-id mo-reacher-v4 --num-timesteps 200000 --ref-point -50 -50 -50 -50 --init-hyperparams "per:False" "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10" 10/10

✅ GPI-PD discrete

--algo gpi_pd_discrete

  • --env-id mo-mountaincar-v0 --num-timesteps 50000 --ref-point -200 -200 -200 10/10
  • --env-id mo-lunar-lander-v2 --num-timesteps 200000 --ref-point -101 -1001 -101 -101 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10" 10/10
  • --env-id minecart-v0 --num-timesteps 200000 --gamma 0.98 --ref-point -1 -1 -200 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10" 10/10
  • --env-id mo-highway-fast-v0 --num-timesteps 100000 --ref-point -1 -1 -40 10/10
  • --env-id mo-reacher-v4 --num-timesteps 200000 --ref-point -50 -50 -50 -50 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:100000" "target_net_update_freq:200" "gradient_updates:10" 10/10

✅ Envelope

--algo envelope

  • --env-id mo-mountaincar-v0 --num-timesteps 1000000 --ref-point -200 -200 -200 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:500000" 10/10
  • --env-id mo-lunar-lander-v2 --num-timesteps 1000000 --ref-point -101 -1001 -101 -101 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:500000" 10/10
  • --env-id minecart-v0 --gamma 0.98 --num-timesteps 1000000 --ref-point -1 -1 -200 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:500000" 10/10
  • --env-id mo-highway-fast-v0 --num-timesteps 1000000 --ref-point -1 -1 -40 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:500000" 10/10
  • --env-id mo-reacher-v4 --num-timesteps 1000000 --ref-point -50 -50 -50 -50 --init-hyperparams "initial_epsilon:1.0" "final_epsilon:0.05" "epsilon_decay_steps:500000" 10/10

✅ PGMORL

--algo pgmorl

  • --env-id mo-mountaincarcontinuous-v0 --num-timesteps 3000000 --ref-point -110 -110 10/10
  • --env-id mo-halfcheetah-v4 --num-timesteps 5000000 --ref-point -100 -100 10/10
  • --env-id mo-hopper-2d-v4 --num-timesteps 5000000 --ref-point -100 -100 10/10

PCN

--algo pcn

  • --env-id mo-mountaincar-v0 --init-hyperparams "scaling_factor:np.array([...])" 0/10
  • --env-id mo-lunar-lander-v2 --init-hyperparams "scaling_factor:np.array([...])" 0/10
  • --env-id mo-highway-fast-v0
  • --env-id mo-reacher-v4
  • --algo pcn --env-id minecart-v0 --gamma 0.98 --ref-point -1 -1 -200 --num-timesteps 10000000 --auto-tag True --wandb-entity openrlbenchmark --seed 0 --init-hyperparams "scaling_factor:np.array([1, 1, 0.1, 0.1])" --train-hyperparams "max_return:1.5" 0/10

✅ PQL (deterministic envs)

--algo pql

  • --env-id deep-sea-treasure-v0 --num-timesteps 200000 --ref-point 0 -50 --init-hyperparams "ref_point:np.array([0, -50])" 10/10 (deterministic env)
  • --env-id deep-sea-treasure-concave-v0 --num-timesteps 200000 --ref-point 0 -50 --init-hyperparams "ref_point:np.array([0, -50])" 10/10 (deterministic env)
  • --env-id fruit-tree-v0 --num-timesteps 150000 --ref-point -1 -1 -1 -1 -1 -1 --init-hyperparams "ref_point:np.array([-1, -1, -1, -1, -1, -1])" 10/10 (deterministic env)
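The reference point (`--ref-point`, and `ref_point` for PQL) is, as is standard in multi-objective RL benchmarks, the lower bound used when computing the hypervolume of the learned Pareto front; the tracker itself does not spell this out, so treat that as an assumption. A minimal two-objective sketch for maximization, e.g. deep-sea-treasure's two objectives with reference point (0, -50); `hypervolume_2d` is a hypothetical helper:

```python
# Sketch: hypervolume of a 2-objective front, both objectives maximized.
def hypervolume_2d(front, ref_point):
    """Area dominated by `front` and bounded below by `ref_point`."""
    rx, ry = ref_point
    # Only points that strictly improve on the reference point contribute.
    pts = [(x, y) for x, y in front if x > rx and y > ry]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)
    hv, prev_y = 0.0, ry
    for x, y in pts:
        if y > prev_y:  # dominated points add no new area
            hv += (x - rx) * (y - prev_y)
            prev_y = y
    return hv


print(hypervolume_2d([(1, -1), (2, -3)], (0, -50)))  # 96.0
```

Points that do not improve on the reference point in every objective add nothing, which is why a poorly chosen reference point can hide or inflate differences between algorithms. Most environments in this tracker have three or more objectives, which need a general-purpose hypervolume algorithm rather than this 2D sweep.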

✅ GPI-LS tabular

--algo gpi-ls --init-hyperparams "use_gpi_policy:True"

  • --env-id deep-sea-treasure-v0 --num-timesteps 400000 --gamma 0.99 --ref-point 0 -50 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" "use_gpi_policy:True" --train-hyperparams "timesteps_per_iteration:int(1e4)" 10/10 (deterministic env)
  • --env-id deep-sea-treasure-concave-v0 --num-timesteps 400000 --gamma 0.99 --ref-point 0 -50 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" "use_gpi_policy:True" --train-hyperparams "timesteps_per_iteration:int(1e4)" 10/10 (deterministic env)
  • --env-id resource-gathering-v0 --num-timesteps 1000000 --ref-point -1 -1 -2 --init-hyperparams "use_gpi_policy:True" --train-hyperparams "timesteps_per_iteration:int(1e4)" "num_eval_episodes_for_front:20" 10/10
  • --env-id fruit-tree-v0 --num-timesteps 400000 --gamma 0.99 --ref-point -1 -1 -1 -1 -1 -1 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" "use_gpi_policy:True" --train-hyperparams "timesteps_per_iteration:int(1e4)" 10/10 (deterministic env)
  • --env-id four-room-v0 --num-timesteps 400000 --ref-point -1 -1 -1 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" "use_gpi_policy:True" --train-hyperparams "timesteps_per_iteration:int(1e4)" 10/10 (deterministic env)
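The tabular runs above anneal exploration from `initial_epsilon:1.0` to `final_epsilon:0.1` over `epsilon_decay_steps:100000`, i.e. over the first quarter of the 400000 training steps. Assuming a linear schedule (the decay shape is not stated in this tracker), epsilon at a given step can be computed with the hypothetical `linear_epsilon` below:

```python
# Sketch (assumption: linear decay, then constant at final_epsilon).
def linear_epsilon(step, initial_epsilon=1.0, final_epsilon=0.1,
                   epsilon_decay_steps=100_000):
    """Exploration rate after `step` environment steps."""
    if step >= epsilon_decay_steps:
        return final_epsilon
    frac = step / epsilon_decay_steps
    return initial_epsilon + frac * (final_epsilon - initial_epsilon)


for step in (0, 50_000, 100_000, 400_000):
    print(step, linear_epsilon(step))
```

The same function covers the Envelope settings (`final_epsilon:0.05`, `epsilon_decay_steps:500000`) by passing those values instead of the defaults.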

✅ MPMOQL

--algo mpmoql

  • --env-id deep-sea-treasure-v0 --num-timesteps 1000000 --gamma 0.99 --ref-point 0 -50 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" --train-hyperparams "timesteps_per_iteration:int(1e4)" 10/10 (deterministic env)
  • --env-id deep-sea-treasure-concave-v0 --num-timesteps 1000000 --gamma 0.99 --ref-point 0 -50 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" --train-hyperparams "timesteps_per_iteration:int(1e4)" 10/10 (deterministic env)
  • --env-id resource-gathering-v0 --num-timesteps 1000000 --ref-point -1 -1 -2 --train-hyperparams "timesteps_per_iteration:int(1e4)" "num_eval_episodes_for_front:20" 10/10
  • --env-id fruit-tree-v0 --num-timesteps 1000000 --gamma 0.99 --ref-point -1 -1 -1 -1 -1 -1 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" --train-hyperparams "timesteps_per_iteration:int(1e4)" 10/10 (deterministic env)
  • --env-id four-room-v0 --num-timesteps 1000000 --ref-point -1 -1 -1 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" --train-hyperparams "timesteps_per_iteration:int(1e4)" 10/10 (deterministic env)

✅ OLS

--algo ols --init-hyperparams "weight_selection_algo:'ols'" "epsilon_ols:0.0"

  • --env-id deep-sea-treasure-v0 --num-timesteps 1000000 --gamma 0.99 --ref-point 0 -50 --init-hyperparams "learning_rate:1.0" "weight_selection_algo:'ols'" "epsilon_ols:0.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" --train-hyperparams "timesteps_per_iteration:int(1e4)" 10/10 (deterministic env)
  • --env-id deep-sea-treasure-concave-v0 --num-timesteps 1000000 --gamma 0.99 --ref-point 0 -50 --init-hyperparams "learning_rate:1.0" "weight_selection_algo:'ols'" "epsilon_ols:0.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" --train-hyperparams "timesteps_per_iteration:int(1e4)" 10/10 (deterministic env)
  • --env-id resource-gathering-v0 --num-timesteps 1000000 --ref-point -1 -1 -2 --train-hyperparams "timesteps_per_iteration:int(1e4)" "num_eval_episodes_for_front:20" 10/10
  • --env-id fruit-tree-v0 --num-timesteps 1000000 --ref-point -1 -1 -1 -1 -1 -1 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" "weight_selection_algo:'ols'" "epsilon_ols:0.0" --train-hyperparams "timesteps_per_iteration:int(1e4)" 10/10 (deterministic env)
  • --env-id four-room-v0 --num-timesteps 1000000 --ref-point -1 -1 -1 --init-hyperparams "learning_rate:1.0" "initial_epsilon:1.0" "epsilon_decay_steps:100000" "final_epsilon:0.1" "weight_selection_algo:'ols'" "epsilon_ols:0.0" --train-hyperparams "timesteps_per_iteration:int(1e4)" 10/10 (deterministic env)

Single-policy

MOQL

  • deep-sea-treasure-v0 0/10
  • deep-sea-treasure-concave-v0 0/10
  • resource-gathering-v0
  • fruit-tree-v0 0/10
  • four-room-v0 0/10

EUPG

  • deep-sea-treasure-concave-v0 0/10
  • fishwood-v0 0/10