From d747fef30b390c06461c91e76c8604fb527b8ead Mon Sep 17 00:00:00 2001
From: Theresa Eimer
Date: Wed, 5 Jun 2024 11:04:56 +0200
Subject: [PATCH 1/2] Small doc updates

---
docs/CONTRIBUTING.md | 23 ++++++++-----------
docs/advanced_usage/algorithm_states.rst | 8 ++++---
docs/advanced_usage/autorl_paradigms.rst | 16 +++++++++----
docs/advanced_usage/dynamic_configuration.rst | 13 +++++++----
docs/basic_usage/env_subsets.rst | 11 +++++----
docs/basic_usage/index.rst | 14 ++++++-----
docs/basic_usage/objectives.rst | 9 ++++----
docs/basic_usage/seeding.rst | 11 ++++++---
8 files changed, 61 insertions(+), 44 deletions(-)

diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md
index 4dbdd58d0..1408bbd2d 100644
--- a/docs/CONTRIBUTING.md
+++ b/docs/CONTRIBUTING.md
@@ -51,13 +51,14 @@ Ready to contribute? Here's how to set up `arlbench` for local development.
2. Clone your fork locally:
```
$ git clone git@github.com:your_name_here/arlbench.git
+ $ cd arlbench
```
-3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:
+3. Install your local copy into a conda env:
```
- $ mkvirtualenv arlbench
- $ cd arlbench/
- $ python setup.py develop
+ $ conda create -n arlbench python=3.10
+ $ conda activate arlbench
+ $ make install-dev
```
4. Create a branch for local development:
@@ -67,15 +68,11 @@ Ready to contribute? Here's how to set up `arlbench` for local development.
Now you can make your changes locally.
-5. When you're done making changes, check that your changes pass ruff, including testing other Python versions with tox:
+5. When you're done making changes, check that your changes pass ruff:
```
- $ ruff format arlbench tests
- $ python setup.py test or pytest
- $ tox
+ $ make format
```
- To get flake8 and tox, just pip install them into your virtualenv.
-
6. Commit your changes and push your branch to GitHub:
```
$ git add .
@@ -93,16 +90,14 @@ Before you submit a pull request, check that it meets these guidelines:
2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.
-3. The pull request should work for Python 3.5, 3.6, 3.7 and 3.8, and for PyPy. Check
- https://travis-ci.com/automl/arlbench/pull_requests
- and make sure that the tests pass for all supported Python versions.
+3. The pull request should work for Python 3.10 and above. This is tested in the GitHub workflows.

## Tips

To run a subset of tests:
```
-$ pytest tests.test_arlbench
+$ make test
```

diff --git a/docs/advanced_usage/algorithm_states.rst b/docs/advanced_usage/algorithm_states.rst
index 79e280389..beb39aef5 100644
--- a/docs/advanced_usage/algorithm_states.rst
+++ b/docs/advanced_usage/algorithm_states.rst
@@ -1,7 +1,9 @@
Using the ARLBench States
==========================
-In addition to providing different objectives, ARLBench also provides insights into the target algorithms' internal states. This is done using so called `StateFeatures`.
-As of now, we implement the `GradInfo` state feature which returns the norm the gradients observed during training.
+In addition to providing different objectives, ARLBench also provides insights into the target algorithms' **internal states**. This is done using so-called `StateFeatures`.

-The used state features can be defined using the `state_features` key in the config passed to the AutoRL Environment.
Please include `grad_info` in this list if you want to use this state feature for your approach.
\ No newline at end of file
+As of now, we implement the `GradInfo` state feature which returns the **norm and variance of the gradients observed during training**.
+The state features to use can be defined using the `state_features` key in the config passed to the AutoRL Environment.
+Please include `grad_info` in this list if you want to use this state feature for your approach.
+We are currently working on extending this part of ARLBench to other state features as well.
\ No newline at end of file
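To make the `state_features` option documented in the hunk above more concrete, here is a minimal sketch. It assumes the AutoRL Environment is exposed as `AutoRLEnv` with a gymnasium-style interface and accepts a plain dictionary config; the import path, constructor signature, and remaining config keys are assumptions for illustration, not a verbatim API reference.

```python
# Hedged sketch: requesting the GradInfo state feature via the config passed to
# the AutoRL Environment. All names and signatures here are assumptions; consult
# the ARLBench examples for the actual API.
from arlbench import AutoRLEnv  # assumed import path

config = {
    "state_features": ["grad_info"],  # gradient norm/variance observed during training
    # ... remaining algorithm/environment settings as shown in the ARLBench examples
}

env = AutoRLEnv(config)   # assumed constructor; the examples show the exact call
obs, info = env.reset()   # gymnasium-style reset (assumed)
# After env.step(...), the returned observation is expected to contain the
# gradient statistics requested via "grad_info".
```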
diff --git a/docs/advanced_usage/autorl_paradigms.rst b/docs/advanced_usage/autorl_paradigms.rst
index c5214b964..8ebdd4402 100644
--- a/docs/advanced_usage/autorl_paradigms.rst
+++ b/docs/advanced_usage/autorl_paradigms.rst
@@ -1,22 +1,28 @@
ARLBench and Different AutoRL Paradigms
=======================================
-In this chapter, we elaborate on the relationship between ARLBench in various AutoRL Paradigms.
+Since there are various AutoRL paradigms in the literature, we briefly describe how ARLBench relates to each one.

Hyperparameter Optimization (HPO)
---------------------------------
-(Static) Hyperparameter optimization is one of the core use cases of ARLBench. As stated in our examples, ARLBench supports all kinds of black-box optimizers to perform hyperparameter optimization for RL.
+Hyperparameter optimization is one of the core use cases of ARLBench. As stated in our examples, ARLBench supports all kinds of black-box optimizers to perform hyperparameter optimization for RL.
+This can also be done in a dynamic fashion in ARLBench.

Dynamic Algorithm Configuration (DAC)
-------------------------------------
When it comes to dynamic approaches, ARLBench supports different kinds of optimization techniques that adapt the current hyperparameter configuration during training. As stated in the examples,
-this can be done using the CLI or the AutoRL Environment. Using checkpointing, trainings can be continued seamlessly which allows for flexible dynamic approaches.
+this can be done using the CLI or the AutoRL Environment. In DAC specifically, however, the hyperparameter controller learns to adapt hyperparameters based on an algorithm state.
+This is supported in ARLBench, but not implemented extensively just yet. At the moment, we only offer a limited number of gradient features, which might not be enough to learn a reliable hyperparameter controller.
+Since DAC has not been applied to RL in this manner before, we are not yet sure which other features are necessary to make DAC work in the context of RL.

Neural Architecture Search (NAS)
--------------------------------
In addition to HPO, ARLBench supports NAS approaches that set the size of hidden layers and activation functions. However, as of now this is limited to these two architecture hyperparameters.
-In the future, ARLBench could be extended by more powerful search space interfaces for NAS.
+Most NAS approaches actually focus on more elaborate search spaces to find architectures tailored to a use case. This line of research is not very prominent in the context of RL yet, unfortunately.
+We hope ARLBench can support such research in the future by extending to standard NAS search spaces like DARTS or novel RL-specific ones.

Meta-Gradients
--------------
-As of now, ARLBench does not include meta-gradient based approaches for AutoRL.
However, we allow for reactive dynamic approaches that use the gradient informatio during training to select the next hyperparameter configuration as stated in our examples.
\ No newline at end of file
+As of now, ARLBench does not include meta-gradient or second-order optimization approaches for AutoRL.
+However, we allow for reactive dynamic approaches that use the gradient information during training to select the next hyperparameter configuration, as stated in our examples.
+Through this interface, we hope to be able to provide an option for second-order gradient computation in the future.
\ No newline at end of file
diff --git a/docs/advanced_usage/dynamic_configuration.rst b/docs/advanced_usage/dynamic_configuration.rst
index 9e370975b..2753a7fe5 100644
--- a/docs/advanced_usage/dynamic_configuration.rst
+++ b/docs/advanced_usage/dynamic_configuration.rst
@@ -1,11 +1,14 @@
Dynamic Configuration in ARLBench
==================================
-In addition to static approaches, which run the whole training given a fixed configuration, ARLBench supports dynamic configuration approaches.
-These methods, in contrast, can adapt the current hyperparameter configuration during training.
+In addition to static approaches, which run the whole training given a fixed configuration, ARLBench supports **dynamic configuration approaches**.
+These methods, in contrast, can adapt the current hyperparameter configuration **during training**.
To do this, you can use the CLI or the AutoRL Environment as shown in our examples.

-When using the CLI, you have to pass a checkpoint path for the current training state. Then, the training is proceeded using the given configuration.
+When using the CLI, you have to **pass a checkpoint path** for the current training state.
+Then, training is resumed from this training state with a new configuration.
+This is especially useful for highly parallelizable dynamic tuning methods, e.g. population-based methods.

-For the AutoRL Environment, you can set `n_steps` to the number of configuration updates you want to perform during training.
-By adjusting the number of training steps (`n_total_timesteps`) accordingly and calling the `step()` function multiple times to perform dynamic configuration.
+For the AutoRL Environment, you can set `n_steps` to the **number of configuration updates** you want to perform during training.
+You should also reduce `n_total_timesteps` in your settings accordingly, to 1/`n_steps` of the full training budget.
+Then calling the `step()` function multiple times until termination will perform the same dynamic configuration as with the CLI.
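As a rough illustration of the `n_steps` / `n_total_timesteps` interplay described in the dynamic-configuration hunk above, consider the following hedged sketch. The class name `AutoRLEnv`, the flat config layout, the `step()` return signature, and the `propose_next_configuration` helper are assumptions for illustration only.

```python
# Hedged sketch of dynamic configuration through the AutoRL Environment:
# n_steps configuration updates, each training for 1/n_steps of the full budget.
from arlbench import AutoRLEnv  # assumed import path


def propose_next_configuration(obs):
    # Placeholder for your dynamic tuner, e.g. a population-based method or a
    # reactive schedule that reacts to the observed training state.
    return {"learning_rate": 3e-4}  # made-up hyperparameter name and value


n_steps = 10                # number of configuration updates during training
full_budget = 1_000_000     # full training budget in timesteps (example value)

config = {
    "n_steps": n_steps,
    "n_total_timesteps": full_budget // n_steps,  # budget per step() call
}

env = AutoRLEnv(config)     # assumed constructor
obs, info = env.reset()

for _ in range(n_steps):
    hp_config = propose_next_configuration(obs)
    obs, objectives, terminated, truncated, info = env.step(hp_config)
    if terminated or truncated:
        break
```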
diff --git a/docs/basic_usage/env_subsets.rst b/docs/basic_usage/env_subsets.rst
index a22628d99..471f09f87 100644
--- a/docs/basic_usage/env_subsets.rst
+++ b/docs/basic_usage/env_subsets.rst
@@ -1,14 +1,17 @@
The ARLBench Subsets
====================
-We analyzed the hyperparameter landscapes of PPO, DQN and SAC on 20 environments to select a subset which allows for efficient benchmarking of AutoRL algorithms. These are the resulting subsets:
+We analyzed the hyperparameter landscapes of PPO, DQN and SAC on 20 environments to select a subset which allows for efficient benchmarking of AutoRL algorithms.
+This subset of 4-5 environments per algorithm matches the overall reward distribution across 128 hyperparameter configurations and 10 seeds:

.. image:: ../images/subsets.png
    :width: 800
    :alt: Environment subsets for PPO, DQN and SAC

-We strongly recommend you focus your benchmarking on these exact environments to ensure you cover the space total landscape of RL behaviors well.
+In our experiments on GPU, all subsets together should take about **1.5h to evaluate once**.
+This number needs to be multiplied by the number of RL seeds you want to evaluate on, the number of optimizer runs you consider, as well as the optimization budget, to obtain the total runtime of your experiments.
+If this full runtime is too long for your setup, you can also consider evaluating only a subset of algorithms - we strongly recommend you focus your benchmarking **on these exact environments**, however, to ensure you cover the total landscape of RL behaviors well.
+
The data generated for selecting these environments is available on `HuggingFace `_ for you to use in your experiments. For more information how the subset selection was done, please refer to our paper.
-
-For more information on how to evaluate your method on these subsets, please refer to the examples in our GitHub repository.
\ No newline at end of file
+The examples in our GitHub repository show how you can evaluate your own method using these subsets.
\ No newline at end of file
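As a back-of-the-envelope illustration of the runtime estimate in the hunk above: the rule of thumb multiplies the 1.5h per full subset evaluation by the seed count, optimizer runs, and budget. The concrete counts below are made up for the example.

```python
# Example calculation only; all counts besides the 1.5h figure are made up.
hours_per_subset_evaluation = 1.5  # one evaluation of all subsets on GPU (from the docs)
n_rl_seeds = 10                    # RL seeds you evaluate on
n_optimizer_runs = 3               # independent runs of your AutoRL method
budget = 32                        # optimization budget in full evaluations

total_hours = hours_per_subset_evaluation * n_rl_seeds * n_optimizer_runs * budget
print(f"Estimated total runtime: {total_hours:.0f} GPU hours")  # 1440 with these numbers
```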
diff --git a/docs/basic_usage/index.rst b/docs/basic_usage/index.rst
index 283403288..29aaea425 100644
--- a/docs/basic_usage/index.rst
+++ b/docs/basic_usage/index.rst
@@ -3,13 +3,15 @@ Benchmarking AutoRL Methods
ARLBench provides an basis for benchmarking different AutoRL methods. This section of the documentation focuses on the prominent aspect of black-box hyperparameter optimization, since it's the simplest usecase of ARLBench.
We discuss the structure of ARLBenchmark, the currently supported objectives, the environment subsets and search spaces we provide and the seeding of the experiments in their own subpages.

-The most important question, however, is how to actually use ARLBench in your experiments. This is the workflow we propose:
-1. Decide which RL algorithms you choose as your HPO targets. In the best case, you will use all three: PPO, DQN and SAC.
-2. Decide which AutoRL methods you want to benchmark.
-3. Decide which objectives you want to optimize for. We provide a variety of objectives you can select one or more from.
-4. Use the pre-defined search spaces to run your AutoRL method for several runs. If there is a good reason to deviate from these search spaces, please report this alongside your results.
-5. Evaluate the best found configuration on the environment test seeds and report this result.
+The most important question, however, is how to actually use ARLBench in your experiments. This is the workflow we propose, which you can also see in our examples:
+
+1. Decide **which RL algorithms** you choose as your HPO targets. In the best case, you will use all three: PPO, DQN and SAC. You should also decide on the number of runs per algorithm you can afford to run (we recommend at least 10).
+2. Decide **which AutoRL methods** you want to benchmark. Also set a number of runs per AutoRL method (we recommend 3 at the very least, ideally more).
+3. Decide **which objectives** you want to optimize for. We provide a variety of objectives you can select one or more from.
+4. **Use the pre-defined search spaces** in your setup. If there is a good reason to deviate from these search spaces, please report this alongside your results.
+5. **Execute your experiments** for all combinations you defined - use this same setup for any baselines you compare against.
+6. **Evaluate** the best found configuration on the environment test seeds and report this result.

In-depth Information on:

diff --git a/docs/basic_usage/objectives.rst b/docs/basic_usage/objectives.rst
index a4aaeabed..e5847dccc 100644
--- a/docs/basic_usage/objectives.rst
+++ b/docs/basic_usage/objectives.rst
@@ -9,7 +9,8 @@ These are selected as a list of keywords in the configuration of the AutoRL Envi
python arlbench.py autorl.objectives=["reward_mean"]

The following objectives are available at the moment:
-- reward_mean: the mean evaluation reward across a number of evaluation episodes
-- reward_std: the standard deviation of the evaluation rewards across a number of evaluation episodes
-- runtime: the runtime of the training process
-- emissions: the CO2 emissions of the training process, tracked using `CodeCarbon `_ (which does not currently support ARM)
\ No newline at end of file
+
+- **reward_mean**: the mean evaluation reward across a number of evaluation episodes
+- **reward_std**: the standard deviation of the evaluation rewards across a number of evaluation episodes
+- **runtime**: the runtime of the training process
+- **emissions**: the CO2 emissions of the training process, tracked using `CodeCarbon `_.
\ No newline at end of file
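To show how several of the objectives listed in the hunk above could be requested at once, here is a hedged sketch via the AutoRL Environment. The class name, the flat `objectives` key, the made-up hyperparameter configuration, and the `step()` return structure are assumptions based on these docs rather than the exact API.

```python
# Hedged sketch: selecting several objectives and reading them back after one
# training segment. Names and signatures are illustrative assumptions.
from arlbench import AutoRLEnv  # assumed import path

config = {
    "objectives": ["reward_mean", "reward_std", "emissions"],  # any subset of the list above
}

env = AutoRLEnv(config)                 # assumed constructor
obs, info = env.reset()

hp_config = {"learning_rate": 3e-4}     # made-up hyperparameter configuration
obs, objectives, terminated, truncated, info = env.step(hp_config)
print(objectives)  # expected: one value per requested objective
```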
diff --git a/docs/basic_usage/seeding.rst b/docs/basic_usage/seeding.rst
index 4f9a86ee6..cea69956d 100644
--- a/docs/basic_usage/seeding.rst
+++ b/docs/basic_usage/seeding.rst
@@ -1,8 +1,13 @@
Considerations for Seeding
============================
-Seeding is important both on the level of RL algorithms as well as the AutoRL level. In general, we propose to use three different random seeds for training, validation, and testing.
-For training and validation, ARLBench takes care of the seeding. When you pass a seed to the AutoRL Environment, it uses this seed for training but `seed + 1` for the validation during training.
+Seeding is important both on the level of RL algorithms as well as the AutoRL level.
+In general, we propose to use **three different sets of random seeds** for training, validation, and testing.
+
+For **training and validation**, ARLBench takes care of the seeding. When you pass a seed to the AutoRL Environment, it uses this seed for training but `seed + 1` for the validation during training.
We recommend to use seeds `0` - `9` for training and validation, i.e., by passing them to the AutoRL Environment for the tuning process.
+You are of course free to increase this range, but we recommend using **at least 10 different seeds** for reliable results.

-When it comes to testing HPO methods, we provide a evaluation script in our examples. We propose to use seeds `100, 101, ...` here to make sure the method is tested on a different set of random seeds.
\ No newline at end of file
+When it comes to testing HPO methods, we provide an evaluation script in our examples.
+We propose to use seeds `100, 101, ...` here to make sure the method is tested on a different set of random seeds.
+Here we suggest **three HPO runs as a minimum** even for stable optimizers - for consistent results with small confidence intervals, you should likely aim for more runs.
\ No newline at end of file
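The seed convention in the hunk above might look as follows in practice. This is a hedged sketch: the `AutoRLEnv` name and the `seed` config key are assumptions, while the seed ranges follow the documentation.

```python
# Hedged sketch of the proposed seed split: seeds 0-9 for tuning (ARLBench derives
# the validation seed as seed + 1 internally), seeds 100+ for the final test.
from arlbench import AutoRLEnv  # assumed import path

tuning_seeds = range(10)        # seeds 0-9 for training/validation during tuning
test_seeds = [100, 101, 102]    # held-out seeds for reporting final results

for seed in tuning_seeds:
    env = AutoRLEnv({"seed": seed})  # assumed config key
    # ... run your HPO method against this environment ...

for seed in test_seeds:
    env = AutoRLEnv({"seed": seed})
    # ... evaluate the tuned (incumbent) configuration and report the results ...
```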
\ No newline at end of file

From 3f63743211a3b4051e76c8604fb527b8ead Mon Sep 17 00:00:00 2001
From: Theresa Eimer
Date: Wed, 5 Jun 2024 11:11:14 +0200
Subject: [PATCH 2/2] small changes to examples

---
examples/Readme.md | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/examples/Readme.md b/examples/Readme.md
index 7d81851b5..524285069 100644
--- a/examples/Readme.md
+++ b/examples/Readme.md
@@ -212,26 +212,27 @@ hp_config:
target_update_interval: 10
```
-You should replace `my_optimizer` with the name of your method to make sure the results are stored in the right directory. You can then set your incumbent configuration for the algorithm/environment accordingly.
-
+You can then set your incumbent configuration for the algorithm/environment accordingly.
As soon as you have stored all your incumbents (in this example in the `incumbent` directory in `configs`), you can run the evaluation script:

```bash
-python run_arlbench.py --config-name=evaluate -m "autorl.seed=100,101,102" "incumbent=glob(*)"
+python run_arlbench.py --config-name=evaluate -m "hpo_method=" "autorl.seed=100-110" "incumbent=glob(*)"
```

The command will evaluate all configurations on the test seeds `100,101,102,...`. Make sure not to use these during the design or tuning of your methods as this will invalidate the evaluation results.
+We recommend testing on at least 10 seeds.
The final evaluation results are stored in the `evaluation` directory for each algorithm and environment.

To run the evaluation only for a single algorithm, e.g. PPO, you can adapt the `incumbent` argument:

```bash
-python run_arlbench.py --config-name=evaluate -m "autorl.seed=100,101,102" "incumbent=glob(ppo*)"
+python run_arlbench.py --config-name=evaluate -m "autorl.seed=100-110" "incumbent=glob(ppo*)"
```

The same can be done for single combinations of environments and algorithms.

### Evaluation of Dynamic Approaches
When it comes to dynamic HPO methods, you cannot simply return the incumbent for evaluation since you'll have a schedule with variable length and configuration intervals.
For this case, we recommend using your dynamic tuning setup, but make sure to set the seed of the AutoRL Environment to a set of test seeds (`100, 101, 102, ...`).
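For the dynamic case the second patch describes, evaluation means re-running the whole schedule rather than a single incumbent. Below is a hedged sketch of what that could look like on the test seeds; the class name, config keys, and stored-schedule format are assumptions for illustration, not the ARLBench API.

```python
# Hedged sketch: replaying a saved dynamic hyperparameter schedule on the
# held-out test seeds (100, 101, 102, ...). All names are illustrative.
from arlbench import AutoRLEnv  # assumed import path

schedule = [                     # made-up schedule produced by a dynamic tuner
    {"learning_rate": 3e-4},
    {"learning_rate": 1e-4},
    {"learning_rate": 5e-5},
]

for seed in (100, 101, 102):
    env = AutoRLEnv({"seed": seed, "n_steps": len(schedule)})  # assumed config keys
    obs, info = env.reset()
    for hp_config in schedule:
        obs, objectives, terminated, truncated, info = env.step(hp_config)
    print(seed, objectives)  # report the final objectives for each test seed
```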