This repository contains work on applying multi-teacher offline reinforcement learning to dynamically adjust the learning rate during neural network optimization.
It is recommended to create a new Conda environment before installing the repository.
conda create -n DACORL python=3.10
conda activate DACORL
git clone --recurse-submodules https://github.com/Bronzila/DACORL.git && cd DACORL
pip install -r CORL/requirements/requirements_dev.txt
pip install -e DACBench[all,dev]
pip install -e .
To generate a dataset, train agents, and evaluate them, use the main.py script. We use the Hydra framework for configuration management.
By default, all three tasks (data generation, training, and evaluation) run consecutively. You can run them separately by specifying the mode using mode=data_gen|train|eval. Note that training will automatically trigger evaluation upon completion.
It is also required to specify the result_dir when running any job. You can override default configuration values as needed; refer to the hydra_conf/config.yaml file and the other configuration files in the hydra_conf directory for more details.
python main.py result_dir=data/test_experiment
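If you only want to run a single stage, override the mode as described above. For example, to only generate teacher data (the result_dir value is just an example):

python main.py mode=data_gen result_dir=data/test_experiment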
We use SLURM scripts to run our experiments on our compute cluster. All scripts can be found in scripts/LayerwiseSGD. To run the scripts in your own workspace or virtual environment, you will need to modify the initial lines of each script so that it navigates to your working directory and activates your virtual environment.
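As a rough sketch, the lines to adapt at the top of each script look something like the following; the path is a placeholder for your own checkout, and the environment name matches the Conda environment from the installation section:

cd /path/to/your/DACORL   # navigate to your working directory
conda activate DACORL     # or activate your own virtual environment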
NOTE: Multi-teacher experiments differ only in how data is generated: data is generated for multiple teachers and the resulting datasets are then combined.
Multi-teacher experiments can also be run with the main.py script. We differentiate between two teacher combination strategies, homogeneous and heterogeneous, which can be selected via the combination configuration field. In the following we briefly introduce the two combination strategies and how to use them.
NOTE: If you have already generated data for multiple teachers and want to reuse it instead of re-generating it, pass data_exists=true to main.py. The script will then automatically use the existing teachers in the respective folder (e.g., step_decay) for the homogeneous case, and all default teachers defined by the teacher field for the heterogeneous case.
The recommended pipeline is to first generate data for all teachers you aim to combine using main.py with mode=data_gen. Once all data has been generated, set the combination field to the specific combination you aim to train on. By specifying data_exists=true and mode=train, the script automatically combines the given datasets and trains the agent on the combined dataset.
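Putting this together, a typical multi-teacher run could look like the following sketch; the result_dir value is illustrative, the combination value is a placeholder, and the data-generation step is repeated for every teacher configuration you want to include (which teacher is used depends on your Hydra configuration overrides):

# 1. Generate data for each teacher you aim to combine (repeat per teacher configuration)
python main.py mode=data_gen result_dir=data/multi_teacher_experiment
# 2. Combine the existing datasets and train on them without re-generating data
python main.py mode=train data_exists=true combination=<homogeneous|heterogeneous> result_dir=data/multi_teacher_experiment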
In homogeneous combinations, teachers of the same type (e.g., step decay) but with different configurations (e.g., decay rate of 0.9 over 9 steps) are combined. To run homogeneous combination experiments, please set combination=homogeneous. The current implementation generates data using five teachers and then concatenates the datasets.
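For example, a full homogeneous run (data generation, training, and evaluation) could be started as follows; the result_dir value is just an example:

python main.py combination=homogeneous result_dir=data/homogeneous_experiment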
In heterogeneous combinations, teachers of varying type and configuration are combined. To run heterogeneous experiments, please set combination=heterogeneous. To define which (default) teachers you want to combine, use the teacher field. Teacher types are abbreviated as follows:
- E = Exponential decay
- ST = Step decay
- SG = SGDR
- C = Constant
Using these abbreviations, you can combine teachers by separating them with a "-". For example, to combine the exponential decay, step decay, and constant teachers, use teacher=E-ST-C.
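A corresponding heterogeneous run combining these three teachers could then be started like this (the result_dir value is again illustrative):

python main.py combination=heterogeneous teacher=E-ST-C result_dir=data/heterogeneous_experiment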
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Commit your changes with clear messages.
- Push to the branch (git push origin feature-branch).
- Open a pull request.
To ensure code quality and consistency, we use pre-commit for automatic code formatting and linting. Therefore, please install the development dependencies and the pre-commit hooks, which will run automatically before each commit.
pip install .[dev]
# make sure that you are in DACORL/
pre-commit install
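Once installed, the hooks run before every commit. If you want to run the checks over the whole codebase manually, you can also execute:

pre-commit run --all-files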