DACORL

This repository contains work on applying multi-teacher offline reinforcement learning to dynamically adjust the learning rate during neural network optimization.

Table of Contents

  1. Installation
  2. Usage
  3. Development and Contributing

Installation

It is recommended to create a new Conda environment before installing the repository.

conda create -n DACORL python=3.10
conda activate DACORL

git clone --recurse-submodules https://github.com/Bronzila/DACORL.git && cd DACORL
pip install -r CORL/requirements/requirements_dev.txt
pip install -e "DACBench[all,dev]"
pip install -e .

Usage

To generate a dataset, train agents, and evaluate them, use the main.py script. We utilize the Hydra framework for configuration management.

By default, all three tasks (data generation, training, and evaluation) run consecutively. You can separate these tasks by specifying the mode using mode=data_gen|train|eval. Note that training will automatically trigger evaluation upon completion.

You must also specify result_dir when running any job. You can override default configuration values as needed; refer to the hydra_conf/config.yaml file and the other configuration files in the hydra_conf directory for more details.

python main.py result_dir=data/test_experiment
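To run the stages separately, invoke the script once per mode, for example (reusing the same result_dir so later stages find the generated data, which is an assumption about the default layout; recall that mode=train triggers evaluation automatically):

python main.py mode=data_gen result_dir=data/test_experiment
python main.py mode=train result_dir=data/test_experiment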

Cluster Usage

We use SLURM scripts to run our experiments on our compute cluster. All scripts can be found in scripts/LayerwiseSGD. To run a script in your own workspace or virtual environment, modify its initial lines so that it navigates to your working directory and activates your virtual environment.
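The exact directives and paths depend on your cluster; as a rough, hypothetical sketch of the lines you would adapt at the top of such a script:

#!/bin/bash
#SBATCH --job-name=dacorl   # hypothetical job name
#SBATCH --time=24:00:00     # adjust to your cluster's limits

# Adapt these lines to your own setup (paths are placeholders):
cd /path/to/your/workspace/DACORL
conda activate DACORL

python main.py result_dir=data/test_experiment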

Multi-Teacher Experiments

NOTE: Multi-teacher experiments differ only in data generation: data is generated for multiple teachers and the resulting datasets are combined.

Multi-teacher experiments can also be run with the main.py script. We distinguish between two teacher combination strategies, homogeneous and heterogeneous, selected via the combination configuration field. The following sections briefly introduce both strategies and how to use them.

NOTE: If you have already generated data for multiple teachers and want to reuse it rather than regenerate it, pass data_exists=true to main.py. The script will then automatically use the teachers present in the respective folder (e.g., step_decay) in the homogeneous case, and all default teachers defined by the teacher field in the heterogeneous case.

The recommended pipeline is to first generate data for all teachers you aim to combine by running main.py with mode=data_gen. Once all data has been generated, set the combination field to the combination you want to train on. By specifying data_exists=true and mode=train, the script automatically combines the given datasets and trains the agent on the combined dataset, as sketched below.
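A hedged sketch of this two-step pipeline; the teacher and result_dir values are placeholders, and the actual teacher configuration names can be found in the hydra_conf directory:

# 1) Generate data separately for each teacher you plan to combine
python main.py mode=data_gen result_dir=data/combined teacher=<teacher_config>

# 2) Combine the existing datasets and train on them
python main.py mode=train data_exists=true combination=<homogeneous|heterogeneous> result_dir=data/combined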

Homogeneous

In homogeneous combinations, teachers of the same type (e.g., step decay) but with different configurations (e.g., decay rate of 0.9 over 9 steps) are combined. To run homogeneous combination experiments, please set combination=homogeneous. The current implementation generates data using five teachers and then concatenates the datasets.
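As a minimal sketch, assuming the teacher data has already been generated (the result_dir value is a placeholder, and depending on your setup you may also need to select the teacher type whose folder holds the data):

python main.py mode=train combination=homogeneous data_exists=true result_dir=data/homogeneous_experiment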

Heterogeneous

In heterogeneous combinations, teachers of varying type and configuration are combined. To run heterogeneous experiments, please set combination=heterogeneous. To define which (default) teachers you want to combine, use the teacher field. Teacher types are abbreviated as follows:

  • E = Exponential decay
  • ST = Step decay
  • SG = SGDR
  • C = Constant

Using these abbreviations, you can combine teachers by separating them with a "-". For example, to combine the exponential decay, step decay, and constant teachers, use teacher=E-ST-C.
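For example, a heterogeneous training run on pre-generated data for these three teachers might look like this (the result_dir value is a placeholder):

python main.py mode=train combination=heterogeneous teacher=E-ST-C data_exists=true result_dir=data/hetero_experiment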

Development

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-branch).
  3. Commit your changes with clear messages.
  4. Push to the branch (git push origin feature-branch).
  5. Open a pull request.

Pre-commit

To ensure code quality and consistency, we use pre-commit for automatic code formatting and linting. Please install the development dependencies and the pre-commit hooks; the hooks then run automatically before each commit.

# make sure that you are in DACORL/
pip install ".[dev]"
pre-commit install