Active learning with a generative large language model

This is the codebase for experiments in text-based active learning that use large language models as generative models for query synthesis. Contact [email protected] for more details.

Installation

Clone the repository:

git clone git@github.com:OxAI-Safety-Hub/al-llm-experiments.git

Optional (but recommended): create a new virtual environment.
Install the requirements:

pip install -r requirements.txt
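If you opt for a virtual environment, here is a minimal sketch using Python's built-in venv module. The .venv directory name is an arbitrary choice, not something the repository requires. Create and activate the environment before running the pip install command above so that the requirements are installed into it.

```shell
# Create a virtual environment in .venv (an arbitrary directory name)
python3 -m venv .venv

# Activate it in the current shell; "." is the POSIX spelling of bash's "source".
# On Windows, run .venv\Scripts\activate instead.
. .venv/bin/activate
```

Once activated, python and pip resolve to the copies inside .venv. Run deactivate to leave the environment.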

Set up Weights and Biases
Change the Weights and Biases entity name in al_llm/constants.py.

Running experiments

The main script for running experiments is scripts/run_experiment.py.
First, install the al_llm package locally in editable mode:

pip install -e .

Pass the --help flag to see the list of options:

python scripts/run_experiment.py --help

The first argument is the run ID. A good convention for run ID is:

{DATASET}_{CLASSIFIER}_{SAMPLE_GENERATOR}_{ACQUISITION_FUNCTION}_{NUMBER}

using abbreviations. For example, the second pool-based experiment on the Rotten Tomatoes dataset, using a plain classifier and the max-uncertainty acquisition function, might be called:

rt_plain_pool_mu_2

If the run has special features, add them at the end, just before the number.
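The naming convention can be captured in a small shell helper. This is a hypothetical convenience function, not part of the repository: it joins its arguments with underscores, so any special features passed between the acquisition function and the number end up just before the number, as the convention requires.

```shell
# Hypothetical helper (not part of al-llm-experiments) that builds a run ID:
#   make_run_id DATASET CLASSIFIER SAMPLE_GENERATOR ACQUISITION_FUNCTION [FEATURE ...] NUMBER
make_run_id() {
  # Require at least the four components plus the number
  if [ "$#" -lt 5 ]; then
    echo "usage: make_run_id DATASET CLASSIFIER SAMPLE_GENERATOR ACQUISITION_FUNCTION [FEATURE ...] NUMBER" >&2
    return 1
  fi
  # "$*" joins all arguments using the first character of IFS, here "_"
  local IFS=_
  printf '%s\n' "$*"
}
```

For example, make_run_id rt plain pool mu 2 prints rt_plain_pool_mu_2, the example ID above.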

By default, the experiment runs in the 'Experiments' project. To change this, specify the --project-name option.
To select the GPU, use something like --cuda-device 'cuda:1'. The default is the 0th device.
By default, a single experiment is run with one seed. To repeat the same experiment over several seeds, add the --multiple-seeds flag.
To configure the experiment parameters, you'll most likely want to adjust the following options:

--dataset-name
--classifier-base-model
--use-tapted-classifier
--sample-generator-base-model
--use-tapted-sample-generator
--sample-generator-temperature
--sample-generator-top-k
--acquisition-function

Other options may also be of interest; pass --help to see the full list.
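As an illustration, the options described above can be combined into a single invocation. This bash sketch uses only flags named in this README, with the example run ID from the naming convention and the 'Experiments' and 'cuda:1' values mentioned above. The experiment-parameter options such as --dataset-name are left out because their valid values are not listed here; check --help for those. The arguments are collected in a bash array so that individual options are easy to change or drop.

```shell
# Example run ID following the naming convention described above
run_id="rt_plain_pool_mu_2"

# The run ID comes first, then the options:
#   --project-name "Experiments" restates the default; change it to use another project
#   --cuda-device "cuda:1" selects the second GPU instead of the default 0th device
#   --multiple-seeds repeats the same experiment over different seeds
args=(
  "$run_id"
  --project-name "Experiments"
  --cuda-device "cuda:1"
  --multiple-seeds
)

# Launch from the repository root, where scripts/run_experiment.py lives
if [ -f scripts/run_experiment.py ]; then
  python scripts/run_experiment.py "${args[@]}"
else
  echo "scripts/run_experiment.py not found; run this from the repository root" >&2
fi
```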

Documentation

Further guides are located in the docs folder.
