
llm-tweet-classification

Classifying tweets with large language models using zero- and few-shot learning with custom and generic prompts, as well as supervised learning algorithms for comparison.

Our results for annotating tweets with the labels exemplar and political:

[Figures: F1 scores & accuracies; precision-recall]

Getting Started

Install all requirements for the LLM classification script.

pip install -r requirements.txt

NB: This only installs a minimal set of requirements for reproducing the figures with the code below. A more complete requirements file for running the full pipeline can be found in configs.

Inference

The repo contains a CLI script, llm_classification.py. You can use it to run arbitrary classification tasks on .tsv or .csv files with large language models from either HuggingFace or OpenAI.

If you intend to use OpenAI models, you will have to specify your API key and ORG as environment variables.

export OPENAI_API_KEY="..."
export OPENAI_ORG="..."

The script takes one command-line argument, namely a config file in the following format:

[paths]
in_file="labelled_data.csv"
out_dir="predictions/"

[system]
seed=0
device="cpu"

[model]
name="google/flan-t5-base"
task="few-shot"

[inference]
x_column="raw_text"
y_column="exemplar"
n_examples=5

If you intend to use a custom prompt for a given model, you can save it in a txt file and add its path to the paths section of the config.

[paths]
in_file="labelled_data.csv"
out_dir="predictions/"
prompt_file="custom_prompt.txt"

If you want to use hand-selected examples for few-shot learning, pass along a subset of the original data in the paths section of the config. Examples have to be in the same format as the data.

[paths]
in_file="labelled_data.csv"
out_dir="predictions/"
examples="examples.csv"
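For illustration, assuming the default columns from the config above (raw_text as the text column and exemplar as the label column), an examples file is simply a smaller table with the same columns. The texts and label values below are placeholders, not real data:

raw_text,exemplar
"first hand-picked tweet text",<label>
"second hand-picked tweet text",<label>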

You can run the CLI like this:

python3 llm_classification.py "config.cfg"

Config Documentation

  • Paths:
    • in_file: str - Path to the input file, either .csv or .tsv.
    • out_dir: str - Output directory. The script creates it if it does not already exist.
  • System:
    • seed: int - Random seed for selecting few-shot examples. Ignored when task=="zero-shot".
    • device: str - Device to run inference on. Change to cuda:0 to run on GPU.
  • Model:
    • name: str - Name of the model from OpenAI or HuggingFace.
    • task: {"few-shot", "zero-shot"} - Indicates whether zero-shot or few-shot inference should be run.
  • Inference:
    • x_column: str - Name of the independent variable (text) column in the table.
    • y_column: str - Name of the dependent variable (label) column in the table.
    • n_examples: int - Number of examples to give to few-shot models. Ignored when task=="zero-shot".
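Since the config uses an INI-style layout, you can sanity-check one before launching a run with Python's standard configparser. This is a minimal sketch and not necessarily how llm_classification.py itself reads the file:

from configparser import ConfigParser

# Minimal sketch: read config.cfg and print the resolved settings.
# Values are stored with surrounding quotes in the config, so strip them.
parser = ConfigParser()
parser.read("config.cfg")

def get(section, key, fallback=None):
    raw = parser.get(section, key, fallback=fallback)
    return raw.strip('"') if isinstance(raw, str) else raw

settings = {
    "in_file": get("paths", "in_file"),
    "out_dir": get("paths", "out_dir"),
    "model": get("model", "name"),
    "task": get("model", "task"),
    "x_column": get("inference", "x_column"),
    "y_column": get("inference", "y_column"),
    "n_examples": int(get("inference", "n_examples", fallback="5")),
}
print(settings)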

OpenAI script

For ease of use we have developed a script that generates predictions for all OpenAI models in one run. We did this because OpenAI inference can run on low-performance instances, so it is not a problem if it takes a long time to run. Additionally, since all instances access the same rate-limited API, we could not start multiple instances and run them in parallel.

Paths in this script are hardcoded and you might need to adjust them for personal use.

python3 run_gpt_inference.py
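The idea behind the script is simply to generate a config per OpenAI model and task and call the CLI sequentially. A rough sketch of that idea follows; the model names and paths below are placeholders, not the values hardcoded in run_gpt_inference.py:

import subprocess
from pathlib import Path

# Placeholder config template mirroring the format documented above.
CONFIG_TEMPLATE = """[paths]
in_file="labelled_data.csv"
out_dir="predictions/"

[system]
seed=0
device="cpu"

[model]
name="{model}"
task="{task}"

[inference]
x_column="raw_text"
y_column="exemplar"
n_examples=5
"""

for model in ["gpt-3.5-turbo", "gpt-4"]:
    for task in ["zero-shot", "few-shot"]:
        cfg_path = Path("configs") / f"{task}_{model}.cfg"
        cfg_path.parent.mkdir(exist_ok=True)
        cfg_path.write_text(CONFIG_TEMPLATE.format(model=model, task=task))
        # Run the CLI for this model/task combination and wait for it to finish.
        subprocess.run(["python3", "llm_classification.py", str(cfg_path)], check=True)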

Supervised Classification

For supervised models we made a separate script. It runs and evaluates GloVe-200d embeddings with logistic regression and fine-tunes DistilBERT for classification.

This script has different requirements, so install them from the appropriate file:

pip install -r supervised_requirements.txt

Paths in this script are hardcoded and you might need to adjust them for personal use.

python3 supervised_classification.py
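As a rough illustration of the GloVe baseline, the sketch below mean-pools pre-trained glove-twitter-200 vectors (via gensim) and fits a logistic regression. The actual supervised_classification.py may differ in preprocessing, splits and hyperparameters:

import numpy as np
import pandas as pd
import gensim.downloader as api
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Load 200-dimensional Twitter GloVe vectors (downloaded on first use).
glove = api.load("glove-twitter-200")

def embed(text):
    # Average the vectors of all in-vocabulary tokens; zeros if none are known.
    tokens = [t for t in str(text).lower().split() if t in glove]
    if not tokens:
        return np.zeros(glove.vector_size)
    return np.mean([glove[t] for t in tokens], axis=0)

data = pd.read_csv("labelled_data.csv")
X = np.stack(data["raw_text"].map(embed))
y = data["exemplar"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))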

Output

This outputs the input table with predictions added, saved to the out_dir folder specified in the config.

The file name format is as follows:

f"predictions/{task}_pred_{column}_{model}.csv"

Each table will have a pred_<y_column> column, as well as a train_test_set column that is labelled train for all examples included in the prompt for few-shot learning and test everywhere else.
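For example, you can load a prediction file and keep only the held-out rows like this (a sketch; adjust the file name to one of your own runs):

import pandas as pd

# Load one prediction file produced by llm_classification.py.
preds = pd.read_csv("predictions/few-shot_pred_exemplar_your-model.csv")

# Rows marked "train" were used as few-shot examples in the prompt;
# only the "test" rows should be used for evaluation.
test = preds[preds["train_test_set"] == "test"]
print(test[["exemplar", "pred_exemplar"]].head())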

Evaluating results

To evaluate the performance of the model(s), you can run the CLI script evaluation.py. It has two command-line arguments: --in_dir and --out_dir. These refer, respectively, to the folder in which the predictions from the llm_classification.py script have been saved (i.e., your predictions folder) and the folder where the classification report(s) should be saved. --in_dir defaults to 'predictions/' and --out_dir defaults to 'output/' (a folder that is created if it does not already exist).

It can be run as follows:

python3 evaluation.py --in_dir "your/data/path" --out_dir "your/out/path"

It expects the output file(s) from llm_classification.py in the specified file name format and placement. It will output two files to the specified out folder:

  • a txt file with the classification report for the test data for each of the files in the --in_dir folder.
  • a csv file with the same information as the txt file, but which can be used for plotting the results.

Plotting results

The plotting.py script takes the csv-file produced by the evaluation script and makes three plots:

  • acc_figure.png: The accuracy for each of the 8 models on each outcome (political, exemplar) in each task (zero-shot, few-shot) with each prompt type (generic, custom). It is split into four quadrants, with the left side showing the exemplar column, the right side political, the upper row custom prompts and the lower row generic prompts.
  • f1_figure.png: The F1 score for positive labels for each model in each task, again split into political and exemplar as well as generic and custom prompts.
  • prec_rec_figure.png: Precision plotted against recall for each of the models, split into three rows and four columns. Rows indicate task (zero-shot, few-shot, supervised classification); columns indicate label column (political, exemplar) and prompt type (generic, custom).

The script can be run as follows:

python3 plotting.py

These are all saved in a figures/ folder.
