This repository contains the code and analysis tools for the paper "Large Language Models Reflect the Ideology of their Creators". We provide a comprehensive framework for analyzing ideological biases in Large Language Models (LLMs) through their evaluations of historical political figures.
The dataset contains evaluations of 3,991 political figures by 19 different LLMs, with responses in all six official UN languages (Arabic, Chinese, English, French, Russian, and Spanish). Access the full dataset on Hugging Face.
- Python 3.11 or higher
- Poetry (for dependency management)
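
Before installing, it can help to confirm that both prerequisites are present. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it assumes Poetry is installed as an executable named `poetry` on your `PATH`.

```python
import shutil
import sys

# Minimum version taken from the prerequisites list above.
MIN_PYTHON = (3, 11)


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good.

    `version_info` and `which` are injectable so the check can be tested
    without depending on the machine it runs on.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    # Assumption: the Poetry executable is called "poetry".
    if which("poetry") is None:
        problems.append("Poetry executable not found on PATH")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current interpreter and `PATH`; printing the returned list gives a quick summary of what is missing.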
- Clone the repository:
    git clone https://github.com/aida-ugent/llm-ideology-analysis.git
    cd llm-ideology-analysis
- Install dependencies using Poetry:
    poetry install
- Copy the environment template:
    cp .env.template .env
- Configure the following environment variables in .env:
  - OPENAI_API_KEY: OpenAI API key
  - ANTHROPIC_API_KEY: Anthropic API key
  - HUGGINGFACE_TOKEN: Hugging Face token
  - MISTRAL_API_KEY: Mistral API key
  - TOGETHER_API_KEY: Together API key
  - PERPLEXITY_API_KEY: Perplexity API key
  - GEMINI_API_KEY: Google Gemini API key
  - RESULTS_DIR: Directory for storing results
  - NOTEBOOKS_DIR: Directory containing analysis notebooks
  - DOCS_DIR: Directory for documentation
  - FIGURES_DIR: Directory for generated figures
  - CACHE_PATH: Path for caching results
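
A quick way to catch a misconfigured environment is to list which of the variables above are unset. The sketch below is illustrative only: `missing_variables` is a hypothetical helper, and it inspects the process environment, so it assumes the values from `.env` have already been loaded (how the repository's scripts load `.env` is not stated here). Whether every script needs every variable is also an assumption; you may only need the keys for the providers you query.

```python
import os

# Variable names as listed in the configuration step above.
API_KEY_VARS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "HUGGINGFACE_TOKEN",
    "MISTRAL_API_KEY",
    "TOGETHER_API_KEY",
    "PERPLEXITY_API_KEY",
    "GEMINI_API_KEY",
]
PATH_VARS = [
    "RESULTS_DIR",
    "NOTEBOOKS_DIR",
    "DOCS_DIR",
    "FIGURES_DIR",
    "CACHE_PATH",
]


def missing_variables(names, environ=None):
    """Return the names that are unset or blank in the given mapping.

    Defaults to the current process environment when `environ` is None.
    """
    if environ is None:
        environ = os.environ
    return [name for name in names if not environ.get(name, "").strip()]
```

For example, `missing_variables(API_KEY_VARS + PATH_VARS)` returns the names that still need a value before running the scripts.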
- Process questions through the unified API:
    poetry run python src/run_questions_through_unified_api.py
- Run the manifesto tagger:
    poetry run python src/run_manifesto_tagger.py
- Analyze results using Jupyter notebooks in the notebooks/ directory.
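
The two script invocations above can also be chained from Python. This is a minimal sketch under stated assumptions: it assumes the scripts are meant to run in the order listed and from the repository root, and `run_pipeline` is a hypothetical wrapper, not something the repository provides. The commands themselves are copied from the steps above.

```python
import subprocess

# Commands copied from the usage steps; the ordering is an assumption.
PIPELINE_STEPS = [
    ["poetry", "run", "python", "src/run_questions_through_unified_api.py"],
    ["poetry", "run", "python", "src/run_manifesto_tagger.py"],
]


def run_pipeline(steps=PIPELINE_STEPS, runner=subprocess.run):
    """Run each command in order and return the commands that completed.

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError on a non-zero exit code, so later steps are skipped
    if an earlier one fails.
    """
    completed = []
    for command in steps:
        runner(command, check=True)
        completed.append(command)
    return completed
```

Calling `run_pipeline()` from the repository root runs both steps; passing a different `runner` (for instance one that only prints the command) gives a dry run.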
@misc{buyl2024largelanguagemodelsreflect,
title={Large Language Models Reflect the Ideology of their Creators},
author={Maarten Buyl and Alexander Rogiers and Sander Noels and Iris Dominguez-Catena and Edith Heiter and Raphael Romero and Iman Johary and Alexandru-Cristian Mara and Jefrey Lijffijt and Tijl De Bie},
year={2024},
eprint={2410.18417},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2410.18417},
}
- Maarten Buyl (*‡) - Ghent University, Belgium
- Alexander Rogiers (†) - Ghent University, Belgium
- Sander Noels (†) - Ghent University, Belgium
- Guillaume Bied - Ghent University, Belgium
- Iris Dominguez-Catena - Public University of Navarre, Spain
- Edith Heiter - Ghent University, Belgium
- Iman Johary - Ghent University, Belgium
- Alexandru-Cristian Mara - Ghent University, Belgium
- Raphael Romero - Ghent University, Belgium
- Jefrey Lijffijt - Ghent University, Belgium
- Tijl De Bie - Ghent University, Belgium
* Corresponding author: [email protected]
† These authors contributed equally to this work
‡ Lead author
- Ghent University
  Department of Electronics and Information Systems
  IDLab
  Technologiepark-Zwijnaarde 122
  9052 Ghent, Belgium
- Public University of Navarre
  Department of Statistics, Computer Science and Mathematics
  31006 Pamplona, Spain
For questions or issues, please:
- Open an issue in this repository
- Contact one of the corresponding authors: [email protected], [email protected] or [email protected]