This repository contains code and figures for our paper Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? by Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda, Gabriel Mukobi, Varun Madan, Adam Ibrahim, Herbie Bradley, Stella Biderman and Sanmi Koyejo.
Setup | Usage | Contributing | Citing | Contact
- (Optional) Update conda:
conda update -n base -c defaults conda -y
- Create and activate the conda environment:
conda env create --file environment.yml -y && conda activate elusive
- Update pip:
pip install --upgrade pip
- Install some additional packages:
pip install bitsandbytes sentencepiece
- (Optional) To run evals, initialize EleutherAI's lm-evaluation-harness:
git submodule update --init --recursive
Change into the directory and install lm-evaluation-harness:
cd submodules/lm-evaluation-harness && pip install -e . && cd ../..
- Log in to wandb:
wandb login
Data will be provided once the paper is accepted and published. For early access, please contact the authors (see Contact below).
This project's code has three broad stages:
- Collecting Language Model Scores on NLP Benchmarks: Running language model families on standard LLM benchmarks and collating the per-sample results.
- Computing Compute-Score Correlations: For each 4-tuple of (language model family, NLP benchmark, correlation metric, performance score), we compute the per-sample correlations between scores and compute over the model family. This is done using scripts/compute_correlations_between_sample_scores_and_compute.py and W&B sweeps.
- Analyzing Compute-Score Correlations: We analyze the results of the correlations in the paper and generate figures using the Python scripts in notebooks.
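The second stage can be illustrated with a minimal sketch. This is not the code in scripts/compute_correlations_between_sample_scores_and_compute.py; it only shows the general idea of correlating each sample's score with compute across one model family. The function names, the array layout (one row per model, one column per benchmark sample), and the use of log10 compute are all assumptions made for illustration.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 0 to n-1, giving tied values their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def per_sample_correlations(compute, scores, metric="pearson"):
    """Correlate each sample's score with compute across a model family.

    compute: shape (n_models,), one compute value per model (hypothetical input).
    scores:  shape (n_models, n_samples), one score per model and sample.
    Returns shape (n_samples,); NaN where a sample's scores are constant.
    """
    compute = np.asarray(compute, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 2 or scores.shape[0] != compute.shape[0]:
        raise ValueError("scores must have shape (n_models, n_samples)")
    if metric not in ("pearson", "spearman"):
        raise ValueError(f"unknown metric: {metric}")

    # Assumption: compute is compared on a log scale.
    x = np.log10(compute)
    if metric == "spearman":
        x = average_ranks(x)

    correlations = np.full(scores.shape[1], np.nan)
    for j in range(scores.shape[1]):
        y = scores[:, j]
        if metric == "spearman":
            y = average_ranks(y)
        # Correlation is undefined when either variable has zero variance.
        if np.std(x) == 0 or np.std(y) == 0:
            continue
        correlations[j] = np.corrcoef(x, y)[0, 1]
    return correlations


# Toy data: 4 models in one family, 3 benchmark samples.
compute = [1e18, 1e19, 1e20, 1e21]
scores = np.array([
    [0.1, 0.9, 0.5],
    [0.2, 0.7, 0.5],
    [0.3, 0.5, 0.5],
    [0.4, 0.3, 0.5],
])
print(per_sample_correlations(compute, scores, metric="pearson"))
print(per_sample_correlations(compute, scores, metric="spearman"))
```

In this toy example the first sample improves with compute, the second degrades, and the third is constant, so its correlation is left undefined (NaN). In the actual pipeline one such computation would run per (model family, benchmark, correlation metric, performance score) tuple.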
Contributions are welcome! Please format your code with black.
To cite this work, please use:
@misc{schaeffer2024predictingdownstreamcapabilitiesfrontier,
title={Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?},
author={Rylan Schaeffer and Hailey Schoelkopf and Brando Miranda and Gabriel Mukobi and Varun Madan and Adam Ibrahim and Herbie Bradley and Stella Biderman and Sanmi Koyejo},
year={2024},
eprint={2406.04391},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2406.04391},
}
Note: We created a new clean repository for the review process; thus, this repo's commit history is not representative of each individual's contributions.
Questions? Comments? Interested in collaborating? Open an issue or email [email protected] and [email protected].