Introducing "Number Token Loss" (NTL) for language models to improve numerical reasoning by using regression-based loss functions that account for the proximity of numbers, achieving better performance on math tasks without increasing computational overhead.
- Requires Python 3.9 or higher
- Install the required packages
pip install -r requirements.txt
- Log into wandb in the terminal:
wandb login
Enter your username and auth token (from wandb.ai/authorize).
- Start a Docker container from the huggingface/transformers-pytorch-gpu image:
docker run --name container_name --gpus <device_number> -v /home/students/code/<name>/path_to_code:/app/data -it huggingface/transformers-pytorch-gpu
- Inside the container, pin the transformers library to version 4.42.4 and install wandb and Hydra:
pip install transformers==4.42.4 wandb hydra-core
- Log into wandb inside the container:
wandb login
Enter your username and auth token (from wandb.ai/authorize).
- The main script is src/run_language_modeling.py.
- Arguments are configured via Hydra (Yadan, Omry: Hydra - A framework for elegantly configuring complex applications. GitHub, 2019. https://github.com/facebookresearch/hydra); a sketch of such an entry point follows this list.
- The script can therefore be called via:
export PYTHONPATH=".:src/"
python src/run_language_modeling.py dataset_args=<gsm8k or mathematics_dataset, default mathematics_dataset> model_args=<rt, rt_ntl, vanilla_t5, vanilla_t5_ntl, xval> training_args=<eval or train>
- You can override the default config via the command line, e.g.
python src/run_language_modeling.py model_args=vanilla_t5 training_args=train training_args.per_device_train_batch_size=8
or override it in the config/run_specific_config/config.yaml file.
- For debugging, you can use the config/run_specific_config/debug_config.yaml file via
python src/run_language_modeling.py model_args=vanilla_t5 training_args=train run_specific_config@_global_=debug_config
- To run in the background with nohup, use
nohup python src/run_language_modeling.py dataset_args=mathematics_dataset model_args=vanilla_t5 training_args=train > logs/log_<run_name>.txt 2>&1 &
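For orientation, the following is a hedged sketch of what a Hydra entry point like run_language_modeling.py typically looks like; the decorator arguments and config layout are assumptions based on the config/ paths mentioned above, not the repository's exact code.

```python
import hydra
from omegaconf import DictConfig, OmegaConf

# Hydra composes the base config with the groups chosen on the command line
# (dataset_args=..., model_args=..., training_args=...) and with any dotted
# overrides such as training_args.per_device_train_batch_size=8.
@hydra.main(version_base=None, config_path="../config", config_name="config")
def main(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))  # inspect the fully resolved config
    # ... build the tokenizer, model, and trainer from cfg here ...

if __name__ == "__main__":
    main()
```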
- Get the data from https://console.cloud.google.com/storage/browser/mathematics-dataset;tab=objects?pli=1&prefix=&forceOnObjectsSortingFiltering=false
- Execute create_data_splits.py.
- Put the resulting .txt files under data/mathematics_dataset-v1.0/.
- Execute the run_language_modeling.py script with the following arguments:
- Standard T5:
python src/run_language_modeling.py model_args=vanilla_t5 +training_args.max_steps=1050000
- Standard T5 + NTL-MSE:
python src/run_language_modeling.py model_args=vanilla_t5_ntl +training_args.max_steps=1050000
- Standard T5 + NTL-WAS (a sketch of this loss follows the list):
python src/run_language_modeling.py model_args=vanilla_t5_ntl model_args.number_token_loss_with_wasserstein=true +training_args.max_steps=1050000
- RT:
python src/run_language_modeling.py model_args=rt +training_args.max_steps=1050000
- RT + NTL-MSE:
python src/run_language_modeling.py model_args=rt_ntl +training_args.max_steps=1050000
- xVal:
python src/xval/train.py
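The number_token_loss_with_wasserstein=true flag above selects the NTL-WAS variant. As a hedged illustration (hypothetical names, not the repository's code): the 1-Wasserstein distance between the predicted distribution over digit tokens and the one-hot ground truth can be computed from the difference of the two CDFs, since the digits 0-9 are unit-spaced.

```python
import torch
import torch.nn.functional as F

def ntl_was(num_logits, target_digits):
    """Sketch of NTL-WAS on digit tokens.

    num_logits:    (batch, seq, 10) logits restricted to the digit tokens
    target_digits: (batch, seq) ground-truth digit in 0..9, LongTensor
    """
    probs = F.softmax(num_logits, dim=-1)
    target = F.one_hot(target_digits, num_classes=10).float()
    # For 1-D distributions on ordered, unit-spaced values, the
    # 1-Wasserstein distance is the L1 distance between the CDFs.
    cdf_diff = torch.cumsum(probs - target, dim=-1)
    return cdf_diff.abs().sum(dim=-1).mean()
```

Unlike the MSE variant, which compares only expected values, the Wasserstein form compares whole distributions, so it also penalizes, for example, a bimodal prediction whose mean happens to be correct.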
- To evaluate a trained model instead of training one, add these two arguments to the respective python command: training_args=eval model_args.model_name_or_path=<path to checkpoint file>
e.g., for Standard T5 + NTL-WAS:
python src/run_language_modeling.py model_args=vanilla_t5_ntl model_args.number_token_loss_with_wasserstein=true training_args=eval model_args.model_name_or_path=<path to checkpoint file>
If you use this work, please cite:
@inproceedings{zausinger24regress,
  title={Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models},
  author={Zausinger, Jonas and Pennig, Lars and Chlodny, Kacper and Limbach, Vincent and Ketteler, Anna and Prein, Thorben and Singh, Vishwa Mohan and Danziger, Michael and Born, Jannis},
  booktitle={The 4th Workshop on Mathematical Reasoning and AI at NeurIPS'24},
  year={2024}
}