Introducing "Number Token Loss" (NTL) for language models to improve numerical reasoning by using regression-based loss functions that account for the proximity of numbers, achieving better performance on math tasks without increasing computational overhead.
- Requires Python 3.9 or higher
- Install the required packages
pip install -r requirements.txt
- Log into wandb in the terminal:
wandb login
Enter your username and auth token (from wandb.ai/authorize).
- Start a Docker container from the huggingface/transformers-pytorch-gpu image:
docker run --name container_name --gpus <device_number> -v /home/students/code/<name>/path_to_code:/app/data -it huggingface/transformers-pytorch-gpu
- Inside the container, pin the transformers library to version 4.42.4 and install wandb and Hydra:
pip install transformers==4.42.4 wandb hydra-core
- Log into wandb inside the container:
wandb login
Enter your username and auth token (from wandb.ai/authorize).
- The main script is src/run_language_modeling.py.
- Arguments are configured via Hydra (Yadan, Omry: Hydra - A framework for elegantly configuring complex applications. GitHub, 2019. https://github.com/facebookresearch/hydra); a sketch of such an entry point follows this list.
- The script can therefore be called via:
export PYTHONPATH=".:src/"
python src/run_language_modeling.py dataset_args=<gsm8k or mathematics_dataset, default mathematics_dataset> model_args=<rt, rt_ntl, vanilla_t5, vanilla_t5_ntl, xval> training_args=<eval or train>
- You can override the default config via the command line, e.g.
python src/run_language_modeling.py model_args=vanilla_t5 training_args=train training_args.per_device_train_batch_size=8
or override it in the config/run_specific_config/config.yaml file.
- For debugging, you can use the config/run_specific_config/debug_config.yaml file via
python src/run_language_modeling.py model_args=vanilla_t5 training_args=train run_specific_config@_global_=debug_config
- To run in the background with nohup, use
nohup python src/run_language_modeling.py dataset_args=mathematics_dataset model_args=vanilla_t5 training_args=train > logs/log_<run_name>.txt 2>&1 &
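For orientation, the following is a hedged sketch of what a Hydra entry point like run_language_modeling.py typically looks like; the decorator arguments and config layout are assumptions based on the config/ paths mentioned above, not the repository's exact code.

```python
import hydra
from omegaconf import DictConfig, OmegaConf

# Hydra composes the base config with the groups chosen on the command line
# (dataset_args=..., model_args=..., training_args=...) and with any dotted
# overrides such as training_args.per_device_train_batch_size=8.
@hydra.main(version_base=None, config_path="../config", config_name="config")
def main(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))  # inspect the fully resolved config
    # ... build the tokenizer, model, and trainer from cfg here ...

if __name__ == "__main__":
    main()
```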
- Get the data from https://console.cloud.google.com/storage/browser/mathematics-dataset;tab=objects?pli=1&prefix=&forceOnObjectsSortingFiltering=false
- Execute create_data_splits.py.
- Put the resulting .txt files under data/mathematics_dataset-v1.0/.
- Execute the run_language_modeling.py script with the following arguments:
- Standard T5:
python src/run_language_modeling.py model_args=vanilla_t5 +training_args.max_steps=1050000
- Standard T5 + NTL-MSE:
python src/run_language_modeling.py model_args=vanilla_t5_ntl +training_args.max_steps=1050000
- Standard T5 + NTL-WAS (a sketch of this loss follows the list):
python src/run_language_modeling.py model_args=vanilla_t5_ntl model_args.number_token_loss_with_wasserstein=true +training_args.max_steps=1050000
- RT:
python src/run_language_modeling.py model_args=rt +training_args.max_steps=1050000
- RT + NTL-MSE:
python src/run_language_modeling.py model_args=rt_ntl +training_args.max_steps=1050000
- xVal:
python src/xval/train.py
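The number_token_loss_with_wasserstein=true flag above selects the NTL-WAS variant. As a hedged illustration (hypothetical names, not the repository's code): the 1-Wasserstein distance between the predicted distribution over digit tokens and the one-hot ground truth can be computed from the difference of the two CDFs, since the digits 0-9 are unit-spaced.

```python
import torch
import torch.nn.functional as F

def ntl_was(num_logits, target_digits):
    """Sketch of NTL-WAS on digit tokens.

    num_logits:    (batch, seq, 10) logits restricted to the digit tokens
    target_digits: (batch, seq) ground-truth digit in 0..9, LongTensor
    """
    probs = F.softmax(num_logits, dim=-1)
    target = F.one_hot(target_digits, num_classes=10).float()
    # For 1-D distributions on ordered, unit-spaced values, the
    # 1-Wasserstein distance is the L1 distance between the CDFs.
    cdf_diff = torch.cumsum(probs - target, dim=-1)
    return cdf_diff.abs().sum(dim=-1).mean()
```

Unlike the MSE variant, which compares only expected values, the Wasserstein form compares whole distributions, so it also penalizes, for example, a bimodal prediction whose mean happens to be correct.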
- To evaluate a trained model instead of training one, add these two arguments to the respective python command: training_args=eval model_args.model_name_or_path=<path to checkpoint file>
e.g., for Standard T5 + NTL-WAS:
python src/run_language_modeling.py model_args=vanilla_t5_ntl model_args.number_token_loss_with_wasserstein=true training_args=eval model_args.model_name_or_path=<path to checkpoint file>
If you use this work, please cite:
@inproceedings{zausinger24regress,
  title={Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models},
  author={Zausinger, Jonas and Pennig, Lars and Chlodny, Kacper and Limbach, Vincent and Ketteler, Anna and Prein, Thorben and Singh, Vishwa Mohan and Danziger, Michael and Born, Jannis},
  booktitle={The 4th Workshop on Mathematical Reasoning and AI at NeurIPS'24},
  year={2024}
}