
HPLT Bitextor Models v1

@onadegibert released this 01 Oct 07:23

This release includes the first version (v1) of fast machine translation (MT) models designed specifically for integration with the Bitextor pipeline. The models were developed in 2023 and optimized for translation speed and efficiency in large-scale parallel corpus generation.

For more details on the underlying dataset and technologies used in this work, please refer to our paper:

Citation: de Gibert, O., Nail, G., Arefyev, N., Bañón, M., van der Linde, J., Ji, S., Zaragoza-Bernabeu, J., Aulamo, M., Ramírez-Sánchez, G., Kutuzov, A., Pyysalo, S., Oepen, S., & Tiedemann, J. (2024). "A New Massive Multilingual Dataset for High-Performance Language Technologies". In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italia: ELRA and ICCL.