# LLM_Next_Word_Prediction

Code for next-word-prediction training on the BookMIA dataset. This is part of the code for the experiments in the work "Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?"

Dataset: https://huggingface.co/datasets/swj0419/BookMIA/viewer. Here the language models are trained with next-token prediction on older books, so those books appear as copyrighted text contained in the model.
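As a rough illustration of the next-token-prediction objective this training uses, the sketch below builds (context, next-token) training pairs from a token sequence. This is a minimal, hypothetical example with made-up token ids, not the repository's actual data pipeline or tokenizer:

```python
def next_token_pairs(token_ids, context_len):
    """Build (context, next-token) training pairs for next-token prediction.

    Each example pairs a sliding window of `context_len` tokens with the
    token that immediately follows it; that following token is the label
    the language model learns to predict.
    """
    pairs = []
    for i in range(len(token_ids) - context_len):
        context = token_ids[i:i + context_len]
        target = token_ids[i + context_len]
        pairs.append((context, target))
    return pairs

# Toy example: the "book text" is just a short list of token ids.
book = [5, 9, 2, 7, 3]
print(next_token_pairs(book, context_len=2))
# prints [([5, 9], 2), ([9, 2], 7), ([2, 7], 3)]
```

In the actual experiments, the contexts come from tokenized BookMIA passages and the model is optimized with cross-entropy loss on the predicted next token.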