pankayaraj/LLM_Next_Word_Prediction

LLM_Next_Word_Prediction

Code for next-word prediction training on the BookMIA dataset. It is part of the code used for the experiments in the paper "Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?"

Dataset: https://huggingface.co/datasets/swj0419/BookMIA/viewer

The language models are trained with next-token prediction on the older books in this dataset, so that the text of those books appears as copyrighted text in the model.
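As a rough illustration of what next-token prediction training means, the sketch below builds training pairs in which the labels are the inputs shifted by one position, then scores them with an average negative log-likelihood. It is not the code from this repository: the function names, the whitespace tokenizer, the sample sentence, and the uniform stand-in model are all assumptions made for illustration. An actual run would use a subword tokenizer and a pretrained language model instead.

```python
import math


def build_vocab(text):
    """Toy tokenizer (assumption): one integer id per distinct whitespace-separated word."""
    vocab = {}
    for word in text.split():
        vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab):
    """Convert text into a list of token ids using the toy vocabulary."""
    return [vocab[word] for word in text.split()]


def next_token_examples(token_ids, block_size):
    """Cut the ids into blocks; each block's labels are its inputs shifted left by one."""
    examples = []
    for start in range(0, len(token_ids) - 1, block_size):
        chunk = token_ids[start:start + block_size + 1]
        examples.append((chunk[:-1], chunk[1:]))
    return examples


def mean_next_token_loss(examples, predict_probs):
    """Average negative log-likelihood of each label given the tokens before it."""
    total, count = 0.0, 0
    for inputs, labels in examples:
        for i, label in enumerate(labels):
            probs = predict_probs(inputs[:i + 1])  # distribution over the vocabulary
            total += -math.log(probs[label])
            count += 1
    return total / count


# Hypothetical stand-in for a book snippet.
snippet = "the cat sat on the mat"
vocab = build_vocab(snippet)
token_ids = encode(snippet, vocab)
examples = next_token_examples(token_ids, block_size=3)


def uniform_model(prefix):
    """Placeholder model (assumption): ignores the prefix and predicts uniformly."""
    return [1.0 / len(vocab)] * len(vocab)


loss = mean_next_token_loss(examples, uniform_model)
```

Training lowers this loss on the book text, which is why a model trained this way can later reproduce that text. The uniform placeholder gives the loss of a model that knows nothing about the text.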
