LLM_Next_Word_Prediction

Code for next-word prediction training on the BookMIA dataset. It is part of the code used for the experiments in the paper "Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?"

Dataset: https://huggingface.co/datasets/swj0419/BookMIA/viewer

Here the language models are trained with next-token prediction on older books, so that these books appear as copyrighted text in the model.
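For readers unfamiliar with the objective, the following is a minimal, dependency-free sketch of next-token prediction in general. It is not the code from this repository: the function names `next_token_pairs`, `cross_entropy`, and `next_token_loss` are hypothetical, and the toy token IDs and logits stand in for a real tokenizer and model. It shows how each prefix of a token sequence is paired with the token that follows it, and how the loss scores the logits at position t against the token at position t+1.

```python
import math
from typing import List, Sequence, Tuple


def next_token_pairs(token_ids: Sequence[int]) -> List[Tuple[List[int], int]]:
    """Pair every non-empty proper prefix with the token that follows it."""
    return [(list(token_ids[:i]), token_ids[i]) for i in range(1, len(token_ids))]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def next_token_loss(
    logits_per_position: Sequence[Sequence[float]],
    token_ids: Sequence[int],
) -> float:
    """Mean cross-entropy with labels shifted by one position.

    The logits produced at position t are scored against token t+1, so the
    last position's logits and the first token are not used.
    """
    targets = token_ids[1:]
    predictions = logits_per_position[:-1]
    losses = [cross_entropy(l, t) for l, t in zip(predictions, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    tokens = [5, 7, 9]  # toy token IDs, assumed for illustration
    for context, target in next_token_pairs(tokens):
        print(context, "->", target)
```

In a real fine-tuning run the logits would come from the language model and the token IDs from tokenized book excerpts; the shift-by-one pairing is the part this sketch is meant to illustrate.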

Folders and files:

- Datasets
- Finetuned_Models
- Finetuning
- Trained_Models
- .gitignore
- README.md

pankayaraj/LLM_Next_Word_Prediction
