Practice Implementation of the Byte-Pair-Encoding

janniss91/BPE-practice

Byte-Pair Encoding

A primitive but functional implementation of Byte-Pair Encoding in Python. Note that the training algorithm is rather slow; however, it speeds up as pieces merge.
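For readers unfamiliar with the technique, the following is a minimal sketch of a generic Byte-Pair Encoding training loop. It is not taken from `bpe.py`; the function names (`count_pairs`, `merge_pair`, `train_bpe`), the whitespace-based word splitting, and the tie-breaking behaviour are all assumptions made for illustration.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(words, pair):
    """Return a new word table in which every occurrence of `pair` is one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(text, vocab_size):
    """Merge the most frequent pair until the vocabulary reaches `vocab_size`."""
    # Assumption: words are separated by whitespace and start as single characters.
    words = Counter(tuple(word) for word in text.split())
    vocab = {ch for symbols in words for ch in symbols}
    merges = []
    while len(vocab) < vocab_size:
        pairs = count_pairs(words)
        if not pairs:
            break  # nothing left to merge
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        words = merge_pair(words, best)
        vocab.add(best[0] + best[1])
        merges.append(best)
    return vocab, merges
```

In a loop of this shape, every merge shortens the symbol sequences, so each later pass has fewer pairs to count. That is one plausible reason training gets faster as pieces merge, although the actual script may behave differently.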

Running

To get a Byte-Pair encoding from your corpus, you must provide a single plain text file.

Example:

python3 bpe.py file.txt 100

Arguments:

  1. Your input file.
  2. The desired vocabulary size.
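The two positional arguments above could be read with the standard library's `argparse` module, as in the sketch below. This is only an illustration: how `bpe.py` actually parses its arguments is not shown here, and the names `parse_args`, `input_file`, and `vocab_size` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments: input file and vocabulary size."""
    parser = argparse.ArgumentParser(
        description="Learn a Byte-Pair Encoding from a plain text file."
    )
    # Hypothetical argument names; the real script may use different ones.
    parser.add_argument("input_file", help="path to a single plain text file")
    parser.add_argument("vocab_size", type=int, help="desired vocabulary size")
    return parser.parse_args(argv)
```

Calling `parse_args(["file.txt", "100"])` mirrors the example command and yields the file name as a string and the vocabulary size as an integer.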

No libraries needed

No third-party libraries need to be installed to run the script; the Python standard library provides everything it needs.
