Here are specific details that are useful when you want to contribute to the BPE crates. Make sure to read the repository's contribution guidelines as well.
This project has a slightly unusual structure to resolve some dependency issues.
- This directory contains `bpe`, the BPE code itself.
- A sibling directory contains `bpe-openai`, which exposes tokenizers for OpenAI token sets, and depends on `bpe`.
- Tests are located in the `tests` subdirectory, and benchmarks in the `benchmarks` subdirectory. Both of these are separate crates so they can depend on `bpe-openai` without causing a cyclic dependency.
Only the `bpe` and `bpe-openai` crates are meant to be published. The other ones are for development use only.
Change the working directory to the `benchmarks` directory:
cd benchmarks
Run the benchmarks as follows (requires `cargo-criterion` to be installed):
cargo criterion
(Using `cargo bench` ignores the settings in `criterion.toml`!)
Open the full report, which should be located at `target/criterion/reports/index.html`.
Update the figures in this repo as follows (requires `rsvg-convert` from `librsvg` to be installed):
script/copy-results
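
The benchmark and figure steps above depend on two external tools, `cargo-criterion` and `rsvg-convert`. As a minimal sketch, a pair of shell functions could confirm that both are available before you start a long benchmark run. The function names `missing_tools` and `check_benchmark_tools` are hypothetical and not part of this repository. The check also assumes that both tools are installed as executables on your `PATH`.

```shell
# Print the names of the given commands that are not found on PATH,
# separated by spaces (prints an empty line if none are missing).
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space left over from the first append.
    printf '%s\n' "${missing# }"
}

# Hypothetical helper: fail with a message on stderr if a tool needed for
# the benchmark or figure steps is missing. Assumes cargo-criterion is
# installed as an executable named `cargo-criterion` on PATH.
check_benchmark_tools() {
    missing=$(missing_tools cargo-criterion rsvg-convert)
    if [ -n "$missing" ]; then
        echo "missing tools: $missing" >&2
        return 1
    fi
}
```

With these functions defined in your shell, `check_benchmark_tools && cargo criterion` would start the benchmarks only when both tools are found.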