This library is a Rust implementation of the Sentence Transformers framework for computing text representations as vector embeddings. Specifically, it uses the Burn deep learning library to implement the BERT model, which can be combined with any backend Burn supports for fast, efficient, cross-platform inference on CPUs and GPUs. ST-Burn supports any state-of-the-art model that implements the BERT architecture.
Currently inference-only.
- Import models via `safetensors` (using Candle). 📦
- Code structure replicates the official Huggingface `BertModel` implementation. 🚀
- Flexible inference backend using Burn. 🔧
`sentence-transformers-burn` can be installed from source:
```sh
cargo add --git https://github.com/tvergho/sentence-transformers-burn.git sentence_transformers
```
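Equivalently, you can declare the git dependency in `Cargo.toml` by hand:

```toml
[dependencies]
sentence_transformers = { git = "https://github.com/tvergho/sentence-transformers-burn.git" }
```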
Run `cargo build` to make sure everything builds correctly:

```sh
cargo build
```
Note that building the `burn-tch` dependency may require manually linking Libtorch. After installing PyTorch via pip:

```sh
export LIBTORCH=$(python3 -c 'import torch; from pathlib import Path; print(Path(torch.__file__).parent)')
# the command above prints the Libtorch location, e.g. /path/to/torch
export DYLD_LIBRARY_PATH=/path/to/torch/lib
```
Python dependencies (for running the scripts in `scripts/`) should also be installed:

```sh
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
A `BertModel` can be loaded and initialized from a file, as in the example below:
```rust
use sentence_transformers::bert_loader::{load_config_from_json, load_model_from_safetensors};
use sentence_transformers::model::{
    bert_embeddings::BertEmbeddingsInferenceBatch,
    bert_model::BertModel,
};
use burn_tch::{TchBackend, TchDevice};
use burn::tensor::Tensor;

const BATCH_SIZE: usize = 64;
const SEQ_LEN: usize = 256;

let device = TchDevice::Cpu;

// Load the model configuration and weights exported from Huggingface.
let config = load_config_from_json("model/bert_config.json");
let model: BertModel<_> =
    load_model_from_safetensors::<TchBackend<f32>>("model/bert_model.safetensors", &device, config);

// Build a dummy batch: token ids and attention mask of shape [BATCH_SIZE, SEQ_LEN].
let batch = BertEmbeddingsInferenceBatch {
    tokens: Tensor::zeros([BATCH_SIZE, SEQ_LEN]).to_device(&device),
    mask_attn: Some(Tensor::ones([BATCH_SIZE, SEQ_LEN]).to_device(&device)),
};

model.forward(batch); // [batch_size, seq_len, n_dims]
```
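The forward pass returns token-level embeddings. To get a single fixed-size embedding per input, sentence-transformer models conventionally apply masked mean pooling over the sequence dimension. Below is a minimal sketch of such a helper; it is not part of this crate, and exact tensor-op names (e.g. `repeat`) vary between Burn releases:

```rust
use burn::tensor::{backend::Backend, Tensor};

/// Masked mean pooling (sketch): collapse [batch_size, seq_len, n_dims] token
/// embeddings into [batch_size, n_dims] sentence embeddings, ignoring padding.
fn mean_pool<B: Backend>(embeddings: Tensor<B, 3>, mask: Tensor<B, 2>) -> Tensor<B, 2> {
    let [batch_size, seq_len, n_dims] = embeddings.dims();

    // Expand the mask to the embedding shape so padded tokens contribute zero.
    let mask = mask.reshape([batch_size, seq_len, 1]).repeat(2, n_dims);

    let summed = (embeddings * mask.clone()).sum_dim(1); // [batch_size, 1, n_dims]
    let counts = mask.sum_dim(1);                        // [batch_size, 1, n_dims]

    (summed / counts).reshape([batch_size, n_dims])
}
```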
`sentence-transformers-burn` also comes with a built-in inference server. To start it, simply run:

```sh
cargo run --release --bin server -- path/to/model/dir
```
The model directory should contain a `bert_model.safetensors` and a `bert_config.json` file. For example, matching the paths used in the loading example above:
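```
model/
├── bert_config.json
└── bert_model.safetensors
```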
Once the server is running, inference can be initiated via a POST request:
```
POST http://localhost:3030/embed

{
  "input_ids": [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]],
  "attention_mask": [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
}
```
This will return a 3D array of floats of size `[batch_size, seq_len, n_dims]`.
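For example, with `curl` (assuming the server is running locally on the default port shown above, and that it expects a JSON content type):

```sh
curl -X POST http://localhost:3030/embed \
  -H 'Content-Type: application/json' \
  -d '{"input_ids": [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], "attention_mask": [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}'
```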
Tests can be run to verify that the Rust model output matches a comparable Huggingface model. To save a model to use during testing, run `python scripts/prepare_test.py`. Then, simply:

```sh
cargo test
```
- Cleaner model import (directly from `safetensors`/`config.json`)
- Proper documentation and more testing
- More model usage options (e.g. classification, NER, question answering heads)
- GGML backend/quantization