An OpenAI-like API for Massive Text Embeddings using FastAPI.
Install the requirements:

```bash
pip install -r requirements.txt
```
Start the server by defining a model with the `MODEL` variable. You can also add a `DEVICE` variable, which can be `cuda` or `cpu`:

```bash
MODEL=BAAI/bge-base-en-v1.5 DEVICE=cuda python -m src.server
```
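Once the server is up, you can query it the same way you would the OpenAI embeddings API. A minimal sketch, assuming the server exposes an OpenAI-style `/v1/embeddings` route:

```bash
# Request an embedding for a piece of text
# (the route and payload are assumed to mirror the OpenAI embeddings API)
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "The quick brown fox", "model": "BAAI/bge-base-en-v1.5"}'
```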
By default, it runs at `0.0.0.0:8000`. You can change this by defining the `HOST` and/or `PORT` variables.
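For example, to bind only to localhost on a different port, combining the variables above:

```bash
# Serve on 127.0.0.1:9000 instead of the default 0.0.0.0:8000
HOST=127.0.0.1 PORT=9000 MODEL=BAAI/bge-base-en-v1.5 DEVICE=cpu python -m src.server
```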
The first time you run the server, it creates a `model_auto_opt_OX` folder, where `X=3` or `4` depending on the device. This folder contains the optimized ONNX version of your model, and subsequent runs reuse it. If you want to regenerate the folder, set the variable `RELOAD=True`.
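For instance, to force the optimized ONNX model to be rebuilt on the next start:

```bash
# RELOAD=True discards the cached model_auto_opt_OX folder and regenerates it
RELOAD=True MODEL=BAAI/bge-base-en-v1.5 DEVICE=cuda python -m src.server
```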
You can run the application using Docker with the following command:

```bash
docker run -it --gpus all -p 8000:8000 -e MODEL=BAAI/bge-base-en-v1.5 -e DEVICE=cuda vokturz/fast-embeddings-api
```
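If you have no GPU available, a CPU-only run should be a straightforward variant of the same command, dropping `--gpus all` and setting `DEVICE=cpu`:

```bash
# CPU-only variant of the Docker command above
docker run -it -p 8000:8000 -e MODEL=BAAI/bge-base-en-v1.5 -e DEVICE=cpu vokturz/fast-embeddings-api
```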
You can find the best-performing embedding models on the MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard
Mainly based on Optimum-Benchmark x MTEB by HuggingFace and limcheekin/open-text-embeddings.