Update version to v0.0.80
GitHub Actions committed Sep 17, 2024
1 parent 4161fb5 commit 6803b13
Showing 7 changed files with 294 additions and 99 deletions.
17 changes: 9 additions & 8 deletions docs/deployment/self-deployment/overview.mdx
@@ -4,13 +4,14 @@ title: Self-deployment
slug: overview
---

-Mistral AI provides ready-to-use Docker images on the GitHub registry. The weights are distributed separately.
+Mistral AI models can be self-deployed on your own infrastructure through various
+inference engines. We recommend using [vLLM](https://vllm.readthedocs.io/), a
+highly-optimized Python-only serving framework which can expose an OpenAI-compatible
+API.

-To run these images, you need a cloud virtual machine matching the requirements for a given model. These requirements can be found in the [model description](/getting-started/models).
+Other inference engine alternatives include
+[TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) and
+[TGI](https://huggingface.co/docs/text-generation-inference/index).

-We recommend three different serving frameworks for our models:
-- [vLLM](https://vllm.readthedocs.io/): A Python-only serving framework which deploys an API matching OpenAI's spec. vLLM provides a paged attention kernel to improve serving throughput.
-- NVIDIA's [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) served with NVIDIA's [Triton Inference Server](https://github.com/triton-inference-server): TensorRT-LLM provides a DSL to build fast inference engines with dedicated kernels for large language models. Triton Inference Server allows efficient serving of these inference engines.
-- [TGI](https://huggingface.co/docs/text-generation-inference/index): A toolkit for deploying LLMs, with an OpenAI-compatible API, grammar support, production monitoring, and tool-calling functionality.

-These images can be run locally or on your favorite cloud provider, using [SkyPilot](https://skypilot.readthedocs.io/en/latest/).
+You can also leverage specific tools to facilitate infrastructure management, such as
+[SkyPilot](https://skypilot.readthedocs.io) or [Cerebrium](https://www.cerebrium.ai).
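
Since the updated overview points to vLLM's OpenAI-compatible API, here is a minimal sketch of querying a self-deployed Mistral model through it. It assumes a vLLM server is already running locally on port 8000 and that the `openai` Python package is installed; the model name, host, and port are illustrative assumptions, not part of this commit.

```python
# Minimal sketch: query a self-deployed Mistral model through vLLM's
# OpenAI-compatible API. Assumes a vLLM server was started separately, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.3
# The model name, host, and port below are illustrative assumptions.
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM endpoint.
# vLLM does not verify the API key unless one is configured, so a
# placeholder value works here.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "Summarize what self-deployment means."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because the endpoint follows OpenAI's spec, the same client code should in principle work against the other OpenAI-compatible servers mentioned above, with only `base_url` and `model` changed.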
