Update version to v0.0.80
GitHub Actions committed Sep 17, 2024
1 parent 4161fb5 commit 6803b13
Showing 7 changed files with 294 additions and 99 deletions.
17 changes: 9 additions & 8 deletions docs/deployment/self-deployment/overview.mdx
@@ -4,13 +4,14 @@ title: Self-deployment
slug: overview
---

-Mistral AI provides ready-to-use Docker images on the GitHub registry. The weights are distributed separately.
+Mistral AI models can be self-deployed on your own infrastructure through various
+inference engines. We recommend using [vLLM](https://vllm.readthedocs.io/), a
+highly-optimized Python-only serving framework which can expose an OpenAI-compatible
+API.

-To run these images, you need a cloud virtual machine matching the requirements for a given model. These requirements can be found in the [model description](/getting-started/models).
+Other inference engine alternatives include
+[TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) and
+[TGI](https://huggingface.co/docs/text-generation-inference/index).

-We recommend three different serving frameworks for our models:
-- [vLLM](https://vllm.readthedocs.io/): A Python-only serving framework which deploys an API matching OpenAI's spec. vLLM provides a paged attention kernel to improve serving throughput.
-- NVIDIA's [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) served with NVIDIA's [Triton Inference Server](https://github.com/triton-inference-server): TensorRT-LLM provides a DSL to build fast inference engines with dedicated kernels for large language models. Triton Inference Server allows efficient serving of these inference engines.
-- [TGI](https://huggingface.co/docs/text-generation-inference/index): A toolkit for deploying LLMs, with an OpenAI-compatible API, grammar support, production monitoring, and tool-calling functionality.

-These images can be run locally or on your favorite cloud provider, using [SkyPilot](https://skypilot.readthedocs.io/en/latest/).
+You can also leverage specific tools to facilitate infrastructure management, such as
+[SkyPilot](https://skypilot.readthedocs.io) or [Cerebrium](https://www.cerebrium.ai).
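
Since the updated overview points to vLLM's OpenAI-compatible API, here is a minimal sketch of querying a self-deployed Mistral model through it. It assumes a vLLM server is already running locally on port 8000 and that the `openai` Python package is installed; the model name, host, and port are illustrative assumptions, not part of this commit.

```python
# Minimal sketch: query a self-deployed Mistral model through vLLM's
# OpenAI-compatible API. Assumes a vLLM server was started separately, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.3
# The model name, host, and port below are illustrative assumptions.
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM endpoint.
# vLLM does not verify the API key unless one is configured, so a
# placeholder value works here.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "Summarize what self-deployment means."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because the endpoint follows OpenAI's spec, the same client code should in principle work against the other OpenAI-compatible servers mentioned above, with only `base_url` and `model` changed.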
