From 0a298e6e5e47f0bf91771abca5f76edc6568e549 Mon Sep 17 00:00:00 2001
From: kerthcet
Date: Mon, 2 Dec 2024 12:06:48 +0800
Subject: [PATCH] Update Features of Overview

Signed-off-by: kerthcet
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 9f23927..d7267c0 100644
--- a/README.md
+++ b/README.md
@@ -32,12 +32,12 @@ Easy, advanced inference platform for large language models on Kubernetes
 
 - **Easy of Use**: People can quick deploy a LLM service with minimal configurations.
 - **Broad Backends Support**: llmaz supports a wide range of advanced inference backends for different scenarios, like [vLLM](https://github.com/vllm-project/vllm), [Text-Generation-Inference](https://github.com/huggingface/text-generation-inference), [SGLang](https://github.com/sgl-project/sglang), [llama.cpp](https://github.com/ggerganov/llama.cpp). Find the full list of supported backends [here](./docs/support-backends.md).
-- **Model Distribution**: Out-of-the-box model cache system with [Manta](https://github.com/InftyAI/Manta).
+- **Efficient Model Distribution**: Out-of-the-box model cache system support with [Manta](https://github.com/InftyAI/Manta).
 - **Accelerator Fungibility**: llmaz supports serving the same LLM with various accelerators to optimize cost and performance.
 - **SOTA Inference**: llmaz supports the latest cutting-edge researches like [Speculative Decoding](https://arxiv.org/abs/2211.17192) or [Splitwise](https://arxiv.org/abs/2311.18677)(WIP) to run on Kubernetes.
 - **Various Model Providers**: llmaz supports a wide range of model providers, such as [HuggingFace](https://huggingface.co/), [ModelScope](https://www.modelscope.cn), ObjectStores. llmaz will automatically handle the model loading, requiring no effort from users.
 - **Multi-hosts Support**: llmaz supports both single-host and multi-hosts scenarios with [LWS](https://github.com/kubernetes-sigs/lws) from day 0.
-- **Scaling Efficiency (WIP)**: llmaz works smoothly with autoscaling components like [Cluster-Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) or [Karpenter](https://github.com/kubernetes-sigs/karpenter) to meet elastic demands.
+- **Scaling Efficiency (WIP)**: llmaz works smoothly with autoscaling components like [Cluster-Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) or [Karpenter](https://github.com/kubernetes-sigs/karpenter) to satisfy elastic needs.
 
 ## Quick Start
 