Update README.md #200

Merged
merged 1 commit on Nov 12, 2024
11 changes: 8 additions & 3 deletions README.md
@@ -22,17 +22,22 @@ Easy, advanced inference platform for large language models on Kubernetes

## Architecture

-![image](./docs/assets/arch.png)
+<p align="center">
+<picture>
+<img alt="architecture" src="https://raw.githubusercontent.com/inftyai/llmaz/main/docs/assets/arch.png" width=100%>
+</picture>
+</p>

## Features Overview

- **Ease of Use**: People can quickly deploy an LLM service with minimal configurations.
-- **Broad Backend Support**: llmaz supports a wide range of advanced inference backends for different scenarios, like [vLLM](https://github.com/vllm-project/vllm), [Text-Generation-Inference](https://github.com/huggingface/text-generation-inference), [SGLang](https://github.com/sgl-project/sglang), [llama.cpp](https://github.com/ggerganov/llama.cpp), [ollama](https://github.com/ollama/ollama). Find the full list of supported backends [here](./docs/support-backends.md).
+- **Broad Backend Support**: llmaz supports a wide range of advanced inference backends for different scenarios, like [vLLM](https://github.com/vllm-project/vllm), [Text-Generation-Inference](https://github.com/huggingface/text-generation-inference), [SGLang](https://github.com/sgl-project/sglang), [llama.cpp](https://github.com/ggerganov/llama.cpp). Find the full list of supported backends [here](./docs/support-backends.md).
- **Model Distribution**: Out-of-the-box model cache system with [Manta](https://github.com/InftyAI/Manta).
- **Scaling Efficiency (WIP)**: llmaz works smoothly with autoscaling components like [Cluster-Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) or [Karpenter](https://github.com/kubernetes-sigs/karpenter) to support elastic scenarios.
- **Accelerator Fungibility (WIP)**: llmaz supports serving the same LLM with various accelerators to optimize cost and performance.
- **SOTA Inference**: llmaz supports the latest cutting-edge research, such as [Speculative Decoding](https://arxiv.org/abs/2211.17192) or [Splitwise](https://arxiv.org/abs/2311.18677) (WIP), on Kubernetes.
- **Various Model Providers**: llmaz supports a wide range of model providers, such as [HuggingFace](https://huggingface.co/), [ModelScope](https://www.modelscope.cn), and object stores (Aliyun OSS, with more on the way). llmaz automatically handles model loading, requiring no effort from users.
-- **Multi-Host Support**: llmaz supports both single-host and multi-host scenarios with [LWS](https://github.com/kubernetes-sigs/lws) from day 1.
+- **Multi-Host Support**: llmaz supports both single-host and multi-host scenarios with [LWS](https://github.com/kubernetes-sigs/lws) from day 0.

## Quick Start

Binary file modified docs/assets/arch.png
Binary file removed docs/assets/overview.png
Binary file not shown.
1 change: 1 addition & 0 deletions pkg/controller_helper/backendruntime.go
@@ -46,6 +46,7 @@ func (p *BackendRuntimeParser) Envs() []corev1.EnvVar {
}

func (p *BackendRuntimeParser) Args(mode InferenceMode, models []*coreapi.OpenModel) ([]string, error) {
+	// TODO: add validation in webhook.
	if mode == SpeculativeDecodingInferenceMode && len(models) != 2 {
		return nil, fmt.Errorf("models number not right, want 2, got %d", len(models))
	}
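The TODO added above suggests moving this guard into an admission webhook, so a misconfigured resource is rejected at creation time instead of failing later during argument construction. A minimal sketch of what that check could look like, assuming a standalone validation helper (the function name and import paths are illustrative assumptions, not llmaz's actual webhook code):

```go
package webhook

import (
	"fmt"

	coreapi "github.com/inftyai/llmaz/api/core/v1alpha1"    // assumed import path
	helper "github.com/inftyai/llmaz/pkg/controller_helper" // assumed import path
)

// validateModelCount mirrors the runtime guard in Args(): speculative
// decoding pairs a draft model with a target model, so exactly two
// models must be claimed. Running this at admission time surfaces the
// error before any workload is created. (Hypothetical helper for
// illustration; names and paths are assumptions.)
func validateModelCount(mode helper.InferenceMode, models []*coreapi.OpenModel) error {
	if mode == helper.SpeculativeDecodingInferenceMode && len(models) != 2 {
		return fmt.Errorf("speculative decoding requires exactly 2 models (draft and target), got %d", len(models))
	}
	return nil
}
```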
2 changes: 1 addition & 1 deletion test/e2e/suit_test.go
@@ -104,7 +104,7 @@ func readyForTesting(client client.Client) {
	}, timeout, interval).Should(Succeed())

	// Delete this model before beginning tests.
-	Expect(client.Delete(ctx, model))
+	Expect(client.Delete(ctx, model)).To(Succeed())
	Eventually(func() error {
		return client.Get(ctx, types.NamespacedName{Name: model.Name}, &coreapi.OpenModel{})
	}).ShouldNot(Succeed())
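Worth noting why this one-line change matters: Gomega's `Expect(...)` only wraps a value and asserts nothing until a matcher such as `To(Succeed())` is applied, so the old line silently discarded any error returned by `Delete`. A minimal before/after illustration:

```go
// Gomega assertions are inert until a matcher is applied.
Expect(client.Delete(ctx, model))               // builds an assertion but never checks it
Expect(client.Delete(ctx, model)).To(Succeed()) // fails the test if Delete returns an error
```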