diff --git a/README.md b/README.md
index 530feb8..bcb66b2 100644
--- a/README.md
+++ b/README.md
@@ -50,24 +50,28 @@
 | ---- | ---- | ---- | ---- | ---- | ---- |
 | **[DeepSpeed-MII](https://github.com/microsoft/DeepSpeed-MII)** | ![Stars](https://img.shields.io/github/stars/microsoft/deepspeed-mii.svg) | ![Release](https://img.shields.io/github/release/microsoft/deepspeed-mii) | ![Contributors](https://img.shields.io/github/contributors/microsoft/deepspeed-mii) | MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. | |
 | **[Inference](https://github.com/roboflow/inference)** | ![Stars](https://img.shields.io/github/stars/roboflow/inference.svg) | ![Release](https://img.shields.io/github/release/roboflow/inference) | ![Contributors](https://img.shields.io/github/contributors/roboflow/inference) | A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models. | vision |
-| **[ipex-llm](https://github.com/intel-analytics/ipex-llm)** | ![Stars](https://img.shields.io/github/stars/intel-analytics/ipex-llm.svg) | ![Release](https://img.shields.io/github/release/intel-analytics/ipex-llm) | ![Contributors](https://img.shields.io/github/contributors/intel-analytics/ipex-llm) | Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc. | edge |
+| **[ipex-llm](https://github.com/intel-analytics/ipex-llm)** | ![Stars](https://img.shields.io/github/stars/intel-analytics/ipex-llm.svg) | ![Release](https://img.shields.io/github/release/intel-analytics/ipex-llm) | ![Contributors](https://img.shields.io/github/contributors/intel-analytics/ipex-llm) | Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc. | device |
 | **[llmaz](https://github.com/InftyAI/llmaz)** | ![Stars](https://img.shields.io/github/stars/inftyai/llmaz.svg) | ![Release](https://img.shields.io/github/release/inftyai/llmaz) | ![Contributors](https://img.shields.io/github/contributors/inftyai/llmaz) | ☸️ Effortlessly serve state-of-the-art LLMs on Kubernetes. | |
 | **[LMDeploy](https://github.com/InternLM/lmdeploy)** | ![Stars](https://img.shields.io/github/stars/internlm/lmdeploy.svg) | ![Release](https://img.shields.io/github/release/internlm/lmdeploy) | ![Contributors](https://img.shields.io/github/contributors/internlm/lmdeploy) | LMDeploy is a toolkit for compressing, deploying, and serving LLMs. | |
 | **[MaxText](https://github.com/google/maxtext)** | ![Stars](https://img.shields.io/github/stars/google/maxtext.svg) | ![Release](https://img.shields.io/github/release/google/maxtext) | ![Contributors](https://img.shields.io/github/contributors/google/maxtext) | A simple, performant and scalable Jax LLM! | Jax |
-| **[llama.cpp](https://github.com/ggerganov/llama.cpp)** | ![Stars](https://img.shields.io/github/stars/ggerganov/llama.cpp.svg) | ![Release](https://img.shields.io/github/release/ggerganov/llama.cpp) | ![Contributors](https://img.shields.io/github/contributors/ggerganov/llama.cpp) | LLM inference in C/C++ | edge |
+| **[llama.cpp](https://github.com/ggerganov/llama.cpp)** | ![Stars](https://img.shields.io/github/stars/ggerganov/llama.cpp.svg) | ![Release](https://img.shields.io/github/release/ggerganov/llama.cpp) | ![Contributors](https://img.shields.io/github/contributors/ggerganov/llama.cpp) | LLM inference in C/C++ | device |
 | **[MInference](https://github.com/microsoft/minference)** | ![Stars](https://img.shields.io/github/stars/microsoft/minference.svg) | ![Release](https://img.shields.io/github/release/microsoft/minference) | ![Contributors](https://img.shields.io/github/contributors/microsoft/minference) | To speed up long-context LLM inference, MInference computes attention with approximate and dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy. | |
 | **[MLC LLM](https://github.com/mlc-ai/mlc-llm)** | ![Stars](https://img.shields.io/github/stars/mlc-ai/mlc-llm.svg) | ![Release](https://img.shields.io/github/release/mlc-ai/mlc-llm) | ![Contributors](https://img.shields.io/github/contributors/mlc-ai/mlc-llm) | Universal LLM Deployment Engine with ML Compilation | |
 | **[MLServer](https://github.com/SeldonIO/MLServer)** | ![Stars](https://img.shields.io/github/stars/SeldonIO/MLServer.svg) | ![Release](https://img.shields.io/github/release/SeldonIO/MLServer) | ![Contributors](https://img.shields.io/github/contributors/SeldonIO/MLServer) | MLServer aims to provide an easy way to start serving your machine learning models through a REST and gRPC interface, fully compliant with KFServing's V2 Dataplane spec. | |
 | **[Nanoflow](https://github.com/efeslab/Nanoflow)** | ![Stars](https://img.shields.io/github/stars/efeslab/nanoflow.svg) | ![Release](https://img.shields.io/github/release/efeslab/nanoflow) | ![Contributors](https://img.shields.io/github/contributors/efeslab/nanoflow) | A throughput-oriented high-performance serving framework for LLMs | |
-| **[Ollama](https://github.com/ollama/ollama)** | ![Stars](https://img.shields.io/github/stars/ollama/ollama.svg) | ![Release](https://img.shields.io/github/release/ollama/ollama) | ![Contributors](https://img.shields.io/github/contributors/ollama/ollama) | Get up and running with Llama 3, Mistral, Gemma 2, and other large language models. | edge |
+| **[Ollama](https://github.com/ollama/ollama)** | ![Stars](https://img.shields.io/github/stars/ollama/ollama.svg) | ![Release](https://img.shields.io/github/release/ollama/ollama) | ![Contributors](https://img.shields.io/github/contributors/ollama/ollama) | Get up and running with Llama 3, Mistral, Gemma 2, and other large language models. | device |
 | **[OpenLLM](https://github.com/bentoml/OpenLLM)** | ![Stars](https://img.shields.io/github/stars/bentoml/openllm.svg) | ![Release](https://img.shields.io/github/release/bentoml/openllm) | ![Contributors](https://img.shields.io/github/contributors/bentoml/openllm) | Operating LLMs in production | |
 | **[OpenVINO](https://github.com/openvinotoolkit/openvino)** | ![Stars](https://img.shields.io/github/stars/openvinotoolkit/openvino.svg) | ![Release](https://img.shields.io/github/release/openvinotoolkit/openvino) | ![Contributors](https://img.shields.io/github/contributors/openvinotoolkit/openvino) | OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference | |
+| **[Ratchet](https://github.com/huggingface/ratchet)** | ![Stars](https://img.shields.io/github/stars/huggingface/ratchet.svg) | ![Release](https://img.shields.io/github/release/huggingface/ratchet) | ![Contributors](https://img.shields.io/github/contributors/huggingface/ratchet) | A cross-platform browser ML framework. | browser |
 | **[RayServe](https://github.com/ray-project/ray)** | ![Stars](https://img.shields.io/github/stars/ray-project/ray.svg) | ![Release](https://img.shields.io/github/release/ray-project/ray) | ![Contributors](https://img.shields.io/github/contributors/ray-project/ray) | Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. | |
 | **[RouteLLM](https://github.com/lm-sys/routellm)** | ![Stars](https://img.shields.io/github/stars/lm-sys/routellm.svg) | ![Release](https://img.shields.io/github/release/lm-sys/routellm) | ![Contributors](https://img.shields.io/github/contributors/lm-sys/routellm) | A framework for serving and evaluating LLM routers - save LLM costs without compromising quality. | cost |
 | **[SGLang](https://github.com/sgl-project/sglang)** | ![Stars](https://img.shields.io/github/stars/sgl-project/sglang.svg) | ![Release](https://img.shields.io/github/release/sgl-project/sglang) | ![Contributors](https://img.shields.io/github/contributors/sgl-project/sglang) | SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable. | |
+| **[transformers.js](https://github.com/xenova/transformers.js)** | ![Stars](https://img.shields.io/github/stars/xenova/transformers.js.svg) | ![Release](https://img.shields.io/github/release/xenova/transformers.js) | ![Contributors](https://img.shields.io/github/contributors/xenova/transformers.js) | State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server! | browser |
 | **[Triton Inference Server](https://github.com/triton-inference-server/server)** | ![Stars](https://img.shields.io/github/stars/triton-inference-server/server.svg) | ![Release](https://img.shields.io/github/release/triton-inference-server/server) | ![Contributors](https://img.shields.io/github/contributors/triton-inference-server/server) | The Triton Inference Server provides an optimized cloud and edge inferencing solution. | |
 | **[Text Generation Inference](https://github.com/huggingface/text-generation-inference)** | ![Stars](https://img.shields.io/github/stars/huggingface/text-generation-inference.svg) | ![Release](https://img.shields.io/github/release/huggingface/text-generation-inference) | ![Contributors](https://img.shields.io/github/contributors/huggingface/text-generation-inference) | Large Language Model Text Generation Inference | |
 | **[vLLM](https://github.com/vllm-project/vllm)** | ![Stars](https://img.shields.io/github/stars/vllm-project/vllm.svg) | ![Release](https://img.shields.io/github/release/vllm-project/vllm) | ![Contributors](https://img.shields.io/github/contributors/vllm-project/vllm) | A high-throughput and memory-efficient inference and serving engine for LLMs | |
+| **[web-llm](https://github.com/mlc-ai/web-llm)** | ![Stars](https://img.shields.io/github/stars/mlc-ai/web-llm.svg) | ![Release](https://img.shields.io/github/release/mlc-ai/web-llm) | ![Contributors](https://img.shields.io/github/contributors/mlc-ai/web-llm) | High-performance in-browser LLM inference engine. | browser |
+| **[zml](https://github.com/zml/zml)** | ![Stars](https://img.shields.io/github/stars/zml/zml.svg) | ![Release](https://img.shields.io/github/release/zml/zml) | ![Contributors](https://img.shields.io/github/contributors/zml/zml) | High performance AI inference stack. Built for production. | |
 
 ## Training
 