
Add a bunch of browser inference tools
Signed-off-by: kerthcet <[email protected]>
kerthcet committed Oct 9, 2024
1 parent ec6077f commit bacc344
Showing 1 changed file with 7 additions and 3 deletions.
README.md
@@ -50,24 +50,28 @@
| ---- | ---- | ---- | ---- | ---- | ---- |
| **[DeepSpeed-MII](https://github.com/microsoft/DeepSpeed-MII)** | ![Stars](https://img.shields.io/github/stars/microsoft/deepspeed-mii.svg) | ![Release](https://img.shields.io/github/release/microsoft/deepspeed-mii) | ![Contributors](https://img.shields.io/github/contributors/microsoft/deepspeed-mii) | MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. | |
| **[Inference](https://github.com/roboflow/inference)** | ![Stars](https://img.shields.io/github/stars/roboflow/inference.svg) | ![Release](https://img.shields.io/github/release/roboflow/inference) | ![Contributors](https://img.shields.io/github/contributors/roboflow/inference) | A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models. | vision |
-| **[ipex-llm](https://github.com/intel-analytics/ipex-llm)** | ![Stars](https://img.shields.io/github/stars/intel-analytics/ipex-llm.svg) | ![Release](https://img.shields.io/github/release/intel-analytics/ipex-llm) | ![Contributors](https://img.shields.io/github/contributors/intel-analytics/ipex-llm) | Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc. | edge |
+| **[ipex-llm](https://github.com/intel-analytics/ipex-llm)** | ![Stars](https://img.shields.io/github/stars/intel-analytics/ipex-llm.svg) | ![Release](https://img.shields.io/github/release/intel-analytics/ipex-llm) | ![Contributors](https://img.shields.io/github/contributors/intel-analytics/ipex-llm) | Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc. | device |
| **[llmaz](https://github.com/InftyAI/llmaz)** | ![Stars](https://img.shields.io/github/stars/inftyai/llmaz.svg) | ![Release](https://img.shields.io/github/release/inftyai/llmaz) | ![Contributors](https://img.shields.io/github/contributors/inftyai/llmaz) | ☸️ Effortlessly serve state-of-the-art LLMs on Kubernetes. | |
| **[LMDeploy](https://github.com/InternLM/lmdeploy)** | ![Stars](https://img.shields.io/github/stars/internlm/lmdeploy.svg) | ![Release](https://img.shields.io/github/release/internlm/lmdeploy) | ![Contributors](https://img.shields.io/github/contributors/internlm/lmdeploy) | LMDeploy is a toolkit for compressing, deploying, and serving LLMs. | |
| **[MaxText](https://github.com/google/maxtext)** | ![Stars](https://img.shields.io/github/stars/google/maxtext.svg) | ![Release](https://img.shields.io/github/release/google/maxtext) | ![Contributors](https://img.shields.io/github/contributors/google/maxtext) | A simple, performant and scalable Jax LLM! | Jax |
-| **[llama.cpp](https://github.com/ggerganov/llama.cpp)** | ![Stars](https://img.shields.io/github/stars/ggerganov/llama.cpp.svg) | ![Release](https://img.shields.io/github/release/ggerganov/llama.cpp) | ![Contributors](https://img.shields.io/github/contributors/ggerganov/llama.cpp) | LLM inference in C/C++ | edge |
+| **[llama.cpp](https://github.com/ggerganov/llama.cpp)** | ![Stars](https://img.shields.io/github/stars/ggerganov/llama.cpp.svg) | ![Release](https://img.shields.io/github/release/ggerganov/llama.cpp) | ![Contributors](https://img.shields.io/github/contributors/ggerganov/llama.cpp) | LLM inference in C/C++ | device |
| **[MInference](https://github.com/microsoft/minference)** | ![Stars](https://img.shields.io/github/stars/microsoft/minference.svg) | ![Release](https://img.shields.io/github/release/microsoft/minference) | ![Contributors](https://img.shields.io/github/contributors/microsoft/minference) | To speed up long-context LLM inference, MInference computes attention with approximate and dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy. | |
| **[MLC LLM](https://github.com/mlc-ai/mlc-llm)** | ![Stars](https://img.shields.io/github/stars/mlc-ai/mlc-llm.svg) | ![Release](https://img.shields.io/github/release/mlc-ai/mlc-llm) | ![Contributors](https://img.shields.io/github/contributors/mlc-ai/mlc-llm) | Universal LLM Deployment Engine with ML Compilation | |
| **[MLServer](https://github.com/SeldonIO/MLServer)** | ![Stars](https://img.shields.io/github/stars/SeldonIO/MLServer.svg) | ![Release](https://img.shields.io/github/release/SeldonIO/MLServer) | ![Contributors](https://img.shields.io/github/contributors/SeldonIO/MLServer) | MLServer aims to provide an easy way to start serving your machine learning models through a REST and gRPC interface, fully compliant with KFServing's V2 Dataplane spec. | |
| **[Nanoflow](https://github.com/efeslab/Nanoflow)** | ![Stars](https://img.shields.io/github/stars/efeslab/nanoflow.svg) | ![Release](https://img.shields.io/github/release/efeslab/nanoflow) | ![Contributors](https://img.shields.io/github/contributors/efeslab/nanoflow) | A throughput-oriented high-performance serving framework for LLMs | |
-| **[Ollama](https://github.com/ollama/ollama)** | ![Stars](https://img.shields.io/github/stars/ollama/ollama.svg) | ![Release](https://img.shields.io/github/release/ollama/ollama) | ![Contributors](https://img.shields.io/github/contributors/ollama/ollama) | Get up and running with Llama 3, Mistral, Gemma 2, and other large language models. | edge |
+| **[Ollama](https://github.com/ollama/ollama)** | ![Stars](https://img.shields.io/github/stars/ollama/ollama.svg) | ![Release](https://img.shields.io/github/release/ollama/ollama) | ![Contributors](https://img.shields.io/github/contributors/ollama/ollama) | Get up and running with Llama 3, Mistral, Gemma 2, and other large language models. | device |
| **[OpenLLM](https://github.com/bentoml/OpenLLM)** | ![Stars](https://img.shields.io/github/stars/bentoml/openllm.svg) | ![Release](https://img.shields.io/github/release/bentoml/openllm) | ![Contributors](https://img.shields.io/github/contributors/bentoml/openllm) | Operating LLMs in production | |
| **[OpenVINO](https://github.com/openvinotoolkit/openvino)** | ![Stars](https://img.shields.io/github/stars/openvinotoolkit/openvino.svg) | ![Release](https://img.shields.io/github/release/openvinotoolkit/openvino) | ![Contributors](https://img.shields.io/github/contributors/openvinotoolkit/openvino) | OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference | |
+| **[Ratchet](https://github.com/huggingface/ratchet)** | ![Stars](https://img.shields.io/github/stars/huggingface/ratchet.svg) | ![Release](https://img.shields.io/github/release/huggingface/ratchet) | ![Contributors](https://img.shields.io/github/contributors/huggingface/ratchet) | A cross-platform browser ML framework. | browser |
| **[RayServe](https://github.com/ray-project/ray)** | ![Stars](https://img.shields.io/github/stars/ray-project/ray.svg) | ![Release](https://img.shields.io/github/release/ray-project/ray) | ![Contributors](https://img.shields.io/github/contributors/ray-project/ray) | Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. | |
| **[RouteLLM](https://github.com/lm-sys/routellm)** | ![Stars](https://img.shields.io/github/stars/lm-sys/routellm.svg) | ![Release](https://img.shields.io/github/release/lm-sys/routellm) | ![Contributors](https://img.shields.io/github/contributors/lm-sys/routellm) | A framework for serving and evaluating LLM routers - save LLM costs without compromising quality. | cost |
| **[SGLang](https://github.com/sgl-project/sglang)** | ![Stars](https://img.shields.io/github/stars/sgl-project/sglang.svg) | ![Release](https://img.shields.io/github/release/sgl-project/sglang) | ![Contributors](https://img.shields.io/github/contributors/sgl-project/sglang) | SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable. | |
+| **[transformers.js](https://github.com/xenova/transformers.js)** | ![Stars](https://img.shields.io/github/stars/xenova/transformers.js.svg) | ![Release](https://img.shields.io/github/release/xenova/transformers.js) | ![Contributors](https://img.shields.io/github/contributors/xenova/transformers.js) | State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server! | browser |
| **[Triton Inference Server](https://github.com/triton-inference-server/server)** | ![Stars](https://img.shields.io/github/stars/triton-inference-server/server.svg) | ![Release](https://img.shields.io/github/release/triton-inference-server/server) | ![Contributors](https://img.shields.io/github/contributors/triton-inference-server/server) | The Triton Inference Server provides an optimized cloud and edge inferencing solution. | |
| **[Text Generation Inference](https://github.com/huggingface/text-generation-inference)** | ![Stars](https://img.shields.io/github/stars/huggingface/text-generation-inference.svg) | ![Release](https://img.shields.io/github/release/huggingface/text-generation-inference) | ![Contributors](https://img.shields.io/github/contributors/huggingface/text-generation-inference) | Large Language Model Text Generation Inference | |
| **[vLLM](https://github.com/vllm-project/vllm)** | ![Stars](https://img.shields.io/github/stars/vllm-project/vllm.svg) | ![Release](https://img.shields.io/github/release/vllm-project/vllm) | ![Contributors](https://img.shields.io/github/contributors/vllm-project/vllm) | A high-throughput and memory-efficient inference and serving engine for LLMs | |
+| **[web-llm](https://github.com/mlc-ai/web-llm)** | ![Stars](https://img.shields.io/github/stars/mlc-ai/web-llm.svg) | ![Release](https://img.shields.io/github/release/mlc-ai/web-llm) | ![Contributors](https://img.shields.io/github/contributors/mlc-ai/web-llm) | High-performance in-browser LLM inference engine. | browser |
+| **[zml](https://github.com/zml/zml)** | ![Stars](https://img.shields.io/github/stars/zml/zml.svg) | ![Release](https://img.shields.io/github/release/zml/zml) | ![Contributors](https://img.shields.io/github/contributors/zml/zml) | High performance AI inference stack. Built for production. | |
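
Several of the entries above (Ratchet, transformers.js, web-llm) run inference entirely in the browser with no server. As an illustrative taste only, here is a minimal sketch using transformers.js and its documented `pipeline` API; the task and input text are example placeholders, not part of this commit:

```ts
// A minimal sketch of in-browser inference with transformers.js,
// assuming the published @xenova/transformers package.
import { pipeline } from '@xenova/transformers';

// The first call downloads ONNX weights and caches them in the browser;
// all subsequent inference happens client-side.
const classifier = await pipeline('sentiment-analysis');

// Returns an array like [{ label: 'POSITIVE', score: 0.99 }]
// for the default sentiment model.
const result = await classifier('Running models without a server is great!');
console.log(result);
```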

## Training

