- Want to know which one is "the best"? Have a look at the 🏆 Leaderboards in the Benchmarking section.
- llm.extractum.io The LLM Explorer, a Large Language Model Directory with filters for trending, downloads and latest releases, showing details like quantizations, model types and sizes
- can-it-run-llm Check most Huggingface LLMs and quants for hardware requirements like VRAM and RAM
- chatgptui/desktop
- chatbox is a Windows, Mac & Linux native ChatGPT Client
- BingGPT Desktop application of new Bing's AI-powered chat
- cheetah Speech to text for remote coding interviews, giving you hints from GPT-3.5/4
- Chat2DB++ general-purpose SQL & multi-DBMS client and reporting tool which uses ChatGPT capabilities to write and optimize queries
- ChatGPT-Next-Web Web, Windows, Linux, Mac GUI. Supports: Local LLMs, Markdown, LaTeX, mermaid, code, history compression, prompt templates
- ChatGPT Native Application for Windows, Mac, Android, iOS, Linux
cpp / ggml / gguf:
- koboldcpp llama.cpp with a fancy UI, persistent stories, editing tools, memory etc. Supporting ggmlv3 and old ggml, CLBlast and llama, RWKV, GPT-NeoX, Pythia models
- Serge chat interface based on llama.cpp for running Alpaca models. Entirely self-hosted, no API keys needed
- faraday.dev using llama.cpp under the hood to run most llama based models, made for character based chat and role play
gpt4all:
- gpt4all terminal and GUI versions to run local gpt-j models, compiled binaries for win/osx/linux (Python bindings sketched below)
- gpt4all.zig terminal version of GPT4All
- gpt4all-chat Cross platform desktop GUI for GPT4All models (gpt-j)
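Alongside the GUIs above, the GPT4All project also ships Python bindings. A minimal local-inference sketch, assuming the `gpt4all` pip package; the model file name is an example from the public model list and is downloaded on first use:

```python
# Minimal GPT4All sketch (pip install gpt4all).
from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")  # example model name, downloaded on first use
print(model.generate("Explain quantization in one sentence.", max_tokens=100))
```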
ollama:
- ollama Run, create, and share LLMs on macOS, win/linux with a simple CLI interface and portable modelfile package (local REST API sketched below)
- ollama-ui Simple HTML UI for Ollama
- ollama-ui ChatGPT-Style Responsive Chat Web UI Client (GUI) for Ollama
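Besides the CLI, a running Ollama instance exposes a local REST API, which the UIs above build on. A minimal sketch, assuming `ollama serve` is running and a model has been pulled; model name and prompt are examples:

```python
# Query a local Ollama server via its /api/generate endpoint.
import json
import urllib.request

payload = json.dumps({
    "model": "llama2",            # example model name (ollama pull llama2)
    "prompt": "Why is the sky blue?",
    "stream": False,              # return one JSON object instead of a stream
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```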
others:
- LM Studio closed-source but very easy to use Native Mac, Windows, Linux GUI, supporting ggml, MPT, StarCoder, Falcon, Replit, GPT-NeoX, gguf
- pinokio Template based 1 Click installer for ML inference (LLMs, Audio, Text, Video)
- Lit-llama training, fine tuning and inference of llama
- Dalai LLaMA-based ChatGPT for single GPUs
- ChatLLaMA LLaMA-based ChatGPT for single GPUs
- mlc-llm, run any LLM on any hardware (iPhones, Android, Win, Linux, Mac, WebGPU, Metal, NVidia, AMD)
- webllm Web LLM running LLMs with WebGPU natively in the browser using local GPU acceleration, without any backend, demo
- faraday.dev Run open-source LLMs on your Win/Mac. Completely offline. Zero configuration.
- ChatALL concurrently sends prompts to multiple LLM-based AI bots both local and APIs and displays the results
- pyllama hacked version of LLaMA based on Meta's implementation, optimized for Single GPUs
- gmessage visually pleasing chatbot that uses a locally running LLM server and supports multiple themes, chat history search, text to speech, JSON file export, and OpenAI API compatible Python code
- selfhostedAI one-click deployment of RWKV, ChatGLM, llama.cpp models, substituting the OpenAI API with a locally hosted API
- Lit-GPT run SOTA LLMs, supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed
- text-generation-inference Rust, Python and gRPC server for text generation inference. Used in production at HuggingFace to power LLMs api-inference widgets
- gorilla-cli use natural language in the terminal to assist with command writing, gorilla writes the commands based on a user prompt, while the user just approves them
- minigpt4.cpp runs minigpt4 with 4-bit quantization using the ggml library in pure C/C++
- LocalAI Drop-in OpenAI API replacement with local LLMs, Audio To Text (whisper), Image generation (Stable Diffusion), OpenAI functions and Embeddings
- Windows AI Studio Visual Studio Code extension for Fine-tuning, RAG development and inference of local models
- jan an open source alternative to ChatGPT that runs 100% offline on Windows, Intel/Apple Silicon Mac, Linux and Mobile
- open-interpreter lets LLMs run code (Python, Javascript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal
- ClipboardConqueror a novel omnipresent copilot alternative designed to bring your very own LLM AI assistant to any text field
- Chat With RTX by NVIDIA using Tensor Cores locally to run LLMs fast with a local RAG workflow
- TypingMind
- Chatwithme.chat
- enricoros/nextjs-chatgpt-app
- no8081/chatgpt-demo
- IPython-gpt use chatGPT directly inside jupyter notebooks
- Chatbot UI An open source ChatGPT UI
- freegpt-webui provides a user friendly web-interface connecting to free (reverse-engineered) public GPT3.5/GPT4 endpoints using gpt4free
- Flux Graph-based LLM power tool for exploring many prompts and completions in parallel.
- Text Generation Webui An all purpose UI to run LLMs of all sorts with optimizations (running LLaMA-13b on 6GB VRAM, HN Thread)
- Text Generation Webui Ph0rk0z fork supporting all GPTQ versions and max context of 8192 instead of 4096 (because some models support longer context now)
- dockerLLM TheBloke's docker variant of text-generation-webui
- lollms-webui former GPT4ALL-UI by ParisNeo, user friendly all-in-one interface, with bindings for c_transformers, gptq, gpt-j, llama_cpp, py_llama_cpp, ggml
- Alpaca-LoRa-Serve
- chat petals web app + HTTP and Websocket endpoints for BLOOM-176B inference with the Petals client
- Alpaca-Turbo Web UI to run alpaca model locally on Win/Mac/Linux
- FreedomGPT Web app that executes the FreedomGPT LLM locally
- HuggingChat open source chat interface for transformer based LLMs by Huggingface
- openplayground enables running LLM models on a laptop using a full UI, supporting various APIs and local HuggingFace cached models
- RWKV-Runner Easy installation and running of RWKV Models, providing a local OpenAI API, GUI and custom CUDA kernel acceleration. Supports 2gb up to 32gb VRAM
- BrainChulo Chat App with vector based Long-Term Memory supporting one-shot, few-shot and Tool capable agents
- biniou a self-hosted webui for 30+ generative ai models for text generation, image generation, audio generation, video generation etc.
- ExUI simple, lightweight web UI for running local inference using ExLlamaV2
- ava Air-gapped Virtual Assistant / Personal Language Server with support for local models using llama.cpp as a backend, demo
- llamafile Distribute and run LLMs with a single file on Windows, macOS, Linux
- OpenChat web ui that currently supports openAI but will implement local LLM support, RAG with PDF, websites, confluence, office 365
- lobe-chat docker image based chat bot framework with plugin and agent support, roles, UI etc
- LibreChat OpenAI, Assistants API, Vision, Mistral, Bing, Anthropic, OpenRouter, Google Gemini, model switching, langchain, DALL-E, Plugins, OpenAI Functions, Multi-User, Presets
- ExLlama a more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. By ReturningTarzan
- ExLlamaV2 faster ExLlama
- transformers huggingface transformers
- bitsandbytes 8 bit inference (usage sketched below)
- AutoGPTQ 4bit inference
- llama.cpp
- TensorRT-LLM Python API for running LLMs on GPU with support for MHA, MQA, GQA, Tensor Parallelism, INT4/8 Quantization, GPTQ, AWQ, FP8, RoPE to run Baichuan, BLOOM, ChatGLM, Falcon, GPT-J/NeoX, LLaMA/2, MPT, OPT, SantaCoder, StarCoder etc.
- tensorrtllm_backend Triton TensorRT-LLM Backend
- RWKV.cpp CPU only port of BlinkDL/RWKV-LM to ggerganov/ggml. Supports FP32, FP16 and quantized INT4.
- sherpa llama.cpp on android
- chatglm.cpp C++ implementation of ChatGLM-6B & ChatGLM2-6B
- MLX Apple's ML Toolkit supporting Transformers in the MLX format for faster inference
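Several of the backends above are reached through the Hugging Face transformers API. A minimal 8-bit inference sketch, assuming transformers plus bitsandbytes on a CUDA GPU; the model id is an example:

```python
# 8-bit quantized inference with transformers + bitsandbytes (CUDA GPU required).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-1.3b"  # example model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

inputs = tokenizer("The three laws of robotics are", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```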
- datafilik/GPT-Voice-Assistant
- Abdallah-Ragab/VoiceGPT
- LlmKira/Openaibot
- BarkingGPT Audio2Audio by using Whisper+chatGPT+Bark
- gpt_chatbot Windows / elevenlabs TTS + pinecone long term memory
- gpt-voice-conversation-chatbot using GPT3.5/4 API, elevenlab voices, google tts, session long term memory
- JARVIS-ChatGPT conversational assistant that uses OpenAI Whisper, OpenAI ChatGPT, and IBM Watson to provide quasi-real-time tips and opinions.
- ALFRED LangChain Voice Assistant, powered by GPT-3.5-turbo, whisper, Bark, pyttsx3 and more (a minimal local voice loop is sketched below)
- bullerbot uses GPT and ElevenLabs to join your online meetings, listens for your name and answers questions with your voice
- RealChar Create, Customize and Talk to your AI Character/Companion in Realtime using GPT3.5/4, Claude2, Chroma Vector DB, Whisper Speech2Text, ElevenLabs Text2Speech
- gdansk-ai full stack AI voice chatbot (speech-to-text, LLM, text-to-speech) with integrations to Auth0, OpenAI, Google Cloud API and Stripe - Web App, API
- bark TTS for oobabooga/text-generation-webui make your local LLM talk
- bark TTS for oobabooga/text-generation-webui another implementation
- iris-llm local voice chat agent
- WhisperFusion ultra low latency conversations built with WhisperLive, WhisperSpeech and Mistral
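The voice assistants above share the same basic loop: speech-to-text, an LLM reply, text-to-speech. A minimal local sketch, assuming openai-whisper and pyttsx3 are installed and an OpenAI-compatible local endpoint is available (URL, model name and audio file are placeholders):

```python
# Speech-to-text (Whisper) -> local LLM -> text-to-speech (pyttsx3) sketch.
import json
import urllib.request

import whisper   # pip install openai-whisper
import pyttsx3   # pip install pyttsx3

text = whisper.load_model("base").transcribe("question.wav")["text"]  # example audio file

payload = json.dumps({
    "model": "local-model",                           # placeholder model name
    "messages": [{"role": "user", "content": text}],
}).encode()
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",      # assumed local OpenAI-compatible endpoint
    data=payload, headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())["choices"][0]["message"]["content"]

tts = pyttsx3.init()
tts.say(reply)
tts.runAndWait()
```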
- sqlchat Use OpenAI GPT3/4 to chat with your database
- chat-with-github-repo which uses streamlit, gpt3.5-turbo and deep lake to answer questions about a git repo
- mpoon/gpt-repository-loader uses Git and GPT-4 to convert a repository into a text format for various tasks, such as code review or documentation generation.
- chat-your-data Create a ChatGPT like experience over your custom docs using LangChain
- embedchain python based RAG Framework (a minimal RAG sketch appears below)
- dataherald a natural language-to-SQL engine built for enterprise-level question answering over structured data. It allows you to set up an API from your database that can answer questions in plain English
- databerry create proprietary data stores that can be accessed by GPT
- Llama-lab home of llama_agi and auto_llama using LlamaIndex
- PrivateGPT a standalone question-answering system using LangChain, GPT4All, LlamaCpp and embeddings models to enable offline querying of documents
- Spyglass tests an Alpaca integration for a self-hosted personal search app. Select the llama-rama feature branch. Discussion on reddit
- local_llama chatting with your PDFs offline. gpt_chatwithPDF alternative with the ultimate goal of using llama instead of chatGPT
- Sidekick Information retrieval for LLMs
- DB-GPT SQL generation, private domain Q&A, data processing, unified vector storage/indexing, and support for various plugins and LLMs
- localGPT a privateGPT inspired document question-answering solution using GPU acceleration instead of CPU, and InstructorEmbeddings, which perform better on leaderboards than LlamaEmbeddings
- LocalDocs plugin for GPT4All
- annoy_ltm extension to add long term memory to chatbots using a nearest neighbor vector DB for memory retrieval
- ChatDocs PrivateGPT + Web UI + GPU Support + ggml, transformers, webui
- PAutoBot document question-answering engine developed with LangChain, GPT4All, LlamaCpp, ChromaDB, PrivateGPT, CPU only
- AIDE CLI based privateGPT fork, improved, refactored, multiline support, model switch support, non question command support
- khoj Chat offline with your second brain using Llama 2, supporting multiple data sources, web search etc.
- secondbrain Multi-platform desktop app to download and run LLMs locally in your computer
- Paper QA LLM Chain for answering questions from documents with citations, using OpenAI Embeddings or local llama.cpp, langchain and FAISS Vector DB
- BriefGPT document summarization and querying using OpenAI and locally run LLMs via LlamaCpp or GPT4ALL, with embeddings stored as a FAISS index, built using Langchain
- anything-llm document ingestion, supports multiple vector DBs, remote and local LLMs and supports chat and query mode
- factool factuality Detection in Generative AI
- opencopilot LLM agnostic, open source Microsoft Copilot alternative to easily build copilot functionality with RAG, knowledge bases, conversational history, eval and UX into your product
- DocsGPT chat with your project documentation using RAG, supports OpenAI and local LLMs, and also provides a RAG-fine-tuned docsgpt-14b model
- txtai All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
- mindsdb database for datascience and AI centered workloads like local LLM / OpenAI models access, text embeddings, forecasting etc.
- Swiss Army Llama FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract
- Quivr Dump all your files and thoughts into your private GenerativeAI Second Brain and chat with it
- danswer Model agnostic RAG QA with many advanced features like Hybrid search + Reranking, time extraction, user intent identification, User access level management, document update and connectors for many SaaS tools
- SecureAI-Tools Chat with local documents through various local or commercial models, supporting user authentication
- OpenCopilot implement RAG principles with your own LLM supporting API calling of multiple endpoints
- RAGatouille Retrieval with ColBERT and other implementations of SOTA research for your RAG pipeline
- QAnything two stage retrieval based on retrieve-and-rerank approach with SOTA performance for EN/CN and planned support for structured and unstructured data and DBs
- opengpts open source GPTs and Assistants with LangChain, LangServe and LangSmith. LLM agnostic, Prompt Engineering, Tool support, Vector DB agnostic, Various Retrieval Algorithms, Chat History support
- cognee Memory management for RAG and AI Applications and Agents
- bionic-gpt LLM deployment with authentication, team and RBAC functionality, RAG pipeline, tenants etc.
- rawdog CLI assistant that responds by generating and auto-executing a Python script. Recursive Augmentation With Deterministic Output Generations (RAWDOG) is a novel alternative to RAG
- ADeus RAG Chatbot for everything you say, by using an always on audio recorder and a Web App
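Most of the retrieval tools above follow the same pattern: embed documents, retrieve the most similar chunks for a query, and stuff them into the prompt. A minimal RAG sketch, assuming sentence-transformers plus an OpenAI-compatible local endpoint (URL and model name are placeholders):

```python
# Embed documents, retrieve the best match by cosine similarity, answer with context.
import json
import urllib.request

import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

docs = [
    "The warranty covers hardware defects for 24 months.",
    "Returns are accepted within 30 days with the original receipt.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

question = "How long is the warranty?"
q_vec = embedder.encode([question], normalize_embeddings=True)[0]
context = docs[int(np.argmax(doc_vecs @ q_vec))]        # cosine similarity via dot product

prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
payload = json.dumps({
    "model": "local-model",                              # placeholder model name
    "messages": [{"role": "user", "content": prompt}],
}).encode()
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",         # assumed local endpoint
    data=payload, headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```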
- sider chrome side-bar for chatGPT and OpenAI API supporting custom prompts and text highlighting
- chathub-dev/chathub
- Glarity open-source chrome extension to write summaries for various websites including custom ones and YouTube videos. Extensible
- superpower-chatgpt chrome extension / firefox addon to add missing features like Folders, Search, and Community Prompts to ChatGPT
- Lumos Chrome Extension with OLlama Backend as a RAG LLM co-pilot for browsing the web
- chatGPTBox add useful LLM chat-boxes to github and other websites, supporting self-hosted model (RWKV, llama.cpp, ChatGLM)
- Auto GPT
- AgentGPT Deploy autonomous AI agents, using vectorDB memory, web browsing via LangChain, website interaction and more including a GUI
- microGPT Autonomous GPT-3.5/4 agent, can analyze stocks, create art, order pizza, and perform network security tests
- Auto GPT Plugins
- AutoGPT-Next-Web An AgentGPT fork as a Web GUI
- AutoGPT Web
- AutoGPT.js
- LoopGPT a re-implementation of AutoGPT as a proper python package, modular and extensible (a minimal agent-loop sketch appears below)
- Camel-AutoGPT Communication between Agents like BabyAGI and AutoGPT
- BabyAGIChatGPT is a fork of BabyAGI to work with OpenAI's GPT, pinecone and google search
- GPT Assistant An autonomous agent that can access and control a chrome browser via Puppeteer
- gptchat a client which uses GPT-4, adding long term memory, can write its own plugins and can fulfill tasks
- Chrome-GPT AutoGPT agent employing Langchain and Selenium to interact with a Chrome browser session, enabling Google search, webpage description, element interaction, and form input
- autolang Another take on BabyAGI, focused on workflows that complete. Powered by langchain.
- ai-legion A framework for autonomous agents who can work together to accomplish tasks.
- generativeAgent_LLM Generative Agents with Guidance, Langchain, and local LLMs, implementation of the "Generative Agents: Interactive Simulacra of Human Behavior" paper, blogpost
- gpt-engineer generates a customizable codebase based on prompts using GPT4, and is easy to adapt and extend; runs on any hardware that can run Python.
- gpt-migrate takes your existing code base and migrates to another framework or language
- MetaGPT multi agent meta programming framework. Takes requirements as input and outputs user stories, analysis, data structures, etc. MetaGPT includes product managers, architects, PMs, engineers and uses SOPs to run, paper
- aider command-line chat tool that allows you to write and edit code with OpenAI's GPT models
- AutoChain Build lightweight, extensible, and testable LLM Agents
- chatdev Develop Custom Software using Natural Language, while an LLM-powered Multi-Agent Team develops the software for you, paper
- AutoAgents Generate different roles for GPTs to form a collaborative entity for complex tasks, paper
- RestGPT LLM-based autonomous agent controlling real-world applications via RESTful APIs
- MemGPT intelligently manages different memory tiers in LLMs to provide extended context, supporting vector DBs, SQL, Documents etc
- XAgent Autonomous LLM Agent for Complex Task Solving
- HAAS Hierarchical Autonomous Agent Swarm creates a self-organizing and ethically governed ecosystem of AI agents, inspired by ACE Framework
- agency-swarm agent orchestration framework enabling the creation of a collaborative swarm of agents (Agencies), each with distinct roles and capabilities
- Auto Vicuna Butler Baby-AGI fork / AutoGPT alternative to run with local LLMs
- BabyAGI AI-Powered Task Management for OpenAI + Pinecone or Llama.cpp
- Agent-LLM Webapp to control an agent-based Auto-GPT alternative, supporting GPT4, Kobold, llama.cpp, FastChat, Bard, Oobabooga textgen
- auto-llama-cpp fork of Auto-GPT with added support for locally running llama models through llama.cpp
- AgentOoba autonomous AI agent extension for Oobabooga's web ui
- RecurrentGPT Interactive Generation of (Arbitrarily) Long Text. Uses LSTM, prompt-engineered recurrence, maintains short and long-term memories, and updates these using semantic search and paragraph generation.
- SuperAGI open-source framework that enables developers to build, manage, and run autonomous agents. Supports tools extensions, concurrent agents, GUI, console, vector DBs, multi modal, telemetry and long term memory
- GPT-Pilot writes scalable apps from scratch while the developer oversees the implementation
- DevOpsGPT Multi agent system for AI-driven software development. Combine LLM with DevOps tools to convert natural language requirements into working software
- ToRA Tool-integrated Reasoning Agents designed to solve challenging mathematical reasoning problems by interacting with tools, e.g., computation libraries and symbolic solvers, paper
- ACE Autonomous Cognitive Entities Framework to automatically create autonomous agents and sub agents depending on the tasks at hand
- SuperAgent Build, deploy, and manage LLM-powered agents
- aiwaves-cn/agents Open-source Framework for Autonomous Language Agents with LSTM, Tool Usage, Web Navigation, Multi Agent Communication and Human-Agent interaction, paper
- autogen framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks, paper
- openagents an Open Platform for Language Agents in the Wild, paper
- TaskWeaver code-first agent framework for planning and executing data analytics tasks interpreting user requests and coordinating plugins
- crewAI framework for orchestrating role-playing, autonomous AI agents
- phidata toolkit for building AI Assistants using function calling enabling RAG and other workflows
- FRIDAY Framework for Computer Agents with Self-Improvement on OSX and Linux
- agentkit Starter-kit to build constrained agents with Nextjs, FastAPI and Langchain
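At their core, most of the agent frameworks above run a loop of model call, tool call, observation. A minimal ReAct-style sketch against an OpenAI-compatible endpoint; the base_url, api_key, model name and the calculator tool protocol are illustrative assumptions:

```python
# Minimal agent loop: the model either requests the calculator tool or answers.
import re
from openai import OpenAI  # pip install openai (v1 client)

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")  # assumed local server

SYSTEM = ("You may use a calculator. To use it, reply exactly with "
          "'Action: calculator: <expression>'. When you know the answer, "
          "reply with 'Final: <answer>'.")

messages = [{"role": "system", "content": SYSTEM},
            {"role": "user", "content": "What is 17 * 23 + 5?"}]

for _ in range(5):                                    # cap the number of reasoning steps
    reply = client.chat.completions.create(
        model="local-model", messages=messages        # placeholder model name
    ).choices[0].message.content
    messages.append({"role": "assistant", "content": reply})

    action = re.search(r"Action: calculator: (.+)", reply)
    if action:
        result = eval(action.group(1), {"__builtins__": {}})  # toy tool; unsafe outside a demo
        messages.append({"role": "user", "content": f"Observation: {result}"})
    else:
        print(reply)                                  # expected to be 'Final: ...'
        break
```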
- huggingGPT / JARVIS Connects LLMs with huggingface specialized models
- Langchain-huggingGPT reimplementation of HuggingGPT using langchain
- OpenAGI AGI research platform, solves multi step tasks with RLTF and supports complex model chains
- ViperGPT implementation for visual inference and reasoning with the OpenAI API
- TaskMatrix former visual-chatgpt connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting.
- PandaGPT combines ImageBind and Vicuna to understand and combine multimodal inputs from text, image, audio, depth, thermal, and IMU.
- AGiXT agents with memory, model agnostic, docker deployment, plugin extendable, chat feature, speech to text and text to speech, REST api and more
- SelfTalker Talk with your virtual self using voice cloning, LLMs and computer vision models
- CoDi Any to any generation via composable diffusion
- AutoMix Mixing Language Models with Self-Verification and Meta-Verification, paper
- NExT-GPT Any-to-Any Multimodal LLM for arbitrary input-output combinations (any-to-any) for text, image, video, audio and beyond, paper, weights
- SpeechGPT Empowering LLMs with Intrinsic Cross-Modal Conversational Abilities for speech audio input and output
- OpenFLamingo-v2 MPT and RedPajama fine tuned on the OpenFLamingo data set for training Autoregressive Vision-Language Models, models
- Obsidian 3B open source multimodal visual LLM
- ml-ferret Refer and Ground Anything Anywhere at Any Granularity
- CogVLM SOTA open visual language model and Agent
- Video-LLaVA Image and Video dense LLM and MoE-LLaVA 3B sparse Mixture of Expert model outperforming the original dense 7B model
- MobileAgent Autonomous Multi-Modal Mobile Device Agent with Visual Perception that can execute tasks
- FauxPilot open source Copilot alternative using Triton Inference Server
- Turbopilot open source LLM code completion engine and Copilot alternative
- Tabby Self hosted Github Copilot alternative with RAG-based code completion which utilizes repo-level context
- starcoder.cpp
- GPTQ-for-SantaCoder 4bit quantization for SantaCoder
- supercharger Write Software + unit tests for you, based on Baize-30B 8bit, using model parallelism
- Autodoc toolkit that auto-generates codebase documentation using GPT-4 or Alpaca, and can be installed in a git repository in about 5 minutes.
- smol-ai developer a personal junior developer that scaffolds an entire codebase with a human-centric and coherent whole program synthesis approach using <200 lines of Python and Prompts.
- locai kobold/oobabooga-compatible API for VSCode
- oasis local LLaMA models in VSCode
- aider cli tool for writing and modifying code with GPT-3.5 and GPT-4
- continue open-source copilot alternative for software development as a VS Code plugin, can use gpt-4 API or local codellama and other models
- chatgpt-vscode vscode extension to use unofficial chatGPT API for a code context based chat side bar within the editor
- codeshell-vscode vscode extension to use the CodeShell-7b models
- localpilot vscode copilot alternative using local llama.cpp/ggml models on Mac
- sweep AI-powered Junior Developer for small features and bug fixes.
- acheong08/ChatGPT Python reverse engineered chatGPT API
- gpt4free Use reverse engineered GPT3.5/4 APIs from other websites
- GPTCache, serve cached results based on embeddings in a vector DB, before querying the OpenAI API.
- kitt TTS + GPT4 + STT to create a conference call audio bot
- Marvin simplifies AI integration in software development with easy creation of AI functions and bots managed through a conversational interface
- chatgpt.js client-side JavaScript library for ChatGPT
- ChatGPT-Bridge use chatGPT plus' GPT-4 as a local API
- Powerpointer connects to the OpenAI GPT-3.5 API and creates a powerpoint out of your content
- EdgeGPT Reverse engineered API of Microsoft's Bing Chat using Edge browser
- simpleaichat python package for simple and easy interfacing with chat AI APIs
- Dotnet SDK for OpenAI chatGPT, Whisper, GPT-4 and DALL-E in .NET
- node-llama-cpp TS library to locally run many models supported by llama.cpp, enhanced with many convenient features, like forcing a JSON schema on the model output on the generation level
- FastLLaMA Python wrapper for llama.cpp
- WebGPT Inference in pure javascript
- TokenHawk performs hand-written LLaMA inference using WebGPU, utilizing th.cpp, th-llama.cpp, and th-llama-loader.cpp, with minimal dependencies
- WasmGPT ChatGPT-like chatbot in browser using ggml and emscripten
- AutoGPTQ easy-to-use model GPTQ quantization package with user-friendly CLI
- gpt-llama.cpp Replace OpenAi's GPT APIs with llama.cpp's supported models locally
- llama-node JS client library for llama (or llama based) LLMs built on top of llama-rs and llama.cpp.
- TALIS serves a LLaMA-65b API, optimized for speed utilizing dual RTX 3090/4090 GPUs on Linux
- Powerpointer-For-Local-LLMs connects to oobabooga's API and creates a powerpoint out of your content
- OpenChatKit open-source project that provides a base to create both specialized and general purpose chatbots and extensible retrieval system, using GPT-NeoXT-Chat-Base-20B as a base model
- webgpu-torch Tensor computation with WebGPU acceleration
- llama-api-server that uses llama.cpp and emulates an openAI API
- CTransformers python bindings for transformer models in C/C++ using GGML library, supporting GPT-2/J/NeoX, StableLM, LLaMA, MPT, Dollyv2, StarCoder
- basaran GUI and API as a drop-in replacement of the OpenAI text completion API. Broad HF eco system support (not only llama)
- CodeTF one-stop Python transformer-based library for code LLMs and code intelligence, training and inferencing on code summarization, translation, code generation
- CTranslate2 provides fast Transformer (llama, falcon and more) inference for CPU and GPU, featuring compression, parallel execution, framework support
- auto-gptq easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ for GPU inference
- exllama Memory-Efficient Llama Rewrite in Python/C++/CUDA for 4bit quantized GPTQ weights, running on GPU, faster than llama.cpp (2023-06-13), autoGPTQ and GPTQ-for-llama
- SimpleAI Self-Hosted Alternative to openAI API
- rustformer llm Rust-based ecosystem for llms like BLOOM, GPT-2/J/NeoX, LLaMA and MPT offering a CLI for easy interaction and powered by ggml
- Haven Fine-Tune and Deploy LLMs On Your Own Infrastructure
- llama-cpp-python Python Bindings for llama.cpp with low level C API interface, python API, openai like API and LangChain compatibility (basic usage sketched below)
- candle a minimalist ML framework for Rust with a focus on performance (including GPU support) and ease of use
- tabbyAPI OpenAI API emulation using exllamav2 API that's both lightweight and fast
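As a small illustration of the local wrappers above, a minimal llama-cpp-python sketch; the GGUF model path is an example and any llama.cpp-compatible model file works:

```python
# Local chat completion on a GGUF model via llama-cpp-python.
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(model_path="./models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048)  # example path

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me one fact about llamas."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```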
- LangChain Framework for LLM Application Development (example, paolorechia/learn-langchain with vicuna and GPTQ 4 bit support)
- Langstream a lighter alternative to LangChain
- LangFlow GUI for Langchain using graphs/flows
- Toolformer implementation Allows LLMs to use Tools
- megabots to create LLM bots by providing Q&A, document retrieval, vector DBs, FastAPI, Gradio UI, GPTCache, guardrails, whisper, supports OpenAI API (local LLMs planned)
- gorilla Enables LLMs to use tools by semantically and syntactically correctly invoking APIs. Reduces hallucination, custom trained model weights based on llama-7b
- agency A fast and minimal actor model framework that allows humans, AIs, and other computing systems to communicate with each other through shared environments called "spaces".
- Vercel AI SDK a library for building edge-ready AI-powered streaming text and chat UIs in React, Svelte and Vue supporting LangChain, OpenAI, Anthropic and HF
- tinygrad Geohot's implementation for a PyTorch killer with the target to be 2x faster
- Xorbits Inference (Xinference) versatile library designed to deploy and serve language, speech recognition, and multimodal models
- data-juicer zero code, low code and off the shelf data processing for LLMs
- Microsoft semantic-kernel a lightweight SDK enabling integration of AI Large Language Models (LLMs) with conventional programming languages
- LlamaIndex provides a central interface to connect your LLMs with external data
- haystack LLM orchestration framework to connect models, vector DBs, file converters to pipelines or agents that can interact with your data to build RAG, Q&A, semantic search or conversational agent chatbots
- rivet Visual graph/flow/node based IDE for creating AI agents and prompt chaining for your applications
- promptflow visual graph/flow/node based IDE for creating AI agents
- litellm Use OpenAI API call format for any LLM backend (Local, Huggingface, Cohere, TogetherAI, Azure, Ollama, Replicate, Sagemaker, Anthropic, etc); basic usage sketched below
- Flowise Drag & drop UI with visual graph/flow/nodes to build your customized LLM app
- ChainForge visual graph/flow/node based prompt engineering UI for analyzing and evaluating LLM responses
- LangStream Event-Driven Developer Platform for Building and Running LLM AI Apps, also providing a visual graph/flow/node based UI. Powered by Kubernetes and Kafka
- activepieces Automation with SaaS tools and GPT using a visual graph/flow/node based workflow
- kernel-memory Index and query any data using LLM and natural language, tracking sources and showing citations, ideal for RAG pipelines
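As an example of the OpenAI-compatible abstraction several of these frameworks provide, a minimal litellm sketch; the model strings and the local Ollama endpoint are examples:

```python
# Switch between a hosted and a local backend by changing the model string.
from litellm import completion  # pip install litellm

messages = [{"role": "user", "content": "Summarize RAG in one sentence."}]

# Hosted model (expects OPENAI_API_KEY in the environment)
resp = completion(model="gpt-3.5-turbo", messages=messages)
print(resp.choices[0].message.content)

# Local model served by Ollama (assumes `ollama pull mistral` has been run)
resp = completion(model="ollama/mistral", messages=messages,
                  api_base="http://localhost:11434")
print(resp.choices[0].message.content)
```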
prompt templating / grammar / engineering:
- Jsonformer Generate Structured JSON from Language Models by handling the JSON syntax and letting the LLM output only the values
- Microsoft guidance templating / grammar for LLMs, Demo project by paolorechia for local text-generation-webui. reddit thread. guidance fork and llama-cpp-python fork how-to on reddit
- outlines Guidance alternative templating / grammar for LLM generation to follow JSON Schemas, RegEx, Caching supporting multiple models, model APIs, and HF transformers
- lmql LMQL templating / grammar language for LLMs based on a superset of Python going beyond constrain-based templating
- TypeChat templating / grammar for LLMs to enforce constraints for text generation
- GBNF templating / grammar implementation using Backus-Naur Form (BNF) in llama.cpp to guide output, BNF Playground (a grammar-constrained generation sketch follows below)
- sglang structured generation language designed for LLMs with multiple chained generation calls, advanced prompting techniques, control flow, multiple modalities, parallelism, and external interaction
- DSPy a framework for algorithmically optimizing LM prompts and weights
- AlphaCodium Automatic Code Generation improvements with Prompt Engineering and Flow Engineering
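To make the grammar idea concrete, a sketch of GBNF-constrained generation through llama-cpp-python; it assumes a recent build that exposes LlamaGrammar, and the model path is an example:

```python
# Constrain output to a tiny JSON schema with a GBNF grammar.
from llama_cpp import Llama, LlamaGrammar

GRAMMAR = r'''root ::= "{" ws "\"answer\":" ws answer ws "}"
answer ::= "\"yes\"" | "\"no\""
ws ::= [ \t\n]*'''

grammar = LlamaGrammar.from_string(GRAMMAR)
llm = Llama(model_path="./models/mistral-7b-instruct.Q4_K_M.gguf")  # example path

out = llm("Is the sky blue? Answer as JSON.", grammar=grammar, max_tokens=32)
print(out["choices"][0]["text"])  # constrained to {"answer": "yes"} or {"answer": "no"}
```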
- simple llama finetuner
- LLaMA-LoRA Tuner
- alpaca-lora
- StackLLaMA Fine-Tuning Guide by huggingface
- xTuring LLM finetuning pipeline supporting LoRa & 4bit
- Microsoft DeepSpeed Chat
- How to train your LLMs
- H2O LLM Studio Framework and no-code GUI for fine tuning SOTA LLMs
- Implementation of LLaMA-Adapter, to fine tune instructions within hours
- Hivemind Training at home
- Axolotl a llama, pythia, cerebras training environment optimized for Runpod supporting qlora, 4bit, flash attention, xformers
- LMFlow toolbox for finetuning, designed to be user-friendly, speedy, and reliable
- qlora uses bitsandbytes quantization and PEFT and transformers for efficient finetuning of quantized LLMs (a minimal setup sketch follows below)
- GPTQlora Efficient Finetuning of Quantized LLMs with GPTQ QLoRA and AutoGPTQ for quantization
- Landmark Attention QLoRA for landmark attention with 50x context compression and efficient token selection
- ChatGLM Efficient Finetuning fine tuning ChatGLM models with PEFT
- AutoTrain Advanced by Huggingface, faster and easier training and deployments of state-of-the-art machine learning models
- Pearl Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta
- LLaMA-Factory Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)
- LLaMa2lang convenience scripts to finetune any foundation model for chat towards any language
- unsloth 2-5x faster and 60% less memory local QLoRA finetuning supporting Llama, CodeLlama, Mistral, TinyLlama etc. using Triton
- mergekit Tools for merging pretrained large language models.
- MergeLM LLMs are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
- SLERP Spherical Linear Interpolation Model Merging
- AutoAWQ
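Most of the fine-tuning tools above build on the same PEFT/bitsandbytes primitives. A minimal QLoRA-style setup sketch; the model id, target modules and LoRA hyperparameters are examples, and a real run still needs a dataset and a training loop:

```python
# Load a base model in 4-bit and attach LoRA adapters with PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"       # example base model (any causal LM works)
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb, device_map="auto")
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],     # typical choice for llama-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()           # only the small LoRA adapters are trainable
```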
- Alpaca-lora instruction finetuned using Low-Rank Adaptation
- codealpaca Instruction training data set for code generation
- LAION AI / Open-Assistant Dataset (https://github.com/LAION-AI/Open-Assistant / https://projects.laion.ai/Open-Assistant/ / https://open-assistant.io)
- ShareGPT pre-cleaned, English only "unfiltered," and 2048 token split version of the ShareGPT dataset ready for finetuning
- Vicuna ShareGPT pre-cleaned 90k conversation dataset
- Vicuna ShareGPT unfiltered
- GPTeacher
- alpaca-cleaned
- codealpaca 20k
- gpt3all pruned
- gpt4all_prompt_generations_with_p3
- gpt4all_prompt_generations
- alpaca-plus-gpt4all-without-p3
- Alpaca dataset from Stanford, cleaned and curated
- Alpaca Chain of Thought fine tuning dataset for EN and CN
- PRESTO paper Multilingual dataset for parsing realistic task-oriented dialogues by Google & University of Rochester, California, Santa Barbara, Columbia
- RedPajama Dataset and model similar to LLaMA but truly open source and ready for commercial use. hf
- BigCode The Stack
- open-instruct-v1
- awesome-instruction-dataset list of instruction datasets by yadongC
- The Embedding Archives Millions of Wikipedia Article Embeddings in multiple languages
- replit-finetuned-v1-3b & replit-code-v1-3b outperforming all coding OSS models, to be released soon
- alpaca_evol_instruct_70k an instruction-following dataset created using Evol-Instruct, used to fine-tune WizardLM
- gpt4tools_71k.json from GPT4Tools paper, having 71k instruction-following examples for sound/visual/text instructions
- WizardVicuna 70k dataset used to fine tune WizardVicuna
- Numbers every LLM Developer should know
- airoboros uncensored
- CoT collection, paper
- airoboros-gpt4 fine-tuning dataset optimized for trivia, math, coding, closed context question answering, multiple choice, writing
- fin-llama a LLaMA finetuned for finance, code, model, dataset
- SlimPajama-627B Deduplicated and cleaned RedPajama based dataset for higher information density and quality at lower token length
- dolphin an attempt to replicate Microsoft Orca using FLANv2 augmented with GPT-4 and 3.5 completions
- OpenOrca collection of augmented FLAN data with distributions aligned with the orca paper
- ExpertQA Expert-Curated Questions and Attributed Answers dataset with 2177 questions spanning 32 fields, along with verified answers and attributions for claims in the answers, paper
- annas-archive world’s largest open-source open-data library. ⭐️ Mirrors Sci-Hub, Library Genesis, Z-Library, and more. 📈 22,052,322 books, 97,847,390 papers, 2,451,032 comics, 673,013 magazines
- RedPajama-Data-v2 Open Dataset with 30 Trillion Tokens for Training, HF
- LLM Model Cards
- GPTs are GPTs: An early look at the labor market impact potential of LLMs
- ViperGPT Visual Inference via Python Execution for reasoning
- Emergent Abilities of LLMs, blog post
- facts checker reinforcement
- LLaVA: Large Language and Vision Assistant, combining LLaMA with a visual model. Delta-weights released
- Mass Editing Memory in a Transformer
- MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models
- WizardLM | Fine tuned LLaMA 7B with evolving instructions, outperforming chatGPT and Vicuna 13B on complex test instructions (code, delta weights)
- Scaling Transformer to 1M tokens and beyond with RMT
- AudioGPT | Understanding and Generating Speech, Music, Sound, and Talking Head (github, hf space)
- Chameleon-llm, a paper about Plug-and-Play Compositional Reasoning with GPT-4
- GPT-4-LLM share data generated by GPT-4 for building an instruction-following LLMs with supervised learning and reinforcement learning. paper
- GPT4Tools Teaching LLM to Use Tools via Self-instruct. code
- CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society. preprint paper, website
- Poisoning Language Models During Instruction Tuning
- SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
- LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
- Dromedary: Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision, code, weights
- Unlimiformer: transformer-based model that can process unlimited length input by offloading attention computation to a k-nearest-neighbor index, extending the capabilities of existing models like BART and Longformer without additional weights or code modifications. code
- Salesforce LAVIS provides a comprehensive Python library for language-vision intelligence research, including state-of-the-art models like BLIP-2 for vision-language pretraining and Img2LLM-VQA for visual question answering, alongside a unified interface
- FLARE an active retrieval augmented generation technique that iteratively predicts, retrieves, and refines content, improving the accuracy and efficiency of long-form text generation in language models
- Hyena a subquadratic-time layer that has the potential to significantly increase context length in sequence models, using a combination of long convolutions and gating. Long Convs and Hyena implementations
- FastServe an efficient distributed inference serving system for LLMs that minimizes job completion time using preemptive scheduling and efficient GPU memory management, built on NVIDIA FasterTransformer.
- FrugalGPT is a model that uses LLM cascade to optimize the performance and cost-efficiency of LLMs like GPT-4.
- Landmark Attention LLaMa 7B with 32k tokens. Code, llama7b diff weights, merged llama7b weights
- QLORA Efficient Finetuning of Quantized LLMs
- Tree of Thoughts (ToT) Enables exploration over text, improves strategic decision-making in language models. Code. Example implementation, discussion
- MEGABYTE Efficient multiscale decoder architecture for long-sequence modeling.
- PandaGPT: project page, code, model combines ImageBind and Vicuna to understand and combine multimodal inputs from text, image, audio, depth, thermal, and IMU.
- LIMA Less Is More for Alignment. Shows fine-tuning with 1000 carefully curated prompts without reinforcement learning can outperform GPT-4 in many cases
- Gorilla a finetuned LLaMA-based model that surpasses GPT-4 in writing API calls and reduces hallucination. project, code
- Voyager Open-Ended Embodied Minecraft Agent using LLMs, project, code
- BigTrans llama adapted to multilingual translation over 100 languages, outperforming chatGPT in 8 language-pairs
- BPT memory-efficient approach to processing long input sequences in Transformers
- Lion efficiently transfers knowledge from a closed-source LLM to an open-source student model
- Undetectable Watermarks for Language Models using one-way functions
- ALiBi Train Short Test Long. Attention with Linear Biases Enables Input Length Extrapolation. code
- The Curse of Recursion: Training on Generated Data Makes Models Forget
- Brainformers a complex block for natural language processing that outperforms state-of-the-art Transformers in efficiency and quality
- AWQ Activation aware Weight Quantization for better LLM Compression and Acceleration, code
- SpQR quantization by Tim Dettmers, code, twitter
- InternLM Technical report. A 104B parameters multilingual LLM with SOTA performance in knowledge understanding, reading comprehension, math and coding, outperforms open-source models and ChatGPT in 4 benchmarks
- Naive Bayes-based Context Extension NBCE extends context length of LLMs using Naive Bayes to 50k under 8*A100
- The Safari of Deep Signal Processing: Hyena and Beyond
- Orca Progressive Learning from Complex Explanation Traces of GPT-4. Fine-tunes small models by prompting large foundational models to explain their reasoning steps
- How Far Can Camels Go? optimizing instruction on open resources, Tulu models released
- FinGPT open-source, accessible and cost efficient re-training for updating financial data inside LLMs for robo-advising, algorithmic trading, and other applications, code, dataset
- LongMem proposes new framework, allowing for unlimited context length along with reduced GPU memory usage and faster inference speed. Code
- WizardCoder empowers Coding Large Language Models with Evol-Instruct for complex instruction fine-tuning, outperforming open-source and closed LLMs on several benchmarks, github repo, model
- Infinigen a procedural generator for photorealistic 3D scenes, based on Blender and running on GPUs, paper, github
- Do Large Language Models learn world models or just surface statistics
- Large Language Models Can Self-improve, openreview.net
- Switch Transformers scaling to Trillion Parameter Models with efficient sparsity, a paper speculated to have influenced GPT-4's undisclosed architecture using a sparsely activated Mixture of Experts (MoE) architecture
- 2022 & beyond Algorithms for efficient deep learning Google Research proposed various new architectures to scale LLMs further, including MoE
- Wanda Pruning by Weights and Activations, a pruning method for LLMs that requires no retraining and outperforms existing methods, code
- Textbooks Are All You Need a 1.3B parameter LLM focusing on programming and coding from Microsoft, which outperforms all models on MBPP except GPT-4, ranks third on HumanEval above GPT-3.5, and exhibits emergent properties
- RoPE Enhanced Transformer with Rotary Position Embedding to extend context length
- LongChat a new level of extended context length up to 16K tokens, with two released models LongChat-7B and 13B
- salesforce xgen a series of 7B LLMs with standard dense attention on up to 8K sequence length for up to 1.5T tokens
- LongNet Scaling transformers to 1 billion tokens
- Lost in the Middle recent LLMs have longer context and this paper finds that information is best retrieved at the beginning or the end, but mostly lost in the middle of long context
- FoT Focused Transformer with contrastive learning to achieve a 256k context length for passkey retrieval, code
- OpenLLMs Less is More for Open-source Models, uses only ~6K GPT-4 conversations filtered for quality and achieves SOTA scores on Vicuna GPT-4 eval and AlpacaEval
- CoDi Any-to-Any Generation via Composable Diffusion
- LEDITS Real Image Editing with DDPM Inversion and Semantic Guidance, demo, code
- Mixture of Experts meets Instruction Tuning MoE + Instruction Tuning is a winning combination for LLMs, likely being used for GPT-4
- MoE Mixture of Experts LoRA Proof of Concept by AiCrumb, reddit discussion
- LLM Attacks Universal and Transferable Adversarial Attacks on Aligned Language Models, code
- factool framework for detecting factual errors of texts generated by large language models (e.g., ChatGPT)
- codellama Llama 2 fine tuned by meta for code completion, github
- Graph of Thoughts introducing Graph of Thoughts and comparing its performance to Chain of Thoughts and Tree of Thoughts, code
- LIDA Automatic Generation of Visualizations and Infographics using Large Language Models, code
- Distilling step-by-step Outperforming larger language models with less training data and smaller model sizes
- LongLoRA Efficient Fine-tuning of Long-Context Large Language Models, code
- LLMLingua Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression, code
- flagembedding, an embedding model for Retrieve Anything To Augment Large Language Models code
- mistral-7b pretrained llm with 7 billion parameters outperforming Llama 2 13B using Grouped-Query-Attention, Sliding-Window Attention and Byte-Fallback BPE tokenizer, weights
- CoVe Chain-of-Verification Reduces Hallucination in Large Language Models, implementation in LangChain Expression Language
- MemGPT Towards LLMs as Operating Systems, perpetual chat bots with self editing memory, chat with your SQL database and local files etc, code
- microxcaling AMD, Arm, Intel, Meta, Microsoft, NVIDIA, and Qualcomm Standardize Next-Generation Narrow Precision Data Format: Microscaling Data Formats for Deep Learning
- AoT Algorithm of Thoughts: Enhancing Exploration of Ideas in LLMs
- Chain of Density Prompting From Sparse to Dense: GPT-4 Summarization, gpt-3.5 fine tune rivaling the quality of the original Chain of Density
- Self-RAG Learning to Retrieve, Generate and Critique through Self-Reflections outperforming ChatGPT and retrieval-augmented LLama2 Chat on six tasks, selfrag finetuned llama2-13b, mistral-7b finetune
- LoRAShear Efficient Large Language Model Structured Pruning and Knowledge Recovery
- Making LLaMA SEE and Draw with SEED Tokenizer, Multi Modal fine tune of LLaMA with image generation, image recognition and text generation capabilities, weights, github
- BSM Branch-Solve-Merge for LLMs enhancing coherence, planning, and task decomposition outperforming GPT-4 in some tasks
- Skeleton-of-Thought Large Language Models Can Do Parallel Decoding. SoT aims at decreasing the end-to-end generation latency of large language models
- ML-Bench Large Language Models Leverage Open-source Libraries for Machine Learning Tasks, page, code
- QuIP# E8P 2-Bit Quantization of Large Language Models achieving near fp16 quantization performance
- HQQ Half-Quadratic Quantization for LLMs significantly accelerating quantization speed without requiring calibration data, outperforming existing methods in processing speed and memory efficiency. Sub 10GB VRAM Mixtral 8x7B implemented through mixtral-offloading, guide
- QMoE Practical Sub-1-Bit Compression of Trillion-Parameter Models, code, bitsandbytes sparse_MoE implementation, QMoE in llama.cpp, LoRa experts as alternative to QMoE
- mamba alternative to transformer architecture for LLMs using Linear-Time Sequence Modeling with Selective State Spaces code
- StreamingLLM Efficient Streaming Language Models with Attention Sinks for bigger Context Windows, code
- Chain of Abstraction CoA A New Method for LLMs to Better Leverage Tools in Multi-Step Reasoning
- LLM Worksheet using an early CoT example by randomfoo2
- The full story of LLMs
- Brief history of llama models
- A timeline of transformer models
- Every front-end GUI client for ChatGPT API
- LLMSurvey a collection of papers and resources including an LLM timeline
- rentry.org/lmg_models a list of llama derivatives and models
- Timeline of AI and language models and Model Comparison Sheet by Dr. Alan D. Thompson
- Brex's Prompt Engineering Guide an evolving manual providing historical context, strategies, guidelines, and safety recommendations for building programmatic systems on OpenAI's GPT-4
- LLMs Practical Guide actively curated collection of a timeline and guides for LLMs, providing a historical context and restrictions based on this paper and community contributions
- LLMSurvey based on this paper, builds a collection of further papers and resources related to LLMs including a timeline
- LLaMAindex can now use Document Summary Index for better QA performance compared to vectorDBs
- ossinsight.io chat-gpt-apps Updated list of top chatGPT related repositories
- GenAI_LLM_timeline Organized collection of papers, products, services and news of key events in Generative AI and LLMs with focus on ChatGPT
- AIGC-progress an awesome list of all things ml models and projects with daily updates
- Things I'm learning while training SuperHOT talks about LiMA, Multi-Instruct and how to extend llama to 8k context size github discussion, reddit discussion
- LLM Utils An index of useful LLM related blog posts and tools
- Awesome-Multimodal-Large-Language-Models Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
- FourthBrain ML Education backed by Andrew Ng's AI Fund, tutorials about LLM deployment, API Endpoint creation, MLOps, QLoRA fine tuning, etc.
- companion-app AI Getting Started template for developers using Clerk, Next.js, Pinecone, Langchain.js, OpenAI or Vicuna13b, Twilio
- ppromptor Prompt-Promptor is a Python library with a web UI designed to automatically generate and improve prompts for LLMs and consists of three agents: Proposer, Evaluator, and Analyzer. These agents work together with human experts to continuously improve the generated prompts
- RAG Guide A Comprehensive Guide for Building RAG-based LLM Applications as a jupyter notebook, HN
- RAG is more than just embedding search learnings for building a good RAG-based LLM Application, HN
- llm-agent-paper-list The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al., paper
- awesome-ai-agents open and closed source agents by categories and industries
- Azure OpenAI resources Azure OpenAI, LLMs +🌌 Brief overview,🦙Summary notes,🔎References, and 🎋Cheatsheet
- alignment-handbook Huggingface's robust recipes to align language models with human and AI preferences
- llama-recipes Llama 2 demo apps, recipes etc for RAG, Fine tuning, inference etc.
- Something-of-THoughts in LLM Prompting Chain-of-Thoughts (CoT), Tree-of-Thoughts (ToT), Graph-of-Thoughts (GoT), and beyond, … What are these thoughts?
- GPT-RAG learnings when implementing Azure OpenAI with RAG at scale in a secure manner
- AI and Open Source in 2023 a Summary of what happened in 2023 with all the learnings
- convert text into graph of concepts Tutorial on how to use Knowledge Based QnA (KBQA) using Knowledge Graphs which can improve RAG context quality in some domains
- Generative AI for Beginners 12 Lessons, Get Started Building with Generative AI from Microsoft
- LLM Visualization Explaining how transformers work visually using nano-gpt
- Visual explanations of core machine learning concepts Visually learn how Neural networks, Regression, Reinforcement Learning, Random Forests and more concepts work
- easily train a specialized llm PEFT, LoRA, QLoRA, LLaMA-Adapter, and More
- promptbase an evolving collection of resources, best practices, and example scripts for eliciting the best performance from foundation models
- rag-survey an updated view on RAG in the wild, their approaches, taxonomy, tech stack and evolution paper
- Survey of Reasoning with Foundation Models, awesome reasoning list
- llm-course Course to get into LLMs with roadmaps and notebooks covering Fundamentals, LLM-Scientist and LLM-Engineer roles
- ML Papers of The Week dair.ai curated list of weekly ML Papers
- The Illustrated Transformer Illustrated Guide to Transformers- Step by Step Explanation
- ai-exploits A collection of real world AI/ML exploits for responsibly disclosed vulnerabilities
- AI Trends features key numbers and data visualizations in AI, related Epoch reports and other sources that showcase the change and growth in AI over time
- Awesome-LLM-Inference curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
- Opinionate.io AI Debating AI
- phind.com Developer Search Engine
- Voice Q&A Assistant using ChatGPT API, Embeddings, Gradio, Eleven Labs and Whisper
- chatpdf, Q&A for PDFs
- ai collection collecting startups and SaaS solutions using AI at its core
- screenshot-to-code this converts a website screenshot to approximated HTML/CSS code by using GPT-4-Vision
- Outfit Anyone Ultra-high quality virtual try-on for Any Clothing and Any Person
- llavavision simple "Be My Eyes" web app with a llama.cpp/llava backend explaining what the camera sees for blind assistance
- Petals
- FlexGen High-throughput Generative Inference of LLMs with a Single GPU
- XLA Accelerated Linear Algebra is a ML compiler for GPU, CPU and accelerators
- zipslicer
- AITemplate a Python framework which renders neural network into high performance CUDA/HIP C++ code
- Flash-attention Fast and memory-efficient exact attention
- tokenmonster ungreedy tokenizer increases inference speed and context-length by 35% for pre-training on new LLMs
- LOMO fuses the gradient computation and the parameter update in one step to reduce memory usage, enabling full parameter fine-tuning of a 7B model on a single RTX 3090
- Open LLM Leaderboard by HuggingFace, had a serious bug that made LLaMA models perform worse in the past
- LMSys Chatbot Arena Leaderboard, blogpost is an anonymous benchmark platform for LLMs that features randomized battles in a crowdsourced manner
- Current best choices on LocalLLaMA reddit
- LLM Logic Tests by YearZero on reddit/localllama
- paperswithcode has LLM SOTA leaderboards, but usually just for foundation models
- Can AI code a self-evaluating interview for AI coding models. code
- Gotzmann LLM Score v2 by Gatzuma on Reddit
- Aviary Explorer open source utility to compare leading OSS LLMs and see votes, pricing per token etc.
- Comparative look at (ggml) quantization and parameter size part 1 by KerfuffleV2
- Updated relative comparison of ggml quantization types and effect on perplexity part 2 by KerfuffleV2
- Programming performance ranking for popular LLaMAs using HumanEval+ by ProfessionalHand9945
- llm-humaneval-benchmarks HumanEval+
- CoT Hub
- C-Eval Benchmark
- programming eval by catid from reddit, code
- HumanEval+ ranking for open vs closed programming LLMs by ProfessionalHand9945
- LLM Comparison Sheet by OptimalScale/LMFlow
- llm-jeopardy Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts
- llama presets arena testing different generation presets by oobabooga, reddit discussion
- MTEB Leaderboard Massive Text Embedding Benchmark (MTEB) Leaderboard
- hallucination-leaderboard Hughes Hallucination Evaluation Model (HHEM) evaluates how often an LLM introduces hallucinations when summarizing a document code
- Big Code Models Leaderboard evaluates base coding models
- EvalPlus Leaderboard evaluates AI Coders with rigorous tests
- AgentBoard Evaluation Board of Multi-turn LLM Agents
- Enterprise Scenarios Leaderboard evaluates the performance of LLMs on real-world enterprise use cases, some of the test sets are closed source to prevent cheating
- NP Hard Eval Leaderboard benchmark for assessing the reasoning abilities of LLMs by using NP Hard problems
- Big-bench a collaborative benchmark featuring over 200 tasks for evaluating the capabilities of llms
- Pythia interpretability analysis for autoregressive transformers during training
- AlpacaEval automatic evaluation for instruction following LLMs, validated against 20k human annotations, reddit announcement
- LMFlow Benchmark automatic evaluation framework for open source LLMs
- lm-evaluation-harness framework for few-shot evaluation of autoregressive language models from EleutherAI
- sql-eval evaluation of LLM generated SQL queries
- ragas RAG assessment: an evaluation framework for Retrieval Augmented Generation pipelines
- ToolQA an evaluation framework for RAG and Tool LLM pipelines
- LangCheck Simple, Pythonic building blocks to evaluate LLM applications
- PromethAI-Memory Open-source framework for building and testing RAGs and Cognitive Architectures, designed for accuracy, transparency, and control
- PromptBench a Pytorch-based Python package for Evaluation of LLMs providing APIs
- CanItEdit Evaluating the Ability of Large Language Models to Follow Code Editing Instructions, paper
- deepeval evaluation framework specialized for unit testing LLM applications based on metrics such as hallucination, answer relevancy, RAGAS, etc.
- mlflow llm-evaluate use-case specific standard metrics and custom metrics, optional ground truth
- LLM-Uncertainty-Bench Benchmarking LLMs via Uncertainty Quantification
- Vicuna FastChat
- SynapseML (previously known as MMLSpark), an open-source library that simplifies the creation of massively scalable machine learning (ML) pipelines
- Colossal-AI unified deep learning system that provides a collection of parallel components for distributed deep learning models. Provides data parallelism, pipeline parallelism, and tensor parallelism
- OpenLLM Run, deploy, and monitor open-source LLMs on any platform
- skypilot Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution
- ONNX Runtime cross-platform inference and training machine-learning accelerator compatible with PyTorch, TensorFlow/Keras, scikit-learn, LightGBM, XGBoost, etc. and runs with different hardware, drivers, and operating systems
- vllm high-throughput and memory-efficient inference and serving engine for LLMs, paper (offline batched usage sketched below)
- openllmetry observability for your LLM application, based on OpenTelemetry
- DeepSpeed-FastGen High-throughput Text Generation for LLMs at 2x vLLM speeds
- DeepSparse Sparsity-aware deep learning inference runtime for CPUs
- dvc ML Experiments Management with Git
- S-LoRA Serving Thousands of Concurrent LoRA Adapters
- PowerInfer Fast LLM Serving with a Consumer-grade GPU leveraging activation locality, PR on llama.cpp, issue on ollama
- TaskingAI open source platform for AI-native application development
- inferflow LLM inference serving engine with support for Multi-GPU, Quantization supporting gguf, llama2, safetensors and many model families
- [LMDeploy](https://github.com/InternLM/lmdeploy) multi-model, multi-machine, multi-card inference service for many models
- powerinfer High-speed Model Inference Serving on Consumer GPU/CPU using activation locality for hot/cold neurons
- lorax Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
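For a sense of the serving APIs listed above, a minimal offline batched-inference sketch with vLLM; the model name is an example and requires a CUDA GPU with enough VRAM:

```python
# Offline batched generation with vLLM.
from vllm import LLM, SamplingParams  # pip install vllm

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")   # example model
params = SamplingParams(temperature=0.7, max_tokens=64)

for out in llm.generate(["Write a haiku about GPUs."], params):
    print(out.outputs[0].text)
```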
- Pinecone proprietary vector search for semantic search, recommendations and information retrieval
- FAISS Library for Efficient Similarity Search and Clustering using vectors
- Weaviate open source vector DB for services like OpenAI, HF etc for text, image, Q&A etc.
- vespa.ai one of the only scalable vector DBs that supports multiple vectors per schema field
- LanceDB free open-source serverless vector DB with support for langchain, llamaindex and multi-modal data
- Deeplake Vector Database for audio, text, vectors, video
- milvus open-source cloud-native vector DB focusing on embedding vectors converted from unstructured data
- chroma open-source embedding database (basic usage sketched below)
- pgvector open-source vector similarity search for Postgres.
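As a quick illustration of the vector stores above, a minimal Chroma sketch; the collection name and documents are examples, and Chroma uses its default embedding function unless another is supplied:

```python
# In-memory embedding database: add documents, query by semantic similarity.
import chromadb  # pip install chromadb

client = chromadb.Client()                      # in-memory client
collection = client.create_collection("notes")  # example collection name

collection.add(
    ids=["1", "2"],
    documents=["pgvector adds vector similarity search to Postgres.",
               "FAISS is a library for efficient similarity search."],
)

results = collection.query(query_texts=["vector search inside a SQL database"], n_results=1)
print(results["documents"][0][0])               # most similar stored document
```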