GitHub - mftyy7/sglang: SGLang is a fast serving framework for large language models and vision language models.

News

[2024/10] 🔥 The First SGLang Online Meetup (slides).
[2024/09] SGLang v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision (blog).
[2024/07] Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) (blog).

More

[2024/04] SGLang is used by the official LLaVA-NeXT (video) release (blog).
[2024/02] SGLang enables 3x faster JSON decoding with compressed finite state machine (blog).
[2024/01] SGLang provides up to 5x faster inference with RadixAttention (blog).
[2024/01] SGLang powers the serving of the official LLaVA v1.6 release demo (usage).

About

SGLang is a fast serving framework for large language models and vision language models. It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language. The core features include:

Fast Backend Runtime: Provides efficient serving with RadixAttention for prefix caching, jump-forward constrained decoding, continuous batching, token attention (paged attention), tensor parallelism, FlashInfer kernels, chunked prefill, and quantization (INT4/FP8/AWQ/GPTQ).
Flexible Frontend Language: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
Extensive Model Support: Supports a wide range of generative models (Llama, Gemma, Mistral, Qwen, DeepSeek, LLaVA, etc.) and embedding models (e5-mistral), with easy extensibility for integrating new models.
Active Community: SGLang is open-source and backed by an active community with industry adoption.
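
To make the prefix-caching idea in the feature list concrete, here is a minimal sketch of a token-level prefix cache. It is a toy illustration and not SGLang's RadixAttention implementation: it uses a plain trie rather than a compressed radix tree, stores no KV tensors, and does no eviction. The names `TrieNode`, `PrefixCache`, `match_prefix`, `insert`, and `tokens_to_prefill` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrieNode:
    """One cached token position; children are keyed by the next token id."""
    children: Dict[int, "TrieNode"] = field(default_factory=dict)


class PrefixCache:
    """Toy prefix cache (hypothetical; a real runtime would also hold KV data)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def match_prefix(self, tokens: List[int]) -> int:
        """Return how many leading tokens of `tokens` are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens: List[int]) -> None:
        """Record the full token sequence so later requests can reuse it."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())


def tokens_to_prefill(cache: PrefixCache, tokens: List[int]) -> int:
    """Count the tokens that still need prefill, then cache the request."""
    reused = cache.match_prefix(tokens)
    cache.insert(tokens)
    return len(tokens) - reused


if __name__ == "__main__":
    cache = PrefixCache()
    system_prompt = [1, 2, 3, 4]  # made-up token ids for a shared prompt
    print(tokens_to_prefill(cache, system_prompt + [10, 11]))  # 6: cold cache
    print(tokens_to_prefill(cache, system_prompt + [20]))      # 1: prefix reused
```

In this sketch the second request shares the four-token prompt with the first, so only its final token needs new computation. That saving is what prefix caching targets when many requests share a system prompt or few-shot examples.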

Install

See https://sgl-project.github.io/start/install.html

Backend: SGLang Runtime (SRT)

See https://sgl-project.github.io/backend/backend.html

Frontend: Structured Generation Language (SGLang)

See https://sgl-project.github.io/frontend/frontend.html

Benchmark And Performance

Learn more in our release blogs: v0.2 blog, v0.3 blog

Roadmap

Development Roadmap (2024 Q4)

Citation And Acknowledgment

Please cite our paper, SGLang: Efficient Execution of Structured Language Model Programs, if you find the project useful. We also learned from the design of, and reused code from, the following projects: Guidance, vLLM, LightLLM, FlashInfer, Outlines, and LMQL.

Folders and files (history: 1,124 commits)

.github
3rdparty/amd
assets
benchmark
docker
docs
examples
python
rust
scripts
test
.gitignore
.gitmodules
.isort.cfg
.pre-commit-config.yaml
LICENSE
README.md
