SGLang JSON Generation from Specified Schema (Parallel Requests) #641

Closed Answered by merrymercy

velocity33 asked this question in Q&A

velocity33
Jul 17, 2024

Does SGLang have a built-in feature for generating JSON against different schemas in parallel? For example, if each request consists of [sample_text, json_schema], could multiple requests be served in parallel, with the responses returned in a list or dictionary?

Replies: 1 comment

merrymercy
Jul 27, 2024
Maintainer

Yes, this should be easy. You can implement a multithreaded client in Python with https://docs.python.org/3/library/threading.html. Each thread sends its requests to the server, either directly through the backend https://github.com/sgl-project/sglang?tab=readme-ov-file#backend-sglang-runtime-srt or through the frontend https://github.com/sgl-project/sglang?tab=readme-ov-file#json-decoding.
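
For illustration, here is a rough sketch of such a client. It is not code from the SGLang docs, and several things in it are assumptions: the server address `http://localhost:30000/generate`, the payload layout, and in particular the `json_schema` key inside `sampling_params`, which may not exist in every SGLang version (check the README section linked above for the constrained-decoding options your version supports). The helper names `post_generate`, `build_payload`, and `generate_all` are made up for this example. It uses `concurrent.futures.ThreadPoolExecutor`, which manages `threading` threads for you, rather than creating threads by hand.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: an SGLang server is listening at this address; adjust as needed.
SERVER_URL = "http://localhost:30000/generate"


def post_generate(payload, url=SERVER_URL, timeout=60):
    """POST one JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def build_payload(sample_text, json_schema):
    """Build one request body.

    Assumption: the field names below (especially "json_schema") match
    your SGLang version; verify them before relying on this.
    """
    return {
        "text": sample_text,
        "sampling_params": {
            "max_new_tokens": 256,
            "json_schema": json.dumps(json_schema),
        },
    }


def generate_all(requests, send=post_generate, max_workers=8):
    """Send every (sample_text, json_schema) pair from a thread pool.

    Returns a list of responses in the same order as `requests`.
    """
    payloads = [build_payload(text, schema) for text, schema in requests]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, even if they finish out of order.
        return list(pool.map(send, payloads))
```

With a server running, `generate_all([(text_a, schema_a), (text_b, schema_b)])` would give you one response per request in a list. If a dictionary is more convenient, you can zip the results with your own request IDs.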

Answer selected by merrymercy
