SGLang JSON Generation from Specified Schema (Parallel Requests) #641
Is there a feature built into SGLang that enables generating JSON for different schemas in parallel? For example, if each request consisted of [sample_text, json_schema], could multiple requests be served in parallel, with the responses returned separately in a list or dictionary?
Replies: 1 comment
Yes, it should be easy. You can implement a multithreaded client in Python with https://docs.python.org/3/library/threading.html. Each thread sends its requests to the server, either using the backend directly (https://github.com/sgl-project/sglang?tab=readme-ov-file#backend-sglang-runtime-srt) or using the frontend (https://github.com/sgl-project/sglang?tab=readme-ov-file#json-decoding).
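A rough sketch of what such a client could look like, in case it helps. It uses `concurrent.futures.ThreadPoolExecutor` (a thread pool built on `threading`) rather than raw threads. Everything about the server here is an assumption, not something confirmed in this thread: the URL `http://localhost:30000/generate`, the `text` and `sampling_params` request fields, the `text` response field, and especially the `json_schema` sampling parameter, which may not exist in your SGLang version (older versions may need a `regex` constraint instead). `send_request` and `generate_parallel` are hypothetical helper names.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed address of a locally running SGLang server; adjust to your setup.
SERVER_URL = "http://localhost:30000/generate"


def send_request(sample_text, json_schema):
    """Send one [sample_text, json_schema] pair to the server and return the generated text."""
    payload = {
        "text": sample_text,
        "sampling_params": {
            "max_new_tokens": 256,
            "temperature": 0,
            # Assumption: the server accepts a JSON schema string here.
            # Check the docs for your SGLang version before relying on it.
            "json_schema": json.dumps(json_schema),
        },
    }
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]


def generate_parallel(requests, send=send_request, max_workers=8):
    """Run all (sample_text, json_schema) pairs concurrently.

    Returns the results as a list in the same order as `requests`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(send, text, schema) for text, schema in requests]
        # Collecting in submission order keeps each response aligned with its request.
        return [future.result() for future in futures]
```

Calling `generate_parallel([(text_a, schema_a), (text_b, schema_b)])` would then return a list with one response per request, each constrained by its own schema. The `send` parameter is only there so the network call can be swapped out, for example for the frontend API or for a stub in tests.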