Suggestions on Pipeline_webserver (huggingface#25570)
* Suggestions on Pipeline_webserver

docs: reorder the warning tip for pseudo-code

Co-Authored-By: Wonhyeong Seo <[email protected]>

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/ko/pipeline_webserver.md

Co-authored-by: Wonhyeong Seo <[email protected]>

---------

Co-authored-by: Wonhyeong Seo <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
3 people authored Aug 18, 2023
1 parent 659ab04 commit 08e3251
Showing 2 changed files with 14 additions and 10 deletions.
13 changes: 8 additions & 5 deletions docs/source/en/pipeline_webserver.md
@@ -87,6 +87,13 @@
of the model on the webserver. This way, no unnecessary RAM is being used.
Then the queuing mechanism allows you to do fancy stuff like maybe accumulating a few
items before inferring to use dynamic batching:

+<Tip warning={true}>
+
+The code sample below is intentionally written like pseudo-code for readability.
+Do not run this without checking if it makes sense for your system resources!
+
+</Tip>

```py
(string, rq) = await q.get()
strings = []
```

@@ -104,11 +111,7 @@

```py
for rq, out in zip(queues, outs):
    await rq.put(out)
```

-<Tip warning={true}>
-Do not activate this without checking it makes sense for your load!
-</Tip>
-
-The proposed code is optimized for readability, not for being the best code.
+Again, the proposed code is optimized for readability, not for being the best code.
First of all, there's no batch size limit which is usually not a
great idea. Next, the timeout is reset on every queue fetch, meaning you could
wait much more than 1ms before running the inference (delaying the first request by that much).
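
To make the critique above concrete, here is a hedged sketch of a batching loop that caps the batch size and fills each batch against a fixed deadline, rather than resetting the timeout on every fetch. The names `q` (the request queue) and `pipe` (the pipeline) follow the snippet in the diff; `MAX_BATCH_SIZE` and `MAX_WAIT_S` are illustrative tuning knobs, not values from this commit:

```py
import asyncio

MAX_BATCH_SIZE = 32  # assumed cap; tune for your model and GPU memory
MAX_WAIT_S = 0.001   # fixed overall deadline for filling a batch (1 ms)


async def batch_loop(q, pipe):
    while True:
        # Block until at least one request arrives.
        (string, rq) = await q.get()
        strings, queues = [string], [rq]
        loop = asyncio.get_running_loop()
        deadline = loop.time() + MAX_WAIT_S
        # Fill the batch up to the cap or until the fixed deadline expires.
        # The deadline is NOT reset per fetch, so the first request waits
        # at most MAX_WAIT_S before inference starts.
        while len(strings) < MAX_BATCH_SIZE:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            try:
                (string, rq) = await asyncio.wait_for(q.get(), timeout=remaining)
            except asyncio.TimeoutError:
                break
            strings.append(string)
            queues.append(rq)
        outs = pipe(strings, batch_size=len(strings))
        for rq, out in zip(queues, outs):
            await rq.put(out)
```

With a fixed deadline and a size cap, the first request waits at most `MAX_WAIT_S` no matter how quickly new items trickle in, which addresses the two issues the docs call out.
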
11 changes: 6 additions & 5 deletions docs/source/ko/pipeline_webserver.md
@@ -74,6 +74,11 @@
curl -X POST -d "test [MASK]" http://localhost:8000/
The important point is that the model is loaded only **once**, so the webserver holds no copies of the model and no unnecessary RAM is used. The queuing mechanism then lets you do fancy things like accumulating a few items before the inference step to use dynamic batching:

+<Tip warning={true}>
+The code is intentionally written like pseudo-code for readability!
+Make sure your system resources are sufficient before running the code below!
+</Tip>

```py
(string, rq) = await q.get()
strings = []
```

@@ -91,11 +96,7 @@

```py
for rq, out in zip(queues, outs):
    await rq.put(out)
```

-<Tip warning={true}>
-Make sure your system resources are sufficient before running the code above!
-</Tip>
-
-The proposed code is optimized for readability, not for being the best code.
+Again, the proposed code is optimized for readability, not for being the best code.
First, there is no batch size limit, which is usually not a good idea.
Second, the timeout is reset on every queue fetch, so you could wait much longer than 1ms before running the inference (delaying the first request by that much).

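For context, both files document the same pattern: a single consumer task owns the pipeline and drains a queue that the HTTP handlers feed, which is why the model is loaded only once. A minimal sketch of that wiring, assuming a Starlette app and the `(string, rq)` protocol used in the snippets above (handler names and the model choice are illustrative, not taken from this commit):

```py
import asyncio

from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route
from transformers import pipeline


async def homepage(request):
    payload = await request.body()
    string = payload.decode("utf-8")
    response_q = asyncio.Queue()  # per-request response queue (the `rq` above)
    await request.app.model_queue.put((string, response_q))
    output = await response_q.get()
    return JSONResponse(output)


async def server_loop(q):
    # The model is loaded once, inside the single consumer task.
    pipe = pipeline(model="bert-base-uncased")
    while True:
        (string, rq) = await q.get()
        out = pipe(string)
        await rq.put(out)


app = Starlette(routes=[Route("/", homepage, methods=["POST"])])


@app.on_event("startup")
async def startup_event():
    app.model_queue = asyncio.Queue()
    asyncio.create_task(server_loop(app.model_queue))
```

Served with, e.g., `uvicorn server:app`, this responds to the `curl` command shown in the hunk header above.
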
