Suggestions on Pipeline_webserver (huggingface#25570)
* Suggestions on Pipeline_webserver

docs: reorder the warning tip for pseudo-code

Co-Authored-By: Wonhyeong Seo <[email protected]>

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/ko/pipeline_webserver.md

Co-authored-by: Wonhyeong Seo <[email protected]>

---------

Co-authored-by: Wonhyeong Seo <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
3 people authored Aug 18, 2023
1 parent 659ab04 commit 08e3251
Showing 2 changed files with 14 additions and 10 deletions.
13 changes: 8 additions & 5 deletions docs/source/en/pipeline_webserver.md
@@ -87,6 +87,13 @@
of the model on the webserver. This way, no unnecessary RAM is being used.
Then the queuing mechanism allows you to do fancy stuff like maybe accumulating a few
items before inferring to use dynamic batching:

+<Tip warning={true}>
+
+The code sample below is intentionally written like pseudo-code for readability.
+Do not run this without checking if it makes sense for your system resources!
+
+</Tip>

```py
(string, rq) = await q.get()
strings = []
```

@@ -104,11 +111,7 @@

```py
for rq, out in zip(queues, outs):
    await rq.put(out)
```

-<Tip warning={true}>
-Do not activate this without checking it makes sense for your load!
-</Tip>
-
-The proposed code is optimized for readability, not for being the best code.
+Again, the proposed code is optimized for readability, not for being the best code.
First of all, there's no batch size limit which is usually not a
great idea. Next, the timeout is reset on every queue fetch, meaning you could
wait much more than 1ms before running the inference (delaying the first request by that much).
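
To make the critique above concrete, here is a hedged sketch of a batching loop that caps the batch size and fills each batch against a fixed deadline, rather than resetting the timeout on every fetch. The names `q` (the request queue) and `pipe` (the pipeline) follow the snippet in the diff; `MAX_BATCH_SIZE` and `MAX_WAIT_S` are illustrative tuning knobs, not values from this commit:

```py
import asyncio

MAX_BATCH_SIZE = 32  # assumed cap; tune for your model and GPU memory
MAX_WAIT_S = 0.001   # fixed overall deadline for filling a batch (1 ms)


async def batch_loop(q, pipe):
    while True:
        # Block until at least one request arrives.
        (string, rq) = await q.get()
        strings, queues = [string], [rq]
        loop = asyncio.get_running_loop()
        deadline = loop.time() + MAX_WAIT_S
        # Fill the batch up to the cap or until the fixed deadline expires.
        # The deadline is NOT reset per fetch, so the first request waits
        # at most MAX_WAIT_S before inference starts.
        while len(strings) < MAX_BATCH_SIZE:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            try:
                (string, rq) = await asyncio.wait_for(q.get(), timeout=remaining)
            except asyncio.TimeoutError:
                break
            strings.append(string)
            queues.append(rq)
        outs = pipe(strings, batch_size=len(strings))
        for rq, out in zip(queues, outs):
            await rq.put(out)
```

With a fixed deadline and a size cap, the first request waits at most `MAX_WAIT_S` no matter how quickly new items trickle in, which addresses the two issues the docs call out.
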
11 changes: 6 additions & 5 deletions docs/source/ko/pipeline_webserver.md
@@ -74,6 +74,11 @@
curl -X POST -d "test [MASK]" http://localhost:8000/
The important point is that the model is loaded only **once**, so the webserver holds no copies of the model and no unnecessary RAM is used. The queuing mechanism then lets you do fancy things like accumulating a few items before the inference step to use dynamic batching:

+<Tip warning={true}>
+The code is intentionally written like pseudo-code for readability!
+Make sure your system resources are sufficient before running the code below!
+</Tip>

```py
(string, rq) = await q.get()
strings = []
```

@@ -91,11 +96,7 @@

```py
for rq, out in zip(queues, outs):
    await rq.put(out)
```

-<Tip warning={true}>
-Make sure your system resources are sufficient before running the code above!
-</Tip>
-
-The proposed code is optimized for readability, not for being the best code.
+Again, the proposed code is optimized for readability, not for being the best code.
First, there is no batch size limit, which is usually not a good idea.
Second, the timeout is reset on every queue fetch, so you could wait much longer than 1ms before running the inference (delaying the first request by that much).

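For context, both files document the same pattern: a single consumer task owns the pipeline and drains a queue that the HTTP handlers feed, which is why the model is loaded only once. A minimal sketch of that wiring, assuming a Starlette app and the `(string, rq)` protocol used in the snippets above (handler names and the model choice are illustrative, not taken from this commit):

```py
import asyncio

from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route
from transformers import pipeline


async def homepage(request):
    payload = await request.body()
    string = payload.decode("utf-8")
    response_q = asyncio.Queue()  # per-request response queue (the `rq` above)
    await request.app.model_queue.put((string, response_q))
    output = await response_q.get()
    return JSONResponse(output)


async def server_loop(q):
    # The model is loaded once, inside the single consumer task.
    pipe = pipeline(model="bert-base-uncased")
    while True:
        (string, rq) = await q.get()
        out = pipe(string)
        await rq.put(out)


app = Starlette(routes=[Route("/", homepage, methods=["POST"])])


@app.on_event("startup")
async def startup_event():
    app.model_queue = asyncio.Queue()
    asyncio.create_task(server_loop(app.model_queue))
```

Served with, e.g., `uvicorn server:app`, this responds to the `curl` command shown in the hunk header above.
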
