-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add and use post endpoint for streaming #6093
Conversation
c817617
to
94348f6
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #6093 +/- ##
==========================================
- Coverage 76.36% 76.17% -0.20%
==========================================
Files 145 145
Lines 14007 14020 +13
==========================================
- Hits 10697 10680 -17
- Misses 3310 3340 +30
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
359dccb
to
72037b5
Compare
@pytest.mark.asyncio | ||
async def test_issue_6090(): | ||
"""Tests if streaming works with pydantic models with complex fields which are not | ||
str, int, or float. | ||
""" | ||
|
||
class NestedFieldSchema(BaseDoc): | ||
name: str = "test_name" | ||
dict_field: Dict = Field(default_factory=dict) | ||
|
||
class InputWithComplexFields(BaseDoc): | ||
text: str = "test" | ||
nested_field: NestedFieldSchema = Field(default_factory=NestedFieldSchema) | ||
dict_field: Dict = Field(default_factory=dict) | ||
bool_field: bool = False | ||
|
||
class MyExecutor(Executor): | ||
@requests(on="/stream") | ||
async def stream( | ||
self, doc: InputWithComplexFields, parameters: Optional[Dict] = None, **kwargs | ||
) -> InputWithComplexFields: | ||
for i in range(4): | ||
yield InputWithComplexFields(text=f"hello world {doc.text} {i}") | ||
|
||
docs = [] | ||
protocol = "http" | ||
with Deployment(uses=MyExecutor, protocol=protocol, port=11112) as dep: | ||
client = Client(port=11112, protocol=protocol, asyncio=True) | ||
example_doc = InputWithComplexFields(text="my input text") | ||
async for doc in client.stream_doc( | ||
on="/stream", | ||
inputs=example_doc, | ||
input_type=InputWithComplexFields, | ||
return_type=InputWithComplexFields, | ||
): | ||
docs.append(doc) | ||
|
||
assert [d.text for d in docs] == [ | ||
"hello world my input text 0", | ||
"hello world my input text 1", | ||
"hello world my input text 2", | ||
"hello world my input text 3", | ||
] | ||
assert docs[0].nested_field.name == "test_name" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoanFM I added this test to make sure complex Docs can be streamed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how the solution looks, I just have some small comments.
jina/clients/base/helper.py
Outdated
req_dict = doc.dict() | ||
else: | ||
req_dict = doc.to_dict() | ||
req_dict = doc.json() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not do this change, I think the signature of json
may return a str
. Why is there a need to change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to be passed as request_kwargs['data']
rather than request_kwargs['params']
, it needs to be serialized as a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was able to keep this by using json
instead of data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but where does the need for passing as 'str' come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were using data
, data needs to be a string, but since I swithced to json, I am able to use dict again. Here is the current code:
'json': doc.dict() if docarray_v2 else doc.to_dict(),
@JoanFM I addressed the your comments. Please let me know If there's anything else I can do. |
c80d8af
to
3ce6e1e
Compare
Hey @NarekA , Thanks for the contrubution, could yiu add a note in the documentation https://docs.jina.ai/concepts/serving/executor/add-endpoints/#streaming-endpoints making explicit note that SSE can only be used with flat schemas? |
2d6b949
to
60c9b5e
Compare
It looks good to me, but there seems to be a failing test |
@JoanFM I am trying to fix the tests, but I'm having issues recreating CI tests locally, or even seeing what the failure is in CI. (Will go back and read the docs to see what I missed) |
) | ||
start_time = None | ||
async for doc in stream: | ||
start_time = start_time or time.time() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like streaming takes longer to initialize, but has all of the data emitted at once:
Expected delay to be less than 0.5, got 7.867813110351562e-06 on iteration 1
Expected delay to be less than 1.0, got 0.00012993812561035156 on iteration 2
Expected delay to be less than 1.5, got 0.0001659393310546875 on iteration 3
Expected delay to be less than 2.0, got 0.00019884109497070312 on iteration 4
Expected delay to be less than 2.5, got 0.00022912025451660156 on iteration 5
``
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these times do not include a 2.5 second initialization time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the gateway might be breaking streaming.
9f262a7
to
f1f9a88
Compare
Thanks @NarekA for the great contribution |
This reverts commit f2085a9.
Making a new PR that follows conventions to replace #6091
Http streaming breaks when the input doc schema has fields which are not str, int, or float. This includes dicts, bools, and nested objects (See Issue 6090). In addition it caps the input size to 2000 characters (Much less when you factor in URL encoding)
This PR changes the get endpoints to post endpoints. Not sure if we want to keep the get endpoints for backwards compatibility.
Goals: