Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add and use post endpoint for streaming #6093

Merged
merged 16 commits into from
Oct 25, 2023

Conversation

NarekA
Copy link
Contributor

@NarekA NarekA commented Oct 20, 2023

Making a new PR that follows conventions to replace #6091

Http streaming breaks when the input doc schema has fields which are not str, int, or float. This includes dicts, bools, and nested objects (See Issue 6090). In addition it caps the input size to 2000 characters (Much less when you factor in URL encoding)

This PR changes the get endpoints to post endpoints. Not sure if we want to keep the get endpoints for backwards compatibility.

Goals:

  • resolves Issue 6090
  • check and update documentation. See guide and ask the team.

@NarekA NarekA force-pushed the fix-streaming-6091 branch from c817617 to 94348f6 Compare October 20, 2023 21:16
@codecov
Copy link

codecov bot commented Oct 20, 2023

Codecov Report

Attention: 16 lines in your changes are missing coverage. Please review.

Comparison is base (2c3d7d3) 76.36% compared to head (f1f9a88) 76.17%.
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6093      +/-   ##
==========================================
- Coverage   76.36%   76.17%   -0.20%     
==========================================
  Files         145      145              
  Lines       14007    14020      +13     
==========================================
- Hits        10697    10680      -17     
- Misses       3310     3340      +30     
Flag Coverage Δ
jina 76.17% <38.46%> (-0.20%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
jina/__init__.py 56.00% <100.00%> (ø)
jina/clients/base/helper.py 82.94% <100.00%> (-2.21%) ⬇️
jina/serve/runtimes/worker/http_fastapi_app.py 69.42% <61.53%> (-12.73%) ⬇️
...ve/runtimes/gateway/http_fastapi_app_docarrayv2.py 0.00% <0.00%> (ø)

... and 10 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@NarekA NarekA mentioned this pull request Oct 20, 2023
1 task
@NarekA NarekA force-pushed the fix-streaming-6091 branch from 359dccb to 72037b5 Compare October 20, 2023 23:38
Comment on lines 99 to 142
@pytest.mark.asyncio
async def test_issue_6090():
"""Tests if streaming works with pydantic models with complex fields which are not
str, int, or float.
"""

class NestedFieldSchema(BaseDoc):
name: str = "test_name"
dict_field: Dict = Field(default_factory=dict)

class InputWithComplexFields(BaseDoc):
text: str = "test"
nested_field: NestedFieldSchema = Field(default_factory=NestedFieldSchema)
dict_field: Dict = Field(default_factory=dict)
bool_field: bool = False

class MyExecutor(Executor):
@requests(on="/stream")
async def stream(
self, doc: InputWithComplexFields, parameters: Optional[Dict] = None, **kwargs
) -> InputWithComplexFields:
for i in range(4):
yield InputWithComplexFields(text=f"hello world {doc.text} {i}")

docs = []
protocol = "http"
with Deployment(uses=MyExecutor, protocol=protocol, port=11112) as dep:
client = Client(port=11112, protocol=protocol, asyncio=True)
example_doc = InputWithComplexFields(text="my input text")
async for doc in client.stream_doc(
on="/stream",
inputs=example_doc,
input_type=InputWithComplexFields,
return_type=InputWithComplexFields,
):
docs.append(doc)

assert [d.text for d in docs] == [
"hello world my input text 0",
"hello world my input text 1",
"hello world my input text 2",
"hello world my input text 3",
]
assert docs[0].nested_field.name == "test_name"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoanFM I added this test to make sure complex Docs can be streamed

Copy link
Member

@JoanFM JoanFM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how the solution looks, I just have some small comments.

jina/serve/runtimes/worker/http_fastapi_app.py Outdated Show resolved Hide resolved
req_dict = doc.dict()
else:
req_dict = doc.to_dict()
req_dict = doc.json()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not do this change, I think the signature of json may return a str. Why is there a need to change it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to be passed as request_kwargs['data'] rather than request_kwargs['params'], it needs to be serialized as a string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was able to keep this by using json instead of data

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but where does the need for passing as 'str' come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were using data, data needs to be a string, but since I swithced to json, I am able to use dict again. Here is the current code:

 'json': doc.dict() if docarray_v2 else doc.to_dict(),

@NarekA
Copy link
Contributor Author

NarekA commented Oct 24, 2023

@JoanFM I addressed the your comments. Please let me know If there's anything else I can do.

@NarekA NarekA force-pushed the fix-streaming-6091 branch from c80d8af to 3ce6e1e Compare October 24, 2023 16:48
@JoanFM
Copy link
Member

JoanFM commented Oct 24, 2023

Hey @NarekA ,

Thanks for the contrubution, could yiu add a note in the documentation https://docs.jina.ai/concepts/serving/executor/add-endpoints/#streaming-endpoints making explicit note that SSE can only be used with flat schemas?

@NarekA NarekA force-pushed the fix-streaming-6091 branch from 2d6b949 to 60c9b5e Compare October 24, 2023 17:59
@NarekA NarekA requested a review from JoanFM October 24, 2023 18:31
@JoanFM
Copy link
Member

JoanFM commented Oct 24, 2023

@JoanFM I addressed the your comments. Please let me know If there's anything else I can do.

It looks good to me, but there seems to be a failing test

@NarekA
Copy link
Contributor Author

NarekA commented Oct 24, 2023

@JoanFM I am trying to fix the tests, but I'm having issues recreating CI tests locally, or even seeing what the failure is in CI. (Will go back and read the docs to see what I missed)

)
start_time = None
async for doc in stream:
start_time = start_time or time.time()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like streaming takes longer to initialize, but has all of the data emitted at once:

Expected delay to be less than 0.5, got 7.867813110351562e-06 on iteration 1
Expected delay to be less than 1.0, got 0.00012993812561035156 on iteration 2
Expected delay to be less than 1.5, got 0.0001659393310546875 on iteration 3
Expected delay to be less than 2.0, got 0.00019884109497070312 on iteration 4
Expected delay to be less than 2.5, got 0.00022912025451660156 on iteration 5
``

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these times do not include a 2.5 second initialization time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the gateway might be breaking streaming.

@NarekA NarekA force-pushed the fix-streaming-6091 branch from 9f262a7 to f1f9a88 Compare October 24, 2023 23:16
@JoanFM JoanFM changed the title fix: Add and use post endpoint for streaming fix: add and use post endpoint for streaming Oct 25, 2023
@JoanFM JoanFM merged commit f2085a9 into jina-ai:master Oct 25, 2023
125 of 130 checks passed
@JoanFM
Copy link
Member

JoanFM commented Oct 25, 2023

Thanks @NarekA for the great contribution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Streaming breaks for HTTP client when fields are not str, int, or float
2 participants