feat: add async support for csv and dataframe methods #56

Merged
merged 14 commits into duneanalytics:main from the issue/54 branch
Jul 14, 2023

Conversation

gosuto-inzasheru
Contributor

@gosuto-inzasheru gosuto-inzasheru commented Jun 4, 2023

opening this as a draft to get some feedback.

  • add async versions of get_result_csv, _refresh, refresh_csv and refresh_into_dataframe
  • reduce diff between client.py and client_async.py, to make future maintenance easier
  • `make test-all` passes 100%

closes #54

@github-actions

github-actions bot commented Jun 4, 2023

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@gosuto-inzasheru
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@gosuto-inzasheru
Contributor Author

i'm a bit out of my depth here.

in the basic client, the requests' response is converted to a binary stream using BytesIO:

return ExecutionResultCSV(data=BytesIO(response.content))

however, in the asyncio variant of our session, the response content is a StreamReader. although it is a stream, pandas' read_csv cannot handle it directly. for now i was able to solve it like this:

return ExecutionResultCSV(data=BytesIO(await response.content.read(-1)))

but is this really the most efficient way to convert the asynchronous response to something pandas can read?
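
A minimal sketch of the read-then-wrap pattern asked about above, using only the standard library so it runs without extra packages. Assumptions: `asyncio.StreamReader` stands in for the session's StreamReader (the thread does not name the HTTP library), `csv.DictReader` stands in for pandas' read_csv (which also accepts a file-like object), and `read_stream_to_buffer` and the sample CSV are hypothetical. Because a synchronous parser cannot await an async stream, reading the whole body once and wrapping it in BytesIO is probably a reasonable approach when results fit in memory.

```python
import asyncio
import csv
from io import BytesIO, TextIOWrapper


async def read_stream_to_buffer(stream: asyncio.StreamReader) -> BytesIO:
    """Drain an async byte stream and wrap the bytes in a seekable buffer."""
    # read(-1) waits for EOF, so the whole payload is held in memory.
    payload = await stream.read(-1)
    return BytesIO(payload)


async def demo() -> list:
    # Hypothetical stand-in for the HTTP response body.
    stream = asyncio.StreamReader()
    stream.feed_data(b"block,tx_count\n1,10\n2,20\n")
    stream.feed_eof()

    buffer = await read_stream_to_buffer(stream)
    # csv.DictReader is used here in place of pandas.read_csv;
    # both consume a synchronous file-like object.
    text = TextIOWrapper(buffer, encoding="utf-8", newline="")
    return list(csv.DictReader(text))


rows = asyncio.run(demo())
print(rows)
```

If the session is aiohttp (an assumption), `await response.read()` should return the same bytes as `await response.content.read(-1)`.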

@gosuto-inzasheru gosuto-inzasheru marked this pull request as ready for review June 4, 2023 07:04
@gosuto-inzasheru
Contributor Author

client.py versus client_async.py: https://www.diffchecker.com/tXrNtPqY/

Collaborator

@bh2smith bh2smith left a comment

Hey this looks excellent, really nice job here. I wish there was some way to reduce the overall duplication between the two clients, but I guess not. I have never actually used the async variant and wonder if anyone out there is...

Anyway, this looks great to me!
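
One partial way to limit the duplication mentioned above, sketched under stated assumptions: keep the transport-independent steps in plain functions and let each client only do the fetching. Every name here (`wrap_csv_payload`, `SyncClientSketch`, `AsyncClientSketch`, the fake fetchers) is hypothetical and not taken from client.py or client_async.py.

```python
import asyncio
from io import BytesIO
from typing import Awaitable, Callable


def wrap_csv_payload(raw: bytes) -> BytesIO:
    """Shared step: turn raw response bytes into a file-like object."""
    return BytesIO(raw)


class SyncClientSketch:
    """Hypothetical sync client; `fetch` returns the response bytes."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch

    def get_result_csv(self, job_id: str) -> BytesIO:
        return wrap_csv_payload(self._fetch(job_id))


class AsyncClientSketch:
    """Hypothetical async client; `fetch` is awaited for the bytes."""

    def __init__(self, fetch: Callable[[str], Awaitable[bytes]]) -> None:
        self._fetch = fetch

    async def get_result_csv(self, job_id: str) -> BytesIO:
        return wrap_csv_payload(await self._fetch(job_id))


def fake_fetch(job_id: str) -> bytes:
    # Stand-in for a blocking HTTP call.
    return b"a,b\n1,2\n"


async def fake_fetch_async(job_id: str) -> bytes:
    # Stand-in for an awaited HTTP call.
    return fake_fetch(job_id)


sync_result = SyncClientSketch(fake_fetch).get_result_csv("job-1")
async_result = asyncio.run(
    AsyncClientSketch(fake_fetch_async).get_result_csv("job-1")
)
```

The method signatures still appear twice, so this only shrinks the duplicated bodies rather than removing the second client.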

@bh2smith bh2smith requested a review from a team June 4, 2023 08:48
@bh2smith
Collaborator

bh2smith commented Jun 4, 2023

I would be willing to merge as is, and we can tag a new release after a few of the other pending/incoming PRs have landed. Just let me know.

@gosuto-inzasheru
Contributor Author

@eliseygusev care to comment on the async aspect here maybe?

@gosuto-inzasheru
Contributor Author

i will resolve conflicts and incorporate the changes made to the client in #53

@bh2smith
Collaborator

bh2smith commented Jun 8, 2023

Were you still planning on resolving conflicts here?

@gosuto-inzasheru
Contributor Author

yes will do, just also have a day job :(

@gosuto-inzasheru
Contributor Author

should be good to merge @bh2smith!

@gosuto-inzasheru
Contributor Author

@bh2smith what is in the way of merging this? i think waiting will just create more conflicts down the road for other prs...

@bh2smith
Collaborator

Ya we can merge this... but the performance tier is still failing. It's fine though (as you explained)

@bh2smith
Collaborator

bh2smith commented Jul 14, 2023

Could you also review #59, @gosuto-inzasheru? Then I can tag a new release.

@bh2smith bh2smith merged commit 914b0d4 into duneanalytics:main Jul 14, 2023
@gosuto-inzasheru gosuto-inzasheru deleted the issue/54 branch July 14, 2023 11:06
@github-actions github-actions bot locked and limited conversation to collaborators Jul 14, 2023

Successfully merging this pull request may close these issues.

AsyncDuneClient does not have get_result_csv method
2 participants