Fix attempt to read gzip content from responses that cannot contain content #792

@seedifferently seedifferently commented Nov 3, 2023

Colly currently returns an EOF error on responses that contain a Content-Encoding: gzip header, but no body content.

This happens because initializing the gzip reader here immediately attempts to parse gzip header data from the response body, even though the body is empty.

RFC 9110 section 15 states that 1xx, 204, and 304 responses cannot contain content. However, some servers will still send a Content-Encoding header.

This is my best guess at how to instruct colly not to reach for the gzip reader in these scenarios (the non-gzip reader gracefully handles empty bodies).

@WGH-
Collaborator

WGH- commented Nov 9, 2023

I think my #746 solves it in the more general case. The problem is that the PR got stalled on review, as I can't self-approve and merge.

@seedifferently
Author

seedifferently commented Nov 9, 2023

@WGH- thanks, I had not seen that PR. However, if you copy my test case over there, you will see it fail because the Peek() still returns an EOF, so some additional checking might be needed to determine whether the EOF is OK to ignore and move on from.

@WGH-
Collaborator

WGH- commented Nov 9, 2023

That's a good catch. I'll update my PR, and credit you.

@seedifferently
Author

Thanks @WGH-.
Closing this in favor of #746.
