Speed up processing junk before or after first or last boundary #178

Closed. Wants to merge 2 commits.
python_multipart/multipart.py: 15 changes (11 additions, 4 deletions)
```diff
@@ -1102,10 +1102,17 @@ def data_callback(name: CallbackName, end_i: int, remaining: bool = False) -> No
 c = data[i]
 
 if state == MultipartState.START:
+    # Stop parsing if there is no boundary within the first chunk
+    if i == 16:
```
Owner:

Why 16?

Contributor Author:

Yeah, I'm not happy about that either. Two line breaks should be allowed; that's not that uncommon. More are uncommon, but who knows? It's just an arbitrary number between 2 and "too much to waste time on it". Maybe ignore the second commit for now and just remove the log lines?

defnull (Contributor Author), Oct 28, 2024:

The comment is lying: I was first checking for `i == length-1`, but that broke tests. This needs more work, I guess.

```diff
+        msg = "Too much junk in front of first boundary (%d)" % (i,)
+        self.logger.warning(msg)
+        e = MultipartParseError(msg)
+        e.offset = i
+        raise e
+
     # Skip leading newlines
     if c == CR or c == LF:
         i += 1
-        self.logger.debug("Skipping leading CR/LF at %d", i)
         continue
 
     # index is used as in index into our boundary. Set to 0.
```
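Assuming, as the surrounding code suggests, that `i` is an offset into the `data` chunk handed to the current write call, the `i == 16` check only fires when 16 or more leading CR/LF bytes arrive in a single chunk; a body fed in chunks of 16 bytes or fewer restarts the index on every call and never reaches it. The same per-chunk dependence is one plausible reason the earlier `i == length-1` attempt broke tests: a one-byte chunk satisfies it on its very first byte. Below is a minimal sketch of a chunk-independent alternative, assuming the count is kept on the parser between writes. `LeadingNewlineGuard`, `max_leading` and `feed()` are hypothetical names, not python-multipart API, and `ValueError` stands in for `MultipartParseError`.

```python
CR = ord("\r")
LF = ord("\n")


class LeadingNewlineGuard:
    """Hypothetical helper (not part of python-multipart): counts CR/LF bytes
    seen before the first boundary across every chunk passed to feed()."""

    def __init__(self, max_leading: int = 16) -> None:
        # Reuses the PR's arbitrary 16 as a default; the right value is undecided.
        self.max_leading = max_leading
        self.skipped = 0    # leading CR/LF bytes consumed so far, across chunks
        self.done = False   # True once the first non-CR/LF byte has been seen

    def feed(self, data: bytes) -> int:
        """Return the offset in `data` where boundary matching should begin,
        or len(data) if the whole chunk was leading CR/LF."""
        if self.done:
            return 0
        for i, c in enumerate(data):
            if c not in (CR, LF):
                self.done = True
                return i
            self.skipped += 1
            if self.skipped > self.max_leading:
                # ValueError stands in for MultipartParseError in this sketch.
                raise ValueError(
                    "Too much junk in front of first boundary (%d)" % self.skipped
                )
        return len(data)


if __name__ == "__main__":
    guard = LeadingNewlineGuard(max_leading=4)
    # Feeding one byte per call: a per-chunk index would stay at 0,
    # but the running count still reaches the limit.
    try:
        for b in b"\r\n\r\n\r\n--boundary":
            guard.feed(bytes([b]))
    except ValueError as exc:
        print(exc)  # prints: Too much junk in front of first boundary (5)
```

Keeping the counter on the object rather than deriving it from `i` makes the limit mean "leading CR/LF bytes in the whole body", regardless of how the caller splits the input.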
```diff
@@ -1398,9 +1405,9 @@ def data_callback(name: CallbackName, end_i: int, remaining: bool = False) -> No
         i -= 1
 
 elif state == MultipartState.END:
-    # Do nothing and just consume a byte in the end state.
-    if c not in (CR, LF):
-        self.logger.warning("Consuming a byte '0x%x' in the end state", c) # pragma: no cover
+    # Skip junk after the last boundary
+    i = length
+    break
 
 else: # pragma: no cover (error case)
     # We got into a strange state somehow! Just stop processing.
```
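The second hunk changes the END state from consuming one trailing byte per loop pass (logging a warning for every byte other than CR/LF) to moving the index straight to `length` and leaving the loop. Below is a simplified model of the two behaviours, assuming a chunk that already holds the closing boundary followed by junk. The function names are hypothetical, the rest of the parser's state machine is left out, and a counter stands in for the removed `self.logger.warning` call.

```python
CR = ord("\r")
LF = ord("\n")


def end_state_per_byte(data: bytes, i: int) -> tuple[int, int, int]:
    """Old END-state behaviour, simplified: one loop pass per trailing byte.
    Returns (final index, loop passes, warnings that would have been logged)."""
    passes = warnings = 0
    while i < len(data):
        passes += 1
        if data[i] not in (CR, LF):
            warnings += 1  # stands in for self.logger.warning(...)
        i += 1
    return i, passes, warnings


def end_state_jump(data: bytes, i: int) -> tuple[int, int, int]:
    """New END-state behaviour, simplified: the first pass in END state
    jumps the index to the end of the chunk and leaves the loop."""
    passes = 0
    while i < len(data):
        passes += 1
        i = len(data)  # skip junk after the last boundary
        break
    return i, passes, 0


if __name__ == "__main__":
    body = b"--boundary--\r\n" + b"x" * 1000
    start = len(b"--boundary--")  # index just past the closing boundary
    print(end_state_per_byte(body, start))  # (1014, 1002, 1000)
    print(end_state_jump(body, start))      # (1014, 1, 0)
```

With 1,000 junk bytes after the closing boundary, the old path runs 1,002 loop passes and would emit 1,000 warnings, while the new path runs one. The trade-off is that trailing junk is now dropped silently instead of being reported.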