rfc: Define matching rules for form-urlencoded body #99

tienvx · 2024-09-05T21:03:17Z

No description provided.

JP-Ellis · 2024-09-06T09:02:59Z

Amazing to suggest an RFC!

I haven't had a chance to read over it yet, but I just want to draw attention to some of the pitfalls of application/x-www-form-urlencoded. We should make sure we abide by the specification, which I'll quote here for the parsing algorithm:

1. Let sequences be the result of splitting input on 0x26 (&).
2. Let output be an initially empty list of name-value tuples where both name and value hold a string.
3. For each byte sequence bytes in sequences:
   1. If bytes is the empty byte sequence, then continue.
   2. If bytes contains a 0x3D (=), then let name be the bytes from the start of bytes up to but excluding its first 0x3D (=), and let value be the bytes, if any, after the first 0x3D (=) up to the end of bytes. If 0x3D (=) is the first byte, then name will be the empty byte sequence. If it is the last, then value will be the empty byte sequence.
   3. Otherwise, let name have the value of bytes and let value be the empty byte sequence.
   4. Replace any 0x2B (+) in name and value with 0x20 (SP).
   5. Let nameString and valueString be the result of running UTF-8 decode without BOM on the percent-decoding of name and value, respectively.
   6. Append (nameString, valueString) to output.
4. Return output.
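For illustration, the quoted steps can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `parse_urlencoded` is hypothetical, and `errors="replace"` is used as an approximation of the spec's "UTF-8 decode without BOM" (invalid bytes become U+FFFD).

```python
from urllib.parse import unquote_to_bytes


def parse_urlencoded(data: bytes) -> list[tuple[str, str]]:
    """Parse a form-urlencoded body into an ordered list of (name, value) pairs."""
    output: list[tuple[str, str]] = []
    for seq in data.split(b"&"):  # split input on 0x26 (&)
        if not seq:  # skip empty sequences, e.g. those produced by "&&"
            continue
        # Split on the first 0x3D (=) only; with no "=", value is empty.
        name, _, value = seq.partition(b"=")
        # Replace 0x2B (+) with 0x20 (SP) before percent-decoding.
        name = name.replace(b"+", b" ")
        value = value.replace(b"+", b" ")
        # Percent-decode, then UTF-8 decode (assumption: replace invalid bytes).
        name_str = unquote_to_bytes(name).decode("utf-8", errors="replace")
        value_str = unquote_to_bytes(value).decode("utf-8", errors="replace")
        output.append((name_str, value_str))
    return output


print(parse_urlencoded(b"a=1&b=hello+world&c=%2F&d"))
# [('a', '1'), ('b', 'hello world'), ('c', '/'), ('d', '')]
```

Note that the return type is a list of pairs, so order and duplicates are preserved exactly as they appear in the input.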

This means that we cannot assume the data can be deserialized to a dictionary; at the very least, we have to assume the data is deserialized to a list[tuple[str, str]]. It may be possible to convert this to dict[str, str], but that assumes keys are unique and that key ordering is unimportant.
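To make the loss of information concrete, here is a small sketch using Python's standard `urllib.parse` as a stand-in parser (an assumption for illustration only; it is not necessarily what an implementation of this RFC would use).

```python
from urllib.parse import parse_qs, parse_qsl

first = parse_qsl("a=1&a=2", keep_blank_values=True)
second = parse_qsl("a=2&a=1", keep_blank_values=True)

# As ordered pairs, the two bodies remain distinguishable.
assert first == [("a", "1"), ("a", "2")]
assert second == [("a", "2"), ("a", "1")]

# As a plain dict[str, str], only the last value per key survives.
assert dict(first) == {"a": "2"}
assert dict(second) == {"a": "1"}

# Grouping into dict[str, list[str]] keeps duplicates and their relative
# order, but no longer records how different keys were interleaved.
grouped = parse_qs("a=1&b=x&a=2", keep_blank_values=True)
assert grouped == {"a": ["1", "2"], "b": ["x"]}
```

Whether repeated keys and their order should matter for matching is a design decision for the RFC; the sketch only shows what each representation can and cannot express.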

The spec also implies that we can only ever have strings: a=1 has the key "a" with the value "1". We need to be careful about casting a string to another type (not just for values, but also for keys, as in 1=a).
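A short sketch of why casting needs care, again using Python's `urllib.parse.parse_qsl` purely as an illustrative parser. The body `a=1&1=a&id=007` is a made-up example.

```python
from urllib.parse import parse_qsl

pairs = parse_qsl("a=1&1=a&id=007", keep_blank_values=True)

# Every name and every value comes back as a string, including "1" as a key.
assert pairs == [("a", "1"), ("1", "a"), ("id", "007")]
assert all(isinstance(k, str) and isinstance(v, str) for k, v in pairs)

# Casting is lossy: "007" and "7" are different strings but the same integer.
assert int("007") == int("7")
assert str(int("007")) != "007"
```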

We need to handle pathological cases like:

- keys with no values: a=&b=&c=
- values with no keys: =a&=b&=c
- repeated keys: a=1&a=2 vs a=2&a=1
- support for / and ?: a=?&b=/ (yes, these are valid, even in URLs: https://example.com/foo?a=?&b=/&c=123 is valid, as per §3.4 of RFC 3986)
- random things like:
  - &&&
  - ===
  - =&=&=
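As a rough check of what these inputs parse to, the sketch below runs each case through Python's `urllib.parse.parse_qsl` with `keep_blank_values=True`. This is an illustrative stand-in, not the parser the RFC would mandate; the expected values follow from applying the quoted algorithm by hand.

```python
from urllib.parse import parse_qsl

# Each body maps to the ordered (name, value) pairs it should produce.
cases = {
    "a=&b=&c=": [("a", ""), ("b", ""), ("c", "")],   # keys with no values
    "=a&=b&=c": [("", "a"), ("", "b"), ("", "c")],   # values with no keys
    "a=1&a=2": [("a", "1"), ("a", "2")],             # repeated keys, order kept
    "a=2&a=1": [("a", "2"), ("a", "1")],
    "a=?&b=/": [("a", "?"), ("b", "/")],             # ? and / pass through
    "&&&": [],                                       # only empty sequences
    "===": [("", "==")],                             # split on the first = only
    "=&=&=": [("", ""), ("", ""), ("", "")],         # empty name and value
}

for body, expected in cases.items():
    assert parse_qsl(body, keep_blank_values=True) == expected, body

# With the default keep_blank_values=False, pairs with empty values are dropped.
assert parse_qsl("a=&b=&c=") == []
```

The last assertion is worth noting: a parser's defaults can silently discard pairs that the specification says should be kept.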

Anyway, I'll take a better look at it in the next few days :)

YOU54F · 2024-09-11T13:00:07Z

👋🏾 Hey @tienvx, will take a read next week as I am stacked out this week, but thank you for raising and supporting the RFC process 👍🏾

tienvx force-pushed the define-matching-rules-for-form-urlencoded-body branch from 0fe042c to a7b122c Compare September 6, 2024 03:42

rfc: Define matching rules for form-urlencoded body

1d177f2

tienvx force-pushed the define-matching-rules-for-form-urlencoded-body branch from a7b122c to 1d177f2 Compare September 6, 2024 04:01

YOU54F requested review from rholshausen, mefellows, JP-Ellis and YOU54F October 2, 2024 10:23
