rfc: Define matching rules for form-urlencoded body #99

Open
tienvx wants to merge 1 commit into master from define-matching-rules-for-form-urlencoded-body

tienvx commented Sep 5, 2024

No description provided.

tienvx force-pushed the define-matching-rules-for-form-urlencoded-body branch from 0fe042c to a7b122c on September 6, 2024 03:42
tienvx force-pushed the define-matching-rules-for-form-urlencoded-body branch from a7b122c to 1d177f2 on September 6, 2024 04:01

JP-Ellis (Contributor) commented Sep 6, 2024

Amazing to see an RFC being suggested!

I haven't had a chance to read it yet, but I want to draw attention to some of the pitfalls of application/x-www-form-urlencoded. We should make sure we abide by the specification (the WHATWG URL Standard), whose parsing algorithm I'll quote here:

  1. Let sequences be the result of splitting input on 0x26 (&).

  2. Let output be an initially empty list of name-value tuples where both name and value hold a string.

  3. For each byte sequence bytes in sequences:

    1. If bytes is the empty byte sequence, then continue.

    2. If bytes contains a 0x3D (=), then let name be the bytes from the start of bytes up to but excluding its first 0x3D (=), and let value be the bytes, if any, after the first 0x3D (=) up to the end of bytes. If 0x3D (=) is the first byte, then name will be the empty byte sequence. If it is the last, then value will be the empty byte sequence.

    3. Otherwise, let name have the value of bytes and let value be the empty byte sequence.

    4. Replace any 0x2B (+) in name and value with 0x20 (SP).

    5. Let nameString and valueString be the result of running UTF-8 decode without BOM on the percent-decoding of name and value, respectively.

    6. Append (nameString, valueString) to output.

  4. Return output.
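As a rough illustration, the quoted steps transcribe almost line for line into Python. This is only a sketch: percent_decode and parse_form_urlencoded are hypothetical names, not part of any Pact library, and Python's errors="replace" is assumed as an approximation of the spec's UTF-8 decode, which substitutes U+FFFD for invalid byte sequences.

```python
def percent_decode(data: bytes) -> bytes:
    """Percent-decode bytes: "%XX" with two hex digits becomes that byte;
    any other "%" is copied through unchanged."""
    hex_digits = b"0123456789abcdefABCDEF"
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if (
            byte == 0x25  # '%'
            and i + 2 < len(data)
            and data[i + 1] in hex_digits
            and data[i + 2] in hex_digits
        ):
            out.append(int(data[i + 1 : i + 3], 16))
            i += 3
        else:
            out.append(byte)
            i += 1
    return bytes(out)


def parse_form_urlencoded(body: bytes) -> list[tuple[str, str]]:
    output: list[tuple[str, str]] = []  # step 2: ordered name-value pairs
    for seq in body.split(b"&"):  # step 1: split on 0x26 (&)
        if not seq:  # step 3.1: skip empty byte sequences
            continue
        # Steps 3.2/3.3: split at the first "="; with no "=", value is empty.
        name, _, value = seq.partition(b"=")
        # Step 3.4: "+" means space (done before percent-decoding, so %2B stays "+").
        name = name.replace(b"+", b" ")
        value = value.replace(b"+", b" ")
        # Steps 3.5/3.6: percent-decode, UTF-8 decode (a BOM is not stripped), append.
        output.append(
            (
                percent_decode(name).decode("utf-8", errors="replace"),
                percent_decode(value).decode("utf-8", errors="replace"),
            )
        )
    return output  # step 4


print(parse_form_urlencoded(b"a=1&a=2&b&=c&x=hello+world%21"))
# [('a', '1'), ('a', '2'), ('b', ''), ('', 'c'), ('x', 'hello world!')]
```

Working on bytes rather than str mirrors the spec, which only decodes to strings after percent-decoding.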

This means we cannot assume the data can be deserialized to a dictionary; at the very least, we have to assume it deserializes to a list[tuple[str, str]]. It may be possible to convert this to a dict[str, str], but only if the keys are unique and their ordering is unimportant.
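To make the list-versus-dictionary point concrete, here is a small sketch using Python's standard library. It assumes urllib.parse.parse_qsl with keep_blank_values=True as a stand-in parser (it behaves like the quoted algorithm for these inputs) and only illustrates the candidate representations; it is not a proposal for how Pact should store them.

```python
from urllib.parse import parse_qs, parse_qsl

body = "a=1&b=x&a=2"

# Spec-shaped result: an ordered list of (name, value) pairs.
pairs = parse_qsl(body, keep_blank_values=True)

# Collapsing into dict[str, str] silently keeps only the last "a".
as_dict = dict(pairs)

# A multimap keeps every value for a key, in order of appearance...
as_multimap = parse_qs(body, keep_blank_values=True)

# ...but per-key order still distinguishes a=1&a=2 from a=2&a=1.
order_matters = parse_qs("a=1&a=2") != parse_qs("a=2&a=1")

print(pairs)          # [('a', '1'), ('b', 'x'), ('a', '2')]
print(as_dict)        # {'a': '2', 'b': 'x'}
print(as_multimap)    # {'a': ['1', '2'], 'b': ['x']}
print(order_matters)  # True
```

Which of these shapes the matching rules operate on seems like a decision the RFC would need to make explicitly.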

The spec also implies that we can only ever have strings: a=1 has a key "a" with value "1". We need to be careful about casting a string to another type, and not just for values but also for keys (e.g. 1=a).
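One way to avoid lossy casts is to check the string form against a pattern rather than converting it. In this sketch, is_integer_string is a hypothetical helper (not an existing Pact matcher), and parse_qsl again stands in for a spec-compliant parser.

```python
import re
from urllib.parse import parse_qsl

# Hypothetical helper (not a Pact API): validate the *string* form against a
# pattern instead of casting it. [0-9] is used rather than \d, because \d also
# matches non-ASCII decimal digits in Python's str regexes.
INTEGER_PATTERN = re.compile(r"-?[0-9]+")


def is_integer_string(value: str) -> bool:
    return INTEGER_PATTERN.fullmatch(value) is not None


pairs = parse_qsl("a=1&1=a&b=01&c=1.0", keep_blank_values=True)
print(pairs)  # [('a', '1'), ('1', 'a'), ('b', '01'), ('c', '1.0')]

# Both names and values are always str, including the key "1".
print(all(isinstance(n, str) and isinstance(v, str) for n, v in pairs))  # True

# Casting erases information: "1" and "01" both become the integer 1.
print(int("1") == int("01"))  # True

# Matching on the string keeps the distinction and rejects "1.0".
print([name for name, value in pairs if is_integer_string(value)])  # ['a', 'b']
```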

We need to handle pathological cases like:

  • keys with no values: a=&b=&c=
  • values with no keys: =a&=b&=c
  • repeated keys: a=1&a=2 vs a=2&a=1
  • support for / and ?: a=?&b=/ (yes, these are valid, even in URLs: https://example.com/foo?a=?&b=/&c=123 is valid, as per §3.4 of RFC 3986)
  • random things like:
    • &&&
    • ===
    • =&=&=
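The cases above can be pinned down as a table of expected parses. This sketch assumes Python's urllib.parse.parse_qsl as a stand-in for a spec-compliant parser; note that it drops empty values unless keep_blank_values=True is passed, which is exactly the kind of default that would break the first case. It also uses urlsplit to confirm that the example URL's query string keeps its ? and /.

```python
from urllib.parse import parse_qsl, urlsplit

# Expected parses for the pathological cases, per the quoted algorithm.
CASES = {
    "a=&b=&c=": [("a", ""), ("b", ""), ("c", "")],
    "=a&=b&=c": [("", "a"), ("", "b"), ("", "c")],
    "a=1&a=2": [("a", "1"), ("a", "2")],
    "a=2&a=1": [("a", "2"), ("a", "1")],
    "a=?&b=/": [("a", "?"), ("b", "/")],
    "&&&": [],
    "===": [("", "==")],
    "=&=&=": [("", ""), ("", ""), ("", "")],
}

for body, expected in CASES.items():
    actual = parse_qsl(body, keep_blank_values=True)
    assert actual == expected, (body, actual)
    print(f"{body!r:>12} -> {actual}")

# Without keep_blank_values=True, parse_qsl drops empty values entirely.
assert parse_qsl("a=&b=&c=") == []

# "?" and "/" are legal inside a URL query (RFC 3986, section 3.4).
query = urlsplit("https://example.com/foo?a=?&b=/&c=123").query
assert query == "a=?&b=/&c=123"
assert parse_qsl(query) == [("a", "?"), ("b", "/"), ("c", "123")]
```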

Anyway, I'll take a better look at it in the next few days :)

YOU54F (Member) commented Sep 11, 2024

👋🏾 Hey @tienvx, will take a read next week as I am stacked out this week, but thank you for raising and supporting the RFC process 👍🏾

Labels: None yet
Projects: None yet

3 participants