Support charset=utf-8 in Content-Type header #230

Open
wants to merge 1 commit into base: main

@fedorfo (Contributor) commented Oct 18, 2023

No description provided.

@fedorfo fedorfo requested a review from a team as a code owner October 18, 2023 11:38
@@ -37,7 +37,7 @@ class Header:
collections.abc.Mapping[str | multidict.istr, str] | multidict.CIMultiDictProxy[str] | multidict.CIMultiDict[str]
)

- json_re = re.compile(r"^application/(?:[\w.+-]+?\+)?json", re.RegexFlag.IGNORECASE)
+ json_re = re.compile(r"^application/(?:[\w.+-]+?\+)?json(;\w+charset\w+=\w+utf-8\w+)?", re.RegexFlag.IGNORECASE)

@outring commented Oct 18, 2023

I'd stick with a minimal implementation that follows the spec: https://www.rfc-editor.org/rfc/rfc9110#field.content-type
Basically, we don't need to look at anything after the first ;

Btw, I'm not sure how the current regex causes problems: it has no $ at the end, so it should already match values with a ; charset=xxx suffix 🤔
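
A minimal sketch of the "ignore everything after the first ;" approach suggested above. The names `media_type`, `is_json`, and `JSON_MEDIA_TYPE_RE` are hypothetical and not part of this PR; the pattern is a simplified variant of `json_re`, checked with `fullmatch` because parameters are already stripped.

```python
import re

# Hypothetical pattern, not the one in the PR: no "^"/"$" needed because
# fullmatch() requires the whole (parameter-free) media type to match.
JSON_MEDIA_TYPE_RE = re.compile(r"application/(?:[\w.-]+\+)?json", re.IGNORECASE)


def media_type(content_type: str) -> str:
    """Return the media type without parameters (text before the first ';')."""
    return content_type.partition(";")[0].strip()


def is_json(content_type: str) -> bool:
    """Check whether a Content-Type value names a JSON media type."""
    return JSON_MEDIA_TYPE_RE.fullmatch(media_type(content_type)) is not None
```

With this shape, `is_json("application/json; charset=utf-8")` and `is_json("application/problem+json;charset=UTF-8")` are both true, while `is_json("text/html; charset=utf-8")` is false.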

Contributor

Nice catch.

6b5f1dc

I have pushed the tests and it works even w/o the patch.
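
For reference, a quick check of why the unpatched pattern already accepts a charset parameter. This is only a sketch: it assumes the library applies `json_re` with `match()`, which the diff does not show, and the sample values are illustrative rather than taken from the pushed tests.

```python
import re

# The pre-patch pattern, copied verbatim from the diff above.
json_re = re.compile(r"^application/(?:[\w.+-]+?\+)?json", re.RegexFlag.IGNORECASE)

samples = [
    "application/json",
    "application/json; charset=utf-8",
    "application/json;charset=UTF-8",
    "application/problem+json; charset=utf-8",
]

# The pattern is anchored only at the start, so any trailing text is ignored.
results = {value: json_re.match(value) is not None for value in samples}

# Side effect of the missing end anchor: this prefix match succeeds as well.
prefix_only = json_re.match("application/jsonx") is not None
```

Every entry in `results` is true, and so is `prefix_only`, since nothing after `json` is inspected.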

@outring commented Jan 14, 2024

Looking at the regex again, I found one more issue in the existing pattern. The character class in the non-capturing group (?:[\w.+-]+?\+) contains +, yet we expect the group to end at the first +. To disambiguate, we must either remove + from the character class, (?:[\w.-]+?\+), or make the quantifier greedy, (?:[\w.+-]+\+)
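
A small comparison of the three variants mentioned above, as a sketch that assumes the pattern is applied with `match()`. The dictionary keys and the `matches` helper are hypothetical names used only for illustration.

```python
import re

FLAGS = re.IGNORECASE

variants = {
    "lazy, + in class (current)": re.compile(r"^application/(?:[\w.+-]+?\+)?json", FLAGS),
    "lazy, + removed from class": re.compile(r"^application/(?:[\w.-]+?\+)?json", FLAGS),
    "greedy, + in class": re.compile(r"^application/(?:[\w.+-]+\+)?json", FLAGS),
}


def matches(value: str) -> dict[str, bool]:
    """Report, per variant, whether the value is recognised as JSON."""
    return {name: rx.match(value) is not None for name, rx in variants.items()}


single_suffix = matches("application/problem+json")
double_plus = matches("application/a+b+json")
```

All three variants accept `application/problem+json`. They differ only on a value with more than one +, such as `application/a+b+json`: both variants that keep + in the class still match through backtracking, while the variant without + in the class rejects it.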


4 participants