[BUG] single quote string parse differently in array of inline table #439

Open
laazy opened this issue Apr 10, 2024 · 1 comment

laazy commented Apr 10, 2024

code:

import toml
s = """
[foo]
bar1 = [
    {msg = ["1'2"] },
],
[[foo.bar2]]
msg = "1'2"
"""

print(toml.loads(s))

output:

{'foo': {'bar1': [], 'bar2': [{'msg': "1'2"}]}}
laazy changed the title from "[BUG] single quote string parsing differently in array of inline table" to "[BUG] single quote string parse differently in array of inline table" on Apr 10, 2024

JamesParrott commented Apr 10, 2024

Firstly, to reproduce this, the value in the inline table doesn't need to be in an array.

Secondly, the bug is in decoder.TomlDecoder.load_array.

Thirdly, it occurs with all 4 types of TOML string.

Running:

>python toml_bug.py

with toml_bug.py as:

import toml

dec = toml.decoder.TomlDecoder()
print(dec.load_array("""[{msg = "'"}]"""))
print(dec.load_array("""[{msg = '"'}]"""))
print(dec.load_array("""[{msg = '''"'''}]"""))
print(dec.load_array('''[{msg = """'"""}]'''))
print(dec.load_array("""[{msg = "a"}]"""))

Gives:

[]
[]
[]
[]
[{'msg': 'a'}]

I can't see a check for it in the code, so I think the issue is that nothing tests whether a quotation mark matches the one that opened the string before flipping in_str to take the decoder out of "string" mode. As far as I understand the code below, the boolean in_str is toggled whenever it hits either kind of quote, even when that quote sits inside a pair of the other kind.

                while end_group_index < len(a[1:]):
                    if a[end_group_index] == '"' or a[end_group_index] == "'":
                        if in_str:
                            backslash_index = end_group_index - 1
                            while (backslash_index > -1 and
                                   a[backslash_index] == '\\'):
                                in_str = not in_str
                                backslash_index -= 1
                        in_str = not in_str

if a[end_group_index] == '"' or a[end_group_index] == "'":
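
For anyone who wants to try a fix, here is a rough sketch of the idea: remember which quote character opened the string, and only leave "string" mode when that same character comes back. The helper `find_group_end` is hypothetical and is not part of the toml package. It assumes single-character delimiters only and does not treat triple-quoted strings specially.

```python
def find_group_end(a, start=0):
    """Return the index of the '}' that closes the '{' at a[start].

    Hypothetical helper for illustration only; not the toml package's code.
    Handles basic ("...") and literal ('...') strings, nothing more.
    """
    if a[start] != '{':
        raise ValueError("expected '{' at start")
    depth = 0
    open_quote = None  # quote character that opened the current string, if any
    i = start
    while i < len(a):
        ch = a[i]
        if open_quote is None:
            if ch in ('"', "'"):
                open_quote = ch  # enter string mode, remembering the delimiter
            elif ch == '{':
                depth += 1
            elif ch == '}':
                depth -= 1
                if depth == 0:
                    return i
        elif ch == '\\' and open_quote == '"':
            i += 1  # skip the escaped character; literal strings have no escapes
        elif ch == open_quote:
            open_quote = None  # only the matching quote closes the string
        i += 1
    raise ValueError("unterminated inline table")


print(find_group_end("""{msg = "'"}"""))
print(find_group_end("""{msg = '"'}"""))
```

The point is just that a `'` seen while `open_quote` is `"` is ordinary string content and leaves the state alone. A real fix inside load_array would still need to deal with the multi-line string delimiters and the existing backslash handling.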

Parsing TOML is now possible with tomllib in the Python standard library (3.11+), and there are plenty of alternatives without this bug that also support TOML >= 1.0.0, not just 0.5.0. It would take me more time than it's worth to tinker with that code and make sure all the possible edge cases are covered, so I'm not going to fix this. But it's probably straightforward for anyone who wants to give it a shot.
