[BUG] single quote string parse differently in array of inline table #439

laazy · 2024-04-10T03:24:39Z

code:

import toml
s = """
[foo]
bar1 = [
    {msg = ["1'2"] },
],
[[foo.bar2]]
msg = "1'2"
"""

print(toml.loads(s))

output:

{'foo': {'bar1': [], 'bar2': [{'msg': "1'2"}]}}

JamesParrott · 2024-04-10T16:29:53Z

Firstly, to reproduce this, the value in the inline table doesn't need to be in an array.

Secondly, the bug is in decoder.TomlDecoder.load_array.

Thirdly, it occurs with all four types of TOML string.

Running:

>python toml_bug.py

with toml_bug.py as:

import toml

dec = toml.decoder.TomlDecoder()
print(dec.load_array("""[{msg = "'"}]"""))
print(dec.load_array("""[{msg = '"'}]"""))
print(dec.load_array("""[{msg = '''"'''}]"""))
print(dec.load_array('''[{msg = """'"""}]'''))
print(dec.load_array("""[{msg = "a"}]"""))

Gives:

[]
[]
[]
[]
[{'msg': 'a'}]

I can't see any check that a closing quotation mark matches the one that opened the string, so I think that is the issue: nothing ties leaving "string" mode (flipping in_str) to the matching quote. As far as I understand the code below, the boolean in_str is toggled whenever the scanner hits either kind of quote, even when that quote sits inside a pair of the other kind.

                while end_group_index < len(a[1:]):
                    if a[end_group_index] == '"' or a[end_group_index] == "'":
                        if in_str:
                            backslash_index = end_group_index - 1
                            while (backslash_index > -1 and
                                   a[backslash_index] == '\\'):
                                in_str = not in_str
                                backslash_index -= 1
                        in_str = not in_str

toml/toml/decoder.py, line 960 in 65bab75:

    if a[end_group_index] == '"' or a[end_group_index] == "'":
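To make the suspected failure mode concrete, here is a minimal sketch that compares the toggle-on-any-quote behaviour described above with a scanner that remembers which delimiter opened the string. Both function names are hypothetical helpers written for this illustration; neither is part of the toml package, and the first only approximates the excerpt rather than reproducing load_array.

```python
def naive_ends_in_string(text: str) -> bool:
    """Flip in_str on every quote character, roughly as the excerpt does."""
    in_str = False
    for i, ch in enumerate(text):
        if ch in "\"'":
            if in_str:
                # Mirror the excerpt: each preceding backslash flips the flag.
                j = i - 1
                while j > -1 and text[j] == "\\":
                    in_str = not in_str
                    j -= 1
            in_str = not in_str
    return in_str


def quote_aware_ends_in_string(text: str) -> bool:
    """Leave string mode only on the delimiter that opened the string."""
    delim = None  # delimiter that opened the current string, if any
    i = 0
    while i < len(text):
        if delim is None:
            # Try the triple-quote delimiters before the single ones.
            for candidate in ('"""', "'''", '"', "'"):
                if text.startswith(candidate, i):
                    delim = candidate
                    i += len(candidate)
                    break
            else:
                i += 1
        elif delim[0] == '"' and text[i] == "\\":
            i += 2  # skip an escaped character inside a basic string
        elif text.startswith(delim, i):
            i += len(delim)
            delim = None
        else:
            i += 1
    return delim is not None


# The four failing inputs from the reproduction, plus the working control.
FAILING = [
    """[{msg = "'"}]""",
    """[{msg = '"'}]""",
    """[{msg = '''"'''}]""",
    '''[{msg = """'"""}]''',
]
CONTROL = """[{msg = "a"}]"""

for sample in FAILING + [CONTROL]:
    print(sample, naive_ends_in_string(sample), quote_aware_ends_in_string(sample))
```

With the naive toggle, each failing input finishes the scan still "inside" a string, which would be consistent with the closing brace of the inline table never being recognised. This is only a sketch of the idea, not a patch for decoder.py.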

TOML can now be parsed with tomllib from the Python standard library (Python 3.11+), and there are plenty of alternatives without this bug that also support TOML >= 1.0.0, not just 0.5.0. It would take me more time than it's worth to tinker with that code and make sure every edge case is handled, so I'm not going to fix this. It's probably straightforward for anyone who wants to give it a shot, though.

laazy changed the title ~~[BUG] single quote string parsing differently in array of inline table~~ [BUG] single quote string parse differently in array of inline table Apr 10, 2024

JamesParrott mentioned this issue May 2, 2024

[BUG] empty dict in array list will not be parsed #440

Open
