Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return []any for hash arrays when decoding to map / any. #421

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

veigaribo
Copy link

Currently they are returned in []map[string]any, which makes the tree impractical to traverse.

Example: https://github.com/itchyny/gojq/blob/0607aa5af33a4f980e3e769a1820db80e3cc7b23/encoder.go#L72 triggers the panic, instead of being encoded as an array.

Before my changes, the library depended on the peculiar behaviour of hash arrays and unfinished inline arrays being []map[string]any while finished inline arrays became []any. That was relevant for preventing finished inline arrays from being added to.

Since now everything is []any, I added locked to the key info once the inline array (or any top-level field, really) is finished in order to be able to tell.

I'm not happy about the duplicate error lines, but am not sure it would be better to engineer something just for that.

@arp242
Copy link
Collaborator

arp242 commented Aug 1, 2024

In principle this looks good at a glance. My main concern is that it might break compatibility where people use the type in some way; that's also why I haven't changed it before. I'm not entirely sure if that fear is warranted though.

The fuzzer does seem to fail on the round-trip test though.

@veigaribo
Copy link
Author

Agree. It would be safest to put this in a new major version as a breaking change, if you guys are willing to do that. Even if I don't think many people will be affected by it, judging by the lack of related reported issues (that I could find at least).

About the fuzzer, I'm a bit confused. Is it failing to parse / decode l=[{''.''=1},"",{''.''.''.''.''.''.''.''.''.''.''.''=0}]?

In my machine it seems to work fine:

package main

import (
	"fmt"
	"encoding/json"

	"github.com/BurntSushi/toml"
)

func main() {
	t := `l=[{''.''=1},"",{''.''.''.''.''.''.''.''.''.''.''.''=0}]`

	var data any
	toml.Unmarshal([]byte(t), &data)

	pp, _ := json.Marshal(data)
	fmt.Println(string(pp))
}

Outputs

veigaribo
{"l":[{"":{"":1}},"",{"":{"":{"":{"":{"":{"":{"":{"":{"":{"":{"":{"":0}}}}}}}}}}}}]}

I have a go.mod with replace github.com/BurntSushi/toml v1.4.0 => ../toml, pointing to my code, in my branch, and I added the fmt.Println("veigaribo") to make sure it's working.

@arp242
Copy link
Collaborator

arp242 commented Aug 1, 2024

It's this bit that fails: https://github.com/BurntSushi/toml/blob/master/ossfuzz/fuzz.go#L36

It's the decode + encode + decode that errors, not just a decode. Here's a test case:

diff --git i/decode_test.go w/decode_test.go
index 5b43730..cdf97ca 100644
--- i/decode_test.go
+++ w/decode_test.go
@@ -1144,3 +1144,20 @@ func BenchmarkKey(b *testing.B) {
                k.String()
        }
 }
+
+func TestXXX(t *testing.T) {
+       doc := `l=[{''.''=1},"",{''.''.''.''.''.''.''.''.''.''.''.''=0}]`
+       var m any
+       _, err := Decode(doc, &m)
+       fmt.Println(err)
+       fmt.Println(m)
+
+       out, err := Marshal(m)
+       fmt.Println(err)
+       fmt.Println(string(out))
+
+       var m2 any
+       _, err = Decode(string(out), &m2)
+       fmt.Println(err)
+       fmt.Println(m2)
+}

On your branch:

[~cg/toml](slice-of-any)% go test -run TestXXX
<nil>
map[l:[map[:map[:1]]  map[:map[:map[:map[:map[:map[:map[:map[:map[:map[:map[:map[:0]]]]]]]]]]]]]]
<nil>
l = [{"" = {"" = 1}}, "", {"" = {"" = {"" = {"" = {"" = {"" = {"" = {"" = {"" = {"" = {"" = {"" = 0}}}}}}}}}}}}]

toml: line 1: Key 'l' was already created and cannot be used as an array.
<nil>
PASS
ok      github.com/BurntSushi/toml      0.002s

On master:

[~cg/toml](master)% go test -run TestXXX
<nil>
map[l:[map[:map[:1]]  map[:map[:map[:map[:map[:map[:map[:map[:map[:map[:map[:map[:0]]]]]]]]]]]]]]
<nil>
l = [{"" = {"" = 1}}, "", {"" = {"" = {"" = {"" = {"" = {"" = {"" = {"" = {"" = {"" = {"" = {"" = 0}}}}}}}}}}}}]

<nil>
map[l:[map[:map[:1]] map[:map[:map[:map[:map[:map[:map[:map[:map[:map[:map[:map[:0]]]]]]]]]]]]]]
PASS
ok      github.com/BurntSushi/toml      0.002s

I haven't looked into it why this fails beyond reproducing it. Regardless of what we do with respect of compatibility, that should probably work.

@veigaribo
Copy link
Author

veigaribo commented Aug 1, 2024

Thank you. One more TODO gone.

Had to create an specific function for setting the types of hashes because setType was being invoked with "" for empty keys and, due to the conditional I removed, caused the type of the parent array to be set and locked instead of itself. I believe. Weird how that worked before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants