Proposal: a single universal separation method for inline tables, arrays, and more #903

Closed
eksortso opened this issue May 13, 2022 · 10 comments

Comments

@eksortso
Contributor

eksortso commented May 13, 2022

My attitudes towards the use of newlines and commas within inline tables and in general have softened considerably since #525. Since nesting is trending in a JSON-like manner, allow me to propose a bound on the other extreme. How about a single universal standard for separating anything in different parts of the document?

Currently

Here is how we separate items within TOML. Note that all three separation methods are intentionally different.

  • Between table and array-of-table headers, we separate key/value pairs (or "KVPs") with newlines (and optional comments), with surrounding whitespace characters. In the ABNF, this is called ws-comment-newline.
  • Within an inline table, we separate KVPs with commas and whitespace only. Trailing commas aren't allowed. (And the way things are trending, this will change, but I digress.)
  • Within an array, we separate values with ws-comment-newline, with a single comma permitted between or after values. Newlines (with optional comments) are not sufficient. The comma must exist between each pair of values, and an optional comma may exist after the last value.

Proposed

For any of the three separation scenarios described above, let's allow the following as separators:

  1. At the beginning of one of these regions, we will only allow ws-comment-newline. That is, we will permit whitespace (space and tab) and newlines (with newlines preceded with optional comments) before the first item, and nothing else.
  2. Between items, we will allow ws-comment-newline, so long as one comma (not in a comment) or one newline is present.
  3. After the last item, we will allow ws-comment-newline, with an optional comma (not in a comment) permitted.
  4. The between-item separators can have a mixture of commas and newlines, and there is no requirement that they must all be consistent.

That's it. This is backwards compatible with current separation formats.
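Rules 1 to 3 can be sketched as three small checks on the text found before, between, and after items. This is a minimal Python sketch, not taken from any TOML implementation: the function names are hypothetical, it reads "one comma" in rule 2 as "at most one comma", and it assumes the caller has already cut out the separator text (including the newline that ends any comment).

```python
import re

# Hypothetical helpers that only model the proposed separator rules;
# they are not part of any TOML library and do no real TOML parsing.
_COMMENT = re.compile(r"#[^\n]*")


def strip_comments(sep):
    """Drop '#' comments but keep the newline that ends each one."""
    return _COMMENT.sub("", sep)


def valid_leading(sep):
    """Rule 1: only whitespace, comments, and newlines before the first item."""
    return strip_comments(sep).strip(" \t\r\n") == ""


def valid_between(sep):
    """Rule 2: whitespace/comments/newlines, with one comma or a newline present."""
    bare = strip_comments(sep)
    if bare.strip(" \t\r\n,"):
        return False  # something other than whitespace or commas
    commas = bare.count(",")
    # Assumption: more than one comma between two items is rejected.
    return commas <= 1 and (commas == 1 or "\n" in bare)


def valid_trailing(sep):
    """Rule 3: like rule 1, plus at most one optional comma."""
    bare = strip_comments(sep)
    return bare.strip(" \t\r\n,") == "" and bare.count(",") <= 1
```

Under this reading, `", "` (today's inline tables), `"\n"` (today's block-level KVPs), and `",\n"` (today's arrays) all pass `valid_between`, which is the backwards compatibility claimed above. A comma that appears only inside a comment does not count.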

Newly Permitted

How would this standard extend the separators currently used?

  • Inline tables could span multiple lines. A comma is optional if there's a newline in the separator. And an optional trailing comma would be allowed.
  • Array values could be separated with newlines only. Commas wouldn't be required in this case.
  • Most radically, we could separate KVPs outside of inline tables with just commas and whitespace on lines, allowing the same syntax in table sections that is allowed in inline tables.

This is admittedly very broad, and it is more permissive than what #516 and #525 allow.

I can write up examples of what would newly be permitted, later in the day.

What are your thoughts?

@ChristianSi
Contributor

ChristianSi commented May 15, 2022

I do like the idea of treating inline tables and (inline) arrays the same way, with a single comma, one or more newlines, or both, separating values, and with a trailing comma allowed.

However, I don't like the idea of extending this syntax to KVPs in block-level tables. IMHO, the line-oriented nature of this most fundamental TOML element is part of TOML's genetics, and removing it would essentially create a new format that should have a new name and a separate spec.

It would also serve no obvious benefit – writing a newline is not harder than writing a comma.

@marzer
Contributor

marzer commented May 16, 2022

I don't love the idea of making commas entirely optional, but I can't really put my finger on why. Largely I think it's the potential for inconsistency. For example, imagine a user wishing to specify a 3x3 matrix in a TOML file as an array, and using newlines to break up the rows:

transform = [
    1, 0, 0
    0, 1, 0
    0, 0, 1
]

With newlines acting as delimiters as above, elements [2] and [5] are delimited by newlines, while the rest use commas, which feels unconventional or non-obvious somehow. I dunno. Guess I'm thinking too much like a programmer. Happy to be convinced otherwise.
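To make the mixed delimiters concrete, here is a small Python sketch that classifies the separator following each element of an array body. It is a toy under stated assumptions: `separator_kinds` is a hypothetical helper, it handles only integer elements, and it ignores comments and nesting.

```python
import re


def separator_kinds(body):
    """Return the kind of separator found after each element but the last.

    Toy model of the proposed rule: a gap containing a comma counts as
    "comma", a gap with only a newline counts as "newline", and anything
    else is "invalid". Integer elements only; no comments, no nesting.
    """
    items = list(re.finditer(r"-?\d+", body))
    kinds = []
    for cur, nxt in zip(items, items[1:]):
        gap = body[cur.end():nxt.start()]
        if "," in gap:
            kinds.append("comma")
        elif "\n" in gap:
            kinds.append("newline")
        else:
            kinds.append("invalid")
    return kinds
```

Run on the body of the `transform` array above, the separators after the elements at indices 2 and 5 come back as `"newline"` and the other six as `"comma"`, which is exactly the inconsistency described.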

I am 1000000% on board with making trailing commas and newlines legal in inline tables, however.

@arp242
Contributor

arp242 commented May 16, 2022

One concern here is error messages; consider:

tbl = {
    a = 1
    b = 2
    c = 3
}
k  = 4
k2 = 5

Okay, so this is all good; but now someone accidentally left off the closing }; probably a common mistake:

tbl = {
    a = 1
    b = 2
    c = 3
k  = 4
k2 = 5

# Maybe many more lines...

This is quite hard for implementations to detect. The end of the document would produce an error because we never saw the closing }, but generating a good error at the right location might be quite hard in some cases. In the above example, I can't really see a good way: only a human can spot it from the indentation, which may itself be wrong, and enforced indentation is probably not something we want to codify in the spec.

Actually, come to think of it, this is also a problem with allowing a trailing comma in inline tables:

tbl = {
    a = 1,
    b = 2,
    c = 3,
k  = 4
k2 = 5

# Maybe many more lines...

Although there it's not quite as bad, because the k = 4 line is missing the comma so we can generate an error sooner (although it will be on the k = 4 line rather than the c = 3 line).

With arrays, things are easier to detect since it doesn't use the key = value syntax:

arr = [
    1
    2
k = 3

The problem here is, essentially, that the key = value syntax is re-used for both "normal" key/values and inline tables. If TOML used key: value for inline tables this problem wouldn't exist, as the k = 4 line would be invalid in the inline table context.


I'm not opposed to this proposal as such, but this is my main concern. There may be some other tricky errors too; I think it would be good to do an experimental implementation and get a list of all failure modes to see how good the error messages for them can be, and then decide if it's worth the trade-off.

I do like the syntax simplification as such, but I also like good error messages, and generally speaking I would prioritize error messages over syntax.

@marzer
Contributor

marzer commented May 16, 2022

@arp242 That is a very compelling argument. I'm trying to think what I could do in toml++ to improve the error reporting in the inline table example, and the only realistic thing I can come up with is just saying where the opening { was, which isn't very helpful if it is right up near the start of the document 😅
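The limitation can be seen in a toy parser. The Python sketch below is not how toml++ or any real implementation works; `parse_toy` and `UnclosedTable` are hypothetical names, and the parser accepts only `key = integer` lines plus one level of newline-separated inline table. It assumes the proposed syntax, where newlines alone may separate KVPs inside the braces.

```python
class UnclosedTable(Exception):
    """Raised at end of input when an inline table was never closed."""

    def __init__(self, open_line, keys):
        super().__init__(
            "inline table opened on line %d was never closed" % open_line
        )
        self.open_line = open_line  # the only location the parser knows
        self.keys = keys            # every key absorbed into the table


def parse_toy(text):
    """Parse `key = int` lines and one level of multi-line `key = {` ... `}`."""
    root = {}
    current = root
    open_line = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line == "}":
            if open_line is None:
                raise ValueError("unexpected '}' on line %d" % lineno)
            current, open_line = root, None
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if value == "{":
            if open_line is not None:
                raise ValueError("nested tables are not supported in this toy")
            root[key] = {}
            current, open_line = root[key], lineno
        else:
            # Inside an open table this is indistinguishable from a
            # top-level KVP, so it is silently absorbed.
            current[key] = int(value)
    if open_line is not None:
        raise UnclosedTable(open_line, list(current))
    return root
```

Fed the document with the missing brace, this raises `UnclosedTable` with `open_line == 1`, and `k` and `k2` have already been absorbed into `tbl`. Nothing in the token stream marks the `c = 3` line as the place where the brace belonged.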

@ChristianSi
Contributor

I find @arp242's argument on error reporting quite convincing too. And @marzer's discomfort about mixed syntax is at least understandable. So maybe it's indeed better to say that "inline stuff is comma-delimited, while top-level KVPs are newline-delimited"?

Then the needed change would shrink down to @marzer's "making trailing commas and newlines legal in inline tables". That one would be good in any case.

@eksortso
Contributor Author

There's a lot to think about, so far. Not enough time to reply to it all, as of now.

Errors are worthy of their own discussions, though, and whether we ought to address them in the spec is uncharted territory. @arp242 could lead that exploration.

But as far as it pertains to this proposal, I'll say, it's more a matter of good style than technical restrictions that would help users fix errors like missing braces. I am not calling for the styles of writing TOML already prevalent to be overhauled. Old documents and styles would still work.

@arp242
Contributor

arp242 commented May 17, 2022

> Errors are worthy of their own discussions, though, and whether we ought to address them in the spec is uncharted territory. @arp242 could lead that exploration.

I think we don't really need to do anything with the spec here as such, just consider the impact of newly proposed syntax for implementations, including what kind of errors can and can't be reported.

@eksortso
Contributor Author

@arp242 I do prefer consistency within an inline table for its separators: use commas, or use just newlines, but do not mix the two. I don't insist upon it, but depending on context, insisting upon it could either mask mistakes (like in your 2nd example) or make them stand out (like in your 3rd example). If you prefer the latter, then a requirement for consistent separator types would need to be codified in the spec.
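A consistency requirement of this kind could be checked roughly as follows. This is a Python sketch under assumptions, not proposed spec text: `separator_style` and `is_consistent` are hypothetical names, and the input is the list of separator strings found between the items of one inline table.

```python
import re


def separator_style(sep):
    """Classify one between-item separator as comma, newline, or invalid."""
    bare = re.sub(r"#[^\n]*", "", sep)  # commas inside comments do not count
    if "," in bare:
        return "comma"
    if "\n" in bare:
        return "newline"
    return "invalid"


def is_consistent(seps):
    """True when every separator in one inline table uses the same style."""
    styles = {separator_style(s) for s in seps}
    return len(styles) <= 1 and "invalid" not in styles
```

In the 2nd example every separator is a bare newline, so the check passes and the missing brace stays hidden. In the 3rd example the separators are `",\n"` three times followed by `"\n"` before `k2 = 5`, so the check fails at that point.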

@eksortso
Contributor Author

eksortso commented May 18, 2022

@pradyunsg I would like to shift my focus to #516 and get a PR in place to allow newlines and trailing commas in inline tables. Since the proposal described here would be backwards compatible with any such change, I wish to keep this issue open for possible future modifications to separators. Could we tag this "new-syntax" for later consideration? Some good observations have already been made about this, so keeping it open would make sense if this idea would have future utility.

@pradyunsg
Member

With #904, I think we're good to close this out.

5 participants