Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds chapter on security #366

Merged
merged 2 commits into from
Nov 19, 2024
Merged

Adds chapter on security #366

merged 2 commits into from
Nov 19, 2024

Conversation

popematt
Copy link
Contributor

Issue #, if available:

None

Description of changes:

Adds a chapter describing various security considerations for Ion 1.1


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@popematt popematt requested a review from zslayton November 14, 2024 19:59
Comment on lines +47 to +48
Even without using `repeat` or `for`, a [Billion laughs attack](https://en.wikipedia.org/wiki/Billion_laughs_attack)
could exist for any data format with macro expansion, and it is certainly possible with Ion 1.1.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, we've discussed this attack a bunch of times but I didn't know it had an established name.

_books/ion-1-1/src/security-considerations.md Show resolved Hide resolved
Comment on lines 102 to 104
This is unlikely to be a concern in practice because TDL is not arbitrary code.
TDL is intentionally not Turing complete, to make it impossible to perform arbitrary computation.
It also has a very limited domain—it can only transform/produce Ion data model values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to have a brief section on the intentional limitations built into TDL at the top of this section. Stating early on that it cannot recurse, has no forward references, etc would help clarify the reader's mental model going into the discussion of actual vulnerabilities.

## Arbitrary-sized values

The Ion specification places no limits on the size of Ion values, so an attacker could send a sufficiently large value,
it could consume enough system resources to disrupt the application reading the value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The token bucket solution covers a lot of outsized-expansion issues, but it doesn't cover macros where (by using built-in operations like make_string) an enormous single value could be produced. We may wish to either restrict the number of input arguments or put a size limit on make_string, make_blob, etc.

Copy link
Contributor Author

@popematt popematt Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that most programming languages already have a limit on the length of strings, byte arrays, lists/vecs, etc. Do you think we need more than that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're talking about two different things. I'm describing a flavor of the small-input-big-output problem that doesn't generate lots of values, it generates one gigantic value.

(.make_string (.repeat 1_000_000_000 "lol"))

The token bucket/fuel approach doesn't handle this case because:

  1. repeat is nested inside of an e-expression, so it isn't subject to the token limit
  2. make_string "only" produces one value, so it's within the token limit.

It makes sense for us to offer limits on (for example) the number of arguments that can be passed to make_string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repeat is nested inside of an e-expression, so it isn't subject to the token limit

Re-reading that section of the doc, I may have misunderstood this part. If repeat is also pulling from the same token bucket as the parent e-expression, this might be ok as-is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is my intention—that every value that is produced at any level in a macro evaluator would count, and perhaps that any macro that evaluates to nothing also uses one token to produce nothing.

I don't know how prescriptive we need to be. The point is to set a limit on the expansion to prevent expansion bombs, so I don't think Ion 1.1 libraries need to implement this in exactly the same way because we don't necessarily need to have a precise limit. (After all, if your data looks like an expansion bomb, then you might need to rethink your data.)

Copy link
Contributor

@tgregg tgregg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good read.

Comment on lines +152 to +154
Although Ion 1.1 does not specify a maximum size for symbol tables or macro tables, Ion implementations _MAY_ impose
upper bounds on the size of symbol tables, macro tables, module bindings, and any other direct or indirect component of
the encoding context.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest that an implementation may also allow the user to configure size limits for any kind of value, including user values.

@popematt popematt merged commit fb31d85 into amazon-ion:gh-pages Nov 19, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants