NegativeArraySizeException and ArrayIndexOutOfBoundsException in ByteQuadsCanonicalizer #1289
Comments
Please try another version of Jackson. 2.17.1 was just released. More recent versions of Jackson have built-in processing limits (StreamReadConstraints), and that should hopefully mean that Jackson will fail fast with very large strings.
@richard-timpson Thank you for reporting this! As @pjfanning suggested, we would need a reproduction for a later version, as there won't be new releases of 2.12.x; so ideally at least 2.16.x. This may be challenging for reproduction due to the default size limits. But if a reproduction can be shown to work with relaxed (but not insanely so) limits -- say, max name length of 1 meg? -- it could still be considered a valid flaw.
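For anyone attempting that reproduction, a minimal sketch of relaxing the name-length limit on 2.16+ might look like the following. The 1 MB value mirrors the suggestion above; the wrapper class name is purely illustrative, and this is not code from the thread:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;

public class RelaxedLimitsFactory {
    // Sketch only: on 2.16+, raise the default max field-name length to ~1 MB
    // so a large-key reproduction is not rejected by the limit itself.
    public static JsonFactory create() {
        StreamReadConstraints constraints = StreamReadConstraints.builder()
                .maxNameLength(1_000_000)
                .build();
        return JsonFactory.builder()
                .streamReadConstraints(constraints)
                .build();
    }
}
```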
Thanks for the information here. I've got to get the code in to disable canonicalization.
@cowtowncoder Am I right to expect that we'll see a significant performance hit by disabling it? I'm guessing that the canonicalization is doing a lot of good for the smaller json keys (which I know we have in our json objects). Of course there are memory issues for the larger keys, and possibly memory overhead (a lot of array copying) that might slow parsing down for those keys. So I guess the answer probably depends on what the structure of our json looks like. I'm hoping to also run a performance benchmark on our workload to compare the two; not sure if I'll have extra time to get to that. Any insight here is much appreciated :)
@richard-timpson Yes, you are correct in your assessment: it does depend on whether names repeat across parsing -- not just within a single document but across documents being parsed by parsers created by the same JsonFactory. The performance benefit comes not only from avoiding repeated String allocation for names, but also from avoiding repeated UTF-8 decoding of the name bytes. In cases where there are tons of non-repeating names (a UUID as key) things won't work that well; if so, it may make sense to disable canonicalization. Obviously a performance benchmark would be useful, but it can take some effort to do that.
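To make the "same factory" point concrete, here is a small sketch (the names and loop are illustrative, not from this thread): the canonicalizer only pays off when documents are parsed by parsers created from one shared JsonFactory/ObjectMapper.

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public class SharedMapperExample {
    // One ObjectMapper (and therefore one underlying JsonFactory): the parsers
    // it creates share a symbol table, so a repeated key like "status" is
    // decoded and canonicalized once, then reused across documents.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void parseAll(Iterable<String> jsonDocs) throws Exception {
        for (String doc : jsonDocs) {
            MAPPER.readTree(doc); // repeated keys across docs hit the canonicalizer cache
        }
    }
}
```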
It's been a few weeks and I wanted to provide an update. We rolled out the change to disable canonicalization. Additionally, we have another system that depends on Spinnaker execution context payloads. That service performed an upgrade from Jackson v2.15 -> v2.17 and started seeing an error from the new size limit on field names. That was super helpful, as it allowed us to identify which json keys were large. The other service increased the configurable max size and hasn't seen the issue since. For now we have no plan to reduce the json key size, but we can focus our efforts there if it creeps up again.
There is still an open question about whether the bug exists even with the size limits introduced in later versions. I doubt that we'll have the time to work on a reproduction. For our purposes we have a path forward, and we can revisit if the issue comes up again.
Thank you for sharing @richard-timpson -- this is useful information in case the issue resurfaces.
Versions:
Jackson Version: 2.12.7
Java Version: 11
Context:
We operate a large installation of the open source Spinnaker CD platform. Spinnaker uses a complicated json object to manage and facilitate context during a pipeline execution. At our scale this execution json object can be extremely large for certain use cases. There is an endpoint exposed to the Spinnaker UI that will return all of the executions for a given application.
For this endpoint we frequently see a single payload over 2 GB, and sometimes up to 10 GB.
Problem:
Lately we have been seeing at least two different exceptions originate from the ByteQuadsCanonicalizer class. Here are stack traces for both.
NegativeArraySizeException
Error message:
"Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is retrofit.RetrofitError: com.fasterxml.jackson.databind.JsonMappingException: -2147481843 (through reference chain: java.util.ArrayList[0])] with root cause"
Stack Trace:
ArrayIndexOutOfBoundsException
Error Message:
Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is retrofit.RetrofitError: com.fasterxml.jackson.databind.JsonMappingException: arraycopy: last destination index 2153683734 out of bounds for int[2147026828] (through reference chain: java.util.ArrayList[158])] with root cause
Stack Trace:
Investigation
Both exceptions are caused by integer overflows that happen in the `_appendLongName` method (this is the version of `_appendLongName` in the 2.12 branch).

The `NegativeArraySizeException` is almost certainly caused by an integer overflow in the new-size calculation `int newSize = _hashArea.length + Math.max(toAdd, minAdd);`. The `ArrayIndexOutOfBoundsException` is also caused by an integer overflow, this time in the `if ((start + qlen) > _hashArea.length)` check: because `start + qlen` overflows, the if block never executes and the `_hashArea = Arrays.copyOf(_hashArea, newSize);` copy never happens. Without that copy the `_hashArea` is not expanded, and the length parameter (`qlen`) passed to `System.arraycopy` triggers the `ArrayIndexOutOfBoundsException` because `_hashArea` isn't big enough to receive the copy.
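To make the overflow concrete, here is a small standalone sketch. The numbers are chosen to roughly match the reported error messages and the variable names loosely mirror `_appendLongName`; none of this is the actual library code:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        // Roughly the sizes implied by "last destination index 2250000128
        // out of bounds for int[1500000256]" from the reproduction below.
        int hashAreaLength = 1_500_000_256;  // current _hashArea.length
        int start = 1_500_000_064;           // append offset near the end of the area
        int qlen = 750_000_064;              // quads for a ~3 GB field name

        // start + qlen exceeds Integer.MAX_VALUE, so in int arithmetic it wraps
        // to a negative number and the "grow the array" branch is skipped:
        System.out.println(start + qlen);                    // prints a negative value
        System.out.println((start + qlen) > hashAreaLength); // false, so no resize

        // Separately, if the grow branch is reached with an already-huge
        // _hashArea (close to Integer.MAX_VALUE entries), the new size itself
        // wraps negative, which is where NegativeArraySizeException comes from
        // when Arrays.copyOf() is handed a negative size:
        int hugeLength = 2_147_000_000;
        int toAdd = 500_000;
        int minAdd = 500_000;
        int newSize = hugeLength + Math.max(toAdd, minAdd);
        System.out.println(newSize);                         // prints a negative value
    }
}
```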
Reproduction

I was able to reproduce the `ArrayIndexOutOfBoundsException` with a local unit test. It's sort of a crazy reproduction, but I think it's still valid. I'm not sure what exactly determines the qlen size (I know it happens somewhere in the `UTF8StreamJsonParser` class), but after some experimentation I realized that for very large json keys the qlen size is something like 1/4 of the key size. So I wrote a little go script that will generate json files that have a single, extremely large, key.

If you execute the script as `go run main.go -type createLargeKey -keySize 3000000000 -numFiles 5 -filePath /path/to/java/test/resources/` it will create 5 files, each with a key that is approximately 3 GB. A unit test that parses one of those files then produces an `ArrayIndexOutOfBoundsException`. The specific error in this case is `java.lang.ArrayIndexOutOfBoundsException: arraycopy: last destination index 2250000128 out of bounds for int[1500000256]`.
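The original test isn't reproduced here, but a minimal sketch of that kind of test might look like this (the resource path and JUnit 4 setup are assumptions):

```java
import java.io.File;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;

import org.junit.Test;
import static org.junit.Assert.fail;

public class LargeKeyReproductionTest {
    // Sketch of the kind of test described above: Jackson 2.12.7, default
    // factory settings, input file generated by the Go script.
    @Test
    public void hugeFieldNameOverflowsCanonicalizer() throws Exception {
        JsonFactory factory = new JsonFactory(); // CANONICALIZE_FIELD_NAMES on by default
        File input = new File("src/test/resources/large-key-0.json"); // assumed file name
        try (JsonParser parser = factory.createParser(input)) {
            while (parser.nextToken() != null) {
                // stream through; the failure surfaces while the ~3 GB field
                // name is appended to ByteQuadsCanonicalizer's _hashArea
            }
            fail("expected ArrayIndexOutOfBoundsException");
        } catch (ArrayIndexOutOfBoundsException expected) {
            // this is the arraycopy failure described in the issue
        }
    }
}
```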
Explanation?
There have been several other issues in `jackson-core` that have revealed similar bugs, some with explanations of what `ByteQuadsCanonicalizer` is doing. Some references:

- `ByteQuadsCanonicalizer.addName(String, int, int)` has incorrect handling for case of `q2 == null` #687
- `ByteQuadsCanonicalizer` cleanly #686
- `ByteQuadsCanonicalizer.calcHash()` #564

As far as I can tell, none of these issues or fixes target this specific error, as it seems to be caused only in the part of the `_hashArea` that handles large json keys. Based on comments in other threads (specifically this one), the `ByteQuadsCanonicalizer` class doesn't really do much in cases where there are many unique json keys. And based on these errors, it seems that there are possibly also design limitations for keys that are both very large and unique?
I haven't performed an analysis of what the json we return looks like (it's quite difficult with payloads that are so large), but I am guessing that somewhere in Spinnaker, executions are being created with many very large json keys. I'm hoping to get some guidance on whether this assumption is correct or not.
Mitigation
As mentioned in the reproduction code, using `jsonFactory.disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES);` makes the problem go away by disabling the problematic class. We are planning on using this as our mitigation to avoid the bug, although based on other issues we are concerned about the possible performance hit that comes with it. Given the huge amount of json data processing that we do, you can imagine that we are always looking to squeeze out json parsing optimizations where we can. So our plan for now looks something like:

- Disable `CANONICALIZE_FIELD_NAMES` (hopefully temporarily) to avoid the bug

I wanted to put up the issue for awareness because, although I will admit it is caused by a crazy json workload, this does seem like an actual bug.
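For reference, a minimal sketch of how that mitigation can be wired up. The builder-style configuration is one option; the JsonFactory and ObjectMapper calls are standard Jackson API, but the wrapper class itself is just an illustration:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.databind.ObjectMapper;

public class NonCanonicalizingMapper {
    // Build the JsonFactory with field-name canonicalization disabled and hand
    // it to the ObjectMapper, so parsers created by this mapper no longer add
    // field names to the shared ByteQuadsCanonicalizer symbol table.
    public static ObjectMapper create() {
        JsonFactory factory = JsonFactory.builder()
                .disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES)
                .build();
        return new ObjectMapper(factory);
    }
}
```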
I was also hoping to get more understanding of the actual cause. I was able to reproduce with 3 GB json keys, but there's no way we have keys that large in our real workload. The one specific thing I am trying to understand is how the class handles rehashing and resizing. There is a bit of code in the class that should be resetting the `_longNameOffset` size when rehashing takes place.
So I'm guessing that the overflows are being caused by some weird combination of factors in that rehashing/resizing logic. Some confirmation here would be helpful.
Next Steps
If I get time I'm hoping to run some more experimentation to see if I can reproduce with smaller json keys. I have also only run these tests against the 2.12.7 version; I'd like to run them against the latest 2.18 version as well to verify it's still a problem. I have looked at the latest version of the code, however, and don't see any changes to suggest that there is a fix.