[BUG] Painless - ASM/org.objectweb errors - 2.0.0 JDK17 #2664
Comments
Can confirm, I'm the coworker
I agree that we should have consistent API versions. Looks like dependabot is not that.. dependa...ble. However, I'm unable to reproduce this in the 2.0 branch w/ JDK 17. Even after upgrading
It's odd our PR gradle checks wouldn't have caught this as well. Did you clear gradle cache? Kill any stale gradle daemons?
That's interesting: it passed for you on JDK 17 with asm-analysis on both 7.2 and 9.2? I had Mohammad pull it down onto his dev desktop as a second data point before nuking my caches, since clearing them sometimes causes me other issues getting things set up again. Now that we have a data point of it passing for you, I'll try that and see what happens.
Yes, it passes w/ both. Although it feels uncomfortable running 7.2 when the rest are on 9.2. Let's find out if this is a 9.2 runtime bug w/ JDK 17 before making a decision to either rollback or upgrade
@nknize Any chance there's a difference between the 17 JDK from Oracle and 17 JDK from OpenJDK?
And you have
Mohammad also tested with OpenJDK. In the meantime I am nuking caches etc. and rerunning the test on my end.
Did this..
Reran the test and still getting the failure on my end.
@dbbaughe Is the issue in
Sure it's a possibility. I'm not in a position to test that for a while (maybe @mch2 can investigate?). If that ends up being the case then it sounds like a JDK 17 issue since you're seeing this failure on both 7.2 and 9.2 but only when using 17?
@dblock @mch2 Can you try out the steps using OpenJDK 17 and see if you can replicate? @nknize Yes, seeing it for OpenJDK 17 as is, upgrading asm-analysis to 9.2, and downgrading all asm in painless to 7.2 (did not try downgrading all asm dependencies in OpenSearch to 7.2 though).
I'd give this a shot. I did it and the tests pass, but that's not saying much since I can't reproduce w/ Oracle JDK. Mixed dependency versions are always a problem, and the only changes since the last release here are the dependency versions and the JDK upgrade. If you can downgrade the dependency to 7.2 across the board and it passes for you on JDK 17, then I'd be fine reverting the upgrade for 2.0 and investigating the problem further for a future 2.x release. There's no urgency for us to upgrade.
Trying this with 2.0 as is. The first run failed after 13m with a timeout but no further output; might be flaky. Trying again with -i.
Checked out the advisory in the thread @mch2 found: https://alas.aws.amazon.com/announcements/2021-001.html. Tried disabling the hot patch service, then tried upgrading it instead.
But it still fails. Worried this will be a ticking time bomb for anyone running on Amazon EC2 (or higher-abstraction services) with that fat jar for the Log4j patch being injected.
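For anyone wanting to try the "disable the hot patch" workaround on an Amazon Linux 2 host, the rough shape of the commands is below. This is a sketch based on the package/service name from the ALAS announcement linked above, not commands taken from this thread; verify the exact unit name on your host first.

```shell
# Sketch: stop and disable the Log4j hotpatch service on Amazon Linux 2
# (service/package name per the ALAS 2021-001 announcement; verify locally)
sudo systemctl stop log4j-cve-2021-44228-hotpatch
sudo systemctl disable log4j-cve-2021-44228-hotpatch

# or remove the hotpatch package entirely
sudo yum remove log4j-cve-2021-44228-hotpatch
```

Note this trades away the Log4j mitigation on that host, which is why the shadowing fix discussed later in the thread was preferred.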
CI is still running JDK 14; that is why it hasn't failed there. For context, see elastic/elasticsearch#81915 (comment).
I'm able to repro the failure with AL2 + JDK17. I've tried downgrading ASM to 7.2 with no success. Looking at shadowing those dependencies. |
This seems like a bit of a concern and probably just a miss: did we open an issue to make sure we are running CI on the versions that we require our plugins to run on? There has been work across all plugins to make sure their CI runs on the supported JDK versions, and we should have the same expectation that upstream core is stable on those versions too.
CI for 2.0 should not be, and is not being, run on 14; where do we see that? If we're talking about gradle check, that's being fixed now in #2671.
I checked directly on the jenkins job that runs gradle checks. It is not pulling in any script under version control. I believe what has changed is the bundle build CI, which I would expect is now failing for this reason. |
@mch2 the jenkinsfile will be used by public jenkins later. The current CI is still using JDK-14 and has the old setup as of now. More context here |
I'm not sure it's wise for us to have inconsistent versions between the build and check jobs. I'd suggest we either manually change the check job to also use 17 or revert to 11. With that said, I am still working on resolving this issue.
Makes sense. @peterzhuamazon can we run gradle check on the current CI with JDK-17 once this issue gets fixed? |
+1. Same issue when querying with any painless script.
Solution:
If that's the case, then this is the most straightforward solution I've ever seen 😆
This will disable the log4j patch on those hosts. As a workaround we could use the shadow PR here. This works for me with JDK 17 on AL2.
AD plugin integ tests are also failing due to this issue, as the nodes are not coming up healthy.
An update here. The root problem is the same conflict of ASM classes with JDK 17. I think this is related to JDK 17 blocking illegal access to runtime internals with JEP 403. The hotpatch brings in its own version of these classes, which get loaded by the system classloader and conflict with the versions included by the painless module. I thought maybe there was a difference in the jar contents between what is distributed in the gem, so I compared it to a locally built jar from https://github.com/corretto/hotpatch-for-apache-log4j2. There does appear to be a difference in the namespace of these dependencies, and I'd expect this is related. This is the hotpatch contents. The namespace is still the same
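When debugging this kind of duplicate-class conflict, it helps to print which jar a class is actually being served from. The diagnostic below is my own sketch, not code from this thread; the `ClassOrigin` class name is made up, and only the probed class name comes from the stack traces in this issue.

```java
// Sketch: report which jar (CodeSource) a class resolves from, to spot a
// duplicate org.objectweb.asm class injected by an agent on the system
// classloader versus the copy shipped inside the lang-painless module.
public class ClassOrigin {
    static String originOf(String className) {
        try {
            Class<?> c = Class.forName(className);
            java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
            // Bootstrap-loaded classes (e.g. java.lang.String) have no CodeSource
            return src == null ? "bootstrap/system" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println("java.lang.String -> " + originOf("java.lang.String"));
        // Prints the owning jar path if ASM is present on the classpath
        System.out.println("org.objectweb.asm.ClassReader -> "
                + originOf("org.objectweb.asm.ClassReader"));
    }
}
```

If the path printed for the ASM class points at the hotpatch agent's fat jar rather than the painless module, the wrong copy won the classloading race.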
Whereas cloning https://github.com/corretto/hotpatch-for-apache-log4j2 and unpacking...
The fix I put in #2681 works because it creates a fat jar that rewrites the path of the objectweb dependencies but I don't think that should be required.
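For readers unfamiliar with the technique: rewriting the path of the objectweb dependencies is what the Gradle Shadow plugin calls relocation. The fragment below is an illustrative sketch of that mechanism, not the actual contents of #2681; the target package name is hypothetical.

```groovy
// build.gradle sketch: relocate ASM into a private namespace inside the fat
// jar so it can no longer clash with the hotpatch agent's own ASM classes.
plugins {
    id 'java'
    id 'com.github.johnrengelman.shadow' version '7.1.2'
}

shadowJar {
    // Rewrites org.objectweb.asm.* class files and all references to them
    relocate 'org.objectweb.asm', 'org.opensearch.repackaged.org.objectweb.asm'
}
```

Because both the bytecode and every reference are rewritten, the relocated classes are distinct types to the JVM, which is why the conflict disappears even with the agent's copies on the system classloader.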
Ok, looks like the jar in the hotpatch is the issue. I'm able to reproduce using the static agent attached to the process with ./gradlew run. Not sure why adding the agent to gradle.properties didn't load properly. Repro steps:
So to move this forward I see two options. We either (1) disable the hotpatch on CI and warn users of this issue with JDK 17 + the hotpatch, or (2) move forward with shadowing lang-painless's version of these jars ourselves. While we can't control what's running on users' hosts, I see (2) as the safer option. Thoughts?
First, thanks for the thorough debugging and repro, solid work. I think we can move forward with (2) now as a workaround. Let's leave the bug open or open a new one to undo the shadowing, and treat it as a bug in the hot patch. The latter shouldn't have this kind of effect on any application IMHO. |
To be clear, this is the commit missing from the hotpatch. This previous fix is also missing. I am working on an update to our fix. It's also worth calling out that this will increase the size of the lang-painless jar from ~700kB to ~1.1MB. I've also not been able to repro the bug with lang-expression, which also includes these dependencies; until we can, I don't think we should make any change there.
Closing with fix merged. Please reopen if this reappears. |
Hi, my project uses both asm (7.2) and asm-util (7.1). In order to upgrade my project to OpenJDK 17 LTS, I need to update asm to 9.1, which supports JDK 17. However, the build fails with an "Unsupported api 589824" error when using asm (9.1) and asm-util (9.1). If I downgrade asm-util back to 7.1 (asm 9.1 with asm-util 7.1), the build succeeds. Not sure if you have any ideas. Thanks.
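A side note on decoding that error message (my own observation, not from the thread): ASM's `Opcodes.ASM<N>` api constants pack the major version into the high 16 bits, so 589824 is just ASM9. An older asm-util `ClassVisitor` throws when constructed with an api value newer than it knows about, which is why mixing 9.1 and 7.1 fails in one direction only.

```java
// Sketch: decode ASM's "Unsupported api 589824" message.
// Opcodes.ASM<N> constants are (N << 16), so 589824 == 9 << 16 == ASM9.
public class AsmApiDecode {
    public static void main(String[] args) {
        int api = 589824; // value from the error message
        System.out.println("ASM api major version = " + (api >> 16));
        System.out.println("9 << 16 = " + (9 << 16));
    }
}
```

This prints `ASM api major version = 9`, confirming the failing visitor was created for the ASM9 API.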
Describe the bug
I'm getting some issues w/ running tests in Index Management for 2.0.
At a certain point while running integration tests the entire cluster crashes, so I tried tracking it down to see if it's a specific test, but it seems to be flaky: sometimes it crashes and sometimes it doesn't.
An example of an integ test suite run and failing and running again and passing:
https://gist.github.com/dbbaughe/898d02beeb593fcf66ef97b2546a713b
It appears to be related to the org.objectweb.asm/org.ow2.asm dependencies, and I can see we recently committed some dependency upgrades for them that were also backported to the 2.0 branch.
lang-painless dependencies
From googling around..
apache/camel-quarkus#1182
quarkusio/quarkus#9832
CodeIntelligenceTesting/jazzer#55
https://youtrack.jetbrains.com/issue/IDEA-245330
I was wondering if perhaps IM was pulling in the asm dependency somewhere too and it was causing issues, or I can see that the current lang-painless module still has an asm dependency of 7.2 for asm-analysis and 9.2 for everything else, which might have been causing it.
But, I tried bumping that 7.2 to 9.2 and I still see the failures when running the Index Management tests.
Also tried downgrading 9.2 to 7.2 and still see the failures when running the Index Management tests.
And then I tried just running the lang-painless tests, and I'm getting failures there too (so it's unrelated to Index Management), using
./gradlew :modules:lang-painless:check
https://gist.github.com/dbbaughe/a0eb025c7d9c950ea05161a58394b585
It's failing for me on JDK 17.
Passing for me on JDK 14.
Had a coworker try it out (just testing OpenSearch 2.0 w/ the lang-painless check) and it's passing for them on JDK 14 and failing on JDK 17 too.