-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix LTW with a parallel-capable class loader and generated classes #278
Fix LTW with a parallel-capable class loader and generated classes #278
Conversation
Hi @urisimchoni. Thanks for this PR. Your explanation sounds reasonable. I hope I can look into this in more detail soon. It would be nice to have one or more regression tests reproducing the problem without this patch. A best effort reproducer would also help already. I can take care of incorporating it into the AspectJ test suite with its strange XML configuration mechanics, if that is what stops you from providing a reproducer. |
Thanks for looking into this @kriegaex. I compiled a reproducer here - https://gitlab.com/urisimchoni/aspectj-parallel-issue If you can point me to a close-enough test I can have a go at it with adapting it for AspectJ test suite. Because this is fundamentally a race condition, the reproduce doesn't reproduce 100% of the time, but I've seen it happen after a few runs (less than 10), whereas with the fix it does not reproduce in 1000 runs. One thing I discovered while putting this together was that the Java documentation WRT parallel class loading (https://docs.oracle.com/javase/8/docs/technotes/guides/lang/cl-mt.html) is a bit vague about whether it's possible to be loading the same class into the same class loader in two threads simultaneously. One paragraph says "...a class by a parallel capable class loader now synchronizes on the pair consisting of the class loader and the class name", Whereas another paragraph is saying that when constructing a parallel class loader one should "Decide upon an internal locking scheme", without specifying what this scheme is. Obviously, if parallel loading of same class was forbidden, there would not be an issue, but the WildFly loader seems to avoid locking altogether (instead it defines classes and catches the LinkageError exception that occurs if the class already exists), which brings us to this issue. |
I opened issue #279 so that we can associate the test code with an issue. |
@urisimchoni, sorry for the inconvenience, but practically simultaneously with your PR, another issue came in, #277, which I just fixed with #280. Unfortunately, I touched the same classes your PR does. Even if it merges without conflicts, I kindly request you to rebase this one and verify that your change still does what it is supposed to in the new configuration. FYI, my change was about returning Thanks in advance. Your change is next in line, if I have free cycles next time. |
5328e54
to
9c255cb
Compare
I've rebased the PR. According to my analysis, the recent change is not affected-by and does not affect this PR:
I also re-tested with the reproducer. Regarding turning this reproducer to a test - maybe I'll have cycles for this in a few days, but a few pointers might come in handy:
|
9c255cb
to
faa36ca
Compare
Thanks for the reproducer. I already converted a modified version of it with renamed classes and a loop doing the parallel weaving several times into an integration test which reliably reproduces the problem before your fix. I then rebased your (slightly reworded) commits on the commit with the test and verified, that the problem is really fixed by them. I also reviewed your code, but did not fully immerse myself into it due to missing cycles. I hope not to regret that later, committing something I do not fully understand, but the build is green, and so I am confident that nothing important is broken. As soon as the CI build passes, I will merge. |
Test "same class woven concurrently in parallel-capable classloader" reproduces eclipse-aspectj#279. Originally taken and modified from: https://gitlab.com/urisimchoni/aspectj-parallel-issue Co-authored-by: Uri Simchoni <[email protected]> Signed-off-by: Alexander Kriegisch <[email protected]>
Store the woven class and not a generated (inner) class as value of the woven class name. This has no effect in this commit, because the value is not consulted at all, but will be used later on. Relates to eclipse-aspectj#279.
A parallel-capable class loader might load the same class multiple times in multiple threads, see https://docs.oracle.com/javase/7/docs/technotes/guides/lang/cl-mt.html. In this scenario, the JVM instrumentation mechanism invokes the AspectJ weaver twice on the same class. In simple weaving scenarios, this would just cause weaving logic to run multiple times and produce the same result for each thread, but if the weaver is generating new classes, it is undesirable to generate those classes multiple times. To avoid this, the generatedClasses map in WeavingAdaptor keeps track of classes for which inner classes were generated and avoids re-weaving them. However, before this change, if a class was found in the generated map, the weaver would return the un-woven bytes, which would cause AspectJ not to work for some of the threads. This change returns the woven bytes instead of re-running weaving. Fixes eclipse-aspectj#279.
WeavingAdaptor.couldWeave() tested two things: whether this specific class (by its name) should be excluded from weaving, and whether the class has generated classes associated with it (in which case we avoid weaving to avoid re-creating those generated classes). However, if the class has generated classes associated, couldWeave() is not called at all because of the new wovenWithGeneratedClass() test, so we have only the name test which is now called directly. Relates to eclipse-aspectj#279.
The generatedClasses map contained both keys of woven classes that had generated classes associated with them, and the keys of the generated classes themselves. It seems like this map is never consulted for the generated class key - the generated class is generated from within the context of woven class loading / retransformation, and thus no transform event is generated for it by the JVM. Because of that, we do not need to guard against re-weaving it. Other uses of generatedClasses map are for full/empty tests, where the woven class key is sufficient. This change simplifies deletion of a class because we do not have to look for its associated generated classes. Relates to eclipse-aspectj#279.
Drive-by cosmetics in the context of eclipse-aspectj#279. Signed-off-by: Alexander Kriegisch <[email protected]>
6117924
to
703d50b
Compare
Thank you so much for your exemplary PR and the very nice test template, which I could convert into an IT with reasonable effort. Thanks for helping to make AspectJ better. |
Thank you for landing this so quickly and for turning the reproducer into a test - realizing that you can loop instead of forking a new JVM per iteration because you can create a new class loader instance per iteration was brilliant! |
@urisimchoni, OSGI weaving in Eclipse is broken in AJDT due to this PR. I was looking for the root cause for days and thought it was to be found elsewhere in other changes I made in the last few months. Just now, I experimentally reverted this PR, and suddenly a weaving test in AJDT that was failing before is passing again. I have yet to analyse why that is, but maybe you can enlighten me. Are you an AJDT user, or do you work with an IDE other than Eclipse? Correction: For some reason, the revert fixes the test, but not the whole problem when using OSGi weaving in isolation. Maybe, it was a mix of your changes and mine, or maybe it is about changes in Eclipse JDT or Equinox. It is quite difficult to debug, basically I am using trial and error, because I know next to nothing about Eclipse. |
@kriegaex Alas, I'm not an AJDT user. If there's some way for me to reproduce I can have a look - I didn't spot any recent issue. |
@urisimchoni, it is accessed from child class |
This was introduced in commit 8a4aa03 of PR #278 contribution as part of the #279 fix. The contributor thought that the generated closure class entries were never used, but in fact AJDT class OSGiWeavingAdaptor relies on the presence of those entries. To the best of my present knowledge, it looks as if this change was the root cause of eclipse-aspectj/ajdt#57. Therefore, I reverted it, simultaneously refactoring Iterator::remove usage to delete entries from the map to Collection::removeIf. Signed-off-by: Alexander Kriegisch <[email protected]>
This was introduced in commit 8a4aa03 of PR #278 contribution as part of the #279 fix. The contributor thought that the generated closure class entries were never used, but in fact AJDT class OSGiWeavingAdaptor relies on the presence of those entries. To the best of my present knowledge, it looks as if this change was the root cause of eclipse-aspectj/ajdt#57. Therefore, I reverted it, simultaneously refactoring Iterator::remove usage to delete entries from the map to Collection::removeIf. Signed-off-by: Alexander Kriegisch <[email protected]>
This PR fixes the situation where AspectJ LTW Java agent is invoked by the instrumentation interface simultaneously in two threads, for the same class in the same class loader, and the weaving generates an inner class.
Parallel-capable class loaders have been around since Java 7, and they might be loading the same class twice in two threads.
AspectJ LTW weaver sometimes generates classes to handle the closure of around advices. With the same class being loaded twice, AspectJ wants to avoid generating the class twice. It consults a generateClasses map to see if a class by the same name with associated generated classes has already been woven by this weaver.
However, if a match is found in the map, AspectJ would return the original bytes, and that causes the thread that "lost the race" to use an unwoven class. This was discovered with WildFly when defining multiple DataSources with the same driver - it looks like WildFly is using some parallel executor to create the DataSource instances, and when trying to weave the DataSource it would sometimes not work.
The fix here returns the woven bytes of the previous weaving operation.
The PR also includes a simplification of the code, of not storing the names of generated classes in the generateClasses map, only names of woven classes with associated generated classes. This was done when I realized that this map is never consulted for the generated class key, only for the woven class name.