Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance] Full build takes more time since 2024-09 (type inference) #3327

Closed
iloveeclipse opened this issue Nov 19, 2024 Discussed in #3305 · 17 comments · Fixed by #3384 · May be fixed by #3333 or #3387
Closed

[Performance] Full build takes more time since 2024-09 (type inference) #3327

iloveeclipse opened this issue Nov 19, 2024 Discussed in #3305 · 17 comments · Fixed by #3384 · May be fixed by #3333 or #3387
Assignees
Labels
bug Something isn't working performance regression Something was broken by a previous change

Comments

@iloveeclipse
Copy link
Member

Discussed in #3305

Originally posted by fedejeanne November 13, 2024
Hi,
I've been looking into some performance issues that started with 2024-09, specifically the time it takes to perform a full build. Interestingly enough, the time it takes to do a clean all also increased in 2024-09 but the biggest increase happened in the build phase.

  • In 2024-06 it takes ~12 minutes to do a full clean and build.
  • In 2024-09 it takes ~30 minutes to do a full build (sampled after the clean all)

samples.zip

The part that caught my eye was the increase in amount of calls (Hits) and therefore the Self Time (CPU) for the method(s) org.eclipse.jdt.internal.compiler.lookup.TypeBinding.findSuperTypeOriginatingFrom(...) (there are 2 flavors of this method).

2024-06

  • 431 hits
  • ~20s spent in these methods
    image

2024-09

  • 6972 hits
  • ~9m spent in these methods
    image

Question 1

  • Why are there suddenly so many more calls to these methods ?

Question 2

  • Would it be OK to simply build a cache to speed up some subsequent repeated calls or would this introduce any undesired effects (e.g. inconsistencies)?

I ask because I noticed that many of the calls are "repeated" i.e. the input parameters are the same so a cache would be a trivial optimization.

@iloveeclipse iloveeclipse added bug Something isn't working regression Something was broken by a previous change performance labels Nov 19, 2024
@jukzi
Copy link
Contributor

jukzi commented Nov 19, 2024

my bet it's caused by changes in org.eclipse.jdt.internal.compiler.lookup.BoundSet.deriveTypeArgumentConstraints(TypeBound, TypeBound, InferenceContext18):
image
@fedejeanne can you test if a revert of cc0e66e helps?

@jukzi
Copy link
Contributor

jukzi commented Nov 19, 2024

@fedejeanne it may help if you share a script that generates an artificial project with such a complex type hierarchy

@iloveeclipse
Copy link
Member Author

@fedejeanne it may help if you share a script that generates an artificial project with such a complex type hierarchy

Have you tried with https://github.com/iloveeclipse/java-project-generator? The code generates large type hierarchies by default.

@fedejeanne
Copy link
Contributor

@fedejeanne can you test if a revert of cc0e66e helps?

I'm on it. It's kind of hacky but I managed to revert the changes of cc0e66e in the I-BUILD 20241118-1800 and I am compiling with it right now.

@fedejeanne it may help if you share a script that generates an artificial project with such a complex type hierarchy

I am going to take a look at the generator (#3327 (comment)) and see if I can provide a large project to reproduce the issue.

@fedejeanne
Copy link
Contributor

fedejeanne commented Nov 19, 2024

I managed to revert the commit and sample again and the build time was reduced back to (almost) that of 2024-06: ~12 m.

20241118-1800-no-cc0e66e-Build all (clean workspace).zip

This confirms your theory @jukzi : cc0e66e introduced the performance degradation.

EDIT: the build time of 2024-06 was ~10 m so the performance of 20241118-1800-no-cc0e66e is not quite the same but it is safe to assume that cc0e66e introduced most of the performance degradation.

@jukzi
Copy link
Contributor

jukzi commented Nov 19, 2024

it is safe to assume that cc0e66e introduced most of the performance degradation.

@stephan-herrmann please take a look.

@iloveeclipse
Copy link
Member Author

EDIT: the build time of 2024-06 was ~10 m so the performance of 20241118-1800-no-cc0e66e is not quite the same but it is safe to assume that cc0e66e introduced most of the performance degradation.

@stephan-herrmann : could you please look at this ticket?

Unfortunately no reproducer yet, but might be you know where the bottleneck could be.

PS
I've played with https://github.com/iloveeclipse/java-project-generator but so far I don't see any difference with/without the commit cc0e66e. Probably there must be something special like multiple inheritance or more generics etc.

@fedejeanne
Copy link
Contributor

PS I've played with https://github.com/iloveeclipse/java-project-generator but so far I don't see any difference with/without the commit cc0e66e. Probably there must be something special like multiple inheritance or more generics etc.

Our product is quite big: ~2000 projects in the workspace and a few 100's of MB in source code. It has also generated code and meta-models (similar to EMF but a bit more complex). So yes: there is multi-inheritance and probably a lot more complex stuff in there. If there's anything I can debug in order to check for such special cases that might be interesting to analyze this issue, please let me know.

@stephan-herrmann
Copy link
Contributor

I happened to see your requests, but I don't currently have the time for anything that could help 4.34.

I don't think the bug fixed by cc0e66e has a "simpler" solution. Java is a complex beast.

So I see these options:

  1. regain performance by sacrificing correctness -- would anybody want that?
  2. create additional caching structures for reducing the performance impact.

Anything along the lines of (2) probably needs quite a bit of experimentation to strike a good balance: don't penalize use cases that are not affected by the current issue, don't waste memory by speculative optimization etc. pp.

=> Sounds like a task for the next release cycle.

@jukzi
Copy link
Contributor

jukzi commented Nov 19, 2024

@stephan-herrmann could you handcraft an idea of an simple reproducer that can trigger this behavior?

don't penalize use cases that are not affected by the current issue,

sounds fair!

@stephan-herrmann
Copy link
Contributor

@stephan-herrmann could you handcraft an idea of an simple reproducer that can trigger this behavior?

cc0e66e has some test cases that trigger the code in question. The original issue #2413 also discusses when/why/where we need to investigate all possible pairs of super types.

Also interesting: in #2413 (comment) a variant is shown, that does not require that fix. Read: it is a matter of heavy type inference combined with wildcards in certain positions.

@jukzi jukzi changed the title [Performance] Full build takes more time since 2024-09 [Performance] Full build takes more time since 2024-09 (type inference) Nov 19, 2024
@jukzi
Copy link
Contributor

jukzi commented Nov 19, 2024

some ideas: As far as i understand deriveTypeArgumentConstraints() calls allSuperPairsWithCommonGenericType() to return a Set which can be significant long and can contain duplicates. After that the only caller incorporate() may skip most of the result because it is only interested if any ConstraintTypeFormula that yields !reduceOneConstraint(). i.e. most of the computation could have be totally unnecessary. Instead one could implement a function that eager returns false while iterating the result elements without ever calculating any Set of Pair<TypeBinding>.
Also the code looks like allSuperPairsWithCommonGenericType(superInterface, t) can generate significant amount of duplicates in its result list, which is not needed and could be avoided by some caching.
Maybe even simply returning a HashSet instead of List would avoid an large inflation of Pairs.

@iloveeclipse
Copy link
Member Author

iloveeclipse commented Nov 19, 2024

I've played with https://github.com/iloveeclipse/java-project-generator but so far I don't see any difference with/without the commit cc0e66e. Probably there must be something special like multiple inheritance or more generics etc.

Hmm. I've added one extra generic argument in my code generator to the already generified generated code and see now that master builds 20% slower as reverted commit (was almost same before).

Not sure if the revert was correct, as had few merge issues which I believed to fix right, but I've run out of time. Also it is not 2x slower, but of course my dummy code is generated in a very simple way. Still 20% seem to be relevant. All what I did was to change
class Foo10<C> extends a.Foo9<C> implements a.IFoo10<C>
to
class Foo10<C, D> extends a.Foo9<C, D> implements a.IFoo10<C, D>
and change variable definition from
Callable<C> callable
to
Callable<? extends C> callable.
Probably adding even more fancy generics could make the difference even higher.
So feel free to test with latest commit on iloveeclipse/java-project-generator@8b6b37d

Update: I've also added more deep type hierarchies, so A extends B extends C etc is deeper as before (not 5 levels but ~1000).

@fedejeanne
Copy link
Contributor

I tried some of the ideas by @jukzi (#3327 (comment)) but the performance didn't really improve. The best improvement I got was from ~30 min to ~29 min.

What I tried

  1. I let allSuperPairsWithCommonGenericType return a Set instead of a List --> Result: No improvement
  2. I changed these methods so they return boolean instead of Lists and I pushed the necessary conditions into them instead of evaluating upon the returned collections (i.e. I implemented the early return): deriveTypeArgumentConstraints, typeArgumentEqualityConstraints and allSuperPairsWithCommonGenericType --> Result: down from ~30 min to ~29 min
  3. I added a very simple cache (a HashMap) to the method org.eclipse.jdt.internal.compiler.lookup.TypeBinding.findSuperTypeOriginatingFrom(TypeBinding) --> Result: No significant improvement and the memory consumption skyrocketed: ~13 GB after the first build and it hung while doing a 2nd build (I used -Xmx16G)

I'll keep looking into it.

Any hints/ideas are appreciated :)

@jukzi
Copy link
Contributor

jukzi commented Nov 20, 2024

3. memory consumption skyrocketed

such may be caused by by insufficient equals() or hashCode() of the key, so that duplicates are not really found.

@fedejeanne
Copy link
Contributor

fedejeanne commented Nov 20, 2024

such may be caused by by insufficient equals() or hashCode() of the key, so that duplicates are not really found.

I tried some variations of equals and hashCode in TypeBinding but it turns out that they are not so efficient. I assume it is because the fields on that class are all public / protected and they are being changed often from the outside, which makes the results of both methods change upon invocations.

In any case, the memory consumption was slightly reduced but the performance didn't improve.

Thank you for the hint though :-)

fedejeanne added a commit to fedejeanne/eclipse.jdt.core that referenced this issue Nov 21, 2024
…pse-jdt#3227

This simple approach was faster than the new one but it didn't cover
some very specific use cases, which is why the newer and more complex
(slower) approach was introduced. This commit brings back the older,
simpler, approach and falls back to the newer, more expensive, approach
only when necessary.

Fixes eclipse-jdt#3327
fedejeanne added a commit to fedejeanne/eclipse.jdt.core that referenced this issue Nov 21, 2024
…pse-jdt#3227

This simple approach was faster than the new one but it didn't cover
some very specific use cases, which is why the newer and more complex
(slower) approach was introduced. This commit brings back the older,
simpler, approach and falls back to the newer, more expensive, approach
only when necessary.

Reduce the visibility of BoundSet::allSuperPairsWithCommonGenericType
from protected to private too.

Fixes eclipse-jdt#3327
fedejeanne added a commit to fedejeanne/eclipse.jdt.core that referenced this issue Nov 21, 2024
…pse-jdt#3227

This simple approach was faster than the new one but it didn't cover
some very specific use cases, which is why the newer and more complex
(slower) approach was introduced. This commit brings back the older,
simpler, approach and falls back to the newer, more expensive, approach
only when necessary.

Reduce the visibility of BoundSet::allSuperPairsWithCommonGenericType
from protected to private too.

Fixes eclipse-jdt#3327
fedejeanne added a commit to fedejeanne/eclipse.jdt.core that referenced this issue Nov 25, 2024
…pse-jdt#3227

This simple approach was faster than the new one but it didn't cover
some very specific use cases, which is why the newer and more complex
(slower) approach was introduced. This commit brings back the older,
simpler, approach and falls back to the newer, more expensive, approach
only when necessary.

Reduce the visibility of BoundSet::allSuperPairsWithCommonGenericType
from protected to private too.

Fixes eclipse-jdt#3327
fedejeanne added a commit to fedejeanne/eclipse.jdt.core that referenced this issue Nov 27, 2024
…pse-jdt#3227

This simple approach was faster than the new one but it didn't cover
some very specific use cases, which is why the newer and more complex
(slower) approach was introduced. This commit brings back the older,
simpler, approach and falls back to the newer, more expensive, approach
only when necessary.

Reduce the visibility of BoundSet::allSuperPairsWithCommonGenericType
from protected to private too.

Fixes eclipse-jdt#3327
stephan-herrmann added a commit to stephan-herrmann/eclipse.jdt.core that referenced this issue Dec 3, 2024
optimize 1:
+ don't record pairs without any type arguments

optimize 2:
+ stop traversal when both types are the same

Fixes eclipse-jdt#3327
stephan-herrmann added a commit to stephan-herrmann/eclipse.jdt.core that referenced this issue Dec 3, 2024
strengthen optimization 1:
+ record only pairs where both types are parameterized

Fixes eclipse-jdt#3327
@jukzi
Copy link
Contributor

jukzi commented Dec 3, 2024

reproducer boils down to:

class ReproducerGitHub3327SlowTypeInference {

	interface I0 {}
	interface I1 extends I0 {}
	interface I2 extends I1, I0 {}
	interface I3 extends I2, I1 {}
	interface I4 extends I3, I2 {}
	interface I5 extends I4, I3 {}
	interface I6 extends I5, I4 {}
	interface I7 extends I6, I5 {}
	interface I8 extends I7, I6 {}
	interface I9 extends I8, I7 {}
	interface I10 extends I9, I8 {}
	interface I11 extends I10, I9 {}
	interface I12 extends I11, I10 {}
	interface I13 extends I12, I11 {}
	interface I14 extends I13, I12 {}
	interface I15 extends I14, I13 {}
	interface I16 extends I15, I14 {}
	interface I17 extends I16, I15 {}
	interface I18 extends I17, I16 {}
	interface I19 extends I18, I17 {}
	interface I20 extends I19, I18 {}
	interface I21 extends I20, I19 {}
	interface I22 extends I21, I20 {}
	interface I23 extends I22, I21 {}
	interface I24 extends I23, I22 {}
	interface I25 extends I24, I23 {}
	interface I26 extends I25, I24 {}
	interface I27 extends I26, I25 {}
	interface I28 extends I27, I26 {}
	interface I29 extends I28, I27 {}
	interface I30 extends I29, I28 {}
//	...
	
	interface TopLevel extends I20 {} // play with that number, but be carefull! ;-)
	interface TopLevel2 extends TopLevel{}
	interface TopLevel3 extends TopLevel2 {}
	public static TopLevel3 main(TopLevel t) {
		return method(t, TopLevel3.class);
	}

	public static <T extends TopLevel2> T method(TopLevel a, Class<T> b) {
		return null;
	}
}

stephan-herrmann added a commit to stephan-herrmann/eclipse.jdt.core that referenced this issue Dec 3, 2024
(eclipse-jdt#3384)

optimize 3:
+ never visit the same super type more than once

Fixes eclipse-jdt#3327
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment