It seems there is some code
-
FWIW, this is not a new topic at all, and there have already been some incremental improvements. A few years ago, m2e was incapable of processing projects with more than ~100 modules; this was improved by making the repository manager and the import smarter, to minimize the number of MavenProject instances created.
IMavenProjectFacade instances must be retained forever (at least until the pom is modified and the facade is then updated/recreated). There must be one MavenProjectFacade for each Maven project in the IDE (i.e. each IProject with the m2e nature); we could even consider creating one MavenProjectFacade per pom resource in the future.
On the other hand, MavenProject is a memory-expensive object, and keeping too many of them does not scale; that's why they are managed in a cache. The current cache could probably be smarter, and IMavenProjectFacade.getProject() could be smarter too (I'll come back to it later), but we'll always need to discard MavenProject instances from memory to avoid burning RAM. For bigger projects (e.g. Apache Camel or Fuse), we're talking about several GB of data. The MavenProjectFacade is created by loading the MavenProject once and keeping the interesting data that the IDE reuses often, such as the GAV. Most other memory-consuming data (e.g. resolved dependencies) is dropped.
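To illustrate the split between the long-lived facade and the expensive project, here is a minimal Java sketch. It assumes hypothetical classes (Gav, HeavyProject, ProjectFacade) that are not m2e's actual API: the facade copies the cheap coordinates out of the loaded project once and keeps no reference to the expensive object, so that object can be garbage collected.

```java
import java.util.Objects;

// Hypothetical types for illustration only; these are not m2e's classes.
final class FacadeSketch {

    // Cheap coordinates that the facade keeps for its whole lifetime.
    static final class Gav {
        final String groupId;
        final String artifactId;
        final String version;

        Gav(String groupId, String artifactId, String version) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Gav)) return false;
            Gav other = (Gav) o;
            return groupId.equals(other.groupId)
                    && artifactId.equals(other.artifactId)
                    && version.equals(other.version);
        }

        @Override
        public int hashCode() {
            return Objects.hash(groupId, artifactId, version);
        }

        @Override
        public String toString() {
            return groupId + ":" + artifactId + ":" + version;
        }
    }

    // Stand-in for an expensive MavenProject: coordinates plus a large payload.
    static final class HeavyProject {
        final Gav gav;
        final byte[] resolvedState; // simulates the model, resolved dependencies, etc.

        HeavyProject(Gav gav, int payloadBytes) {
            this.gav = gav;
            this.resolvedState = new byte[payloadBytes];
        }
    }

    // One facade per workspace project. It copies the cheap data out of the
    // loaded project and keeps no reference to it, so the heavy object can be
    // garbage collected once the caller drops it.
    static final class ProjectFacade {
        private final Gav gav;
        private final String pomPath;

        ProjectFacade(HeavyProject loadedOnce, String pomPath) {
            this.gav = loadedOnce.gav;
            this.pomPath = pomPath;
        }

        Gav getGav() { return gav; }

        String getPomPath() { return pomPath; }
    }
}
```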
They do not assume the cache is hot; I guess they intentionally avoid loading the project into the cache when there is no compelling reason to do so (e.g. the project is not likely to be reused soon).
That's a wrong assumption. Some operations can accept the project not being available in memory and have logic to cope with that.
A centralized cache is necessary because you need to 1. avoid loading projects too often (which costs a lot of CPU/time), while 2. not keeping all of them in memory (too much RAM). The current approach isn't too bad, but it can be improved:
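A minimal sketch of such a bounded cache, assuming a plain LRU eviction policy built on LinkedHashMap's access order (LruProjectCache is a hypothetical name, not what m2e uses). The size bound caps RAM, while the load counter shows how many reloads (CPU/time) a given access pattern costs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache sketch (not m2e's implementation): the size bound
// caps memory, and reusing recently accessed entries avoids costly reloads.
final class LruProjectCache<K, V> {
    private final int maxEntries;
    private final Function<K, V> loader;
    private final LinkedHashMap<K, V> entries;
    private int loads; // cache misses that triggered a (simulated) project load

    LruProjectCache(int maxEntries, Function<K, V> loader) {
        this.maxEntries = maxEntries;
        this.loader = loader;
        // accessOrder = true: get() moves an entry to the most-recently-used end
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the bound is exceeded.
                return size() > LruProjectCache.this.maxEntries;
            }
        };
    }

    // Returns the cached value, loading (and possibly evicting) on a miss.
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            entries.put(key, value);
        }
        return value;
    }

    synchronized int loadCount() { return loads; }

    synchronized int size() { return entries.size(); }
}
```

With a bound of 2, the access sequence a, b, a, c, a, b costs four loads: loading c evicts b (the least recently used entry), so b has to be reloaded.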
It is big. If you profile a deep module of Apache Camel, the MavenProject instance for that module retains almost 2 MB.
-
This is spot on! We had to add some hacks^Woptimizations to our fork of m2e so that it performs well enough with many projects in the workspace. In particular, we use the maven-tiles extension, which creates "virtual parent projects" under the hood for mixin-style reuse of Maven configuration snippets. So we not only have many projects in the workspace, but also a deep (virtual) parent hierarchy, and ran into both
FWIW, our optimizations are in this branch: https://github.com/GEBIT/m2e-core/commits/1.13.0-GEBIT
They may not be useful as is, and might make you blind if you look at them, so be careful 😉
-
I implemented a really "dumb" deduplication of already-loaded parent projects, and it performed very well:
So it currently has only one cache miss, and the number of cached projects dropped from 3500 to 575 in the Camel example!
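The deduplication code itself isn't shown in this thread; as a hypothetical sketch of the general idea (ParentDedupSketch, ParentModel and the string keys are made up for illustration), parents can be keyed by their coordinates so that a parent shared by many modules is read once and then reused.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of parent deduplication: many modules share the same
// parent POMs, so each parent coordinate is resolved once and then reused.
final class ParentDedupSketch {
    // child key -> parent key (absent for the root); stands in for the POM hierarchy
    private final Map<String, String> parentOf;
    private final Map<String, ParentModel> loadedParents = new HashMap<>();
    private int parentLoads;

    static final class ParentModel {
        final String key;
        final ParentModel parent;

        ParentModel(String key, ParentModel parent) {
            this.key = key;
            this.parent = parent;
        }
    }

    ParentDedupSketch(Map<String, String> parentOf) {
        this.parentOf = parentOf;
    }

    // Returns the shared model for the parent chain of a module (null for a root).
    ParentModel parentChain(String moduleKey) {
        String parentKey = parentOf.get(moduleKey);
        return parentKey == null ? null : loadParent(parentKey);
    }

    private ParentModel loadParent(String key) {
        ParentModel cached = loadedParents.get(key);
        if (cached != null) {
            return cached; // already loaded: reuse instead of reading it again
        }
        parentLoads++; // cache miss: "read" this parent, then its own parents
        ParentModel model = new ParentModel(key, parentChain(key));
        loadedParents.put(key, model);
        return model;
    }

    int parentLoads() {
        return parentLoads;
    }
}
```

For three modules sharing one parent (which itself has a root parent), the parents are read 2 times instead of 6.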
-
I'm currently investigating the Maven lifecycle listeners a bit and came across ProjectRegistryManager#readProjectsWithDependencies vs. ProjectRegistryManager#readMavenProjectFacades. Both read the Maven model, but it seems a read for the facade almost always triggers a read of the project (with dependencies) as well. I would assume the first exists to be able to read the facade even if it has unresolved dependencies.
Apart from this, IMavenProjectFacade instances are retained forever, while the MavenProject cache itself is limited to 20 items by default, and we just keep some basic data. Then there is IMavenProjectFacade#getMavenProject(), which only returns the cached value, and IMavenProjectFacade#getMavenProject(IProgressMonitor), which force-loads the Maven project. While I would assume the first is very seldom used, it is actually used very often, so many parts seem to assume the cache is "hot".
This leads me to the question: what should we actually retain? Maybe it is even enough to store only the GAV, given that most accesses seem to assume the project is fetched anyway?
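A hypothetical sketch of the two access styles and of a caller that tolerates a cold cache (the names are illustrative, not m2e's API; plain strings stand in for MavenProject): cachedProject() never loads, loadProject() loads on a miss, and describe() falls back to cheap facade data instead of forcing a load.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of a non-loading lookup vs. a loading lookup,
// loosely modeled on the two getMavenProject variants discussed above.
final class AccessStylesSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> loader;
    private int loads;

    AccessStylesSketch(Function<String, String> loader) {
        this.loader = loader;
    }

    // Like the no-arg variant: returns only what is already cached, never loads.
    Optional<String> cachedProject(String key) {
        return Optional.ofNullable(cache.get(key));
    }

    // Like the IProgressMonitor variant: performs the expensive load on a miss.
    String loadProject(String key) {
        return cache.computeIfAbsent(key, k -> {
            loads++;
            return loader.apply(k);
        });
    }

    // A caller that accepts a cold cache: uses the project if it is in memory,
    // otherwise falls back to cheap facade data instead of forcing a load.
    String describe(String key, String gavFallback) {
        return cachedProject(key)
                .map(p -> "full: " + p)
                .orElse("gav only: " + gavFallback);
    }

    int loadCount() {
        return loads;
    }
}
```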
And should we use a fixed-size cache, or would it be better to let the facade hold the MavenProject itself via a WeakReference (or something similar), letting Java clean everything out if required?
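A minimal sketch of the per-facade variant, assuming a SoftReference rather than a WeakReference (SoftProjectHolder is a hypothetical name). Per the java.lang.ref documentation, weak references are cleared once the object is only weakly reachable, so a weakly held project would typically be lost at the next GC, whereas soft references are cleared at the collector's discretion in response to memory demand.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Hypothetical sketch: the facade holds its expensive project through a
// SoftReference, so the GC may reclaim it under memory pressure, and the
// next access transparently reloads it.
final class SoftProjectHolder<P> {
    private final Supplier<P> loader;
    private SoftReference<P> ref; // null until the first load
    private int loads;

    SoftProjectHolder(Supplier<P> loader) {
        this.loader = loader;
    }

    // Returns the project only if it is still in memory; never loads.
    synchronized P peek() {
        return ref == null ? null : ref.get();
    }

    // Returns the project, reloading it if it was never loaded or was reclaimed.
    synchronized P get() {
        P project = peek(); // strong local reference, so it cannot vanish after the check
        if (project == null) {
            project = loader.get();
            loads++;
            ref = new SoftReference<>(project);
        }
        return project;
    }

    synchronized int loadCount() {
        return loads;
    }
}
```

The trade-off compared with a fixed-size cache is that eviction is left entirely to the GC, so the number of resident projects is no longer bounded explicitly.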
I even did some quick profiling, and it seems the MavenProject itself is not very big, but it keeps a reference to the building request (ProjectBuildingRequest), which is much larger but seems unused after construction. Sadly, I don't know of a big test project that could be used to verify this; is there one?