Generate Hibernate ORM InstantiationOptimizers to avoid reflection #43767

mbladel · 2024-10-08T11:13:46Z

Partially addresses #43692

This change allows Hibernate to avoid using reflection when instantiating managed classes by creating org.hibernate.bytecode.spi.ReflectionOptimizer.InstantiationOptimizers at build-time, and making them available at runtime.

Access optimizers are not part of this change as we don't have enough information to generate them at build-time, see the linked issue for more details.

quarkus-bot · 2024-10-08T11:14:05Z

/cc @gsmet (hibernate-orm), @yrodiere (hibernate-orm)

yrodiere

Thanks, I added a comment below.

Though I would still love @franz1981 to explain why bytecode generation is to be preferred to reflection: I understand why java.reflect is slow, but I'd expect java.lang.invoke.MethodHandle, at least, to be just as effective/flexible as custom generated code like this, especially since it will likely lead to megamorphic calls. Otherwise I'm probably missing some constraints that make it impossible for method handles to perform adequately.

.../java/io/quarkus/hibernate/orm/runtime/service/bytecodeprovider/RuntimeBytecodeProvider.java

franz1981 · 2024-10-08T13:03:54Z

Though I would still love @franz1981 to explain why bytecode generation is to be preferred to reflection: I understand why java.reflect is slow, but I'd expect java.lang.invoke.MethodHandle

I have added some comments to the original issue exactly on the generation at startup part (which probably is not what we want).
Instead, for MethodHandle there are few points:

recent JVMs re implemented reflection with MethodHandle
such JEP reports that only by storing MethodHandle/Reflection Field/Methods into static final fields would get the expected speedup (which means saving security checks and other odd checks..), and instead mich harder slowdown without doing it
the actual speedup to measure should be measured in proper benchmark

Considered the granularity of the existing optimized accessor API (which work with sets of properties, in batch) the generation part can be branchy, at worse, or very optimized in case the fields requested are a known set, granting both inlining to happen and saving some branch.
Whilst a solution which relies on MH/Reflection, can easily become megamorphic based on the Cartesian Product (fields X type).
I can further explains this with an example or via a call.

yrodiere · 2024-10-08T13:21:02Z

* the actual speedup to measure should be measured in proper benchmark

+1 on that. Maybe let's work towards a few very ugly prototypes (e.g. with an entity model hardcoded into the Quarkus Hibernate ORM extension) and a benchmark before we spend significant time on either solution.

yrodiere

LGTM, thanks. @franz1981 should we merge as-is, or run a benchmark first, since the only goal of this change is to improve performance?
I know it's not fixing the problem you noticed, but it's related.
I suppose the best way to artificially highlight entity instantiation performance in a benchmark is to perform a single query with lots of returned entities, each with only a single attribute (its ID). You can even test it without a network access by calling EntityManager#getReference(Class entityClass, Object id) many times.

mbladel · 2024-10-08T13:58:19Z

You can even test it without a network access by calling EntityManager#getReference(Class entityClass, Object id) many times.

Actually, EntityManager#getReference will create proxies, not actual entity instances, and so will not go through the instantiation optimizers. If we want a synthetic way of creating many entities without hitting the database, I would suggest something like:

SessionFactoryImplementor sessionFactory = em.getEntityManagerFactory().unwrap(SessionFactoryImplementor.class);
Object entityInstance = sessionFactory.getMappingMetamodel().findEntityDescriptor(entityClassName)
        .getRepresentationStrategy().getInstantiator().instantiate(sessionFactory);

yrodiere · 2024-10-08T14:01:19Z

Actually, EntityManager#getReference will create proxies, not actual entity instances, and so will not go through the instantiation optimizers

Even with bytecode enhancement, which results in the actual entity classes being used as proxies? That's weird, no? Should we fix it?

mbladel · 2024-10-08T14:08:29Z

Bytecode proxies are not actually the same as the entity class, it's subclasses (named like <entity_class_name>$HibernateProxy...) of the original entity class, also generated at build time. For anyone curious, you can find the implementation that generates those using byte buddy here.

They get instantiated differently, by the QuarkusProxyFactory#getProxy method, so they do not go through the instantiation optimizers. It's not a bug, I believe it always worked like this.

franz1981 · 2024-10-08T14:23:25Z

I am at devoxx but:

yes would be better still to have numbers
it could be implemented in a synthetic jmh benchmark too (I can help with it, in some regard - but I need help as well from you guys)
I suggest to add docs with the expected "shape" of bytecode generated by using some decompiler to verify it

yrodiere · 2024-10-08T15:00:02Z

It's not a bug, I believe it always worked like this.

Right, I did not mean to imply it's a bug. "Should we improve it?", if you prefer.

Bytecode proxies are not actually the same as the entity class, it's subclasses (named like <entity_class_name>$HibernateProxy...) of the original entity class, also generated at build time. For anyone curious, you can find the implementation that generates those using byte buddy here.

Wild. I'd have expected Hibernate ORM to use the enhanced entity class, just with all fields un-initialized. Since the enhanced class can actually handle lazy initialization -- and, at least since a recent patch from Andrea, can handle arbitrary fields being marked as lazy, to support entity graphs.
Though I suppose that wouldn't be possible when the entity class being requested has entity subclasses.

Does what I'm saying at least make sense? I suppose there are reasons not to do it that way, though.

franz1981 · 2024-10-08T15:54:36Z

I have checked what quarkus + hibernate generate at build time at using https://github.com/quarkusio/quarkus-quickstarts/tree/3.15/hibernate-orm-quickstart and adding quarkus.package.jar.decompiler.enabled=true
at https://github.com/quarkusio/quarkus-quickstarts/blob/3.15/hibernate-orm-quickstart/src/main/resources/application.properties: the decompiled classes should be dumped in the target folder (under decompiled children)

quarkus-bot · 2024-10-09T07:38:48Z

Status for workflow `Quarkus CI`

This is the status report for running Quarkus CI on commit ae95894.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

You can consult the Develocity build scans.

Flaky tests - Develocity

⚙️ JVM Tests - JDK 17

📦 extensions/smallrye-reactive-messaging-kafka/deployment

✖ io.quarkus.smallrye.reactivemessaging.kafka.deployment.testing.KafkaDevServicesContinuousTestingWorkingAppPropsTestCase.testContinuousTestingScenario3 - History

io.quarkus.builder.BuildException: Build failure: Build failed due to errors [error]: Build step io.quarkus.apicurio.registry.devservice.DevServicesApicurioRegistryProcessor\#startApicurioRegistryDevService threw an exception: java.lang.RuntimeException: org.testcontainers.containers.ContainerLaunchException: Container startup failed for image quay.io/apicurio/apicurio-registry-mem:2.4.2.Final at io.quarkus.apicurio.registry.devservice.DevServicesApicurioRegistryProcessor.startApicurioRegistryDevService(DevServicesApicurioRegistryProcessor.java:90) at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:732) at io.quarkus.deployment.ExtensionLoader$3.execute(ExtensionLoader.java:856) - java.lang.RuntimeException

java.lang.RuntimeException: 
io.quarkus.builder.BuildException: Build failure: Build failed due to errors
	[error]: Build step io.quarkus.apicurio.registry.devservice.DevServicesApicurioRegistryProcessor#startApicurioRegistryDevService threw an exception: java.lang.RuntimeException: org.testcontainers.containers.ContainerLaunchException: Container startup failed for image quay.io/apicurio/apicurio-registry-mem:2.4.2.Final
	at io.quarkus.apicurio.registry.devservice.DevServicesApicurioRegistryProcessor.startApicurioRegistryDevService(DevServicesApicurioRegistryProcessor.java:90)
	at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:732)
	at io.quarkus.deployment.ExtensionLoader$3.execute(ExtensionLoader.java:856)
	at io.quarkus.builder.BuildContext.run(BuildContext.java:256)
	at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)

yrodiere · 2024-10-09T08:11:40Z

I have checked what quarkus + hibernate generate at build time

Yes, bytecode enhancement happens at build time, both in Quarkus and (in most cases) in vanilla Hibernate ORM. That's possible because bytecode enhancement relies on relatively simple metadata (basically just annotations).

Instantiation optimizers require even less metadata (basically just the list of managed classes) so they can be generated at build time too.

Access optimizer generation, however, happens at bootstrap in vanilla Hibernate ORM, and cannot happen at build time at the moment, even in Quarkus, because it requires advanced metadata that is only available later, at static init.

franz1981 · 2024-10-09T09:12:32Z

@yrodiere

because it requires advanced metadata that is only available later, at static init.

Given that I am ignorant on this topic (but eager to learn) - what about specific (hopefully worthy?) cases which can make it possible? They can even exists?
From what I see we can have different optimizers per Class so...maybe there are immutable or other cases which can still partially benefit?

yrodiere · 2024-10-09T09:33:08Z

@yrodiere

because it requires advanced metadata that is only available later, at static init.

Given that I am ignorant on this topic (but eager to learn) - what about specific (hopefully worthy?) cases which can make it possible? They can even exists? From what I see we can have different optimizers per Class so...maybe there are immutable or other cases which can still partially benefit?

Immutable classes can still have all sorts of complicated attributes that would require this advanced metadata to handle them properly.
Also, immutable classes are a rather niche usecase, though YMMV. They're mainly useful for embeddables or embedded IDs, but even then they're not that frequent.

I do not think trying to improve performance in specific cases is worth the effort: even just relyably detecting the few "supportable" cases would be significant work, because that amounts to building that advanced metadata.

Not to mention that we've been introducing lots of complicated, duplicated code in Quarkus these past few years, and I'm not keen to continue that. If there's a need of improvements, let's implement them upstream...

We can start with the easiest solution, which would be mehtod handles, and working our way towards whatever is best. We could for example start adapting Hibernate ORM to build the optimizer based on Metadata, which we could then consider building at build-time just for the purpose of generating optimizers, and as a third step we'd reuse that metadata during static init.
There are all kinds of ways to improve performance, but let's just not create more edge cases and technical debt, please.

Though I'd be tempted to first evaluate the performance of all approaches we discussed (method handles, generated bytecode, ...) using dirty prototypes (e.g. harcoding the metadata of a particular model at build time), so that we can evaluate what's at stake and how to prioritize the effort.

mbladel · 2024-10-09T09:38:43Z

Though I'd be tempted to first evaluate the performance of all approaches we discussed (method handles, generated bytecode, ...) using dirty prototypes (e.g. harcoding the metadata of a particular model at build time), so that we can evaluate what's at stake and how to prioritize the effort.

Method handles is going to require some exploration on the Hibernate ORM side first, that can be of course be pushed if we think it might benefit performance overall, though the inlining requiring static final fields might make the actual benefits harder to achieve than a simple switch to new APIs.

I have a "dirty prototype" almost ready that builds access optimizers at static-init time, if we only want to measure runtime performance we might also use that. I would like to also work on a benchmark just the instantiation optimizers in the context of this PR, but I would need some help creating a meaningful benchmark in the Quarkus context from you guys.

yrodiere · 2024-10-09T10:29:01Z

I would need some help creating a meaningful benchmark in the Quarkus context from you guys.

Let's just start with whatever benchmark @franz1981 was using when he reported the reflection usage as a problem? We can always work our way to more focused benchmarks from there... if necessary.

mbladel · 2024-10-10T13:21:47Z

This will surely avoid reflective access for mapped object allocation, and all the bytecode generation is done at build-time, so performance impact should be very minor. I'll wait to hear from @franz1981 if we want to go ahead, or if there's anything else I can do for this.

franz1981 · 2024-10-10T13:25:36Z

As it stands right now is fine to me, need to hear @geoand as well for the generation part - I would likely to see what is produced, but I can run it myself as soon as I come back from devoxx

mbladel · 2024-10-10T13:29:11Z

I've actually added javadoc to the method which does bytecode generation with an example of what's produced, For example, the generated bytecode for an entity com.example.MyEntity will look something like:

package com.example;

import org.hibernate.bytecode.spi.ReflectionOptimizer;

public class MyEntity$QuarkusInstantiator implements ReflectionOptimizer.InstantiationOptimizer {
    public Object newInstance() {
        return new MyEntity();
    }
}

I've checked by setting -Dquarkus.debug.generated-classes-dir and decompiling the generated class files locally.

franz1981 · 2024-10-10T13:37:22Z

Perfect, thanks @mbladel

franz1981

LGTM and thanks again, well done 👍

geoand · 2024-10-14T07:00:29Z

Let's also get an approval from @yrodiere

yrodiere · 2024-10-14T07:23:02Z

I'm busy with (much) higher priority stuff right now unfortunately.

If you want to merge this, fine, but again I'd appreciate benchmarks and actual numbers to justify the added complexity, @franz1981 . While @mbladel's work looks very nice, it's still more code and thus more potential for bugs, and this could just perform worse for all I know (hint: I know nothing :) ).

franz1981 · 2024-10-14T10:25:40Z

and this could just perform worse for all I know (hint: I know nothing :) ).

It could be possible to write a benchmark in https://github.com/hibernate/hibernate-orm-benchmark by using the code produced by quarkus build while adding quarkus.package.jar.decompiler.enabled=true to the application.properties: I can help designing the benchmark @mbladel wdyt?
This can be useful for your team if you wish.

Said that, @yrodiere although I'm aware that being cautious is important (so let's do it), there is some very "difficult to measure" effects in reflection which make it saving is a good choice regardless i.e.:

pre JDK 18: "reflection deinflation": the optimized constructors/accessors are generated based on an inflection thresholds , see https://github.com/openjdk/jdk17/blob/4afbcaf55383ec2f5da53282a1547bac3d099e9d/src/java.base/share/classes/jdk/internal/reflect/NativeConstructorAccessorImpl.java#L65 , which means that additional RSS and CPU time is required (once, per entity eh!) to generate the bytecode on the flight (see Malloc Arena RSS increase between 2.16.11.Final and 3.x/main #36204 (comment) which shows that bytecode generation is not healthy for RSS footprint in JVM mode)
post JDK 18: still learning the process here, it is likely it is still performing some non-naive compilation/generation at runtime, but I have to reached out the JDK team to learn better about its trade-offs
regardless JDK version: as the flamegraphs I've posted shows, the reflective construction still perform few checks (see below) - which doesn't happen in the "real" allocation-based version

with reflection:

without:

where the non-reflective cost become negligible (the nearly invisible tiny bar leftmost) if compared to the checks performed by the reflective ones

These changes usually are not granted to deliver "big" value alone (if not on microbenchs) but can easily pile-up or just become more relevant if combined with other factors - which makes it hard to judge the ratio of complexity vs effectiveness.
That doesn't mean is not worthy, especially when the amount of changes is little and self-contained enough.

mbladel · 2024-10-15T12:06:14Z

It could be possible to write a benchmark in https://github.com/hibernate/hibernate-orm-benchmark by using the code produced by quarkus build while adding quarkus.package.jar.decompiler.enabled=true to the application.properties: I can help designing the benchmark @mbladel wdyt?
This can be useful for your team if you wish.

Sounds good to me, I'll try to prepare the basic structure and see if I can get some significant results (FYI I'm away for a few days, I'll update you when I'm back on my PC).

mbladel · 2024-10-22T14:46:52Z

@franz1981 sorry for the delay, I'm back and I managed to prepare a benchmark that measures the difference with the optimized vs non-optimized (i.e. reflection) cases: hibernate/hibernate-orm-benchmark#13. The code itself is very simple, still I'd really appreciate if you could confirm whether my approach is correct.

@yrodiere you can find my initial findings in terms of performance numbers and flamegraphs in the PR I linked.

quarkus-bot bot added the area/hibernate-orm Hibernate ORM label Oct 8, 2024

yrodiere reviewed Oct 8, 2024

View reviewed changes

.../java/io/quarkus/hibernate/orm/runtime/service/bytecodeprovider/RuntimeBytecodeProvider.java Outdated Show resolved Hide resolved

mbladel force-pushed the instantiation branch 2 times, most recently from 1fe3bdc to 6f715dc Compare October 8, 2024 12:58

mbladel force-pushed the instantiation branch from 6f715dc to 621e0e9 Compare October 8, 2024 13:32

yrodiere reviewed Oct 8, 2024

View reviewed changes

mbladel force-pushed the instantiation branch 2 times, most recently from e368395 to f44a690 Compare October 8, 2024 14:53

This comment has been minimized.

Sign in to view

quarkus-bot bot added the triage/flaky-test label Oct 8, 2024

Generate Hibernate ORM InstantiationOptimizers to avoid reflection

ae95894

mbladel force-pushed the instantiation branch from f44a690 to ae95894 Compare October 8, 2024 16:17

franz1981 approved these changes Oct 10, 2024

View reviewed changes

geoand approved these changes Oct 10, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generate Hibernate ORM InstantiationOptimizers to avoid reflection #43767

Generate Hibernate ORM InstantiationOptimizers to avoid reflection #43767

mbladel commented Oct 8, 2024

quarkus-bot bot commented Oct 8, 2024

yrodiere left a comment

franz1981 commented Oct 8, 2024

yrodiere commented Oct 8, 2024

yrodiere left a comment

mbladel commented Oct 8, 2024

yrodiere commented Oct 8, 2024

mbladel commented Oct 8, 2024 •

edited

Loading

franz1981 commented Oct 8, 2024

yrodiere commented Oct 8, 2024 •

edited

Loading

This comment has been minimized.

franz1981 commented Oct 8, 2024 •

edited

Loading

quarkus-bot bot commented Oct 9, 2024 •

edited by github-actions bot

Loading

yrodiere commented Oct 9, 2024

franz1981 commented Oct 9, 2024 •

edited

Loading

yrodiere commented Oct 9, 2024

mbladel commented Oct 9, 2024

yrodiere commented Oct 9, 2024

mbladel commented Oct 10, 2024

franz1981 commented Oct 10, 2024

mbladel commented Oct 10, 2024

franz1981 commented Oct 10, 2024

franz1981 left a comment •

edited

Loading

geoand commented Oct 14, 2024

yrodiere commented Oct 14, 2024

franz1981 commented Oct 14, 2024 •

edited

Loading

mbladel commented Oct 15, 2024

mbladel commented Oct 22, 2024

Generate Hibernate ORM InstantiationOptimizers to avoid reflection #43767

Are you sure you want to change the base?

Generate Hibernate ORM InstantiationOptimizers to avoid reflection #43767

Conversation

mbladel commented Oct 8, 2024

quarkus-bot bot commented Oct 8, 2024

yrodiere left a comment

Choose a reason for hiding this comment

franz1981 commented Oct 8, 2024

yrodiere commented Oct 8, 2024

yrodiere left a comment

Choose a reason for hiding this comment

mbladel commented Oct 8, 2024

yrodiere commented Oct 8, 2024

mbladel commented Oct 8, 2024 • edited Loading

franz1981 commented Oct 8, 2024

yrodiere commented Oct 8, 2024 • edited Loading

This comment has been minimized.

franz1981 commented Oct 8, 2024 • edited Loading

quarkus-bot bot commented Oct 9, 2024 • edited by github-actions bot Loading

Status for workflow Quarkus CI

Flaky tests - Develocity

⚙️ JVM Tests - JDK 17

📦 extensions/smallrye-reactive-messaging-kafka/deployment

yrodiere commented Oct 9, 2024

franz1981 commented Oct 9, 2024 • edited Loading

yrodiere commented Oct 9, 2024

mbladel commented Oct 9, 2024

yrodiere commented Oct 9, 2024

mbladel commented Oct 10, 2024

franz1981 commented Oct 10, 2024

mbladel commented Oct 10, 2024

franz1981 commented Oct 10, 2024

franz1981 left a comment • edited Loading

Choose a reason for hiding this comment

geoand commented Oct 14, 2024

yrodiere commented Oct 14, 2024

franz1981 commented Oct 14, 2024 • edited Loading

mbladel commented Oct 15, 2024

mbladel commented Oct 22, 2024

mbladel commented Oct 8, 2024 •

edited

Loading

yrodiere commented Oct 8, 2024 •

edited

Loading

franz1981 commented Oct 8, 2024 •

edited

Loading

quarkus-bot bot commented Oct 9, 2024 •

edited by github-actions bot

Loading

Status for workflow `Quarkus CI`

franz1981 commented Oct 9, 2024 •

edited

Loading

franz1981 left a comment •

edited

Loading

franz1981 commented Oct 14, 2024 •

edited

Loading