Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate Hibernate ORM InstantiationOptimizers to avoid reflection #43767

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mbladel
Copy link

@mbladel mbladel commented Oct 8, 2024

Partially addresses #43692

This change allows Hibernate to avoid using reflection when instantiating managed classes by creating org.hibernate.bytecode.spi.ReflectionOptimizer.InstantiationOptimizers at build-time, and making them available at runtime.

Access optimizers are not part of this change as we don't have enough information to generate them at build-time, see the linked issue for more details.

@quarkus-bot quarkus-bot bot added the area/hibernate-orm Hibernate ORM label Oct 8, 2024
@quarkus-bot
Copy link

quarkus-bot bot commented Oct 8, 2024

/cc @gsmet (hibernate-orm), @yrodiere (hibernate-orm)

Copy link
Member

@yrodiere yrodiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I added a comment below.

Though I would still love @franz1981 to explain why bytecode generation is to be preferred to reflection: I understand why java.reflect is slow, but I'd expect java.lang.invoke.MethodHandle, at least, to be just as effective/flexible as custom generated code like this, especially since it will likely lead to megamorphic calls. Otherwise I'm probably missing some constraints that make it impossible for method handles to perform adequately.

@mbladel mbladel force-pushed the instantiation branch 2 times, most recently from 1fe3bdc to 6f715dc Compare October 8, 2024 12:58
@franz1981
Copy link
Contributor

Though I would still love @franz1981 to explain why bytecode generation is to be preferred to reflection: I understand why java.reflect is slow, but I'd expect java.lang.invoke.MethodHandle

I have added some comments to the original issue exactly on the generation at startup part (which probably is not what we want).
Instead, for MethodHandle there are few points:

  • recent JVMs re implemented reflection with MethodHandle
  • such JEP reports that only by storing MethodHandle/Reflection Field/Methods into static final fields would get the expected speedup (which means saving security checks and other odd checks..), and instead mich harder slowdown without doing it
  • the actual speedup to measure should be measured in proper benchmark

Considered the granularity of the existing optimized accessor API (which work with sets of properties, in batch) the generation part can be branchy, at worse, or very optimized in case the fields requested are a known set, granting both inlining to happen and saving some branch.
Whilst a solution which relies on MH/Reflection, can easily become megamorphic based on the Cartesian Product (fields X type).
I can further explains this with an example or via a call.

@yrodiere
Copy link
Member

yrodiere commented Oct 8, 2024

* the actual speedup to measure should be measured in proper benchmark

+1 on that. Maybe let's work towards a few very ugly prototypes (e.g. with an entity model hardcoded into the Quarkus Hibernate ORM extension) and a benchmark before we spend significant time on either solution.

Copy link
Member

@yrodiere yrodiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks. @franz1981 should we merge as-is, or run a benchmark first, since the only goal of this change is to improve performance?
I know it's not fixing the problem you noticed, but it's related.
I suppose the best way to artificially highlight entity instantiation performance in a benchmark is to perform a single query with lots of returned entities, each with only a single attribute (its ID). You can even test it without a network access by calling EntityManager#getReference(Class entityClass, Object id) many times.

@mbladel
Copy link
Author

mbladel commented Oct 8, 2024

You can even test it without a network access by calling EntityManager#getReference(Class entityClass, Object id) many times.

Actually, EntityManager#getReference will create proxies, not actual entity instances, and so will not go through the instantiation optimizers. If we want a synthetic way of creating many entities without hitting the database, I would suggest something like:

SessionFactoryImplementor sessionFactory = em.getEntityManagerFactory().unwrap(SessionFactoryImplementor.class);
Object entityInstance = sessionFactory.getMappingMetamodel().findEntityDescriptor(entityClassName)
        .getRepresentationStrategy().getInstantiator().instantiate(sessionFactory);

@yrodiere
Copy link
Member

yrodiere commented Oct 8, 2024

Actually, EntityManager#getReference will create proxies, not actual entity instances, and so will not go through the instantiation optimizers

Even with bytecode enhancement, which results in the actual entity classes being used as proxies? That's weird, no? Should we fix it?

@mbladel
Copy link
Author

mbladel commented Oct 8, 2024

Bytecode proxies are not actually the same as the entity class, it's subclasses (named like <entity_class_name>$HibernateProxy...) of the original entity class, also generated at build time. For anyone curious, you can find the implementation that generates those using byte buddy here.

They get instantiated differently, by the QuarkusProxyFactory#getProxy method, so they do not go through the instantiation optimizers. It's not a bug, I believe it always worked like this.

@franz1981
Copy link
Contributor

I am at devoxx but:

  • yes would be better still to have numbers
  • it could be implemented in a synthetic jmh benchmark too (I can help with it, in some regard - but I need help as well from you guys)
  • I suggest to add docs with the expected "shape" of bytecode generated by using some decompiler to verify it

@mbladel mbladel force-pushed the instantiation branch 2 times, most recently from e368395 to f44a690 Compare October 8, 2024 14:53
@yrodiere
Copy link
Member

yrodiere commented Oct 8, 2024

It's not a bug, I believe it always worked like this.

Right, I did not mean to imply it's a bug. "Should we improve it?", if you prefer.

Bytecode proxies are not actually the same as the entity class, it's subclasses (named like <entity_class_name>$HibernateProxy...) of the original entity class, also generated at build time. For anyone curious, you can find the implementation that generates those using byte buddy here.

Wild. I'd have expected Hibernate ORM to use the enhanced entity class, just with all fields un-initialized. Since the enhanced class can actually handle lazy initialization -- and, at least since a recent patch from Andrea, can handle arbitrary fields being marked as lazy, to support entity graphs.
Though I suppose that wouldn't be possible when the entity class being requested has entity subclasses.

Does what I'm saying at least make sense? I suppose there are reasons not to do it that way, though.

@quarkus-bot

This comment has been minimized.

@franz1981
Copy link
Contributor

franz1981 commented Oct 8, 2024

I have checked what quarkus + hibernate generate at build time at using https://github.com/quarkusio/quarkus-quickstarts/tree/3.15/hibernate-orm-quickstart and adding quarkus.package.jar.decompiler.enabled=true
at https://github.com/quarkusio/quarkus-quickstarts/blob/3.15/hibernate-orm-quickstart/src/main/resources/application.properties: the decompiled classes should be dumped in the target folder (under decompiled children)

@quarkus-bot
Copy link

quarkus-bot bot commented Oct 9, 2024

Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit ae95894.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

You can consult the Develocity build scans.


Flaky tests - Develocity

⚙️ JVM Tests - JDK 17

📦 extensions/smallrye-reactive-messaging-kafka/deployment

io.quarkus.smallrye.reactivemessaging.kafka.deployment.testing.KafkaDevServicesContinuousTestingWorkingAppPropsTestCase.testContinuousTestingScenario3 - History

  • io.quarkus.builder.BuildException: Build failure: Build failed due to errors [error]: Build step io.quarkus.apicurio.registry.devservice.DevServicesApicurioRegistryProcessor\#startApicurioRegistryDevService threw an exception: java.lang.RuntimeException: org.testcontainers.containers.ContainerLaunchException: Container startup failed for image quay.io/apicurio/apicurio-registry-mem:2.4.2.Final at io.quarkus.apicurio.registry.devservice.DevServicesApicurioRegistryProcessor.startApicurioRegistryDevService(DevServicesApicurioRegistryProcessor.java:90) at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:732) at io.quarkus.deployment.ExtensionLoader$3.execute(ExtensionLoader.java:856) - java.lang.RuntimeException
java.lang.RuntimeException: 
io.quarkus.builder.BuildException: Build failure: Build failed due to errors
	[error]: Build step io.quarkus.apicurio.registry.devservice.DevServicesApicurioRegistryProcessor#startApicurioRegistryDevService threw an exception: java.lang.RuntimeException: org.testcontainers.containers.ContainerLaunchException: Container startup failed for image quay.io/apicurio/apicurio-registry-mem:2.4.2.Final
	at io.quarkus.apicurio.registry.devservice.DevServicesApicurioRegistryProcessor.startApicurioRegistryDevService(DevServicesApicurioRegistryProcessor.java:90)
	at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:732)
	at io.quarkus.deployment.ExtensionLoader$3.execute(ExtensionLoader.java:856)
	at io.quarkus.builder.BuildContext.run(BuildContext.java:256)
	at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)

@yrodiere
Copy link
Member

yrodiere commented Oct 9, 2024

I have checked what quarkus + hibernate generate at build time

Yes, bytecode enhancement happens at build time, both in Quarkus and (in most cases) in vanilla Hibernate ORM. That's possible because bytecode enhancement relies on relatively simple metadata (basically just annotations).

Instantiation optimizers require even less metadata (basically just the list of managed classes) so they can be generated at build time too.

Access optimizer generation, however, happens at bootstrap in vanilla Hibernate ORM, and cannot happen at build time at the moment, even in Quarkus, because it requires advanced metadata that is only available later, at static init.

@franz1981
Copy link
Contributor

franz1981 commented Oct 9, 2024

@yrodiere

because it requires advanced metadata that is only available later, at static init.

Given that I am ignorant on this topic (but eager to learn) - what about specific (hopefully worthy?) cases which can make it possible? They can even exists?
From what I see we can have different optimizers per Class so...maybe there are immutable or other cases which can still partially benefit?

@yrodiere
Copy link
Member

yrodiere commented Oct 9, 2024

@yrodiere

because it requires advanced metadata that is only available later, at static init.

Given that I am ignorant on this topic (but eager to learn) - what about specific (hopefully worthy?) cases which can make it possible? They can even exists? From what I see we can have different optimizers per Class so...maybe there are immutable or other cases which can still partially benefit?

Immutable classes can still have all sorts of complicated attributes that would require this advanced metadata to handle them properly.
Also, immutable classes are a rather niche usecase, though YMMV. They're mainly useful for embeddables or embedded IDs, but even then they're not that frequent.

I do not think trying to improve performance in specific cases is worth the effort: even just relyably detecting the few "supportable" cases would be significant work, because that amounts to building that advanced metadata.

Not to mention that we've been introducing lots of complicated, duplicated code in Quarkus these past few years, and I'm not keen to continue that. If there's a need of improvements, let's implement them upstream...

We can start with the easiest solution, which would be mehtod handles, and working our way towards whatever is best. We could for example start adapting Hibernate ORM to build the optimizer based on Metadata, which we could then consider building at build-time just for the purpose of generating optimizers, and as a third step we'd reuse that metadata during static init.
There are all kinds of ways to improve performance, but let's just not create more edge cases and technical debt, please.

Though I'd be tempted to first evaluate the performance of all approaches we discussed (method handles, generated bytecode, ...) using dirty prototypes (e.g. harcoding the metadata of a particular model at build time), so that we can evaluate what's at stake and how to prioritize the effort.

@mbladel
Copy link
Author

mbladel commented Oct 9, 2024

Though I'd be tempted to first evaluate the performance of all approaches we discussed (method handles, generated bytecode, ...) using dirty prototypes (e.g. harcoding the metadata of a particular model at build time), so that we can evaluate what's at stake and how to prioritize the effort.

Method handles is going to require some exploration on the Hibernate ORM side first, that can be of course be pushed if we think it might benefit performance overall, though the inlining requiring static final fields might make the actual benefits harder to achieve than a simple switch to new APIs.

I have a "dirty prototype" almost ready that builds access optimizers at static-init time, if we only want to measure runtime performance we might also use that. I would like to also work on a benchmark just the instantiation optimizers in the context of this PR, but I would need some help creating a meaningful benchmark in the Quarkus context from you guys.

@yrodiere
Copy link
Member

yrodiere commented Oct 9, 2024

I would need some help creating a meaningful benchmark in the Quarkus context from you guys.

Let's just start with whatever benchmark @franz1981 was using when he reported the reflection usage as a problem? We can always work our way to more focused benchmarks from there... if necessary.

@mbladel
Copy link
Author

mbladel commented Oct 10, 2024

This will surely avoid reflective access for mapped object allocation, and all the bytecode generation is done at build-time, so performance impact should be very minor. I'll wait to hear from @franz1981 if we want to go ahead, or if there's anything else I can do for this.

@franz1981
Copy link
Contributor

As it stands right now is fine to me, need to hear @geoand as well for the generation part - I would likely to see what is produced, but I can run it myself as soon as I come back from devoxx

@mbladel
Copy link
Author

mbladel commented Oct 10, 2024

I've actually added javadoc to the method which does bytecode generation with an example of what's produced, For example, the generated bytecode for an entity com.example.MyEntity will look something like:

package com.example;

import org.hibernate.bytecode.spi.ReflectionOptimizer;

public class MyEntity$QuarkusInstantiator implements ReflectionOptimizer.InstantiationOptimizer {
    public Object newInstance() {
        return new MyEntity();
    }
}

I've checked by setting -Dquarkus.debug.generated-classes-dir and decompiling the generated class files locally.

@franz1981
Copy link
Contributor

Perfect, thanks @mbladel

Copy link
Contributor

@franz1981 franz1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and thanks again, well done 👍

@geoand
Copy link
Contributor

geoand commented Oct 14, 2024

Let's also get an approval from @yrodiere

@yrodiere
Copy link
Member

I'm busy with (much) higher priority stuff right now unfortunately.

If you want to merge this, fine, but again I'd appreciate benchmarks and actual numbers to justify the added complexity, @franz1981 . While @mbladel's work looks very nice, it's still more code and thus more potential for bugs, and this could just perform worse for all I know (hint: I know nothing :) ).

@franz1981
Copy link
Contributor

franz1981 commented Oct 14, 2024

and this could just perform worse for all I know (hint: I know nothing :) ).

It could be possible to write a benchmark in https://github.com/hibernate/hibernate-orm-benchmark by using the code produced by quarkus build while adding quarkus.package.jar.decompiler.enabled=true to the application.properties: I can help designing the benchmark @mbladel wdyt?
This can be useful for your team if you wish.

Said that, @yrodiere although I'm aware that being cautious is important (so let's do it), there is some very "difficult to measure" effects in reflection which make it saving is a good choice regardless i.e.:

with reflection:
image

without:
image

where the non-reflective cost become negligible (the nearly invisible tiny bar leftmost) if compared to the checks performed by the reflective ones

These changes usually are not granted to deliver "big" value alone (if not on microbenchs) but can easily pile-up or just become more relevant if combined with other factors - which makes it hard to judge the ratio of complexity vs effectiveness.
That doesn't mean is not worthy, especially when the amount of changes is little and self-contained enough.

@mbladel
Copy link
Author

mbladel commented Oct 15, 2024

It could be possible to write a benchmark in https://github.com/hibernate/hibernate-orm-benchmark by using the code produced by quarkus build while adding quarkus.package.jar.decompiler.enabled=true to the application.properties: I can help designing the benchmark @mbladel wdyt?
This can be useful for your team if you wish.

Sounds good to me, I'll try to prepare the basic structure and see if I can get some significant results (FYI I'm away for a few days, I'll update you when I'm back on my PC).

@mbladel
Copy link
Author

mbladel commented Oct 22, 2024

@franz1981 sorry for the delay, I'm back and I managed to prepare a benchmark that measures the difference with the optimized vs non-optimized (i.e. reflection) cases: hibernate/hibernate-orm-benchmark#13. The code itself is very simple, still I'd really appreciate if you could confirm whether my approach is correct.

@yrodiere you can find my initial findings in terms of performance numbers and flamegraphs in the PR I linked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants