Schema resolution slowing down encoding #117
Comments
Can you confirm it's not just the first one that is slow? The next ones should be a lot faster because of the cache.
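One way to check this is to time several consecutive calls and compare the first against the average of the rest. The snippet below is a minimal sketch using only the standard library; `time_calls` and `first_vs_rest` are hypothetical helper names, and the closure passed in is a stand-in for the actual encode call, not part of any library.

```rust
use std::time::{Duration, Instant};

/// Runs `op` `n` times and returns how long each call took.
/// `op` is a stand-in for whatever is being measured, e.g. one encode call.
fn time_calls<F: FnMut()>(n: usize, mut op: F) -> Vec<Duration> {
    (0..n)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect()
}

/// Splits timings into the first call and the mean of the remaining calls.
/// Returns `None` if fewer than two calls were timed.
fn first_vs_rest(timings: &[Duration]) -> Option<(Duration, Duration)> {
    let (first, rest) = timings.split_first()?;
    if rest.is_empty() {
        return None;
    }
    let total: Duration = rest.iter().sum();
    Some((*first, total / rest.len() as u32))
}
```

If only the first call is slow, the first value will be much larger than the mean of the rest; if the two are similar, the cost is paid on every call. For an async call, the same idea applies by taking `Instant::now()` before the call and `elapsed()` after the `.await`.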
Hi, yesterday I read https://medium.com/@saiharshavellanki/building-a-blazing-fast-kafka-ingestion-pipeline-in-rust-with-protobuf-1cdc2f768c5f by @saiharshavellanki. It says:
Could that be the issue?
Interesting, the sync module should also do some caching, but it might happen before the schema is actually parsed.
Yes, I can confirm that the problem persists, and I did not include the first event in the screenshots.
I use the EasyAvroDecoder, so I am already using the async implementation, which the article claims is faster.
@baltendo the article is about the protobuf decoder, not the AvroDecoder.
It's been a while since I wrote the code. I think there could be an additional option to bypass resolving, which could be used if you are sure that the schema you use for producing the data is exactly the same as the schema in the schema registry. Does that sound like something that makes sense and would give additional performance?
Describe the bug
We heavily use Kafka, Avro and the Schema Registry with Java. I now wanted to implement a service in Rust. The service is running fine, but producing a message is very slow, and I found the schema resolution to be the slow part. I read about schema resolution and I wonder why it is called during encoding. As far as I understand, it is needed during decoding, when the reader's schema differs from the one used during encoding.
We are using quite a big schema with many records that are used multiple times, so they become named references after the first definition. Unfortunately, I cannot just attach the schema. It's already mentioned in avro-rs that this path is slow.
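To make "named references" concrete: in Avro, a named record is defined in full once and can afterwards be referred to by name. The sketch below is a simplified, standard-library-only model of that idea. The `Schema` enum, `collect_names`, `count_lookups` and `example_schema` are all hypothetical and are not the avro-rs types or its resolution code. It also assumes the schema is not recursive.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an Avro schema node (not the avro-rs type).
#[derive(Debug, Clone)]
enum Schema {
    Long,
    /// Full definition of a named record.
    Record { name: String, fields: Vec<(String, Schema)> },
    /// Later use of an already defined record, by name only.
    Ref(String),
}

/// Collects every full record definition so references can be looked up by name.
fn collect_names<'a>(schema: &'a Schema, names: &mut HashMap<&'a str, &'a Schema>) {
    if let Schema::Record { name, fields } = schema {
        names.insert(name.as_str(), schema);
        for (_, field_schema) in fields {
            collect_names(field_schema, names);
        }
    }
}

/// Counts how many reference lookups a walk over the schema performs
/// when each `Ref` is followed to its definition.
/// Assumes no record refers to itself, directly or indirectly.
fn count_lookups(schema: &Schema, names: &HashMap<&str, &Schema>) -> usize {
    match schema {
        Schema::Long => 0,
        Schema::Record { fields, .. } => {
            fields.iter().map(|(_, s)| count_lookups(s, names)).sum()
        }
        Schema::Ref(name) => match names.get(name.as_str()) {
            Some(def) => 1 + count_lookups(def, names),
            None => 0, // unknown name: nothing to follow in this sketch
        },
    }
}

/// A parent record whose child is defined once and then referenced twice by name.
fn example_schema() -> Schema {
    let child = Schema::Record {
        name: "Child".to_string(),
        fields: vec![("id".to_string(), Schema::Long)],
    };
    Schema::Record {
        name: "Parent".to_string(),
        fields: vec![
            ("first".to_string(), child),
            ("second".to_string(), Schema::Ref("Child".to_string())),
            ("third".to_string(), Schema::Ref("Child".to_string())),
        ],
    }
}
```

In this model every use of a reference costs a lookup plus a walk of the referenced definition, so the work grows with the number of references rather than with the number of distinct records. Whether the real resolution path behaves the same way is an assumption here, not something this sketch demonstrates.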
To Reproduce
Steps to reproduce the behavior:
Call EasyAvroEncoder.encode_struct() with a schema with many named references.
Here the .resolve() method is called and I don't understand why (see comment):
I tried to write a test. The child struct could be duplicated to get more named references:
Here is a screenshot of the running service from the IDE with some additional logs "Sending" and "Sent" around the EasyAvroEncoder.encode_struct() call plus .await:
Expected behavior
I expect it to be faster. When I remove all the data related to named references (which is possible because I have many nullable fields), it is much faster. The following screenshot shows first the sending of a big event with many named references and then of a small event with no named references: