-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fastserde] Logical types are now supported for avro version 1.9+ #516
[fastserde] Logical types are now supported for avro version 1.9+ #516
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #516 +/- ##
============================================
- Coverage 45.83% 45.77% -0.06%
+ Complexity 4440 4438 -2
============================================
Files 398 403 +5
Lines 28040 28070 +30
Branches 4622 4622
============================================
- Hits 12851 12850 -1
- Misses 13634 13663 +29
- Partials 1555 1557 +2
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the big code drop! The commit breakdown definitely helps navigate the change more easily. It LGTM overall. I left some minor comments and questions.
...de-tests-common/src/test/java/com/linkedin/avro/fastserde/logical/types/InMemoryEncoder.java
Show resolved
Hide resolved
...ommon/src/test/java/com/linkedin/avro/fastserde/logical/types/LogicalTypesFastSerdeTest.java
Show resolved
Hide resolved
} | ||
|
||
protected <T extends GenericData> void fixConversionsIfAvro19(T modelData) { | ||
if (AvroCompatibilityHelperCommon.getRuntimeAvroVersion() == AvroVersion.AVRO_1_9) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this issue specific to 1.9 or should the code be run in later versions as well? Should this condition be AvroCompatibilityHelperCommon.getRuntimeAvroVersion().laterThan(AvroVersion.AVRO_1_8)
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I revisited this method and new TimeConversions.TimeMillisConversion()
is not needed (will be removed in fix-commit).
Other 3 conversions
new Conversions.DecimalConversion()
new TimeConversions.TimeMicrosConversion()
new TimeConversions.TimestampMicrosConversion()
must be manually added because they're missing. They're actually "lost" when downgrading avro from 1.10.2 to 1.10.1:
- krisso-rtb/avro-logical-types-generated-classes@e271f7d
but in avro-util/fastserde we observe the difference between1.10.2
and1.9.2
.
fastserde/avro-fastserde/src/main/java/com/linkedin/avro/fastserde/FastSerdeCache.java
Show resolved
Hide resolved
...erde/avro-fastserde/src/main/java/com/linkedin/avro/fastserde/FastDeserializerGenerator.java
Show resolved
Hide resolved
@@ -444,6 +464,61 @@ public JClass findStringClass(Schema schema) { | |||
return outputClass; | |||
} | |||
|
|||
boolean logicalTypeEnabled(Schema schema) { | |||
// return modelData != null && schema != null && Utils.isLogicalTypeSupported() && schema.getLogicalType() != null; | |||
// TODO uuid in 1.10 no supported ?? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, weird... I see UUIDs mentioned in the spec as far back as 1.9... but maybe in the actual implementation it's not properly supported until 1.11? IDK... this is the kind of minute detail @radai-rosenblatt would know :D ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, you reminded me something.
I did an experiment to see how generated Avro classes change when avro version is downgraded:
- https://github.com/krisso-rtb/avro-logical-types-generated-classes/commits/master
This particular case withUUID
is covered by commitAvro version downgraded from 1.11.1 to 1.11.0
- krisso-rtb/avro-logical-types-generated-classes@81e0be0
Basically we have:
private java.util.UUID uuidField; // avro 1.11.1
private java.lang.String uuidField; // avro 1.11.0
from the same .avsc
file.
(just take a look at diff for FastSerdeLogicalTypesTest1.java
)
...lization/AVRO_1_10/FastSerdeLogicalTypesDefined_GenericDeserializer_229156053_229156053.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for simplifying the code in the latest commit.
I have a high-level question about the strategy for this work...
Given that which supported logical type varies by Avro version (even by patch versions, unfortunately...), I wonder what is the ideal way to approach this?
I believe the approach you took in this PR is to take all of the logical types supported in the latest version of Avro, and if the runtime is Avro 1.9 or later, then use all of these logical types (I assume this is the case because of the missingConversions
that you are registering into the modelData
...). First of all, is this the right understanding of the current implementation? If it is, then I have some concerns, namely:
- If using a SpecificRecord class that was generated in a version of Avro older than the latest, which is missing some of the logical types, then would we not have a class cast exception when trying to put an object of the latest logical type into a field that was generated to not have such type?
- If using a GenericRecord, in theory we can put anything in any fields since it's just an array of Objects internally... however, during the period of time when fast-avro is "warming up" (i.e. before the serde compilation finishes), then we'll be falling back on vanilla avro... so does that mean the user will see older types for a brief period of time and later on see the latest logical types?
Both of the above behaviors seem wrong... so I wonder if we should instead do some runtime introspection to figure out what the right type to use is, at the time of the serde code gen, and then use the right type accordingly in the generated code. For a SpecificRecord
serde, the right type could be figured out by using reflection to introspect the fields of the SpecificRecord
class. For GenericRecord
, perhaps we could synthetically generate a payload based on the schema, deserialize it with vanilla Avro, and introspect what type each field is...
It is troublesome, but I think it would avoid the issues mentioned above.
Alternatively, we could say that fast-avro only supports logical types on the most recent of all Avro versions, and for that version it supports all the types... This would be roughly equivalent to the current PR, but we would just bump the minimum Avro version to activate the logical type behavior... Easier to implement, though technically still incorrect in terms of Avro 1.9 and 1.10, but at least it's easier to explain to users what to expect.
Maybe I'm twisting myself into knots over this and it need not be this complicated... IDK. What do you think?
...de-tests-common/src/test/java/com/linkedin/avro/fastserde/logical/types/InMemoryEncoder.java
Show resolved
Hide resolved
so they use the same conversion classes. It's implicitly tested but just in case I will explicitly add these classes to
The hacks/tricks in unit tests, represented by
|
62d55fe
to
07ec099
Compare
Tests fail e.g. due to: java.lang.ClassCastException: class java.time.LocalTime cannot be cast to class java.lang.Integer
07ec099
to
995dd6d
Compare
I pushed some updates:
We ran quite a few benchmarks to see how fast-avro works with logical types. If we could somehow help with this PR please let us know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for continuing to push on this. I discussed it offline with @radai-rosenblatt, and we have a minor concern, but I don't think it should be a blocker given that this new capability is opt-in.
The concern is that there is maybe too much trust that the users will do the right thing. In our internal ecosystem, we usually cannot make assumptions about which version of Avro a SR was generated from and need to be able to deal with all of them. Furthermore, some SR may be embedded in libraries and mixed with other libraries or main application code and each of these sources of code may have been touched by different versions of Avro. We think there may be ways to make the work begun here even more robust by introspecting types at serde code gen time, rather than relying on the user to pass in the right mappings in their modelData
... but I don't think that should block this from being merged, since as I said, it is opt-in.
It is great to see support for logical types finally added 😄 🎉 ...
The goal of this PR is to allow
fastserde
(de-)serializing records with logical-types.I realize it's a quite big change so we did our best to verify it works fine... and no new bugs are added :)
Commits guide:
Adding unit tests to verify LogicalTypes support.
- TDD approach, and failing tests are marked with@Ignored
minor code cleanup
- self explanatoryGenerated classes - BEFORE
- temporarily adding all generated classes before the main changesmodelData (GenericData or SpecificData instance) passed to FastSerdeBase
- should be part of the main commit (5.) however it's extracted as separate commit to make the main commit cleaner[fast-avro] Logical types are now supported by fast-avro in version 1.9+
- the heart of this PR, tests from (1.) are unignored.Generated fast-avro classes AFTER (logical-types related).
- temporary commit to see how main commit affected generated classesGenerated fast-avro classes AFTER (side effects of supporting logical-types).
- same as above, could be considered as "side effects"Generated classes moved back to .gitignore
- drops changes from all temporary commits (3, 6 and 7)Due to limitations of current Gradle's config it's impossible to generate Avro classes with custom logicalTypes. To verify this PR works fine in that case as well I created Unit Test in external (dedicated) project:
which uses dependency built locally from this PR's branch: