-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix reordering avro nullable types in MergeHiveSchemaWithAvro #140
base: li-0.11.x
Are you sure you want to change the base?
Conversation
cb095b2
to
d58b362
Compare
LGTM, thanks for the PR @rzhang10. |
@@ -138,6 +138,15 @@ public static boolean isOptionSchema(Schema schema) { | |||
return false; | |||
} | |||
|
|||
public static int getNullIndex(Schema schema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nullIndexForOptionSchema
Iceberg does not use get
in method names. nulls can also be present in complex schemas, so ensure that we add the OptionSchema
constraint in the method name itself.
if (AvroSchemaUtil.isOptionSchema(schema)) { | ||
int nullIndex = AvroSchemaUtil.getNullIndex(schema); | ||
if (defaultValue != null && !defaultValue.equals(JsonProperties.NULL_VALUE)) { | ||
return Schema.createUnion(schema.getTypes().get(1 - nullIndex), Schema.create(Type.NULL)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @rzhang10 , I am having a hard time reasoning about these changes. Can you please include a description of how these changes help fix the issue. Can you also update the Java doc of this method with the new changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I understand why this is needed. Previously reorderOptionIfRequired
only worked correctly when schema
was an option schema who first element was null. This was fine because we never returned an option with null as the second element from the children visitors, but now we do. I think the description needs to be updated to reflect the new behavior.
int nullIndex = 0; | ||
if (partner != null && AvroSchemaUtil.isOptionSchema(partner)) { | ||
nullIndex = AvroSchemaUtil.getNullIndex(partner); | ||
} | ||
Schema hivePrimitive = hivePrimitiveToAvro(primitive); | ||
// if there was no matching Avro primitive, use the Hive primitive | ||
Schema result = partner == null ? hivePrimitive : checkCompatibilityAndPromote(hivePrimitive, partner); | ||
return shouldResultBeOptional ? AvroSchemaUtil.toOption(result) : result; | ||
return shouldResultBeOptional ? AvroSchemaUtil.toOption(result, nullIndex == 1) : result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This handling is incomplete, it only handles cases for primitives inside containers (e.g. list), but does not handle similar cases for nested lists, maps, structs, etc. E.g. this case will fail
@Test
public void shouldNotReorderListElementType1() {
String hive = "struct<fa:array<array<int>>>";
Schema avro =
SchemaBuilder.record("r1")
.fields()
.name("fa")
.type()
.nullable()
.array()
.items()
.nullable()
.array()
.items()
.nullable()
.intType()
.arrayDefault(Collections.singletonList(Arrays.asList(1, 2, 3)))
.endRecord();
assertSchema(avro, merge(hive, avro));
}
I think we need similar handling in other places where we use the shouldResultBeOptional
pattern, e.g. in the struct
, list
and map
methods. Might be good to refactor the common pattern into a separate method for reuse.
This patch fixes a bug where the avro nullable type orders (the order of null and the other type's order) is wrongly flipped, resulting in read failure.
The fix is demonstrated by the new unit test
shouldNotReorderListElementType