Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AvroReader: Add union resolving for nested struct arrays #12686

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

JonasDev1
Copy link

@JonasDev1 JonasDev1 commented Sep 30, 2024

Which issue does this PR close?

Resolves #12682

What changes are included in this PR?

Nullable values are passed as unions and this requires a resolving of the union to have the correct null_buffer size.

Are these changes tested?

Added a new test for this issue

@github-actions github-actions bot added the core Core DataFusion crate label Sep 30, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution @JonasDev1 -- I started the CI and I left a comment.

@@ -1643,6 +1643,84 @@ mod test {
assert_batches_eq!(expected, &[batch]);
}

#[test]
fn test_avro_nullable_struct_array() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this test without the changes in this PR and it still passes 🤔 Thus I don't think it covers the code change

andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ git diff
diff --git a/datafusion/core/src/datasource/avro_to_arrow/arrow_array_reader.rs b/datafusion/core/src/datasource/avro_to_arrow/arrow_array_reader.rs
index 24bc4a61c..8735eac5f 100644
--- a/datafusion/core/src/datasource/avro_to_arrow/arrow_array_reader.rs
+++ b/datafusion/core/src/datasource/avro_to_arrow/arrow_array_reader.rs
@@ -573,7 +573,7 @@ impl<'a, R: Read> AvroArrowArrayReader<'a, R> {
                 // extract list values, with non-lists converted to Value::Null
                 let array_item_count = rows
                     .iter()
-                    .map(|row| match maybe_resolve_union(row) {
+                    .map(|row| match row {
                         Value::Array(values) => values.len(),
                         _ => 1,
                     })

And then I ran

andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ cargo test --lib  -p datafusion --all-features -- avro_to_arrow
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.12s
     Running unittests src/lib.rs (target/debug/deps/datafusion-3bc79b97e7a0ad1b)

running 16 tests
test datasource::avro_to_arrow::schema::test::test_invalid_avro_schema ... ok
test datasource::avro_to_arrow::schema::test::test_non_record_schema ... ok
test datasource::avro_to_arrow::schema::test::test_alias ... ok
test datasource::avro_to_arrow::schema::test::test_plain_types_schema ... ok
test datasource::avro_to_arrow::schema::test::test_external_props ... ok
test datasource::avro_to_arrow::schema::test::test_nested_schema ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_avro_read_nested_list ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_time_avro_milliseconds ... ok
test datasource::avro_to_arrow::reader::tests::test_avro_basic ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_avro_read_list ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_avro_iterator ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_avro_nullable_struct ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_avro_nullable_struct_array ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_complex_list ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_deep_nullable_struct ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_complex_struct ... ok

test result: ok. 16 passed; 0 failed; 0 ignored; 0 measured; 729 filtered out; finished in 0.00s

andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comment. Yes the test rather checks the happy path. I will add a test for the index error tomorrow and update the formating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Index out of bounds error durring read of an Avro file
2 participants