[HUDI-7430] Fix empty schema issue for compactor #10718

Closed


linliu-code
Contributor

Change Logs

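The input schema string passed to the compactor's write client (UtilHelpers.createHoodieClient) is empty.
This change tries to get the schema from the schema file or from the instants instead.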
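The fallback order described above can be sketched as follows. This is a hypothetical, stdlib-only illustration and not the PR's getSchema() implementation: the class SchemaFallback, the method resolve, and the suppliers standing in for "read the schema file" and "read the schema from the instants" are all assumptions, and no real Hudi lookup is shown.

```java
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: use the configured schema string if it is non-blank,
 * otherwise try each fallback source in order (e.g. a schema file, then the
 * table's instants). The suppliers are placeholders, not Hudi APIs.
 */
public final class SchemaFallback {

    private SchemaFallback() {}

    /** Returns the first non-blank schema, checking the configured value first. */
    @SafeVarargs
    public static String resolve(String configuredSchema,
                                 Supplier<Optional<String>>... fallbacks) {
        if (configuredSchema != null && !configuredSchema.trim().isEmpty()) {
            return configuredSchema;
        }
        for (Supplier<Optional<String>> fallback : fallbacks) {
            Optional<String> candidate = fallback.get();
            if (candidate.isPresent() && !candidate.get().trim().isEmpty()) {
                return candidate.get();
            }
        }
        throw new IllegalStateException(
            "No schema found in config, schema file, or instants");
    }

    public static void main(String[] args) {
        String fromInstants = "{\"type\":\"record\",\"name\":\"r\",\"fields\":[]}";
        String schema = resolve("",
            Optional::empty,                    // hypothetical: no schema file available
            () -> Optional.of(fromInstants));   // hypothetical: schema read from an instant
        System.out.println(schema);
    }
}
```

Failing loudly when every source is empty, rather than passing "" through, keeps the error close to its cause instead of surfacing later inside the writer.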

Impact

Fixes the empty schema issue for the compactor.

Risk level (write none, low, medium or high below)

None.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

  try (SparkRDDWriteClient client =
-     UtilHelpers.createHoodieClient(jsc, cfg.basePath, "", cfg.parallelism, Option.of(cfg.strategyClassName), props)) {
+     UtilHelpers.createHoodieClient(jsc, cfg.basePath, getSchema(), cfg.parallelism, Option.of(cfg.strategyClassName), props)) {
Contributor


Does this fail with an empty schema string? If so, could you construct a test case that fails without the fix and passes with the fix?

Contributor Author

linliu-code, Feb 22, 2024


@jonvex @yihua, to trigger the failure I probably need to write a functional test, since it requires setting "hoodie.schema.add.field.ids = true". CC @nsivabalan

Contributor Author


@jonvex, do you know how to trigger the failure by setting "hoodie.schema.add.field.ids = true"?

Contributor


As discussed offline, reproducing the failure requires adding a test implementation of HoodieAvroWriteSupport that parses the schema. This is related to the custom write support provided as a Hudi extension in XTable: https://github.com/apache/incubator-xtable/blob/main/hudi-support/extensions/README.md.
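The idea behind such a test can be sketched without Hudi: a write support that parses the schema eagerly rejects "" immediately, so the old call fails and the fixed call passes. The class EagerSchemaCheck and its minimal "looks like a JSON object" check are hypothetical stand-ins; a real test would extend HoodieAvroWriteSupport and parse the schema with Avro, which is not shown here.

```java
/**
 * Hypothetical stand-in for a test write support that parses its schema as
 * soon as it is created. The check below is a placeholder for real Avro
 * schema parsing: it only requires the string to look like a JSON object.
 */
public final class EagerSchemaCheck {

    private EagerSchemaCheck() {}

    /** Placeholder "parse": rejects null, empty, or non-object schema strings. */
    static void parseSchema(String schema) {
        String s = schema == null ? "" : schema.trim();
        if (!(s.startsWith("{") && s.endsWith("}"))) {
            throw new IllegalArgumentException("Cannot parse schema: '" + schema + "'");
        }
    }

    /** Returns true if constructing the writer with this schema would fail. */
    static boolean failsWith(String schema) {
        try {
            parseSchema(schema);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Without the fix, the client is created with "" and the eager parse fails.
        System.out.println("empty schema fails: " + failsWith(""));
        // With the fix, a resolved schema string is passed and parsing succeeds.
        String resolved = "{\"type\":\"record\",\"name\":\"r\",\"fields\":[]}";
        System.out.println("resolved schema fails: " + failsWith(resolved));
    }
}
```

A lenient writer that never parses the schema would accept "" silently, which is why the default path does not reproduce the failure.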

Contributor

nsivabalan left a comment


Soft approval from me. Once you address Ethan's feedback, we are good to land.

linliu-code force-pushed the HUDI-7430_fix_empty_schema branch 2 times, most recently from 892125e to 82ab336 on February 24, 2024 00:30
github-actions bot added the size:M label (PR with lines of changes in (100, 300]) on Feb 26, 2024

linliu-code
Contributor Author

@yihua, @jonvex, "hoodie.schema.add.field.ids" does not seem to be a valid config. Do you know whether we should keep this PR?

nsivabalan
Contributor

We can close it then.

nsivabalan closed this Mar 8, 2024
Labels
size:M PR with lines of changes in (100, 300]
4 participants