[FEATURE] Non hive partition discovery and update. #212

Closed
penghuo opened this issue Jan 4, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@penghuo
Collaborator

penghuo commented Jan 4, 2024

Is your feature request related to a problem?

  1. After creating a table, the user must run an MSCK statement to refresh the table's partitions. Whenever a new partition is added, the user must refresh the table's partitions again.
  2. Spark data source tables do not support non-Hive partitions.

What solution would you like?

  1. Users do not need to run MSCK or ALTER to refresh partitions.
  2. Support non-Hive partitions.

What alternatives have you considered?
Not yet.

Do you have any additional context?
n/a

@penghuo
Collaborator Author

penghuo commented Jan 4, 2024

The Glue crawler supports discovering non-Hive partition schemas. It detects /2022/01/01 as partition_0, partition_1, and partition_2, and the user can rename these to year, month, and day. More reading: https://aws.amazon.com/blogs/big-data/catalog-and-analyze-application-load-balancer-logs-more-efficiently-with-aws-glue-custom-classifiers-and-amazon-athena/.
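
To make the positional naming concrete, here is a minimal sketch in plain Python of how a non-Hive path such as /2022/01/01 could be mapped to partition_0, partition_1, and partition_2, and then renamed. The function `positional_partitions` and its `names` parameter are hypothetical illustrations, not part of Glue or Spark.

```python
from typing import Dict, Optional, Sequence


def positional_partitions(
    prefix: str, names: Optional[Sequence[str]] = None
) -> Dict[str, str]:
    """Map each directory level of a non-Hive path to a partition column.

    Hypothetical helper: without `names`, columns are called
    partition_0, partition_1, ... by position; with `names`, the
    caller supplies the column names (for example year, month, day).
    """
    # Drop leading/trailing slashes and ignore empty segments.
    segments = [s for s in prefix.strip("/").split("/") if s]
    if names is None:
        names = [f"partition_{i}" for i in range(len(segments))]
    if len(names) != len(segments):
        raise ValueError("expected one column name per path segment")
    return dict(zip(names, segments))


default_cols = positional_partitions("/2022/01/01")
renamed_cols = positional_partitions("/2022/01/01", ["year", "month", "day"])
```

With this sketch, `default_cols` uses the positional names and `renamed_cols` uses year, month, and day for the same path segments.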

@penghuo
Collaborator Author

penghuo commented Feb 19, 2024

Solutions

  1. Add recursiveFileLookup = true when creating the table. recursiveFileLookup loads files recursively and disables partition inference.
  2. Create a skipping index on the table. Note: users cannot add partition columns to the skipping index if the table does not include partition columns.
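
As a rough sketch of step 1, the Python helper below renders a Spark SQL CREATE TABLE statement that sets recursiveFileLookup in the OPTIONS clause. The helper `create_table_sql`, the table name `alb_logs`, the columns, and the bucket path are all hypothetical placeholders, and passing the option through OPTIONS is an assumption about how the option is supplied at table creation.

```python
from typing import Sequence, Tuple


def create_table_sql(
    table: str,
    columns: Sequence[Tuple[str, str]],
    fmt: str,
    location: str,
) -> str:
    """Render a CREATE TABLE statement with recursiveFileLookup enabled.

    Hypothetical helper for illustration only. No PARTITIONED BY clause
    is emitted, because recursive lookup disables partition inference.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE {table} ({cols}) "
        f"USING {fmt} "
        "OPTIONS (recursiveFileLookup 'true') "
        f"LOCATION '{location}'"
    )


# All names below are placeholders, not taken from the issue.
sql = create_table_sql(
    "alb_logs",
    [("request_ip", "STRING"), ("status", "INT")],
    "json",
    "s3://example-bucket/logs/",
)
```

The resulting string could then be passed to `spark.sql(sql)` in a Spark session. Because the table has no partition columns in this setup, the note in step 2 applies: a skipping index on it cannot include partition columns.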
