AWS Glue Catalog for Iceberg ingest extension #17392
private Catalog setupGlueCatalog() {
  catalog = new GlueCatalog();
  catalogProperties.put(CatalogProperties.WAREHOUSE_LOCATION, warehousePath);
  catalog.initialize(CATALOG_NAME, catalogProperties);
The catalog properties must include these key-value pairs:
"type" : "glue",
"catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
"io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
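The entries listed in the review comment above can be sketched as a plain map plus a check. This is only an illustration using the JDK; the class and method names (`GlueCatalogProps`, `requiredProperties`, `missingKeys`) are hypothetical and not part of this PR, and only the keys and values come from the comment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the PR.
class GlueCatalogProps {
  // Keys and values are copied from the review comment.
  static Map<String, String> requiredProperties() {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("type", "glue");
    props.put("catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog");
    props.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
    return props;
  }

  // Returns the required keys that are absent from, or have a different value in, the supplied map.
  static List<String> missingKeys(Map<String, String> supplied) {
    List<String> missing = new ArrayList<>();
    for (Map.Entry<String, String> e : requiredProperties().entrySet()) {
      if (!e.getValue().equals(supplied.get(e.getKey()))) {
        missing.add(e.getKey());
      }
    }
    return missing;
  }
}
```

A check like this could run before `catalog.initialize(...)` so that a misconfigured spec fails with a clear message instead of a later class-loading error.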
The warehouse path must be of the form s3://bucket/path.
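A rough sketch of how the warehouse path constraint could be validated up front. The helper name `WarehousePathCheck.isS3Warehouse` is hypothetical, and treating a bare bucket with no path as invalid is an assumption based on the `s3://bucket/path` form given above.

```java
import java.net.URI;

// Hypothetical helper, not part of the PR.
class WarehousePathCheck {
  // True when the value looks like s3://bucket/path: "s3" scheme, a bucket, and a non-empty path.
  static boolean isS3Warehouse(String warehousePath) {
    if (warehousePath == null) {
      return false;
    }
    URI uri;
    try {
      uri = URI.create(warehousePath);
    } catch (IllegalArgumentException e) {
      return false; // not parseable as a URI at all
    }
    String path = uri.getPath();
    return "s3".equals(uri.getScheme())
        && uri.getAuthority() != null   // the bucket name
        && path != null
        && path.length() > 1;           // more than just "/"
  }
}
```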
AWS-related environment variables must be available where the Druid cluster is running.
> AWS-related environment variables must be available where the Druid cluster is running.
Could we add more information about this to the docs specific to the Glue catalog?
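As one possible starting point for such docs, here is a sketch of a startup check for the environment. The thread does not say which variables are required, so the names below (`AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) are an assumption based on the standard AWS SDK environment variables; deployments that rely on instance profiles or other credential providers may not need them. The class `AwsEnvCheck` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the PR.
class AwsEnvCheck {
  // Assumed variable names; the PR discussion does not enumerate them.
  static final String[] ASSUMED_VARS = {
      "AWS_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"
  };

  // Returns the assumed variables that are unset or empty in the given environment.
  static List<String> missing(Map<String, String> env) {
    List<String> result = new ArrayList<>();
    for (String name : ASSUMED_VARS) {
      String value = env.get(name);
      if (value == null || value.isEmpty()) {
        result.add(name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Reports against the real process environment of the node running this check.
    System.out.println("Missing AWS variables: " + missing(System.getenv()));
  }
}
```

Taking the environment as a `Map` argument keeps the check testable without touching the real process environment.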
Yes, I will do that. I recently found that the Iceberg API itself offers a simpler way to choose the catalog. I am spending some time checking whether that would make this much more modular and work for all available Iceberg catalogs on the fly.
While testing, I hit an error:
Please let me know if anyone has faced a similar error message. It is related to not being able to find IcebergInputSource from the Iceberg extension as a subtype for the input source.
@shekhar-rajak Thank you for working on this!
Thanks! I found that there was already
After adding it to the existing list, I am able to run it.
I realise the lib folder is not copying the jars from druid-iceberg-extension/lib, which are needed at runtime. When I copied those jars, GlueCatalog was detected and I was able to load the Iceberg table.
We need integration tests for the Glue catalog. That needs a separate discussion and test pipeline.
<version>${iceberg.core.version}</version>
</dependency>
<!-- GlueCatalog class-->
<dependency>
@shekhar-rajak Catalog changes look good to me.
Updated the doc and PR as per the review comments.
</goals>
<configuration>
<failOnWarning>true</failOnWarning>
<!-- ignore annotations for "unused but declared" warnings -->
These dependencies are required at compile time.
We need to ignore these warnings:
Warning: Unused declared dependencies found:
Warning: org.apache.iceberg:iceberg-aws:jar:1.6.1:compile
Warning: software.amazon.awssdk:glue:jar:2.28.28:compile
I don't see any compile-time usages for these dependencies. Can we try setting runtime scope for these dependencies?
Actually, these dependencies are needed at compile time; otherwise the test cases fail with errors like this:
Error: org.apache.druid.iceberg.input.GlueIcebergCatalogTest.testCatalogCreate -- Time elapsed: 0.001 s <<< ERROR!
java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: software/amazon/awssdk/services/sts/model/Tag
at org.apache.iceberg.aws.AwsProperties.toStsTags(AwsProperties.java:412)
at org.apache.iceberg.aws.AwsProperties.<init>(AwsProperties.java:264)
at org.apache.iceberg.aws.AwsClientFactories$DefaultAwsClientFactory.initialize(AwsClientFactories.java:151)
at org.apache.iceberg.aws.AwsClientFactories.loadClientFactory(AwsClientFactories.java:88)
at org.apache.iceberg.aws.AwsClientFactories.from(AwsClientFactories.java:61)
at org.apache.iceberg.aws.glue.GlueCatalog.initialize(GlueCatalog.java:141)
Fixes #17352.
Description
Release note
Key changed/added classes in this PR
GlueIcebergCatalog