AWS Glue Catalog for Iceberg ingest extension #17392

Open
wants to merge 26 commits into
base: master


@Shekharrajak Shekharrajak commented Oct 22, 2024

Fixes #17352.

Description

Release note


Key changed/added classes in this PR
  • GlueIcebergCatalog

private Catalog setupGlueCatalog() {
catalog = new GlueCatalog();
catalogProperties.put(CatalogProperties.WAREHOUSE_LOCATION, warehousePath);
catalog.initialize(CATALOG_NAME, catalogProperties);
Author

The catalog properties must include these key-value pairs:

"type": "glue",
"catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
"io-impl": "org.apache.iceberg.aws.s3.S3FileIO",

Author

The warehouse path must be of the form s3://bucket/path.
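
The two requirements above (the required catalog property keys and the S3 warehouse path) could be checked up front with something like the following. This is only a sketch: `GlueCatalogPropertiesSketch`, `missingKeys`, and `isS3WarehousePath` are hypothetical names that are not part of this PR, and the path check only looks at the scheme and bucket.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the PR: collects the properties named in the
// review comments and checks them before a catalog is initialized.
final class GlueCatalogPropertiesSketch {
  // Keys copied from the review comment above.
  static final List<String> REQUIRED_KEYS = List.of("type", "catalog-impl", "io-impl");

  // Builds a map with the key-value pairs listed in the review comment.
  static Map<String, String> buildProperties() {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("type", "glue");
    props.put("catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog");
    props.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
    return props;
  }

  // Returns the required keys that are absent or blank, in declaration order.
  static List<String> missingKeys(Map<String, String> props) {
    List<String> missing = new ArrayList<>();
    for (String key : REQUIRED_KEYS) {
      String value = props.get(key);
      if (value == null || value.trim().isEmpty()) {
        missing.add(key);
      }
    }
    return missing;
  }

  // Accepts s3://bucket or s3://bucket/any/path; rejects other schemes and local paths.
  static boolean isS3WarehousePath(String path) {
    return path != null && path.matches("s3://[^/]+(/.*)?");
  }

  public static void main(String[] args) {
    Map<String, String> props = buildProperties();
    System.out.println("Missing keys: " + missingKeys(props));
    System.out.println("Valid warehouse path: " + isS3WarehousePath("s3://bucket/path"));
  }
}
```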

Author

AWS-related environment variables must be available wherever the Druid cluster is running.
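
The comment does not say which variables are meant. As an assumption, the sketch below checks the names that the AWS SDK for Java v2 reads from the environment by default (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`). The class `AwsEnvCheckSketch` is hypothetical and not part of this PR. Credentials can also come from other sources, such as profile files or instance roles, so a missing variable is a hint rather than a definite error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check, not part of the PR.
final class AwsEnvCheckSketch {
  // Assumption: these are the variables in question; the review thread does not list them.
  static final List<String> EXPECTED =
      List.of("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION");

  // Returns the expected variables that are unset or empty in the given environment.
  static List<String> missing(Map<String, String> env) {
    List<String> result = new ArrayList<>();
    for (String name : EXPECTED) {
      String value = env.get(name);
      if (value == null || value.isEmpty()) {
        result.add(name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> missing = missing(System.getenv());
    if (missing.isEmpty()) {
      System.out.println("All expected AWS variables are set.");
    } else {
      System.out.println("Not set in this environment: " + missing);
    }
  }
}
```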

Contributor

> AWS-related environment variables must be available wherever the Druid cluster is running.

Could we add more information related to this in the docs specific to the glue catalog?

Author

Yes, I will do that. I recently found that the Iceberg API itself offers a simpler way to choose the catalog. I am spending some time checking whether that would make this much more modular and let it work, on the fly, with every catalog that Iceberg supports.

@Shekharrajak
Author

While testing, I hit this error:

Invalid value for the field [inputSource]. Reason: [Please make sure to load all the necessary extensions and jars with type 'iceberg' on 'druid/broker' service. Could not resolve type id 'iceberg' as a subtype of `org.apache.druid.data.input.InputSource` known type ids = [combining, hdfs, http, inline, local, nil, sql] at [Source: (String)"{"type":"iceberg","tableName": "

Please let me know if anyone has faced a similar error. It appears to be caused by IcebergInputSource from the Iceberg extension not being found as a subtype of InputSource.

@a2l007
Contributor

a2l007 commented Oct 23, 2024

@shekhar-rajak Thank you for working on this!
Please add the extension to the broker load list, which should fix the error described.

@Shekharrajak
Author

> Please add the extension to the broker load list, which should fix the error described.

Thanks! I found that common.runtime.properties already contained a druid.extensions.loadList entry, and it was overriding the line I had added:

druid.extensions.loadList=["druid-iceberg-extensions"]

After adding the extension to the existing list, I am able to run it.
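
The override can be reproduced with plain `java.util.Properties`, where a key that appears twice keeps only the value that is read last. The sketch below assumes the runtime properties file is parsed with those semantics. `LoadListOverrideSketch` is a hypothetical name, and `druid-hdfs-storage` stands in for whatever the existing list actually contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo, not part of the PR: shows that a duplicated key does not merge.
final class LoadListOverrideSketch {
  // Parses properties text and returns the value that ends up under druid.extensions.loadList.
  static String effectiveLoadList(String fileContents) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(fileContents));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props.getProperty("druid.extensions.loadList");
  }

  public static void main(String[] args) {
    // Two separate entries: only the later one survives.
    String duplicated =
        "druid.extensions.loadList=[\"druid-iceberg-extensions\"]\n"
            + "druid.extensions.loadList=[\"druid-hdfs-storage\"]\n";
    System.out.println("Duplicated key: " + effectiveLoadList(duplicated));

    // One entry listing both extensions (placeholder names): both are kept.
    String merged =
        "druid.extensions.loadList=[\"druid-hdfs-storage\", \"druid-iceberg-extensions\"]\n";
    System.out.println("Single merged key: " + effectiveLoadList(merged));
  }
}
```

Keeping a single `druid.extensions.loadList` entry that names every extension avoids the problem regardless of the order of lines in the file.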

@Shekharrajak
Author

I realised that the lib folder was not getting the jars from druid-iceberg-extension/lib, which are needed at runtime. Once I copied those jars, GlueCatalog was detected and I was able to load the Iceberg table.

@Shekharrajak
Author

Shekharrajak commented Oct 25, 2024

We need integration tests for the Glue catalog. That needs a separate discussion and a test pipeline.

<version>${iceberg.core.version}</version>
</dependency>
<!-- GlueCatalog class-->
<dependency>
Author

@a2l007
Contributor

a2l007 commented Oct 29, 2024

@shekhar-rajak Catalog changes look good to me.
Do you mind adding some docs in https://github.com/apache/druid/blob/master/docs/ingestion/input-sources.md as well? Also please review and fix the CI failures.

@Shekharrajak
Author

Updated the docs and the PR as per the review comments.

</goals>
<configuration>
<failOnWarning>true</failOnWarning>
<!-- ignore annotations for "unused but declared" warnings -->
Author

These dependencies are required at compile time.

We need to ignore these warnings:

Warning:  Unused declared dependencies found:
Warning:     org.apache.iceberg:iceberg-aws:jar:1.6.1:compile
Warning:     software.amazon.awssdk:glue:jar:2.28.28:compile
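
One way to silence only these two warnings while keeping failOnWarning enabled is the maven-dependency-plugin `ignoredUnusedDeclaredDependencies` setting. This is a sketch, assuming the check is run by maven-dependency-plugin's analyze goal (as the failOnWarning snippet under review suggests). The coordinates are taken from the warnings above, and the exact configuration used in this PR may differ.

```xml
<configuration>
  <failOnWarning>true</failOnWarning>
  <ignoredUnusedDeclaredDependencies>
    <!-- Reported as unused but declared; per the author they are still needed on the classpath -->
    <ignoredUnusedDeclaredDependency>org.apache.iceberg:iceberg-aws</ignoredUnusedDeclaredDependency>
    <ignoredUnusedDeclaredDependency>software.amazon.awssdk:glue</ignoredUnusedDeclaredDependency>
  </ignoredUnusedDeclaredDependencies>
</configuration>
```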

Contributor

I don't see any compile time usages for these dependencies. Can we try setting runtime scope for these dependencies?

Author

Actually, these dependencies are needed at compile time; otherwise the test cases fail with errors:

Error:  org.apache.druid.iceberg.input.GlueIcebergCatalogTest.testCatalogCreate -- Time elapsed: 0.001 s <<< ERROR!
java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: software/amazon/awssdk/services/sts/model/Tag
	at org.apache.iceberg.aws.AwsProperties.toStsTags(AwsProperties.java:412)
	at org.apache.iceberg.aws.AwsProperties.<init>(AwsProperties.java:264)
	at org.apache.iceberg.aws.AwsClientFactories$DefaultAwsClientFactory.initialize(AwsClientFactories.java:151)
	at org.apache.iceberg.aws.AwsClientFactories.loadClientFactory(AwsClientFactories.java:88)
	at org.apache.iceberg.aws.AwsClientFactories.from(AwsClientFactories.java:61)
	at org.apache.iceberg.aws.glue.GlueCatalog.initialize(GlueCatalog.java:141)
Development

Successfully merging this pull request may close these issues.

AWS Glue Catalog for Iceberg ingest extension
3 participants