From 0def8c2cf1057a9e65903f769210365a6c95d231 Mon Sep 17 00:00:00 2001
From: Jacek Laskowski
Date: Sat, 7 Sep 2024 20:42:35 +0200
Subject: [PATCH] [SPARK] GcsVendedTokenProvider

---
 .../GcsVendedTokenProvider.md | 28 +++++++++++++++++++
 docs/spark-integration/UCSingleCatalog.md | 7 +++++
 2 files changed, 35 insertions(+)
 create mode 100644 docs/spark-integration/GcsVendedTokenProvider.md

diff --git a/docs/spark-integration/GcsVendedTokenProvider.md b/docs/spark-integration/GcsVendedTokenProvider.md
new file mode 100644
index 0000000..101d6cd
--- /dev/null
+++ b/docs/spark-integration/GcsVendedTokenProvider.md
@@ -0,0 +1,28 @@
+# GcsVendedTokenProvider
+
+`GcsVendedTokenProvider` is an `AccessTokenProvider` to provide access tokens for [UCSingleCatalog](UCSingleCatalog.md) to [load tables](UCSingleCatalog.md#loadTable) from [Google Cloud Storage](https://cloud.google.com/storage).
+
+`GcsVendedTokenProvider` can be configured in a Spark application using the following configuration:
+
+```text
+spark.hadoop.fs.AbstractFileSystem.gs.impl com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS
+spark.hadoop.fs.gs.auth.access.token.provider io.unitycatalog.connectors.spark.GcsVendedTokenProvider
+spark.hadoop.fs.gs.auth.type ACCESS_TOKEN_PROVIDER
+```
+
+## getAccessToken { #getAccessToken }
+
+??? note "AccessTokenProvider"
+
+    ```java
+    AccessToken getAccessToken()
+    ```
+
+    `getAccessToken` is part of the `AccessTokenProvider` abstraction.
+
+`getAccessToken` creates a new `AccessToken` based on the following Hadoop Configuration properties:
+
+Property | Hadoop Property
+-|-
+ `token` | `fs.gs.auth.access.token.credential`
+ `expirationTime` | `fs.gs.auth.access.token.expiration`
diff --git a/docs/spark-integration/UCSingleCatalog.md b/docs/spark-integration/UCSingleCatalog.md
index d208fcf..1a439d7 100644
--- a/docs/spark-integration/UCSingleCatalog.md
+++ b/docs/spark-integration/UCSingleCatalog.md
@@ -2,6 +2,13 @@
 
 `UCSingleCatalog` is a `TableCatalog` ([Spark SQL]({{ book.spark_sql }}/connector/catalog/TableCatalog/)).
 
+`UCSingleCatalog` supports [loading tables](#loadTable) from the following cloud object storages:
+
+Cloud Object Storage | Scheme
+-|-
+ [Amazon S3](https://aws.amazon.com/s3/) | `s3://`
+ [Google Cloud Storage](https://cloud.google.com/storage) | `gs://`
+
 ## DeltaCatalog { #deltaCatalog }
 
 `UCSingleCatalog` creates a `DeltaCatalog` ([Delta Lake]({{ book.delta }}/DeltaCatalog)) when requested to [initialize](#initialize).
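
Note on the patch: the new page configures the provider through `spark.hadoop.*` properties but documents `getAccessToken` in terms of plain `fs.gs.*` Hadoop properties. The link between the two is that Spark copies every `spark.hadoop.`-prefixed setting into the Hadoop configuration with the prefix removed. The Python sketch below models that relationship with plain dictionaries. It is an illustration under stated assumptions, not the connector's code: `AccessToken`, `to_hadoop_conf` and `get_access_token` are hypothetical names, and treating the expiration value as epoch milliseconds is an assumption the patch does not confirm.

```python
from typing import Dict, NamedTuple

SPARK_HADOOP_PREFIX = "spark.hadoop."

# The three settings listed in GcsVendedTokenProvider.md.
GCS_VENDED_TOKEN_CONF: Dict[str, str] = {
    "spark.hadoop.fs.AbstractFileSystem.gs.impl":
        "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    "spark.hadoop.fs.gs.auth.access.token.provider":
        "io.unitycatalog.connectors.spark.GcsVendedTokenProvider",
    "spark.hadoop.fs.gs.auth.type": "ACCESS_TOKEN_PROVIDER",
}


class AccessToken(NamedTuple):
    """Hypothetical stand-in for the AccessToken the provider returns."""
    token: str
    expiration_time: int  # ASSUMPTION: epoch milliseconds


def to_hadoop_conf(spark_conf: Dict[str, str]) -> Dict[str, str]:
    """Keep only spark.hadoop.* entries and strip the prefix."""
    return {
        key[len(SPARK_HADOOP_PREFIX):]: value
        for key, value in spark_conf.items()
        if key.startswith(SPARK_HADOOP_PREFIX)
    }


def get_access_token(hadoop_conf: Dict[str, str]) -> AccessToken:
    """Build a token from the two properties in the patch's table."""
    return AccessToken(
        token=hadoop_conf["fs.gs.auth.access.token.credential"],
        expiration_time=int(hadoop_conf["fs.gs.auth.access.token.expiration"]),
    )
```

The sketch leaves open how the credential and expiration properties reach the Hadoop configuration, since the patch does not describe that step.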