Simple example of querying a parquet file located in a public bucket on Google Storage (#68)
djfrancesco authored Jul 22, 2022
1 parent 7dcaac3 commit a0935cb
Showing 4 changed files with 97 additions and 1 deletion.
3 changes: 3 additions & 0 deletions Community-Supported/README.md
Expand Up @@ -35,6 +35,9 @@ The community samples focus on individual use cases and are Python-only. They ha

It demonstrates how to implement an incremental refresh based on the Hyper API and the [Hyper Update API](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm). It showcases this using flights data from the [OpenSkyAPI](https://github.com/openskynetwork/opensky-api).

- [__s3-compatible-services__](https://github.com/aetperf/hyper-api-samples/tree/main/Community-Supported/s3-compatible-services)
- Demonstrates how Hyper can natively interact with S3 compatible services, such as Google Storage, without the need to install any external dependencies like `google-cloud-bigquery`.

<br/>
<br/>

Expand Down
2 changes: 1 addition & 1 deletion Community-Supported/native-s3/README.md
Expand Up @@ -69,4 +69,4 @@ Check out these resources to learn more:

- [The EXTERNAL function in the Hyper API SQL Reference](https://help.tableau.com/current/api/hyper_api/en-us/reference/sql/functions-srf.html#FUNCTIONS-SRF-EXTERNAL)

- [AWS command line tools documentation](https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html), e.g. if you want to download some of the sample files to your local machine and explore them
52 changes: 52 additions & 0 deletions Community-Supported/s3-compatible-services/README.md
@@ -0,0 +1,52 @@
# parquet-to-hyper

## __parquet_to_hyper__

![Community Supported](https://img.shields.io/badge/Support%20Level-Community%20Supported-53bd92.svg)

__Current Version__: 1.0

These samples show you how Hyper can natively interact with S3 compatible services, such as Google Storage, without the need to install any external dependencies like `google-cloud-bigquery`.

# Get started

## __Prerequisites__

To run the script, you will need:

- a computer running Windows, macOS, or Linux

- Python 3.9+

- the dependencies listed in the `requirements.txt` file installed
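
The prerequisites above can be set up, for example, as follows (a sketch assuming a POSIX shell; on Windows, activate with `venv\Scripts\activate` instead):

```shell
# Create and activate a virtual environment, then install the dependencies
python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt
```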

## Run the samples

The following instructions assume that you have set up a virtual environment for Python. For more information on
creating virtual environments, see [venv - Creation of virtual environments](https://docs.python.org/3/library/venv.html)
in the Python Standard Library.

1. Open a terminal and activate the Python virtual environment (`venv`).

1. Navigate to the folder where you installed the samples.

1. Follow the steps below to run one of the samples.

**Live query against a `.parquet` file which is stored on Google Storage**

Run the Python script:

```bash
$ python query-parquet-on-gs.py
```

This script performs a live query against the parquet file stored in the public Google Storage bucket `gs://cloud-samples-data/bigquery/us-states/us-states.parquet`.

## __Resources__
Check out these resources to learn more:

- [Hyper API docs](https://help.tableau.com/current/api/hyper_api/en-us/index.html)

- [Tableau Hyper API Reference (Python)](https://help.tableau.com/current/api/hyper_api/en-us/reference/py/index.html)

- [The EXTERNAL function in the Hyper API SQL Reference](https://help.tableau.com/current/api/hyper_api/en-us/reference/sql/functions-srf.html#FUNCTIONS-SRF-EXTERNAL)

41 changes: 41 additions & 0 deletions Community-Supported/s3-compatible-services/query-parquet-on-gs.py
@@ -0,0 +1,41 @@
"""Connect to Google Storage and query a parquet file located in a public bucket.
Adapted from hyper-api-samples/Community-Supported/native-s3/query-csv-on-s3.py
"""

from tableauhyperapi import Connection, HyperProcess, Telemetry, escape_string_literal

BUCKET_NAME = "cloud-samples-data"
FILE_PATH = "bigquery/us-states/us-states.parquet"

states_dataset_gs = escape_string_literal(
    f"s3://{BUCKET_NAME.strip('/')}/{FILE_PATH.strip('/')}"
)

# Hyper process parameters
parameters = {}
# We need to enable S3 connectivity explicitly as it is still an experimental feature
parameters["experimental_external_s3"] = "true"
# Endpoint URL of the S3-compatible service
parameters["external_s3_hostname"] = "storage.googleapis.com"
# We do not need to specify credentials or a bucket location as the GS bucket is
# publicly accessible; this may be different when used with your own data

with HyperProcess(
    telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU,
    parameters=parameters,
) as hyper:
    # Create a connection to the Hyper process - we do not connect to a database
    with Connection(endpoint=hyper.endpoint) as connection:
        # Use the SELECT FROM EXTERNAL(S3_LOCATION()) syntax - this allows us to use
        # the parquet file like a regular table name in SQL queries
        sql_query = f"SELECT COUNT(*) FROM EXTERNAL(S3_LOCATION({states_dataset_gs}))"

        # Execute the query with `execute_scalar_query` as we expect a single number
        count = connection.execute_scalar_query(sql_query)
        print(f"Number of rows: {count}")
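
The URI and literal handling in the script can also be factored into a small reusable helper. A minimal sketch (`s3_location_sql` is a hypothetical helper, not part of the Hyper API; the quoting below mirrors standard SQL string-literal escaping, which is what `escape_string_literal` produces for plain paths):

```python
def s3_location_sql(bucket: str, path: str) -> str:
    """Build an EXTERNAL(S3_LOCATION('s3://...')) expression for a bucket/path pair."""
    uri = f"s3://{bucket.strip('/')}/{path.strip('/')}"
    # Standard SQL string-literal escaping: double any embedded single quotes
    literal = "'" + uri.replace("'", "''") + "'"
    return f"EXTERNAL(S3_LOCATION({literal}))"

print(s3_location_sql("cloud-samples-data", "bigquery/us-states/us-states.parquet"))
# → EXTERNAL(S3_LOCATION('s3://cloud-samples-data/bigquery/us-states/us-states.parquet'))
```

The resulting expression can be dropped into a query string in place of a table name, exactly as the script above does with its inlined literal.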
