Simple example of querying a parquet file located in a public bucket on Google Storage (#68)
djfrancesco authored Jul 22, 2022
1 parent 7dcaac3 commit a0935cb
Showing 4 changed files with 97 additions and 1 deletion.
3 changes: 3 additions & 0 deletions Community-Supported/README.md
Expand Up @@ -35,6 +35,9 @@ The community samples focus on individual use cases and are Python-only. They ha

It demonstrates how to implement an incremental refresh based on the Hyper API and the [Hyper Update API](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm). It showcases this using flights data from the [OpenSkyAPI](https://github.com/openskynetwork/opensky-api).

- [__s3-compatible-services__](https://github.com/aetperf/hyper-api-samples/tree/main/Community-Supported/s3-compatible-services)
- Demonstrates how Hyper can natively interact with S3 compatible services, such as Google Storage, without the need to install any external dependencies like `google-cloud-bigquery`.

<br/>
<br/>

Expand Down
2 changes: 1 addition & 1 deletion Community-Supported/native-s3/README.md
Expand Up @@ -69,4 +69,4 @@ Check out these resources to learn more:

- [The EXTERNAL function in the Hyper API SQL Reference](https://help.tableau.com/current/api/hyper_api/en-us/reference/sql/functions-srf.html#FUNCTIONS-SRF-EXTERNAL)

- [AWS command line tools documentation](https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html), e.g. if you want to download some of the sample files to your local machine and explore them
52 changes: 52 additions & 0 deletions Community-Supported/s3-compatible-services/README.md
@@ -0,0 +1,52 @@
# parquet-to-hyper

## __parquet_to_hyper__

![Community Supported](https://img.shields.io/badge/Support%20Level-Community%20Supported-53bd92.svg)

__Current Version__: 1.0

These samples show you how Hyper can natively interact with S3 compatible services, such as Google Storage, without the need to install any external dependencies like `google-cloud-bigquery`.

# Get started

## __Prerequisites__

To run the script, you will need:

- a computer running Windows, macOS, or Linux

- Python 3.9+

- the dependencies listed in the `requirements.txt` file installed
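
The prerequisites above can be set up, for example, as follows (a sketch assuming a POSIX shell; on Windows, activate with `venv\Scripts\activate` instead):

```shell
# Create and activate a virtual environment, then install the dependencies
python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt
```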

## Run the samples

The following instructions assume that you have set up a virtual environment for Python. For more information on
creating virtual environments, see [venv - Creation of virtual environments](https://docs.python.org/3/library/venv.html)
in the Python Standard Library.

1. Open a terminal and activate the Python virtual environment (`venv`).

1. Navigate to the folder where you installed the samples.

1. Follow the steps below to run one of the samples.

**Live query against a `.parquet` file which is stored on Google Storage**

Run the Python script:

```bash
$ python query-parquet-on-gs.py
```

This script performs a live query against the parquet file stored in the public Google Storage bucket `gs://cloud-samples-data/bigquery/us-states/us-states.parquet`.

## __Resources__
Check out these resources to learn more:

- [Hyper API docs](https://help.tableau.com/current/api/hyper_api/en-us/index.html)

- [Tableau Hyper API Reference (Python)](https://help.tableau.com/current/api/hyper_api/en-us/reference/py/index.html)

- [The EXTERNAL function in the Hyper API SQL Reference](https://help.tableau.com/current/api/hyper_api/en-us/reference/sql/functions-srf.html#FUNCTIONS-SRF-EXTERNAL)

41 changes: 41 additions & 0 deletions Community-Supported/s3-compatible-services/query-parquet-on-gs.py
@@ -0,0 +1,41 @@
"""Connect to Google Storage and query a parquet file located in a public bucket.
Adapted from hyper-api-samples/Community-Supported/native-s3/query-csv-on-s3.py
"""

from tableauhyperapi import Connection, HyperProcess, Telemetry, escape_string_literal

BUCKET_NAME = "cloud-samples-data"
FILE_PATH = "bigquery/us-states/us-states.parquet"

states_dataset_gs = escape_string_literal(
    f"s3://{BUCKET_NAME.strip('/')}/{FILE_PATH.strip('/')}"
)

# Hyper process parameters
parameters = {}
# We need to enable S3 connectivity explicitly as it is still an experimental feature
parameters["experimental_external_s3"] = "true"
# Endpoint URL of the S3-compatible service
parameters["external_s3_hostname"] = "storage.googleapis.com"
# We do not need to specify credentials or a bucket location as the GS bucket is
# publicly accessible; this may be different when used with your own data

with HyperProcess(
    telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU,
    parameters=parameters,
) as hyper:
    # Create a connection to the Hyper process - we do not connect to a database
    with Connection(endpoint=hyper.endpoint) as connection:
        # Use the SELECT FROM EXTERNAL(S3_LOCATION()) syntax - this allows us to use
        # the parquet file like a regular table name in SQL queries
        sql_query = f"SELECT COUNT(*) FROM EXTERNAL(S3_LOCATION({states_dataset_gs}))"

        # Execute the query with `execute_scalar_query` as we expect a single number
        count = connection.execute_scalar_query(sql_query)
        print(f"Number of rows: {count}")
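
The URI and literal handling in the script can also be factored into a small reusable helper. A minimal sketch (`s3_location_sql` is a hypothetical helper, not part of the Hyper API; the quoting below mirrors standard SQL string-literal escaping, which is what `escape_string_literal` produces for plain paths):

```python
def s3_location_sql(bucket: str, path: str) -> str:
    """Build an EXTERNAL(S3_LOCATION('s3://...')) expression for a bucket/path pair."""
    uri = f"s3://{bucket.strip('/')}/{path.strip('/')}"
    # Standard SQL string-literal escaping: double any embedded single quotes
    literal = "'" + uri.replace("'", "''") + "'"
    return f"EXTERNAL(S3_LOCATION({literal}))"

print(s3_location_sql("cloud-samples-data", "bigquery/us-states/us-states.parquet"))
# → EXTERNAL(S3_LOCATION('s3://cloud-samples-data/bigquery/us-states/us-states.parquet'))
```

The resulting expression can be dropped into a query string in place of a table name, exactly as the script above does with its inlined literal.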
