Commit a0935cb
Simple example of querying a parquet file located in a public bucket on Google Storage (#68)
1 parent: 7dcaac3
Showing 4 changed files with 97 additions and 1 deletion.
@@ -0,0 +1,52 @@
# parquet-to-hyper

## __parquet_to_hyper__

![Community Supported](https://img.shields.io/badge/Support%20Level-Community%20Supported-53bd92.svg)

__Current Version__: 1.0

These samples show you how Hyper can natively interact with S3-compatible services, such as Google Storage, without the need to install any external dependencies like `google-cloud-bigquery`.
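As a minimal sketch (not the full sample script, which appears further down in this commit), the only thing needed to redirect `s3://` URLs to an S3-compatible endpoint is a pair of Hyper process parameters:

```python
# Hyper process parameters that point "s3://" URLs at an S3-compatible
# service - here, Google Storage. Hyper speaks the S3 protocol natively,
# so no Google client library is required.
parameters = {
    "experimental_external_s3": "true",  # S3 connectivity is still experimental
    "external_s3_hostname": "storage.googleapis.com",  # S3-compatible endpoint
}
print(parameters)
```

The sample script passes exactly these parameters to `HyperProcess`.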
# Get started

## __Prerequisites__

To run the script, you will need:

- a computer running Windows, macOS, or Linux
- Python 3.9+
- the dependencies from the `requirements.txt` file installed
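A typical setup on macOS or Linux might look like the following (a sketch; on Windows, activate with `venv\Scripts\activate` instead):

```shell
# Create and activate a virtual environment, then install the dependencies.
python3 -m venv venv
. venv/bin/activate
# Install the sample's dependencies if requirements.txt is present
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
```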
## Run the samples

The following instructions assume that you have set up a virtual environment for Python. For more information on creating virtual environments, see [venv - Creation of virtual environments](https://docs.python.org/3/library/venv.html) in the Python Standard Library.

1. Open a terminal and activate the Python virtual environment (`venv`).
1. Navigate to the folder where you installed the samples.
1. Follow the steps below to run one of the samples.
**Live query against a `.parquet` file which is stored on Google Storage**

Run the Python script:

```bash
$ python query-parquet-on-gs.py
```

This script performs a live query on the Parquet file stored in this public Google Storage bucket: `gs://cloud-samples-data/bigquery/us-states/us-states.parquet`.
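Under the hood, the script builds its SQL statement from the bucket name and file path. A stdlib-only sketch of the statement it ends up executing (the real script quotes the URL with `tableauhyperapi.escape_string_literal`; plain single quotes stand in for it here):

```python
BUCKET_NAME = "cloud-samples-data"
FILE_PATH = "bigquery/us-states/us-states.parquet"

# Plain single quotes are used here only to show the resulting SQL text;
# the sample itself uses tableauhyperapi.escape_string_literal.
location = f"'s3://{BUCKET_NAME.strip('/')}/{FILE_PATH.strip('/')}'"
sql_query = f"SELECT COUNT(*) FROM EXTERNAL(S3_LOCATION({location}))"
print(sql_query)
```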
## __Resources__

Check out these resources to learn more:

- [Hyper API docs](https://help.tableau.com/current/api/hyper_api/en-us/index.html)
- [Tableau Hyper API Reference (Python)](https://help.tableau.com/current/api/hyper_api/en-us/reference/py/index.html)
- [The EXTERNAL function in the Hyper API SQL Reference](https://help.tableau.com/current/api/hyper_api/en-us/reference/sql/functions-srf.html#FUNCTIONS-SRF-EXTERNAL)
Community-Supported/s3-compatible-services/query-parquet-on-gs.py (41 additions, 0 deletions)
@@ -0,0 +1,41 @@

"""Connect to Google Storage and query a parquet file located in a public bucket.

Adapted from hyper-api-samples/Community-Supported/native-s3/query-csv-on-s3.py
"""

from tableauhyperapi import Connection, HyperProcess, Telemetry, escape_string_literal

BUCKET_NAME = "cloud-samples-data"
FILE_PATH = "bigquery/us-states/us-states.parquet"

states_dataset_gs = escape_string_literal(
    f"s3://{BUCKET_NAME.strip('/')}/{FILE_PATH.strip('/')}"
)

# Hyper process parameters
parameters = {}
# We need to manually enable S3 connectivity, as this is still an experimental feature
parameters["experimental_external_s3"] = "true"
# Endpoint URL of the S3-compatible service
parameters["external_s3_hostname"] = "storage.googleapis.com"
# We do not need to specify credentials or a bucket location, as the GS bucket is
# publicly accessible; this may be different when used with your own data

with HyperProcess(
    telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU,
    parameters=parameters,
) as hyper:
    # Create a connection to the Hyper process - we do not connect to a database
    with Connection(
        endpoint=hyper.endpoint,
    ) as connection:
        # Use the SELECT FROM EXTERNAL(S3_LOCATION()) syntax - this allows us to use
        # the parquet file like a normal table name in SQL queries
        sql_query = f"SELECT COUNT(*) FROM EXTERNAL(S3_LOCATION({states_dataset_gs}))"

        # Execute the query with `execute_scalar_query`, as we expect a single number
        count = connection.execute_scalar_query(sql_query)
        print(f"Number of rows: {count}")
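For a private bucket, the location would additionally need credentials. The EXTERNAL function reference linked in the README documents named arguments on `S3_LOCATION` for this purpose; the sketch below is hypothetical (the bucket path and key values are placeholders, not part of this sample):

```python
# Hypothetical sketch: querying a *private* S3-compatible bucket.
# ACCESS_KEY_ID and SECRET_ACCESS_KEY are named arguments of S3_LOCATION
# in the Hyper SQL reference; the values below are placeholders.
bucket_url = "'s3://my-private-bucket/data/example.parquet'"  # hypothetical path
sql_query = (
    f"SELECT COUNT(*) FROM EXTERNAL(S3_LOCATION({bucket_url}, "
    "ACCESS_KEY_ID => 'YOUR_KEY_ID', "
    "SECRET_ACCESS_KEY => 'YOUR_SECRET'))"
)
print(sql_query)
```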