Adds a sample demonstrating how to implement an incremental refresh based on the Hyper API and Hyper Update REST API. (#64)

* Adds an incremental refresh sample based on the Hyper Update REST API.

Adds a sample demonstrating how to implement an incremental refresh based on the Hyper API and the Hyper Update REST API. The sample is based on the content the Hyper team presented in the Hands on Training session "Hands-on: Leverage the Hyper Update API and Hyper API to Keep Your Data Fresh on Tableau Server" at Tableau Conference 2022.

* Minor: Added argparser to pass in arguments and minor rephrasing of the README file.

* Added the OpenSkyAPI to the requirements.txt file and removed the instructions to manually install it.
jonas-eckhardt authored Jun 21, 2022
1 parent ecf38e2 commit 9ad9c0d
Showing 5 changed files with 181 additions and 1 deletion.
3 changes: 3 additions & 0 deletions Community-Supported/README.md
@@ -30,7 +30,10 @@ The community samples focus on individual use cases and are Python-only. They ha
- Demonstrates the full end-to-end workflow of how to create a multi-table `.hyper` file, place the extract into a `.tdsx`, and publish to Tableau Online or Server.
- [__s3-to-hyper__](https://github.com/tableau/hyper-api-samples/tree/main/Community-Supported/s3-to-hyper)
- Demonstrates how to create a `.hyper` file from a wildcard union on text files held in an AWS S3 bucket. The extract is then placed in a `.tdsx` file and published to Tableau Online or Server.
- [__flights-data-incremental-refresh__](https://github.com/tableau/hyper-api-samples/tree/main/Community-Supported/flights-data-incremental-refresh)
- This sample is based on the content the Hyper team presented in the Hands on Training session "Hands-on: Leverage the Hyper Update API and Hyper API to Keep Your Data Fresh on Tableau Server" at Tableau Conference 2022 ([slides available here](https://mkt.tableau.com/tc22/sessions/live/430-HOT-D1_Hands-onLeverageTheHyperUpdate.pdf)).

It demonstrates how to implement an incremental refresh based on the Hyper API and the [Hyper Update API](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm). It showcases this using flights data from the [OpenSkyAPI](https://github.com/openskynetwork/opensky-api).

</br>
</br>
51 changes: 51 additions & 0 deletions Community-Supported/flights-data-incremental-refresh/README.md
@@ -0,0 +1,51 @@
# flights-data-incremental-refresh
## __Incremental Refresh using the OpenSkyApi__

![Community Supported](https://img.shields.io/badge/Support%20Level-Community%20Supported-53bd92.svg)

This sample is based on the content the Hyper team presented in the Hands on Training session "Hands-on: Leverage the Hyper Update API and Hyper API to Keep Your Data Fresh on Tableau Server" at Tableau Conference 2022 ([slides available here](https://mkt.tableau.com/tc22/sessions/live/430-HOT-D1_Hands-onLeverageTheHyperUpdate.pdf)).

This script pulls flights data from the [OpenSkyAPI](https://github.com/openskynetwork/opensky-api), creates a Hyper database from it, and uses the [Hyper Update API](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm) to implement an incremental refresh on your Tableau Server/Cloud. The first time the script runs, it simply publishes the database file as a new datasource; subsequent runs append the newly fetched data to that datasource.

# Get started

## __Prerequisites__
To run the script, you will need:
- Windows, Linux, or Mac
- Python 3
- Run `pip install -r requirements.txt`
- Tableau Server Credentials, see below.

## Tableau Server Credentials
To run this sample with your Tableau Server/Cloud, you first need to get the following information:
- Tableau Server URL, e.g. 'https://us-west-2a.online.tableau.com'
- Site name, e.g., use 'default' for your default site (note that you cannot use 'default' in Tableau Cloud but must use the site name)
- Project name, e.g., use an empty string ('') for your default project
- [Token Name and Token Value](https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm)

Ensure that you have installed the requirements and then just run the sample Python file with the information from above. The syntax for running the script is:

**python flights-data-incremental-refresh.py [-h] server_url site_name project_name token_name token_value**
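
The five positional arguments in the syntax line above could be parsed along the following lines. This is a minimal sketch rather than the sample's exact code: `build_parser` is a hypothetical helper name, and the site and token values in the usage lines are placeholders.

```python
import argparse


def build_parser():
    """Hypothetical helper: builds a parser for the five positional arguments."""
    parser = argparse.ArgumentParser(description="Incremental refresh with flights data.")
    parser.add_argument("server_url", help="URL of Tableau Server / Cloud")
    parser.add_argument("site_name", help="Site name ('default' for the default site on Server)")
    parser.add_argument("project_name", help="Project name ('' for the default project)")
    parser.add_argument("token_name", help="Name of the personal access token")
    parser.add_argument("token_value", help="Value of the personal access token")
    return parser


# Placeholder values for illustration only.
args = build_parser().parse_args(
    ["https://us-west-2a.online.tableau.com", "mysite", "", "my-token", "my-secret"]
)
print(args.server_url, args.site_name, repr(args.project_name))
```

Because the arguments are positional, all five must be supplied on the command line; passing an empty string (`''`) for `project_name` corresponds to the default project described above.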

# Incremental Refresh using the OpenSkyApi
The script consists of two parts: first it creates a Hyper database with flights data and then publishes the database to Tableau Server/Cloud.

## Create a database with flights data
The `create_hyper_database_with_flights_data` function creates an instance of `OpenSkyApi` and then pulls down the state vectors within a specific bounding box. This example uses only a subset of the available data because it relies on the free version of the OpenSkyApi.

Then, a Hyper database is created containing a table named `TableName("public", "flights")`. Finally, an inserter is used to insert the flights data.
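
The row-building step that feeds the inserter can be sketched in isolation, without Hyper or network access. In the sketch below, `FakeState` is a hypothetical stand-in for the state objects returned by the OpenSky client, and `state_to_row` is a hypothetical helper; only the column names and their order are taken from the sample. Replacing a missing callsign with an empty string is an assumption, made because the sample declares the `callsign` column as not nullable.

```python
from dataclasses import dataclass
from typing import Optional

# Column order of the "flights" table, as used by the sample.
COLUMN_ORDER = [
    "baro_altitude", "callsign", "latitude", "longitude",
    "on_ground", "origin_country", "time_position", "velocity",
]


@dataclass
class FakeState:
    """Hypothetical stand-in for one OpenSky state vector."""
    baro_altitude: Optional[float]
    callsign: Optional[str]
    latitude: Optional[float]
    longitude: Optional[float]
    on_ground: bool
    origin_country: str
    time_position: Optional[int]
    velocity: Optional[float]


def state_to_row(state):
    """Return the state's values as a list in table column order."""
    row = [getattr(state, name) for name in COLUMN_ORDER]
    # Assumption: a missing callsign is stored as an empty string,
    # since the callsign column is declared NOT_NULLABLE.
    callsign_index = COLUMN_ORDER.index("callsign")
    if row[callsign_index] is None:
        row[callsign_index] = ""
    return row


example = FakeState(10972.8, "SWR123", 46.9, 7.4, False, "Switzerland", 1655800000, 230.5)
print(state_to_row(example))
```

Each list produced this way has one value per column and could be handed to the inserter's `add_row` call in the sample.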

## Publish the hyper database to Tableau Server / Cloud
The `publish_to_server` function first signs in to Tableau Server / Cloud. Then, it looks up the project to which the database should be published.

There are two cases for publishing the database to Server:
- No datasource with name `datasource_name_on_server` exists on Tableau Server. In this case, the script simply creates the initial datasource on Tableau Server. This datasource is needed for the subsequent incremental refreshes, as the data will be added to it.
- The datasource with name `datasource_name_on_server` already exists on Tableau Server. In this case, the script uses the Hyper Update REST API to insert the data from the database into the respective table in the datasource on Tableau Server/Cloud.
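
The two cases above amount to a small decision step, sketched below without any calls to Tableau Server. `plan_publish` and `build_insert_action` are hypothetical helper names, and the returned dictionary shape is an assumption made for illustration; the insert action itself mirrors the one used in the sample script.

```python
import uuid


def build_insert_action(schema="public", table="flights"):
    """Action that inserts rows from the uploaded table into the same-named target table."""
    return {
        "action": "insert",
        "source-schema": schema,
        "source-table": table,
        "target-schema": schema,
        "target-table": table,
    }


def plan_publish(existing_datasource_names, datasource_name):
    """Hypothetical helper: decide between initial publish and incremental update."""
    if datasource_name not in existing_datasource_names:
        # Case 1: nothing on the server yet, so publish the .hyper file as a new datasource.
        return {"mode": "create"}
    # Case 2: the datasource exists, so describe an update job with a fresh request id.
    return {
        "mode": "update",
        "request_id": str(uuid.uuid4()),
        "actions": [build_insert_action()],
    }


print(plan_publish([], "flights_data_set"))
print(plan_publish(["flights_data_set"], "flights_data_set")["mode"])
```

In the sample, the "update" branch corresponds to the call to `server.datasources.update_hyper_data` with the request id, the actions list, and the path of the local `.hyper` file as payload.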

## __Resources__
Check out these resources to learn more:
- [Hyper API documentation](https://help.tableau.com/current/api/hyper_api/en-us/index.html)
- [Hyper Update API documentation](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm)
- [Tableau Server Client Docs](https://tableau.github.io/server-client-python/docs/)
- [REST API documentation](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api.htm)
- [Tableau Tools](https://github.com/bryantbhowell/tableau_tools)
@@ -0,0 +1,123 @@
from tableauhyperapi import HyperProcess, Connection, Telemetry, TableDefinition, TableName, CreateMode, SqlType, Nullability, Inserter
from opensky_api import OpenSkyApi
import tableauserverclient as TSC
import uuid
import argparse

def create_hyper_database_with_flights_data(database_path):
    """
    Leverages the OpenSkyAPI (https://github.com/openskynetwork/opensky-api) to create a
    Hyper database with flights data.
    """
    # Create an instance of the opensky api to retrieve data from OpenSky via HTTP.
    opensky = OpenSkyApi()
    # Get the most recent state vector. Note that we can only call this method every
    # 10 seconds as we are using the free version of the API.
    states = opensky.get_states(bbox=(45.8389, 47.8229, 5.9962, 10.5226))

    # Start up a local Hyper process.
    with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU) as hyper:
        # Create a connection to the Hyper process and connect to a hyper file
        # (create the file and replace if it exists).
        with Connection(endpoint=hyper.endpoint, database=database_path, create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
            # Create a table definition with table name "flights" in the "public" schema
            # and columns for flights data.
            table_definition = TableDefinition(
                table_name=TableName("public", "flights"),
                columns=[
                    TableDefinition.Column('baro_altitude', SqlType.double(), Nullability.NULLABLE),
                    TableDefinition.Column('callsign', SqlType.text(), Nullability.NOT_NULLABLE),
                    TableDefinition.Column('latitude', SqlType.double(), Nullability.NULLABLE),
                    TableDefinition.Column('longitude', SqlType.double(), Nullability.NULLABLE),
                    TableDefinition.Column('on_ground', SqlType.bool(), Nullability.NOT_NULLABLE),
                    TableDefinition.Column('origin_country', SqlType.text(), Nullability.NOT_NULLABLE),
                    TableDefinition.Column('time_position', SqlType.int(), Nullability.NULLABLE),
                    TableDefinition.Column('velocity', SqlType.double(), Nullability.NULLABLE),
                ])
            # Create the flights table.
            connection.catalog.create_table(table_definition)

            # Insert each of the states into the table.
            with Inserter(connection, table_definition) as inserter:
                for s in states.states:
                    inserter.add_row([s.baro_altitude, s.callsign, s.latitude, s.longitude, s.on_ground, s.origin_country, s.time_position, s.velocity])
                inserter.execute()

            num_flights = connection.execute_scalar_query(query=f"SELECT COUNT(*) from {table_definition.table_name}")
            print(f"Inserted {num_flights} flights into {database_path}.")

def publish_to_server(server_url, tableau_auth, project_name, database_path, datasource_name_on_server):
    """
    Creates the datasource on Tableau Server if it has not yet been created. Otherwise, uses the
    Hyper Update REST API (https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm) to append the data to the datasource.
    """
    # Create a tableauserverclient object to interact with Tableau Server.
    server = TSC.Server(server_url, use_server_version=True)
    # Sign into Tableau Server with the above authentication information.
    with server.auth.sign_in(tableau_auth):
        # Get project_id from project_name.
        matching_projects = server.projects.filter(name=project_name)
        project_id = next((project.id for project in matching_projects if project.name == project_name), None)
        if project_id is None:
            print(f"Publish failed. The specified project '{project_name}' does not exist.")
            exit()

        # Get the datasource from Server (if it exists).
        matching_datasources = server.datasources.filter(name=datasource_name_on_server)
        datasource = next((ds for ds in matching_datasources), None)

        if datasource is None:
            # If the datasource does not exist on server, publish the datasource.
            publish_mode = TSC.Server.PublishMode.CreateNew
            datasource = TSC.DatasourceItem(project_id)
            # Set the name of the datasource such that it can be easily identified.
            datasource.name = datasource_name_on_server
            datasource = server.datasources.publish(datasource, database_path, publish_mode)
            print(f"New datasource published: (id : {datasource.id}, name: {datasource.name}).")
        else:
            # If the datasource already exists on Tableau Server, use the Hyper Update REST API
            # to send the delta to Tableau Server and insert the data into the respective table
            # in the datasource.

            # Create a new random request id.
            request_id = str(uuid.uuid4())

            # Create one action that inserts from the new table into the existing table.
            # For more details, see https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm#action-batch-descriptions
            actions = [
                {
                    "action": "insert",
                    "source-schema": "public",
                    "source-table": "flights",
                    "target-schema": "public",
                    "target-table": "flights",
                }
            ]

            # Start the update job on Server.
            job = server.datasources.update_hyper_data(datasource.id, request_id=request_id, actions=actions, payload=database_path)
            print(f"Update job posted (ID: {job.id}). Waiting for the job to complete...")

            # Wait for the job to finish.
            job = server.jobs.wait_for_job(job)
            print("Job finished successfully")


if __name__ == '__main__':
    argparser = argparse.ArgumentParser(description="Incremental refresh with flights data.")
    argparser.add_argument("server_url", help="The url of Tableau Server / Cloud, e.g. 'https://us-west-2a.online.tableau.com'")
    argparser.add_argument("site_name", help="The name of your site, e.g., use 'default' for your default site. Note that you cannot use 'default' in Tableau Cloud but must use the site name.", default='default')
    argparser.add_argument("project_name", help="The name of your project, e.g., use an empty string ('') for your default project.", default="")
    argparser.add_argument("token_name", help="The name of your authentication token for Tableau Server/Cloud. See this url for more details: https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm")
    argparser.add_argument("token_value", help="The value of your authentication token for Tableau Server/Cloud. See this url for more details: https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm")
    args = argparser.parse_args()

    # First create the hyper database with flights data.
    database_path = "flights.hyper"
    create_hyper_database_with_flights_data(database_path)

    # Then publish the data to server.
    datasource_name_on_server = 'flights_data_set'
    # Create credentials to sign into Tableau Server.
    tableau_auth = TSC.PersonalAccessTokenAuth(args.token_name, args.token_value, args.site_name)
    publish_to_server(args.server_url, tableau_auth, args.project_name, database_path, datasource_name_on_server)
@@ -0,0 +1,3 @@
tableauhyperapi>=0.0.14946
tableauserverclient>=0.19.0
https://github.com/openskynetwork/opensky-api/archive/master.zip#subdirectory=python
2 changes: 1 addition & 1 deletion Community-Supported/hyper-to-csv/hyper-to-csv.py
@@ -19,7 +19,7 @@
TableName, \
HyperException

# An example of how to turn a .hyper file into a csv to fit within potiential ETL workflows.
# An example of how to turn a .hyper file into a csv to fit within potential ETL workflows.

"""
Note: you need to follow the pantab documentation to make sure columns line up with the
