diff --git a/Community-Supported/README.md b/Community-Supported/README.md
index 0b9b363..e48865e 100644
--- a/Community-Supported/README.md
+++ b/Community-Supported/README.md
@@ -30,7 +30,10 @@ The community samples focus on individual use cases and are Python-only. They ha
   - Demonstrates the full end-to-end workflow of how to create a multi-table `.hyper` file, place the extract into a `.tdsx`, and publish to Tableau Online or Server.
 - [__s3-to-hyper__](https://github.com/tableau/hyper-api-samples/tree/main/Community-Supported/s3-to-hyper)
   - Demonstrates how to create a `.hyper` file from a wildcard union on text files held in an AWS S3 bucket. The extract is then placed in a `.tdsx` file and published to Tableau Online or Server.
+- [__flights-data-incremental-refresh__](https://github.com/tableau/hyper-api-samples/tree/main/Community-Supported/flights-data-incremental-refresh)
+  - This sample is based on the content the Hyper team presented in the Hands-on Training session "Hands-on: Leverage the Hyper Update API and Hyper API to Keep Your Data Fresh on Tableau Server" at Tableau Conference 2022 ([slides available here](https://mkt.tableau.com/tc22/sessions/live/430-HOT-D1_Hands-onLeverageTheHyperUpdate.pdf)).
+    It demonstrates how to implement an incremental refresh based on the Hyper API and the [Hyper Update API](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm), using flights data from the [OpenSkyAPI](https://github.com/openskynetwork/opensky-api).

diff --git a/Community-Supported/flights-data-incremental-refresh/README.md b/Community-Supported/flights-data-incremental-refresh/README.md
new file mode 100644
index 0000000..30143cd
--- /dev/null
+++ b/Community-Supported/flights-data-incremental-refresh/README.md
@@ -0,0 +1,51 @@
+# flights-data-incremental-refresh
+## __Incremental Refresh using the OpenSkyApi__
+
+![Community Supported](https://img.shields.io/badge/Support%20Level-Community%20Supported-53bd92.svg)
+
+This sample is based on the content the Hyper team presented in the Hands-on Training session "Hands-on: Leverage the Hyper Update API and Hyper API to Keep Your Data Fresh on Tableau Server" at Tableau Conference 2022 ([slides available here](https://mkt.tableau.com/tc22/sessions/live/430-HOT-D1_Hands-onLeverageTheHyperUpdate.pdf)).
+
+This script pulls flights data from the [OpenSkyAPI](https://github.com/openskynetwork/opensky-api), creates a Hyper database with this data, and uses the [Hyper Update API](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm) to implement an incremental refresh on your Tableau Server/Cloud. The first time the script is executed, the database file is simply published as a new datasource.
+
+# Get started
+
+## __Prerequisites__
+To run the script, you will need:
+- Windows, Linux, or Mac
+- Python 3
+- The dependencies, installed with `pip install -r requirements.txt`
+- Tableau Server credentials, see below.
+
+## Tableau Server Credentials
+To run this sample with your Tableau Server/Cloud, you first need to get the following information:
+- Tableau Server URL, e.g., 'https://us-west-2a.online.tableau.com'
+- Site name, e.g., use 'default' for your default site (note that you cannot use 'default' in Tableau Cloud but must use the site name)
+- Project name, e.g., use an empty string ('') for your default project
+- [Token Name and Token Value](https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm)
+
+Ensure that you have installed the requirements, then run the sample Python file with the information from above. The syntax for running the script is:
+
+ **python flights-data-incremental-refresh.py [-h] server_url site_name project_name token_name token_value**
+
+# Incremental Refresh using the OpenSkyApi
+The script has two parts: first it creates a Hyper database with flights data, then it publishes the database to Tableau Server/Cloud.
+
+## Create a database with flights data
+The `create_hyper_database_with_flights_data` function creates an instance of `OpenSkyApi` and then pulls down states within a specific bounding box. This example uses only a subset of the available data because it relies on the free version of the OpenSkyAPI.
+
+Then, a Hyper database is created containing a table named `TableName("public", "flights")`. Finally, an inserter is used to insert the flights data.
+
+## Publish the Hyper database to Tableau Server / Cloud
+The `publish_to_server` function first signs into Tableau Server / Cloud. Then, it finds the project to which the database should be published.
+
+There are two cases for publishing the database to Server:
+- No datasource with name `datasource_name_on_server` exists on Tableau Server. In this case, the script simply creates the initial datasource on Tableau Server. This datasource is needed for the subsequent incremental refreshes, as the data will be added to this datasource.
+- The datasource with name `datasource_name_on_server` already exists on Tableau Server. In this case, the script uses the Hyper Update REST API to insert the data from the database into the respective table in the datasource on Tableau Server/Cloud.
+
+## __Resources__
+Check out these resources to learn more:
+- [Hyper API documentation](https://help.tableau.com/current/api/hyper_api/en-us/index.html)
+- [Hyper Update API documentation](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm)
+- [Tableau Server Client Docs](https://tableau.github.io/server-client-python/docs/)
+- [REST API documentation](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api.htm)
+- [Tableau Tools](https://github.com/bryantbhowell/tableau_tools)
\ No newline at end of file
diff --git a/Community-Supported/flights-data-incremental-refresh/flights-data-incremental-refresh.py b/Community-Supported/flights-data-incremental-refresh/flights-data-incremental-refresh.py
new file mode 100644
index 0000000..22cee02
--- /dev/null
+++ b/Community-Supported/flights-data-incremental-refresh/flights-data-incremental-refresh.py
@@ -0,0 +1,123 @@
+from tableauhyperapi import HyperProcess, Connection, Telemetry, TableDefinition, TableName, CreateMode, SqlType, Nullability, Inserter
+from opensky_api import OpenSkyApi
+import tableauserverclient as TSC
+import uuid
+import argparse
+
+def create_hyper_database_with_flights_data(database_path):
+    """
+    Leverages the OpenSkyAPI (https://github.com/openskynetwork/opensky-api) to create a
+    Hyper database with flights data.
+    """
+    # Create an instance of the opensky api to retrieve data from OpenSky via HTTP.
+    opensky = OpenSkyApi()
+    # Get the most recent state vector. Note that we can only call this method every
+    # 10 seconds as we are using the free version of the API.
+    states = opensky.get_states(bbox=(45.8389, 47.8229, 5.9962, 10.5226))
+
+    # Start up a local Hyper process.
+    with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU) as hyper:
+        # Create a connection to the Hyper process and connect to a hyper file
+        # (create the file and replace if it exists).
+        with Connection(endpoint=hyper.endpoint, database=database_path, create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
+            # Create a table definition with table name "flights" in the "public" schema
+            # and columns for flights data.
+            table_definition = TableDefinition(
+                table_name=TableName("public", "flights"),
+                columns=[
+                    TableDefinition.Column('baro_altitude', SqlType.double(), Nullability.NULLABLE),
+                    TableDefinition.Column('callsign', SqlType.text(), Nullability.NOT_NULLABLE),
+                    TableDefinition.Column('latitude', SqlType.double(), Nullability.NULLABLE),
+                    TableDefinition.Column('longitude', SqlType.double(), Nullability.NULLABLE),
+                    TableDefinition.Column('on_ground', SqlType.bool(), Nullability.NOT_NULLABLE),
+                    TableDefinition.Column('origin_country', SqlType.text(), Nullability.NOT_NULLABLE),
+                    TableDefinition.Column('time_position', SqlType.int(), Nullability.NULLABLE),
+                    TableDefinition.Column('velocity', SqlType.double(), Nullability.NULLABLE),
+                ])
+            # Create the flights table.
+            connection.catalog.create_table(table_definition)
+
+            # Insert each of the states into the table.
+            with Inserter(connection, table_definition) as inserter:
+                for s in states.states:
+                    inserter.add_row([s.baro_altitude, s.callsign, s.latitude, s.longitude, s.on_ground, s.origin_country, s.time_position, s.velocity])
+                inserter.execute()
+
+            num_flights = connection.execute_scalar_query(query=f"SELECT COUNT(*) from {table_definition.table_name}")
+            print(f"Inserted {num_flights} flights into {database_path}.")
+
+def publish_to_server(server_url, tableau_auth, project_name, database_path, datasource_name_on_server):
+    """
+    Creates the datasource on Tableau Server if it has not yet been created. Otherwise, uses the
+    Hyper Update REST API (https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm) to append the data to the datasource.
+    """
+    # Create a tableauserverclient object to interact with Tableau Server.
+    server = TSC.Server(server_url, use_server_version=True)
+    # Sign into Tableau Server with the above authentication information.
+    with server.auth.sign_in(tableau_auth):
+        # Get project_id from project_name.
+        matching_projects = server.projects.filter(name=project_name)
+        project_id = next((project.id for project in matching_projects if project.name == project_name), None)
+        if project_id is None:
+            print(f"Publish failed. The specified project '{project_name}' does not exist.")
+            exit()
+
+        # Get the datasource from Server (if it exists).
+        matching_datasources = server.datasources.filter(name=datasource_name_on_server)
+        datasource = next((ds for ds in matching_datasources), None)
+
+        if datasource is None:
+            # If the datasource does not exist on server, publish the datasource.
+            publish_mode = TSC.Server.PublishMode.CreateNew
+            datasource = TSC.DatasourceItem(project_id)
+            # Set the name of the datasource such that it can be easily identified.
+            datasource.name = datasource_name_on_server
+            datasource = server.datasources.publish(datasource, database_path, publish_mode)
+            print(f"New datasource published: (id : {datasource.id}, name: {datasource.name}).")
+        else:
+            # If the datasource already exists on Tableau Server, use the Hyper Update REST API
+            # to send the delta to Tableau Server and insert the data into the respective table
+            # in the datasource.
+
+            # Create a new random request id.
+            request_id = str(uuid.uuid4())
+
+            # Create one action that inserts from the new table into the existing table.
+            # For more details, see https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm#action-batch-descriptions
+            actions = [
+                {
+                    "action": "insert",
+                    "source-schema": "public",
+                    "source-table": "flights",
+                    "target-schema": "public",
+                    "target-table": "flights",
+                }
+            ]
+
+            # Start the update job on Server.
+            job = server.datasources.update_hyper_data(datasource.id, request_id=request_id, actions=actions, payload=database_path)
+            print(f"Update job posted (ID: {job.id}). Waiting for the job to complete...")
+
+            # Wait for the job to finish.
+            job = server.jobs.wait_for_job(job)
+            print("Job finished successfully")
+
+
+if __name__ == '__main__':
+    argparser = argparse.ArgumentParser(description="Incremental refresh with flights data.")
+    argparser.add_argument("server_url", help="The url of Tableau Server / Cloud, e.g. 'https://us-west-2a.online.tableau.com'")
+    argparser.add_argument("site_name", help="The name of your site, e.g., use 'default' for your default site. Note that you cannot use 'default' in Tableau Cloud but must use the site name.", default='default')
+    argparser.add_argument("project_name", help="The name of your project, e.g., use an empty string ('') for your default project.", default="")
+    argparser.add_argument("token_name", help="The name of your authentication token for Tableau Server/Cloud. See this url for more details: https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm")
+    argparser.add_argument("token_value", help="The value of your authentication token for Tableau Server/Cloud. See this url for more details: https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm")
+    args = argparser.parse_args()
+
+    # First create the hyper database with flights data.
+    database_path = "flights.hyper"
+    create_hyper_database_with_flights_data(database_path)
+
+    # Then publish the data to server.
+    datasource_name_on_server = 'flights_data_set'
+    # Create credentials to sign into Tableau Server.
+    tableau_auth = TSC.PersonalAccessTokenAuth(args.token_name, args.token_value, args.site_name)
+    publish_to_server(args.server_url, tableau_auth, args.project_name, database_path, datasource_name_on_server)
\ No newline at end of file
diff --git a/Community-Supported/flights-data-incremental-refresh/requirements.txt b/Community-Supported/flights-data-incremental-refresh/requirements.txt
new file mode 100644
index 0000000..3eefeaf
--- /dev/null
+++ b/Community-Supported/flights-data-incremental-refresh/requirements.txt
@@ -0,0 +1,3 @@
+tableauhyperapi>=0.0.14946
+tableauserverclient>=0.19.0
+https://github.com/openskynetwork/opensky-api/archive/master.zip#subdirectory=python
\ No newline at end of file
diff --git a/Community-Supported/hyper-to-csv/hyper-to-csv.py b/Community-Supported/hyper-to-csv/hyper-to-csv.py
index 84cf26b..d0d10e9 100644
--- a/Community-Supported/hyper-to-csv/hyper-to-csv.py
+++ b/Community-Supported/hyper-to-csv/hyper-to-csv.py
@@ -19,7 +19,7 @@
     TableName, \
     HyperException
-# An example of how to turn a .hyper file into a csv to fit within potiential ETL workflows.
+# An example of how to turn a .hyper file into a csv to fit within potential ETL workflows.
 """
 Note: you need to follow the pantab documentation to make sure columns line up with the