
Commit

README for run operator (#14)
zongsizhang authored Sep 26, 2024
1 parent 0226da0 commit 459dd5a
Showing 3 changed files with 91 additions and 7 deletions.
96 changes: 90 additions & 6 deletions README.md
@@ -18,12 +18,10 @@ Otherwise, just `pip install` it:
$ pip install airflow-providers-wherobots
```

## Usage

### Create a connection
### Create an HTTP connection

You first need to create a Connection in Airflow. This can be done from
the UI, or from the command-line. The default Wherobots connection name
Create a Connection in Airflow. This can be done from Apache Airflow's [Web UI](https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html#creating-connection-ui),
or from the command-line. The default Wherobots connection name
is `wherobots_default`; if you use another name you must specify that
name with the `wherobots_conn_id` parameter when initializing Wherobots
operators.
Expand All @@ -39,6 +37,92 @@ $ airflow connections add "wherobots_default" \
--conn-password "$(< api.key)"
```

## Usage

### Execute a `Run` on Wherobots Cloud

Wherobots allows users to upload their code (.py, .jar),
execute it on the cloud, and monitor the status of the run.
Each execution is called a `Run`.

The `WherobotsRunOperator` allows you to execute a `Run` on Wherobots Cloud.
`WherobotsRunOperator` triggers the run according to the parameters you provide,
and waits for the run to finish before completing the task.

Refer to the [Wherobots Managed Storage Documentation](https://docs.wherobots.com/latest/develop/storage-management/storage/)
to learn more about how to upload and manage your code on Wherobots Cloud.

Below is an example of `WherobotsRunOperator`:

```python
operator = WherobotsRunOperator(
task_id="your_task_id",
name="airflow_operator_test_run_{{ ts_nodash }}",
run_python={
"uri": "s3://wbts-wbc-m97rcg45xi/42ly7mi0p1/data/shared/classification.py"
},
runtime="tiny-a10-gpu",
dag=dag,
poll_logs=True,
)
```

#### Arguments

The arguments for the `WherobotsRunOperator` constructor:

The `run_*` arguments are mutually exclusive; you can specify only one of them.
* `name: str`: The name of the run.
If not specified, a default name will be
generated.
* `runtime: str`: The runtime determines the size of the compute resources that execute the run.
The default value is `tiny`.
Find the list of available runtimes in your Wherobots Cloud account under
Organization Settings -> General -> RUNTIME CONFIGURATION.

The runtime IDs appear in brackets to the right of each city name.

![runtimes.png](doc%2Fruntimes.png)

* `poll_logs: bool`: If `True`, the operator will poll the logs of the run until it finishes.
If `False`, the operator will not poll the logs and will only track the status of the run.
* `polling_interval`: The interval in seconds to poll the status of the run.
The default value is `30`.
* `run_python: dict`: A dictionary with the following keys:
    * `uri: str`: The URI of the Python file to run.
    * `args: list[str]`: A list of arguments to pass to the Python file.
* `run_jar: dict`: A dictionary with the following keys:
    * `uri: str`: The URI of the JAR file to run.
    * `args: list[str]`: A list of arguments to pass to the JAR file.
    * `mainClass: str`: The main class to run in the JAR file.
* `environment: dict`: A dictionary with the following keys:
    * `sparkDriverDiskGB: int`: The disk size, in GB, for the Spark driver.
    * `sparkExecutorDiskGB: int`: The disk size, in GB, for the Spark executor.
    * `sparkConfigs: dict`: A dictionary of Spark configurations.
    * `dependencies: list[dict]`: A list of dependent libraries to install.

The `dependencies` key of `environment` is a list of dictionaries.
Two types of dependencies are supported:

1. `PYPI` dependencies:
```json
{
"sourceType": "PYPI",
"libraryName": "package_name",
"libraryVersion": "package_version"
}
```

2. `FILE` dependencies:
```json
{
"sourceType": "FILE",
"filePath": "s3://bucket/path/to/dependency.whl"
}
```
Supported file types: `.whl`, `.zip`, and `.jar`.
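
To show how these pieces fit together, here is a minimal sketch of a JAR-based run with a custom `environment` and both dependency types. The import path, S3 URIs, main class, library name, and versions are placeholder assumptions, not values taken from this repository:

```python
import datetime

from airflow import DAG
# Import path assumed from the provider package layout; verify it against your installation.
from airflow_providers_wherobots.operators.run import WherobotsRunOperator

with DAG(
    dag_id="wherobots_run_jar_example",
    start_date=datetime.datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    WherobotsRunOperator(
        task_id="run_jar_example",
        name="airflow_operator_jar_run_{{ ts_nodash }}",
        # `run_jar` is mutually exclusive with `run_python`.
        run_jar={
            "uri": "s3://your-bucket/path/to/your-job.jar",  # placeholder URI
            "args": ["--input", "s3://your-bucket/path/to/input/"],  # placeholder args
            "mainClass": "com.example.YourMainClass",  # placeholder main class
        },
        runtime="tiny",
        environment={
            "sparkDriverDiskGB": 50,
            "sparkExecutorDiskGB": 50,
            "sparkConfigs": {"spark.sql.shuffle.partitions": "200"},
            "dependencies": [
                {
                    "sourceType": "PYPI",
                    "libraryName": "shapely",   # placeholder package
                    "libraryVersion": "2.0.4",  # placeholder version
                },
                {
                    "sourceType": "FILE",
                    "filePath": "s3://your-bucket/path/to/dependency.jar",  # placeholder path
                },
            ],
        },
        poll_logs=True,
        polling_interval=30,
    )
```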


### Execute a SQL query

The `WherobotsSqlOperator` allows you to run SQL queries on the
Expand All @@ -54,7 +138,7 @@ SQL with WherobotsDB.
Refer to the [Wherobots Apache Airflow Provider Documentation](https://docs.wherobots.com/latest/develop/airflow-provider)
to get more detailed guidance about how to use the Wherobots Apache Airflow Provider.

## Example
#### Example

Below is an example Airflow DAG that executes a SQL query on Wherobots
Cloud:
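
A minimal sketch of such a DAG is shown here; the import path, the `sql` parameter name, and the query are assumptions and may differ from the full example in the repository's README:

```python
import datetime

from airflow import DAG
# Import path assumed from the provider package layout; verify it against your installation.
from airflow_providers_wherobots.operators.sql import WherobotsSqlOperator

with DAG(
    dag_id="wherobots_sql_example",
    start_date=datetime.datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    WherobotsSqlOperator(
        task_id="run_sql",
        # Placeholder query; replace with a query against your own catalogs and tables.
        sql="SELECT COUNT(*) FROM wherobots_open_data.overture.places_place",
    )
```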
Binary file added doc/runtimes.png
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "airflow-providers-wherobots"
version = "0.1.14"
version = "0.2.0"
description = "Airflow extension for communicating with Wherobots Cloud"
authors = ["zongsi.zhang <[email protected]>"]
readme = "README.md"
