README for run operator #14

Merged
merged 1 commit into from
Sep 26, 2024
96 changes: 90 additions & 6 deletions README.md
Expand Up @@ -18,12 +18,10 @@ Otherwise, just `pip install` it:
$ pip install airflow-providers-wherobots
```

## Usage

### Create a connection
### Create an HTTP connection

You first need to create a Connection in Airflow. This can be done from
the UI, or from the command-line. The default Wherobots connection name
Create a Connection in Airflow. This can be done from Apache Airflow's [Web UI](https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html#creating-connection-ui),
or from the command line. The default Wherobots connection name
is `wherobots_default`; if you use another name you must specify that
name with the `wherobots_conn_id` parameter when initializing Wherobots
operators.
Expand All @@ -39,6 +37,92 @@ $ airflow connections add "wherobots_default" \
--conn-password "$(< api.key)"
```

## Usage

### Execute a `Run` on Wherobots Cloud

Wherobots allows users to upload their code (.py, .jar),
execute it on the cloud, and monitor the status of the run.
Each execution is called a `Run`.

The `WherobotsRunOperator` allows you to execute a `Run` on Wherobots Cloud.
Review comment: We should define what a Run is for new users.

`WherobotsRunOperator` triggers the run according to the parameters you provide,
and waits for the run to finish before completing the task.

Refer to the [Wherobots Managed Storage Documentation](https://docs.wherobots.com/latest/develop/storage-management/storage/)
to learn more about how to upload and manage your code on Wherobots Cloud.

Below is an example of `WherobotsRunOperator`:

```python
operator = WherobotsRunOperator(
    task_id="your_task_id",
    name="airflow_operator_test_run_{{ ts_nodash }}",
    run_python={
        "uri": "s3://wbts-wbc-m97rcg45xi/42ly7mi0p1/data/shared/classification.py"
    },
    runtime="tiny-a10-gpu",
    dag=dag,
    poll_logs=True,
)
```
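
The trigger-and-wait behavior described above can be pictured as a simple polling loop. The sketch below is not the provider's implementation: `wait_for_run`, the `get_status` callable, and the status names are hypothetical stand-ins, and only the 30-second default mirrors the `polling_interval` default documented in this README.

```python
import time
from typing import Callable

# Hypothetical terminal states; the real status names on Wherobots Cloud may differ.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_run(
    get_status: Callable[[], str],
    polling_interval: float = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal state, then return that state."""
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        # Not finished yet: wait one interval before asking again.
        sleep(polling_interval)


# Usage with a canned status sequence instead of a real API call.
statuses = iter(["PENDING", "RUNNING", "RUNNING", "COMPLETED"])
waits = []  # records each requested sleep instead of actually sleeping
final_status = wait_for_run(lambda: next(statuses), polling_interval=30, sleep=waits.append)
print(final_status, len(waits))
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; an Airflow task would simply block until the loop returns.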

#### Arguments

The arguments for the `WherobotsRunOperator` constructor:

The `run_*` arguments are mutually exclusive; you can specify only one of them.
* `name: str`: The name of the run.
  If not specified, a default name is generated.
* `runtime: str`: The runtime determines the size of the compute resources that execute the run.
  The default value is `tiny`.
  Find the list of available runtimes in your Wherobots Cloud account under
  Organization Settings -> General -> RUNTIME CONFIGURATION.

  Find the runtime IDs in the brackets to the right of each city name.

  ![runtimes.png](doc%2Fruntimes.png)

* `poll_logs: bool`: If `True`, the operator polls the logs of the run until it finishes.
  If `False`, the operator does not poll the logs and only tracks the status of the run.
* `polling_interval`: The interval, in seconds, at which the operator polls the status of the run.
  The default value is `30`.
* `run_python: dict`: A dictionary with the following keys:
  * `uri: str`: The URI of the Python file to run.
  * `args: list[str]`: A list of arguments to pass to the Python file.
* `run_jar: dict`: A dictionary with the following keys:
  * `uri: str`: The URI of the JAR file to run.
  * `args: list[str]`: A list of arguments to pass to the JAR file.
  * `mainClass: str`: The main class to run in the JAR file.
* `environment: dict`: A dictionary with the following keys:
  * `sparkDriverDiskGB: int`: The disk size, in GB, for the Spark driver.
  * `sparkExecutorDiskGB: int`: The disk size, in GB, for the Spark executor.
  * `sparkConfigs: dict`: A dictionary of Spark configurations.
  * `dependencies: list[dict]`: A list of dependent libraries to install.
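
Because the `run_*` arguments are mutually exclusive, it can help to check a set of keyword arguments before handing them to the operator. The following is a minimal sketch under stated assumptions: `check_run_kwargs` is a hypothetical helper, not part of the provider, and the URI, arguments, and class name are placeholders. It does not import the provider, so it runs on its own.

```python
from typing import Any, Dict

# The two `run_*` arguments listed above.
RUN_KEYS = ("run_python", "run_jar")


def check_run_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return kwargs unchanged, or raise if more than one `run_*` argument is set."""
    given = [key for key in RUN_KEYS if kwargs.get(key) is not None]
    if len(given) > 1:
        raise ValueError(f"run_* arguments are mutually exclusive, got: {given}")
    return kwargs


jar_kwargs = check_run_kwargs(
    {
        "task_id": "your_task_id",
        "runtime": "tiny",
        "polling_interval": 30,
        "run_jar": {
            "uri": "s3://bucket/path/to/app.jar",  # placeholder URI
            "args": ["--input", "s3://bucket/path/to/input"],  # placeholder arguments
            "mainClass": "com.example.Main",  # placeholder class name
        },
    }
)
print(sorted(jar_kwargs))
```

The validated dictionary could then be unpacked into the constructor, for example `WherobotsRunOperator(**jar_kwargs, dag=dag)`.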

The `dependencies` argument is a list of dictionaries.
There are two types of dependencies supported.

1. `PYPI` dependencies:
```json
{
  "sourceType": "PYPI",
  "libraryName": "package_name",
  "libraryVersion": "package_version"
}
```

2. `FILE` dependencies:
```json
{
  "sourceType": "FILE",
  "filePath": "s3://bucket/path/to/dependency.whl"
}
```
Supported file types: `.whl`, `.zip`, and `.jar`.
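
The two dependency shapes above can be expressed as Python dictionaries and checked before use. This is a sketch, not the provider's own validation: `validate_dependency` is a hypothetical helper, the disk sizes and Spark setting are illustrative values, and it assumes `dependencies` is passed as a key of `environment`, as the ordering of the argument list suggests.

```python
from typing import Any, Dict

# Required keys per `sourceType`, taken from the two JSON shapes above.
REQUIRED_KEYS = {
    "PYPI": {"libraryName", "libraryVersion"},
    "FILE": {"filePath"},
}
SUPPORTED_FILE_SUFFIXES = (".whl", ".zip", ".jar")


def validate_dependency(dep: Dict[str, Any]) -> Dict[str, Any]:
    """Return dep unchanged if it matches one of the two documented shapes."""
    source_type = dep.get("sourceType")
    if source_type not in REQUIRED_KEYS:
        raise ValueError(f"unsupported sourceType: {source_type!r}")
    missing = REQUIRED_KEYS[source_type] - set(dep)
    if missing:
        raise ValueError(f"{source_type} dependency is missing keys: {sorted(missing)}")
    if source_type == "FILE" and not dep["filePath"].endswith(SUPPORTED_FILE_SUFFIXES):
        raise ValueError(f"unsupported file type: {dep['filePath']}")
    return dep


environment = {
    "sparkDriverDiskGB": 20,  # illustrative value
    "sparkExecutorDiskGB": 20,  # illustrative value
    "sparkConfigs": {"spark.sql.shuffle.partitions": "200"},  # illustrative setting
    "dependencies": [
        validate_dependency(
            {
                "sourceType": "PYPI",
                "libraryName": "package_name",
                "libraryVersion": "package_version",
            }
        ),
        validate_dependency(
            {"sourceType": "FILE", "filePath": "s3://bucket/path/to/dependency.whl"}
        ),
    ],
}
print(len(environment["dependencies"]))
```

Failing early on a malformed entry keeps the error close to the DAG definition rather than surfacing it only after the run has been submitted.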


### Execute a SQL query

Review comment: Would it be possible to have the same list of arguments for `WherobotsSqlOperator`?

Author reply: Sure, I'll add it separately.

The `WherobotsSqlOperator` allows you to run SQL queries on the
Expand All @@ -54,7 +138,7 @@ SQL with WherobotsDB.
Refer to the [Wherobots Apache Airflow Provider Documentation](https://docs.wherobots.com/latest/develop/airflow-provider)
to get more detailed guidance about how to use the Wherobots Apache Airflow Provider.

## Example
#### Example

Below is an example Airflow DAG that executes a SQL query on Wherobots
Cloud:
Binary file added doc/runtimes.png
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "airflow-providers-wherobots"
version = "0.1.14"
version = "0.2.0"
description = "Airflow extension for communicating with Wherobots Cloud"
authors = ["zongsi.zhang <[email protected]>"]
readme = "README.md"