Improve the KFP / User Guides / Core Functions docs (#3795)
* Improve the KFP / User Guides / Core Functions docs

Signed-off-by: Mathew Wicks <[email protected]>

* Small updates 1

Signed-off-by: Mathew Wicks <[email protected]>

* Fix `KFPClientManager()` for Kubeflow 1.9.0

Signed-off-by: Mathew Wicks <[email protected]>

* Fix checklinks errors

Signed-off-by: Mathew Wicks <[email protected]>

* Make "Connect the SDK to the API" more clear

Signed-off-by: Mathew Wicks <[email protected]>

---------

Signed-off-by: Mathew Wicks <[email protected]>
thesuperzapper authored Aug 27, 2024
1 parent 9c29299 commit d3ca1b1
Showing 17 changed files with 971 additions and 556 deletions.
@@ -8,7 +8,7 @@ This guide describes
[the Katib Config](https://github.com/kubeflow/katib/blob/19268062f1b187dde48114628e527a2a35b01d64/manifests/v1beta1/installs/katib-standalone/katib-config.yaml)
the main configuration file for every Katib component. We use Kubernetes
[ConfigMap](https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/) to
-fetch that config into [the Katib control plane components](/docs/components/katib/installation/#installing-control-plane).
+fetch that config into [the Katib control plane components](/docs/components/katib/reference/architecture/#katib-control-plane-components).

The ConfigMap must be deployed in the
[`KATIB_CORE_NAMESPACE`](/docs/components/katib/user-guides/env-variables/#katib-controller)
@@ -39,7 +39,7 @@ wget -O ${PIPELINE_FILE} ${PIPELINE_URL}
dsl-compile --py ${PIPELINE_FILE} --output ${PIPELINE_NAME}.tar.gz
```

-After running the commands above, you should get two files in your current directory: `sequential.py` and `sequential.tar.gz`. Run the following command to deploy the generated `.tar.gz` file as you would do using the [Kubeflow Pipelines UI](/docs/components/pipelines/user-guides/core-functions/run-a-pipeline/#1-run-from-the-kfp-dashboard), but this time using the REST API.
+After running the commands above, you should get two files in your current directory: `sequential.py` and `sequential.tar.gz`. Run the following command to deploy the generated `.tar.gz` file as you would do using the [Kubeflow Pipelines UI](/docs/components/pipelines/user-guides/core-functions/run-a-pipeline/#run-pipeline---kfp-dashboard), but this time using the REST API.

```
SVC=localhost:8888
```
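
For illustration, the same upload can also be scripted in Python. This is a minimal sketch, not part of the original guide: it assumes the API is reachable at `localhost:8888` (as in the `SVC` variable above) and uses the KFP v1beta1 `pipelines/upload` endpoint with its multipart `uploadfile` field; the pipeline name is a placeholder.

```python
import requests

SVC = "localhost:8888"               # assumed port-forward / proxy to the KFP REST API
PIPELINE_FILE = "sequential.tar.gz"  # compiled pipeline package from the step above

# Upload the compiled pipeline package through the v1beta1 REST API.
with open(PIPELINE_FILE, "rb") as f:
    response = requests.post(
        f"http://{SVC}/apis/v1beta1/pipelines/upload",
        params={"name": "sequential-pipeline"},  # placeholder pipeline name
        files={"uploadfile": f},
    )

response.raise_for_status()
print(response.json())
```
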
@@ -42,10 +42,10 @@ Pipeline definitions are not isolated right now, and are shared across all names

How to connect Pipelines SDK to Kubeflow Pipelines will depend on __what kind__ of Kubeflow deployment you have, and __from where you are running your code__.

-* [Full Kubeflow (from inside cluster)](/docs/components/pipelines/user-guides/core-functions/connect-api/#full-kubeflow-subfrom-inside-clustersub)
-* [Full Kubeflow (from outside cluster)](/docs/components/pipelines/user-guides/core-functions/connect-api/#full-kubeflow-subfrom-outside-clustersub)
-* [Standalone Kubeflow Pipelines (from inside cluster)](/docs/components/pipelines/user-guides/core-functions/connect-api/#standalone-kubeflow-pipelines-subfrom-inside-clustersub)
-* [Standalone Kubeflow Pipelines (from outside cluster)](/docs/components/pipelines/user-guides/core-functions/connect-api/#standalone-kubeflow-pipelines-subfrom-outside-clustersub)
+* [Full Kubeflow (from inside cluster)](/docs/components/pipelines/user-guides/core-functions/connect-api/#kubeflow-platform---inside-the-cluster)
+* [Full Kubeflow (from outside cluster)](/docs/components/pipelines/user-guides/core-functions/connect-api/#kubeflow-platform---outside-the-cluster)
+* [Standalone Kubeflow Pipelines (from inside cluster)](/docs/components/pipelines/user-guides/core-functions/connect-api/#standalone-kfp---inside-the-cluster)
+* [Standalone Kubeflow Pipelines (from outside cluster)](/docs/components/pipelines/user-guides/core-functions/connect-api/#standalone-kfp---outside-the-cluster)

The following Python code will create an experiment (and associated run) from a Pod inside a full Kubeflow cluster.

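For illustration, a minimal sketch of creating an experiment and a run with the KFP SDK follows; the host URL, namespace, pipeline file, and parameters are assumptions that depend on your deployment, and a multi-user Kubeflow installation may additionally require an authentication token.

```python
import kfp

# The host and namespace below are assumptions; from a Pod inside the cluster
# the client can often reach the KFP API by its Kubernetes Service DNS name.
# Depending on your setup, authentication (e.g. a ServiceAccount token) may
# also be required.
client = kfp.Client(host="http://ml-pipeline-ui.kubeflow")

experiment = client.create_experiment(name="my-experiment", namespace="my-profile")

run = client.create_run_from_pipeline_package(
    pipeline_file="pipeline.yaml",           # a previously compiled pipeline
    arguments={"message": "Hello, World!"},  # placeholder pipeline parameters
    experiment_name="my-experiment",
    namespace="my-profile",
)
print(run.run_id)
```
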
@@ -1,5 +1,6 @@
+++
title = "Create components"
+description = "Create pipelines with reusable components."
weight = 3
+++

@@ -1,5 +1,5 @@
+++
title = "Core Functions"
-description = "Documentation for users of Kubeflow Pipelines."
+description = "Learn about the core functions of Kubeflow Pipelines."
weight = 2
+++
@@ -1,6 +1,7 @@
+++
title = "Build a More Advanced ML Pipeline"
-weight = 6
+description = "Create a more advanced pipeline that leverages additional KFP features."
+weight = 199
+++

{{% kfp-v2-keywords %}}
@@ -1,7 +1,7 @@
+++
title = "Use Caching"
-description = "How to use caching in Kubeflow Pipelines."
-weight = 5
+description = "Learn about caching in Kubeflow Pipelines."
+weight = 104
+++

Kubeflow Pipelines support caching to eliminate redundant executions and improve
@@ -26,7 +26,7 @@ be marked with a green "arrow from cloud" icon.
## How to use caching

Caching is enabled by default for all components in KFP. You can disable caching
-for a component by calling `.set_caching_options(False)` on a task object.
+for a component by calling [`.set_caching_options(enable_caching=False)`](https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.PipelineTask.set_caching_options) on a task object.

```python
from kfp import dsl
```
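
For illustration, a minimal self-contained sketch of disabling caching on a task with the KFP v2 SDK; the component and pipeline names are placeholders, and the call shown is the `PipelineTask.set_caching_options` method referenced above.

```python
from kfp import dsl

@dsl.component
def say_hello(name: str) -> str:
    return f"Hello, {name}!"

@dsl.pipeline
def hello_pipeline(name: str = "world") -> str:
    hello_task = say_hello(name=name)
    # Disable caching for this task so it re-executes on every run,
    # even if its inputs and component definition are unchanged.
    hello_task.set_caching_options(enable_caching=False)
    return hello_task.output
```
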
@@ -1,6 +1,7 @@
+++
-title = "Interact with KFP via the CLI"
-weight = 4
+title = "Use the KFP CLI"
+description = "Learn how to interact with Kubeflow Pipelines using the KFP CLI."
+weight = 203
+++

{{% kfp-v2-keywords %}}
@@ -1,16 +1,20 @@
+++
title = "Compile a Pipeline"
-description = "Compile pipelines and components to YAML"
-weight = 2
+description = "Define and compile a basic pipeline using the KFP SDK."
+weight = 101
+++

{{% kfp-v2-keywords %}}

-To submit a pipeline for execution, you must compile it to YAML with the KFP SDK compiler:
+## Overview

+To [submit a pipeline for execution](/docs/components/pipelines/user-guides/core-functions/run-a-pipeline/), you must compile it to YAML with the KFP SDK compiler.

+In the following example, the compiler creates a file called `pipeline.yaml`, which contains a hermetic representation of your pipeline.
+The output is called an [Intermediate Representation (IR) YAML](#ir-yaml), which is a serialized [`PipelineSpec`][pipeline-spec] protocol buffer message.

```python
-from kfp import dsl
-from kfp import compiler
+from kfp import compiler, dsl

@dsl.component
def comp(message: str) -> str:
@@ -25,9 +29,19 @@ def my_pipeline(message: str) -> str:
compiler.Compiler().compile(my_pipeline, package_path='pipeline.yaml')
```

-In this example, the compiler creates a file called `pipeline.yaml`, which contains a hermetic representation of your pipeline. The output is called intermediate representation (IR) YAML. You can view an example of IR YAML on [GitHub][compiled-output-example]. The contents of the file is the serialized [`PipelineSpec`][pipeline-spec] protocol buffer message and is not intended to be human-readable.
+Because components are actually pipelines, you may also compile them to IR YAML:

+```python
+@dsl.component
+def comp(message: str) -> str:
+    print(message)
+    return message
+
+compiler.Compiler().compile(comp, package_path='component.yaml')
+```

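As a usage note, a component compiled to IR YAML this way can be loaded back for reuse in other pipelines. This is a sketch assuming `kfp.components.load_component_from_file`; the pipeline name is a placeholder.

```python
from kfp import components, dsl

# Load the previously compiled component back from its IR YAML.
loaded_comp = components.load_component_from_file('component.yaml')

@dsl.pipeline
def reuse_pipeline(message: str = 'Hello'):
    # The loaded component is used like any other component.
    loaded_comp(message=message)
```
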
-You can find human-readable information about the pipeline in the comments at the top of the compiled YAML:
+You can view an [example of IR YAML][compiled-output-example] on GitHub.
+The contents of the file are not intended to be human-readable, however the comments at the top of the file provide a summary of the pipeline:

```yaml
# PIPELINE DEFINITION
@@ -40,16 +54,21 @@ You can find human-readable information about the pipeline in the comments at th
...
```

-You can also compile components, as opposed to pipelines, to IR YAML:
+## Type checking

-```python
-@dsl.component
-def comp(message: str) -> str:
-    print(message)
-    return message
+By default, the DSL compiler statically type checks your pipeline to ensure type consistency between components that pass data between one another.
+Static type checking helps identify component I/O inconsistencies without having to run the pipeline, shortening development iterations.

-compiler.Compiler().compile(comp, package_path='component.yaml')
-```
+Specifically, the type checker checks for type equality between the type of data a component input expects and the type of the data provided.
+See [Data Types][data-types] for more information about KFP data types.

+For example, for parameters, a list input may only be passed to parameters with a `typing.List` annotation.
+Similarly, a float may only be passed to parameters with a `float` annotation.

+Input data types and annotations must also match for artifacts, with one exception: the `Artifact` type is compatible with all other artifact types.
+In this sense, the `Artifact` type is both the default artifact type and an artifact "any" type.

+As described in the following section, you can disable type checking.

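To make this concrete, here is a sketch of a parameter type mismatch that static type checking should reject at compile time; the component names are placeholders and the exact exception depends on the SDK version.

```python
from kfp import compiler, dsl

@dsl.component
def produce_number() -> int:
    return 42

@dsl.component
def consume_text(message: str) -> str:
    return message

@dsl.pipeline
def mismatched_pipeline():
    number_task = produce_number()
    # Wiring an int output into a str input is a type mismatch, so
    # compilation is expected to fail here rather than at run time.
    consume_text(message=number_task.output)

# Expected to raise a type-checking error during compilation.
compiler.Compiler().compile(mismatched_pipeline, package_path='pipeline.yaml')
```
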
## Compiler arguments

@@ -63,25 +82,14 @@ The [`Compiler.compile`][compiler-compile] method accepts the following argument
| `pipeline_parameters` | `Dict[str, Any]` | _Optional_<br/>Map of parameter names to argument values. This lets you provide default values for pipeline or component parameters. You can override these default values during pipeline submission.
| `type_check` | `bool` | _Optional_<br/>Indicates whether static type checking is enabled during compilation.<br/>

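As a sketch of the optional arguments from the table above, reusing `my_pipeline` from the earlier example (the parameter value is a placeholder):

```python
from kfp import compiler

compiler.Compiler().compile(
    my_pipeline,
    package_path='pipeline.yaml',
    pipeline_parameters={'message': 'Hello, World!'},  # default argument values
    type_check=False,                                  # disable static type checking
)
```
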
-## Type checking

-By default, the DSL compiler statically type checks your pipeline to ensure type consistency between components that pass data between one another. Static type checking helps identify component I/O inconsistencies without having to run the pipeline, shortening development iterations.

-Specifically, the type checker checks for type equality between the type of data a component input expects and the type of the data provided. See [Data Types][data-types] for more information about KFP data types.

-For example, for parameters, a list input may only be passed to parameters with a `typing.List` annotation. Similarly, a float may only be passed to parameters with a `float` annotation.

-Input data types and annotations must also match for artifacts, with one exception: the `Artifact` type is compatible with all other artifact types. In this sense, the `Artifact` type is both the default artifact type and an artifact "any" type.

-As described in the following section, you can disable type checking.

## IR YAML

-The IR YAML is an intermediate representation of a compiled pipeline or component. It is an instance of the [`PipelineSpec`][pipeline-spec] protocol buffer message type, which is a platform-agnostic pipeline representation protocol. It is considered an intermediate representation because the KFP backend compiles `PipelineSpec` to [Argo Workflow][argo-workflow] YAML as the final pipeline definition for execution.
+The IR YAML is an intermediate representation of a compiled pipeline or component.
+It is an instance of the [`PipelineSpec`][pipeline-spec] protocol buffer message type, which is a platform-agnostic pipeline representation protocol.
+It is considered an intermediate representation because the KFP backend compiles `PipelineSpec` to [Argo Workflow][argo-workflow] YAML as the final pipeline definition for execution.

Unlike the v1 component YAML, the IR YAML is not intended to be written directly.

-While IR YAML is not intended to be easily human readable, you can still inspect it if you know a bit about its contents:
+While IR YAML is not intended to be easily human-readable, you can still inspect it if you know a bit about its contents:

| Section | Description | Example |
|-------|-------------|---------|