Commit

Merge pull request #157 from AstraZeneca/improved_docs
Improved docs
vijayvammi authored Jun 11, 2024
2 parents c37f168 + d13394d commit 54c3217
Showing 24 changed files with 2,699 additions and 1,456 deletions.
461 changes: 139 additions & 322 deletions README.md

Large diffs are not rendered by default.

Binary file added docs/assets/work_dark.png
Binary file added docs/assets/work_light.png
6 changes: 3 additions & 3 deletions docs/concepts/map.md
@@ -12,21 +12,21 @@ to run on every hyper parameter.
 success([Success]):::green
 subgraph one[Parameter 1]
-process_chunk1([Process Chunk]):::yellow
+process_chunk1([Train model]):::yellow
 success_chunk1([Success]):::yellow
 process_chunk1 --> success_chunk1
 end
 subgraph two[Parameter ...]
-process_chunk2([Process Chunk]):::yellow
+process_chunk2([Train model]):::yellow
 success_chunk2([Success]):::yellow
 process_chunk2 --> success_chunk2
 end
 subgraph three[Parameter n]
-process_chunk3([Process Chunk]):::yellow
+process_chunk3([Train model]):::yellow
 success_chunk3([Success]):::yellow
 process_chunk3 --> success_chunk3
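
The diagram in this hunk describes a map: the same "Train model" branch runs once per hyperparameter. As a rough illustration of that pattern only, here is a plain-Python sketch. It does not use runnable's API; `train_model` and `run_map` are hypothetical names invented for this example, and the "score" is a fake value so the code runs without any ML dependency.

```python
from concurrent.futures import ThreadPoolExecutor


def train_model(learning_rate: float) -> dict:
    """Hypothetical stand-in for the 'Train model' step of one branch."""
    # Fake score: a placeholder for a real evaluation metric.
    return {"learning_rate": learning_rate, "score": round(1.0 - learning_rate, 3)}


def run_map(branch, parameters):
    """Run the same branch once per parameter; results keep the input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(branch, parameters))


results = run_map(train_model, [0.1, 0.01, 0.001])

# A simple reducer over the branch outputs: pick the best-scoring run.
best = max(results, key=lambda r: r["score"])
```

Each branch only sees its own parameter, which is what lets the branches run independently before a final step combines their outputs.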
51 changes: 2 additions & 49 deletions docs/configurations/overview.md
@@ -1,49 +1,2 @@
-**runnable** is designed to make effective collaborations between data scientists/researchers
-and infrastructure engineers.
-
-All the features described in the [concepts](../concepts/the-big-picture.md) are
-aimed at the *research* side of data science projects while configurations add *scaling* features to them.
-
-
-Configurations are presented during the execution:
-
-For ```yaml``` based pipeline, use the ```--config-file, -c``` option in the [runnable CLI](../usage.md/#usage).
-
-For [python SDK](../sdk.md/#runnable.Pipeline.execute), use the ```configuration_file``` option or via
-environment variable ```runnable_CONFIGURATION_FILE```
-
-## Default configuration
-
-```yaml
---8<-- "examples/configs/default.yaml"
-```
-
-1. Execute the pipeline in the local compute environment.
-2. The run log is not persisted but present in-memory and flushed at the end of execution.
-3. No catalog functionality, all catalog operations are effectively no-op.
-4. No secrets functionality, all secrets are effectively no-op.
-5. No experiment tracking tools, all interactions with experiment tracking tools are effectively no-op.
-Run log still captures the metrics, but are not passed to the experiment tracking tools.
-
-The default configuration for all the pipeline executions runs on the
-[local compute](executors/local.md), using a
-[buffered run log](run-log.md/#buffered) store with
-[no catalog](catalog.md/#do-nothing) or
-[secrets](secrets.md/#do-nothing) or
-[experiment tracking functionality](experiment-tracking.md/).
-
-
-
-## Format
-
-The configuration file is in yaml format and the typical structure is:
-
-```yaml
-service:
-  type: service provider
-  config:
-    ...
-```
-where service is one of ```executor```, ```catalog```, ```experiment_tracker```,
-```secrets``` or ```run_log_store```.
+**runnable** is designed to enable the pipeline execution in varied computational environments without changing the
+infrastructure patterns.
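
The removed "Format" section of this file describes each configuration entry as a service name mapped to a `type` and an optional `config`. A minimal sketch of that shape as a Python dictionary, with a small checker, might look like the following. The function `validate_configuration` is hypothetical and not part of runnable, and the provider names in `example` are placeholders rather than confirmed option values.

```python
# Service names taken from the overview text in this diff.
KNOWN_SERVICES = {"executor", "catalog", "experiment_tracker", "secrets", "run_log_store"}


def validate_configuration(configuration: dict) -> list:
    """Return a list of problems found in a service -> {type, config} mapping."""
    problems = []
    for service, spec in configuration.items():
        if service not in KNOWN_SERVICES:
            problems.append(f"unknown service: {service}")
            continue
        if not isinstance(spec, dict) or "type" not in spec:
            problems.append(f"{service}: missing 'type'")
            continue
        # 'config' is optional here (an assumption); when present it must be a mapping.
        if not isinstance(spec.get("config", {}), dict):
            problems.append(f"{service}: 'config' must be a mapping")
    return problems


# Placeholder provider names, for illustration only.
example = {
    "executor": {"type": "local"},
    "run_log_store": {"type": "buffered", "config": {}},
}
```

Calling `validate_configuration(example)` returns an empty list, while an entry such as `{"scheduler": {}}` is reported as an unknown service.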
32 changes: 31 additions & 1 deletion docs/index.md
@@ -79,11 +79,41 @@ The difference between native driver and runnable orchestration:
- [x] The pipeline is `runnable` in any environment.


## But why runnable?
## why runnable?

Obviously, there are a lot of orchestration tools. A well maintained and curated [list is
available here](https://github.com/EthicalML/awesome-production-machine-learning/).

Broadly, they could be classed into ```native``` or ```meta``` orchestrators.

<figure markdown>
![Image title](assets/work_light.png#only-light){ width="600" height="300"}
![Image title](assets/work_dark.png#only-dark){ width="600" height="300"}
</figure>


### __native orchestrators__

- Focus on resource management, job scheduling, robustness and scalability.
- Have fewer features for domain (data engineering, data science) activities.
- Difficult to run locally.
- Not ideal for quick experimentation or research activities.

### __meta orchestrators__

- An abstraction over native orchestrators.
- Oriented towards domain (data engineering, data science) features.
- Easy to get started and run locally.
- Ideal for quick experimentation or research activities.

```runnable``` is a _meta_ orchestrator with a simple API, geared towards data engineering and data science activities.
It works in conjunction with _native_ orchestrators and is an alternative to [kedro](https://docs.kedro.org/en/stable/index.html)
or [metaflow](https://metaflow.org/).





```runnable``` stands out based on these design principles.

<div class="grid cards" markdown>
72 changes: 41 additions & 31 deletions docs/reference.md
@@ -1,3 +1,8 @@
Please read this reference alongside the ```examples``` in
[the repo](https://github.com/AstraZeneca/runnable-core).



## PythonTask

=== "sdk"
@@ -75,18 +80,40 @@
<hr style="border:2px dotted orange">


## Catalog
## ShellTask

=== "sdk"

::: runnable.Catalog
::: runnable.ShellTask
options:
show_root_heading: true
show_bases: false
show_docstring_description: true
heading_level: 3

=== "yaml"

Attributes:

- ```name```: the name of the task
- ```command```: the shell command to execute.
- ```next```: the next node to call if the command succeeds. Use ```success``` to terminate
the pipeline successfully or ```fail``` to terminate with failure.
- ```on_failure```: The next node in case of failure.
- ```catalog```: mapping of cataloging items
- ```overrides```: mapping of step overrides from global configuration.

```yaml
dag:
  steps:
    name: <>
      type: task
      command: <>
      next: <>
      on_failure: <>
      catalog: # Any cataloging to be done.
      overrides: # mapping of overrides of global configuration
```
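
To make the `next` and `on_failure` attributes listed above more concrete, here is a minimal standard-library sketch of how a shell step could pick its successor from the command's exit code. It assumes `command` holds a shell command line, and `run_shell_step` is a hypothetical helper written for this illustration; it is not runnable's implementation.

```python
import subprocess


def run_shell_step(step: dict) -> str:
    """Run step['command'] in a shell and return the name of the node to visit next."""
    completed = subprocess.run(
        step["command"], shell=True, capture_output=True, text=True
    )
    if completed.returncode == 0:
        return step["next"]
    # Assumption: with no explicit on_failure, fall back to the generic "fail" node.
    return step.get("on_failure", "fail")
```

For example, `run_shell_step({"command": "exit 0", "next": "success"})` returns `"success"`, while a non-zero exit code routes to the `on_failure` node when one is given.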


<hr style="border:2px dotted orange">
@@ -108,16 +135,14 @@
<hr style="border:2px dotted orange">



## ShellTask
## Catalog

=== "sdk"

::: runnable.ShellTask
::: runnable.Catalog
options:
show_root_heading: true
show_bases: false
show_docstring_description: true
heading_level: 3

=== "yaml"
@@ -128,30 +153,29 @@




## Parallel

## Pipeline

=== "sdk"

::: runnable.Parallel
::: runnable.Pipeline
options:
show_root_heading: true
show_bases: false
show_docstring_description: true
heading_level: 3
members:
- execute

=== "yaml"



<hr style="border:2px dotted orange">
## Parallel

## Map

=== "sdk"

::: runnable.Map
::: runnable.Parallel
options:
show_root_heading: true
show_bases: false
@@ -160,35 +184,21 @@

=== "yaml"

<hr style="border:2px dotted orange">



::: runnable.Success
options:
show_root_heading: true
show_bases: false
show_docstring_description: true

<hr style="border:2px dotted orange">

::: runnable.Fail
options:
show_root_heading: true
show_bases: false
show_docstring_description: true

<hr style="border:2px dotted orange">

## Pipeline
## Map

=== "sdk"

::: runnable.Pipeline
::: runnable.Map
options:
show_root_heading: true
show_bases: false
show_docstring_description: true
heading_level: 3

=== "yaml"

<hr style="border:2px dotted orange">
5 changes: 1 addition & 4 deletions examples/01-tasks/stub.py
@@ -29,10 +29,7 @@ def main():

     step3 = Stub(name="step3", terminate_with_success=True)

-    pipeline = Pipeline(
-        steps=[step1, step2, step3],
-        add_terminal_nodes=True,
-    )
+    pipeline = Pipeline(steps=[step1, step2, step3])

     pipeline.execute()

2 changes: 1 addition & 1 deletion examples/02-sequential/on_failure_fail.py
@@ -31,7 +31,7 @@ def main():
     step_1.on_failure = step_4.name

     pipeline = Pipeline(
-        steps=[step_1, step_2, step_3, [step_4]],
+        steps=[step_1, step_2, step_3],
     )
     pipeline.execute()

1 change: 0 additions & 1 deletion examples/03-parameters/static_parameters_python.py
@@ -64,7 +64,6 @@ def read_initial_params_as_json(

     pipeline = Pipeline(
         steps=[read_params_as_pydantic, read_params_as_json],
-        add_terminal_nodes=True,
     )

     _ = pipeline.execute(parameters_file="examples/common/initial_parameters.yaml")
1 change: 0 additions & 1 deletion examples/07-map/custom_reducer.py
@@ -85,7 +85,6 @@ def iterable_branch(execute: bool = True):

     pipeline = Pipeline(
         steps=[process_chunk_task_python, process_chunk_task_notebook, process_chunk_task_shell, read_chunk],
-        add_terminal_nodes=True,
     )

     if execute:
1 change: 0 additions & 1 deletion examples/07-map/map.py
@@ -88,7 +88,6 @@ def iterable_branch(execute: bool = True):

     pipeline = Pipeline(
         steps=[process_chunk_task_python, process_chunk_task_notebook, process_chunk_task_shell, read_chunk],
-        add_terminal_nodes=True,
     )

     if execute:
1 change: 0 additions & 1 deletion examples/comparisions/README.md

This file was deleted.
