Merge pull request #157 from AstraZeneca/improved_docs

Improved docs
AstraZeneca · Jun 11, 2024 · 54c3217
2 parents c37f168 + d13394d
commit 54c3217
Showing 24 changed files with 2,699 additions and 1,456 deletions.
diff --git a/README.md b/README.md
diff --git a/docs/assets/work_dark.png b/docs/assets/work_dark.png
diff --git a/docs/assets/work_light.png b/docs/assets/work_light.png
diff --git a/docs/concepts/map.md b/docs/concepts/map.md
@@ -12,21 +12,21 @@ to run on every hyper parameter.
     success([Success]):::green
 
     subgraph one[Parameter 1]
-        process_chunk1([Process Chunk]):::yellow
+        process_chunk1([Train model]):::yellow
         success_chunk1([Success]):::yellow
 
         process_chunk1 --> success_chunk1
     end
 
     subgraph two[Parameter ...]
-        process_chunk2([Process Chunk]):::yellow
+        process_chunk2([Train model]):::yellow
         success_chunk2([Success]):::yellow
 
         process_chunk2 --> success_chunk2
     end
 
     subgraph three[Parameter n]
-        process_chunk3([Process Chunk]):::yellow
+        process_chunk3([Train model]):::yellow
         success_chunk3([Success]):::yellow
 
         process_chunk3 --> success_chunk3

diff --git a/docs/configurations/overview.md b/docs/configurations/overview.md
@@ -1,49 +1,2 @@
-**runnable** is designed to make effective collaborations between data scientists/researchers
-and infrastructure engineers.
-
-All the features described in the [concepts](../concepts/the-big-picture.md) are
-aimed at the *research* side of data science projects while configurations add *scaling* features to them.
-
-
-Configurations are presented during the execution:
-
-For ```yaml``` based pipeline, use the ```--config-file, -c``` option in the [runnable CLI](../usage.md/#usage).
-
-For [python SDK](../sdk.md/#runnable.Pipeline.execute), use the ```configuration_file``` option or via
-environment variable ```runnable_CONFIGURATION_FILE```
-
-## Default configuration
-
-```yaml
---8<-- "examples/configs/default.yaml"
-```
-
-1. Execute the pipeline in the local compute environment.
-2. The run log is not persisted but present in-memory and flushed at the end of execution.
-3. No catalog functionality, all catalog operations are effectively no-op.
-4. No secrets functionality, all secrets are effectively no-op.
-5. No experiment tracking tools, all interactions with experiment tracking tools are effectively no-op.
-Run log still captures the metrics, but are not passed to the experiment tracking tools.
-
-The default configuration for all the pipeline executions runs on the
-[local compute](executors/local.md), using a
-[buffered run log](run-log.md/#buffered) store with
-[no catalog](catalog.md/#do-nothing) or
-[secrets](secrets.md/#do-nothing) or
-[experiment tracking functionality](experiment-tracking.md/).
-
-
-
-## Format
-
-The configuration file is in yaml format and the typical structure is:
-
-```yaml
-service:
-  type: service provider
-  config:
-    ...
-```
-
-where service is one of ```executor```, ```catalog```, ```experiment_tracker```,
- ```secrets``` or ```run_log_store```.
+**runnable** is designed to enable pipeline execution in varied computational environments without changing the
+infrastructure patterns.
diff --git a/docs/index.md b/docs/index.md
@@ -79,11 +79,41 @@ The difference between native driver and runnable orchestration:
 - [x] The pipeline is `runnable` in any environment.
 
 
-## But why runnable?
+## why runnable?
 
 Obviously, there are a lot of orchestration tools. A well maintained and curated [list is
 available here](https://github.com/EthicalML/awesome-production-machine-learning/).
 
+Broadly, they can be classified as ```native``` or ```meta``` orchestrators.
+
+<figure markdown>
+  ![Image title](assets/work_light.png#only-light){ width="600" height="300"}
+  ![Image title](assets/work_dark.png#only-dark){ width="600" height="300"}
+</figure>
+
+
+### __native orchestrators__
+
+- Focus on resource management, job scheduling, robustness and scalability.
+- Have fewer features for domain (data engineering, data science) activities.
+- Difficult to run locally.
+- Not ideal for quick experimentation or research activities.
+
+### __meta orchestrators__
+
+- An abstraction over native orchestrators.
+- Oriented towards domain (data engineering, data science) features.
+- Easy to get started and run locally.
+- Ideal for quick experimentation or research activities.
+
+```runnable``` is a _meta_ orchestrator with a simple API, geared towards data engineering and data science activities.
+It works in conjunction with _native_ orchestrators and is an alternative to [kedro](https://docs.kedro.org/en/stable/index.html)
+or [metaflow](https://metaflow.org/).
+
+
+
+
+
 ```runnable``` stands out based on these design principles.
 
 <div class="grid cards" markdown>

diff --git a/docs/reference.md b/docs/reference.md
@@ -1,3 +1,8 @@
+Please read this reference alongside the ```examples``` from
+[the repo](https://github.com/AstraZeneca/runnable-core).
+
+
+
 ## PythonTask
 
 === "sdk"
@@ -75,18 +80,40 @@
 <hr style="border:2px dotted orange">
 
 
-## Catalog
+## ShellTask
 
 === "sdk"
 
-    ::: runnable.Catalog
+    ::: runnable.ShellTask
         options:
             show_root_heading: true
             show_bases: false
+            show_docstring_description: true
             heading_level: 3
 
 === "yaml"
 
+    Attributes:
+
+    - ```name```: the name of the task
+    - ```command```: the shell command to execute.
+    - ```next```: the next node to call if the command succeeds. Use ```success``` to terminate
+    the pipeline successfully or ```fail``` to terminate with failure.
+    - ```on_failure```: The next node in case of failure.
+    - ```catalog```: mapping of cataloging items
+    - ```overrides```: mapping of step overrides from global configuration.
+
+    ```yaml
+    dag:
+      steps:
+        name: <>
+          type: task
+          command: <>
+          next: <>
+          on_failure: <>
+          catalog: # Any cataloging to be done.
+          overrides: # mapping of overrides of global configuration
+    ```
 
 
 <hr style="border:2px dotted orange">
@@ -108,16 +135,14 @@
 <hr style="border:2px dotted orange">
 
 
-
-## ShellTask
+## Catalog
 
 === "sdk"
 
-    ::: runnable.ShellTask
+    ::: runnable.Catalog
         options:
             show_root_heading: true
             show_bases: false
-            show_docstring_description: true
             heading_level: 3
 
 === "yaml"
@@ -128,30 +153,29 @@
 
 
 
-
-## Parallel
-
+## Pipeline
 
 === "sdk"
 
-    ::: runnable.Parallel
+    ::: runnable.Pipeline
         options:
             show_root_heading: true
             show_bases: false
             show_docstring_description: true
             heading_level: 3
+            members:
+              - execute
 
 === "yaml"
 
 
 
-<hr style="border:2px dotted orange">
+## Parallel
 
-## Map
 
 === "sdk"
 
-    ::: runnable.Map
+    ::: runnable.Parallel
         options:
             show_root_heading: true
             show_bases: false
@@ -160,35 +184,21 @@
 
 === "yaml"
 
-<hr style="border:2px dotted orange">
-
-
-
-::: runnable.Success
-    options:
-        show_root_heading: true
-        show_bases: false
-        show_docstring_description: true
-
-<hr style="border:2px dotted orange">
 
-::: runnable.Fail
-    options:
-        show_root_heading: true
-        show_bases: false
-        show_docstring_description: true
 
 <hr style="border:2px dotted orange">
 
-## Pipeline
+## Map
 
 === "sdk"
 
-    ::: runnable.Pipeline
+    ::: runnable.Map
         options:
             show_root_heading: true
             show_bases: false
             show_docstring_description: true
             heading_level: 3
 
 === "yaml"
+
+<hr style="border:2px dotted orange">
diff --git a/examples/01-tasks/stub.py b/examples/01-tasks/stub.py
@@ -29,10 +29,7 @@ def main():
 
     step3 = Stub(name="step3", terminate_with_success=True)
 
-    pipeline = Pipeline(
-        steps=[step1, step2, step3],
-        add_terminal_nodes=True,
-    )
+    pipeline = Pipeline(steps=[step1, step2, step3])
 
     pipeline.execute()
 

diff --git a/examples/02-sequential/on_failure_fail.py b/examples/02-sequential/on_failure_fail.py
@@ -31,7 +31,7 @@ def main():
     step_1.on_failure = step_4.name
 
     pipeline = Pipeline(
-        steps=[step_1, step_2, step_3, [step_4]],
+        steps=[step_1, step_2, step_3],
     )
     pipeline.execute()
 

diff --git a/examples/03-parameters/static_parameters_python.py b/examples/03-parameters/static_parameters_python.py
@@ -64,7 +64,6 @@ def read_initial_params_as_json(
 
     pipeline = Pipeline(
         steps=[read_params_as_pydantic, read_params_as_json],
-        add_terminal_nodes=True,
     )
 
     _ = pipeline.execute(parameters_file="examples/common/initial_parameters.yaml")

diff --git a/examples/07-map/custom_reducer.py b/examples/07-map/custom_reducer.py
@@ -85,7 +85,6 @@ def iterable_branch(execute: bool = True):
 
     pipeline = Pipeline(
         steps=[process_chunk_task_python, process_chunk_task_notebook, process_chunk_task_shell, read_chunk],
-        add_terminal_nodes=True,
     )
 
     if execute:

diff --git a/examples/07-map/map.py b/examples/07-map/map.py
@@ -88,7 +88,6 @@ def iterable_branch(execute: bool = True):
 
     pipeline = Pipeline(
         steps=[process_chunk_task_python, process_chunk_task_notebook, process_chunk_task_shell, read_chunk],
-        add_terminal_nodes=True,
     )
 
     if execute:

diff --git a/examples/comparisions/README.md b/examples/comparisions/README.md