
doc: Revise and update README.md, Concepts.md and tutorial jupyter notebooks (#658)

TASK: IL-305

Co-authored-by: Florian Schepers <[email protected]>
NiklasKoehneckeAA and FlorianSchepersAA authored Mar 28, 2024
1 parent e7335c4 commit ca707b8
Showing 11 changed files with 461 additions and 421 deletions.
80 changes: 38 additions & 42 deletions Concepts.md
@@ -2,12 +2,12 @@

The main focus of the Intelligence Layer is to enable developers to

- implement their LLM use cases by building upon and composing existing functionalities
- obtain insights into the runtime behavior of their implementations
- iteratively improve their implementations or compare them to existing implementations by evaluating them against
a given set of examples

How these focus points are realized in the Intelligence Layer is described in more detail in the following sections.

## Task

@@ -18,8 +18,8 @@ transforms an input-parameter to an output like a function in mathematics.
```
Task: Input -> Output
```

In Python this is realized by an abstract class with type-parameters and the abstract method `do_run`
in which the actual transformation is implemented:

```Python
class Task(ABC, Generic[Input, Output]):
@@ -30,13 +30,13 @@ class Task(ABC, Generic[Input, Output]):
```

`Input` and `Output` are normal Python datatypes that can be serialized from and to JSON. For this the Intelligence
Layer relies on [Pydantic](https://docs.pydantic.dev/). The types that can be used are defined in form
of the type-alias `PydanticSerializable`.
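
For illustration, the `Input` and `Output` of a small example task could be defined as plain Pydantic models like the following (the names are made up for this sketch and are not part of the library):

```Python
from pydantic import BaseModel


class GreetingInput(BaseModel):
    name: str


class GreetingOutput(BaseModel):
    greeting: str
```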

The second parameter `task_span` is used for [tracing](#Trace) which is described below.

`do_run` is the method that implements a concrete task and has to be provided by the user. It is executed through
the task's external interface method `run`:

```Python
class Task(ABC, Generic[Input, Output]):
@@ -45,7 +45,7 @@ class Task(ABC, Generic[Input, Output]):
...
```

The signatures of the `do_run` and `run` methods differ only in the [tracing](#Trace) parameters.
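
Building on the illustrative models above, a minimal concrete task could then be sketched as follows (the import path of `Task` and `TaskSpan` is an assumption and may differ between versions):

```Python
from intelligence_layer.core import Task, TaskSpan


class GreetingTask(Task[GreetingInput, GreetingOutput]):
    def do_run(self, input: GreetingInput, task_span: TaskSpan) -> GreetingOutput:
        # The actual transformation; a real use case would typically prompt an LLM here.
        return GreetingOutput(greeting=f"Hello, {input.name}!")
```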

### Levels of abstraction

@@ -56,17 +56,17 @@ with an LLM on a very generic or even technical level.

Examples for higher level tasks (Use Cases) are:

- Answering a question based on a given document: `QA: (Document, Question) -> Answer`
- Generate a summary of a given document: `Summary: Document -> Summary`

Examples for lower level tasks are:

- Let the model generate text based on an instruction and some context: `Instruct: (Context, Instruction) -> Completion`
- Chunk a text into smaller pieces at optimized boundaries (typically to make it fit into an LLM's context-size): `Chunk: Text -> [Chunk]`

### Composability

Typically you would build higher level tasks from lower level tasks. Given a task you can draw a dependency graph
that illustrates which sub-tasks it is using and in turn which sub-tasks they are using. This graph typically forms a hierarchy or
more generally, a directed acyclic graph. The following drawing shows this graph for the Intelligence Layer's `RecursiveSummarize`
task:
@@ -76,8 +76,8 @@

### Trace

A task implements a workflow. It processes its input, passes it on to sub-tasks, processes the outputs of the sub-tasks
and builds its own output. This workflow can be represented in a trace. For this a task's `run` method takes a `Tracer`
that takes care of storing details on the steps of this workflow like the tasks that have been invoked along with their
input, output and timing information. The following illustration shows the trace of a MultiChunkQa-task:

@@ -86,9 +86,9 @@
To represent this, tracing defines the following concepts:

- A `Tracer` is passed to a task's `run` method and provides methods for opening `Span`s or `TaskSpan`s.
- A `Span` is a `Tracer` and allows for grouping multiple logs and runtime durations together as a single, logical step in the
workflow.
- A `TaskSpan` is a `Span` that allows for grouping multiple logs together with the task's specific input and output.
An opened `TaskSpan` is passed to `Task.do_run`. Since a `TaskSpan` is a `Tracer` a `do_run` implementation can pass
this instance on to `run` methods of sub-tasks.

@@ -104,7 +104,7 @@ three abstract classes `Tracer`, `Span` and `TaskSpan` needs to be implemented.
- The `NoOpTracer` can be used when tracing information shall not be stored at all.
- The `InMemoryTracer` stores all traces in an in memory data structure and is most helpful in tests or
Jupyter notebooks.
- The `FileTracer` stores all traces in a json-file.
- The `OpenTelemetryTracer` uses an OpenTelemetry
[`Tracer`](https://opentelemetry-python.readthedocs.io/en/latest/api/trace.html#opentelemetry.trace.Tracer)
to store the traces in an OpenTelemetry backend.
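
For illustration, running the hypothetical `GreetingTask` from above with different tracers could look roughly like this (import paths assumed):

```Python
from intelligence_layer.core import InMemoryTracer, NoOpTracer

# Store the trace in memory, e.g. to inspect it in a Jupyter notebook.
tracer = InMemoryTracer()
output = GreetingTask().run(GreetingInput(name="World"), tracer)

# Discard all tracing information.
output = GreetingTask().run(GreetingInput(name="World"), NoOpTracer())
```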
@@ -127,8 +127,8 @@ The evaluation process helps to:

### Dataset

The basis of an evaluation is a set of examples for the specific task-type to be evaluated. A single `Example`
consists of:

- an instance of the `Input` for the specific task and
- optionally an _expected output_ that can be anything that makes sense in context of the specific evaluation (e.g.
@@ -139,6 +139,7 @@
To enable reproducibility of evaluations, datasets are immutable. A single dataset can be used to evaluate all
tasks of the same type, i.e. with the same `Input` and `Output` types.
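
As a sketch, a dataset for the hypothetical greeting task could be created roughly like this (the repository class and its method names are assumptions; the repositories themselves are described further below):

```Python
from intelligence_layer.evaluation import Example, InMemoryDatasetRepository

examples = [
    Example(input=GreetingInput(name="Alice"), expected_output="Hello, Alice!"),
    Example(input=GreetingInput(name="Bob"), expected_output="Hello, Bob!"),
]

dataset_repository = InMemoryDatasetRepository()
dataset = dataset_repository.create_dataset(examples=examples, dataset_name="greetings")
```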


### Evaluation Process

The Intelligence Layer supports different kinds of evaluation techniques. Most important are:
@@ -148,18 +149,16 @@
case the aggregated result could contain metrics like accuracy which can easily be compared with other
aggregated results.
- Comparing the individual outputs of different runs (all based on the same dataset)
in a single evaluation process and produce a ranking of all runs as an aggregated result. This technique is useful when it is hard to come up with an absolute metric to evaluate
a single output, but it is easier to compare two different outputs and decide which one is better. An example
use case could be summarization.

To support these techniques the Intelligence Layer differentiates between 3 consecutive steps:

1. Run a task by feeding it all inputs of a dataset and collecting all outputs
2. Evaluate the outputs of one or several runs and produce an evaluation result for each example. Typically a single run is evaluated if absolute
metrics can be computed and several runs are evaluated when the outputs of runs shall be compared.
3. Aggregate the evaluation results of one or several evaluation runs into a single object containing the aggregated
metrics. Aggregating over several evaluation runs supports amending a previous comparison result with
comparisons of new runs without the need to re-execute the previous comparisons again.

@@ -171,30 +170,27 @@ The following table shows how these three steps are represented in code:
| 2. Evaluate | `Evaluator` | `EvaluationLogic` | `EvaluationRepository` |
| 3. Aggregate | `Aggregator` | `AggregationLogic` | `AggregationRepository` |

The columns have the following meaning:

- "Executor" lists concrete implementations provided by the Intelligence Layer.
- "Custom Logic" lists abstract classes that need to be implemented with the custom logic.
- "Repository" lists abstract classes for storing intermediate results. The Intelligence Layer provides
different implementations for these. See the next section for details.

### Data Storage

During an evaluation process a lot of intermediate data is created before the final aggregated result can be produced.
To avoid repeating expensive computations when new results are to be produced based on previous ones,
all intermediate results are persisted. For this the different executor-classes make use of repositories.

There are the following Repositories:

- The `DatasetRepository` offers methods to manage datasets. The `Runner` uses it to read all `Example`s of a dataset and feeds them to the `Task`.
- The `RunRepository` is responsible for storing a task's output (in form of an `ExampleOutput`) for each `Example` of a dataset
which are created when a `Runner`
runs a task using this dataset. At the end of a run a `RunOverview` is stored containing some metadata concerning the run.
The `Evaluator` reads these outputs given a list of runs it should evaluate to create an evaluation
result for each `Example` of the dataset.
- The `EvaluationRepository` enables the `Evaluator` to store the evaluation result (in form of an `ExampleEvaluation`) for each `Example` along with an `EvaluationOverview`. The `Aggregator` uses this repository to read the evaluation results.
- The `AggregationRepository` stores the `AggregationOverview` containing the aggregated metrics on request of the `Aggregator`.

The following diagrams illustrate how the different concepts play together in case of the different types of evaluations.
@@ -210,7 +206,7 @@
`RunRepository` and the corresponding `Example` from the `DatasetRepository` and uses the `EvaluationLogic` to compute an `Evaluation`.
4. Each `Evaluation` gets wrapped in an `ExampleEvaluation` and stored in the `EvaluationRepository`.
5. The `Aggregator` reads all `ExampleEvaluation`s for a given evaluation and feeds them to the `AggregationLogic` to produce an `AggregatedEvaluation`.
6. The `AggregatedEvaluation` is wrapped in an `AggregationOverview` and stored in the `AggregationRepository`.
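
As a rough sketch, the absolute-evaluation flow above could be wired up as follows, reusing the hypothetical `GreetingTask` and its models from the earlier sketches. All class and method names below (the in-memory repositories, `Runner`, `Evaluator`, `Aggregator` and the logic base classes) follow the concepts described in this document, but the exact signatures are assumptions and may differ between versions:

```Python
from collections.abc import Iterable

from pydantic import BaseModel

from intelligence_layer.evaluation import (
    AggregationLogic,
    Aggregator,
    Evaluator,
    Example,
    InMemoryAggregationRepository,
    InMemoryDatasetRepository,
    InMemoryEvaluationRepository,
    InMemoryRunRepository,
    Runner,
    SingleOutputEvaluationLogic,
)


class GreetingEvaluation(BaseModel):
    correct: bool


class GreetingAggregation(BaseModel):
    accuracy: float


class GreetingEvaluationLogic(
    SingleOutputEvaluationLogic[GreetingInput, GreetingOutput, str, GreetingEvaluation]
):
    def do_evaluate_single_output(self, example, output) -> GreetingEvaluation:
        # Compare the task's output against the expected output of the Example.
        return GreetingEvaluation(correct=output.greeting == example.expected_output)


class GreetingAggregationLogic(AggregationLogic[GreetingEvaluation, GreetingAggregation]):
    def aggregate(self, evaluations: Iterable[GreetingEvaluation]) -> GreetingAggregation:
        evaluations = list(evaluations)
        correct = sum(evaluation.correct for evaluation in evaluations)
        return GreetingAggregation(accuracy=correct / len(evaluations) if evaluations else 0.0)


dataset_repository = InMemoryDatasetRepository()
run_repository = InMemoryRunRepository()
evaluation_repository = InMemoryEvaluationRepository()
aggregation_repository = InMemoryAggregationRepository()

examples = [Example(input=GreetingInput(name="Alice"), expected_output="Hello, Alice!")]
dataset = dataset_repository.create_dataset(examples=examples, dataset_name="greetings")

# 1./2. Run the task on all examples of the dataset and store the outputs.
runner = Runner(GreetingTask(), dataset_repository, run_repository, "greeting-run")
run_overview = runner.run_dataset(dataset.id)

# 3./4. Evaluate the stored outputs example by example.
evaluator = Evaluator(
    dataset_repository,
    run_repository,
    evaluation_repository,
    "greeting-eval",
    GreetingEvaluationLogic(),
)
evaluation_overview = evaluator.evaluate_runs(run_overview.id)

# 5./6. Aggregate the individual evaluations into a single overview.
aggregator = Aggregator(
    evaluation_repository,
    aggregation_repository,
    "greeting-aggregation",
    GreetingAggregationLogic(),
)
aggregation_overview = aggregator.aggregate_evaluation(evaluation_overview.id)
```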

The next diagram illustrates the more complex case of a relative evaluation.

@@ -219,13 +215,13 @@
<figcaption>Process of a relative Evaluation</figcaption>
</figure>

1. Multiple `Runner`s read the same dataset and produce the corresponding `Output`s for different `Task`s.
2. For each run all `Output`s are stored in the `RunRepository`.
3. The `Evaluator` gets as input previous evaluations (that were produced on the basis of the same dataset, but by different `Task`s) and the new runs of the current task.
4. Given the previous evaluations and the new runs the `Evaluator` can read the `ExampleOutput`s of both the new runs
and the runs associated to previous evaluations, collect all that belong to a single `Example` and pass them
along with the `Example` to the `EvaluationLogic` to compute an `Evaluation`.
5. Each `Evaluation` gets wrapped in an `ExampleEvaluation` and is stored in the `EvaluationRepository`.
6. The `Aggregator` reads all `ExampleEvaluation`s from all involved evaluations
   and feeds them to the `AggregationLogic` to produce an `AggregatedEvaluation`.
7. The `AggregatedEvaluation` is wrapped in an `AggregationOverview` and stored in the `AggregationRepository`.
16 changes: 8 additions & 8 deletions README.md
@@ -46,12 +46,12 @@ The environment can be activated via `poetry shell`. See the official poetry doc

### Getting started with the Jupyter Notebooks

After running the local installation steps, you can set whether you are using the Aleph-Alpha API or an on-prem setup via environment variables.

---
**Using the Aleph-Alpha API** \
\
In the Intelligence Layer the Aleph-Alpha API (`https://api.aleph-alpha.com`) is set as the default host URL. However, you will need an [Aleph Alpha access token](https://docs.aleph-alpha.com/docs/account/#create-a-new-token) to run the examples.
Set your access token with

```bash
export AA_TOKEN=<YOUR TOKEN HERE>

**Using an on-prem setup** \
\
In case you want to use an on-prem endpoint you will have to change the host URL by setting the `CLIENT_URL` environment variable:

```bash
export CLIENT_URL=<YOUR_ENDPOINT_URL_HERE>
```

The program will warn you if no `CLIENT_URL` is set explicitly.

---
After correctly setting up the environment variables you can run the jupyter notebooks.
@@ -188,10 +188,10 @@ Not sure where to start? Familiarize yourself with the Intelligence Layer using
If you prefer you can also read about the [concepts](Concepts.md) first.
## Tutorials
The tutorials aim to guide you through implementing several common use-cases with the Intelligence Layer. They introduce you to key concepts and enable you to create your own use-cases. In general the tutorials are built in a way that you can simply hop into the topic you are most interested in. However, for starters we recommend reading through the `Summarization` tutorial first. It explains the core concepts of the Intelligence Layer in more depth, while for the other tutorials we assume that these concepts are known.

| Order | Topic              | Description                                            | Notebook 📓                                                       |
| ----- | ------------------ |------------------------------------------------------|-----------------------------------------------------------------|
| 1 | Summarization | Summarize a document | [summarization.ipynb](./src/examples/summarization.ipynb) |
| 2 | Question Answering | Various approaches for QA | [qa.ipynb](./src/examples/qa.ipynb) |
| 3 | Classification | Learn about two methods of classification | [classification.ipynb](./src/examples/classification.ipynb) |
@@ -200,7 +200,7 @@
| 6 | Document Index | Connect your proprietary knowledge base | [document_index.ipynb](./src/examples/document_index.ipynb) |
| 7 | Human Evaluation | Connect to Argilla for manual evaluation | [human_evaluation.ipynb](./src/examples/human_evaluation.ipynb) |
| 8 | Performance tips | Contains some small tips for performance | [performance_tips.ipynb](./src/examples/performance_tips.ipynb) |
| 9     | Deployment         | Shows how to deploy a Task in a minimal FastAPI app.  | [fastapi_tutorial.md](./src/examples/fastapi_tutorial.md)        |

## How-Tos
The how-tos are quick lookups about how to do things. Compared to the tutorials, they are shorter and do not explain the concepts they are using in-depth.
