ci(lint): Add step to check if docstrings match current implementation in lint.sh + update docstrings TASK: IL-296
Aleph-Alpha · Mar 27, 2024 · 6326ed4
1 parent 2bf4d44
commit 6326ed4
Showing 9 changed files with 36 additions and 25 deletions.
diff --git a/.darglint2 b/.darglint2
@@ -1,3 +1,3 @@
 [darglint2]
-ignore=DAR003,DAR201,DAR301,DAR401
+ignore=DAR003,DAR201,DAR202,DAR301,DAR401
 docstring_style=google
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -1,6 +1,6 @@
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v3.4.0
+    rev: v4.5.0
     hooks:
       - id: check-json
         exclude: trace-viewer/
@@ -21,22 +21,26 @@ repos:
         args: ["--profile", "black", "--filter-files"]
         verbose: true
   - repo: https://github.com/psf/black
-    rev: 24.2.0
+    rev: 24.3.0
     hooks:
       - id: black
   # https://black.readthedocs.io/en/stable/integrations/source_version_control.html#version-control-integration
   - repo: https://github.com/psf/black-pre-commit-mirror
-    rev: 24.2.0
+    rev: 24.3.0
     hooks:
       - id: black-jupyter
   - repo: https://github.com/kynan/nbstripout
-    rev: 0.4.0
+    rev: 0.7.1
     hooks:
       - id: nbstripout
         files: ".ipynb"
   - repo: https://github.com/codespell-project/codespell
-    rev: v2.2.4
+    rev: v2.2.6
     hooks:
       - id: codespell
-        args: ["-L", "newyorker,te,responde,ist,als,oder,technik,sie,rouge,unter,juli,fiel,couldn,mke"]
+        args: ["-L", "newyorker,te,responde,ist,als,oder,technik,sie,rouge,unter,juli,fiel,couldn,mke, vor"]
         exclude: '^(poetry\.lock|trace-viewer/.*|tests/connectors/retrievers/test_document_index_retriever\.py|src/intelligence_layer/use_cases/qa/multiple_chunk_qa.py|src/intelligence_layer/use_cases/summarize/.*|tests/connectors/retrievers/test_document_index_retriever\.py|src/intelligence_layer/use_cases/classify/keyword_extract.py|tests/use_cases/summarize/test_single_chunk_few_shot_summarize.py|tests/use_cases/summarize/very_long_text.txt)$'
+  - repo: https://github.com/akaihola/darglint2
+    rev: v1.8.2
+    hooks:
+      - id: darglint2
diff --git a/Concepts.md b/Concepts.md
@@ -153,7 +153,7 @@ The Intelligence Layer supports different kinds of evaluation techniques. Most i
   a single output, but it is easier to compare two different outputs and decide which one is better. An example
   use case could be summarization.
 
-To support these techniques the Intelligence Layer differantiates between 3 consecutive steps:
+To support these techniques the Intelligence Layer differentiates between 3 consecutive steps:
 
 1. Run a task by feeding it all inputs of a dataset and collecting all outputs
 2. Evaluate the outputs of one or several
@@ -197,7 +197,7 @@ There are the following Repositories:
   and makes them available to the `Aggregator`.
 - The `AggregationRepository` stores the `AggregationOverview` containing the aggregated metrics on request of the `Aggregator`.
 
-The following diagramms illustrate how the different concepts play together in case of the different types of evaluations.
+The following diagrams illustrate how the different concepts play together in case of the different types of evaluations.
 
 <figure>
 <img src="./assets/AbsoluteEvaluation.drawio.svg">

diff --git a/pyproject.toml b/pyproject.toml
@@ -72,6 +72,8 @@ filterwarnings = [
 skip = "*/__init__.py,.venv/*,*/node_modules/*"
 ignore = "E501,E203"
 
+[tool.darglint2]
+
 
 [tool.pylama.linter.mccabe]
 max-complexity = "11"
diff --git a/scripts/lint.sh b/scripts/lint.sh
@@ -3,4 +3,3 @@
 poetry run pre-commit run --all-files
 poetry run mypy
 poetry run pylama
-poetry run darglint2 -v2 src
diff --git a/src/examples/performance_tips.ipynb b/src/examples/performance_tips.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "d8767b2a",
+   "id": "0",
    "metadata": {},
    "source": [
     "# How to get more done in less time\n",
@@ -14,7 +14,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "e04cb25b",
+   "id": "1",
    "metadata": {},
    "source": [
     "## A single long running task\n",
@@ -28,7 +28,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "e7fbae35",
+   "id": "2",
    "metadata": {},
    "source": [
     "## Running one task multiple times\n",
@@ -40,7 +40,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "04dac517",
+   "id": "3",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -71,7 +71,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f58f359a",
+   "id": "4",
    "metadata": {},
    "source": [
     "## Running several tasks at the same time\n",
@@ -82,7 +82,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "8959fcec-dc54-4137-9cb8-3a9c70d6a3d0",
+   "id": "5",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -104,7 +104,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4e846c9c",
+   "id": "6",
    "metadata": {},
    "source": [
     "<a id='submit_example'></a>\n",
@@ -115,7 +115,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "6c88c3a2",
+   "id": "7",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -131,7 +131,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "345244a1",
+   "id": "8",
    "metadata": {},
    "source": [
     "`ThreadPool` can easily be used via the function `.map`. This processes a list of jobs in order and outputs the results once all jobs are done.  \n",
@@ -142,7 +142,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "6b71469e",
+   "id": "9",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -158,7 +158,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "a786e543",
+   "id": "10",
    "metadata": {},
    "source": [
     "`ThreadPool.map` can also be used with `Task.run_concurrently()` in which case the creation of the jobs becomes slightly easier."
@@ -167,7 +167,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "de3fe114",
+   "id": "11",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -184,7 +184,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4e775da7",
+   "id": "12",
    "metadata": {},
    "source": [
     "<div class=\"alert alert-warning\">\n",

diff --git a/src/examples/qa.ipynb b/src/examples/qa.ipynb
@@ -69,7 +69,6 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [

diff --git a/tests/conftest.py b/tests/conftest.py
@@ -37,7 +37,11 @@ def token() -> str:
 
 @fixture(scope="session")
 def client(token: str) -> AlephAlphaClientProtocol:
-    """Provide fixture for api."""
+    """Provide fixture for api.
+
+    Args:
+        token: AA Token
+    """
     return LimitedConcurrencyClient(Client(token), max_concurrency=10)
 
 

diff --git a/tests/core/test_echo.py b/tests/core/test_echo.py
@@ -121,6 +121,9 @@ def test_overlapping_tokens_generate_correct_tokens(echo_task: Echo) -> None:
     """This test checks if the echo task correctly tokenizes the expected completion separately
     The two tokens when tokenized together will result in a combination of the end of the first token
     and the start of the second token. This is not the expected behaviour.
+
+    Args:
+        echo_task: Fixture used for this test
     """
     token1 = "ĠGastronomie"
     token2 = "Baby"
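
Note on the check this commit wires into pre-commit: the sketch below shows, in plain Python, the kind of mismatch a docstring linter looks for, namely parameters in the signature that are missing from a Google-style `Args:` section, and documented names that are not parameters. It is not darglint2's implementation. The helper names `documented_args` and `docstring_mismatches` are hypothetical, the parser assumes entries are indented by exactly four spaces under `Args:`, and `client_before` / `client_after` are simplified stand-ins for the `client` fixture in tests/conftest.py.

```python
import inspect
import re


def documented_args(docstring: str) -> list[str]:
    """Return the names listed under a Google-style ``Args:`` section."""
    names: list[str] = []
    in_args = False
    # cleandoc strips the common indentation, so "Args:" starts at column 0.
    for line in inspect.cleandoc(docstring).splitlines():
        if line.strip() == "Args:":
            in_args = True
        elif in_args and line and not line[0].isspace():
            break  # a new top-level section ends the Args block
        elif in_args:
            # Assumption: entries sit exactly four spaces under "Args:";
            # deeper-indented lines are treated as continuation text.
            match = re.match(r" {4}(\w+)(?: \(.*\))?:", line)
            if match:
                names.append(match.group(1))
    return names


def docstring_mismatches(func) -> tuple[list[str], list[str]]:
    """Return (undocumented parameters, documented names that are not parameters)."""
    params = [
        name
        for name in inspect.signature(func).parameters
        if name not in ("self", "cls")
    ]
    documented = documented_args(func.__doc__ or "")
    missing = [name for name in params if name not in documented]
    extra = [name for name in documented if name not in params]
    return missing, extra


def client_before(token: str) -> str:
    """Provide fixture for api."""
    return token


def client_after(token: str) -> str:
    """Provide fixture for api.

    Args:
        token: AA Token
    """
    return token


print(docstring_mismatches(client_before))  # (['token'], [])
print(docstring_mismatches(client_after))   # ([], [])
```

With the one-line docstring the sketch reports `token` as undocumented; once the `Args:` section is added, as in the tests/conftest.py hunk, both lists come back empty.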