Add videos to README (#194)
* update

* update

* Update README.md

* update

* update

* update

* update

* update

* update

* Update README.md

* update
nanne-aben authored Oct 4, 2023
1 parent 73c0513 commit a97036e
Showing 8 changed files with 767 additions and 73 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build.yml
Expand Up @@ -48,4 +48,4 @@ jobs:
coverage report -m --fail-under 100
- name: Run notebooks
run: |
for FILE in docs/source/*.ipynb; do papermill $FILE output.json -k python3; done
for FILE in docs/*/*.ipynb; do papermill $FILE output.json -k python3; done
75 changes: 75 additions & 0 deletions README.md
@@ -0,0 +1,75 @@
# Typedspark: column-wise type annotations for pyspark DataFrames

We love Spark! But in production code we're wary when we see:

```python
from pyspark.sql import DataFrame

def foo(df: DataFrame) -> DataFrame:
    # do stuff
    return df
```

Because… How do we know which columns are supposed to be in ``df``?

Using ``typedspark``, we can be more explicit about what these data should look like.

```python
from typedspark import Column, DataSet, Schema
from pyspark.sql.types import LongType, StringType

class Person(Schema):
    id: Column[LongType]
    name: Column[StringType]
    age: Column[LongType]

def foo(df: DataSet[Person]) -> DataSet[Person]:
    # do stuff
    return df
```
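
A plain ``DataFrame`` can be cast to a typed ``DataSet`` at runtime. The sketch below is illustrative rather than canonical: it continues from the snippet above, assumes a running ``SparkSession``, and uses the ``DataSet[Person](df)`` cast from typedspark's documented examples, which validates the columns against the schema.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# An untyped DataFrame with the columns that Person expects.
df = spark.createDataFrame(
    [(1, "Alice", 30)], schema="id long, name string, age long"
)

# Cast to DataSet[Person]; if a column is missing or has the wrong
# type, typedspark raises an error here rather than deep inside foo().
persons = DataSet[Person](df)
foo(persons)
```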
The advantages include:

* Improved readability of the code
* Typechecking, both during runtime and linting
* Auto-complete of column names
* Easy refactoring of column names
* Easier unit testing through the generation of empty ``DataSets`` based on their schemas (see the sketch after this list)
* Improved documentation of tables
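
As a sketch of the unit-testing point above: typedspark ships a ``create_empty_dataset`` helper that builds a ``DataSet`` straight from a schema. The exact signature is an assumption here (check the docs); the idea is:

```python
from pyspark.sql import SparkSession
from typedspark import create_empty_dataset

spark = SparkSession.builder.getOrCreate()

# Build a DataSet[Person] fixture directly from the schema --
# no hand-written test data required. (Signature assumed; see docs.)
empty_persons = create_empty_dataset(spark, Person)
empty_persons.show()
```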

## Demo videos

### IDE demo

https://github.com/kaiko-ai/typedspark/assets/47976799/e6f7fa9c-6d14-4f68-baba-fe3c22f75b67

You can find the corresponding code [here](docs/videos/ide.ipynb).

### Jupyter / Databricks notebooks demo

https://github.com/kaiko-ai/typedspark/assets/47976799/39e157c3-6db0-436a-9e72-44b2062df808

You can find the corresponding code [here](docs/videos/notebook.ipynb).

## Installation

You can install ``typedspark`` from [pypi](https://pypi.org/project/typedspark/) by running:

```bash
pip install typedspark
```
By default, ``typedspark`` does not list ``pyspark`` as a dependency, since many platforms (e.g. Databricks) come with ``pyspark`` preinstalled. If you want to install ``typedspark`` with ``pyspark``, you can run:

```bash
pip install "typedspark[pyspark]"
```

## Documentation
Please see our documentation on [readthedocs](https://typedspark.readthedocs.io/en/latest/index.html).

## FAQ

**I found a bug! What should I do?**<br/>
Great! Please make an issue and we'll look into it.

**I have a great idea to improve typedspark! How can we make this work?**<br/>
Awesome, please make an issue and let us know!
69 changes: 0 additions & 69 deletions README.rst

This file was deleted.

2 changes: 1 addition & 1 deletion docs/run_notebooks.sh
@@ -1,4 +1,4 @@
for FILE in docs/source/*.ipynb; do
for FILE in docs/*/*.ipynb; do
papermill $FILE $FILE;
python docs/remove_metadata.py $FILE;
done
70 changes: 69 additions & 1 deletion docs/source/README.rst
@@ -1 +1,69 @@
.. include:: ../../README.rst
===============================================================
Typedspark: column-wise type annotations for pyspark DataFrames
===============================================================

We love Spark! But in production code we're wary when we see:

.. code-block:: python

    from pyspark.sql import DataFrame

    def foo(df: DataFrame) -> DataFrame:
        # do stuff
        return df

Because… How do we know which columns are supposed to be in ``df``?

Using ``typedspark``, we can be more explicit about what these data should look like.

.. code-block:: python

    from typedspark import Column, DataSet, Schema
    from pyspark.sql.types import LongType, StringType

    class Person(Schema):
        id: Column[LongType]
        name: Column[StringType]
        age: Column[LongType]

    def foo(df: DataSet[Person]) -> DataSet[Person]:
        # do stuff
        return df

The advantages include:

* Improved readability of the code
* Typechecking, both during runtime and linting
* Auto-complete of column names
* Easy refactoring of column names
* Easier unit testing through the generation of empty ``DataSets`` based on their schemas
* Improved documentation of tables

Installation
============

You can install ``typedspark`` from `pypi <https://pypi.org/project/typedspark/>`_ by running:

.. code-block:: bash

    pip install typedspark

By default, ``typedspark`` does not list ``pyspark`` as a dependency, since many platforms (e.g. Databricks) come with ``pyspark`` preinstalled. If you want to install ``typedspark`` with ``pyspark``, you can run:

.. code-block:: bash

    pip install "typedspark[pyspark]"

Documentation
=============

Please see our documentation on `readthedocs <https://typedspark.readthedocs.io/en/latest/index.html>`_.

FAQ
===

| **I found a bug! What should I do?**
| Great! Please make an issue and we'll look into it.
|
| **I have a great idea to improve typedspark! How can we make this work?**
| Awesome, please make an issue and let us know!