Bring documentation up to date #906

Open: wants to merge 6 commits into base: master
14 changes: 8 additions & 6 deletions .readthedocs.yaml
@@ -6,13 +6,15 @@ sphinx:
formats: []

build:
  image: testing # For Python 3.9
  # Not needed yet because of conf.py hack:
  # apt_packages:
  #   - libgdal20
  #   - libproj13
  os: ubuntu-24.04
  apt_packages:
    # To allow running autodoc
    - libgdal20
    - libproj13
  tools:
    python: "3.11"

python:
  version: 3.9
  install:
    - requirements: src/requirements.txt
    - requirements: dev-docs/requirements.txt
9 changes: 3 additions & 6 deletions dev-docs/requirements.txt
@@ -1,6 +1,3 @@
Sphinx >= 5.3.0
sphinx-rtd-theme >= 1.1.1
sphinxcontrib-django == 0.5.1

# And the rest of the project to handle autodoc
-r ../src/requirements.txt
Sphinx >= 8.1.3
sphinx-rtd-theme >= 3.0.1
sphinxcontrib-django == 2.5
@@ -1,7 +1,7 @@
schematools.datasetcollection module
dso\_api.dynamic\_api.filters module
====================================

.. automodule:: schematools.datasetcollection
.. automodule:: dso_api.dynamic_api.filters
   :members:
   :undoc-members:
   :show-inheritance:
7 changes: 0 additions & 7 deletions dev-docs/source/api/dso_api.dynamic_api.filterset.rst

This file was deleted.

2 changes: 1 addition & 1 deletion dev-docs/source/api/dso_api.dynamic_api.rst
@@ -9,7 +9,7 @@ dso\_api.dynamic\_api package
.. toctree::
   :maxdepth: 1

   dso_api.dynamic_api.filterset
   dso_api.dynamic_api.filters
   dso_api.dynamic_api.openapi
   dso_api.dynamic_api.permissions
   dso_api.dynamic_api.remote
38 changes: 38 additions & 0 deletions dev-docs/source/api/dso_api.dynamic_api.views.rst
@@ -3,6 +3,10 @@ dso\_api.dynamic\_api.views package

.. automodule:: dso_api.dynamic_api.views

.. autoclass:: dso_api.dynamic_api.views.APIIndexView
   :show-inheritance:


REST API
--------

@@ -32,3 +36,37 @@ WFS API
.. autoclass:: dso_api.dynamic_api.views.wfs.AuthenticatedFeatureType
   :show-inheritance:
   :members:


MVT API
-------

.. automodule:: dso_api.dynamic_api.views.mvt

.. autoclass:: dso_api.dynamic_api.views.DatasetMVTIndexView
   :show-inheritance:

.. autoclass:: dso_api.dynamic_api.views.DatasetMVTSingleView
   :show-inheritance:
   :members:
   :undoc-members:

.. autoclass:: dso_api.dynamic_api.views.DatasetMVTView
   :show-inheritance:
   :members:


Documentation
-------------

.. automodule:: dso_api.dynamic_api.views.doc

.. autoclass:: dso_api.dynamic_api.views.DocsOverview
   :show-inheritance:
   :members:

.. autoclass:: dso_api.dynamic_api.views.DatasetDocView
   :show-inheritance:

.. autoclass:: dso_api.dynamic_api.views.DatasetWFSDocView
   :show-inheritance:
7 changes: 7 additions & 0 deletions dev-docs/source/api/rest_framework_dso.embedding.rst
@@ -0,0 +1,7 @@
rest\_framework\_dso.embedding module
=====================================

.. automodule:: rest_framework_dso.embedding
   :members:
   :undoc-members:
   :show-inheritance:
16 changes: 0 additions & 16 deletions dev-docs/source/api/rest_framework_dso.filters.rst

This file was deleted.

7 changes: 7 additions & 0 deletions dev-docs/source/api/rest_framework_dso.iterators.rst
@@ -0,0 +1,7 @@
rest\_framework\_dso.iterators module
=====================================

.. automodule:: rest_framework_dso.iterators
   :members:
   :undoc-members:
   :show-inheritance:
3 changes: 2 additions & 1 deletion dev-docs/source/api/rest_framework_dso.rst
@@ -5,9 +5,10 @@ rest\_framework\_dso package
   :maxdepth: 1

   rest_framework_dso.crs
   rest_framework_dso.embedding
   rest_framework_dso.exceptions
   rest_framework_dso.fields
   rest_framework_dso.filters
   rest_framework_dso.iterators
   rest_framework_dso.openapi
   rest_framework_dso.pagination
   rest_framework_dso.parsers
6 changes: 6 additions & 0 deletions dev-docs/source/api/schematools.permissions.rst
@@ -0,0 +1,6 @@
schematools.permissions module
==============================

.. automodule:: schematools.permissions
   :members:
   :undoc-members:
2 changes: 1 addition & 1 deletion dev-docs/source/api/schematools.rst
@@ -9,9 +9,9 @@ schematools package
.. toctree::
   :maxdepth: 1

   schematools.datasetcollection
   schematools.factories
   schematools.loaders
   schematools.naming
   schematools.permissions
   schematools.types
   schematools.validation
28 changes: 24 additions & 4 deletions dev-docs/source/auth.rst
@@ -31,7 +31,7 @@ The schema definitions can add an ``auth`` field on various levels:
The absence of an ``auth`` field makes a resource publicly available.

At every level, the ``auth`` field contains a list of *scopes*.
The JWT token of the request must contain one of these scopes to access the resource.
The JWT token of the request must contain *at least one* of these scopes to access the resource.

When scopes are defined at the dataset, table and field level,
*all* of these levels must be satisfied to access the field.
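
The two rules can be sketched as follows: within one level, *any* listed scope is enough, while *every* level that has an ``auth`` field must be satisfied. This is a simplified illustration, not DSO-API's actual permission code; the helper names and the plain-list inputs are assumptions.

```python
from __future__ import annotations

from collections.abc import Iterable


def level_allows(token_scopes: set[str], auth: Iterable[str] | None) -> bool:
    """Check a single level (dataset, table or field).

    A missing ``auth`` field (``None``) makes the level public;
    otherwise the token needs at least one of the listed scopes.
    """
    if auth is None:
        return True
    return not token_scopes.isdisjoint(auth)


def can_access_field(
    token_scopes: set[str],
    dataset_auth: Iterable[str] | None,
    table_auth: Iterable[str] | None,
    field_auth: Iterable[str] | None,
) -> bool:
    """Hypothetical helper: all three levels must allow access."""
    return all(
        level_allows(token_scopes, auth)
        for auth in (dataset_auth, table_auth, field_auth)
    )


# The dataset requires BRK/RO, the table is open, the field needs BRK/RS or BRK/RSN.
print(can_access_field({"BRK/RO", "BRK/RSN"}, ["BRK/RO"], None, ["BRK/RS", "BRK/RSN"]))  # True
print(can_access_field({"BRK/RSN"}, ["BRK/RO"], None, ["BRK/RS", "BRK/RSN"]))  # False
```

The second call fails on the dataset level, even though the token holds a scope that the field accepts.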
@@ -44,6 +44,24 @@ the field is omitted from the response.
Sometimes it's not possible to remove a field (for example, the geometry field of a Mapbox Vector Tile).
In that case, the endpoint produces an HTTP 403 error to deny access completely.
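
A minimal sketch of this behavior, assuming the set of readable field names was already derived from the token's scopes. ``filter_record`` and ``AccessDenied`` are hypothetical names; the exception stands in for the HTTP 403 response.

```python
from __future__ import annotations


class AccessDenied(Exception):
    """Stand-in for the HTTP 403 response (hypothetical)."""


def filter_record(record: dict, readable: set[str], required: set[str]) -> dict:
    """Return only the fields the user may read.

    Fields in ``required`` can't be left out (e.g. the geometry of a vector tile),
    so missing access to any of them denies the whole request.
    """
    denied_required = required - readable
    if denied_required:
        raise AccessDenied(f"No access to required field(s): {sorted(denied_required)}")
    # Restricted optional fields are silently omitted from the response.
    return {name: value for name, value in record.items() if name in readable}


record = {"id": 1, "geometry": "POINT (121000 487000)", "owner": "Example BV"}
print(filter_record(record, readable={"id", "geometry"}, required={"geometry"}))
# {'id': 1, 'geometry': 'POINT (121000 487000)'}
```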

.. note::

   The schema validation also requires an ``authReason`` to be present when ``auth`` is used.
   Government data is expected to be public, unless there is a valid reason to restrict it.
   The ``authReason`` field forces schema authors to consider why data has to be restricted.

Restricting Querying
~~~~~~~~~~~~~~~~~~~~

Besides the ``auth`` field, the ``filterAuth`` attribute allows restricting queries on a field.
This feature can be used to make it *harder* to query for a particular field,
for example, to avoid retrieving all properties owned by a single real estate owner.

.. warning::

   If someone manages to dump the whole table, they can of course still query everything within their local copy.
   Hence, it is generally better to restrict access to a field entirely using ``auth`` instead.
   The ``filterAuth`` feature is useful for well-monitored internal data that is already protected using the ``auth`` field.
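
A rough sketch of such a check follows. It assumes that ``filterAuth`` uses the same *at least one scope* rule as ``auth``, and that filters arrive as ``field`` or ``field[lookup]`` query parameters; ``check_filter_auth`` and the dict shapes are hypothetical, not the DSO-API implementation.

```python
from __future__ import annotations


def check_filter_auth(
    query_params: dict[str, str],
    filter_auth: dict[str, list[str]],
    token_scopes: set[str],
) -> None:
    """Raise PermissionError when filtering on a field the token may not query.

    ``filter_auth`` maps a field name to its ``filterAuth`` scopes.
    """
    for param in query_params:
        field_name = param.split("[", 1)[0]  # "owner[in]" -> "owner"
        scopes = filter_auth.get(field_name)
        if scopes and token_scopes.isdisjoint(scopes):
            raise PermissionError(f"Filtering on {field_name!r} is not allowed")


# Filtering on an unrestricted field passes; filtering on "owner" needs BRK/RSN.
check_filter_auth({"id": "42"}, {"owner": ["BRK/RSN"]}, set())
check_filter_auth({"owner[in]": "a,b"}, {"owner": ["BRK/RSN"]}, {"BRK/RSN"})
```

Note that this only blocks *searching* on the field; whether the field's value is shown in the response is still decided by ``auth``.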

Profiles
~~~~~~~~

@@ -137,9 +155,10 @@ See the :doc:`wfs` documentation for more details.
When changing the authorization logic, make sure to test the WFS server endpoint too.
While most logic is shared, it's important to double-check no additional data is exposed.

.. _create-test-tokens:

Testing
-------
Creating Test Tokens
--------------------

When testing datasets with authorization from the command line
you can use the ``maketoken`` management command, which generates
@@ -150,7 +169,8 @@ This requires DSO-API to be installed in the current virtualenv
After setting the latter and getting a token with
::

    export PUB_JWKS="$(cat jwks_test.json)" # in src/
    cd src/
    export PUB_JWKS="$(cat jwks_test.json)"
    token=$(python manage.py maketoken BRK/RO BRK/RS BRK/RSN)

you can issue a curl command such as
15 changes: 1 addition & 14 deletions dev-docs/source/conf.py
@@ -15,26 +15,12 @@
from datetime import date

import django
from sphinx.ext.autodoc.mock import _MockModule

sys.path.insert(0, os.path.abspath("../../src"))
os.environ["DJANGO_DEBUG"] = "false"
os.environ["DJANGO_SETTINGS_MODULE"] = "dso_api.settings"
os.environ["SCHEMA_URL"] = "https://schemas.data.amsterdam.nl/"

# At readthedocs, GDAL is not part of the build container.
# Feature request here: https://github.com/readthedocs/readthedocs.org/issues/8160
# The workaround to use 'autodoc_mock_imports' doesn't work either, and is applied too late.
# Instead, the internal machinery of 'autodoc_mock_imports' is reused here to avoid GDAL imports.


class GDALMockModule(_MockModule):
    GDAL_VERSION = (3, 0)


sys.modules["django.contrib.gis.geos.libgeos"] = _MockModule("django.contrib.gis.geos.libgeos")
sys.modules["django.contrib.gis.gdal.libgdal"] = GDALMockModule("django.contrib.gis.gdal.libgdal")

django.setup()


@@ -95,6 +81,7 @@ class GDALMockModule(_MockModule):
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]

html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "/")

intersphinx_mapping = {
"python": ("https://docs.python.org/3/", None),
38 changes: 38 additions & 0 deletions dev-docs/source/database.rst
@@ -0,0 +1,38 @@
Database Notes
==============

Database Roles
--------------

The end-user context is passed on to the database. This helps to:

* Restrict table/field access on a database level using PostgreSQL roles.
* Show in the database logs who performed a query (by setting the application name).
* Prevent the application from accidentally querying sensitive data (by switching roles).

For every internal user, there should be a :samp:`{username}_role` present in the database.
These roles are created by our internal *dp-infra* repository.

When such a user role is not present, all ``@amsterdam.nl`` addresses fall back to an
internal ``medewerker_role``. Other accounts fall back to the ``anonymous_role``.
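
The fallback order can be summarized with a small sketch. It assumes the username is known separately from the e-mail address and that the set of existing roles is available up front; ``select_db_role`` is a hypothetical name, not the actual DSO-API code.

```python
from __future__ import annotations


def select_db_role(username: str | None, email: str | None, existing_roles: set[str]) -> str:
    """Pick the PostgreSQL role for a request (simplified sketch)."""
    if username:
        user_role = f"{username}_role"
        if user_role in existing_roles:
            return user_role  # personal role, created by dp-infra
    if email and email.lower().endswith("@amsterdam.nl"):
        return "medewerker_role"  # internal fallback
    return "anonymous_role"  # all other accounts and anonymous requests


print(select_db_role("jdoe", "jdoe@amsterdam.nl", {"jdoe_role"}))  # jdoe_role
print(select_db_role("jdoe", "jdoe@amsterdam.nl", set()))  # medewerker_role
print(select_db_role("guest", "guest@example.com", set()))  # anonymous_role
```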

Switching Roles
~~~~~~~~~~~~~~~

The application user has been granted a role that includes *all* user roles
with ``NOINHERIT``. This way, the application user can perform a ``SET ROLE`` command
to switch to the user role that belongs to the session. There is a separate role for anonymous access.

Upon ``LOGIN``, the application user is configured to switch to a role that
has sufficient permission to read metadata about datasets.

Note that after switching to a role, another ``SET ROLE`` command is still possible
because the session user doesn't change; only the current role does.
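
As an illustration, the statements issued for one request could look like the sketch below. The quoting helpers, ``session_statements`` and the choice of the e-mail address as ``application_name`` are assumptions; real code would rather compose the SQL with a driver helper such as psycopg's ``sql.Identifier``. PostgreSQL's ``RESET ROLE`` switches back to the session's own role afterwards.

```python
from __future__ import annotations


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a PostgreSQL string literal by doubling embedded single quotes.

    Assumes standard_conforming_strings = on (the PostgreSQL default).
    """
    return "'" + value.replace("'", "''") + "'"


def session_statements(role: str, user_label: str) -> list[str]:
    """SQL to run at the start of a request (hypothetical helper)."""
    return [
        f"SET ROLE {quote_ident(role)}",  # switch privileges to the user's role
        f"SET application_name = {quote_literal(user_label)}",  # shows up in the logs
    ]


for statement in session_statements("jdoe_role", "jdoe@amsterdam.nl"):
    print(statement)
# SET ROLE "jdoe_role"
# SET application_name = 'jdoe@amsterdam.nl'
```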

Installing Roles
~~~~~~~~~~~~~~~~

The ``schema permission apply`` command parses all Amsterdam Schema files
and installs the database roles and grants for all permission types.

Each user role becomes a member of these permission groups, based on the user's membership in Active Directory.
2 changes: 1 addition & 1 deletion dev-docs/source/features.rst
@@ -30,4 +30,4 @@ The following features are implemented in the application:
* Browsable API (default for browsers).
* Azure BLOB fields for large documents.
* Internal schema reload endpoint (though unused).
* Database routing to read tables for other databases (``DATABASE_SCHEMAS`` setting).
* :doc:`Database roles to have per-user permissions <database>`.
15 changes: 7 additions & 8 deletions dev-docs/source/howto/install.rst
@@ -30,14 +30,7 @@ Database Setup
--------------

DSO-API talks to a PostgreSQL instance that contains its data.
Normally, you should use the one from the dataservices-airflow project.
If you don't already have it running, do::

    git clone https://github.com/Amsterdam/dataservices-airflow
    cd dataservices-airflow
    docker-compose up dso_database

Then point to that PostgreSQL in the environment::
The database endpoint can be configured in the environment::

    DATABASE_URL=postgres://dataservices:insecure@localhost:5416/dataservices
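
The value follows the common ``postgres://user:password@host:port/database`` URL form. The sketch below only illustrates that format by splitting the example value with Python's standard library; it is not how DSO-API itself reads the setting, and ``describe_database_url`` is a hypothetical name.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict[str, object]:
    """Split a PostgreSQL connection URL into its components."""
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # an int, or None when omitted
        "database": parts.path.lstrip("/"),
    }


print(describe_database_url("postgres://dataservices:insecure@localhost:5416/dataservices"))
# {'user': 'dataservices', 'password': 'insecure', 'host': 'localhost', 'port': 5416, 'database': 'dataservices'}
```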

@@ -49,6 +42,7 @@ Then point to that PostgreSQL in the environment::
Add a ``.envrc`` to your project folder, and direnv ensures the proper
environment variables are loaded when you cd into your project directory.


Using a Virtual Machine
~~~~~~~~~~~~~~~~~~~~~~~

@@ -119,6 +113,11 @@ Then start DSO-API:

The API can now be accessed at: http://localhost:8000.

.. tip::

   If you need a token to access a view,
   it can be :ref:`generated <create-test-tokens>` using the ``manage.py maketoken`` command.


API key middleware
------------------
5 changes: 3 additions & 2 deletions dev-docs/source/index.rst
@@ -44,7 +44,7 @@ A simplified diagram of the main project dependencies:
schematools_contrib_django [label="schematools.contrib.django"]

ams [label="Amsterdam Schema", shape=note]
airflow [label="Airflow", shape=cylinder]
databricks [label="Databricks", shape=cylinder]

schematools_contrib_django -> dso_api
django -> drf
@@ -54,7 +54,7 @@ A simplified diagram of the main project dependencies:
django -> schematools_contrib_django
schematools -> schematools_contrib_django
ams -> dso_api [style=dotted, label="import"]
airflow -> dso_api [style=dotted, label="data"]
databricks -> dso_api [style=dotted, label="data"]
}

.. toctree::
@@ -66,6 +66,7 @@ A simplified diagram of the main project dependencies:
   features
   dynamic_models
   dynamic_api
   database
   remote
   temporal
   streaming
18 changes: 5 additions & 13 deletions dev-docs/source/streaming.rst
@@ -172,16 +172,8 @@ on the embedded section to avoid many repeated queries.
Prefetching Optimization
~~~~~~~~~~~~~~~~~~~~~~~~

One problem with ``QuerySet.iterator()`` is that it's incompatible with ``QuerySet.prefetch_related()``.
This happens because ``prefetch_related()`` reads over the internal results to collect all
identifiers that need to be "prefetched" with a single query.

To have the best of both worlds, the ``ChunkedQuerySetIterator`` avoids this problem by reading
the table in chunks of 1000 records. For every batch, records are prefetched and given to
the next generator. It also tracks the most recently retrieved prefetches so the next batch
likely doesn't need an extra prefetch. But even when it does,
this is still better than not having prefetching at all.

Also note that internally, Django's ``QuerySet.iterator()`` may still request 1000 records from the
database cursor at once. Hence, the ``ChunkedQuerySetIterator`` also follows this pattern
to request the exact same amount of records.
Before Django 4.1, using ``QuerySet.iterator()`` was incompatible with ``QuerySet.prefetch_related()``.
Since Django 4.1, this works when a ``chunk_size`` is passed: Django fetches the results in chunks
and performs ``prefetch_related()`` on each chunk to retrieve the related objects.

However, this built-in behavior isn't used here, because our ``ChunkedQuerySetIterator`` adds another optimization:
it tracks the most recently retrieved prefetches, so the next chunk likely doesn't need an extra prefetch.
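
The idea behind that extra optimization can be sketched in plain Python. This is not the ``ChunkedQuerySetIterator`` code: rows are plain dicts with a hypothetical ``related_id`` key, and the ``fetch_related`` callable stands in for the single prefetch query per chunk.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Iterator
from itertools import islice


def iter_chunks(rows: Iterable[dict], chunk_size: int) -> Iterator[list[dict]]:
    """Split an iterable into lists of at most ``chunk_size`` items."""
    iterator = iter(rows)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


def iter_with_prefetch(
    rows: Iterable[dict],
    fetch_related: Callable[[set[int]], dict[int, object]],
    chunk_size: int = 1000,
) -> Iterator[tuple[dict, object]]:
    """Yield each row with its related object, prefetching at most once per chunk.

    Objects fetched for the previous chunk are kept, so a chunk that refers
    to the same objects doesn't need another query.
    """
    cache: dict[int, object] = {}
    for chunk in iter_chunks(rows, chunk_size):
        wanted = {row["related_id"] for row in chunk}
        missing = wanted - set(cache)
        fetched = fetch_related(missing) if missing else {}  # skip the query if all are cached
        # Only keep what this chunk uses, so memory stays bounded.
        cache = {key: cache[key] for key in wanted if key in cache} | fetched
        for row in chunk:
            yield row, cache.get(row["related_id"])
```

For comparison, Django's built-in variant is ``queryset.prefetch_related(...).iterator(chunk_size=1000)``, which prefetches each chunk independently.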
1 change: 1 addition & 0 deletions docs