Merge branch 'master' into periodically-check-jupyterub-access
mishaschwartz committed Nov 30, 2023
2 parents 7b82aff + 02e22fb commit 2c48690
Showing 42 changed files with 370 additions and 68 deletions.
6 changes: 3 additions & 3 deletions .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 1.37.1
current_version = 1.39.1
commit = True
tag = False
tag_name = {new_version}
@@ -30,11 +30,11 @@ search = {current_version}
replace = {new_version}

[bumpversion:file:RELEASE.txt]
search = {current_version} 2023-11-03T16:43:09Z
search = {current_version} 2023-11-29T17:03:07Z
replace = {new_version} {utcnow:%Y-%m-%dT%H:%M:%SZ}

[bumpversion:part:releaseTime]
values = 2023-11-03T16:43:09Z
values = 2023-11-29T17:03:07Z

[bumpversion:file(version):birdhouse/config/canarie-api/docker_configuration.py.template]
search = 'version': '{current_version}'
67 changes: 67 additions & 0 deletions CHANGES.md
@@ -37,6 +37,73 @@
docker exec jupyterhub rm /persist/jupyterhub_cookie_secret && docker restart jupyterhub
```

[1.39.1](https://github.com/bird-house/birdhouse-deploy/tree/1.39.1) (2023-11-29)
------------------------------------------------------------------------------------------------------------------

## Changes

- Limit usernames in Magpie to match restrictions by Jupyterhub's Dockerspawner
When Jupyterhub spawns a new jupyterlab container, it escapes any non-ascii, non-digit character in the username.
This results in a username that may not match the expected username (as defined by Magpie). This mismatch results in
the container failing to spawn since expected volumes cannot be mounted to the jupyterlab container.
This fixes the issue by ensuring that JupyterHub does not convert the username that it receives from Magpie.
Note that this updates the Magpie version.
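
The mismatch described in this entry can be illustrated with a small sketch. The `escape_username` helper below is hypothetical: it only mimics the general idea of replacing every character outside ASCII letters and digits with an escape sequence, and it is not DockerSpawner's actual implementation.

```python
import string

# Characters assumed to pass through unchanged (illustrative assumption).
SAFE_CHARS = set(string.ascii_letters + string.digits)


def escape_username(name: str, escape_char: str = "-") -> str:
    """Hypothetical escaping: replace each unsafe character by escape_char + hex of its UTF-8 bytes."""
    parts = []
    for char in name:
        if char in SAFE_CHARS:
            parts.append(char)
        else:
            parts.extend(f"{escape_char}{byte:02x}" for byte in char.encode("utf-8"))
    return "".join(parts)


def is_spawn_safe(name: str) -> bool:
    """A username is safe when escaping leaves it unchanged."""
    return escape_username(name) == name


print(is_spawn_safe("alice42"))        # True
print(is_spawn_safe("alice.smith"))    # False
print(escape_username("alice.smith"))  # alice-2esmith
```

Under this assumption, a volume path built from `alice.smith` would not match a container named after `alice-2esmith`, which is the kind of mismatch that restricting usernames on the Magpie side avoids.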
[1.39.0](https://github.com/bird-house/birdhouse-deploy/tree/1.39.0) (2023-11-27)
------------------------------------------------------------------------------------------------------------------
## Changes
- Add a Magpie Webhook to create the Magpie resources corresponding to the STAC-API path elements when a `STAC-API`
`POST /collections/{collection_id}` or `POST /collections/{collection_id}/items/{item_id}` request completes.
- When creating the STAC `Item`, the `source` entry in `links` corresponding to a `THREDDS` file on the same instance
is used to define the Magpie `resource_display_name` corresponding to a file to be mapped later on
(eg: a NetCDF `birdhouse/test-data/tc_Anon[...].nc`).
- Checking that the `source` path is on the same instance is necessary because `STAC` could refer to external assets, and we do not want
to inject Magpie resources that are not part of the active instance where the hook is running.
[1.38.0](https://github.com/bird-house/birdhouse-deploy/tree/1.38.0) (2023-11-21)
------------------------------------------------------------------------------------------------------------------
## Changes
Flexible locations for data served by THREDDS. This PR adds two capabilities:
- Makes it possible to configure all aspects of the two default top-level THREDDS catalogs that have been available on Birdhouse (conventionally referred to as `Birdhouse` and `Datasets` on PAVICS). This is done by defining the following two sets of new environment variables. The `THREDDS_DATASET_` set of variables are meant to control properties of the `Datasets` catalog:
* THREDDS_DATASET_LOCATION_ON_CONTAINER
* THREDDS_DATASET_LOCATION_ON_HOST
* THREDDS_DATASET_LOCATION_NAME
* THREDDS_DATASET_URL_PATH
The `THREDDS_SERVICE_DATA_` set of variables control properties of the `Birdhouse` catalog.
* THREDDS_SERVICE_DATA_LOCATION_ON_CONTAINER
* THREDDS_SERVICE_DATA_LOCATION_ON_HOST
* THREDDS_SERVICE_DATA_LOCATION_NAME
* THREDDS_SERVICE_DATA_URL_PATH
These new variables are defined in [`thredds/default.env`](./birdhouse/config/thredds/default.env) and included in [`env.local.example`](./birdhouse/env.local.example). Their default values have been chosen to ensure the behaviours of the two catalogs remain unchanged (for reasons of backward compatibility).
- Adds the ability to define additional top-level THREDDS catalogs. This is achieved by introducing the `THREDDS_ADDITIONAL_CATALOG` variable in [`thredds/default.env`](./birdhouse/config/thredds/default.env) that can be used to inject custom XML configuration for a new catalog. This information is picked up by the THREDDS server. An example is provided in [`env.local.example`](./birdhouse/env.local.example).
[1.37.2](https://github.com/bird-house/birdhouse-deploy/tree/1.37.2) (2023-11-10)
------------------------------------------------------------------------------------------------------------------
- Fix `weaver` and `cowbird` inconsistencies for `public` WPS outputs directory handling.
Because `cowbird` needs to mount multiple directories within the user-workspace for `jupyterhub`, it needs to define
a dedicated `public/wps_outputs` sub-directory to distinguish it from other `public` files not part of WPS outputs.
However, for WPS birds, other files than WPS outputs are irrelevant, and are therefore mounted directly in their
container. The variable `PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR` was being misused in the context of `weaver`,
causing WPS output URLs for `public` context to be nested as `/wpsoutputs/weaver/public/wps_outputs/{jobID}`
instead of the intended location `/wpsoutputs/weaver/public/{jobID}`, in contrast to user-context WPS outputs
located under `/wpsoutputs/weaver/users/{userID}/{jobID}`.
Relates to [Ouranosinc/pavics-sdi#314](https://github.com/Ouranosinc/pavics-sdi/pull/314).
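
The intended locations described in this entry can be summarized with a short sketch. The helper name `weaver_output_location` is hypothetical; the path prefix and the `public` / `users/{userID}` nesting are taken from the text above.

```python
from typing import Optional

WPS_OUTPUTS_PREFIX = "/wpsoutputs/weaver"  # prefix quoted in the changelog entry
PUBLIC_SUBDIR = "public"                   # intended public sub-directory for Weaver


def weaver_output_location(job_id: str, user_id: Optional[str] = None) -> str:
    """Return the intended WPS output location for a job (hypothetical helper)."""
    if user_id is None:
        # Public context: outputs sit directly under 'public'.
        return f"{WPS_OUTPUTS_PREFIX}/{PUBLIC_SUBDIR}/{job_id}"
    # User context: outputs are nested under 'users/<user-id>'.
    return f"{WPS_OUTPUTS_PREFIX}/users/{user_id}/{job_id}"


print(weaver_output_location("job-1"))             # /wpsoutputs/weaver/public/job-1
print(weaver_output_location("job-1", "user-42"))  # /wpsoutputs/weaver/users/user-42/job-1
```

The misuse fixed here corresponds to substituting `public/wps_outputs` (the value Cowbird needs) for `PUBLIC_SUBDIR`, which produced the extra `wps_outputs` nesting.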
[1.37.1](https://github.com/bird-house/birdhouse-deploy/tree/1.37.1) (2023-11-03)
------------------------------------------------------------------------------------------------------------------
2 changes: 1 addition & 1 deletion Makefile
@@ -1,7 +1,7 @@
# Generic variables
override SHELL := bash
override APP_NAME := birdhouse-deploy
override APP_VERSION := 1.37.1
override APP_VERSION := 1.39.1

# utility to remove comments after value of an option variable
override clean_opt = $(shell echo "$(1)" | $(_SED) -r -e "s/[ '$'\t'']+$$//g")
8 changes: 4 additions & 4 deletions README.rst
@@ -14,13 +14,13 @@ for a full-fledged production platform.
* - releases
- | |latest-version| |commits-since|

.. |commits-since| image:: https://img.shields.io/github/commits-since/bird-house/birdhouse-deploy/1.37.1.svg
.. |commits-since| image:: https://img.shields.io/github/commits-since/bird-house/birdhouse-deploy/1.39.1.svg
:alt: Commits since latest release
:target: https://github.com/bird-house/birdhouse-deploy/compare/1.37.1...master
:target: https://github.com/bird-house/birdhouse-deploy/compare/1.39.1...master

.. |latest-version| image:: https://img.shields.io/badge/tag-1.37.1-blue.svg?style=flat
.. |latest-version| image:: https://img.shields.io/badge/tag-1.39.1-blue.svg?style=flat
:alt: Latest Tag
:target: https://github.com/bird-house/birdhouse-deploy/tree/1.37.1
:target: https://github.com/bird-house/birdhouse-deploy/tree/1.39.1

.. |readthedocs| image:: https://readthedocs.org/projects/birdhouse-deploy/badge/?version=latest
:alt: ReadTheDocs Build Status (latest version)
2 changes: 1 addition & 1 deletion RELEASE.txt
@@ -1 +1 @@
1.37.1 2023-11-03T16:43:09Z
1.39.1 2023-11-29T17:03:07Z
14 changes: 7 additions & 7 deletions birdhouse/components/cowbird/config/cowbird/config.yml.template
@@ -35,7 +35,7 @@ handlers:
wps_outputs_dir: ${WPS_OUTPUTS_DIR}
secure_data_proxy_name: ${SECURE_DATA_PROXY_NAME}
# wps_outputs_res_name: ${WPS_OUTPUTS_RES_NAME}
public_workspace_wps_outputs_subdir: ${PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR}
public_workspace_wps_outputs_subdir: ${COWBIRD_PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR}
# notebooks_dir_name: ${NOTEBOOKS_DIR_NAME}
# user_wps_outputs_dir_name: ${USER_WPS_OUTPUTS_DIR_NAME}

@@ -187,10 +187,10 @@ sync_permissions:
- name: "{outputID}"
type: route
thredds:
# /twitcher/ows/proxy/thredds/catalog/birdhouse/wps_outputs/weaver/catalog.html
# /twitcher/ows/proxy/thredds/catalog/birdhouse/wps_outputs/weaver/{public|<user-id>}/catalog.html
# /twitcher/ows/proxy/thredds/catalog/birdhouse/wps_outputs/weaver/{public|<user-id>}/{job-id}/catalog.html
# /twitcher/ows/proxy/thredds/catalog/birdhouse/wps_outputs/weaver/{public|<user-id>}/{job-id}/{output-file}
# /twitcher/ows/proxy/thredds/catalog/${THREDDS_SERVICE_DATA_URL_PATH}/wps_outputs/weaver/catalog.html
# /twitcher/ows/proxy/thredds/catalog/${THREDDS_SERVICE_DATA_URL_PATH}/wps_outputs/weaver/{public|<user-id>}/catalog.html
# /twitcher/ows/proxy/thredds/catalog/${THREDDS_SERVICE_DATA_URL_PATH}/wps_outputs/weaver/{public|<user-id>}/{job-id}/catalog.html
# /twitcher/ows/proxy/thredds/catalog/${THREDDS_SERVICE_DATA_URL_PATH}/wps_outputs/weaver/{public|<user-id>}/{job-id}/{output-file}
# note: paths start after ows-proxy portion extracted when Twitcher/Magpie resolve between each other
thredds_wps_outputs:
- name: thredds
@@ -199,7 +199,7 @@ sync_permissions:
# 'catalog' is the file/view format specifier for the rest of the path
# - name: catalog
# type: directory
- name: birdhouse
- name: ${THREDDS_SERVICE_DATA_URL_PATH}
type: directory
- name: wps_outputs
type: directory
@@ -216,7 +216,7 @@ sync_permissions:
# 'catalog' is the file/view format specifier for the rest of the path
# - name: catalog
# type: directory
- name: birdhouse
- name: ${THREDDS_SERVICE_DATA_URL_PATH}
type: directory
- name: wps_outputs
type: directory
@@ -14,6 +14,6 @@ services:
jupyterhub:
environment:
WORKSPACE_DIR: ${DATA_PERSIST_SHARED_ROOT}/${USER_WORKSPACES}
PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR: ${PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR}
PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR: ${COWBIRD_PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR}
volumes:
- "${DATA_PERSIST_SHARED_ROOT}/${USER_WORKSPACES}:${DATA_PERSIST_SHARED_ROOT}/${USER_WORKSPACES}"
12 changes: 11 additions & 1 deletion birdhouse/components/cowbird/default.env
@@ -18,6 +18,7 @@ EXTRA_VARS='
${COWBIRD_LOG_LEVEL}
${USER_WORKSPACES}
${PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR}
${COWBIRD_PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR}
${SECURE_DATA_PROXY_NAME}
'
# extend the original 'VARS' from 'birdhouse/pavics-compose.sh' to employ them for template substitution
@@ -52,7 +53,15 @@ export USER_WORKSPACES="user_workspaces"
# Subdirectory containing the hardlinks to the public WPS outputs data
# This directory will be mounted on the JupyterLab instances and is located by default
# in the ${USER_WORKSPACES} directory.
export PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR="public/wps_outputs"
# NOTE:
# Most WPS birds do not have a concept of Public vs User-specific outputs.
# These birds will employ the same WPS output directory for all jobs, regardless of the user running it.
# By default, WPS output files will be stored under '${WPS_OUTPUTS_DIR}/<bird>', and must all be considered 'public'.
# Some WPS-capable birds such as Weaver do have a concept of Public/User-context for WPS outputs.
# In this case, files under '${WPS_OUTPUTS_DIR}/<bird>' should have an additional nesting
# with 'public' and 'users/{user_id}'. Variable 'PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR' will be shared for such cases.
export PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR=public
export COWBIRD_PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR='${PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR}/wps_outputs'

# Default name for the secure-data-proxy service from Magpie.
export SECURE_DATA_PROXY_NAME="secure-data-proxy"
@@ -62,6 +71,7 @@ COWBIRD_MONGODB_DATA_DIR='${DATA_PERSIST_ROOT}/mongodb_cowbird_persist'
DELAYED_EVAL="
$DELAYED_EVAL
COWBIRD_MONGODB_DATA_DIR
COWBIRD_PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR
"

# this dependency is only required if the mongo instance is the one provided in config/mongodb.
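
The reason `COWBIRD_PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR` is single-quoted and listed in `DELAYED_EVAL` can be illustrated with a Python analogue. This is only a sketch of the delayed-expansion idea using `string.Template`; the actual mechanism in the deployment scripts is shell-based, and the `resolve` helper is hypothetical.

```python
from string import Template

# Defaults as written in default.env; the single-quoted value keeps '${...}' unexpanded.
defaults = {
    "PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR": "public",
    "COWBIRD_PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR": "${PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR}/wps_outputs",
}
delayed_eval = ["COWBIRD_PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR"]


def resolve(defaults, overrides, delayed):
    """Apply user overrides first, then expand the delayed variables against the final values."""
    env = {**defaults, **overrides}
    for name in delayed:
        env[name] = Template(env[name]).substitute(env)
    return env


print(resolve(defaults, {}, delayed_eval)["COWBIRD_PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR"])
# public/wps_outputs
```

Because expansion happens after overrides are applied, a deployment that overrides `PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR` would see the Cowbird value follow it without redefining both variables.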
2 changes: 1 addition & 1 deletion birdhouse/components/cowbird/docker-compose-extra.yml
@@ -23,7 +23,7 @@ services:
# root user
COWBIRD_FILESYSTEM_ADMIN_UID: 0
COWBIRD_FILESYSTEM_ADMIN_GID: 0
PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR: ${PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR}
PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR: ${COWBIRD_PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR}
SECURE_DATA_PROXY_NAME: ${SECURE_DATA_PROXY_NAME}
# Note that WPS_OUTPUTS_DIR and WORKSPACE_DIR must both point to paths from the same volume.
# This is to allow the creation of hardlinks between the wpsoutputs and the user workspace.
9 changes: 9 additions & 0 deletions birdhouse/components/stac/config/magpie/config.yml.template
@@ -7,6 +7,15 @@ providers:
    c4i: false
    type: api
    sync_type: api
    hooks:
      - type: response
        path: "/stac/collections/?"
        method: POST
        target: /opt/birdhouse/src/magpie/hooks/stac_hooks.py:create_collection_resource
      - type: response
        path: "/stac/collections/[\\w-]+/items/?"
        method: POST
        target: /opt/birdhouse/src/magpie/hooks/stac_hooks.py:create_item_resource

permissions:
# create a default 'stac' resource under 'stac' service
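
To see which request paths the two hook patterns are meant to select, the following sketch tests them with Python's `re` module. It assumes full-match semantics, which is an assumption made for illustration and may differ from how Magpie actually applies the `path` expression; the `matching_hooks` helper is hypothetical.

```python
import re

# Patterns copied from the hook definitions (the YAML "\\w" is the regex \w).
COLLECTION_HOOK = r"/stac/collections/?"
ITEM_HOOK = r"/stac/collections/[\w-]+/items/?"


def matching_hooks(path: str) -> list:
    """Return the hook targets whose pattern fully matches the request path (assumed semantics)."""
    hooks = {
        "create_collection_resource": COLLECTION_HOOK,
        "create_item_resource": ITEM_HOOK,
    }
    return [name for name, pattern in hooks.items() if re.fullmatch(pattern, path)]


print(matching_hooks("/stac/collections"))                   # ['create_collection_resource']
print(matching_hooks("/stac/collections/CMIP6_UofT/items"))  # ['create_item_resource']
print(matching_hooks("/stac/collections/CMIP6_UofT"))        # []
```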
136 changes: 136 additions & 0 deletions birdhouse/components/stac/config/magpie/stac_hooks.py
@@ -0,0 +1,136 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
These hooks will be running within Twitcher, using MagpieAdapter context, applied for STAC requests.
The code below can make use of any package that is installed by Magpie/Twitcher.
.. seealso::
    Documentation about Magpie/Twitcher request/response hooks is available here:
    https://pavics-magpie.readthedocs.io/en/latest/configuration.html#service-hooks
"""

import re
from typing import TYPE_CHECKING, List, Dict

from magpie.api.management.resource import resource_utils as ru
from magpie.api.requests import get_service_matchdict_checked
from magpie.models import Route
from magpie.utils import get_logger
from magpie.db import get_session_from_other
from ziggurat_foundations.models.services.resource import ResourceService

if TYPE_CHECKING:
    from pyramid.response import Response
    from sqlalchemy.orm.session import Session

LOGGER = get_logger("magpie.stac")

def create_collection_resource(response):
    # type: (Response) -> Response
    """
    Create the STAC collection resource.
    """
    request = response.request
    body = request.json
    collection_id = body["id"]
    try:
        display_name = extract_display_name(body["links"])
    except Exception as exc:
        LOGGER.error("Error when extracting display_name from links %s %s", body["links"], str(exc), exc_info=exc)
        return response

    # note: matchdict reference of Twitcher owsproxy view is used, just so happens to be same name as Magpie
    service = get_service_matchdict_checked(request)
    # Getting a new session from the request, since the current session found in the request is already handled with its own transaction manager.
    session = get_session_from_other(request.db)
    try:
        # Create the resource tree
        create_resource_tree(f"stac/collections/{collection_id}", 0, service.resource_id, session, display_name)
        session.commit()

    except Exception as exc:
        LOGGER.error("Unexpected error while creating the collection %s %s", display_name, str(exc), exc_info=exc)
        session.rollback()

    return response

def create_item_resource(response):
    # type: (Response) -> Response
    """
    Create the STAC item resource.
    """
    request = response.request
    body = request.json
    item_id = body["id"]
    try:
        display_name = extract_display_name(body["links"])
    except Exception as exc:
        LOGGER.error("Error when extracting display_name from links %s %s", body["links"], str(exc), exc_info=exc)
        return response

    # Get the <collection_id> from url -> /collections/{collection_id}/items
    collection_id = re.search(r'(?<=collections/)[0-9a-zA-Z_.-]+?(?=/items)', request.url).group()

    # note: matchdict reference of Twitcher owsproxy view is used, just so happens to be same name as Magpie
    service = get_service_matchdict_checked(request)
    # Getting a new session from the request, since the current session found in the request is already handled with its own transaction manager.
    session = get_session_from_other(request.db)
    try:
        # Create the resource tree
        create_resource_tree(f"stac/collections/{collection_id}/items/{item_id}", 0, service.resource_id, session, display_name)
        session.commit()

    except Exception as exc:
        LOGGER.error("Unexpected error while creating the item %s %s", display_name, str(exc), exc_info=exc)
        session.rollback()

    return response

def extract_display_name(links):
    # type: (List[Dict[str, str]]) -> str
    """
    Extract the THREDDS path from the STAC links.
    """
    display_name = None
    for link in links:
        if link["rel"] == "source":
            # Example of title `thredds:birdhouse/CMIP6`
            display_name = link["title"]
            break
    if not display_name:
        raise ValueError("The display name was not extracted properly")

    return display_name

def create_resource_tree(resource_tree, current_depth, parent_id, session, display_name):
    # type: (str, int, int, Session, str) -> None
    """
    Create the resource tree on Magpie.
    """
    tree = resource_tree.split("/")
    # We are past the max depth of the tree.
    if current_depth > len(tree) - 1:
        return

    resource_name = tree[current_depth]
    query = session.query(ResourceService.model).filter(ResourceService.model.resource_name == resource_name, ResourceService.model.parent_id == parent_id)
    resource = query.first()

    if resource is not None:
        # Since the resource exists, we can use its id to create the next resource.
        parent_id = resource.resource_id
        next_depth = current_depth + 1
        create_resource_tree(resource_tree, next_depth, parent_id, session, display_name)

    # The resource wasn't found in the current depth, we need to create it.
    else:
        # Creating the last resource in the tree, we need to use the display_name.
        if current_depth == len(tree) - 1:
            ru.create_resource(resource_name, display_name, Route.resource_type_name, parent_id, db_session=session)
        else:
            # Creating the resource somewhere in the middle of the tree before using its id.
            node = ru.create_resource(resource_name, None, Route.resource_type_name, parent_id, db_session=session)
            parent_id = node.json["resource"]["resource_id"]
            next_depth = current_depth + 1
            create_resource_tree(resource_tree, next_depth, parent_id, session, display_name)
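
The tree-walking logic of `create_resource_tree` can be tried without a Magpie database using the sketch below. It is an in-memory stand-in, not the hook itself: the `resources` dictionary, the id numbering and the item display name are hypothetical, and the walk is written iteratively rather than recursively. The collection id regex is the one used by `create_item_resource`.

```python
import re


def collection_id_from_url(url):
    """Extract <collection_id> from .../collections/<collection_id>/items, or None if absent."""
    match = re.search(r"(?<=collections/)[0-9a-zA-Z_.-]+?(?=/items)", url)
    return match.group() if match else None


def create_resource_tree(resources, path, parent_id, display_name):
    """
    Walk 'path' one element at a time, reusing existing nodes and creating missing ones.

    'resources' maps (parent_id, name) -> (resource_id, display_name) and stands in
    for the Magpie resource table (hypothetical structure).
    """
    parts = path.split("/")
    for depth, name in enumerate(parts):
        key = (parent_id, name)
        if key not in resources:
            # Only the last element of the path receives the display name.
            label = display_name if depth == len(parts) - 1 else None
            resources[key] = (100 + len(resources), label)  # hypothetical id scheme
        parent_id = resources[key][0]
    return parent_id


demo = {}
service_id = 1  # hypothetical id of the 'stac' service
create_resource_tree(demo, "stac/collections/CMIP6", service_id, "thredds:birdhouse/CMIP6")
create_resource_tree(demo, "stac/collections/CMIP6/items/item-1", service_id, "thredds:birdhouse/CMIP6/item-1.nc")
print(len(demo))  # 5
```

The second call reuses the three nodes created by the first one and only adds `items` and `item-1`, which mirrors how the hook avoids duplicating intermediate resources.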
10 changes: 10 additions & 0 deletions birdhouse/components/stac/config/twitcher/docker-compose-extra.yml
@@ -0,0 +1,10 @@
version: "3.4"

services:
  # extend twitcher with MagpieAdapter hooks employed for STAC proxied requests
  twitcher:
    volumes:
      # NOTE: MagpieAdapter hooks are defined within Magpie config, but it is actually Twitcher proxy that runs them
      # target mount location depends on 'MAGPIE_PROVIDERS_CONFIG_PATH' environment variable that is found under `birdhouse/config/twitcher/docker-compose-extra.yml`
      - ./components/stac/config/magpie/config.yml:/opt/birdhouse/src/magpie/config/stac-config.yml:ro
      - ./components/stac/config/magpie/stac_hooks.py:/opt/birdhouse/src/magpie/hooks/stac_hooks.py:ro
@@ -7,6 +7,6 @@ services:
- ./components/weaver/config/proxy/conf.extra-service.d:/etc/nginx/conf.extra-service.d/weaver:ro
# because of mounting path naming restrictions (see note in 'worker' definition),
# we must add the custom path on top of named 'wps_outputs' volume of other birds for the proxy to expose results
- ${WEAVER_WPS_OUTPUTS_DIR}:/pavics-data/wps_outputs/weaver:ro
- ${WEAVER_WPS_OUTPUTS_DIR}:/data/wps_outputs/weaver:ro
links:
- weaver
@@ -5,6 +5,6 @@ services:
twitcher:
volumes:
# NOTE: MagpieAdapter hooks are defined within Magpie config, but it is actually Twitcher proxy that runs them
# target mount location depends on main docker-compose 'MAGPIE_PROVIDERS_CONFIG_PATH' environment variable
# target mount location depends on 'MAGPIE_PROVIDERS_CONFIG_PATH' environment variable that is found under `birdhouse/config/twitcher/docker-compose-extra.yml`
- ./components/weaver/config/magpie/config.yml:/opt/birdhouse/src/magpie/config/weaver-config.yml:ro
- ./components/weaver/config/magpie/weaver_hooks.py:/opt/birdhouse/src/magpie/hooks/weaver_hooks.py:ro