[ENH] Provide instructions for federation API (#133)
* Basic first page

* Add some query-tool tips

* Rename pages

* Updated default deployment instruction

* Add nicer inline notes

Co-authored-by: Alyssa Dai <[email protected]>

* Clarify federation

- terminology
- when not to federate

* Reformatting and typo fixes

* Update local federation figure

* Promote API updates to subsection

---------

Co-authored-by: Alyssa Dai <[email protected]>
surchs and alyssadai authored Nov 28, 2023
1 parent 09d46fd commit c9edf14
Showing 4 changed files with 235 additions and 40 deletions.
140 changes: 140 additions & 0 deletions docs/federate.md
@@ -0,0 +1,140 @@
## When to use local query federation
There are two main reasons to deploy local query federation:

- **Case 1**: one-way federation. You have (at least) one [local neurobagel
node](infrastructure.md) and you want your users to be able to search
the data in the local node alongside all the publicly
visible data in the neurobagel network.
- **Case 2**: internal federation. You have two or more local neurobagel
nodes (e.g. for data from different groups in your institute)
and you want your local users to search across all of them.

![Local federation scenarios](imgs/local_federation_architecture.jpg)

Note that these cases are not mutually exclusive.
Any local neurobagel nodes you deploy will only be visible to users
inside of your local network (internal federation).

## When not to use local query federation
Query federation is not necessary if you:

- **only want to query public neurobagel nodes**:
Existing public nodes in the neurobagel network are accessible
to everyone via our public query tool (at [query.neurobagel.org](https://query.neurobagel.org/)),
meaning you can run federated queries over these graph databases without any additional local setup.
- **only want to search a single neurobagel node**:
If you only have one local node that you want to query,
it is easier to directly query the node-API of this node.
In that case, all you have to do is follow the [deployment instructions
for a neurobagel node](infrastructure.md) and you are good to go.

## Setting up for local federation
Federated graph queries in neurobagel are provided by the federation API (`f-API`) service.
The neurobagel `f-API` takes a single user query and then sends it to every
neurobagel node API (`n-API`) it is aware of, collects and combines the responses,
and sends them back to the user as a single answer.
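
The fan-out-and-combine behaviour described above can be sketched roughly as follows. This is only an illustration, not the actual `f-API` implementation: the `federate` function, the stand-in node functions, and the record fields (`dataset`, `matches`, `node_name`) are all hypothetical, and a real deployment would send HTTP requests to each `n-API` instead of calling local functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-in for an n-API: takes a query, returns matching records.
NodeQuery = Callable[[dict], List[dict]]


def federate(query: dict, nodes: Dict[str, NodeQuery]) -> List[dict]:
    """Send one query to every known node and merge the answers into one list."""
    with ThreadPoolExecutor(max_workers=max(len(nodes), 1)) as pool:
        # Submit the same query to every node in parallel.
        futures = {name: pool.submit(fn, query) for name, fn in nodes.items()}
    combined = []
    for name, future in futures.items():
        for record in future.result():
            # Tag each record with the node it came from.
            combined.append({"node_name": name, **record})
    return combined


def node_archive(query: dict) -> List[dict]:
    # Made-up response for illustration only.
    return [{"dataset": "archive_ds1", "matches": 12}]


def node_recruitment(query: dict) -> List[dict]:
    # Made-up response for illustration only.
    return [
        {"dataset": "recruit_ds1", "matches": 3},
        {"dataset": "recruit_ds2", "matches": 7},
    ]


results = federate(
    {"min_age": 18},
    {"node_archive": node_archive, "node_recruitment": node_recruitment},
)
for row in results:
    print(row)
```

The user sees a single combined list, with each record labelled by the node that returned it.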

!!! note

    Make sure you have at least one [local `n-API` configured and running](infrastructure.md)
    before you set up local federation. If you do not have any local
    `n-APIs` to federate over, you can just use our public query tool directly at [query.neurobagel.org](https://query.neurobagel.org/).

In your command line, create and navigate to a new directory where you will keep the configuration
files for your new `f-API`. In this directory, create two files:

### `fed.env` environment file

Create a text file called `fed.env` to hold environment variables needed for the `f-API` deployment.
Let's assume there are two local nodes already running on different servers of your institutional network, and you want to set up federation across both nodes:

- a node named `"node_archive"` running on your local computer on port `8000` and
- a node named `"node_recruitment"` running on a different computer with the local IP `192.168.0.1`, listening on the default HTTP port `80`.

In your `fed.env` file you would configure this as follows:

``` {.bash .annotate title="fed.env"}
# Configuration for f-API
# List of known local node APIs: (node_URL, node_NAME)
LOCAL_NB_NODES=(http://localhost:8000, node_archive) (http://192.168.0.1, node_recruitment)
# Define the port that the f-API will run on INSIDE the docker container (default 8000)
NB_API_PORT=8000
# Define the port that the f-API will be exposed on to the host computer (and likely the outside network)
NB_API_PORT_HOST=8080
# Choose the docker image tag of the f-API (default latest)
NB_API_TAG=latest

# Configuration for query tool
# Define the URL of the f-API as it will appear to a user
API_QUERY_URL=http://localhost:8080 # (1)!
# Choose the docker image tag of the query tool (default latest)
NB_QUERY_TAG=latest
# Choose the host port that the query tool will be exposed on (and likely reachable from the network) (default 3000)
NB_QUERY_PORT_HOST=3000
```

1.  When a user uses the graphical query tool to query your
    f-API, the requests are sent from the user's machine,
    not from the machine hosting the query tool.

    Make sure you set `API_QUERY_URL` in your `fed.env`
    as it will appear to a user on their own machine,
    otherwise the requests will fail.

Each node to be federated over is described in the variable `LOCAL_NB_NODES` by a comma-delimited tuple of the form `(node_URL, node_NAME)`.

You can add one or more local nodes to the list of nodes known to your `f-API` in this way.
Just adjust the above code snippet according to your own deployment, and store it in a file called `fed.env`.
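
To make the `(node_URL, node_NAME)` format concrete, here is a minimal sketch of how such a string could be split into pairs. The `parse_local_nodes` helper and its regular expression are assumptions made for illustration; the `f-API` may parse the variable differently.

```python
import re
from typing import List, Tuple


def parse_local_nodes(value: str) -> List[Tuple[str, str]]:
    """Return [(node_URL, node_NAME), ...] from a '(url, name) (url, name)' string."""
    # Each tuple is "(", a URL without commas or spaces, ",", a name, ")".
    pairs = re.findall(r"\(\s*([^,()\s]+)\s*,\s*([^()]+?)\s*\)", value)
    return [(url, name) for url, name in pairs]


nodes = parse_local_nodes(
    "(http://localhost:8000, node_archive) (http://192.168.0.1, node_recruitment)"
)
for url, name in nodes:
    print(f"{name}: {url}")
```

A quick check like this can help catch a missing parenthesis or comma before you launch the services.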


### `docker-compose.yml` docker config file

Create a second file called `docker-compose.yml`.
This file describes the required services, ports and paths
to launch the `f-API` together with a connected query tool.

!!! danger "Make sure you have a recent version of docker compose installed"

    Some Linux distributions come with outdated versions of `docker` and
    `docker compose` installed. Please make sure you install `docker`
    as described in the [official documentation](https://docs.docker.com/engine/install/).

Copy the following snippet into your `docker-compose.yml` file.
You should not have to change anything about this file.
All local configuration changes are done in the `fed.env` file.

``` {.yaml .annotate title="docker-compose.yml"}
version: "3.8"
services:
  federation:
    image: "neurobagel/federation_api:${NB_API_TAG:-latest}"
    ports:
      - "${NB_API_PORT_HOST:-8000}:${NB_API_PORT:-8000}"
    environment:
      - LOCAL_NB_NODES=${LOCAL_NB_NODES} # (1)!
      - NB_API_PORT=${NB_API_PORT:-8000}
  query:
    image: "neurobagel/query_tool:${NB_QUERY_TAG:-latest}"
    ports:
      - "${NB_QUERY_PORT_HOST:-3000}:3000"
    environment:
      - API_QUERY_URL=${API_QUERY_URL:-http://localhost:8000/}
```

1.  We maintain a [list of public neurobagel nodes](https://github.com/neurobagel/menu/blob/main/node_directory/neurobagel_public_nodes.json).
    By default, every new `f-API` will look up this list
    on startup and include it in the list of nodes to
    federate over.
    This also means that you do not have to manually
    configure public nodes, i.e. you **do not have to explicitly add them** to the `LOCAL_NB_NODES` variable in your `fed.env` file.
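
The `${VAR:-default}` placeholders in `docker-compose.yml` are filled in from `fed.env`, falling back to the default when a variable is unset or empty. The sketch below mimics that substitution for the simple cases used here, to show which port mappings result from the example `fed.env`. The `interpolate` helper is hypothetical and covers only the `${VAR}` and `${VAR:-default}` forms, not the full Compose interpolation syntax.

```python
import re


def interpolate(template: str, env: dict) -> str:
    """Expand ${VAR} and ${VAR:-default} using the values in env."""
    pattern = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

    def replace(match: "re.Match") -> str:
        name, default = match.group(1), match.group(2)
        value = env.get(name, "")
        # ":-" falls back to the default when the variable is unset OR empty.
        if value == "" and default is not None:
            return default
        return value

    return pattern.sub(replace, template)


fed_env = {"NB_API_PORT": "8000", "NB_API_PORT_HOST": "8080"}

# f-API port mapping with the example fed.env values (host:container).
print(interpolate("${NB_API_PORT_HOST:-8000}:${NB_API_PORT:-8000}", fed_env))
# Query tool port mapping when NB_QUERY_PORT_HOST is not set at all.
print(interpolate("${NB_QUERY_PORT_HOST:-3000}:3000", {}))
```

To see the configuration that Docker Compose itself resolves, you can run `docker compose --env-file fed.env config` from the directory containing both files.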


## Launch f-API and query tool
Once you have created your `fed.env` and `docker-compose.yml` files
as described above, you can simply launch the services by running

`docker compose --env-file fed.env up -d`

from the same directory.
Binary file added docs/imgs/local_federation_architecture.jpg
132 changes: 93 additions & 39 deletions docs/infrastructure.md
@@ -1,10 +1,11 @@
# SysAdmin
These instructions are for a sysadmin looking to
deploy a new Neurobagel node locally in an institute or lab.
A local **neurobagel node** includes the **neurobagel API** and
a **graph backend** to store the harmonized metadata.

## Introduction
These instructions are for a sysadmin looking to deploy Neurobagel locally in an institute or lab.
A local neurobagel deployment includes the neurobagel API,
a graph backend to store the harmonized metadata,
and optionally a locally hosted graphical query interface.
To make searching the neurobagel node easier,
you can optionally also set up
a **[locally hosted graphical query interface](#deploy-a-graphical-query-tool).**

![The neurobagel API and graph backend](imgs/nb_architecture.jpg)

@@ -119,7 +120,7 @@ Below are all the possible Neurobagel environment variables that can be set in `

_** `NB_GRAPH_ADDRESS` should not be changed from its default value (`graph`) when using docker compose as this corresponds to the preset container name of the graph database server within the docker compose network._

_&Dagger; See section [Using a graphical query tool to send API requests](#a-note-on-using-a-graphical-query-tool-to-send-api-requests)_
_&Dagger; See section [Deploy a graphical query tool](#deploy-a-graphical-query-tool)_


For a local deployment, we recommend to **explicitly set** at least the following variables in `.env`
@@ -142,35 +143,6 @@ For a local deployment, we recommend to **explicitly set** at least the followin

For more information, see [Docker's environment variable precedence](https://docs.docker.com/compose/environment-variables/envvars-precedence/).

### A note on using a graphical query tool to send API requests
The `NB_API_ALLOWED_ORIGINS` variable defaults to an empty string (`""`) when unset, meaning that your deployed API will only be accessible via direct `curl` requests to the URL where the API is hosted (see [this section](#test-the-new-deployment) for an example `curl` request).

However, in many cases you may want to make the API accessible by a frontend tool such as our [browser query tool](https://github.com/neurobagel/query-tool).
To do so, you must explicitly specify the origin(s) for the frontend using `NB_API_ALLOWED_ORIGINS` in `.env`.
For detailed instructions regarding the query tool see [Running cohort queries](query_tool.md).

For example, the [`.template-env`](https://github.com/neurobagel/api/blob/main/.template-env) file in the Neurobagel API repo assumes you want to allow API requests from a query tool hosted at a specific port on `localhost` (see the [Docker Compose section](#docker-compose)).

??? example "More examples of `NB_API_ALLOWED_ORIGINS`"
``` bash title=".env"
# do not allow requests from any frontend origins
NB_API_ALLOWED_ORIGINS="" # this is the default value that will also be set if the variable is excluded from the .env file

# allow requests from only one origin
NB_API_ALLOWED_ORIGINS="https://query.neurobagel.org"

# allow requests from 3 different origins
NB_API_ALLOWED_ORIGINS="https://query.neurobagel.org https://localhost:3000 http://localhost:3000"

# allow requests from any origin - use with caution
NB_API_ALLOWED_ORIGINS="*"
```

??? note "For more technical deployments using NGINX"

If you have configured an NGINX reverse proxy (or proxy requests to the remote origin) to serve both the API and the query tool from the same origin, you can skip the step of enabling CORS for the API.
For an example, see https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/.

### Docker Compose

To spin up the API and graph backend containers using Docker Compose,
@@ -189,9 +161,6 @@ Or, if you want to ensure you always pull the latest Docker images first:
docker compose pull && docker compose up -d
```

By default, this will also deploy a local version of the [Neurobagel graphical query tool](https://github.com/neurobagel/query-tool).
If using the default port mappings, you can reach your local query tool at [http://localhost:3000](http://localhost:3000) once it is running.

## Setup for the first run

When you launch the graph backend for the first time,
@@ -611,3 +580,88 @@ and click "Try it out" and then "Execute" to execute a query.
!!! note
For very large databases, requests to the API using the interactive docs UI may be very slow or time out.
If this prevents test queries from succeeding, try setting more parameters to enable an example response from the graph, or use a `curl` request instead.


## Deploy a graphical query tool
To give your users an easy, graphical way to
query your new local neurobagel node,
you have two options:

### As part of local federation
Use this option if any of the following apply. You:

- already have deployed other local neurobagel nodes
that you want your users to query alongside the new node
- want your users to be able to query
all public neurobagel nodes together with your new node
- plan on adding more local neurobagel nodes in the
near future that you will want to query alongside your newly created node

In this case, skip directly to the page on
setting up [local query federation](federate.md).

### As a standalone service
Use this option if you:

- plan on only deploying a single node
- want your users to only search data
in the new node you deployed

In this case, you need to deploy the query tool
as a standalone docker container.


```bash
docker run -d -p 3000:3000 --env API_QUERY_URL=http://localhost:8000/ --name query_tool neurobagel/query_tool:latest
```

??? todo

Update docker example to use a specific version
once https://github.com/neurobagel/planning/issues/64
is closed.

Make sure to replace the value of `API_QUERY_URL` with the `IP:PORT` or domain name of the
new neurobagel node-API you just deployed!

If using the default port mappings for the query tool (`-p 3000:3000` in the above command),
you can reach your local query tool at [http://localhost:3000](http://localhost:3000) once it is running.

To verify the exact configuration that your new docker
container is running with (e.g. for debugging),
you can run

```bash
docker inspect query_tool
```

### Updating your API configuration
If deploying the query tool as a standalone service for the local node you have just created, you must ensure the `NB_API_ALLOWED_ORIGINS` variable is correctly set in the [`.env` file configuration for your node API](#set-the-environment-variables).
The `NB_API_ALLOWED_ORIGINS` variable defaults to an empty string (`""`) when unset, meaning that your deployed API will only be accessible via direct `curl` requests to the URL where the API is hosted (see [this section](#test-the-new-deployment) for an example `curl` request).

To make the API accessible by a frontend tool such as our [browser query tool](https://github.com/neurobagel/query-tool),
you must explicitly specify the origin(s) for the frontend using `NB_API_ALLOWED_ORIGINS` in `.env`.
For detailed instructions regarding the query tool see [Running cohort queries](query_tool.md).

For example, the [`.template-env`](https://github.com/neurobagel/api/blob/main/.template-env) file in the Neurobagel API repo assumes you want to allow API requests from a query tool hosted at a specific port on `localhost` (see the [Docker Compose section](#docker-compose)).

!!! example "More examples of `NB_API_ALLOWED_ORIGINS`"

``` bash title=".env"
# do not allow requests from any frontend origins
NB_API_ALLOWED_ORIGINS="" # this is the default value that will also be set if the variable is excluded from the .env file

# allow requests from only one origin
NB_API_ALLOWED_ORIGINS="https://query.neurobagel.org"

# allow requests from 3 different origins
NB_API_ALLOWED_ORIGINS="https://query.neurobagel.org https://localhost:3000 http://localhost:3000"

# allow requests from any origin - use with caution
NB_API_ALLOWED_ORIGINS="*"
```

??? note "For more technical deployments using NGINX"

If you have configured an NGINX reverse proxy (or proxy requests to the remote origin) to serve both the API and the query tool from the same origin, you can skip the step of enabling CORS for the API.
For an example, see https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/.
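
The `NB_API_ALLOWED_ORIGINS` examples above can be read as a space-separated allow-list, where an empty value allows no frontend origins and `*` allows any. The following sketch only illustrates that reading; `is_origin_allowed` is a hypothetical helper, and the API itself most likely delegates this check to its web framework's CORS handling.

```python
def is_origin_allowed(origin: str, allowed_origins: str) -> bool:
    """Check a request origin against a space-separated allow-list string."""
    allowed = allowed_origins.split()
    # "*" allows every origin; an empty setting allows none.
    return "*" in allowed or origin in allowed


three_origins = "https://query.neurobagel.org https://localhost:3000 http://localhost:3000"

print(is_origin_allowed("http://localhost:3000", three_origins))
print(is_origin_allowed("http://localhost:3000", "https://query.neurobagel.org"))
print(is_origin_allowed("http://localhost:3000", ""))
print(is_origin_allowed("http://localhost:3000", "*"))
```

Note that the scheme and port are part of the origin, which is why `https://localhost:3000` and `http://localhost:3000` are listed separately in the example above.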
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -27,7 +27,8 @@ nav:
- Preparing data for annotation: "data_prep.md"
- Annotating a dataset: "annotation_tool.md"
- Generating harmonized subject-level metadata: "cli.md"
- Setting up a graph: "infrastructure.md"
- Set up a neurobagel node: "infrastructure.md"
- Set up local federation: "federate.md"
- Updating a harmonized dataset: "updating_dataset.md"
- Using the API: "api.md"
- Running cohort queries: "query_tool.md"