Commit

Docs/submission diagram update (#88)
blankdots authored Dec 7, 2023
2 parents 3c418d0 + b297568 commit 4afa835
Showing 13 changed files with 260 additions and 80 deletions.
2 changes: 2 additions & 0 deletions docs/css/neic-sda.css
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
.wy-table-responsive tbody td {
white-space: normal;
}

.wy-nav-content {max-width: 1000px !important;}
38 changes: 28 additions & 10 deletions docs/dataout.md
@@ -2,7 +2,7 @@ Data Retrieval API
==================

> NOTE:
> We maintain two Data Out API solutions, for which REST APIs are the
> We maintain two Data Retrieval API solutions, for which REST APIs are the
> same.
SDA-DOA
@@ -84,17 +84,37 @@ and can't expose REST API (but still can receive RabbitMQ messages).
Handling Permissions
--------------------

Data Out API can be run with connection to an AAI or without. If connection to an AAI provider is not possible, the `PASSPORT_PUBLIC_KEY_PATH` and `CRYPT4GH_PRIVATE_KEY_PATH` need to be
Data Retrieval API can be run with connection to an AAI or without. If connection to an AAI provider is not possible, the `PASSPORT_PUBLIC_KEY_PATH` and `CRYPT4GH_PRIVATE_KEY_PATH` need to be
set.

> NOTE:
> By default we use Elixir AAI as JWT for authentication
> By default we use LifeScience AAI as JWT for authentication
> `OPENID_CONFIGURATION_URL` is set to:
> <https://login.elixir-czech.org/oidc/.well-known/openid-configuration>
> <https://proxy.aai.lifescience-ri.eu/.well-known/openid-configuration>
If connected to an AAI provider the current implementation is based on
[GA4GH
Passports](https://github.com/ga4gh/data-security/blob/master/AAI/AAIConnectProfile.md)
[GA4GH Passports](https://github.com/ga4gh/data-security/blob/master/AAI/AAIConnectProfile.md)

```mermaid
sequenceDiagram
actor client
client->>sda-download: request datasets/data
note right of client: send HTTP Authorization Bearer JWT
activate sda-download
client->>sda-download: check datasets/data exists
sda-download-->AAI: request GA4GH Visas permissions (userinfo endpoint)
activate AAI
AAI->>GA4GH Visa Issuer: get GA4GH Visa from Issuer
GA4GH Visa Issuer->>AAI: GA4GH Visas
deactivate AAI
AAI->>sda-download: GA4GH Visas
note right of sda-download: check known GA4GH Visa Issuer
sda-download->GA4GH Visa Issuer: validate GA4GH visas signature
sda-download->>client: return datasets/data
deactivate sda-download
```

The AAI JWT payload should contain a GA4GH Passport claim in the scope:
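
As a rough illustration of that check, the sketch below decodes the payload segment of a bearer JWT and looks for the passport scope. It assumes the scope is named `ga4gh_passport_v1` and that the `scope` claim is a space-separated string, as in the GA4GH AAI profile; the helper names and the example token are hypothetical, and the signature is deliberately not verified here, so this is not a substitute for real token validation.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload_b64 = parts[1]
    # JWTs use unpadded base64url, so restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def has_passport_scope(payload: dict, scope_name: str = "ga4gh_passport_v1") -> bool:
    """Return True if the space-separated `scope` claim lists the passport scope."""
    return scope_name in payload.get("scope", "").split()


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper only)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical, unsigned token used only to exercise the helpers above.
example_token = ".".join(
    [
        _b64url({"alg": "none"}),
        _b64url({"sub": "user@example.org", "scope": "openid ga4gh_passport_v1"}),
        "signature",
    ]
)

print(has_passport_scope(decode_jwt_payload(example_token)))
```

A real service would first validate the token signature against the keys published by the AAI provider and only then inspect the claims.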

@@ -149,10 +169,8 @@ SDA-download
> Source code repository is available at:
> [https://github.com/neicnordic/sda-download](https://github.com/neicnordic/sda-download)
Recommended provisioning method for production is:

- on a `kubernetes cluster` using the [helm
chart](https://github.com/neicnordic/sensitive-data-archive/tree/main/charts).
The recommended provisioning method for production is on a `kubernetes cluster` using the
[helm chart](https://github.com/neicnordic/sensitive-data-archive/tree/main/charts) `sda-svc`, which contains `sda-download`.

`sda-download` focuses on enabling deployment of a stand-alone version
of SDA, with features such as:
Expand Down
167 changes: 143 additions & 24 deletions docs/db.md
@@ -13,45 +13,164 @@ The database container will initialize and create the necessary database
structure and functions if started with an empty area. Procedures for *backing up the database* are important, but they are considered out of scope for
the secure data archive project.

Look at [the SQL
definitions](https://github.com/neicnordic/sensitive-data-archive/tree/main/postgresql/initdb.d)
Look at [the SQL definitions](https://github.com/neicnordic/sensitive-data-archive/tree/main/postgresql/initdb.d)
if you are also interested in the database triggers.

Configuration
-------------

The following environment variables can be used to configure the
database:
Security is hardened:

| Variable | Description | Default value |
|------------------------|------------------------------------|---------------------|
| `PGVOLUME` | Mountpoint for the writable volume | /var/lib/postgresql |
| `DB_LEGA_IN_PASSWORD` | *lega_in*'s password | - |
| `DB_LEGA_OUT_PASSWORD` | *lega_out*'s password | - |
| `TZ` | Timezone for the Postgres server | Europe/stockholm |
- `trust` authentication is not used, even for local connections
- Password authentication is required for all connections
- TLS communication is enforced
- Client-certificate verification is enforced

For TLS support use the variables below:
The following environment variables can be used to configure the database:

| Variable | Description | Default value |
|:-----------------|:-------------------------------------------------|:----------------------------------------------------------|
| `PG_SERVER_CERT` | Public Certificate in PEM format | `$PGVOLUME/pg.cert` |
| `PG_SERVER_KEY` | Private Key in PEM format | `$PGVOLUME/pg.key` |
| `PG_CA` | Public CA Certificate in PEM format | `$PGVOLUME/CA.cert` |
| `PG_VERIFY_PEER` | Enforce client verification | 0 |
| `SSL_SUBJ` | Subject for the self-signed certificate creation | `/C=SE/ST=Sweden/L=Uppsala/O=NBIS/OU=SysDevs/CN=LocalEGA` |
| Variable | Description | Default value |
| :--------------------- | :---------------------------------- | :----------------------- |
| PGDATA | Mountpoint for the writable volume | /var/lib/postgresql/data |
| POSTGRES_DB | Name of the database | sda |
| POSTGRES_PASSWORD | Password for the user `postgres` | - |
| POSTGRES_SERVER_CERT | Public Certificate in PEM format | - |
| POSTGRES_SERVER_KEY | Private Key in PEM format | - |
| POSTGRES_SERVER_CACERT | Public CA Certificate in PEM format | - |
| POSTGRES_VERIFY_PEER | Enforce client verification | verify-ca |

Client verification is enforced if `POSTGRES_VERIFY_PEER` is set to `verify-ca` or `verify-full`.
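
The rule above can be sketched in a few lines. The first helper mirrors the stated behaviour, using the default `verify-ca` from the table; the second builds a client-side connection string from the standard libpq keywords `sslmode`, `sslcert`, `sslkey` and `sslrootcert`. The function names, host, user and file paths are hypothetical and chosen only for illustration.

```python
# Values of POSTGRES_VERIFY_PEER that enforce client verification (from the text above).
VERIFYING_MODES = {"verify-ca", "verify-full"}


def client_verification_enforced(env: dict) -> bool:
    """Return True if the given environment enforces client verification."""
    # "verify-ca" is the documented default when the variable is unset.
    return env.get("POSTGRES_VERIFY_PEER", "verify-ca") in VERIFYING_MODES


def client_dsn(host: str, dbname: str, user: str, cert: str, key: str, ca: str,
               sslmode: str = "verify-full") -> str:
    """Build a libpq keyword/value connection string for a TLS client.

    Values containing spaces would need quoting; this sketch does not handle that.
    """
    params = {
        "host": host,
        "dbname": dbname,
        "user": user,
        "sslmode": sslmode,   # how strictly the client verifies the server
        "sslcert": cert,      # client certificate presented to the server
        "sslkey": key,        # private key matching the client certificate
        "sslrootcert": ca,    # CA certificate used to verify the server
    }
    return " ".join(f"{name}={value}" for name, value in params.items())


# Hypothetical values, for illustration only.
print(client_verification_enforced({"POSTGRES_VERIFY_PEER": "verify-full"}))
print(client_dsn("db.example.org", "sda", "postgres",
                 "/certs/client.crt", "/certs/client.key", "/certs/ca.crt"))
```

The password is intentionally left out of the connection string; it would normally be supplied separately, for example through a secret.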

> NOTE:
> If not already injected, the files located at `PG_SERVER_CERT` and
> `PG_SERVER_KEY` will be generated, as a self-signed public/private
> certificate pair, using `SSL_SUBJ`. Client verification is enforced if
> and only if `PG_CA` exists and `PG_VERIFY_PEER` is set to `1`.

Database schema
---------------

The current database schema is documented below.

### Database schema migration
```mermaid
erDiagram
checksums {
text checksum
uuid file_id FK,UK
integer id PK
checksum_source source UK
checksum_algorithm type UK
}
dataset_event_log {
text dataset_id FK
text event FK
timestamp_with_time_zone event_date
integer id PK
jsonb message
}
dataset_events {
text description
integer id PK
character_varying title UK
}
dataset_references {
timestamp_with_time_zone created_at
integer dataset_id FK
timestamp_without_time_zone expired_at
integer id PK
text reference_id
text reference_scheme
}
datasets {
timestamp_with_time_zone created_at
text description
integer id PK
text stable_id UK
text title
}
dbschema_version {
timestamp_with_time_zone applied
character_varying description
integer version PK
}
file_dataset {
integer dataset_id FK,UK
uuid file_id FK,UK
integer id PK
}
file_event_log {
uuid correlation_id
jsonb details
text error
text event FK
uuid file_id FK
timestamp_without_time_zone finished_at
integer id PK
jsonb message
timestamp_with_time_zone started_at
boolean success
text user_id
}
file_events {
text description
integer id PK
character_varying title UK
}
file_references {
timestamp_with_time_zone created_at
timestamp_without_time_zone expired_at
uuid file_id FK
text reference_id
text reference_scheme
}
files {
text archive_file_path UK
bigint archive_file_size
text backup_path
timestamp_with_time_zone created_at
name created_by
bigint decrypted_file_size
text encryption_method
text header
uuid id PK
timestamp_with_time_zone last_modified
name last_modified_by
text stable_id UK
text submission_file_path UK
bigint submission_file_size
text submission_user
}
checksums }o--|| files : "file_id"
dataset_event_log }o--|| dataset_events : "event"
dataset_event_log }o--|| datasets : "dataset_id"
dataset_references }o--|| datasets : "dataset_id"
file_dataset }o--|| datasets : "dataset_id"
file_dataset }o--|| files : "file_id"
file_event_log }o--|| file_events : "event"
file_event_log }o--|| files : "file_id"
file_references }o--|| files : "file_id"
```
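
To make the relationships in the diagram concrete, the sketch below rebuilds a heavily simplified subset of the schema (`files`, `datasets`, `file_dataset`) in an in-memory SQLite database and lists the files that belong to one dataset. It is an illustration only: the real database is PostgreSQL, the column types are simplified (UUIDs are stored as text), the composite unique constraint on `file_dataset` is an assumption, and the sample rows are invented.

```python
import sqlite3
import uuid


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified subset of the schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE files (
            id TEXT PRIMARY KEY,            -- uuid in the real schema
            stable_id TEXT UNIQUE,
            submission_file_path TEXT
        );
        CREATE TABLE datasets (
            id INTEGER PRIMARY KEY,
            stable_id TEXT UNIQUE,
            title TEXT
        );
        CREATE TABLE file_dataset (
            id INTEGER PRIMARY KEY,
            file_id TEXT REFERENCES files(id),
            dataset_id INTEGER REFERENCES datasets(id),
            UNIQUE (file_id, dataset_id)    -- assumed composite constraint
        );
        """
    )
    # Invented sample rows: two files in DATASET-1, one file in DATASET-2.
    ids = {name: str(uuid.uuid4()) for name in ("FILE-1", "FILE-2", "FILE-3")}
    conn.executemany(
        "INSERT INTO files (id, stable_id, submission_file_path) VALUES (?, ?, ?)",
        [
            (ids["FILE-1"], "FILE-1", "inbox/a.c4gh"),
            (ids["FILE-2"], "FILE-2", "inbox/b.c4gh"),
            (ids["FILE-3"], "FILE-3", "inbox/c.c4gh"),
        ],
    )
    conn.executemany(
        "INSERT INTO datasets (id, stable_id, title) VALUES (?, ?, ?)",
        [(1, "DATASET-1", "First"), (2, "DATASET-2", "Second")],
    )
    conn.executemany(
        "INSERT INTO file_dataset (file_id, dataset_id) VALUES (?, ?)",
        [(ids["FILE-1"], 1), (ids["FILE-2"], 1), (ids["FILE-3"], 2)],
    )
    return conn


def files_in_dataset(conn: sqlite3.Connection, dataset_stable_id: str) -> list:
    """Follow file_dataset from a dataset to its files, as in the diagram."""
    return conn.execute(
        """
        SELECT f.stable_id, f.submission_file_path
        FROM files AS f
        JOIN file_dataset AS fd ON fd.file_id = f.id
        JOIN datasets AS d ON d.id = fd.dataset_id
        WHERE d.stable_id = ?
        ORDER BY f.stable_id
        """,
        (dataset_stable_id,),
    ).fetchall()


conn = build_demo_db()
print(files_in_dataset(conn, "DATASET-1"))
```

The same join shape should carry over to the PostgreSQL schema, since `file_dataset` is the link table between `files` and `datasets`.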

Database Functions
------------------

- `files_updated()` - Updates the `last_modified` and `last_modified_by` fields on the `files` table whenever a row is updated.

- `register_file(submission_file_path TEXT, submission_user TEXT)` - Registers a file on upload.

- `set_archived(file_uuid UUID, corr_id UUID, file_path TEXT, file_size BIGINT, inbox_checksum_value TEXT, inbox_checksum_type TEXT)` - Registers a file as archived, along with its original path in the inbox.

- `set_verified(file_uuid UUID, corr_id UUID, archive_checksum TEXT, archive_checksum_type TEXT, decrypted_size BIGINT, decrypted_checksum TEXT, decrypted_checksum_type TEXT)` - Marks a file as verified, along with all the necessary checksum details (decrypted and archived versions).
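
The functions listed above could be invoked from client code roughly as follows. This sketch only builds parameterised SQL statements and does not talk to a database; the `%s` placeholder style, the `sql_call` helper and all sample values are assumptions, and the functions may need to be schema-qualified in a real deployment.

```python
import uuid


def sql_call(function_name: str, *args):
    """Return a parameterised SELECT statement and its argument tuple."""
    placeholders = ", ".join(["%s"] * len(args))
    return f"SELECT {function_name}({placeholders});", args


# Invented identifiers and checksums, for illustration only.
file_uuid = str(uuid.uuid4())
corr_id = str(uuid.uuid4())

# register_file(submission_file_path, submission_user)
register = sql_call("register_file", "inbox/a.c4gh", "user@example.org")

# set_archived(file_uuid, corr_id, file_path, file_size,
#              inbox_checksum_value, inbox_checksum_type)
archived = sql_call(
    "set_archived", file_uuid, corr_id, "archive/a.c4gh", 1024, "abc123", "SHA256"
)

# set_verified(file_uuid, corr_id, archive_checksum, archive_checksum_type,
#              decrypted_size, decrypted_checksum, decrypted_checksum_type)
verified = sql_call(
    "set_verified", file_uuid, corr_id, "def456", "SHA256", 900, "789abc", "SHA256"
)

for statement, params in (register, archived, verified):
    print(statement, len(params))
```

Passing the values as parameters rather than formatting them into the statement keeps user-supplied paths and user names out of the SQL text.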


Database schema migration
-------------------------

For continuity/ease of upgrade in production the database supports
automatic migrations between schema versions. This is handled by
Expand Down
9 changes: 9 additions & 0 deletions docs/dictionary/wordlist.txt
@@ -275,3 +275,12 @@ svc
adminPassword
adminUser
postgresAdminPassword
autonumber
sequenceDiagram
BIGINT
FK
PGDATA
bigint
dbschema
erDiagram
jsonb
4 changes: 2 additions & 2 deletions docs/guides/deploy-k8s.md
@@ -20,11 +20,11 @@ Differences in deployment make concrete examples challenging, here it is explain

For a secure deployment, it helps to think in terms of what can be accessed from where. All deployment approaches can use two trust boundaries, external and internal; for an extra layer of security, the storage trust boundary can also be kept separate. Because the service is provided to customers over the internet, one example deployment uses two separate Kubernetes clusters: one that responds to customers and other communication from outside, and a second, more secure internal cluster that faces the storage.

One thing to consider is where to release the data, that could be closed protected environment with tightly restricted access. If Data out is used to serve unencrypted files the recommendation is to have it available only in an internal cluster.
One thing to consider is where to release the data; this could be a closed, protected environment with tightly restricted access. If the Data Retrieval API is used to serve unencrypted files, the recommendation is to make it available only in an internal cluster.

The services could be divided into two trust boundaries:
- The services in the external cluster are [Inbox](/docs/submission.md#submission-inbox) and [MQ](/docs/connection.md#local-message-broker)
- The services in internal cluster are [Intercept](/docs/services/intercept.md), [Ingest](/docs/services/ingest.md), [Verify](/docs/services/verify.md), [Mapper](/docs/services/mapper.md), [Finalize](/docs/services/finalize.md), [Backup](/docs/services/backup.md) and [Data out](/docs/dataout.md).
- The services in internal cluster are [Intercept](/docs/services/intercept.md), [Ingest](/docs/services/ingest.md), [Verify](/docs/services/verify.md), [Mapper](/docs/services/mapper.md), [Finalize](/docs/services/finalize.md), [Backup](/docs/services/backup.md) and [Data Retrieval API](/docs/dataout.md).

The innermost trust zone contains the database and the archive, which can be accessed only from the internal cluster.

Expand Down
1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -1,3 +1,4 @@
mkdocs==1.5.3
mkdocs-include-markdown-plugin==6.0.4
markdown-callouts==0.3.0
mkdocs-mermaid2-plugin==1.1.1
36 changes: 0 additions & 36 deletions docs/static/custom.css

This file was deleted.

4 changes: 2 additions & 2 deletions docs/static/doa-api.yml
@@ -1,8 +1,8 @@
openapi: 3.0.0
info:
description: SDA Data Out API Documentation derived from EGA Data API
description: SDA Data Retrieval API Documentation derived from EGA Data API
version: "1.0"
title: SDA Data Out API Documentation
title: SDA Data Retrieval API Documentation
license:
name: Apache 2.0
url: http://www.apache.org/licenses/LICENSE-2.0
Expand Down
3 changes: 0 additions & 3 deletions docs/static/ingestion-sequence.svg

This file was deleted.
