diff --git a/docs/css/neic-sda.css b/docs/css/neic-sda.css index 33270d6..9a6d5d8 100644 --- a/docs/css/neic-sda.css +++ b/docs/css/neic-sda.css @@ -1,3 +1,5 @@ .wy-table-responsive tbody td { white-space: normal; } + +.wy-nav-content {max-width: 1000px !important;} \ No newline at end of file diff --git a/docs/dataout.md b/docs/dataout.md index b324647..0465570 100644 --- a/docs/dataout.md +++ b/docs/dataout.md @@ -2,7 +2,7 @@ Data Retrieval API ================== > NOTE: -> We maintain two Data Out API solutions, for which REST APIs are the +> We maintain two Data Retrieval API solutions, for which REST APIs are the > same. SDA-DOA @@ -84,17 +84,37 @@ and can't expose REST API (but still can receive RabbitMQ messages). Handling Permissions -------------------- -Data Out API can be run with connection to an AAI or without. If connection to an AAI provider is not possible, the `PASSPORT_PUBLIC_KEY_PATH` and `CRYPT4GH_PRIVATE_KEY_PATH` need to be +Data Retrieval API can be run with connection to an AAI or without. If connection to an AAI provider is not possible, the `PASSPORT_PUBLIC_KEY_PATH` and `CRYPT4GH_PRIVATE_KEY_PATH` need to be set. 
> NOTE: -> By default we use Elixir AAI as JWT for authentication +> By default we use LifeScience AAI as the JWT issuer for authentication > `OPENID_CONFIGURATION_URL` is set to: -> +> If connected to an AAI provider the current implementation is based on -[GA4GH -Passports](https://github.com/ga4gh/data-security/blob/master/AAI/AAIConnectProfile.md) +[GA4GH Passports](https://github.com/ga4gh/data-security/blob/master/AAI/AAIConnectProfile.md) + +```mermaid + +sequenceDiagram + actor client + client->>sda-download: request datasets/data + note right of client: send HTTP Authorization Bearer JWT + activate sda-download + sda-download->>sda-download: check datasets/data exists + sda-download->>AAI: request GA4GH Visa permissions (userinfo endpoint) + activate AAI + AAI->>GA4GH Visa Issuer: get GA4GH Visa from Issuer + GA4GH Visa Issuer->>AAI: GA4GH Visas + deactivate AAI + AAI->>sda-download: GA4GH Visas + note right of sda-download: check known GA4GH Visa Issuer + sda-download->>GA4GH Visa Issuer: validate GA4GH Visa signatures + sda-download->>client: return datasets/data + deactivate sda-download + +``` The AAI JWT payload should contain a GA4GH Passport claim in the scope: @@ -149,10 +169,8 @@ SDA-download > Source code repository is available at: > [https://github.com/neicnordic/sda-download](https://github.com/neicnordic/sda-download) -Recommended provisioning method for production is: - -- on a `kubernetes cluster` using the [helm - chart](https://github.com/neicnordic/sensitive-data-archive/tree/main/charts). +The recommended provisioning method for production is on a `kubernetes cluster`, using the +`sda-svc` [helm chart](https://github.com/neicnordic/sensitive-data-archive/tree/main/charts), which contains `sda-download`.
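The Handling Permissions section above states that the AAI JWT should carry a GA4GH Passport claim, and the sequence diagram shows `sda-download` fetching visas from the AAI userinfo endpoint. As a rough illustration of that data, the Python sketch below pulls dataset grants out of a list of visa JWTs. The claim names (`ga4gh_passport_v1`, `ga4gh_visa_v1`, `ControlledAccessGrants`) follow the GA4GH Passport specification; the function names are hypothetical, and the sketch deliberately skips signature and issuer validation, which `sda-download` performs before trusting any visa.

```python
import base64
import json


def b64url_decode(segment):
    """Decode one base64url JWT segment, restoring the stripped '=' padding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def unverified_claims(token):
    """Return the JSON payload of a compact JWT WITHOUT verifying its signature.

    Illustration only: a real service must check the signature and the issuer
    before trusting any claim read here.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a compact JWT with three dot-separated parts")
    return json.loads(b64url_decode(parts[1]))


def controlled_access_datasets(passport_visas):
    """Collect the `value` of every ControlledAccessGrants visa.

    `passport_visas` is assumed to be the list of visa JWTs carried in the
    `ga4gh_passport_v1` claim (for example, from the AAI userinfo response).
    """
    datasets = set()
    for visa_jwt in passport_visas:
        visa = unverified_claims(visa_jwt).get("ga4gh_visa_v1", {})
        if visa.get("type") == "ControlledAccessGrants" and visa.get("value"):
            datasets.add(visa["value"])
    return datasets
```

Passing the `ga4gh_passport_v1` list to `controlled_access_datasets` yields the set of granted dataset identifiers; visas of other types, such as accepted terms and policies, are ignored. Keeping decoding separate from filtering makes it straightforward to slot a proper signature check into `unverified_claims` without touching the grant logic.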
`sda-download` focuses on enabling deployment of a stand-alone version of SDA, with features such as: diff --git a/docs/db.md b/docs/db.md index 9b7eabf..f0bf8f1 100644 --- a/docs/db.md +++ b/docs/db.md @@ -13,45 +13,164 @@ The database container will initialize and create the necessary database structure and functions if started with an empty area. Procedures for *backing up the database* are important, however considered out of scope for the secure data archive project. -Look at [the SQL -definitions](https://github.com/neicnordic/sensitive-data-archive/tree/main/postgresql/initdb.d) +Look at [the SQL definitions](https://github.com/neicnordic/sensitive-data-archive/tree/main/postgresql/initdb.d) if you are also interested in the database triggers. Configuration ------------- -The following environment variables can be used to configure the -database: +Security is hardened: -| Variable | Description | Default value | -|------------------------|------------------------------------|---------------------| -| `PGVOLUME` | Mountpoint for the writable volume | /var/lib/postgresql | -| `DB_LEGA_IN_PASSWORD` | *lega_in*'s password | - | -| `DB_LEGA_OUT_PASSWORD` | *lega_out*'s password | - | -| `TZ` | Timezone for the Postgres server | Europe/stockholm | +- We do not use 'trust' even for local connections +- Requiring password authentication for all +- Enforcing TLS communication +- Enforcing client-certificate verification -For TLS support use the variables below: +The following environment variables can be used to configure the database: -| Variable | Description | Default value | -|:-----------------|:-------------------------------------------------|:----------------------------------------------------------| -| `PG_SERVER_CERT` | Public Certificate in PEM format | `$PGVOLUME/pg.cert` | -| `PG_SERVER_KEY` | Private Key in PEM format | `$PGVOLUME/pg.key` | -| `PG_CA` | Public CA Certificate in PEM format | `$PGVOLUME/CA.cert` | -| `PG_VERIFY_PEER` | Enforce client 
verification | 0 | -| `SSL_SUBJ` | Subject for the self-signed certificate creation | `/C=SE/ST=Sweden/L=Uppsala/O=NBIS/OU=SysDevs/CN=LocalEGA` | +| Variable | Description | Default value | +| :--------------------- | :---------------------------------- | :----------------------- | +| PGDATA | Mountpoint for the writable volume | /var/lib/postgresql/data | +| POSTGRES_DB | Name of the database | sda | +| POSTGRES_PASSWORD | Password for the user `postgres` | - | +| POSTGRES_SERVER_CERT | Public Certificate in PEM format | - | +| POSTGRES_SERVER_KEY | Private Key in PEM format | - | +| POSTGRES_SERVER_CACERT | Public CA Certificate in PEM format | - | +| POSTGRES_VERIFY_PEER | Enforce client verification | verify-ca | + +Client verification is enforced if `POSTGRES_VERIFY_PEER` is set to `verify-ca` or `verify-full`. -> NOTE: -> If not already injected, the files located at `PG_SERVER_CERT` and -> `PG_SERVER_KEY` will be generated, as a self-signed public/private -> certificate pair, using `SSL_SUBJ`. Client verification is enforced if -> and only if `PG_CA` exists and `PG_VERIFY_PEER` is set to `1`. Database schema --------------- The current database schema is documented below. 
-### Database schema migration +```mermaid + + erDiagram + checksums { + text checksum + uuid file_id FK,UK + integer id PK + checksum_source source UK + checksum_algorithm type UK + } + + dataset_event_log { + text dataset_id FK + text event FK + timestamp_with_time_zone event_date + integer id PK + jsonb message + } + + dataset_events { + text description + integer id PK + character_varying title UK + } + + dataset_references { + timestamp_with_time_zone created_at + integer dataset_id FK + timestamp_without_time_zone expired_at + integer id PK + text reference_id + text reference_scheme + } + + datasets { + timestamp_with_time_zone created_at + text description + integer id PK + text stable_id UK + text title + } + + dbschema_version { + timestamp_with_time_zone applied + character_varying description + integer version PK + } + + file_dataset { + integer dataset_id FK,UK + uuid file_id FK,UK + integer id PK + } + + file_event_log { + uuid correlation_id + jsonb details + text error + text event FK + uuid file_id FK + timestamp_without_time_zone finished_at + integer id PK + jsonb message + timestamp_with_time_zone started_at + boolean success + text user_id + } + + file_events { + text description + integer id PK + character_varying title UK + } + + file_references { + timestamp_with_time_zone created_at + timestamp_without_time_zone expired_at + uuid file_id FK + text reference_id + text reference_scheme + } + + files { + text archive_file_path UK + bigint archive_file_size + text backup_path + timestamp_with_time_zone created_at + name created_by + bigint decrypted_file_size + text encryption_method + text header + uuid id PK + timestamp_with_time_zone last_modified + name last_modified_by + text stable_id UK + text submission_file_path UK + bigint submission_file_size + text submission_user + } + + checksums }o--|| files : "file_id" + dataset_event_log }o--|| dataset_events : "event" + dataset_event_log }o--|| datasets : "dataset_id" + dataset_references 
}o--|| datasets : "dataset_id" + file_dataset }o--|| datasets : "dataset_id" + file_dataset }o--|| files : "file_id" + file_event_log }o--|| file_events : "event" + file_event_log }o--|| files : "file_id" + file_references }o--|| files : "file_id" +``` + +Database Functions +------------------ + +- `files_updated()` - When there is an update, update the last_modified and last_modified_by fields on the files table. + +- `register_file(submission_file_path TEXT, submission_user TEXT)` - Function for registering files on upload + +- `set_archived(file_uuid UUID, corr_id UUID, file_path TEXT, file_size BIGINT, inbox_checksum_value TEXT, inbox_checksum_type TEXT)` - function for registering files as archived, along with their original path in the inbox + +- `set_verified(file_uuid UUID, corr_id UUID, archive_checksum TEXT, archive_checksum_type TEXT, decrypted_size BIGINT, decrypted_checksum TEXT, decrypted_checksum_type TEXT)` - utilised to mark files as verified along with all the necessary checksum details (decrypted and archived versions) + + +Database schema migration +------------------------- For continuity/ease of upgrade in production the database supports automatic migrations between schema versions. 
This is handled by diff --git a/docs/dictionary/wordlist.txt b/docs/dictionary/wordlist.txt index 52f2a98..135e384 100644 --- a/docs/dictionary/wordlist.txt +++ b/docs/dictionary/wordlist.txt @@ -275,3 +275,12 @@ svc adminPassword adminUser postgresAdminPassword +autonumber +sequenceDiagram +BIGINT +FK +PGDATA +bigint +dbschema +erDiagram +jsonb diff --git a/docs/guides/deploy-k8s.md b/docs/guides/deploy-k8s.md index ce91c61..1c3ce8e 100644 --- a/docs/guides/deploy-k8s.md +++ b/docs/guides/deploy-k8s.md @@ -20,11 +20,11 @@ Differences in deployment make concrete examples challenging, here it is explain For secure deployment of the system one can think it by what can be accessed from where, for all ways of deploying two trust boundaries can be used, external and internal. For an extra layer of security also the storage trust boundary can be separate. The service is provided for customers on the internet therefore an example of deploying the service is using two separate Kubernetes clusters, one for responding to customers and other communication from outside, and another, more secure, storage facing internal cluster. -One thing to consider is where to release the data, that could be closed protected environment with tightly restricted access. If Data out is used to serve unencrypted files the recommendation is to have it available only in an internal cluster. +One thing to consider is where to release the data; this could be a closed, protected environment with tightly restricted access. If the Data Retrieval API is used to serve unencrypted files, the recommendation is to make it available only in an internal cluster.
The services could be divided into two trust boundaries - The services in external cluster are [Inbox](/docs/submission.md#submission-inbox ) and [MQ](/docs/connection.md#local-message-broker) -- The services in internal cluster are [Intercept](/docs/services/intercept.md), [Ingest](/docs/services/ingest.md), [Verify](/docs/services/verify.md), [Mapper](/docs/services/mapper.md), [Finalize](/docs/services/finalize.md), [Backup](/docs/services/backup.md) and [Data out](/docs/dataout.md). +- The services in internal cluster are [Intercept](/docs/services/intercept.md), [Ingest](/docs/services/ingest.md), [Verify](/docs/services/verify.md), [Mapper](/docs/services/mapper.md), [Finalize](/docs/services/finalize.md), [Backup](/docs/services/backup.md) and [Data Retrieval API](/docs/dataout.md). The innermost trust zone contains the database and the archive, which be can accessed only from internal cluster. diff --git a/docs/requirements.txt b/docs/requirements.txt index e26ccfa..51323de 100644 --- a/docs/requirements.txt +++ b/docs/requirements.txt @@ -1,3 +1,4 @@ mkdocs==1.5.3 mkdocs-include-markdown-plugin==6.0.4 markdown-callouts==0.3.0 +mkdocs-mermaid2-plugin==1.1.1 \ No newline at end of file diff --git a/docs/static/custom.css b/docs/static/custom.css deleted file mode 100644 index 04fbcbb..0000000 --- a/docs/static/custom.css +++ /dev/null @@ -1,36 +0,0 @@ -body { overflow-x: initial !important; } /* Fix readthedocs bug */ - -.wy-nav-content {max-width: 1000px;} - -.bolditalic { font-weight: bold; font-style:italic; } - -#ega { margin-bottom: 1.5em; } - - -#ega tr td { font-size: 0.85em; margin: 0; padding: 0.5em 1em; border: 1px solid #e1e4e5; text-align:left; white-space: initial; vertical-align:middle; } -#ega tr td:first-child { text-align:right; border:none; font-weight:bold; font-variant: small-caps; width: 15%;} -#ega tr td:nth-child(4) { text-align:center; border:none; } - - -.ega-stable { color: green; } -.ega-dev { color: orange; } -.ega-unstable { 
color: red; } - - -#ega thead { text-align:left; } -#ega thead tr th { padding: 0.5em; border: 1px solid #e1e4e5; background-color: #cfd0d0;} - - -/* override table width restrictions */ -@media screen and (min-width: 767px) { - - .wy-table-responsive table td { - /* !important prevents the common CSS stylesheets from overriding - this as on RTD they are loaded after this stylesheet */ - white-space: normal !important; - } - - .wy-table-responsive { - overflow: visible !important; - } - } \ No newline at end of file diff --git a/docs/static/doa-api.yml b/docs/static/doa-api.yml index 8c9925c..f0b3172 100644 --- a/docs/static/doa-api.yml +++ b/docs/static/doa-api.yml @@ -1,8 +1,8 @@ openapi: 3.0.0 info: - description: SDA Data Out API Documentation derived from EGA Data API + description: SDA Data Retrieval API Documentation derived from EGA Data API version: "1.0" - title: SDA Data Out API Documentation + title: SDA Data Retrieval API Documentation license: name: Apache 2.0 url: http://www.apache.org/licenses/LICENSE-2.0 diff --git a/docs/static/ingestion-sequence.svg b/docs/static/ingestion-sequence.svg deleted file mode 100644 index 943d6a1..0000000 --- a/docs/static/ingestion-sequence.svg +++ /dev/null @@ -1,3 +0,0 @@ - - -
[SVG markup of the deleted `ingestion-sequence.svg` omitted. Recoverable labels: Upload tool, Inbox, Central MQ, Local MQ, Intercept, Ingest, Verify, Backupsync, Finalize, Database; file statuses IN_INGESTION, ARCHIVED, COMPLETED, READY, ERROR.]
\ No newline at end of file diff --git a/docs/static/neic-sda-seq.drawio b/docs/static/neic-sda-seq.drawio deleted file mode 100644 index 9f1804a..0000000 --- a/docs/static/neic-sda-seq.drawio +++ /dev/null @@ -1 +0,0 @@ -[compressed draw.io diagram data omitted] \ No newline at end of file diff --git a/docs/structure.md b/docs/structure.md index adb8105..a5128e2 100644 --- a/docs/structure.md +++
b/docs/structure.md @@ -30,7 +30,7 @@ progress notifications whether the ingestion was successful, or whether there wa More details about the [Ingestion Workflow](submission.md#ingestion-workflow). Once a file has been successfully submitted and the ingestion process has been finalised, including receiving an `Accession ID` from Central -EGA. The Data Out API can be utilised to retrieve set file by utilising the `Accession ID`. More details in [Data Retrieval API](dataout.md#data-retrieval-api). +EGA. The Data Retrieval API can then be used to retrieve that file by its `Accession ID`. More details in [Data Retrieval API](dataout.md#data-retrieval-api). Inter-communication between services diff --git a/docs/submission.md b/docs/submission.md index 29be6bd..e509767 100644 --- a/docs/submission.md +++ b/docs/submission.md @@ -16,7 +16,77 @@ Structure of the message and its contents are described in ### Ingestion Workflow -![Ingestion sequence diagram](./static/ingestion-sequence.svg) +```mermaid + + sequenceDiagram + autonumber + participant Upload Tool + box SDA + participant Inbox + participant Ingest + participant Verify + participant Finalize + participant Mapper + participant SDA Database + participant Intercept + participant SDA RabbitMQ + end + box Central EGA + participant Central EGA RabbitMQ + end + Upload Tool->>Inbox: upload encrypted file + activate Inbox + Inbox-->>SDA RabbitMQ: msg: Upload Done + SDA RabbitMQ-->>Central EGA RabbitMQ: shovel msg:[to_cega][files.inbox] + deactivate Inbox + Central EGA RabbitMQ-->>SDA RabbitMQ: federated msg: [from_cega][ingest type] + SDA RabbitMQ-->>Intercept: Intercept reads message + Intercept->>Ingest: msg: [sda][ingest] begin ingestion + activate Ingest + Ingest->>SDA Database: mark ingested + opt + Ingest-->>SDA RabbitMQ: msg: error + SDA RabbitMQ-->>Central EGA RabbitMQ: shovel msg:[to_cega][files.error] + end + Ingest->>SDA Database: mark archived + Ingest-->>SDA RabbitMQ: msg [sda][archived] + deactivate
Ingest + activate Verify + SDA RabbitMQ-->>Verify: msg [sda][archived] triggers verify + opt + Verify-->>SDA RabbitMQ: msg: error + SDA RabbitMQ-->>Central EGA RabbitMQ: shovel msg:[to_cega][files.error] + end + Verify->>SDA Database: mark verified + Verify-->>SDA RabbitMQ: msg: [sda][verified] + deactivate Verify + SDA RabbitMQ-->>Central EGA RabbitMQ: shovel msg:[to_cega][files.verified] + Central EGA RabbitMQ-->>SDA RabbitMQ: federated msg: [from_cega][accession type] + SDA RabbitMQ-->>Intercept: Intercept reads message + Intercept->>Finalize: msg: [sda][accession] map file to accession ID + activate Finalize + note right of Finalize: Finalize makes the file backup + opt + Finalize-->>SDA RabbitMQ: msg: error + SDA RabbitMQ-->>Central EGA RabbitMQ: shovel msg:[to_cega][files.error] + end + Finalize->>SDA Database: mark completed + Finalize-->>SDA RabbitMQ: msg: [sda][completed] + deactivate Finalize + SDA RabbitMQ-->>Central EGA RabbitMQ: shovel msg:[to_cega][files.completed] + Central EGA RabbitMQ-->>SDA RabbitMQ: federated msg: [from_cega][mappings type] + SDA RabbitMQ-->>Intercept: Intercept reads message + Intercept->>Mapper: msg: [sda][mappings] map files to dataset + activate Mapper + opt + Mapper-->>SDA RabbitMQ: msg: error + SDA RabbitMQ-->>Central EGA RabbitMQ: shovel msg:[to_cega][files.error] + end + Mapper->>SDA Database: map file to dataset accession ID + Mapper->>Inbox: remove file from inbox + deactivate Mapper + +``` > NOTE: > Ingestion Workflow Legend diff --git a/mkdocs.yml b/mkdocs.yml index e7c8005..795bdb7 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -4,6 +4,7 @@ markdown_extensions: - callouts plugins: - include-markdown + - mermaid2 theme: name: readthedocs nav: