docker-compose -f compose-no-tls.yml up -d
For a complete test of the pipeline
First create the necessary credentials.
To start all the backend services using docker compose:
docker-compose -f compose-sda.yml up -d db mq s3
To start all the sda services using docker compose:
docker-compose -f compose-sda.yml up -d
To start a single sda service, for example ingest:
docker-compose -f compose-sda.yml up -d ingest
To see brief real-time logs in the terminal, remove the -d flag.
For step-by-step tests, follow the instructions below.
Upload the dummy datafile to the s3 inbox under the folder /test.
s3cmd -c s3cmd.conf put "dummy_data.c4gh" s3://inbox/test/dummy_data.c4gh
Browse the s3 buckets at:
To start the ingestion of the dummy datafile, a message needs to be published to the files routing key of the sda exchange, either via the API or the webui.
curl --cacert certs/ca.pem -vvv -u test:test 'https://localhost:15672/api/exchanges/test/sda/publish' -H 'Content-Type: application/json;charset=UTF-8' --data-binary '{"vhost":"test","name":"sda","properties":{"delivery_mode":2,"correlation_id":"1","content_encoding":"UTF-8","content_type":"application/json"},"routing_key":"files","payload":"{\"type\": \"ingest\", \"user\": \"test\", \"filepath\": \"test/dummy_data.c4gh\", \"encrypted_checksums\":[{\"type\":\"sha256\", \"value\":\"5e9c767958cc3f6e8d16512b8b8dcab855ad1e04e05798b86f50ef600e137578\"}, {\"type\": \"md5\", \"value\": \"b60fa2486b121bed8d566bacec987e0d\"}]}","payload_encoding":"string"}'
More examples using the API and relevant message properties can be found in the integration test script.
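The body sent to the publish endpoint is JSON whose payload field is itself a JSON document serialized into a string, which is why the inner quotes in the curl command are escaped. The following is a minimal Python sketch of building that body without escaping by hand. The helper name build_publish_body is hypothetical; the field values are copied from the curl command above, and the two checksums are written as two separate objects.

```python
import json


def build_publish_body(routing_key, message, correlation_id="1"):
    """Wrap a message dict in the body shape used by the curl example.

    Hypothetical helper: vhost, exchange name and properties are the
    values from the curl command in this document.
    """
    return {
        "vhost": "test",
        "name": "sda",
        "properties": {
            "delivery_mode": 2,
            "correlation_id": correlation_id,
            "content_encoding": "UTF-8",
            "content_type": "application/json",
        },
        "routing_key": routing_key,
        # The payload is a JSON document serialized into a string.
        "payload": json.dumps(message),
        "payload_encoding": "string",
    }


ingest_message = {
    "type": "ingest",
    "user": "test",
    "filepath": "test/dummy_data.c4gh",
    "encrypted_checksums": [
        {
            "type": "sha256",
            "value": "5e9c767958cc3f6e8d16512b8b8dcab855ad1e04e05798b86f50ef600e137578",
        },
        {"type": "md5", "value": "b60fa2486b121bed8d566bacec987e0d"},
    ],
}

body = build_publish_body("files", ingest_message)
print(json.dumps(body))
```

If this is saved as a script, its output can be piped into the curl command by replacing the inline JSON with --data-binary @-, which makes curl read the body from standard input.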
Alternatively, to access the webui go to:
{
  "type": "ingest",
  "user": "test",
  "filepath": "test/dummy_data.c4gh",
  "encrypted_checksums": [
    { "type": "sha256", "value": "5e9c767958cc3f6e8d16512b8b8dcab855ad1e04e05798b86f50ef600e137578" },
    { "type": "md5", "value": "b60fa2486b121bed8d566bacec987e0d" }
  ]
}
This step is triggered automatically by ingestion when all needed services are running. To initiate the verification of the dummy datafile manually, a message needs to be published to the files routing key of the sda exchange:
{
  "user": "test",
  "filepath": "test/dummy_data.c4gh",
  "file_id": "1",
  "archive_path": "123e4567-e89b-12d3-a456-426614174000",
  "encrypted_checksums": [
    { "type": "sha256", "value": "5e9c767958cc3f6e8d16512b8b8dcab855ad1e04e05798b86f50ef600e137578" },
    { "type": "md5", "value": "b60fa2486b121bed8d566bacec987e0d" }
  ]
}
The value of the archive path can be found by fetching from the queue the message that was published when the header-stripped datafile was archived, either by using the API or the webui. This value is the name of the header-stripped file that is created in the archive bucket.
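Fetching a message through the API can be sketched with the RabbitMQ management endpoint for reading from a queue. The sketch below is not taken from this repository: the function names are hypothetical, the host, vhost, credentials and CA file are the ones used in the curl command above, and the queue name must be supplied by the caller. The ackmode ack_requeue_true puts the message back on the queue after reading it.

```python
import base64
import json
import ssl
import urllib.parse
import urllib.request


def build_get_request(queue, base_url="https://localhost:15672",
                      vhost="test", user="test", password="test"):
    """Build a POST request that reads one message and requeues it."""
    url = "{}/api/queues/{}/{}/get".format(
        base_url,
        urllib.parse.quote(vhost, safe=""),
        urllib.parse.quote(queue, safe=""),
    )
    body = json.dumps(
        {"count": 1, "ackmode": "ack_requeue_true", "encoding": "auto"}
    ).encode("utf-8")
    token = base64.b64encode(
        "{}:{}".format(user, password).encode("utf-8")
    ).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Basic " + token,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def extract_field(response_text, field):
    """Return `field` from the JSON payload of the first fetched message."""
    messages = json.loads(response_text)
    if not messages:
        raise LookupError("the queue returned no messages")
    return json.loads(messages[0]["payload"])[field]


def fetch_field(queue, field, cafile="certs/ca.pem"):
    """Read one message from `queue` over TLS and return `field` from it."""
    context = ssl.create_default_context(cafile=cafile)
    request = build_get_request(queue)
    with urllib.request.urlopen(request, context=context) as response:
        return extract_field(response.read().decode("utf-8"), field)
```

With the services running, a call such as fetch_field("archived", "archive_path") would return the archive path. The queue name "archived" is an assumption here; check the queue list in the webui for the actual name.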
To finalize ingestion of the dummy datafile, a message needs to be published to the files routing key of the sda exchange:
{
  "type": "accession",
  "user": "test",
  "filepath": "test/dummy_data.c4gh",
  "accession_id": "EGAF00123456789",
  "decrypted_checksums": [
    { "type": "sha256", "value": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" },
    { "type": "md5", "value": "d41d8cd98f00b204e9800998ecf8427e" }
  ]
}
The values of the decrypted datafile checksums can be found by fetching from the queue the message that was published at verification, either by using the API or the webui. After ingestion is finalized, the archive is backed up to the backup bucket, which then contains the header-stripped datafile.
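The checksum entries in these messages are sha256 and md5 digests written as hexadecimal strings. The two decrypted_checksums values shown in the accession message are the well-known digests of empty input, so they work as placeholders only; the real values come from the message published at verification. The following minimal sketch produces entries in the same shape. The helper names are hypothetical, and hashing the uploaded file to obtain encrypted_checksums is an assumption about how those values are derived.

```python
import hashlib


def checksum_entries(chunks):
    """Return sha256 and md5 entries for an iterable of byte chunks."""
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    for chunk in chunks:
        sha256.update(chunk)
        md5.update(chunk)
    return [
        {"type": "sha256", "value": sha256.hexdigest()},
        {"type": "md5", "value": md5.hexdigest()},
    ]


def file_chunks(path, size=1 << 20):
    """Yield the contents of a file in chunks of at most `size` bytes."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(size)
            if not chunk:
                return
            yield chunk


# Empty input reproduces the placeholder digests in the accession message.
print(checksum_entries([]))
```

For a file on disk, checksum_entries(file_chunks("dummy_data.c4gh")) hashes it without loading it into memory at once.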
To register the mapping of the datafile accession IDs to a dataset ID in the database, a message needs to be published to the files routing key of the sda exchange:
{
  "type": "mapping",
  "dataset_id": "EGAD00123456789",
  "accession_ids": ["EGAF00123456789"]
}