This API provides access to the Spinup Data Set service.
GET /v1/ds/ping
GET /v1/ds/version
GET /v1/ds/metrics
POST /v1/ds/{account}/datasets/{group}
GET /v1/ds/{account}/datasets/{group}/{id}
PATCH /v1/ds/{account}/datasets/{group}/{id}
PUT /v1/ds/{account}/datasets/{group}/{id}
DELETE /v1/ds/{account}/datasets/{group}/{id}
POST /v1/ds/{account}/datasets/{group}/{id}/attachments
DELETE /v1/ds/{account}/datasets/{group}/{id}/attachments
GET /v1/ds/{account}/datasets/{group}/{id}/attachments
GET /v1/ds/{account}/datasets/{group}/{id}/instances
POST /v1/ds/{account}/datasets/{group}/{id}/instances
DELETE /v1/ds/{account}/datasets/{group}/{id}/instances/{instance_id}
GET /v1/ds/{account}/datasets/{group}/{id}/logs
GET /v1/ds/{account}/datasets/{group}/{id}/users
POST /v1/ds/{account}/datasets/{group}/{id}/users
DELETE /v1/ds/{account}/datasets/{group}/{id}/users
PUT /v1/ds/{account}/datasets/{group}/{id}/users
POST /v1/ds/{account}/datasets/{group}
Request:
{
"name": "awesome-dataset-of-stuff",
"type": "s3",
"derivative": true,
"tags": [
{ "key": "Application", "value": "ButWhyyyyy" },
{ "key": "COA", "value": "Take.My.Money" },
{ "key": "CreatedBy", "value": "SomeGuy" }
],
"metadata": {
"description": "The hugest dataset of awesome stuff",
"created_at": "2018-03-28T07:36:01.123Z",
"created_by": "drzoidberg",
"data_classifications": ["hipaa","pii"],
"data_format": "file",
"dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
"modified_at": "2019-03-28T07:36:01.123Z",
"modified_by": "pfry",
"proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
"source_ids": ["e15d2282-9c68-46b5-801c-2b5a62484624", "a7c082ee-f711-48fa-8a57-25c95b3a6ddd"]
}
}
Response:
{
"id": "d37b375b-d136-4b17-8666-5036dc554a66",
"repository": "dataset-localdev-d37b375b-d136-4b17-8666-5036dc554a66",
"metadata": {
"id": "d37b375b-d136-4b17-8666-5036dc554a66",
"name": "awesome-dataset-of-stuff",
"description": "The hugest dataset of awesome stuff",
"created_at": "2020-03-11T18:41:32Z",
"created_by": "drzoidberg",
"data_classifications": [
"hipaa",
"pii"
],
"data_format": "file",
"data_storage": "s3",
"derivative": true,
"dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
"modified_at": "2020-03-11T18:41:32Z",
"modified_by": "pfry",
"proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
"source_ids": [
"e15d2282-9c68-46b5-801c-2b5a62484624",
"a7c082ee-f711-48fa-8a57-25c95b3a6ddd"
]
}
}
Response Code | Definition |
---|---|
202 Accepted | creation request accepted |
400 Bad Request | badly formed request |
403 Forbidden | you don't have access to the bucket |
404 Not Found | account not found |
409 Conflict | bucket or iam policy already exists |
429 Too Many Requests | service or rate limit exceeded |
500 Internal Server Error | a server error occurred |
503 Service Unavailable | an AWS service is unavailable |
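
As a minimal sketch of how a client might assemble this create request, the snippet below builds (but does not send) the POST using only the Python standard library. The host `ds.example.edu`, the account and group names, the token value, and the `Content-Type` header are illustrative assumptions; only the path layout, the payload fields, and the `X-Auth-Token` header come from this README.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://ds.example.edu"  # hypothetical host, not from this README


def build_create_request(account, group, token, payload):
    """Build (without sending) a POST request that asks for a new dataset."""
    # Path segments are percent-encoded so arbitrary group strings stay valid.
    url = f"{BASE_URL}/v1/ds/{quote(account, safe='')}/datasets/{quote(group, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",  # assumed, not documented here
            "X-Auth-Token": token,
        },
    )


# Placeholder values for illustration only.
payload = {
    "name": "awesome-dataset-of-stuff",
    "type": "s3",
    "derivative": True,
    "tags": [{"key": "CreatedBy", "value": "SomeGuy"}],
    "metadata": {"description": "The hugest dataset of awesome stuff"},
}
req = build_create_request("myaccount", "mygroup", "not-a-real-token", payload)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; a `202 Accepted` reply means the creation request was accepted.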
GET /v1/ds/{account}/datasets/{group}/{id}
{
"id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
"metadata": {
"id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
"name": "awesome-dataset-of-stuff",
"description": "The hugest dataset of awesome stuff",
"created_at": "2020-03-16T15:38:14Z",
"created_by": "drzoidberg",
"data_classifications": [
"hipaa",
"pii"
],
"data_format": "file",
"data_storage": "s3",
"derivative": true,
"dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
"modified_at": "2020-03-16T15:38:14Z",
"modified_by": "pfry",
"proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
"source_ids": [
"d37b375b-d136-4b17-8666-5036dc554a66"
]
},
"repository": {
"name": "dataset-localdev-bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
"empty": false,
"tags": [
{
"key": "CreatedBy",
"value": "SomeGuy"
},
{
"key": "spinup:org",
"value": "localdev"
},
{
"key": "ID",
"value": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8"
},
{
"key": "COA",
"value": "Take.My.Money"
},
{
"key": "Application",
"value": "ButWhyyyyy"
},
{
"key": "Name",
"value": "awesome-dataset-of-stuff"
}
]
}
}
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request |
404 Not Found | dataset not found |
500 Internal Server Error | a server error occurred |
PATCH /v1/ds/{account}/datasets/{group}/{id}
Headers:
X-Forwarded-User: awong
Response:
{
"id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
"metadata": {
"id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
"name": "awesome-dataset-of-stuff",
"description": "The hugest dataset of awesome stuff",
"created_at": "2020-03-16T15:38:14Z",
"created_by": "drzoidberg",
"data_classifications": [
"hipaa",
"pii"
],
"data_format": "file",
"data_storage": "s3",
"derivative": false,
"dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
"finalized_at": "2020-06-01T19:27:35Z",
"finalized_by": "awong",
"modified_at": "2020-06-01T19:27:35Z",
"modified_by": "awong",
"proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
"source_ids": [
"d37b375b-d136-4b17-8666-5036dc554a66"
]
}
}
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request |
404 Not Found | dataset not found |
409 Conflict | dataset already finalized |
500 Internal Server Error | a server error occurred |
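
The sample responses suggest that a finalized dataset carries `finalized_at` and `finalized_by` in its metadata, while the earlier GET example has neither. Assuming that holds in general (it is inferred from the samples, not stated by the API), a client could check the metadata before sending a PATCH and so avoid a `409 Conflict`. The helper name `is_finalized` is hypothetical.

```python
def is_finalized(dataset):
    """Return True if the dataset's metadata has a non-empty finalized_at value.

    Assumption: finalized datasets always include metadata.finalized_at,
    as the sample PATCH response does.
    """
    return bool(dataset.get("metadata", {}).get("finalized_at"))


before = {"id": "bb4f6316", "metadata": {"derivative": True}}
after = {"id": "bb4f6316", "metadata": {"finalized_at": "2020-06-01T19:27:35Z"}}
print(is_finalized(before), is_finalized(after))
```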
PUT /v1/ds/{account}/datasets/{group}/{id}
Headers:
X-Forwarded-User: awong
Request:
{
"metadata": {
"description": "It's actually a tiny dataset"
}
}
Response:
{
"id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
"metadata": {
"id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
"name": "awesome-dataset-of-stuff",
"description": "It's actually a tiny dataset",
"created_at": "2020-03-16T15:38:14Z",
"created_by": "drzoidberg",
"data_classifications": [
"hipaa",
"pii"
],
"data_format": "file",
"data_storage": "s3",
"derivative": false,
"dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
"finalized_at": "2020-06-01T19:27:35Z",
"finalized_by": "awong",
"modified_at": "2020-06-01T21:31:05Z",
"modified_by": "awong",
"proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
"source_ids": [
"d37b375b-d136-4b17-8666-5036dc554a66"
]
}
}
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request |
404 Not Found | dataset not found |
500 Internal Server Error | a server error occurred |
DELETE /v1/ds/{account}/datasets/{group}/{id}
Headers:
X-Forwarded-User: awong
Response Code | Definition |
---|---|
204 No Content | okay |
400 Bad Request | badly formed request |
404 Not Found | dataset not found |
500 Internal Server Error | a server error occurred |
POST /v1/ds/{account}/datasets/{group}/{id}/attachments
The request must be sent as `multipart/form-data` with the following parameters:
- `name`: the name of the attachment as it should be saved, e.g. `eula.txt`
- `attachment`: the content of the file being uploaded
[
"eula.txt"
]
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request, or file too big |
404 Not Found | dataset not found |
500 Internal Server Error | a server error occurred |
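
For illustration, the two form fields described above can be encoded by hand with the standard library. This is a sketch, not the service's reference client: the function name `encode_multipart` is hypothetical, the `filename` parameter and the `application/octet-stream` part type are assumptions, and the attachment name is assumed to contain no double quotes.

```python
import uuid


def encode_multipart(name, content, boundary=None):
    """Encode the `name` and `attachment` fields as a multipart/form-data body.

    Returns a (content_type, body) tuple; `content` must be bytes.
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = f"--{boundary}".encode("ascii")
    lines = [
        delimiter,
        b'Content-Disposition: form-data; name="name"',
        b"",
        name.encode("utf-8"),
        delimiter,
        # Reuses `name` as the filename; assumes it contains no double quotes.
        f'Content-Disposition: form-data; name="attachment"; filename="{name}"'.encode("utf-8"),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)


content_type, body = encode_multipart("eula.txt", b"example license text")
print(content_type)
```

The returned `content_type` goes in the request's `Content-Type` header and `body` is sent as the POST payload.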
DELETE /v1/ds/{account}/datasets/{group}/{id}/attachments
{
"attachment_name": "dummy.doc"
}
Response Code | Definition |
---|---|
204 No Content | attachment deleted, if it existed |
400 Bad Request | bad request |
404 Not Found | account/dataset not found |
500 Internal Server Error | a server error occurred |
GET /v1/ds/{account}/datasets/{group}/{id}/attachments
[
{
"Name": "Dataset Data Use Agreement.pdf",
"Modified": "2020-05-17T02:04:27Z",
"Size": 3708454,
"URL": "https://dataset-localdev-3cadbe31-27e9-4f7a-9515-51ec9d754022.s3.amazonaws.com/_attachments/Dataset%20Data%20Use%20Agreement.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAXQVXYEBXA5X5LRN3%2F20200518%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200518T132423Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=342d937b7b726408c2efe41493d126ea577204f85ffe77ffc9b3cf22af80c7ea"
},
{
"Name": "eula.txt",
"Modified": "2020-05-18T13:19:34Z",
"Size": 6920,
"URL": "https://dataset-localdev-3cadbe31-27e9-4f7a-9515-51ec9d754022.s3.amazonaws.com/_attachments/eula.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAXQVXYEBXA5X5LRN3%2F20200518%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200518T132423Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=c2d7f7165ce3c099e8eefcb14e3b4c7e0e6a319af48d6727f25519f35488b14a"
}
]
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request |
404 Not Found | account/dataset not found |
500 Internal Server Error | a server error occurred |
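
The `URL` values in the sample are S3 presigned URLs, and their `X-Amz-Date` and `X-Amz-Expires` query parameters give the signing time and the lifetime in seconds (300 in the sample, so five minutes). The sketch below shows one way a client could work out when such a link stops being valid. The function name and the shortened example URL are illustrative, not part of the API.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url):
    """Return the UTC time at which a SigV4 presigned URL expires."""
    query = parse_qs(urlsplit(url).query)
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at.replace(tzinfo=timezone.utc) + lifetime


# Shortened, hypothetical URL carrying the same date and expiry as the sample.
url = (
    "https://example-bucket.s3.amazonaws.com/_attachments/eula.txt"
    "?X-Amz-Date=20200518T132423Z&X-Amz-Expires=300"
)
print(presigned_url_expiry(url).isoformat())  # 2020-05-18T13:29:23+00:00
```

Because the links are short-lived, a client would typically fetch the attachment list again rather than cache the URLs.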
GET /v1/ds/{account}/datasets/{group}/{id}/instances
{
"id": "95db5a7b-466b-4aa7-bbe1-1e23ed860f32",
"access": {
"i-01f9bfb7ee683e807": "instanceRole_i-01f9bfb7ee683e807"
}
}
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request |
404 Not Found | account/dataset not found |
500 Internal Server Error | a server error occurred |
POST /v1/ds/{account}/datasets/{group}/{id}/instances
Request:
{
"instance_id": "i-01f9bfb7ee683e807"
}
Response:
{
"id": "95db5a7b-466b-4aa7-bbe1-1e23ed860f32",
"access": {
"i-01f9bfb7ee683e807": "instanceRole_i-01f9bfb7ee683e807"
}
}
Response Code | Definition |
---|---|
200 OK | instance access granted |
400 Bad Request | badly formed request |
404 Not Found | account/dataset not found |
500 Internal Server Error | a server error occurred |
DELETE /v1/ds/{account}/datasets/{group}/{id}/instances/{instance_id}
Response Code | Definition |
---|---|
204 No Content | instance access revoked |
400 Bad Request | bad request, or instance doesn't have access |
404 Not Found | account/dataset not found |
500 Internal Server Error | a server error occurred |
GET /v1/ds/{account}/datasets/{group}/{id}/logs
[
"11/19/2020, 17:07:28 - Created dataset 3819c173-e1a8-4fe5-b55c-b224bb86ddbd (CreatedBy: drzoidberg)",
"11/19/2020, 17:51:39 - Updated metadata for dataset 3819c173-e1a8-4fe5-b55c-b224bb86ddbd (ModifiedBy: awong)",
"11/19/2020, 17:56:33 - Finalized original dataset 3819c173-e1a8-4fe5-b55c-b224bb86ddbd (ModifiedBy: me)"
]
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request |
404 Not Found | account/dataset not found |
500 Internal Server Error | a server error occurred |
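
The log entries in the sample follow a regular shape: a timestamp, a message, and a trailing `(Field: user)` note. Assuming that shape is stable (it is inferred from the three sample lines only, and the timestamp's time zone is not stated), a client could split entries into fields as sketched below. The names `LOG_LINE` and `parse_log_entry` are hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the sample entries; the trailing "(Field: user)" is optional.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4}, \d{2}:\d{2}:\d{2}) - "
    r"(?P<message>.*?)"
    r"(?: \((?P<field>\w+): (?P<actor>[^)]+)\))?$"
)


def parse_log_entry(line):
    """Split one log entry into timestamp, message, and acting user."""
    match = LOG_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized log entry: {line!r}")
    return {
        # Naive datetime: the sample does not say which time zone is used.
        "timestamp": datetime.strptime(match["ts"], "%m/%d/%Y, %H:%M:%S"),
        "message": match["message"],
        "field": match["field"],
        "actor": match["actor"],
    }


entry = parse_log_entry(
    "11/19/2020, 17:51:39 - Updated metadata for dataset "
    "3819c173-e1a8-4fe5-b55c-b224bb86ddbd (ModifiedBy: awong)"
)
print(entry["timestamp"], entry["actor"])
```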
POST /v1/ds/{account}/datasets/{group}/{id}/users
Request body is empty.
Response:
{
"user": "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpUsr",
"group": "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpGrp",
"policy": "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpPlc",
"credentials": {
"akid": "XXXXXXXXXXXXXXXXXXXX",
"secret": "secretsecretsecretsecretsecretsecret"
}
}
Response Code | Definition |
---|---|
200 OK | user created |
400 Bad Request | badly formed request |
404 Not Found | account/dataset not found |
409 Conflict | user already exists |
500 Internal Server Error | a server error occurred |
DELETE /v1/ds/{account}/datasets/{group}/{id}/users
Response Code | Definition |
---|---|
200 OK | user deleted |
400 Bad Request | badly formed request |
404 Not Found | account/dataset/user not found |
500 Internal Server Error | a server error occurred |
GET /v1/ds/{account}/datasets/{group}/{id}/users
{
"dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpUsr": {
"keys": {
"XXXXXXXXXXXXXXXXXXXX": "Inactive",
"YYYYYYYYYYYYYYYYYYYY": "Active"
}
}
}
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request |
404 Not Found | account/dataset/user not found |
500 Internal Server Error | a server error occurred |
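
The response maps each user name to its access keys and their status. As a small sketch of reading that structure, the hypothetical helper below collects the keys marked `Active` for each user. It assumes the status strings are exactly `Active` and `Inactive`, as in the sample.

```python
def active_keys(users_response):
    """Map each user name to the list of access key IDs marked "Active"."""
    return {
        user: [key_id for key_id, status in info.get("keys", {}).items() if status == "Active"]
        for user, info in users_response.items()
    }


sample = {
    "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpUsr": {
        "keys": {
            "XXXXXXXXXXXXXXXXXXXX": "Inactive",
            "YYYYYYYYYYYYYYYYYYYY": "Active",
        }
    }
}
print(active_keys(sample))
```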
PUT /v1/ds/{account}/datasets/{group}/{id}/users
Request body is empty.
Response:
{
"keys": {
"XXXXXXXXXXXXXXXXXXXXX": "Inactive"
},
"credentials": {
"akid": "YYYYYYYYYYYYYYYYYYYYY",
"secret": "secretsecretsecretsecretsecretsecret"
}
}
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request |
404 Not Found | account/dataset not found |
429 Too Many Requests | maximum number of keys reached |
500 Internal Server Error | a server error occurred |
Authentication is accomplished using a pre-shared key (hashed string) in the `X-Auth-Token` header.
The API is configured via `config/config.json`; an example config file is provided.
You can specify a single `metadataRepository` where metadata about all the data sets will be stored. Currently, the only supported type is `s3`, so you need to provide an S3 bucket and credentials with full access to that bucket. For example, if you created a bucket called `spinup-example-metadata-repository`, then the IAM policy would be:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::spinup-example-metadata-repository",
"arn:aws:s3:::spinup-example-metadata-repository/*"
]
}
]
}
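
If the metadata bucket has a different name, the same policy can be generated rather than edited by hand. The sketch below reproduces the policy shown above for an arbitrary bucket name; the function name `metadata_repository_policy` is hypothetical.

```python
import json


def metadata_repository_policy(bucket):
    """Return an IAM policy granting full S3 access to one bucket and its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # the bucket itself
                    f"arn:aws:s3:::{bucket}/*",  # every object in it
                ],
            }
        ],
    }


print(json.dumps(metadata_repository_policy("spinup-example-metadata-repository"), indent=2))
```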
You can then define a list of `accounts` for the actual dataset repositories, which is where the data sets will be stored. Currently, the only supported type is `s3`, so you need to provide credentials in each account with the appropriate S3 and IAM access. This is a good starting IAM policy if you don't modify the default name and path prefixes:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iam:*",
"Resource": [
"arn:aws:iam::*:role/spinup/dataset/*",
"arn:aws:iam::*:instance-profile/spinup/dataset/*",
"arn:aws:iam::*:group/spinup/dataset/*",
"arn:aws:iam::*:user/spinup/dataset/*",
"arn:aws:iam::*:policy/spinup/dataset/*"
]
},
{
"Effect": "Allow",
"Action": [
"iam:GetRole",
"iam:GetInstanceProfile",
"iam:ListAttachedRolePolicies",
"iam:PassRole"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "s3:*",
"Resource": [
"arn:aws:s3::*:dataset-*"
]
},
{
"Effect": "Allow",
"Action": [
"ec2:AssociateIamInstanceProfile",
"ec2:DescribeIamInstanceProfileAssociations",
"ec2:DescribeInstances",
"ec2:DisassociateIamInstanceProfile"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "logs:CreateLogGroup",
"Resource": "arn:aws:logs:*:*:log-group:/spinup/ORG/*"
},
{
"Effect": "Allow",
"Action": [
"logs:ListTagsLogGroup",
"logs:CreateLogStream",
"logs:TagLogGroup",
"logs:DescribeLogGroups",
"logs:DeleteLogGroup",
"logs:DescribeLogStreams",
"logs:GetLogEvents",
"logs:PutRetentionPolicy",
"logs:PutLogEvents"
],
"Resource": [
"arn:aws:logs:*:*:log-group:/spinup/ORG/*:log-stream:*",
"arn:aws:logs:*:*:log-group:/spinup/ORG/*"
]
}
]
}
When creating a data set, you need to specify the group it belongs to. The group can be any arbitrary string; it provides a way to group similar data sets together (e.g. data sets that are part of the same application or department). Currently, the group is only used for logging purposes, but eventually it will play a more significant role.
E Camden Fisher, Tenyo Grozev
GNU Affero General Public License v3.0 (GNU AGPLv3)
Copyright (c) 2020 Yale University