YaleSpinup/ds-api

ds-api

This API provides access to the Spinup Data Set service.

Endpoints

GET /v1/ds/ping
GET /v1/ds/version
GET /v1/ds/metrics

POST /v1/ds/{account}/datasets/{group}
GET /v1/ds/{account}/datasets/{group}/{id}
PATCH /v1/ds/{account}/datasets/{group}/{id}
PUT /v1/ds/{account}/datasets/{group}/{id}
DELETE /v1/ds/{account}/datasets/{group}/{id}

POST /v1/ds/{account}/datasets/{group}/{id}/attachments
DELETE /v1/ds/{account}/datasets/{group}/{id}/attachments
GET /v1/ds/{account}/datasets/{group}/{id}/attachments

GET /v1/ds/{account}/datasets/{group}/{id}/instances
POST /v1/ds/{account}/datasets/{group}/{id}/instances
DELETE /v1/ds/{account}/datasets/{group}/{id}/instances/{instance_id}

GET /v1/ds/{account}/datasets/{group}/{id}/logs

GET /v1/ds/{account}/datasets/{group}/{id}/users
POST /v1/ds/{account}/datasets/{group}/{id}/users
DELETE /v1/ds/{account}/datasets/{group}/{id}/users
PUT /v1/ds/{account}/datasets/{group}/{id}/users

Usage

Create a dataset

POST /v1/ds/{account}/datasets/{group}

Request:

{
    "name": "awesome-dataset-of-stuff",
    "type": "s3",
    "derivative": true,
    "tags": [
        { "key": "Application", "value": "ButWhyyyyy" },
        { "key": "COA", "value": "Take.My.Money" },
        { "key": "CreatedBy", "value": "SomeGuy" }
    ],
    "metadata": {
        "description": "The hugest dataset of awesome stuff",
        "created_at": "2018-03-28T07:36:01.123Z",
        "created_by": "drzoidberg",
        "data_classifications": ["hipaa","pii"],
        "data_format": "file",
        "dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
        "modified_at": "2019-03-28T07:36:01.123Z",
        "modified_by": "pfry",
        "proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
        "source_ids": ["e15d2282-9c68-46b5-801c-2b5a62484624", "a7c082ee-f711-48fa-8a57-25c95b3a6ddd"]
    }
}

Response

{
    "id": "d37b375b-d136-4b17-8666-5036dc554a66",
    "repository": "dataset-localdev-d37b375b-d136-4b17-8666-5036dc554a66",
    "metadata": {
        "id": "d37b375b-d136-4b17-8666-5036dc554a66",
        "name": "awesome-dataset-of-stuff",
        "description": "The hugest dataset of awesome stuff",
        "created_at": "2020-03-11T18:41:32Z",
        "created_by": "drzoidberg",
        "data_classifications": [
            "hipaa",
            "pii"
        ],
        "data_format": "file",
        "data_storage": "s3",
        "derivative": true,
        "dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
        "modified_at": "2020-03-11T18:41:32Z",
        "modified_by": "pfry",
        "proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
        "source_ids": [
            "e15d2282-9c68-46b5-801c-2b5a62484624",
            "a7c082ee-f711-48fa-8a57-25c95b3a6ddd"
        ]
    }
}

| Response Code | Definition |
| --- | --- |
| 202 Accepted | creation request accepted |
| 400 Bad Request | badly formed request |
| 403 Forbidden | you don't have access to bucket |
| 404 Not Found | account not found |
| 409 Conflict | bucket or iam policy already exists |
| 429 Too Many Requests | service or rate limit exceeded |
| 500 Internal Server Error | a server error occurred |
| 503 Service Unavailable | an AWS service is unavailable |
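
As a rough illustration, the create request could be assembled with Python's standard library as sketched below. The host `ds-api.example.com`, the account and group names, the token value, and the `Content-Type` header are placeholders or assumptions, not values from this README; the `X-Auth-Token` header follows the Authentication section. The request is only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://ds-api.example.com"  # hypothetical host, not from this README


def build_create_request(account, group, body, token):
    """Build (but do not send) a POST request that creates a dataset."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/ds/{account}/datasets/{group}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        # Content-Type is an assumption; X-Auth-Token carries the pre-shared key.
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
    )


req = build_create_request(
    "myaccount",  # placeholder account
    "mygroup",  # placeholder group
    {"name": "awesome-dataset-of-stuff", "type": "s3", "derivative": True},
    token="pre-shared-key",  # placeholder token
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; per the table above, a 202 status means the creation request was accepted.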

Get information about a dataset

GET /v1/ds/{account}/datasets/{group}/{id}

Response

{
    "id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
    "metadata": {
        "id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
        "name": "awesome-dataset-of-stuff",
        "description": "The hugest dataset of awesome stuff",
        "created_at": "2020-03-16T15:38:14Z",
        "created_by": "drzoidberg",
        "data_classifications": [
            "hipaa",
            "pii"
        ],
        "data_format": "file",
        "data_storage": "s3",
        "derivative": true,
        "dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
        "modified_at": "2020-03-16T15:38:14Z",
        "modified_by": "pfry",
        "proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
        "source_ids": [
            "d37b375b-d136-4b17-8666-5036dc554a66"
        ]
    },
    "repository": {
        "name": "dataset-localdev-bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
        "empty": false,
        "tags": [
            {
                "key": "CreatedBy",
                "value": "SomeGuy"
            },
            {
                "key": "spinup:org",
                "value": "localdev"
            },
            {
                "key": "ID",
                "value": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8"
            },
            {
                "key": "COA",
                "value": "Take.My.Money"
            },
            {
                "key": "Application",
                "value": "ButWhyyyyy"
            },
            {
                "key": "Name",
                "value": "awesome-dataset-of-stuff"
            }
        ]
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | dataset not found |
| 500 Internal Server Error | a server error occurred |

Promote a dataset

PATCH /v1/ds/{account}/datasets/{group}/{id}

Headers:

X-Forwarded-User: awong

Response

{
    "id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
    "metadata": {
        "id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
        "name": "awesome-dataset-of-stuff",
        "description": "The hugest dataset of awesome stuff",
        "created_at": "2020-03-16T15:38:14Z",
        "created_by": "drzoidberg",
        "data_classifications": [
            "hipaa",
            "pii"
        ],
        "data_format": "file",
        "data_storage": "s3",
        "derivative": false,
        "dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
        "finalized_at": "2020-06-01T19:27:35Z",
        "finalized_by": "awong",
        "modified_at": "2020-06-01T19:27:35Z",
        "modified_by": "awong",
        "proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
        "source_ids": [
            "d37b375b-d136-4b17-8666-5036dc554a66"
        ]
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | dataset not found |
| 409 Conflict | dataset already finalized |
| 500 Internal Server Error | a server error occurred |

Update dataset metadata

PUT /v1/ds/{account}/datasets/{group}/{id}

Headers:

X-Forwarded-User: awong

Request:

{
	"metadata": {
		"description": "It's actually a tiny dataset"
	}
}

Response

{
    "id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
    "metadata": {
        "id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
        "name": "awesome-dataset-of-stuff",
        "description": "It's actually a tiny dataset",
        "created_at": "2020-03-16T15:38:14Z",
        "created_by": "drzoidberg",
        "data_classifications": [
            "hipaa",
            "pii"
        ],
        "data_format": "file",
        "data_storage": "s3",
        "derivative": false,
        "dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
        "finalized_at": "2020-06-01T19:27:35Z",
        "finalized_by": "awong",
        "modified_at": "2020-06-01T21:31:05Z",
        "modified_by": "awong",
        "proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
        "source_ids": [
            "d37b375b-d136-4b17-8666-5036dc554a66"
        ]
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | dataset not found |
| 500 Internal Server Error | a server error occurred |

Delete a dataset

DELETE /v1/ds/{account}/datasets/{group}/{id}

Headers:

X-Forwarded-User: awong

| Response Code | Definition |
| --- | --- |
| 204 No Content | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | dataset not found |
| 500 Internal Server Error | a server error occurred |

Create attachment for a dataset

POST /v1/ds/{account}/datasets/{group}/{id}/attachments

The request must be sent as multipart/form-data with the following parameters:

  • name - the name of the attachment as it should be saved, e.g. eula.txt
  • attachment - the content of the file being uploaded

Response

[
    "eula.txt"
]

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request, or file too big |
| 404 Not Found | dataset not found |
| 500 Internal Server Error | a server error occurred |
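
A minimal sketch of encoding the two form fields (`name` and `attachment`) as a multipart/form-data body using only the Python standard library. The `filename` parameter and the `application/octet-stream` part type are assumptions about what the server accepts; the helper name `build_multipart` is hypothetical.

```python
import uuid


def build_multipart(name, content):
    """Encode the `name` and `attachment` fields as a multipart/form-data body.

    Returns the Content-Type header value and the encoded body bytes.
    `name` must not contain double quotes or line breaks in this simple sketch.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="name"\r\n\r\n'
        f"{name}\r\n"
        f"--{boundary}\r\n"
        # The filename parameter and part Content-Type are assumptions.
        f'Content-Disposition: form-data; name="attachment"; filename="{name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", head + content + tail


content_type, body = build_multipart("eula.txt", b"example license text\n")
print(content_type)
print(len(body), "bytes")
```

The returned `content_type` would go in the request's `Content-Type` header and `body` in the POST payload.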

Delete attachment from a dataset

DELETE /v1/ds/{account}/datasets/{group}/{id}/attachments

Request:

{
	"attachment_name": "dummy.doc"
}

Response

| Response Code | Definition |
| --- | --- |
| 204 No Content | attachment deleted, if it existed |
| 400 Bad Request | bad request |
| 404 Not Found | account/dataset not found |
| 500 Internal Server Error | a server error occurred |

Get attachments for a dataset

GET /v1/ds/{account}/datasets/{group}/{id}/attachments

Response

[
    {
        "Name": "Dataset Data Use Agreement.pdf",
        "Modified": "2020-05-17T02:04:27Z",
        "Size": 3708454,
        "URL": "https://dataset-localdev-3cadbe31-27e9-4f7a-9515-51ec9d754022.s3.amazonaws.com/_attachments/Dataset%20Data%20Use%20Agreement.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAXQVXYEBXA5X5LRN3%2F20200518%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200518T132423Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=342d937b7b726408c2efe41493d126ea577204f85ffe77ffc9b3cf22af80c7ea"
    },
    {
        "Name": "eula.txt",
        "Modified": "2020-05-18T13:19:34Z",
        "Size": 6920,
        "URL": "https://dataset-localdev-3cadbe31-27e9-4f7a-9515-51ec9d754022.s3.amazonaws.com/_attachments/eula.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAXQVXYEBXA5X5LRN3%2F20200518%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200518T132423Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=c2d7f7165ce3c099e8eefcb14e3b4c7e0e6a319af48d6727f25519f35488b14a"
    }
]

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset not found |
| 500 Internal Server Error | a server error occurred |
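
The attachment URLs in the example response are S3 presigned URLs, which are only valid for a limited time (`X-Amz-Expires=300` seconds in the example). The sketch below shows one way a client might compute when such a URL stops working from its `X-Amz-Date` and `X-Amz-Expires` query parameters; the helper name and the shortened example URL are illustrative only.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_expiry(url):
    """Return the UTC time at which a SigV4 presigned URL expires."""
    query = parse_qs(urlsplit(url).query)
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at.replace(tzinfo=timezone.utc) + lifetime


# Shortened, illustrative URL carrying only the two parameters used here.
url = (
    "https://example-bucket.s3.amazonaws.com/_attachments/eula.txt"
    "?X-Amz-Date=20200518T132423Z&X-Amz-Expires=300"
)
print(presigned_expiry(url).isoformat())  # 2020-05-18T13:29:23+00:00
```

A client that holds on to the listing for longer than the lifetime would need to fetch the attachment list again to obtain fresh URLs.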

List all instances that have access to a dataset

GET /v1/ds/{account}/datasets/{group}/{id}/instances

Response

{
    "id": "95db5a7b-466b-4aa7-bbe1-1e23ed860f32",
    "access": {
        "i-01f9bfb7ee683e807": "instanceRole_i-01f9bfb7ee683e807"
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset not found |
| 500 Internal Server Error | a server error occurred |

Grant dataset access to an instance

POST /v1/ds/{account}/datasets/{group}/{id}/instances

Request:

{
	"instance_id": "i-01f9bfb7ee683e807"
}

Response

{
    "id": "95db5a7b-466b-4aa7-bbe1-1e23ed860f32",
    "access": {
        "i-01f9bfb7ee683e807": "instanceRole_i-01f9bfb7ee683e807"
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | instance access granted |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset not found |
| 500 Internal Server Error | a server error occurred |

Revoke dataset access from an instance

DELETE /v1/ds/{account}/datasets/{group}/{id}/instances/{instance_id}

| Response Code | Definition |
| --- | --- |
| 204 No Content | instance access revoked |
| 400 Bad Request | bad request, or instance doesn't have access |
| 404 Not Found | account/dataset not found |
| 500 Internal Server Error | a server error occurred |

Get audit logs for a dataset

GET /v1/ds/{account}/datasets/{group}/{id}/logs

Response

[
    "11/19/2020, 17:07:28 - Created dataset 3819c173-e1a8-4fe5-b55c-b224bb86ddbd (CreatedBy: drzoidberg)",
    "11/19/2020, 17:51:39 - Updated metadata for dataset 3819c173-e1a8-4fe5-b55c-b224bb86ddbd (ModifiedBy: awong)",
    "11/19/2020, 17:56:33 - Finalized original dataset 3819c173-e1a8-4fe5-b55c-b224bb86ddbd (ModifiedBy: me)"
]

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset not found |
| 500 Internal Server Error | a server error occurred |
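
The log entries in the example follow a `timestamp - message (Field: user)` shape. A hedged sketch of splitting one entry into its parts is shown below. It assumes the timestamp is `MM/DD/YYYY, HH:MM:SS` (the README does not state the format or the time zone) and that the trailing parenthetical is optional; `parse_log_entry` is a hypothetical helper.

```python
import re
from datetime import datetime

# Message text followed by an optional " (Field: user)" suffix.
ENTRY_BODY = re.compile(r"(?P<message>.*?)(?: \((?P<field>\w+): (?P<user>[^)]+)\))?")


def parse_log_entry(line):
    """Split an audit log line into time, message, and the acting user."""
    stamp, _, rest = line.partition(" - ")
    match = ENTRY_BODY.fullmatch(rest)
    return {
        # Assumed format; the parsed value is naive (no time zone attached).
        "time": datetime.strptime(stamp.strip(), "%m/%d/%Y, %H:%M:%S"),
        "message": match["message"],
        "field": match["field"],
        "user": match["user"],
    }


entry = parse_log_entry(
    "11/19/2020, 17:51:39 - Updated metadata for dataset "
    "3819c173-e1a8-4fe5-b55c-b224bb86ddbd (ModifiedBy: awong)"
)
print(entry["time"], entry["field"], entry["user"])
```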

Create a user for a dataset

POST /v1/ds/{account}/datasets/{group}/{id}/users

Request body is empty.

Response

{
    "user": "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpUsr",
    "group": "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpGrp",
    "policy": "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpPlc",
    "credentials": {
        "akid": "XXXXXXXXXXXXXXXXXXXX",
        "secret": "secretsecretsecretsecretsecretsecret"
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | user created |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset not found |
| 409 Conflict | user already exists |
| 500 Internal Server Error | a server error occurred |

Delete a user for a dataset

DELETE /v1/ds/{account}/datasets/{group}/{id}/users

Response

| Response Code | Definition |
| --- | --- |
| 200 OK | user deleted |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset/user not found |
| 500 Internal Server Error | a server error occurred |

Get a user for a dataset

GET /v1/ds/{account}/datasets/{group}/{id}/users

Response

{
    "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpUsr": {
        "keys": {
            "XXXXXXXXXXXXXXXXXXXX": "Inactive",
            "YYYYYYYYYYYYYYYYYYYY": "Active"
        }
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset/user not found |
| 500 Internal Server Error | a server error occurred |
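
The response maps each user name to its access key IDs and their status. As a small sketch, a client could pick out the keys that are still usable like this (`active_key_ids` is a hypothetical helper, and the input mirrors the example response above):

```python
def active_key_ids(users_response):
    """Map each user in the response to the access key IDs marked "Active"."""
    return {
        user: [key_id for key_id, status in info["keys"].items() if status == "Active"]
        for user, info in users_response.items()
    }


response = {
    "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpUsr": {
        "keys": {
            "XXXXXXXXXXXXXXXXXXXX": "Inactive",
            "YYYYYYYYYYYYYYYYYYYY": "Active",
        }
    }
}
print(active_key_ids(response))
```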

Update a user's key for a dataset

PUT /v1/ds/{account}/datasets/{group}/{id}/users

Request body is empty.

Response

{
    "keys": {
        "XXXXXXXXXXXXXXXXXXXXX": "Inactive"
    },
    "credentials": {
        "akid": "YYYYYYYYYYYYYYYYYYYYY",
        "secret": "secretsecretsecretsecretsecretsecret"
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | user key updated |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset not found |
| 429 Too Many Requests | maximum number of keys exceeded |
| 500 Internal Server Error | a server error occurred |

Authentication

Authentication is accomplished using a pre-shared key (hashed string) in the X-Auth-Token header.

API Configuration

The API is configured via config/config.json; an example config file is provided.

You can specify a single metadataRepository where metadata about all the different data sets will be stored. Currently, the only supported type is s3, so you need to provide an S3 bucket and credentials with full access to that bucket. For example, if you created a bucket called spinup-example-metadata-repository, then the IAM policy would be:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::spinup-example-metadata-repository",
                "arn:aws:s3:::spinup-example-metadata-repository/*"
            ]
        }
    ]
}
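
For a different bucket name, the same policy document can be generated rather than edited by hand. The following is a small sketch; the function name is hypothetical, and the output simply mirrors the policy shown above.

```python
import json


def metadata_repository_policy(bucket):
    """Return an IAM policy granting full S3 access to one bucket and its objects."""
    arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",
                # The bucket itself, then every object inside it.
                "Resource": [arn, f"{arn}/*"],
            }
        ],
    }


policy = metadata_repository_policy("spinup-example-metadata-repository")
print(json.dumps(policy, indent=4))
```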

You can then define a list of accounts for the actual dataset repositories - that's where the data sets will be stored. Currently, the only supported type is s3, so you need to provide credentials in each account with the appropriate S3 and IAM access. This is a good starting IAM policy if you don't modify the default name and path prefixes:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:*",
            "Resource": [
                "arn:aws:iam::*:role/spinup/dataset/*",
                "arn:aws:iam::*:instance-profile/spinup/dataset/*",
                "arn:aws:iam::*:group/spinup/dataset/*",
                "arn:aws:iam::*:user/spinup/dataset/*",
                "arn:aws:iam::*:policy/spinup/dataset/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:GetInstanceProfile",
                "iam:ListAttachedRolePolicies",
                "iam:PassRole"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3::*:dataset-*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AssociateIamInstanceProfile",
                "ec2:DescribeIamInstanceProfileAssociations",
                "ec2:DescribeInstances",
                "ec2:DisassociateIamInstanceProfile"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "arn:aws:logs:*:*:log-group:/spinup/ORG/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:ListTagsLogGroup",
                "logs:CreateLogStream",
                "logs:TagLogGroup",
                "logs:DescribeLogGroups",
                "logs:DeleteLogGroup",
                "logs:DescribeLogStreams",
                "logs:GetLogEvents",
                "logs:PutRetentionPolicy",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:*:*:log-group:/spinup/ORG/*:log-stream:*",
                "arn:aws:logs:*:*:log-group:/spinup/ORG/*"
            ]
        }
    ]
}

Dataset groups

When creating a data set, you need to specify the group it belongs to. The group can be any arbitrary string; it simply provides a way to group similar data sets together (e.g. data sets that are part of the same application or department). Currently, the group is only used for logging purposes, but eventually it will play a more significant role.
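
Because the group is an arbitrary string that ends up in the URL path, a client may want to percent-encode it when building endpoint paths. The sketch below shows one way to do that; `dataset_path` is a hypothetical helper, and whether the server accepts percent-encoded group names is an assumption that this README does not confirm.

```python
from urllib.parse import quote


def dataset_path(account, group, dataset_id=None, resource=None):
    """Build an endpoint path, percent-encoding every path segment."""
    segments = ["v1", "ds", account, "datasets", group]
    if dataset_id is not None:
        segments.append(dataset_id)
        if resource is not None:
            segments.append(resource)
    return "/" + "/".join(quote(segment, safe="") for segment in segments)


print(dataset_path("myaccount", "research dept"))
# /v1/ds/myaccount/datasets/research%20dept
print(dataset_path("myaccount", "research dept", "d37b375b-d136-4b17-8666-5036dc554a66", "logs"))
# /v1/ds/myaccount/datasets/research%20dept/d37b375b-d136-4b17-8666-5036dc554a66/logs
```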

Authors

E Camden Fisher
Tenyo Grozev

License

GNU Affero General Public License v3.0 (GNU AGPLv3)
Copyright (c) 2020 Yale University
