Deploying with new MDF portal url #121

Merged: 6 commits, Jul 17, 2024
85 changes: 85 additions & 0 deletions README.md
@@ -1,6 +1,91 @@
# MDF Connect
The Materials Data Facility Connect service is the ETL flow to deeply index datasets into MDF Search. It is not intended to be run by end-users. To submit data to the MDF, visit the [Materials Data Facility](https://materialsdatafacility.org).

# Architecture
The MDF Connect service is a serverless REST service deployed on AWS.
It consists of an AWS API Gateway that uses a Lambda function to authenticate
requests against Globus Auth. If a request is authorized, the gateway invokes
the Lambda function for the requested endpoint. Each endpoint is implemented
as a Lambda function in a Python file in the [aws/](aws/) directory. The
Lambda functions are deployed via GitHub Actions as described in a later
section.

The API Endpoints are:
* [POST /submit](aws/submit.py): Submits a dataset to the MDF Connect service. This triggers a Globus Automate flow
* [GET /status](aws/status.py): Returns the status of a dataset submission
* [POST /submissions](aws/submissions.py): Forms a query and returns a list of submissions
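
As an illustration only, a client-side submission could look like the sketch
below. The base URL, token handling, and payload fields are placeholders and
assumptions, not the service's actual contract:

```python
import requests

# Hypothetical API Gateway base URL for a deployed stage.
BASE_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod"

# A Globus Auth bearer token obtained out of band (e.g., with the Globus SDK).
headers = {"Authorization": "Bearer <globus-auth-token>"}

# Minimal illustrative payload; the real submission schema is defined by the
# MDF Connect service and its data-schemas repo.
submission = {
    "mdf": {"source_name": "example_dataset"},
    "data_sources": ["globus://<endpoint-id>/path/to/data/"],
}

resp = requests.post(f"{BASE_URL}/submit", json=submission, headers=headers)
resp.raise_for_status()

# The status endpoint presumably takes an identifier returned by /submit.
source_id = resp.json().get("source_id", "example_dataset_v1.0")
status = requests.get(f"{BASE_URL}/status/{source_id}", headers=headers)
print(status.json())
```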

# Globus Automate Flow
The Globus Automate flow is a series of steps that are triggered by the POST
/submit endpoint. The flow is defined using a python dsl that can be found
in [automate/minimus_mdf_flow.py](automate/minimus_mdf_flow.py). At a high
level the flow:
1. Notifies the admin that a dataset has been submitted
2. Checks whether the data files have been updated or whether this is a metadata-only submission
3. If there is a dataset, starts a Globus transfer
4. Once the transfer is complete, triggers a curation step if the organization is configured for one
5. Mints a DOI if the organization is configured to do so
6. Indexes the dataset in MDF Search
7. Notifies the user that the submission is complete
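
Globus flows are ultimately expressed as a JSON state machine; the project's
real definition is generated from the DSL file above. Purely as a schematic
(the state name, parameter wiring, and flow inputs are invented for
illustration), a transfer step in such a flow looks roughly like:

```python
# Schematic fragment of a Globus Flows definition as a Python dict.
# The ActionUrl is the public Globus Transfer action provider; every
# other value here is illustrative, not taken from minimus_mdf_flow.py.
flow_definition = {
    "StartAt": "TransferData",
    "States": {
        "TransferData": {
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/transfer",
            "Parameters": {
                # The ".$" suffix wires a parameter to a value in the flow input
                "source_endpoint_id.$": "$.source_endpoint",
                "destination_endpoint_id.$": "$.destination_endpoint",
                "transfer_items.$": "$.transfer_items",
            },
            "ResultPath": "$.TransferResult",
            "End": True,
        }
    },
}
```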


# Development Workflow
Changes should be made in a feature branch based off of the dev branch. Create
a PR and get a friend to review your changes. Once the PR is approved, merge it
into the dev branch. The dev branch is automatically deployed to the dev
environment. Once the changes have been tested in the dev environment, create a
PR from dev to main. Once that PR is approved, merge it into main. The main
branch is automatically deployed to the prod environment.
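
In terms of git commands, the start of that cycle looks like the following
(the branch name is an example):

```sh
git checkout dev
git pull
git checkout -b feature/my-change   # example feature branch name
# ...edit and commit your changes...
git push -u origin feature/my-change
# then open a PR against dev on GitHub
```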

# Deployment
The MDF Connect service is deployed on AWS into development and production
environments. The Automate flow is deployed into the Globus Automate service via
a second GitHub action.

## Deploy the Automate Flow
Changes to the Automate flow are deployed via a GitHub action, triggered by
publishing a new GitHub release. If the release is marked as a pre-release it
is deployed to the dev environment; otherwise it is deployed to the prod
environment.

The flow IDs for dev and prod are stored in
[automate/mdf_dev_flow_info.json](automate/mdf_dev_flow_info.json) and
[automate/mdf_prod_flow_info.json](automate/mdf_prod_flow_info.json)
respectively. The flow ID is stored in the `flow_id` key.
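
Each of these files is a small JSON document. Based on the description above,
its shape is along these lines (the ID here is a placeholder):

```json
{
  "flow_id": "00000000-0000-0000-0000-000000000000"
}
```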

### Deploy a Dev Release of the Flow
1. Merge your changes into the `dev` branch
2. On the GitHub website, click on the _Releases_ link on the repo home page.
3. Click on the _Draft a new release_ button
4. Fill in the tag version as `X.Y.Z-alpha.1` where X.Y.Z is the version number. You can use subsequent alpha tags if you need to make further changes.
5. Fill in the release title and description
6. Select `dev` as the Target branch
7. Check the _Set as a pre-release_ checkbox
8. Click the _Publish release_ button

### Deploy a Prod Release of the Flow
1. Merge your changes into the `main` branch
2. On the GitHub website, click on the _Releases_ link on the repo home page.
3. Click on the _Draft a new release_ button
4. Fill in the tag version as `X.Y.Z` where X.Y.Z is the version number.
5. Fill in the release title and description
6. Select `main` as the Target branch
7. Check the _Set as the latest release_ checkbox
8. Click the _Publish release_ button
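
Equivalently, if you prefer the GitHub CLI, both kinds of release can be
published from the command line (the tags, titles, and notes below are
examples):

```sh
# Dev (pre-release) deployment of the flow
gh release create 1.2.3-alpha.1 --target dev --prerelease \
    --title "1.2.3-alpha.1" --notes "Testing flow changes"

# Prod deployment of the flow
gh release create 1.2.3 --target main --latest \
    --title "1.2.3" --notes "Flow release"
```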

You can verify deployment of the flows in the
[Globus Automate Console](https://app.globus.org/flows/library).

## Deploy the MDF Connect Service
The MDF Connect service is deployed via a GitHub action. The action is triggered
by a push to the dev or main branch and deploys the service to the dev or prod
environment respectively.
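
The deployment workflow lives under `.github/workflows/` in this repo; the
branch-based trigger follows the standard GitHub Actions pattern. This snippet
is a sketch of that pattern, not the repo's actual workflow file:

```yaml
on:
  push:
    branches:
      - dev   # deploys to the dev environment
      - main  # deploys to the prod environment
```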

## Updating Schemas
Schemas and the MDF organization database are managed in the `automate` branch
of the [Data Schemas Repo](https://github.com/materials-data-facility/data-schemas/tree/automate).

The schemas are deployed into the Docker images used to serve the Lambda
functions.
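
As a sketch of what that packaging might look like (the base image tag, paths,
and handler name are assumptions, not taken from this repo):

```dockerfile
# Hypothetical sketch: bundling schemas into a Lambda container image.
FROM public.ecr.aws/lambda/python:3.9

# Copy the Lambda handlers and the schema files into the image.
COPY aws/ ${LAMBDA_TASK_ROOT}/
COPY schemas/ ${LAMBDA_TASK_ROOT}/schemas/

# Handler for one endpoint's image, e.g. the /submit Lambda.
CMD ["submit.lambda_handler"]
```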

# Running Tests
To run the tests, first make sure that you are running Python 3.7.10. Then install the dependencies:
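
The install commands themselves are collapsed in this view. Assuming the test
requirements file updated later in this PR is the one to install, a typical
invocation would be:

```sh
# Sketch only: install test dependencies and run the suite
pip install -r aws/tests/requirements-test.txt
pytest aws/tests/
```
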
4 changes: 0 additions & 4 deletions aws/requirements_test.txt

This file was deleted.

2 changes: 1 addition & 1 deletion aws/tests/requirements-test.txt
@@ -1,4 +1,4 @@
-pytest
+pytest<8.0
pytest-mock
pytest-bdd==4.1.0
git+https://github.com/materials-data-facility/[email protected]
16 changes: 8 additions & 8 deletions aws/tests/test_automate_manager.py
@@ -76,7 +76,7 @@ def set_environ(self):

@mock.patch('globus_automate_flow.GlobusAutomateFlow', autospec=True)
def test_create_transfer_items(self, _, secrets, organization, set_environ):
-os.environ['PORTAL_URL'] = "https://acdc.alcf.anl.gov/mdf/detail/"
+os.environ['PORTAL_URL'] = "https://materialsdatafacility.org/detail/"
manager = AutomateManager(secrets, is_test=False)

data_sources = [
@@ -103,7 +103,7 @@ def test_create_transfer_items(self, _, secrets, organization, set_environ):

@mock.patch('globus_automate_flow.GlobusAutomateFlow', autospec=True)
def test_create_transfer_items_from_origin(self, _, secrets, organization):
-os.environ['PORTAL_URL'] = "https://acdc.alcf.anl.gov/mdf/detail/"
+os.environ['PORTAL_URL'] = "https://materialsdatafacility.org/detail/"
manager = AutomateManager(secrets, is_test=False)

data_sources = [
@@ -126,7 +126,7 @@ def test_create_transfer_items_from_origin(self, _, secrets, organization):

@mock.patch('globus_automate_flow.GlobusAutomateFlow', autospec=True)
def test_create_transfer_items_from_google_drive(self, _, secrets, organization):
-os.environ['PORTAL_URL'] = "https://acdc.alcf.anl.gov/mdf/detail/"
+os.environ['PORTAL_URL'] = "https://materialsdatafacility.org/detail/"
os.environ['GDRIVE_EP'] = "f00dfd6c-edf4-4c8b-a4b1-be6ad92a4fbb"
os.environ['GDRIVE_ROOT'] = "/Shared With Me"
manager = AutomateManager(secrets, is_test=False)
@@ -151,7 +151,7 @@ def test_create_transfer_items_from_google_drive(self, _, secrets, organization)

@mock.patch('globus_automate_flow.GlobusAutomateFlow', autospec=True)
def test_create_transfer_items_test_submit(self, _, secrets, organization, set_environ):
-os.environ['PORTAL_URL'] = "https://acdc.alcf.anl.gov/mdf/detail/"
+os.environ['PORTAL_URL'] = "https://materialsdatafacility.org/detail/"
manager = AutomateManager(secrets, is_test=True)

data_sources = [
@@ -177,7 +177,7 @@ def test_create_transfer_items_test_submit(self, _, secrets, organization, set_e
def test_update_metadata_only(self, mock_automate, secrets, organization, mocker, mdf_rec):
mock_flow = mocker.Mock()
mock_automate.from_existing_flow = mocker.Mock(return_value=mock_flow)
-os.environ['PORTAL_URL'] = "https://acdc.alcf.anl.gov/mdf/detail/"
+os.environ['PORTAL_URL'] = "https://materialsdatafacility.org/detail/"
manager = AutomateManager(secrets, is_test=False)

data_sources = [
@@ -201,7 +201,7 @@ def test_update_metadata_only(self, mock_automate, secrets, organization, mocker
def test_mint_doi(self, mock_automate, secrets, organization_mint_doi, mocker, mdf_rec, set_environ):
mock_flow = mocker.Mock()
mock_automate.from_existing_flow = mocker.Mock(return_value=mock_flow)
-os.environ['PORTAL_URL'] = "https://acdc.alcf.anl.gov/mdf/detail/"
+os.environ['PORTAL_URL'] = "https://materialsdatafacility.org/detail/"
manager = AutomateManager(secrets, is_test=False)
assert manager.datacite_username == "datacite_prod_usrname_1234"
assert manager.datacite_password == "datacite_prod_passwrd_1234"
@@ -232,7 +232,7 @@ def test_mint_doi(self, mock_automate, secrets, organization_mint_doi, mocker, m
def test_mdf_portal_link(self, mock_automate, secrets, organization_mint_doi, mocker, mdf_rec, set_environ):
mock_flow = mocker.Mock()
mock_automate.from_existing_flow = mocker.Mock(return_value=mock_flow)
-os.environ['PORTAL_URL'] = "https://acdc.alcf.anl.gov/mdf/detail/"
+os.environ['PORTAL_URL'] = "https://materialsdatafacility.org/detail/"
manager = AutomateManager(secrets, is_test=True)

data_sources = [
@@ -249,5 +249,5 @@ def test_mdf_portal_link(self, mock_automate, secrets, organization_mint_doi, mo
update_metadata_only=False)

mock_flow.run_flow.assert_called()
-assert(mock_flow.run_flow.call_args[0][0]['mdf_portal_link'] == "https://acdc.alcf.anl.gov/mdf/detail/123-456-7890-1.0.1")
+assert(mock_flow.run_flow.call_args[0][0]['mdf_portal_link'] == "https://materialsdatafacility.org/detail/123-456-7890-1.0.1")

2 changes: 1 addition & 1 deletion infra/mdf/dev/main.tf
@@ -7,7 +7,7 @@ terraform {
version = "~> 4.0.0"
}
}
-required_version = "~> 1.5.5"
+required_version = "~> 1.9.2"

backend "s3" {
# Replace this with your bucket name!
2 changes: 1 addition & 1 deletion infra/mdf/dev/variables.tf
@@ -29,7 +29,7 @@ variable "env_vars" {
GDRIVE_ROOT="/Shared With Me"
MANAGE_FLOWS_SCOPE="https://auth.globus.org/scopes/eec9b274-0c81-4334-bdc2-54e90e689b9a/manage_flows"
MONITOR_BY_GROUP="urn:globus:groups:id:5fc63928-3752-11e8-9c6f-0e00fd09bf20"
-PORTAL_URL="https://acdc.alcf.anl.gov/mdf/detail/"
+PORTAL_URL="https://materialsdatafacility.org/detail/"
RUN_AS_SCOPE="0c7ee169-cefc-4a23-81e1-dc323307c863"
SEARCH_INDEX_UUID="ab71134d-0b36-473d-aa7e-7b19b2124c88"
TEST_DATA_DESTINATION="globus://f10a69a9-338c-4e5b-baa1-0dc92359ab47/mdf_testing/"
2 changes: 1 addition & 1 deletion infra/mdf/prod/main.tf
@@ -7,7 +7,7 @@ terraform {
version = "~> 4.0.0"
}
}
-required_version = "~> 1.5.5"
+required_version = "~> 1.9.2"

backend "s3" {
# Replace this with your bucket name!
2 changes: 1 addition & 1 deletion infra/mdf/prod/variables.tf
@@ -29,7 +29,7 @@ variable "env_vars" {
GDRIVE_ROOT="/Shared With Me"
MANAGE_FLOWS_SCOPE="https://auth.globus.org/scopes/eec9b274-0c81-4334-bdc2-54e90e689b9a/manage_flows"
MONITOR_BY_GROUP="urn:globus:groups:id:5fc63928-3752-11e8-9c6f-0e00fd09bf20"
-PORTAL_URL="https://acdc.alcf.anl.gov/mdf/detail/"
+PORTAL_URL="https://materialsdatafacility.org/detail/"
RUN_AS_SCOPE="4c37a999-da4b-4969-b621-58bfb243c5bc"
SEARCH_INDEX_UUID="1a57bbe5-5272-477f-9d31-343b8258b7a5"
TEST_DATA_DESTINATION="globus://f10a69a9-338c-4e5b-baa1-0dc92359ab47/mdf_testing/"