[Cloud Deployment IV]: Simple neuroconv deployment #393

Closed
wants to merge 64 commits

64 commits
f96652b
added helper function
CodyCBakerPhD Mar 27, 2023
da3ee44
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 27, 2023
aa3619a
remake dockerfile; add dandi upload to YAML
CodyCBakerPhD Mar 29, 2023
1ae8034
debugged
CodyCBakerPhD Apr 2, 2023
8f50c80
Create aws_batch_deployment.rst
CodyCBakerPhD Apr 2, 2023
901f1e1
Delete dockerfile_neuroconv_with_rclone
CodyCBakerPhD Apr 2, 2023
d4ae252
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 2, 2023
5659b35
Merge branch 'main' into batch_helper
CodyCBakerPhD Apr 2, 2023
95ab319
Merge branch 'batch_helper' into simple_neuroconv_deployment
CodyCBakerPhD Apr 2, 2023
e822bc7
Merge branch 'main' into batch_helper
CodyCBakerPhD Apr 24, 2023
f1f7b9f
typos and formatting
bendichter Feb 18, 2024
53258c4
Merge branch 'batch_helper' into simple_neuroconv_deployment
bendichter Feb 18, 2024
9739320
resolve conflicts
Jul 15, 2024
9213391
add changelog
Jul 15, 2024
a476ba7
correct merge conflict and changelog + imports
Jul 15, 2024
4f6489d
format docstring
Jul 15, 2024
db51921
resolve conflicts
Jul 15, 2024
766185f
add changelog
Jul 15, 2024
9ae7ace
adjust changelog
Jul 15, 2024
c7fb810
split estimator to different PR
Jul 15, 2024
7fedcdd
expose extra options and add tests
Jul 15, 2024
f15cb68
Merge branch 'batch_helper' into simple_neuroconv_deployment
CodyCBakerPhD Jul 15, 2024
935f038
debug import
Jul 15, 2024
7e8ef72
fix bad conflict
Jul 15, 2024
f2be008
add boto3 to requirements
Jul 15, 2024
a4e7bf5
pass AWS credentials in function and actions
Jul 15, 2024
16ef3f6
Merge branch 'main' into batch_helper
CodyCBakerPhD Jul 22, 2024
4939c60
pass secrets
CodyCBakerPhD Jul 22, 2024
7c66c82
correct keyword name
CodyCBakerPhD Jul 22, 2024
b115adb
debug role fetching
CodyCBakerPhD Jul 22, 2024
dfcb148
fix syntax
CodyCBakerPhD Jul 22, 2024
57f65ce
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 22, 2024
38327f7
splinter out aws tests to reduce costs
CodyCBakerPhD Jul 22, 2024
90deef6
splinter out aws tests to reduce costs
CodyCBakerPhD Jul 22, 2024
0b6e429
temporarily disable
CodyCBakerPhD Jul 22, 2024
06e9bdb
fix suffix
CodyCBakerPhD Jul 22, 2024
fe16dde
limit matrix to reduce costs
CodyCBakerPhD Jul 22, 2024
7f40885
cancel previous
CodyCBakerPhD Jul 22, 2024
34328cf
remove iam role stuff; has to be set on user
CodyCBakerPhD Jul 22, 2024
17898f4
fix API call
CodyCBakerPhD Jul 22, 2024
de4e18f
update to modern standard; expose extra options; rename argument
CodyCBakerPhD Jul 22, 2024
47cc917
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 22, 2024
4eea2db
fix keyword argument in tests
CodyCBakerPhD Jul 22, 2024
4b22903
add status helper
CodyCBakerPhD Jul 22, 2024
e16551d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 22, 2024
29aa19b
debug
CodyCBakerPhD Jul 22, 2024
37223c9
enhance doc
CodyCBakerPhD Jul 22, 2024
1b4d88f
try not casting as strings
CodyCBakerPhD Jul 22, 2024
829e5f2
fix deserialization type
CodyCBakerPhD Jul 22, 2024
e76897f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 22, 2024
df8cb10
debug
CodyCBakerPhD Jul 22, 2024
67e8405
expose submission ID
CodyCBakerPhD Jul 22, 2024
297476f
fix datetime typing
CodyCBakerPhD Jul 22, 2024
2cfaf58
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 22, 2024
9ad5ef2
update test to new structure
CodyCBakerPhD Jul 22, 2024
4db6141
remove trigger
CodyCBakerPhD Jul 22, 2024
6949be0
restore trigger
CodyCBakerPhD Jul 22, 2024
c193c55
Merge branch 'batch_helper' into simple_neuroconv_deployment
CodyCBakerPhD Jul 22, 2024
c990ddf
Merge remote-tracking branch 'origin/simple_neuroconv_deployment' int…
Jul 22, 2024
26c5f69
resolve conflict
Jul 22, 2024
37d5be4
finish initial structure for deployment helper
CodyCBakerPhD Jul 24, 2024
9af5b99
separate base code; add new entrypoint; adjust dockerfiles; add EFS c…
CodyCBakerPhD Jul 25, 2024
4022b60
fix tests; make deletion safe
CodyCBakerPhD Jul 25, 2024
4491a7f
debugs
CodyCBakerPhD Jul 25, 2024
42 changes: 42 additions & 0 deletions .github/workflows/aws_tests.yml
@@ -0,0 +1,42 @@
name: AWS Tests
on:
schedule:
- cron: "0 16 * * 1" # Weekly at noon EST on Monday

concurrency: # Cancel previous workflows on the same pull request
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
DANDI_API_KEY: ${{ secrets.DANDI_API_KEY }}

jobs:
run:
name: ${{ matrix.os }} Python ${{ matrix.python-version }}
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
python-version: ["3.12"]
os: [ubuntu-latest]
steps:
- uses: actions/checkout@v4
- run: git fetch --prune --unshallow --tags
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}

- name: Global Setup
run: |
python -m pip install -U pip # Official recommended way
git config --global user.email "[email protected]"
git config --global user.name "CI Almighty"

- name: Install full requirements
run: pip install .[aws,test]

- name: Run subset of tests that use AWS live services
run: pytest -rsx -n auto tests/test_minimal/test_tools/aws_tools.py
@@ -1,9 +1,8 @@
name: Build and Upload Docker Image of latest with YAML variable to GHCR
name: Build and Upload Docker Image of Current Dev Branch to GHCR

on:
workflow_run:
workflows: [build_and_upload_docker_image_latest_release]
types: [completed]
schedule:
- cron: "0 16 * * 1" # Weekly at noon EST on Monday
workflow_dispatch:

concurrency: # Cancel previous workflows on the same pull request
@@ -12,7 +11,7 @@ concurrency: # Cancel previous workflows on the same pull request

jobs:
release-image:
name: Build and Upload Docker Image of latest with YAML variable to GHCR
name: Build and Upload Docker Image of Current Dev Branch to GHCR
runs-on: ubuntu-latest
steps:
- name: Checkout
@@ -27,11 +26,16 @@ jobs:
registry: ghcr.io
username: ${{ secrets.DOCKER_UPLOADER_USERNAME }}
password: ${{ secrets.DOCKER_UPLOADER_PASSWORD }}
- name: Build and push YAML variable image based on latest
- name: Get current date
id: date
run: |
date_tag="$(date +'%Y-%m-%d')"
echo "date_tag=$date_tag" >> $GITHUB_OUTPUT
- name: Build and push
uses: docker/build-push-action@v5
with:
push: true # Push is a shorthand for --output=type=registry
tags: ghcr.io/catalystneuro/neuroconv:yaml_variable
tags: ghcr.io/catalystneuro/neuroconv:dev,ghcr.io/catalystneuro/neuroconv:${{ steps.date.outputs.date_tag }}
context: .
file: dockerfiles/neuroconv_latest_yaml_variable
file: dockerfiles/neuroconv_dev_for_ec2_deployment
provenance: false
@@ -17,6 +17,7 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v4

- name: Parse the version from the GitHub latest release tag
id: parsed_version
run: |
@@ -26,6 +27,7 @@
echo "version_tag=$version_tag" >> $GITHUB_OUTPUT
- name: Printout parsed version for GitHub Action log
run: echo ${{ steps.parsed_version.outputs.version_tag }}

- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
@@ -36,11 +38,12 @@
registry: ghcr.io
username: ${{ secrets.DOCKER_UPLOADER_USERNAME }}
password: ${{ secrets.DOCKER_UPLOADER_PASSWORD }}

- name: Build and push
uses: docker/build-push-action@v5
with:
push: true # Push is a shorthand for --output=type=registry
tags: ghcr.io/catalystneuro/neuroconv:latest,ghcr.io/catalystneuro/neuroconv:${{ steps.parsed_version.outputs.version_tag }}
context: .
file: dockerfiles/neuroconv_latest_release_dockerfile
file: dockerfiles/neuroconv_release_dockerfile
provenance: false
@@ -0,0 +1,48 @@
name: Build and Upload Docker Image of Latest Release for EC2 Deployment to GHCR

on:
schedule:
- cron: "0 16 * * 1" # Weekly at noon EST on Monday
Comment on lines +4 to +5 (Member Author): Set to trigger on github release

workflow_dispatch:

concurrency: # Cancel previous workflows on the same pull request
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
release-image:
name: Build and Upload Docker Image
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4

- name: Parse the version from the GitHub latest release tag
id: parsed_version
run: |
git fetch --prune --unshallow --tags
tags="$(git tag --list)"
version_tag=${tags: -6 : 6}
echo "version_tag=$version_tag" >> $GITHUB_OUTPUT
- name: Printout parsed version for GitHub Action log
run: echo ${{ steps.parsed_version.outputs.version_tag }}

- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ secrets.DOCKER_UPLOADER_USERNAME }}
password: ${{ secrets.DOCKER_UPLOADER_PASSWORD }}

- name: Build and push
uses: docker/build-push-action@v5
with:
push: true # Push is a shorthand for --output=type=registry
tags: ghcr.io/catalystneuro/neuroconv_for_ec2_deployment:dev
context: .
file: dockerfiles/neuroconv_release_for_ec2_deployment
provenance: false
@@ -26,6 +26,7 @@ jobs:
registry: ghcr.io
username: ${{ secrets.DOCKER_UPLOADER_USERNAME }}
password: ${{ secrets.DOCKER_UPLOADER_PASSWORD }}

- name: Build and push
uses: docker/build-push-action@v5
with:
2 changes: 0 additions & 2 deletions .github/workflows/live-service-testing.yml
@@ -36,8 +36,6 @@ jobs:
- name: Install full requirements
run: pip install .[test,full]

- name: Run subset of tests that use S3 live services
run: pytest -rsx -n auto tests/test_minimal/test_tools/s3_tools.py
- name: Run subset of tests that use DANDI live services
run: pytest -rsx -n auto tests/test_minimal/test_tools/dandi_transfer_tools.py
- name: Run subset of tests that use Globus live services
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -1,5 +1,10 @@
# Upcoming

### Features
* Added MedPCInterface for operant behavioral output files. [PR #883](https://github.com/catalystneuro/neuroconv/pull/883)
* Added helper function `neuroconv.tools.data_transfers.submit_aws_batch_job` for basic automated submission of AWS batch jobs. [PR #384](https://github.com/catalystneuro/neuroconv/pull/384)



## v0.5.0 (July 17, 2024)

@@ -12,7 +17,6 @@

### Features
* Added docker image and tests for an automated Rclone configuration (with file stream passed via an environment variable). [PR #902](https://github.com/catalystneuro/neuroconv/pull/902)
* Added MedPCInterface for operant behavioral output files. [PR #883](https://github.com/catalystneuro/neuroconv/pull/883)

### Bug fixes
* Fixed the conversion option schema of a `SpikeGLXConverter` when used inside another `NWBConverter`. [PR #922](https://github.com/catalystneuro/neuroconv/pull/922)
6 changes: 6 additions & 0 deletions dockerfiles/neuroconv_dev_for_ec2_deployment
@@ -0,0 +1,6 @@
FROM python:3.11.7-slim
LABEL org.opencontainers.image.source=https://github.com/catalystneuro/neuroconv
LABEL org.opencontainers.image.description="A docker image extending the dev branch of the NeuroConv package with modifications related to deployment on EC2 Batch."
ADD ./ neuroconv
RUN cd neuroconv && pip install .[full]
CMD printf "$NEUROCONV_YAML" > run.yml && python -m neuroconv_ec2 run.yml --data-folder-path "$NEUROCONV_DATA_PATH" --output-folder-path "$NEUROCONV_OUTPUT_PATH" --overwrite --upload-to-dandiset-id "$DANDISET_ID" --update-tracking-table "$TRACKING_TABLE" --tracking-table-submission-id "$SUBMISSION_ID" --efs-volume-name-to-cleanup "$EFS_VOLUME"
4 changes: 0 additions & 4 deletions dockerfiles/neuroconv_latest_yaml_variable

This file was deleted.

@@ -1,6 +1,6 @@
FROM python:3.11.7-slim
LABEL org.opencontainers.image.source=https://github.com/catalystneuro/neuroconv
LABEL org.opencontainers.image.description="A docker image for the most recent official release of the NeuroConv package."
LABEL org.opencontainers.image.description="A docker image for an official release of the full NeuroConv package."
RUN apt update && apt install musl-dev python3-dev -y
RUN pip install "neuroconv[full]"
CMD ["python -m"]
4 changes: 4 additions & 0 deletions dockerfiles/neuroconv_release_for_ec2_deployment
@@ -0,0 +1,4 @@
FROM ghcr.io/catalystneuro/neuroconv:latest
LABEL org.opencontainers.image.source=https://github.com/catalystneuro/neuroconv
LABEL org.opencontainers.image.description="A docker image extending the official release of the NeuroConv package with modifications related to deployment on EC2 Batch."
CMD printf "$NEUROCONV_YAML" > run.yml && python -m neuroconv_ec2 run.yml --data-folder-path "$NEUROCONV_DATA_PATH" --output-folder-path "$NEUROCONV_OUTPUT_PATH" --overwrite --upload-to-dandiset-id "$DANDISET_ID" --update-tracking-table "$TRACKING_TABLE" --tracking-table-submission-id "$SUBMISSION_ID" --efs-volume-name-to-cleanup "$EFS_VOLUME"
168 changes: 168 additions & 0 deletions docs/developer_guide/aws_batch_deployment.rst
@@ -0,0 +1,168 @@
One way of deploying jobs on AWS Batch is to manually set up the entire workflow through the AWS web UI and to manually submit each job in that manner.

Deploying hundreds of jobs this way would be cumbersome.

Here are two other methods that allow simpler deployment using `boto3`.


Semi-automated Deployment of NeuroConv on AWS Batch
---------------------------------------------------

Step 1: Transfer data to Elastic File System (EFS)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The nice thing about using EFS is that we are only ever billed for the actual amount of disk storage used over time, and do not need to specify a fixed allocation or scaling strategy.

It is also relatively easy to mount across multiple AWS Batch jobs simultaneously.

Unfortunately, the one downside is that its pricing per GB-month is significantly higher than that of either S3 or EBS.

To easily transfer data from a Google Drive (or theoretically any backend supported by `rclone`), set the following environment variables for rclone credentials: `DRIVE_NAME`, `TOKEN`, `REFRESH_TOKEN`, and `EXPIRY`.

.. note::

    I eventually hope to be able to read and pass these directly from a local `rclone.conf` file, but for now they must be set by hand (a sketch of how that could work is shown at the end of this step).

.. note::

    All path references must point to `/mnt/data/` as the base in order to persist across jobs.

.. code-block:: python

    import os
    from datetime import datetime

    from neuroconv.tools.data_transfers import submit_aws_batch_job

    job_name = "<unique job name>"
    docker_container = "ghcr.io/catalystneuro/rclone_auto_config:latest"
    efs_name = "<your EFS volume name>"

    log_datetime = str(datetime.now()).replace(" ", ":")  # no spaces in CLI
    RCLONE_COMMAND = f"{os.environ['RCLONE_COMMAND']} -v --config /mnt/data/rclone.conf --log-file /mnt/data/submit-{log_datetime}.txt"

    environment_variables = [
        dict(name="DRIVE_NAME", value=os.environ["DRIVE_NAME"]),
        dict(name="TOKEN", value=os.environ["TOKEN"]),
        dict(name="REFRESH_TOKEN", value=os.environ["REFRESH_TOKEN"]),
        dict(name="EXPIRY", value=os.environ["EXPIRY"]),
        dict(name="RCLONE_COMMAND", value=RCLONE_COMMAND),
    ]

    submit_aws_batch_job(
        job_name=job_name,
        docker_container=docker_container,
        efs_name=efs_name,
        environment_variables=environment_variables,
    )


An example `RCLONE_COMMAND` for a drive named 'MyDrive' and the GIN testing data stored under `/ephy_testing_data/spikeglx/Noise4Sam_g0/` of that drive would be

.. code-block:: python

    RCLONE_COMMAND = "sync MyDrive:/ephy_testing_data/spikeglx/Noise4Sam_g0 /mnt/data/Noise4Sam_g0"
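
As mentioned in the note above, these credential values could instead be pulled out of a local `rclone.conf`. A minimal sketch of that (assuming a Google Drive remote, for which rclone stores the `token` entry as a JSON blob containing `access_token`, `refresh_token`, and `expiry` fields; the config path is a placeholder):

.. code-block:: python

    import configparser
    import json

    config = configparser.ConfigParser()
    config.read("/path/to/rclone.conf")  # placeholder path to your local rclone config

    drive_name = config.sections()[0]  # e.g., "MyDrive"; select whichever remote you intend to sync from
    token_info = json.loads(config[drive_name]["token"])

    environment_variables = [
        dict(name="DRIVE_NAME", value=drive_name),
        dict(name="TOKEN", value=token_info["access_token"]),
        dict(name="REFRESH_TOKEN", value=token_info["refresh_token"]),
        dict(name="EXPIRY", value=token_info["expiry"]),
        # RCLONE_COMMAND would still be appended as in the submission script above
    ]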


Step 2: Run the YAML Conversion Specification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Continuing the example above, if we have the YAML file `test_batch.yml`

.. code-block:: yaml

    metadata:
      NWBFile:
        lab: My Lab
        institution: My Institution

    conversion_options:
      stub_test: True

    data_interfaces:
      ap: SpikeGLXRecordingInterface
      lf: SpikeGLXRecordingInterface

    experiments:
      ymaze:
        metadata:
          NWBFile:
            session_description: Testing batch deployment.

        sessions:
          - nwbfile_name: /mnt/data/test_batch_deployment.nwb
            source_data:
              ap:
                file_path: /mnt/data/Noise4Sam_g0/Noise4Sam_g0_imec0/Noise4Sam_g0_t0.imec0.ap.bin
              lf:
                file_path: /mnt/data/Noise4Sam_g0/Noise4Sam_g0_imec0/Noise4Sam_g0_t0.imec0.lf.bin
            metadata:
              NWBFile:
                session_id: test_batch_deployment
              Subject:
                subject_id: "1"
                sex: F
                age: P35D
                species: Mus musculus

then we can run the following stand-alone script to deploy the conversion after confirming Step 1 completed successfully.

.. code-block:: python

    from neuroconv.tools.data_transfers import submit_aws_batch_job

    job_name = "<unique job name>"
    docker_container = "ghcr.io/catalystneuro/neuroconv:dev_auto_yaml"
    efs_name = "<name of EFS>"

    yaml_file_path = "/path/to/test_batch.yml"

    with open(file=yaml_file_path) as file:
        # Swap double quotes for single so the stream passes safely through the CLI
        YAML_STREAM = file.read().replace('"', "'")

    environment_variables = [dict(name="YAML_STREAM", value=YAML_STREAM)]

    submit_aws_batch_job(
        job_name=job_name,
        docker_container=docker_container,
        efs_name=efs_name,
        environment_variables=environment_variables,
    )
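
After submission, the job can be monitored until it completes. A minimal sketch using `boto3` directly (the job ID is a placeholder; take it from the value returned at submission or from the AWS Batch console, and adjust the region to match your Batch setup):

.. code-block:: python

    import boto3

    batch_client = boto3.client("batch", region_name="us-east-2")

    job_id = "<job ID returned at submission>"
    response = batch_client.describe_jobs(jobs=[job_id])

    # Status progresses through SUBMITTED, PENDING, RUNNABLE, STARTING, RUNNING,
    # and ends at either SUCCEEDED or FAILED
    print(response["jobs"][0]["status"])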


Step 3: Ensure File Cleanup
~~~~~~~~~~~~~~~~~~~~~~~~~~~

TODO: write a dockerfile to perform this step with the API

It's a good idea to confirm that you have access to your EFS from on-demand resources in case you ever need to go in and perform a manual cleanup operation.

Boot up an EC2 t2.micro instance using the Amazon Linux 2 image with minimal resources.

Create two new security groups: `EFS Target` (no policies set) and `EFS Mount` (inbound policy set to NFS with `EFS Target` as the source).

On the EC2 instance, change the security group to `EFS Target`. In the EFS network settings, add the `EFS Mount` group.

Connect to the EC2 instance and run

.. code-block:: bash

    mkdir ~/efs-mount-point  # or any other name you want; I do recommend keeping this in the home directory (~) for ease of access though
    sudo mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-<efs number>.efs.us-east-2.amazonaws.com:/ ~/efs-mount-point
    # Note that any operations performed on contents of the mounted volume must utilize sudo

and it *should* work, though this step is known to have various issues; if you did everything exactly as illustrated above, it should work (at least it did as of 4/2/2023).

You can now read, write, and importantly delete any contents on the EFS.

Until automated DANDI upload is implemented in the YAML functionality, you will need to use this method to manually remove the NWB file.

Even after that is implemented, you should double-check that the `cleanup=True` flag to that function executed properly.
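
Until that dockerfile exists, the cleanup can be sketched through the EFS API with `boto3` (the file system ID is a placeholder; note that all mount targets must be deleted before the file system itself, and the final deletion may need to be retried while mount targets are still being torn down):

.. code-block:: python

    import boto3

    efs_client = boto3.client("efs", region_name="us-east-2")

    file_system_id = "<your fs-... ID>"

    # Remove all mount targets first; deleting the file system fails while any remain
    mount_targets = efs_client.describe_mount_targets(FileSystemId=file_system_id)
    for mount_target in mount_targets["MountTargets"]:
        efs_client.delete_mount_target(MountTargetId=mount_target["MountTargetId"])

    # May need to wait for the mount targets to finish deleting before this succeeds
    efs_client.delete_file_system(FileSystemId=file_system_id)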



Fully Automated Deployment of NeuroConv on AWS Batch
----------------------------------------------------

Coming soon...

The approach is essentially the same as the semi-automated one; all jobs are simply submitted at the same time, with each job dependent on the completion of the one before it, as sketched below.
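
A minimal sketch of that chaining with `boto3` directly (the queue and job definition names are placeholders; the `dependsOn` parameter of `submit_job` makes AWS Batch hold each job until the job it depends on succeeds):

.. code-block:: python

    import boto3

    batch_client = boto3.client("batch", region_name="us-east-2")

    # Placeholder shared settings for illustration
    common = dict(jobQueue="<your job queue>", jobDefinition="<your job definition>")

    transfer = batch_client.submit_job(jobName="transfer-data-to-efs", **common)
    convert = batch_client.submit_job(
        jobName="run-yaml-conversion",
        dependsOn=[dict(jobId=transfer["jobId"])],
        **common,
    )
    cleanup = batch_client.submit_job(
        jobName="cleanup-efs-volume",
        dependsOn=[dict(jobId=convert["jobId"])],
        **common,
    )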