usf-dl-aws-cf

Note: This is no longer actively developed. It was a proof of concept for provisioning an AWS EC2 instance with the libraries used in the fast.ai course at USF in 2016. There are now better ways to get a machine up and running. If I revisit the AWS option, I will update this repo.

Scripts and tools for spinning up deep learning stacks using AWS CloudFormation. Experimental.

Based on shell scripts provided by Jeremy Howard in the Deep Learning I course at USF.

Requirements

To run the deploy playbook you will need the following installed:

  • python2 - Ansible 2.1.x does not yet fully support Python 3. I recommend using a conda environment with Python 2 as the interpreter.
  • ansible - Used to set up the AWS stack and provision the EC2 machine.
  • troposphere - A Pythonic alternative to writing CloudFormation templates by hand (see the sketch after this list).
  • boto - Used by Ansible to talk to AWS services.
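
For reference, here is a minimal sketch of the kind of template code troposphere enables. The resource name, AMI ID, and instance type below are placeholders for illustration, not values taken from this repo.

    from troposphere import Parameter, Ref, Template
    from troposphere.ec2 import Instance

    t = Template()

    # SSH key pair to associate with the instance.
    key_name = t.add_parameter(Parameter(
        "KeyName",
        Type="AWS::EC2::KeyPair::KeyName",
    ))

    # Placeholder values; the real stack would use the AMI and
    # instance type configured in vars.yml.
    t.add_resource(Instance(
        "DeepLearningInstance",
        ImageId="ami-xxxxxxxx",
        InstanceType="p2.xlarge",
        KeyName=Ref(key_name),
    ))

    print(t.to_json())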

Beyond the software setup, there are a few additional prerequisites:

  • AWS account - This spins up AWS services that may cost you money.
  • AWS private key - I'm assuming you have already configured keys in AWS and have access to the private key locally.

Pre-flight Check

  • vars.yml - Create a vars.yml file by copying vars.template.yml and customizing the values.
    • This is ignored by git and will not be committed.
  • passwords.yml - Create a passwords.yml using ansible-vault (a note on editing it later follows this list).
    • $ ansible-vault create passwords.yml
    • Add the following values:
      ---
      jupyter_notebook_password: "your_notebook_password_here"
      github_username: your_github_username
      github_access_key: your_github_access_key

    • This is ignored by git and will not be committed.
  • Download cudnn-8.0-linux-x64-v5.1.tgz from the nVidia developer program. Copy it to files/.
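
If you need to change any of the vaulted values later, the standard ansible-vault workflow reopens the encrypted file for editing:

$ ansible-vault edit passwords.yml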

Deploying

$ ansible-playbook -vv --private-key ~/.ssh/YOUR_AWS_KEY deploy.yml --ask-vault-pass

  • Set the environment variable ANSIBLE_HOST_KEY_CHECKING=False, since the EC2 instance is created the first time this playbook runs and will not yet be in your known_hosts.
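
For example, the variable can be set inline for a single run:

$ ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -vv --private-key ~/.ssh/YOUR_AWS_KEY deploy.yml --ask-vault-pass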

This does the following:

  • Provisions an EC2 machine based on the instance type specified in vars.yml.
  • Installs system software on the EC2 instance.
  • Installs Anaconda.
  • Generates SSL certs and copies configuration files to secure the Jupyter notebook server.
  • Installs nVidia CUDA.
  • Installs nVidia cuDNN.
  • Creates a conda environment "py3" and installs packages.
  • Writes .theanorc to configure Theano for the GPU (a typical example follows this list).
  • (Re-)starts Jupyter notebook service via systemd.
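
For reference, a minimal .theanorc enabling the GPU for this era of Theano typically looks like the following; the file this playbook writes may set additional options.

    [global]
    device = gpu
    floatX = float32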

TODO

  • Separate the playbook into multiple roles.
  • Use the CloudFormation template that creates an Elastic IP (and VPC, subnet, etc.).
  • Deploying restarts the Jupyter notebook service if the jupyter-notebook.service definition has changed, which also restarts any currently running kernels. It would be nice to at least warn when this is about to happen.
  • Look into using EC2 spot pricing with block durations. This guarantees the instance a block of time before it can be terminated, which seems like a good fit for many kinds of interactive sessions (a sketch follows this list).
    • More research needed here. I think this would require attaching existing EBS volumes to newly created EC2 instances.
  • Look into using Terraform instead of Troposphere as an API layer on top of AWS.
  • Spawn the spot block EC2 from JupyterHub.
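
As a starting point for the spot block idea, the AWS CLI can request a spot instance with a fixed duration. The launch specification path below is a placeholder for illustration.

$ aws ec2 request-spot-instances --instance-count 1 --block-duration-minutes 120 --launch-specification file://launch-spec.json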

Resources

These are projects and posts I found helpful while putting this together.
