sidps/dist_training
Description

This repo contains all the code related to my Master's thesis on distributed training of neural networks.

Link to Thesis on arXiv: https://arxiv.org/abs/1812.02407


Installation

Using a virtualenv is highly recommended. This project was developed with Python 3.6.5 and PyTorch 0.3.1.
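For example, creating and activating a fresh environment might look like this (paths illustrative):

$ python3.6 -m venv venv
$ source venv/bin/activate
$ pip install --upgrade pip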

Assuming you're using Python 3.6.5 inside a virtual environment with pip available, you will first need to install PyTorch.

On a Mac (no CUDA), use:

$ pip install http://download.pytorch.org/whl/torch-0.3.1-cp36-cp36m-macosx_10_7_x86_64.whl

On Linux (with CUDA 8), use:

$ pip install http://download.pytorch.org/whl/cu80/torch-0.3.1-cp36-cp36m-linux_x86_64.whl

(or pick a different PyTorch 0.3.1 binary from https://pytorch.org/previous-versions/)

Then install the remaining dependencies:

$ pip install -r requirements.txt

Execution

$ python main.py

will run the default "experiment" - the Iris classification task with the default config: 4 workers, data split evenly among them, a 3-layer neural network with a 3-way softmax classifier, trained for 3 epochs using gradient all-reduce.

This is tiny enough that it should run on any modern computer in seconds, and serves well as a Hello World.
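For context, gradient all-reduce means every worker averages its gradients with all other workers before each update, so all model replicas stay identical. Below is a minimal sketch of the idea using torch.distributed - illustrative only, not this repo's code (and note that PyTorch 0.3.1 named the op dist.reduce_op.SUM rather than the modern dist.ReduceOp.SUM):

import torch.distributed as dist

def all_reduce_gradients(model, world_size):
    # Sum each parameter's gradient across all workers, then divide by
    # the worker count so everyone applies the same averaged update.
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size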

Run the following for more options:

$ python main.py --help

The most useful arguments are the following (an example invocation follows the list):

--experiment {iris,mnist,cifar10}
--agg-method {local,noComm,gradAllReduce,elasticGossip,gossipingSgd}
                      aggregation method used to aggregate gradients or
                      params across all workers during training
--agg-period AGG_PERIOD
                      if applicable, the period at which an aggregation
                      occurs
--agg-prob AGG_PROB   if applicable, the probability with which agg occurs
--elastic-alpha ELASTIC_ALPHA
                      "moving rate" for elastic gossip

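For example, a run of MNIST with Elastic Gossip might look like this (the flag values are illustrative, not tuned recommendations):

$ python main.py --experiment mnist --agg-method elasticGossip --agg-period 5 --elastic-alpha 0.8 --exp-id mnist-elastic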
Logging and output

Logs are Bunyan-formatted, so you will need the Bunyan CLI tool to view them; install it globally so the bunyan command is on your PATH:

$ npm install -g bunyan

Logs are stored at ./logs/<exp-id>, where <exp-id> can be specified using the --exp-id argument; it defaults to unspecified, so logs are written to ./logs/unspecified/ unless you override it.

Bunyan logs are also plain JSON, one record per line. To simply read them:

$ cat <logs> | bunyan -o short -l INFO

The logs folder contains one log file per worker, identified by rank, along with a metadata.json, which is a dump of the command-line arguments (including defaults).

$ cat ./logs/unspecified/metadata.json | jq
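To follow a single worker's log during a run (the filename here assumes worker logs are named by rank, e.g. 0.log - check your logs folder for the actual names):

$ tail -f ./logs/unspecified/0.log | bunyan -o short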

Tests

$ python -m pytest
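pytest's standard selection flags also work; for example, to run only tests whose names match a keyword, with verbose output (the keyword iris is just an illustration):

$ python -m pytest -k iris -v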
