Building on Google Cloud Platform
Ralf Gommers edited this page May 17, 2019 · 1 revision
tl;dr: decently fast and a bit expensive, but if you don't have a good build box available locally, this is an option.
Config:
- 16 vCPUs, 16 GB memory, 1 Tesla P100 GPU, $1.44/hr
- Intel optimized Deep Learning image: CUDA 10, MKL-DNN
- 30 GB boot disk (SSD)
- firewall: no http/https
The first SSH login asks whether to install NVIDIA drivers: answer yes. There will be some warnings about 32-bit, DRM and X drivers; these are not relevant and can be ignored.
The system has GCC 6.3.0 installed by default, no Clang:
$ gcc --version
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
We'll be using Conda compilers, so this can be safely ignored.
Some potential issues related to logging in and working over SSH:
- SSH keys disappearing: it turns out they get overwritten, so one needs to add them to the Metadata Server via the Cloud Console instead (see GCP docs)
- detaching or an SSH timeout kills processes. The best way to work around this is with a terminal multiplexer (tmux or screen).
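- also set the following in /etc/ssh/ssh_config to keep idle connections alive (ServerAliveInterval is a client-side option, so it takes effect on the machine you connect from):
  Host *
      ServerAliveInterval 120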
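The keepalive setting above can also be applied per user, without root access. The following is a minimal sketch, not a definitive recipe: the function name `add_keepalive` is hypothetical, and using a per-user file such as `~/.ssh/config` instead of `/etc/ssh/ssh_config` is an assumption that is not part of the original setup. The helper appends the stanza only if the file does not already set `ServerAliveInterval`, so running it twice is harmless.

```shell
# Hypothetical helper: append a keepalive stanza to an SSH client config file,
# unless that file already contains a ServerAliveInterval setting.
add_keepalive() {
    config_file="$1"
    mkdir -p "$(dirname "$config_file")"
    touch "$config_file"
    if ! grep -qi '^[[:space:]]*ServerAliveInterval' "$config_file"; then
        printf 'Host *\n    ServerAliveInterval 120\n' >> "$config_file"
    fi
}

# Try it on a scratch file first and inspect the result.
demo_config="$(mktemp)"
add_keepalive "$demo_config"
cat "$demo_config"
```

Once the output looks right, the same function could be pointed at `~/.ssh/config` on the machine you connect from.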
To install conda, dependencies for PyTorch, and start building PyTorch itself:
sudo apt-get install locate
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-<etc>.sh # install miniconda
# Now from instructions at https://github.com/pytorch/pytorch#install-dependencies:
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing
conda install -c pytorch magma-cuda100
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
git submodule sync
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
tmux
time python setup.py develop # real: 25min, user: 364min, sys: 18min
time python test/test_nn.py # passes, real: 13min, user: 69min, sys: 86min
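To put "a bit expensive" into numbers, the wall-clock times above can be combined with the $1.44/hr price from the config. This is a rough sketch under stated assumptions: the helper name `estimate_cost` is hypothetical, billing is assumed to scale linearly with run time, and the time spent on setup (driver install, conda, cloning) as well as any disk or network charges are left out.

```shell
# Figures taken from this page: hourly price and wall-clock ("real") minutes.
price_per_hour=1.44
build_minutes=25
test_minutes=13

# Hypothetical helper: cost in dollars of running the instance for N minutes,
# assuming linear billing at the given hourly price.
estimate_cost() {
    LC_ALL=C awk -v price="$1" -v minutes="$2" \
        'BEGIN { printf "%.2f\n", price * minutes / 60 }'
}

# One build plus one test_nn.py run: 38 minutes in total.
estimate_cost "$price_per_hour" "$((build_minutes + test_minutes))"   # prints 0.91
```

So one build plus one `test_nn.py` run comes to a little under a dollar of instance time; the bill mostly depends on how long the instance stays up between builds.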