This repository demonstrates using Paperspace Gradient to train and deploy a deep learning model that recognizes handwritten digits, a canonical sample problem in machine learning.
We build a convolutional neural network to classify the MNIST dataset using the tf.data, tf.estimator.Estimator, and tf.layers APIs.
pip install -U gradient
Please check our documentation on how to install the Gradient CLI and obtain a token.
Please check our documentation on how to create a project and get the project ID.
gradient experiments run singlenode \
--name mnist \
--projectId <your-project-id> \
--experimentEnv "{\"EPOCHS_EVAL\":5,\"TRAIN_EPOCHS\":10,\"MAX_STEPS\":1000,\"EVAL_SECS\":10}" \
--container tensorflow/tensorflow:1.13.1-gpu-py3 \
--machineType K80 \
--command "python mnist.py" \
--workspace https://github.com/Paperspace/mnist-sample.git
That's it!
gradient experiments run multinode \
--name mnist-multinode \
--projectId <your-project-id> \
--experimentEnv "{\"EPOCHS_EVAL\":5,\"TRAIN_EPOCHS\":10,\"MAX_STEPS\":1000,\"EVAL_SECS\":10}" \
--experimentType GRPC \
--workerContainer tensorflow/tensorflow:1.13.1-gpu-py3 \
--workerMachineType K80 \
--workerCommand 'pip install -r requirements.txt && python mnist.py' \
--workerCount 2 \
--parameterServerContainer tensorflow/tensorflow:1.13.1-py3 \
--parameterServerMachineType K80 \
--parameterServerCommand 'pip install -r requirements.txt && python mnist.py' \
--parameterServerCount 1 \
--workspace https://github.com/Paperspace/mnist-sample.git
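Both commands above pass hyperparameters to --experimentEnv as a hand-escaped JSON string, which is easy to mistype. The following is a minimal convenience sketch, not part of the Gradient CLI. It builds the same JSON with json.dumps and shell-quotes it. The helper name experiment_env_arg is hypothetical. The shell hands the CLI the same JSON text whether it is single-quoted or escaped inside double quotes.

```python
import json
import shlex

# Hyperparameters copied from the commands above.
experiment_env = {
    "EPOCHS_EVAL": 5,
    "TRAIN_EPOCHS": 10,
    "MAX_STEPS": 1000,
    "EVAL_SECS": 10,
}


def experiment_env_arg(env: dict) -> str:
    """Return a shell-quoted JSON string to paste after --experimentEnv (hypothetical helper)."""
    # Compact separators reproduce the hand-written form: {"EPOCHS_EVAL":5,...}
    return shlex.quote(json.dumps(env, separators=(",", ":")))


if __name__ == "__main__":
    print("--experimentEnv " + experiment_env_arg(experiment_env))
```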
You can run the original Google mnist-sample code on Paperspace with minimal changes by simply setting TF_CONFIG and model_dir as follows.
First, import get_tf_config from gradient-sdk:
from gradient_sdk import get_tf_config
Then call it from your script's entry point:
if __name__ == '__main__':
    get_tf_config()
This function sets TF_CONFIG, INDEX, and TYPE for each node.
For multi-worker training, as mentioned before, you need to set the TF_CONFIG environment variable for each binary running in your cluster. The TF_CONFIG environment variable is a JSON string that specifies the tasks that constitute a cluster, each task's address, and each task's role in the cluster.
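To show what that JSON looks like, here is a minimal sketch that builds a TF_CONFIG value by hand. On Gradient, get_tf_config() fills it in for you. The hostnames, ports, and the helper name make_tf_config are hypothetical. The worker and ps roles are illustrative, since TensorFlow also defines chief and evaluator roles. The exact cluster layout Gradient generates is not shown here.

```python
import json
import os


def make_tf_config(workers, ps, task_type, task_index):
    """Build a TF_CONFIG JSON string (hypothetical helper)."""
    config = {
        "cluster": {
            "worker": workers,  # "host:port" strings, one per worker task
            "ps": ps,           # "host:port" strings, one per parameter server
        },
        # Identifies which task in the cluster *this* process is.
        "task": {"type": task_type, "index": task_index},
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Hypothetical addresses; every process in the cluster gets the same
    # "cluster" section but its own "task" entry.
    os.environ["TF_CONFIG"] = make_tf_config(
        workers=["worker-0.example:2222", "worker-1.example:2222"],
        ps=["ps-0.example:2222"],
        task_type="worker",
        task_index=0,
    )
    print(os.environ["TF_CONFIG"])
```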
In order to serve a TensorFlow model, simply export a SavedModel from your TensorFlow program. SavedModel is a language-neutral, recoverable, hermetic serialization format that enables higher-level systems and tools to produce, consume, and transform TensorFlow models.
Please refer to the TensorFlow documentation for detailed instructions on how to export SavedModels. For example, after training:
tf.estimator.train_and_evaluate(mnist_classifier, train_spec, eval_spec)
# Start exporting the model
image = tf.placeholder(tf.float32, [None, 28, 28])
input_fn = tf.estimator.export.build_raw_serving_input_receiver_fn({
    'image': image,
})
mnist_classifier.export_savedmodel(<export directory>,
                                   input_fn,
                                   strip_default_attrs=True)
# Model exported
Under the hood, Estimator.export_savedmodel uses TensorFlow's SavedModelBuilder to export the model. SavedModelBuilder saves a "snapshot" of the trained model to reliable storage so that it can be loaded later for inference.
For details on the SavedModel format, please see the SavedModel README.md.
When running a model deployment on Gradient, be sure to set the export directory to PS_MODEL_PATH:
import os
export_dir = os.path.abspath(os.environ.get('PS_MODEL_PATH'))
You can also use the Gradient SDK to ensure you have the correct path:
from gradient_sdk.utils import data_dir, model_dir, export_dir
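Outside a Gradient deployment, os.environ.get('PS_MODEL_PATH') returns None, and os.path.abspath(None) raises a TypeError. Here is a minimal sketch that falls back to the local ./models directory in that case. The helper name resolve_export_dir and the fallback path are illustrative choices. They are not part of the Gradient SDK.

```python
import os


def resolve_export_dir(default="./models"):
    """Return PS_MODEL_PATH as an absolute path, or `default` when it is unset.

    Hypothetical helper: on Gradient, PS_MODEL_PATH is set for deployments;
    locally it usually is not, so we fall back instead of crashing.
    """
    path = os.environ.get("PS_MODEL_PATH") or default
    return os.path.abspath(path)


if __name__ == "__main__":
    print(resolve_export_dir())
```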
Users sometimes run into issues with their local Python environment. A common solution is to create a Python virtual environment and run Python from within it. To do so:
- Create and activate a Python virtual environment (we recommend Python 3.7+):
cd mnist-sample
python3 -m venv venv
source venv/bin/activate
- Install the required Python packages:
pip install -r requirements-local.txt
To train the MNIST model locally:
- Make sure you have a TensorFlow 1.x release installed. The code relies on TensorFlow 1.x APIs such as tf.placeholder and tf.layers (the Gradient examples above use 1.13.1).
- Also make sure you've added the models folder to your Python path; otherwise you may encounter an error like ImportError: No module named mnist.
- Download the code from GitHub:
git clone git@github.com:Paperspace/mnist-sample.git
- Start training the model:
python mnist.py
Note: local training will take a long time, so be prepared to wait!
If you want to shorten model training time, you can lower the --max_steps parameter:
python mnist.py --max_steps=1500
The MNIST dataset is downloaded to the ./data directory.
Model results are stored in the ./models directory.
Both directories can be safely deleted if you would like to start training over from the beginning.
You can export the model in the TensorFlow SavedModel format to a specific directory by passing the --export_dir argument:
python mnist.py --export_dir /tmp/mnist_saved_model
If no export directory is specified, the model is saved to a timestamped directory under the ./models subdirectory (e.g. mnist-sample/models/1513630966/).
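When several exports have accumulated, you may want the most recent one. Here is a minimal sketch that picks the newest timestamped SavedModel directory under ./models. It assumes, as in the example path above, that export subdirectories are named with integer Unix timestamps. The helper name latest_export_dir is hypothetical.

```python
import os


def latest_export_dir(base="./models"):
    """Return the newest timestamped export directory under `base`, or None.

    Assumes export subdirectories are named with integer Unix timestamps
    (e.g. models/1513630966/); other entries are ignored.
    """
    if not os.path.isdir(base):
        return None
    stamps = [
        name
        for name in os.listdir(base)
        if name.isdigit() and os.path.isdir(os.path.join(base, name))
    ]
    if not stamps:
        return None
    # Compare numerically so that "999" does not sort after "1000".
    return os.path.join(base, max(stamps, key=int))


if __name__ == "__main__":
    print(latest_export_dir())
```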
To test the prediction endpoint of a model deployed with TensorFlow Serving on Paperspace, run the following command, replacing your-deployment-id with your deployment's ID:
python serving_rest_client_test.py --url https://services.paperspace.io/model-serving/your-deployment-id:predict
Optionally you can provide a path to an image file to run a prediction on, for example:
python serving_rest_client_test.py --url https://services.paperspace.io/model-serving/your-deployment-id:predict --path example5.png
Note: it may be useful to run this test from within a virtual environment to guard against issues in your local environment. To do so, use the instructions above.
Open another terminal window and run the following in the directory where you cloned this repo:
docker run -t --rm -p 8501:8501 -v "$PWD/models:/models/mnist" -e MODEL_NAME=mnist tensorflow/serving
Now you can test the local inference endpoint by running:
python serving_rest_client_test.py
Optionally you can provide a path to an image file to run a prediction on:
python serving_rest_client_test.py --path example3.png
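The repository's serving_rest_client_test.py is the supported client. For readers who want to see the raw request, here is a hedged, standard-library-only sketch against TensorFlow Serving's REST predict API on the local container. It assumes the serving input is named image with shape [None, 28, 28], matching the export code earlier in this README. It also assumes pixels are already scaled the way the model was trained; the upstream TensorFlow MNIST sample divides by 255. The helper names build_predict_body and predict are hypothetical.

```python
import json
import urllib.request

# REST endpoint of the tensorflow/serving container started above
# (port 8501, MODEL_NAME=mnist).
LOCAL_URL = "http://localhost:8501/v1/models/mnist:predict"


def build_predict_body(pixels):
    """Wrap one 28x28 image (nested lists of floats) in a TF Serving "instances" request.

    Assumes the SavedModel's serving input is named "image" (see the export code above).
    """
    if len(pixels) != 28 or any(len(row) != 28 for row in pixels):
        raise ValueError("expected a 28x28 image")
    return {"instances": [{"image": pixels}]}


def predict(pixels, url=LOCAL_URL):
    """POST the request to TensorFlow Serving and return the decoded JSON response."""
    data = json.dumps(build_predict_body(pixels)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, predict([[0.0] * 28 for _ in range(28)]) sends an all-black image to the local container. The response's "predictions" field holds the model outputs.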
Once you've completed local testing using the tensorflow/serving docker container, stop the running container as follows:
docker ps
docker kill <container-id-or-name>
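If you are training with TensorFlow on a GPU but would like to export the model for use with TensorFlow Serving on a CPU-only server, train and/or export the model using --data_format=channels_last. On a GPU the code may otherwise use channels_first, and TensorFlow's standard CPU convolution kernels support only the channels_last (NHWC) layout: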
python mnist.py --data_format=channels_last
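The two data formats differ only in where the channel axis sits in each image tensor. As an illustration, here is a small NumPy sketch, independent of mnist.py, that converts a hypothetical batch of single-channel 28x28 images between the layouts. It shows the layouts themselves, not how mnist.py handles them internally.

```python
import numpy as np

# Hypothetical batch of 2 single-channel 28x28 images in channels_last layout: (N, H, W, C).
batch_nhwc = np.arange(2 * 28 * 28, dtype=np.float32).reshape(2, 28, 28, 1)

# channels_first moves the channel axis in front of the spatial axes: (N, C, H, W).
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))

# Converting back to channels_last, the layout CPU convolution kernels expect.
back_to_nhwc = np.transpose(batch_nchw, (0, 2, 3, 1))

print(batch_nchw.shape, back_to_nhwc.shape)
```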
The SavedModel will be saved in a timestamped directory under the models subdirectory (e.g. mnist-sample/models/1513630966/).
You can also use the saved_model_cli tool to inspect and execute the SavedModel.
To visualize training progress in TensorBoard, run the following (replace mnist with your model's name if it differs):
tensorboard --logdir models/mnist