Serving PyTorch Models for Inference as gRPC API
- model - example TorchScript module with a sample image and class-to-label mapping file
- protos - Protocol Buffer definition for the gRPC service
- client.py - gRPC client example that connects to the gRPC server and runs inference on the sample file in the model directory
- Dockerfile - containerizes the gRPC server for deployment
- Makefile - commands for generating stubs, running the gRPC service, building the Docker image, and other chores
- requirements.txt - Python requirements for running this service
- server.py - gRPC server that loads the model and runs inference
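The Dockerfile listed above packages the server for deployment. A minimal sketch of what such a Dockerfile might look like (the base image tag and exposed port here are assumptions, not the repository's actual file):

```dockerfile
# sketch only: base image and port are assumptions
FROM python:3.10-slim

WORKDIR /app

# install Python dependencies first to take advantage of layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# copy the model, generated stubs, and server code
COPY . .

# the gRPC server is assumed to listen on port 8000
EXPOSE 8000

CMD ["python", "server.py"]
```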
- Refer to model/torchscript.py for conversion of a torchvision model to a TorchScript module
- The TorchScript module is structured as follows:

class YourModule(nn.Module):
    def __init__(self):
        # initialize the quantized model with pretrained weights
        # load the class-to-label dictionary

    def forward(self, input):
        # run the forward pass and compute the top classes with their probabilities
        # map classes to labels
        # return the result
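The forward pass above turns raw model logits into class probabilities and maps them to human-readable labels. A stdlib-only sketch of that post-processing step (the logit values and label names below are made up for illustration, not taken from the model):

```python
import math

def top_k_predictions(logits, labels, k=3):
    """Softmax the raw logits and return the top-k (label, probability) pairs."""
    # subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # pair each probability with its label and sort by probability, descending
    ranked = sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]

# made-up logits and labels for illustration
logits = [2.0, 0.5, 1.0, -1.0]
labels = ["pizza", "potpie", "hot pot", "spatula"]
print(top_k_predictions(logits, labels, k=2))
```

In the real module the logits come from the model's forward pass and the labels from the class-to-label mapping file.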
- The example uses the ImageNet classes file to map each ImageNet class to its human-readable label
- Run the following commands to create the TorchScript module:
cd model
python torchscript.py save
- Run the following commands to load the TorchScript module and run inference on the sample image:
cd model
python torchscript.py run
- Refer to protos/inference.proto for the Protocol Buffer definition of the gRPC service
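The actual definition lives in protos/inference.proto; a hedged sketch of what such a service definition might look like (the service, method, and field names here are assumptions, not the real file):

```protobuf
syntax = "proto3";

package inference;

// hypothetical request: raw image bytes to classify
message InferenceRequest {
  bytes image = 1;
}

// hypothetical reply: JSON-encoded label-to-probability map
message InferenceReply {
  string prediction = 1;
}

service InferenceService {
  // single unary RPC that runs the model on one image
  rpc Predict (InferenceRequest) returns (InferenceReply);
}
```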
- Run the following command to generate the stubs:
make proto
- Run the following command to start the gRPC server:
python server.py
- Run the following command to start the gRPC client:
python client.py <url>
# python client.py localhost:8000
- An example server with a quantized ResNet50 is hosted on Fly.
- Point the client at the hosted server to run inference:
python client.py pytorch-serving.fly.dev:8000
- Example Output:
health: UP
prediction: {"pizza": 0.44644981622695923, "potpie": 0.009865873493254185, "hot pot": 0.007180684246122837, "consomme": 0.005226321052759886, "spatula": 0.0047011710703372955}
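The prediction field shown above is a JSON object mapping labels to probabilities; a stdlib-only sketch of extracting the top label on the client side (variable names are illustrative):

```python
import json

# the prediction string as returned in the example output above
prediction = ('{"pizza": 0.44644981622695923, "potpie": 0.009865873493254185, '
              '"hot pot": 0.007180684246122837, "consomme": 0.005226321052759886, '
              '"spatula": 0.0047011710703372955}')

scores = json.loads(prediction)
# pick the label with the highest probability
top_label = max(scores, key=scores.get)
print(top_label)  # -> pizza
```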
This project is licensed under the MIT License.