Skip to content

Latest commit

 

History

History
48 lines (40 loc) · 1.95 KB

README.md

File metadata and controls

48 lines (40 loc) · 1.95 KB

SENTENCE TRANSFORMER DEPLOYMENT ON KSERVE

Currently this code supports only EMBEDDING type of models. The sentence transformer models are based on PyTorch so we used this doc to develop the models. So this expects the models to be stored as MAR files. So this repository helps us in creating one for our models.

Consider this as example for which you want to create a MAR file - model

STEPS -

Change the model_name detail here based on sentence transformer model you choose - setup_config

Run the following commands

1. python Download_Transformer_models.py
2. torch-model-archiver --model-name sentenceTransformerMiniLM --version 1.0 --serialized-file Transformer_model_mini/pytorch_model.bin --handler ./Transformer_handler_generalized.py --extra-files "Transformer_model_mini/config.json,./setup_config.json"

Once done, you will have MAR file of the model you need. url

Later create a folder on s3 with the structure mentioned here

Once its deployed. Create a model on Kubeflow dashboard using below yaml for first deployment.

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sentence-transformers"
  namespace: kubeflow-tai-example-com
  labels:
    istio: kservegateway
  annotations:
    sidecar.istio.io/inject: "false"
spec:
  predictor:
    serviceAccountName: kubeflow-user
    resources:
        limits:
          cpu: "1"
          memory: 1500Mi
        requests:
          cpu: "1"
          memory: 1000Mi
    pytorch:
      storageUri: "s3://s3-kubeflow-develop/sentenceTransformer"