cortex-multiarmed-bandit

Serving ML models in a high-load manner with cortex and traffic splitter.

With this template, you can deploy real-time recommender systems behind a multi-armed bandit and balance traffic between them. No knowledge of Kubernetes or autoscaling is needed; it all works out of the box.

At the heart of this project is the open-source Cortex project and its unique feature: the TrafficSplitter. You only need to prepare the models; this project will do the rest. Cheers!

Project description

Here is an example multi-armed bandit with two models behind it: one returns only positive random numbers, the other only negative ones.
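The sign convention makes the two models easy to tell apart. A rough sketch of what they do (function names are illustrative; the real handlers live in model_a.py and model_b.py):

```python
import random

def model_a_predict(payload: dict) -> int:
    # Model A: always returns a positive random number,
    # so its responses are easy to identify.
    return random.randint(1, 100)

def model_b_predict(payload: dict) -> int:
    # Model B: always returns a negative random number.
    return -random.randint(1, 100)
```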

A simple executor.py is also provided. It lets you send requests to the models and provide feedback on the responses.

Before you start

1. Clone this repo:

```bash
git clone https://github.com/puhoshville/cortex-multiarmed-bandit.git
```

2. Set up an AWS account

Please make sure that you have the AWS CLI installed.

For more information, see the AWS CLI documentation.

3. Install cortex 0.39.1 or greater

Cortex must be installed explicitly through pip, not as the Go binary:

```bash
# install the CLI
pip install cortex
```

! More up-to-date information is available here: https://docs.cortex.dev

Build and push images

We have two separate files: model_a.py and model_b.py. Model A returns random positive numbers; Model B returns negative ones. This way we can always tell which model produced a response, which will help us later.

1. Build model images

For each model we create a separate Docker image. For this purpose we use Docker's multi-stage builds: the images share the same base but serve different models.
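A multi-stage Dockerfile for this layout might look roughly like the following (stage names match the `--target` flags used in the build commands, but the exact Dockerfile in the repo may differ):

```dockerfile
# Shared base stage: runtime dependencies common to both models (illustrative layout)
FROM python:3.9-slim AS base
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
EXPOSE 8080

# One target per model; `docker build --target model-a` selects the stage to build
FROM base AS model-a
COPY model_a.py .
CMD ["python", "model_a.py"]

FROM base AS model-b
COPY model_b.py .
CMD ["python", "model_b.py"]
```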

To build the images:

```bash
docker build . --target model-a -t cortex-bandit:model-a
docker build . --target model-b -t cortex-bandit:model-b
```

2. Check images

To make sure an image works correctly, we can run it locally:

```bash
docker run --rm -it -p 8080:8080 cortex-bandit:model-a
```

And send a request:

```bash
curl -X POST -H "Content-Type: application/json" -d '{"msg": "hello world"}' localhost:8080
```

You should see something like this:

```
$ curl -X POST -H "Content-Type: application/json" -d '{"msg": "hello world"}' localhost:8080
78
```
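Because of the sign convention, any client can tell which model served a given request from the response value alone. A tiny helper to illustrate (hypothetical, not part of the repo):

```python
def identify_model(response_value: int) -> str:
    """Map a response to the model that produced it.

    Works because Model A returns only positive numbers and
    Model B only negative ones (0 is treated as unknown).
    """
    if response_value > 0:
        return "model-a"
    if response_value < 0:
        return "model-b"
    return "unknown"
```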

3. Push image

  1. Make sure that the AWS CLI is installed.
  2. Log in to AWS ECR:

     ```bash
     aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com
     ```
  3. Create the repository (needed only once):

     ```bash
     aws ecr create-repository --repository-name cortex-bandit
     ```
  4. Tag the images:

     ```bash
     docker tag cortex-bandit:model-a <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-a
     docker tag cortex-bandit:model-b <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-b
     ```
  5. Push them:

     ```bash
     docker push <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-a
     docker push <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-b
     ```

Specify the links `<AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-a` and `<AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-b` in cortex.yaml.
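The cortex.yaml file is where the two APIs and the TrafficSplitter are wired together. A rough sketch of what such a configuration can look like (field names follow the Cortex documentation and may differ between versions, so treat this as illustrative rather than the repo's exact file):

```yaml
# Illustrative cortex.yaml sketch; verify field names at https://docs.cortex.dev
- name: model-a
  kind: RealtimeAPI
  pod:
    containers:
      - name: api
        image: <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-a

- name: model-b
  kind: RealtimeAPI
  pod:
    containers:
      - name: api
        image: <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-b

- name: multiarmed-bandit
  kind: TrafficSplitter
  apis:
    - name: model-a
      weight: 50
    - name: model-b
      weight: 50
```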

! If you are using an Apple M1 machine, please use these commands to build and push the Docker images instead:

```bash
docker buildx build --platform linux/amd64 . --target model-a --push -t <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-a
docker buildx build --platform linux/amd64 . --target model-b --push -t <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-b
```

Run cluster

In cluster.yaml you can find a simple cluster configuration, which provisions 1 to 2 instances of type t3.large.

```bash
cortex cluster up cluster.yaml
```
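For orientation, a minimal cluster.yaml along these lines might look as follows (field names are taken from the Cortex cluster configuration docs and may differ between versions; the repo's actual file is authoritative):

```yaml
# Illustrative cluster.yaml sketch; verify field names at https://docs.cortex.dev
cluster_name: cortex-bandit
region: us-east-2
node_groups:
  - name: workers
    instance_type: t3.large
    min_instances: 1
    max_instances: 2
```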

Be patient! It can take a while!

For more information about cluster configuration, see the Cortex documentation.

Run services

Specify your Docker image links in cortex.yaml.

After that you can run this command:

```bash
cortex deploy cortex.yaml
```

Executor

  1. Install the requirements:

     ```bash
     pip install -r requirements-executor.txt
     ```
  2. Get your API endpoint:

     ```bash
     cortex get multiarmed-bandit
     ```
  3. Put this URL into the `URL` variable in executor.py.
  4. Get the operator endpoint:

     ```bash
     cortex cluster info
     ```
  5. Put this endpoint into the `operator_endpoint` parameter in executor.py.
  6. Run:

     ```bash
     python executor.py
     ```
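The executor drives the feedback loop of the bandit. Its internals are not shown here, but the core idea can be sketched as a simple epsilon-greedy policy over the two arms (names and the update rule below are illustrative assumptions, not the repo's actual code):

```python
import random

def choose_arm(estimates: dict, epsilon: float = 0.1, rng=random) -> str:
    # Epsilon-greedy: with probability epsilon explore a random arm,
    # otherwise exploit the arm with the highest estimated reward.
    if rng.random() < epsilon:
        return rng.choice(list(estimates))
    return max(estimates, key=estimates.get)

def update_estimate(estimates: dict, counts: dict, arm: str, reward: float) -> None:
    # Incremental mean: pull the arm's estimate toward the observed reward.
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
```

In the real setup, the reward would come from user feedback on a response, and the resulting estimates would be used to adjust the TrafficSplitter weights via the operator endpoint.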
