docker proj

alonitac · Jan 8, 2024 · 4c969bc
1 parent 9d7e766

Showing 8 changed files with 396 additions and 0 deletions.
diff --git a/docker_project/README.md b/docker_project/README.md
@@ -0,0 +1,186 @@
+# Object Detection Service
+
+
+## Background
+
+In this project, you'll design, develop and deploy an object detection service that consists of multiple containerized microservices. 
+
+Users send images through an interactive Telegram bot (the bot you've implemented in the Python project); the service detects objects in each image and sends the results back to the user.
+
+The service consists of 3 microservices: 
+
+- `polybot`: Telegram Bot container.
+- `yolo5`: Image prediction container based on the pre-trained Yolo5 deep learning model.
+- `mongo`: MongoDB cluster to store data.
+
+## Preliminaries
+
+Create a dedicated GitHub repo for the project (or use the same GitHub repo from the previous Python project and utilize your Telegram bot implementation).
+
+## Implementation guidelines
+
+### The `mongo` microservice
+
+MongoDB is a [document](https://www.mongodb.com/document-databases)-oriented [NoSQL](https://www.mongodb.com/nosql-explained/nosql-vs-sql) database that supports high-availability deployments using replica sets.
+**High availability** (HA) describes a system designed for durability and redundancy, so that it keeps serving requests when individual components fail.
+A **replica set** is a group of MongoDB servers, called nodes, each containing an identical copy of the data.
+In a typical three-node replica set, if one node fails, the other two pick up the load while the failed node restarts, without losing writes that were acknowledged by a majority of the nodes.
+
+Follow the official guide below to deploy a containerized MongoDB cluster on your local machine.
+Note that the Mongo deployment should be configured **to persist the data stored in it**.
+
+https://www.mongodb.com/compatibility/deploying-a-mongodb-cluster-with-docker
+
+Got an HA Mongo deployment? Great, let's move on...
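Replica sets are commonly initiated with `rs.initiate()` from `mongosh`. As an illustration, here is a minimal Python sketch of the same step using `pymongo`'s generic `command()` call. It assumes (hypothetically) three containers named `mongo1`, `mongo2` and `mongo3`, each started with `--replSet myReplicaSet`, and that the script runs somewhere those hostnames resolve, such as another container on the same Docker network.

```python
"""Sketch: initiate a 3-node MongoDB replica set and build a connection string for it.

Assumptions (hypothetical): containers mongo1, mongo2, mongo3 on a shared Docker
network, each started with `mongod --replSet myReplicaSet --bind_ip_all`.
"""

REPLICA_SET_NAME = "myReplicaSet"  # hypothetical; must match the --replSet flag
MEMBERS = ["mongo1:27017", "mongo2:27017", "mongo3:27017"]  # hypothetical hosts


def build_replset_config(name, members):
    """Build the document sent with replSetInitiate (same shape as rs.initiate())."""
    return {
        "_id": name,
        "members": [{"_id": i, "host": host} for i, host in enumerate(members)],
    }


def build_connection_uri(name, members):
    """List every member so the driver can discover the current primary."""
    return f"mongodb://{','.join(members)}/?replicaSet={name}"


def initiate_replica_set(name=REPLICA_SET_NAME, members=MEMBERS):
    # Imported lazily so the helpers above work without pymongo installed
    from pymongo import MongoClient

    # Connect directly to one node: the replica set does not exist yet,
    # so replica-set discovery cannot be used for this first call
    host, port = members[0].split(":")
    client = MongoClient(host, int(port), directConnection=True)
    # Sends {"replSetInitiate": <config>} to the admin database
    client.admin.command("replSetInitiate", build_replset_config(name, members))
```

After `initiate_replica_set()` succeeds once, clients such as the `yolo5` service can connect with `MongoClient(build_connection_uri(REPLICA_SET_NAME, MEMBERS))`; listing every member lets the driver find the new primary after a failover.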
+
+### The `yolo5` microservice
+
+[Yolo5](https://github.com/ultralytics/yolov5) is a state-of-the-art object detection AI model, known for highly accurate object detection in images and videos.
+You'll work with a lightweight model that can detect [80 object classes](https://github.com/ultralytics/yolov5/blob/master/data/coco128.yaml) while running on your old, poor CPU machine.
+
+The service files are under the `docker_project/yolo5` directory. Copy these files to your repo.
+
+#### Develop the app
+
+The `yolo5/app.py` app is a Flask-based web server with a single endpoint, `/predict`, which predicts the objects in an image.
+
+To use this endpoint, you don't send the image directly in the HTTP request. Instead, you attach a query parameter called `imgName` to the URL (e.g. `localhost:8081/predict?imgName=street.jpeg`), which holds the name of an image stored in an **S3 bucket**.
+The service downloads this image from the S3 bucket and detects objects in it.
+
+Take a look at the code and complete the `# TODO`s. Feel free to change or add any functionality as you wish!
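For the two S3 `# TODO`s in `yolo5/app.py`, one possible shape is sketched below. It assumes an S3 client object is passed in (on a real deployment, `boto3.client('s3')`), uses the `BUCKET_NAME` value the app already reads, and places predicted images under a hypothetical `predictions/<prediction_id>/` key prefix so they never overwrite the original object.

```python
from pathlib import Path


def download_original_image(s3_client, bucket, img_name, dest_dir="data/images"):
    """Download s3://<bucket>/<img_name> and return the local file path as a string."""
    local_path = Path(dest_dir) / Path(img_name).name
    local_path.parent.mkdir(parents=True, exist_ok=True)
    # boto3 argument order: download_file(Bucket, Key, Filename)
    s3_client.download_file(bucket, img_name, str(local_path))
    return str(local_path)


def predicted_image_key(img_name, prediction_id):
    """Hypothetical key scheme: a per-prediction prefix keeps the original object intact."""
    return f"predictions/{prediction_id}/{Path(img_name).name}"


def upload_predicted_image(s3_client, bucket, predicted_img_path, img_name, prediction_id):
    """Upload the annotated image and return the S3 key it was stored under."""
    key = predicted_image_key(img_name, prediction_id)
    # boto3 argument order: upload_file(Filename, Bucket, Key)
    s3_client.upload_file(str(predicted_img_path), bucket, key)
    return key
```

In `predict()`, this could look like `original_img_path = download_original_image(s3, images_bucket, img_name)` and, after `run()`, `upload_predicted_image(s3, images_bucket, predicted_img_path, img_name, prediction_id)`. Passing the client in rather than creating it inside the helpers keeps them testable without AWS access.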
+
+#### Build and run the app
+
+The `yolo5` app can only run as a Docker container. This is because the app depends on many files that don't exist on your local machine but do exist in the [`ultralytics/yolov5`](https://hub.docker.com/r/ultralytics/yolov5) base image.
+
+Take a look at the provided `Dockerfile`; it's already implemented for you, so there's no need to touch it.
+
+If you run the container on your local machine, you may need to **mount** (as a volume) the directory containing your AWS credentials (`$HOME/.aws`, which holds the `credentials` file) to allow the container to communicate with S3.
+
+**Note: Never build a docker image with AWS credentials stored in it! Never commit AWS credentials in your source code! Never!**
+
+Once the image is built and the container is running, you can communicate with it directly:
+
+```bash
+curl -X POST "localhost:8081/predict?imgName=street.jpeg"
+```
+
+For example, here is an image and the corresponding results summary:
+
+<img src="../.img/street.jpeg" width="60%">
+
+```json
+{
+    "prediction_id": "9a95126c-f222-4c34-ada0-8686709f6432",
+    "original_img_path": "data/images/street.jpeg",
+    "predicted_img_path": "static/data/9a95126c-f222-4c34-ada0-8686709f6432/street.jpeg",
+    "labels": [
+      {
+        "class": "person",
+        "cx": 0.0770833,
+        "cy": 0.673675,
+        "height": 0.0603291,
+        "width": 0.0145833
+      },
+      {
+        "class": "traffic light",
+        "cx": 0.134375,
+        "cy": 0.577697,
+        "height": 0.0329068,
+        "width": 0.0104167
+      },
+      {
+        "class": "potted plant",
+        "cx": 0.984375,
+        "cy": 0.778793,
+        "height": 0.095064,
+        "width": 0.03125
+      },
+      {
+        "class": "stop sign",
+        "cx": 0.159896,
+        "cy": 0.481718,
+        "height": 0.0859232,
+        "width": 0.053125
+      },
+      {
+        "class": "car",
+        "cx": 0.130208,
+        "cy": 0.734918,
+        "height": 0.201097,
+        "width": 0.108333
+      },
+      {
+        "class": "bus",
+        "cx": 0.285417,
+        "cy": 0.675503,
+        "height": 0.140768,
+        "width": 0.0729167
+      }
+    ],
+    "time": 1692016473.2343626
+}
+```
+
+The model detected a _person_, _traffic light_, _potted plant_, _stop sign_, _car_, and a _bus_. Try it yourself with different images.
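The `cx`, `cy`, `width` and `height` values are normalized to the image size (YOLO's label format: box center plus box size, each divided by the image width or height), so any consumer, such as the bot, has to scale them back to pixels. A small sketch with hypothetical helper names that converts a label to pixel corners and builds a short text summary suitable for a Telegram reply:

```python
from collections import Counter


def to_pixel_box(label, img_width, img_height):
    """Convert a normalized (cx, cy, width, height) label to pixel corners (x1, y1, x2, y2)."""
    cx = label["cx"] * img_width
    cy = label["cy"] * img_height
    half_w = label["width"] * img_width / 2
    half_h = label["height"] * img_height / 2
    return (round(cx - half_w), round(cy - half_h), round(cx + half_w), round(cy + half_h))


def summarize_labels(labels):
    """Count detections per class, most frequent first."""
    counts = Counter(label["class"] for label in labels)
    lines = [f"{cls}: {n}" for cls, n in counts.most_common()]
    return "Detected objects:\n" + "\n".join(lines)
```

For example, `summarize_labels(prediction_summary['labels'])` on the street image above would list each of the six classes once.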
+
+### The `polybot` microservice
+
+You can either integrate your bot implementation from the previous Python project or use the code sample provided under the `docker_project/polybot` directory.
+
+If you use the code sample, make sure you have a Telegram bot token and know how to expose your bot using `ngrok` when running it locally.
+
+In the sample code, under `bot.py` you'll find the class `ObjectDetectionBot` with a `handle_message()` method that handles incoming messages from end-users.
+When users send an image to the bot, you have to upload this image to S3 and perform an HTTP request to the `yolo5` service to predict the objects in this image.
+
+Complete the `# TODO`s in `bot.py` to achieve this goal (or implement equivalent steps if you use your own bot implementation).
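One way to wire the three `# TODO`s together is sketched below. It assumes an S3 client (e.g. `boto3.client('s3')`), a `post` callable with the same call shape as `requests.post`, and a `yolo5` service reachable at `http://yolo5:8081` (a hypothetical compose service URL). These are injected so the flow can be exercised without AWS or a running model.

```python
import os


def detect_objects(photo_path, s3_client, bucket, post, yolo_url="http://yolo5:8081"):
    """Upload the user's photo to S3, ask yolo5 to predict, and return the parsed summary (or None)."""
    img_name = os.path.basename(photo_path)  # used both as the S3 key and as imgName
    # boto3 argument order: upload_file(Filename, Bucket, Key)
    s3_client.upload_file(photo_path, bucket, img_name)

    # yolo5 reads imgName from the query string, not from the request body
    response = post(f"{yolo_url}/predict", params={"imgName": img_name}, timeout=60)
    if response.status_code != 200:
        return None  # e.g. 404 when yolo5 produced no label file (nothing detected)
    return response.json()


def format_reply(summary):
    """Turn the yolo5 summary into a short message for the Telegram user."""
    if not summary or not summary.get("labels"):
        return "No objects detected."
    classes = sorted({label["class"] for label in summary["labels"]})
    return "Detected: " + ", ".join(classes)
```

Inside `ObjectDetectionBot.handle_message()`, this might be called as `summary = detect_objects(photo_path, s3, bucket_name, requests.post)` followed by `self.send_text(msg['chat']['id'], format_reply(summary))`, where providing `bucket_name` to polybot (e.g. through a `BUCKET_NAME` environment variable) is an assumption of this sketch.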
+
+Here is an end-to-end example of what it may look like when all your microservices are running. Feel free to send the results to the user in any other form.
+
+<img src="../.img/polysample.jpg" width="30%">
+
+## Deploy the service on a single EC2 instance as a Docker Compose project
+
+Create a Docker Compose project in the `docker-compose.yaml` file that provisions the service (all 3 microservices) with a single command (`docker compose up`).
+Deploy the compose project on a single EC2 instance located in a public subnet.
+
+Deployment notes:
+
+- Don't configure your compose file to build the images. Instead, push the `yolo5` and `polybot` images to DockerHub or an [ECR](https://docs.aws.amazon.com/AmazonECR/latest/userguide/getting-started-console.html) repo and use these images. 
+- Attach an IAM role with the relevant permissions (e.g. read/write access to S3). Don't manage AWS credentials yourself, and never hard-code AWS credentials in the `docker-compose.yaml` file. 
+- Don't hard-code your Telegram token in the compose file; it is sensitive data. [Read here](https://docs.docker.com/compose/use-secrets/) how to provide it to your compose project safely.
+- Use `snyk` to scan your images and fix any HIGH and CRITICAL security vulnerabilities.
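With Compose secrets, each secret granted to a service is mounted into its container as a file under `/run/secrets/<secret_name>`. A minimal sketch of how `polybot/app.py` could read the token from there instead of from an environment variable; the secret name `telegram_token` and the environment-variable fallback for local runs are assumptions.

```python
import os
from pathlib import Path


def read_secret(name, secrets_dir="/run/secrets", env_var=None):
    """Return a Docker Compose secret, optionally falling back to an env var (e.g. for local runs)."""
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        # Strip the trailing newline that editors usually add to the secret file
        return secret_file.read_text().strip()
    if env_var and env_var in os.environ:
        return os.environ[env_var]
    raise RuntimeError(f"secret {name!r} not found")
```

In `polybot/app.py`, `TELEGRAM_TOKEN = read_secret('telegram_token', env_var='TELEGRAM_TOKEN')` would replace the `os.environ` lookup, with the compose file declaring a `telegram_token` secret and granting it to the `polybot` service.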
+
+### Exposing the bot to Telegram servers
+
+You can expose the polybot to Telegram servers using ngrok, as done in the previous exercise (install and launch ngrok on the EC2 instance).
+
+Alternatively, you can use the instance's **public IP address** as the webhook URL registered with Telegram servers.
+This requires some code changes in `polybot/app.py`.
+
+Since the public IP address may change (e.g. when the instance is stopped and started again), retrieve it dynamically when the app launches. You can get the instance's public IP **from within** the instance by querying the instance metadata service:
+
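+```python
+import requests
+
+# Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
+# IMDSv2 (may be enforced on your instance): first request a session token
+token = requests.put('http://169.254.169.254/latest/api/token',
+                     headers={'X-aws-ec2-metadata-token-ttl-seconds': '21600'}).text
+public_ip = requests.get('http://169.254.169.254/latest/meta-data/public-ipv4',
+                         headers={'X-aws-ec2-metadata-token': token}).text
+# Telegram accepts webhooks only over HTTPS, on ports 443, 80, 88 or 8443
+TELEGRAM_APP_URL = f'https://{public_ip}:8443'
+```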
+
+In addition, your Flask web server must serve HTTPS (Telegram only delivers webhook updates over HTTPS, not plain HTTP).
+For that, generate a **self-signed certificate** and use it both when running Flask and when setting the webhook in Telegram (the public certificate must be uploaded to Telegram together with the webhook URL).
+
+Here is a simple working example:    
+https://github.com/eternnoir/pyTelegramBotAPI/blob/master/examples/webhook_examples/webhook_flask_echo_bot.py
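Along the lines of the linked example, the certificate can be generated with `openssl`. A sketch of that step from Python, assuming `openssl` is installed on the instance; the file names `polybot.key`/`polybot.crt` are hypothetical, and the certificate's common name is set to the instance's public IP since there is no domain name.

```python
import subprocess


def self_signed_cert_cmd(common_name, key_path="polybot.key", cert_path="polybot.crt", days=365):
    """Build an openssl command that creates a private key and a self-signed certificate in one step."""
    return [
        "openssl", "req", "-newkey", "rsa:2048", "-sha256", "-nodes",
        "-x509", "-days", str(days),
        "-keyout", key_path, "-out", cert_path,
        "-subj", f"/CN={common_name}",  # should match the host in the webhook URL
    ]


def generate_self_signed_cert(common_name, key_path="polybot.key", cert_path="polybot.crt"):
    """Run openssl; raises CalledProcessError if certificate generation fails."""
    subprocess.run(self_signed_cert_cmd(common_name, key_path, cert_path), check=True)
    return key_path, cert_path
```

The certificate is then passed to Telegram with `set_webhook(url=..., certificate=open(cert_path, 'r'))` (the sample `Bot.__init__` currently calls `set_webhook` without it) and to Flask with `app.run(host='0.0.0.0', port=8443, ssl_context=(cert_path, key_path))`.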
+
+
+## Submission
+
+You have to present your work to the course staff in a **15-minute demo**. Your presentation will be evaluated according to the following criteria, in order of priority:
+
+1. Showcasing a live, working demo of your work, both locally and in the cloud.
+2. Demonstrating deep understanding of the system.
+3. Applying best practices and clean work.
+4. Successful integration of a new feature, idea, or extension. Be creative!
+
+## Good luck
diff --git a/docker_project/docker-compose.yaml b/docker_project/docker-compose.yaml
diff --git a/docker_project/polybot/app.py b/docker_project/polybot/app.py
@@ -0,0 +1,27 @@
+import flask
+from flask import request
+import os
+from bot import ObjectDetectionBot
+
+app = flask.Flask(__name__)
+
+TELEGRAM_TOKEN = os.environ['TELEGRAM_TOKEN']
+TELEGRAM_APP_URL = os.environ['TELEGRAM_APP_URL']
+
+
+@app.route('/', methods=['GET'])
+def index():
+    return 'Ok'
+
+
+@app.route(f'/{TELEGRAM_TOKEN}/', methods=['POST'])
+def webhook():
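+    req = request.get_json()
+    # Updates such as edited messages carry no 'message' key; acknowledge and skip them
+    if 'message' in req:
+        bot.handle_message(req['message'])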
+    return 'Ok'
+
+
+if __name__ == "__main__":
+    bot = ObjectDetectionBot(TELEGRAM_TOKEN, TELEGRAM_APP_URL)
+
+    app.run(host='0.0.0.0', port=8443)
diff --git a/docker_project/polybot/bot.py b/docker_project/polybot/bot.py
@@ -0,0 +1,77 @@
+import telebot
+from loguru import logger
+import os
+import time
+from telebot.types import InputFile
+
+
+class Bot:
+
+    def __init__(self, token, telegram_chat_url):
+        # create a new instance of the TeleBot class.
+        # all communication with Telegram servers are done using self.telegram_bot_client
+        self.telegram_bot_client = telebot.TeleBot(token)
+
+        # remove any existing webhooks configured in Telegram servers
+        self.telegram_bot_client.remove_webhook()
+        time.sleep(0.5)
+
+        # set the webhook URL
+        self.telegram_bot_client.set_webhook(url=f'{telegram_chat_url}/{token}/', timeout=60)
+
+        logger.info(f'Telegram Bot information\n\n{self.telegram_bot_client.get_me()}')
+
+    def send_text(self, chat_id, text):
+        self.telegram_bot_client.send_message(chat_id, text)
+
+    def send_text_with_quote(self, chat_id, text, quoted_msg_id):
+        self.telegram_bot_client.send_message(chat_id, text, reply_to_message_id=quoted_msg_id)
+
+    def is_current_msg_photo(self, msg):
+        return 'photo' in msg
+
+    def download_user_photo(self, msg):
+        """
+        Downloads the photo sent to the bot into the directory given by its Telegram
+        file path (e.g. `photos/`), creating the directory if needed.
+        :return: the local path of the downloaded photo
+        """
+        if not self.is_current_msg_photo(msg):
+            raise RuntimeError("Message content of type 'photo' expected")
+
+        # Telegram sends several sizes of the same photo; in practice the last one is the largest
+        file_info = self.telegram_bot_client.get_file(msg['photo'][-1]['file_id'])
+        data = self.telegram_bot_client.download_file(file_info.file_path)
+        folder_name = file_info.file_path.split('/')[0]
+
+        os.makedirs(folder_name, exist_ok=True)
+
+        with open(file_info.file_path, 'wb') as photo:
+            photo.write(data)
+
+        return file_info.file_path
+
+    def send_photo(self, chat_id, img_path):
+        if not os.path.exists(img_path):
+            raise RuntimeError("Image path doesn't exist")
+
+        self.telegram_bot_client.send_photo(
+            chat_id,
+            InputFile(img_path)
+        )
+
+    def handle_message(self, msg):
+        """Bot Main message handler"""
+        logger.info(f'Incoming message: {msg}')
+        self.send_text(msg['chat']['id'], f'Your original message: {msg["text"]}')
+
+
+class ObjectDetectionBot(Bot):
+    def handle_message(self, msg):
+        logger.info(f'Incoming message: {msg}')
+
+        if self.is_current_msg_photo(msg):
+            photo_path = self.download_user_photo(msg)
+
+            # TODO upload the photo to S3
+            # TODO send a request to the `yolo5` service for prediction
+            # TODO send results to the Telegram end-user
diff --git a/docker_project/polybot/requirements.txt b/docker_project/polybot/requirements.txt
@@ -0,0 +1,5 @@
+pyTelegramBotAPI>=4.12.0
+loguru>=0.7.0
+requests>=2.31.0
+flask>=2.3.2
+matplotlib
diff --git a/docker_project/yolo5/Dockerfile b/docker_project/yolo5/Dockerfile
@@ -0,0 +1,10 @@
+FROM ultralytics/yolov5:latest-cpu
+WORKDIR /usr/src/app
+RUN pip install --upgrade pip
+COPY requirements.txt .
+RUN pip install -r requirements.txt
+RUN curl -L https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt -o yolov5s.pt
+
+COPY . .
+
+CMD ["python3", "app.py"]
diff --git a/docker_project/yolo5/app.py b/docker_project/yolo5/app.py
@@ -0,0 +1,83 @@
+import time
+from pathlib import Path
+from flask import Flask, request
+from detect import run
+import uuid
+import yaml
+from loguru import logger
+import os
+
+images_bucket = os.environ['BUCKET_NAME']
+
+with open("data/coco128.yaml", "r") as stream:
+    names = yaml.safe_load(stream)['names']
+
+app = Flask(__name__)
+
+@app.route('/predict', methods=['POST'])
+def predict():
+    # Generates a UUID for this current prediction HTTP request. This id can be used as a reference in logs to identify and track individual prediction requests.
+    prediction_id = str(uuid.uuid4())
+
+    logger.info(f'prediction: {prediction_id}. start processing')
+
+    # Receives a URL parameter representing the image to download from S3
+    img_name = request.args.get('imgName')
+
+    # TODO download img_name from S3, store the local image path in original_img_path
+    #  The bucket name should be provided as an env var BUCKET_NAME.
+    original_img_path = ...
+
+    logger.info(f'prediction: {prediction_id}/{original_img_path}. Download img completed')
+
+    # Predicts the objects in the image
+    run(
+        weights='yolov5s.pt',
+        data='data/coco128.yaml',
+        source=original_img_path,
+        project='static/data',
+        name=prediction_id,
+        save_txt=True
+    )
+
+    logger.info(f'prediction: {prediction_id}/{original_img_path}. done')
+
+    # This is the path for the predicted image with labels
+    # The predicted image typically includes bounding boxes drawn around the detected objects, along with class labels and possibly confidence scores.
+    # detect.run() saves its outputs using only the image file name (not its directory), so build the paths from it
+    original_img_name = Path(original_img_path).name
+    predicted_img_path = Path(f'static/data/{prediction_id}/{original_img_name}')
+
+    # TODO Uploads the predicted image (predicted_img_path) to S3 (be careful not to override the original image).
+
+    # Parse prediction labels and create a summary
+    pred_summary_path = Path(f'static/data/{prediction_id}/labels/{Path(original_img_path).stem}.txt')
+    if pred_summary_path.exists():
+        with open(pred_summary_path) as f:
+            labels = f.read().splitlines()
+            labels = [line.split(' ') for line in labels]
+            labels = [{
+                'class': names[int(l[0])],
+                'cx': float(l[1]),
+                'cy': float(l[2]),
+                'width': float(l[3]),
+                'height': float(l[4]),
+            } for l in labels]
+
+        logger.info(f'prediction: {prediction_id}/{original_img_path}. prediction summary:\n\n{labels}')
+
+        prediction_summary = {
+            'prediction_id': prediction_id,
+            'original_img_path': str(original_img_path),
+            'predicted_img_path': str(predicted_img_path),  # Path objects are not JSON serializable
+            'labels': labels,
+            'time': time.time()
+        }
+
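+        # TODO store the prediction_summary in MongoDB
+        #  Note: pymongo's insert_one() adds an ObjectId '_id' to the dict in place, which Flask cannot serialize to JSON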
+
+        return prediction_summary
+    else:
+        return f'prediction: {prediction_id}/{original_img_path}. prediction result not found', 404
+
+
+if __name__ == "__main__":
+    app.run(host='0.0.0.0', port=8081)