
Hackathon additions #99

Merged: 11 commits, Jan 22, 2025
File renamed without changes.
12 changes: 12 additions & 0 deletions example_pipeline/colon.yaml
@@ -0,0 +1,12 @@


# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: /colon_data/polyps_small/images/train
val: /colon_data/polyps_small/images/valid
test: /colon_data/polyps_small/images/valid # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794
Comment on lines +5 to +6
Collaborator comment:

Shouldn't test and validation be different splits?

If this is intended, maybe it's nice to add a comment explaining why.


# number of classes
nc: 1

# class names
names: [ 'polyp']
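For reference, the labels this config points at appear to follow the plain-text YOLO convention (an assumption based on the conversion notebook in this PR): one `.txt` file per image, one line per object, `class x_center y_center width height` with coordinates normalised to [0, 1]. A minimal sketch of reading one such line (the helper name is ours, not part of the PR):

```python
# Hypothetical helper: parse one YOLO-format label line, as written by
# the conversion notebook ("0 {x_center} {y_center} {width} {height}").
def parse_yolo_label(line):
    fields = line.split()
    cls = int(fields[0])  # class index; this dataset only uses 0 ('polyp')
    x_center, y_center, width, height = (float(v) for v in fields[1:5])
    return cls, x_center, y_center, width, height

example = "0 0.30505415 0.56250000 0.09025271 0.12053571"
print(parse_yolo_label(example))
```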
228 changes: 228 additions & 0 deletions example_pipeline/find_bounding_box.ipynb
Collaborator comment:

I didn't dive in too deep, but I found this notebook a bit confusing at a glance...

The function find_connected_components seems to be defined three times? It is also not clear at first glance what these connected components are.

If the notebook is converting segmentation masks to bounding boxes, I think the function names and docstrings/comments should state this more clearly.

Also, I think most of the recommendations for the show_bbox notebook would apply here too (see below)

@@ -0,0 +1,228 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of items: 2\n",
"Item 1: x_min=0, y_min=0, x_max=100, y_max=105\n",
"Item 2: x_min=144, y_min=225, x_max=193, y_max=278\n"
]
}
],
"source": [
"import cv2\n",
"import numpy as np\n",
"\n",
"def find_connected_components(image_path):\n",
" # Read the image\n",
" image = cv2.imread(image_path)\n",
" \n",
" # Convert the image to grayscale\n",
" gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)\n",
" \n",
" # Threshold the image to create a binary image\n",
" _, binary = cv2.threshold(gray, 1, 255, cv2.THRESH_BINARY)\n",
" \n",
" # Find connected components\n",
" num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)\n",
" \n",
" # Initialize a list to store bounding boxes\n",
" bounding_boxes = []\n",
" \n",
" for i in range(1, num_labels): # Skip the background label 0\n",
" x_min = stats[i, cv2.CC_STAT_LEFT]\n",
" y_min = stats[i, cv2.CC_STAT_TOP]\n",
" width = stats[i, cv2.CC_STAT_WIDTH]\n",
" height = stats[i, cv2.CC_STAT_HEIGHT]\n",
" x_max = x_min + width - 1\n",
" y_max = y_min + height - 1\n",
" \n",
" bounding_boxes.append((x_min, y_min, x_max, y_max))\n",
" \n",
" return num_labels - 1, bounding_boxes # Subtract 1 to exclude the background\n",
"\n",
"# Example usage\n",
"image_path = '../images/tst_seg_tst_seg500.png'\n",
"num_items, bounding_boxes = find_connected_components(image_path)\n",
"print(f\"Number of items: {num_items}\")\n",
"for i, (x_min, y_min, x_max, y_max) in enumerate(bounding_boxes):\n",
" print(f\"Item {i+1}: x_min={x_min}, y_min={y_min}, x_max={x_max}, y_max={y_max}\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"ename": "SyntaxError",
"evalue": "invalid syntax (2760322440.py, line 44)",
"output_type": "error",
"traceback": [
"\u001b[1;36m Cell \u001b[1;32mIn[3], line 44\u001b[1;36m\u001b[0m\n\u001b[1;33m num_items, boundinimport cv2\u001b[0m\n\u001b[1;37m ^\u001b[0m\n\u001b[1;31mSyntaxError\u001b[0m\u001b[1;31m:\u001b[0m invalid syntax\n"
]
}
],
"source": [
"import cv2\n",
"import numpy as np\n",
"\n",
"def find_connected_components(image_path, output_file):\n",
" # Read the image\n",
" image = cv2.imread(image_path)\n",
" \n",
" # Convert the image to grayscale\n",
" gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)\n",
" \n",
" # Threshold the image to create a binary image\n",
" _, binary = cv2.threshold(gray, 1, 255, cv2.THRESH_BINARY)\n",
" \n",
" # Find connected components\n",
" num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)\n",
" \n",
" # Initialize a list to store bounding boxes\n",
" bounding_boxes = []\n",
" \n",
" for i in range(1, num_labels): # Skip the background label 0\n",
" x_min = stats[i, cv2.CC_STAT_LEFT]\n",
" y_min = stats[i, cv2.CC_STAT_TOP]\n",
" width = stats[i, cv2.CC_STAT_WIDTH]\n",
" height = stats[i, cv2.CC_STAT_HEIGHT]\n",
" x_max = x_min + width - 1\n",
" y_max = y_min + height - 1\n",
" \n",
" # Convert to center coordinates and dimensions\n",
" x_center = x_min + width / 2\n",
" y_center = y_min + height / 2\n",
" \n",
" bounding_boxes.append((x_center, y_center, width, height))\n",
" \n",
" # Write results to a text file\n",
" with open(output_file, 'w') as f:\n",
" for x_center, y_center, width, height in bounding_boxes:\n",
" f.write(f\"0 {x_center:.8f} {y_center:.8f} {width:.8f} {height:.8f}\\n\")\n",
" \n",
" return num_labels - 1, bounding_boxes # Subtract 1 to exclude the background\n",
"\n",
"# Example usage\n",
"image_path = '../images/tst_seg_tst_seg500.png'\n",
"output_file = '../labels/bounding_boxes.txt'\n",
"num_items, bounding_boxes = find_connected_components(image_path, output_file)\n",
"print(f\"Number of items: {num_items}\")\n",
"for i, (x_center, y_center, width, height) in enumerate(bounding_boxes):\n",
" print(f\"Item {i+1}: x_center={x_center}, y_center={y_center}, width={width}, height={height}\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of items: 2\n",
"Item 1: x_center=0.09115523465703972, y_center=0.11830357142857142, width=0.18231046931407943, height=0.23660714285714285\n",
"Item 2: x_center=0.30505415162454874, y_center=0.5625, width=0.09025270758122744, height=0.12053571428571429\n"
]
}
],
"source": [
"import cv2\n",
"import numpy as np\n",
"import os\n",
"\n",
"def find_connected_components(image_path, output_dir):\n",
" # Read the image\n",
" image = cv2.imread(image_path)\n",
" \n",
" # Get image dimensions\n",
" image_height, image_width = image.shape[:2]\n",
" \n",
" # Convert the image to grayscale\n",
" gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)\n",
" \n",
" # Threshold the image to create a binary image\n",
" _, binary = cv2.threshold(gray, 1, 255, cv2.THRESH_BINARY)\n",
" \n",
" # Find connected components\n",
" num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)\n",
" \n",
" # Initialize a list to store bounding boxes\n",
" bounding_boxes = []\n",
" \n",
" for i in range(1, num_labels): # Skip the background label 0\n",
" x_min = stats[i, cv2.CC_STAT_LEFT]\n",
" y_min = stats[i, cv2.CC_STAT_TOP]\n",
" width = stats[i, cv2.CC_STAT_WIDTH]\n",
" height = stats[i, cv2.CC_STAT_HEIGHT]\n",
" \n",
" # Convert to center coordinates and dimensions\n",
" x_center = x_min + width / 2\n",
" y_center = y_min + height / 2\n",
" \n",
" # Convert to ratios\n",
" x_center /= image_width\n",
" y_center /= image_height\n",
" width /= image_width\n",
" height /= image_height\n",
" \n",
" bounding_boxes.append((x_center, y_center, width, height))\n",
" \n",
" # Create the output file path\n",
" base_name = os.path.basename(image_path)\n",
" output_file = os.path.join(output_dir, os.path.splitext(base_name)[0] + '.txt')\n",
" \n",
" # Write results to a text file\n",
" with open(output_file, 'w') as f:\n",
" for x_center, y_center, width, height in bounding_boxes:\n",
" f.write(f\"0 {x_center:.8f} {y_center:.8f} {width:.8f} {height:.8f}\\n\")\n",
" \n",
" return num_labels - 1, bounding_boxes # Subtract 1 to exclude the background\n",
"\n",
"# Example usage\n",
"image_path = '../images/tst_seg_tst_seg500.png'\n",
"output_dir = '../labels'\n",
"num_items, bounding_boxes = find_connected_components(image_path, output_dir)\n",
"print(f\"Number of items: {num_items}\")\n",
"for i, (x_center, y_center, width, height) in enumerate(bounding_boxes):\n",
" print(f\"Item {i+1}: x_center={x_center}, y_center={y_center}, width={width}, height={height}\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "pgta",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
62 changes: 62 additions & 0 deletions example_pipeline/instructions.md
@@ -0,0 +1,62 @@
This is an example pipeline on how to convert output from Blender Randomiser to COCO format labelling in preperation of fine tuning polyps detection using YOLOv7
Collaborator comment:

Suggested change
This is an example pipeline on how to convert output from Blender Randomiser to COCO format labelling in preperation of fine tuning polyps detection using YOLOv7
This is an example pipeline on how to convert output from the Blender Randomiser tool to COCO annotation format in preparation of fine tuning polyps detection using YOLOv7


# Data Preperation
Once the segmentation of polyps are obtained. Use find_bounding_box.ipynb to output the bounding box information as text files.
Collaborator comment:

Suggested change
Once the segmentation of polyps are obtained. Use find_bounding_box.ipynb to output the bounding box information as text files.
Once the segmentation of polyps is obtained, use `find_bounding_box.ipynb` to output the bounding box information as text files.


show_bbox.ipynb can be used to show if the correct bounding boxes have been converted.
Collaborator comment:

Suggested change
show_bbox.ipynb can be used to show if the correct bounding boxes have been converted.
The notebook `show_bbox.ipynb` can be used to show if the correct bounding boxes have been converted.


# IMPORTANT!
**<span style="color:red">
When Google Colab session finishes, ALL data is wiped, please save the models!
Collaborator comment:

Suggested change
When Google Colab session finishes, ALL data is wiped, please save the models!
When Google Colab session finishes, ALL data is wiped, please save the models locally!

</span>**
## To tune the model using custom data:
Collaborator comment:

Suggested change
## To tune the model using custom data:
## To fine-tune the model using custom data:

Collaborator comment:

Should this title maybe be "Fine-tuning a model locally"?

and then the next section "Fine-tuning a model via Google colab"?

I think that would be a bit more clear

- clone yolov7 git and install it https://github.com/WongKinYiu/yolov7; pip install -r requirements.txt
Collaborator comment:

Suggested change
- clone yolov7 git and install it https://github.com/WongKinYiu/yolov7; pip install -r requirements.txt
- git clone yolov7 and install its requirements:
```
git clone https://github.com/WongKinYiu/yolov7
pip install -r requirements.txt
```

- log into your wandb account (with your own API key - see the Weights and Biases section below); DO NOT SHARE it!
- run python training script

```python train.py --epochs 100 --device 0 --entity colon_coders --workers 8 --batch-size 32 --data /content/colon.yaml --img 512 512 --cfg /content/yolov7_training_config.yaml --weights '/content/yolov7_training.pt' --name yolov7-colon --hyp data/hyp.scratch.custom.yaml```

## Google colab instructions
- upload polyps.zip to google drive
- upload colon.yaml, yolov7_training.pt, yolov7_training_config.yaml to google drive
- open google colab and mount drive
- unzip polyps.zip
```python
import zipfile
with zipfile.ZipFile("/content/drive/MyDrive/polyps.zip", 'r') as zip_ref:
    zip_ref.extractall("/content/colon_data")
```
- Important: remove the cache files (otherwise the model will load the data from the cache files, which contain the incorrect file paths)
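A minimal sketch of the cache-removal step (the helper name and the `/content/colon_data` location are assumptions based on the unzip step above; adjust to wherever the data was extracted):

```python
import glob
import os

def remove_label_caches(root):
    """Delete cached dataset index files (*.cache) under root so the
    dataset is re-scanned with the current file paths."""
    removed = []
    for cache_file in glob.glob(os.path.join(root, "**", "*.cache"), recursive=True):
        os.remove(cache_file)
        removed.append(cache_file)
    return removed

# Example usage (path is an assumption):
# remove_label_caches("/content/colon_data")
```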
- could use to code in show_bbox.ipynb to see if data and bounding boxes has been loaded correctly
Collaborator comment:

Suggested change
- could use to code in show_bbox.ipynb to see if data and bounding boxes has been loaded correctly
- You can use the notebook `show_bbox.ipynb` to verify that the data and bounding boxes have been loaded correctly

- install yolov7
Collaborator comment:

Maybe this snippet can go above?

(the info is also repeated, so maybe you can just refer here to the steps above)

```python
!git clone https://github.com/WongKinYiu/yolov7
%cd yolov7
!pip install -r requirements.txt
```
- set up wandb
```python
!pip install wandb
import wandb
wandb.login()
```
- tune model: make sure colon.yaml has the correct file paths for data, also make sure --data, --cfg and --weights has the correct file paths
Collaborator comment:

Suggested change
- tune model: make sure colon.yaml has the correct file paths for data, also make sure --data, --cfg and --weights has the correct file paths
- fine-tune model: make sure `colon.yaml` has the correct file paths for data, also make sure `--data`, `--cfg` and `--weights` have the correct file paths

Could this bit be removed?


```python train.py --epochs 100 --device 0 --entity colon_coders --workers 8 --batch-size 32 --data /content/colon.yaml --img 512 512 --cfg /content/yolov7_training_config.yaml --weights '/content/yolov7_training.pt' --name yolov7-colon --hyp data/hyp.scratch.custom.yaml```
- When training is finished, model output is saved under yolov7/runs/train
Collaborator comment:

Suggested change
- When training is finished, model output is saved under yolov7/runs/train
- When training is finished, the model output is saved under yolov7/runs/train


## Run on test data
Collaborator comment:

Suggested change
## Run on test data
## Run evaluation on test data

If the evaluation produces metrics (mAP or equivalent), maybe it would be useful to mention them here.

```python
!python test.py --data /content/colon.yaml --img 512 --batch 32 --conf 0.001 --iou 0.65 --device 0 --weights runs/train/yolov7-colon2/weights/best.pt --name yolov7_colon_val
```

## Notes
- The data location is specified in the config file colon.yaml
- Training config is specified in yolov7_training_config.yaml

## Weights and Biases
Weights and Biases is a very good tool for tracking training progress. YOLOv7 uses it and it is very easy to set up:
- sign up for a free account at https://wandb.ai/site/research
- log into your account, go to the top right, and under your name select "user profile"
- go to the "danger zone" section and reveal your API key
- this key is then used to log in to wandb when prompted
- when you are finished, you can rotate the key or revoke it
- while training is in progress, go to the WandB website and click on the top left; you should see the project YOLO, which will show the current training session
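If you prefer not to paste the key at an interactive prompt, wandb also reads it from the `WANDB_API_KEY` environment variable. A sketch (the value below is obviously a placeholder, not a real key):

```python
# Alternative to the interactive wandb.login() prompt: export the key via
# an environment variable so it is never typed into a visible cell.
import os

os.environ["WANDB_API_KEY"] = "<your-api-key>"  # placeholder - never commit a real key
# wandb.login() will now pick the key up without prompting.
```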
99 changes: 99 additions & 0 deletions example_pipeline/show_bbox.ipynb
Collaborator comment:

This notebook looks very useful, but I think it could benefit from these additions:

  • adding a markdown cell at the top with a description of what the notebook does (maybe this can be taken from the instructions, and removed from the instructions to avoid duplication)
  • splitting it into cells (or at least making the "Example usage" bit into a new cell)
  • adding in-line comments
  • adding docstrings to the helper functions

Large diffs are not rendered by default.
