Training Results for YOLO Lesion Detection #8
Great preliminary results! As you said, lesions are hard to see, so I am not surprised that some are missed. But keep in mind that this is a 2D model: if a lesion is not seen on one sagittal slice, it might be seen on the adjacent slice, which will still be useful for creating the 3D box that will be used to crop around the lesion and then run the segmentation model on a smaller 3D patch.
Here is a recap of the tests that I've tried and their results since my last update. The new validation method was used to compare results (see issue #11).
Tests
1- Ratio of unlabeled slices in the train set
I trained two nano models with the same parameters, but one model was trained on the full training dataset (∼55% unlabeled) and the other model was trained on a dataset containing a 25% unlabeled ratio.
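As a rough illustration of how such a subset could be built (the paths and the YOLO-style layout where unlabeled slices have no label file are assumptions, not the actual pre-processing code):

```python
import random
from pathlib import Path

# Sketch: keep all labeled slices and subsample the unlabeled ones so that they
# make up roughly 25% of the final training set.
random.seed(0)

images = sorted(Path("datasets/canproco/images/train").glob("*.png"))
labels_dir = Path("datasets/canproco/labels/train")

labeled = [p for p in images if (labels_dir / f"{p.stem}.txt").exists()]
unlabeled = [p for p in images if not (labels_dir / f"{p.stem}.txt").exists()]

# n_unlabeled / (n_labeled + n_unlabeled) = 0.25  ->  n_unlabeled = n_labeled / 3
n_keep = len(labeled) // 3
kept_unlabeled = random.sample(unlabeled, min(n_keep, len(unlabeled)))

train_subset = labeled + kept_unlabeled
print(f"{len(labeled)} labeled + {len(kept_unlabeled)} unlabeled slices "
      f"({len(kept_unlabeled) / len(train_subset):.0%} unlabeled)")
```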
Recall increased, but precision decreased. Since the precision is much higher than the recall, I chose to keep working with the 25% dataset.
2- Confidence threshold at inference
I compared different confidence thresholds using the nano model trained on the 25% unlabeled dataset.
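A minimal sketch of how such a comparison could be run with the ultralytics API (the weights path, image folder and threshold values are placeholders, not the ones used for these tests):

```python
from ultralytics import YOLO

# Load the trained nano model and run inference at several confidence thresholds.
model = YOLO("runs/detect/train/weights/best.pt")

for conf in (0.1, 0.2, 0.3, 0.5):
    results = model.predict(source="datasets/canproco/images/val", conf=conf, verbose=False)
    n_boxes = sum(len(r.boxes) for r in results)
    print(f"conf={conf}: {n_boxes} predicted boxes")
```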
These results seem to show that many lower-confidence boxes are correct, but since those are harder lesions to detect, the boxes might not be as precise, which is why the recall varies more with the 20% IoU threshold. And as expected, a lower confidence threshold reduces the precision.
3- Model depth
The ultralytics library has 5 different model depths available: nano (n), small (s), medium (m), large (l) and extra-large (x).
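A minimal sketch of training each depth with otherwise identical settings (the dataset path, epochs and image size are placeholders):

```python
from ultralytics import YOLO

# Train the five available model depths with the same data and settings.
for depth in ("n", "s", "m", "l", "x"):
    model = YOLO(f"yolov8{depth}.pt")  # pretrained weights for that depth
    model.train(data="dataset.yaml", epochs=100, imgsz=640, name=f"yolov8{depth}_lesions")
```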
The x model seems to perform the best, although the m model has similar results and is much faster for training and inference.
4- Hyperparameter sweeps
I used the integrated
Learning rate
I found that an initial learning rate of 0.09 and a final learning rate of 0.08 performed the best.
Class and box loss
It seems like a higher box loss relative to the class loss yielded the best results. The run with the highest recall had
Augmentation:
| | Recall (IoU 40%) | Precision (IoU 40%) | Recall (IoU 20%) | Precision (IoU 20%) |
|---|---|---|---|---|
| YOLOv8m best params | 36.9% | 42.7% | 46.7% | 54.4% |
| YOLOv8x | 37.2% | 39.8% | 47.4% | 51.3% |
It would be interesting to train an x model on the new set of params; not sure why I haven't done that yet.
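If that run happens, a minimal sketch of passing the swept learning rates to an ultralytics training call could look like this (epochs, image size and the loss gains are placeholders; lr0/lrf reflect the sweep result above, interpreting the "final learning rate of 0.08" as the `lrf` argument):

```python
from ultralytics import YOLO

model = YOLO("yolov8x.pt")
model.train(
    data="dataset.yaml",
    epochs=100,
    imgsz=640,
    lr0=0.09,    # initial learning rate from the sweep
    lrf=0.08,    # final learning rate factor from the sweep
    box=7.5,     # box loss gain (ultralytics default, shown as a placeholder)
    cls=0.5,     # class loss gain (ultralytics default, shown as a placeholder)
    mosaic=0.0,  # mosaic augmentation turned off, as in the earlier tests
)
```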
Here are a few images of the results obtained with the YOLOv8x model (red boxes are labels and blue boxes are predictions):
sub-mon211_ses-M0_PSIR: 7 TP, 2 FP, 1 FN
This is one of the images with the most false negatives. There seem to be many small lesions that weren't detected:
sub-tor035_ses-M0_PSIR: 5 TP, 1 FP, 11 FN
This is one of the images that had the biggest difference in TP, FP, FN between a 20% and 40% IoU threshold.
sub-van212_ses-M0_PSIR: 3 TP, 5 FP, 6 FN with 40% IoU and 5 TP, 2 FP, 4 FN with 20% IoU
Great summary of your results! Also, for the final model, if you have time, it could be nice to have a PR-curve (precision-recall curve) and also the PR-AUC (precision-recall area-under-the-curve) score.
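For reference, here is a minimal sketch of how such a curve could be computed once each predicted box has been matched against the labels at a chosen IoU threshold (the scores, TP flags and ground-truth count below are placeholders, not results from this issue):

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder inputs: per-box confidence scores, TP/FP flags, and number of GT boxes.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3])
is_tp = np.array([1, 1, 0, 1, 0, 1])
n_gt = 6

order = np.argsort(-scores)           # rank detections by decreasing confidence
tp_cum = np.cumsum(is_tp[order])
fp_cum = np.cumsum(1 - is_tp[order])

recall = tp_cum / n_gt
precision = tp_cum / (tp_cum + fp_cum)
pr_auc = np.trapz(precision, recall)  # simple trapezoidal estimate of the PR area

plt.plot(recall, precision, marker="o")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title(f"PR-AUC = {pr_auc:.3f}")
plt.savefig("pr_curve.png")
```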
In the sweep with many parameters, I had left-right flipping, translation, value (image intensity), rotation and scaling. I'm not so sure that I came to the correct conclusion when analyzing that sweep's results, though... It might be worth doing sweeps with fewer augmentation parameters at a time to be able to actually see the effect of each augmentation type. But then again, I did a sweep for just scale and rotation, and those results weren't conclusive either. Perhaps they don't have a significant effect.
I was able to train the YOLOv8 model on the Canproco database (version: bcd627ed4):
To track training progress, I used ClearML since it is easily integrated with the ultralytics yolo package.
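For reference, a minimal sketch of how the ClearML tracking can be set up alongside an ultralytics training run (when the clearml package is installed, ultralytics logs to it automatically; the project and run names below are placeholders, not the ones used here):

```python
from clearml import Task
from ultralytics import YOLO

# Explicitly initialize a ClearML task to control where the run is logged.
task = Task.init(project_name="canproco-lesion-detection", task_name="yolov8n_baseline")

model = YOLO("yolov8n.pt")
model.train(data="dataset.yaml", epochs=100, imgsz=640, mosaic=0.0)
```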
Scripts
The `yolo_training.py` script is used to train a new model and the `yolo_testing.py` script is used to evaluate the model's performance on the `test` set.
Results
Test 1 - Default
All default parameters were used and mosaic data augmentation was turned off.
Here are the metrics used to track the training process:
Here were the results:
Seeing these results, my first thought was that the contrast of the images needed to be enhanced to make the lesions more visible. So I tried adding a histogram equalization step for my next test.
Test 2 - With histogram equalization
For this test, when pre-processing the data, I used skimage's adaptive histogram equalization before saving each slice as a PNG.
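As a rough sketch of this step (the paths and the clip_limit value are placeholders, not necessarily the ones used in the actual pre-processing script):

```python
import numpy as np
from skimage import exposure, io

# Load a 2D sagittal slice, normalize to [0, 1], apply adaptive histogram
# equalization (CLAHE), and save the result as an 8-bit PNG.
slice_2d = io.imread("slices/example_slice.png", as_gray=True).astype(np.float32)
slice_2d = (slice_2d - slice_2d.min()) / (np.ptp(slice_2d) + 1e-8)

equalized = exposure.equalize_adapthist(slice_2d, clip_limit=0.03)  # floats in [0, 1]
io.imsave("slices_eq/example_slice.png", (equalized * 255).astype(np.uint8))
```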
Training parameters were kept the same.
Here is the training progress:
And here were the results:
Thoughts and next steps
For testing, the IoU (intersection over union) parameter was set to its default value (0.7), which I believe means that only predictions with an IoU above 0.7 were considered true positives. This might explain the discrepancy between the low metrics and visual results (although visually, other test batches did seem to have fewer correct detections than the one shown above).
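To make the effect of this threshold concrete, here is a small sketch of the box IoU computation (the boxes are made-up examples):

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A prediction that clearly overlaps a label can still fall below 0.7, so it would
# count as a false positive at that threshold but a true positive at a lower one.
label, prediction = (10, 10, 50, 50), (15, 15, 60, 60)
print(box_iou(label, prediction))  # ~0.51
```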
I can think of 2 main reasons why the model isn't performing as well as it could: the large proportion of empty slices in the training set, and the low contrast of the lesions in the images.
So for my next tests, I want to start by seeing how the IoU parameter influences metrics during testing.
Then, I want to try reducing the number of empty slices in my training set.
As for the contrast, the histogram equalization did seem to slightly improve results, but I'm not sure whether there's a better method to improve contrast.