How to subset ROIs by size? #676

km4htc · 2021-01-25T06:49:11Z

km4htc
Jan 25, 2021

I've been taking photos on two different, but overall comparable cameras—both canon DSLRs, but one's had the IR filter removed—and so I've trained two classifiers for each. The classifiers were trained on the same set of paired photos, but the one for the "IR" photos runs wayyyyyy faster when classifying. (Probably an important detail: the photos aren't equal sizes—the IR photos are smaller which I'm sure is part of the explanation, but where it might take ~30s to classify an IR image it's more like 5-10 min for a standard one). It's a bit of an apples and oranges comparison (different fruits but fruits nonetheless), but I'm wondering what would cause one to be so much more efficient and if there's a way to optimize the slower classifier. Would re-training the slow classifier on additional images help at all? One part curious, one part getting impatient waiting for these to finish.

CORRECTION/UPDATE: I realized that it's not actually the classification step that's running slow, but rather some downstream object finding that's much slower for the standard images... apologies for not testing this in a more specific way before posting.

It seems that plantcv.roi_objects() is the bottleneck in the classifying workflow I'm using because there are so many small "noisy" objects in every image. I'm curious if there's a way to subset the output of pcv.find_objects() by the size of the contours so that you can simply keep the N largest contours (likely just your objects of interest).

In my photos I have 5 plants from left to right, though because the camera depth and angle might change photo to photo, or because some photos only have 4 or fewer plants, clustering and splitting sometimes does some weird things. I've found ways to work around it but oftentimes that means finding objects and re-finding them to hone in on exactly where the plants are for cropping and setting column numbers for clustering which ends up taking more time than I think it should. I think I could save a lot of time by finding objects once, then subsetting to the largest N (based on distribution of sizes where the plant objects are far and away larger than any noisy spots). Any ideas?

Answered by km4htc

Feb 4, 2021

Looking through the source code for roi_objects gave me the tools to figure out how to subset ROIs by size; it's cut down my post-classification time quite a bit by being able to skip some intermediate steps that I no longer needed. It's working well for me because the objects/contours I want are close to an order of magnitude larger than any unwanted objects/contours so I can pretty safely set a contour length threshold and have it work across all my images. The code below has also made it a lot more reliable at discerning between images that may have anywhere from 0-5 plants in the image by counting the objects kept after size subsetting. Anyway, here's the fix! (maybe useful to incorpo…

View full answer

HaleySchuhl · 2021-01-26T17:38:58Z

HaleySchuhl
Jan 26, 2021
Collaborator

Hi @km4htc thanks for opening a discussion! Are you using the naive bayes classifier to segment your images? That function is definitely one of the most computationally intensive since it makes probability curves for each of your classes and tests pixel by pixel for their probability of belonging to each class. Larger training sets might speed up analysis, but mostly I only recommend collecting more training data when results are systematically confusing the classes.

It is true that pcv.roi_objects can be a very slow step, especially in noisy images, since contours are looped over and tested (depending on the roi_type). Right now we do have the functionality to keep just the largest contour identified within an ROI, but it would be a bit more complicated to allow users to keep the N largest contours within an ROI. Using pcv.fill to clean up noise before pcv.find_objects can help significantly decrease the number of computations taking placing while filtering objects.

Regarding separating the individual plants with inconsistent layouts in your photos, I would suggest looking at the new spatial clustering, since it's true that other methods rely on consistent positioning of plants across a dataset. Please let us know how this works out for you and if you have any further questions!

1 reply

km4htc Feb 3, 2021
Author

Hey Haley thanks for the reply! I've been using a naive bayes classifier to create a mask (of all 5 plants in the photo) and then use cluster_contours to split them out. It's working really well!... there's just a few instances every hundred or so photos where the clustering goes wrong because the position of the plants might change slightly from day to day as I load them into the rig I've made to try to keep photos as consistent as possible.

The spatial clustering looks really cool. I tried running it when I saw your reply and ran into some issues. I just came back to it again, but seem to be hitting the same problem. When I run spatial clustering in a Jupiter notebook, it seems to be executing for a minute or so but nothing plots, and if I try to plot the output I get "pcv is not defined" so it seems to be restarting the kernel for some reason (other objects I had saved before then also get lost). Not sure if this is something specific to Jupiter notebooks in this case.

Re: the question about N largest contours (feel free to totally ignore this because I'm pretty sure this is just me not entirely understanding how to subset an array of contours rather than a pcv specific problem)—is it possible to isolate a single contour + hierarchy from the output of find_objects? I think I've figured out how to iterate through the output and identify the largest ones but run into problems when I try to select a specific contour and apply it back to the original image. If I were able to do that then, at least for my specific use case, I think I could hard code a pixel area threshold to get just the contours of the plants, ditch the noise, and not have to worry about where specifically the plants are in the photo (or if I happen to have just 4 plants instead of 5). I'll try looking through the source code for roe_objects set to look for largest and see if there's a solution already thought up by you all in there :)

km4htc · 2021-02-04T23:42:17Z

km4htc
Feb 4, 2021
Author

Looking through the source code for roi_objects gave me the tools to figure out how to subset ROIs by size; it's cut down my post-classification time quite a bit by being able to skip some intermediate steps that I no longer needed. It's working well for me because the objects/contours I want are close to an order of magnitude larger than any unwanted objects/contours so I can pretty safely set a contour length threshold and have it work across all my images. The code below has also made it a lot more reliable at discerning between images that may have anywhere from 0-5 plants in the image by counting the objects kept after size subsetting. Anyway, here's the fix! (maybe useful to incorporate something like it in either find_objects or roi_objects in the future?)

`from plantcv import plantcv as pcv
import numpy as np
import cv2

Read image, classify, and create mask

img, path, filename = pcv.readimage(filename = args.image)
classified = pcv.naive_bayes_classifier(rgb_img=img, pdf_file=args.classifier)
mask = pcv.visualize.colorize_masks(masks=[classified['plant'], classified['background']],
colors=['white', 'black'])
mask = pcv.rgb2gray(mask)
mask = pcv.median_blur(gray_img = mask, ksize = 10)

Keep contours larger than arg.size pixels and make a new mask with just these objects

obj_contour, obj_hierarchy = pcv.find_objects(img=img, mask=mask)
new_mask = np.zeros(np.shape(img)[:2], dtype=np.uint8)
for c, cnt in enumerate(obj_contour):
if len(obj_contour[c]) > int(args.size):
cv2.fillPoly(new_mask, [np.vstack(obj_contour[c])], (255))

Revive child contours by creating a new mask of pixels shared between original mask and new_mask, then re-find objects

mask = pcv.logical_and(mask, new_mask)
obj_contour, obj_hierarchy = pcv.find_objects(img=img, mask=mask)

Count the number of objects/contours kept for clustering + splitting downstream

Count external contours only (some parent/children may both be kept after subsetting by size)

cnts, hier = cv2.findContours(new_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]
n = len(cnts)

Crop to plants only for more reliable clustering

obj_combined, mask = pcv.object_composition(mask, obj_contour, obj_hierarchy)

Find the upper, lower, left, and right bounds of the combined object for cropping

all_x = []
for o, obj in enumerate(obj_combined):
all_x.append(obj_combined[o][0][0])
all_y = []
for o, obj in enumerate(obj_combined):
all_y.append(obj_combined[o][0][1])

X = max(all_x)
x = min(all_x)
Y = max(all_y)
y = min(all_y)
cropped_img = pcv.crop(img, x=x, y=y, h=(Y-y), w=(X-x))
cropped_mask = pcv.crop(mask, x=x, y=y, h=(Y-y), w=(X-x))

Cluster img with ncol set to number of objects kept

(Assumes that you have a single row...)

clusters_i, contours, hierarchies = pcv.cluster_contours(img=cropped_img,
roi_objects=obj_contour,
roi_obj_hierarchy=obj_hierarchy,
nrow=1,
ncol=n)
clustered_img = pcv.visualize.clustered_contours(cropped_img, clusters_i, contours, hierarchies, nrow=1, ncol=n)`

1 reply

nfahlgren Feb 5, 2021
Maintainer

Awesome, thanks @km4htc, we will definitely take a look!

It's not quite the same, but possibly pcv.fill() might help? Given a binary mask it removes objects with a number of pixels less than the threshold size.

One thing you mentioned is that the true plant contours are an order of magnitude larger than the background noise. One issue with pcv.fill is that you have to pick the threshold size by guess and check. There's probably always a need for that kind of setting, but it would be interesting to consider how to have an "auto" setting where it could look at the distribution of sizes and come up with a cutoff automatically. Maybe given a hint from the user on the expected number of objects or something.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to subset ROIs by size? #676

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments 2 replies

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

How to subset ROIs by size? #676

km4htc Jan 25, 2021

Replies: 2 comments · 2 replies

HaleySchuhl Jan 26, 2021 Collaborator

km4htc Feb 3, 2021 Author

km4htc Feb 4, 2021 Author

Read image, classify, and create mask

Keep contours larger than arg.size pixels and make a new mask with just these objects

Revive child contours by creating a new mask of pixels shared between original mask and new_mask, then re-find objects

Count the number of objects/contours kept for clustering + splitting downstream

Count external contours only (some parent/children may both be kept after subsetting by size)

Crop to plants only for more reliable clustering

Find the upper, lower, left, and right bounds of the combined object for cropping

Cluster img with ncol set to number of objects kept

(Assumes that you have a single row...)

nfahlgren Feb 5, 2021 Maintainer

km4htc
Jan 25, 2021

Replies: 2 comments 2 replies

HaleySchuhl
Jan 26, 2021
Collaborator

km4htc Feb 3, 2021
Author

km4htc
Feb 4, 2021
Author

nfahlgren Feb 5, 2021
Maintainer