-
I've been taking photos on two different but overall comparable cameras (both Canon DSLRs, though one has had its IR filter removed), and I've trained a classifier for each. The classifiers were trained on the same set of paired photos, but the one for the "IR" photos runs much faster when classifying. (Probably an important detail: the photos aren't equal sizes. The IR photos are smaller, which I'm sure is part of the explanation, but where it might take ~30 s to classify an IR image, it's more like 5-10 min for a standard one.) It's a bit of an apples-and-oranges comparison (different fruits, but fruits nonetheless), but I'm wondering what would cause one to be so much more efficient, and whether there's a way to optimize the slower classifier. Would re-training the slow classifier on additional images help at all? One part curiosity, one part impatience waiting for these to finish.

CORRECTION/UPDATE: I realized that it's not actually the classification step that's running slow, but rather some downstream object finding that's much slower for the standard images; apologies for not testing this more carefully before posting. It seems that plantcv.roi_objects() is the bottleneck in the classifying workflow I'm using, because there are so many small, noisy objects in every image. I'm curious whether there's a way to subset the output of pcv.find_objects() by contour size, so that you can simply keep the N largest contours (likely just your objects of interest).

In my photos I have five plants arranged left to right, but because the camera depth and angle can change from photo to photo, or because some photos have four or fewer plants, clustering and splitting sometimes does weird things. I've found ways to work around it, but that often means finding objects and then re-finding them to hone in on exactly where the plants are for cropping and for setting column numbers for clustering, which ends up taking more time than I think it should.

I think I could save a lot of time by finding objects once, then subsetting to the largest N (based on the distribution of sizes, since the plant objects are far and away larger than any noisy spots). Any ideas?
-
Hi @km4htc, thanks for opening a discussion! Are you using the naive Bayes classifier to segment your images? That function is definitely one of the most computationally intensive, since it builds probability curves for each of your classes and tests pixel by pixel for the probability of belonging to each class. Larger training sets might speed up analysis, but mostly I only recommend collecting more training data when results are systematically confusing the classes. Regarding separating the individual plants with inconsistent layouts in your photos, I would suggest looking at the new spatial clustering, since it's true that other methods rely on consistent positioning of plants across a dataset. Please let us know how this works out for you and if you have any further questions!
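To illustrate why the cost scales with image size, here's a toy NumPy sketch of pixel-by-pixel classification against per-class probability lookup tables. The tables here are made up for illustration; they just stand in for the PDFs the classifier fits from training data (and the real classifier works in a color space, not on a single grayscale channel):

```python
import numpy as np

# Toy stand-ins for per-class probability density functions: one probability
# per possible 8-bit intensity. (Made-up values purely for illustration.)
p_plant = np.linspace(0.0, 1.0, 256)   # brighter pixels look more "plant"-like
p_background = 1.0 - p_plant

def classify_pixels(gray_img):
    """Assign each pixel to whichever class has the higher probability."""
    plant_wins = p_plant[gray_img] > p_background[gray_img]
    return np.where(plant_wins, 255, 0).astype(np.uint8)
```

Every pixel gets a probability lookup and comparison per class, so a full-resolution DSLR frame does tens of millions of these tests, which is why the smaller IR images finish so much faster.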
-
Looking through the source code for roi_objects gave me the tools to figure out how to subset ROIs by size; it's cut down my post-classification time quite a bit by letting me skip some intermediate steps I no longer needed. It works well for me because the objects/contours I want are close to an order of magnitude larger than any unwanted objects/contours, so I can pretty safely set a contour length threshold that works across all my images. The code below has also made it a lot more reliable at discerning between images that may have anywhere from 0-5 plants, by counting the objects kept after size subsetting. Anyway, here's the fix! (Maybe useful to incorporate something like it into either find_objects or roi_objects in the future?)

```python
from plantcv import plantcv as pcv
import cv2

# Read image, classify, and create mask
img, path, filename = pcv.readimage(filename=args.image)

# Keep contours larger than args.size pixels and make a new mask with just these objects
obj_contour, obj_hierarchy = pcv.find_objects(img=img, mask=mask)

# Revive child contours by creating a new mask of pixels shared between the
# original mask and new_mask, then re-find objects
mask = pcv.logical_and(mask, new_mask)

# Count the number of objects/contours kept for clustering + splitting downstream.
# Count external contours only (some parents/children may both be kept after
# subsetting by size)
cnts, hier = cv2.findContours(new_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]

# Crop to plants only for more reliable clustering
obj_combined, mask = pcv.object_composition(mask, obj_contour, obj_hierarchy)

# Find the upper, lower, left, and right bounds of the combined object for cropping
all_x = []
X = max(all_x)

# Cluster img with ncol set to the number of objects kept
# (assumes that you have a single row...)
clusters_i, contours, hierarchies = pcv.cluster_contours(img=cropped_img,
```