We want to find the 3D positions of objects in the camera's view.
There are 6D pose estimation methods, such as CosyPose, but training can be heavy and inference can be slow (~0.3 s according to the paper).
For the competition tasks, we have a few simplifications we can work with:
- Objects are typically flat, so they can reasonably be assumed to lie on a 2D plane
- The dimensions of each object will be provided
Some ideas:
1. Using the corners of YOLOv4 bounding boxes to estimate pose. This will be accurate only if the object is perfectly parallel to the image plane, axis-aligned, and completely within the image, and if distortion is ignored (so it is likely not accurate in most cases).
2. Training a CNN to predict the 3D position of the object given the image and bounding box information as input. This is difficult because it requires the simulation to match the actual camera parameters (which will be different for each camera on the robot).
3. Training a CNN to find the corners of a bounding box aligned to the object's plane. Then, assuming we can resolve the ambiguity of the object's rotation relative to this bounding box (there should only be 2 choices for flat objects), solvePnP can be used to compute the object's 6D pose (this is similar to the method used to determine the 6D pose of markers). The benefit is that the camera-related math is decoupled from the bounding box regression.
4. Thresholding the object out of the image after cropping it with the bounding box, then using cv::minAreaRect to create a rotated bounding box, then using solvePnP as in 3.
Idea 1 can be implemented quickly but will likely not work very well. Idea 2 seems too reliant on hardware at this stage. Idea 3 would require some modification to the bounding box generator but seems doable. Idea 4 would be more accurate than 1 but less accurate than 3, and the thresholding may be difficult to achieve in practice.
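For comparison, idea 1 reduces to simple pinhole-camera arithmetic. The sketch below assumes an undistorted image, an object parallel to the image plane and axis-aligned, and known intrinsics `fx`, `fy`, `cx`, `cy`. The function name `bbox_to_position` and the bounding box layout `(x_min, y_min, x_max, y_max)` are assumptions made for illustration, not part of any existing code.

```python
def bbox_to_position(bbox, object_width, fx, fy, cx, cy):
    """Estimate the object's 3D position in the camera frame from its bounding box.

    bbox: (x_min, y_min, x_max, y_max) in pixels.
    object_width: known real-world width of the object (result uses the same unit).
    fx, fy, cx, cy: pinhole intrinsics in pixels.
    """
    x_min, y_min, x_max, y_max = bbox
    pixel_width = x_max - x_min
    if pixel_width <= 0:
        raise ValueError("bounding box must have positive width")

    # Similar triangles: pixel_width / fx == object_width / z
    z = fx * object_width / pixel_width

    # Back-project the box centre to depth z.
    u = (x_min + x_max) / 2.0
    v = (y_min + y_max) / 2.0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z
```

Any tilt of the object shortens its apparent width and inflates the depth estimate, which is one concrete way the accuracy caveats listed for idea 1 show up.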