3D Position Estimation #105

Open
kevinh42 opened this issue Oct 31, 2021 · 0 comments

We want to find the 3D positions of objects in the camera's view.

There are 6D pose estimation methods, such as CosyPose, but training can be heavy and inference can be slow (~0.3 s according to the paper).

For the competition tasks, we have a few simplifications we can work with:

  1. Objects are typically flat and so can be reasonably assumed to be on a 2D plane
  2. The dimensions of each object will be provided

Some ideas:

  1. Use the corners of the YOLOv4 bounding box to estimate pose. This is accurate only if the object is parallel to the image plane, axis-aligned, and entirely within the image, and if lens distortion is ignored (so it is likely inaccurate in most cases).
  2. Train a CNN to predict the 3D position of the object, given the image and bounding box information as input. This is difficult because it requires the simulation to match the actual camera parameters (which will be different for each camera on the robot).
  3. Train a CNN to find the corners of a bounding box aligned to the object's plane. Then, assuming we can resolve the ambiguity of the object's rotation relative to this bounding box (there should be only 2 choices for flat objects), solvePnP can be used to compute the object's 6D pose (this is similar to the method used to determine the 6D pose of markers). The benefit is that the camera-related math is decoupled from the bounding box regression.
  4. Crop the image to the bounding box, threshold out the object, then use cv::minAreaRect to create a rotated bounding box. Then use solvePnP as in 3.
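
To make idea 1 concrete, here is a minimal sketch of the geometry it relies on. It assumes an ideal pinhole camera with no distortion and an object parallel to the image plane, which are exactly the conditions listed in item 1. The names `Intrinsics` and `position_from_bbox` are hypothetical and the intrinsics would have to come from calibration; none of this is taken from the project's code.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (assumed known from calibration)."""
    fx: float
    fy: float
    cx: float
    cy: float


def position_from_bbox(bbox, object_width_m, object_height_m, K):
    """Estimate (X, Y, Z) of the object centre in the camera frame, in metres.

    bbox is (x_min, y_min, x_max, y_max) in pixels. Assumes the object is
    fronto-parallel, axis-aligned, fully visible, and that distortion is
    negligible. Outside those assumptions the result degrades quickly.
    """
    x_min, y_min, x_max, y_max = bbox
    w_px = x_max - x_min
    h_px = y_max - y_min
    if w_px <= 0 or h_px <= 0:
        raise ValueError("bounding box must have positive width and height")

    # Similar triangles: pixel_size = f * metric_size / Z, solved for Z.
    z_from_width = K.fx * object_width_m / w_px
    z_from_height = K.fy * object_height_m / h_px
    z = 0.5 * (z_from_width + z_from_height)  # average the two depth estimates

    # Back-project the bounding box centre at the estimated depth.
    u = 0.5 * (x_min + x_max)
    v = 0.5 * (y_min + y_max)
    x = (u - K.cx) * z / K.fx
    y = (v - K.cy) * z / K.fy
    return x, y, z
```

A large gap between the width-based and height-based depth would be one cheap signal that the fronto-parallel assumption does not hold for a given detection.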

Idea 1 can be implemented quickly but will likely not work well. Idea 2 seems too reliant on hardware at this stage. Idea 3 would require some modification to the bounding box generator but seems doable. Idea 4 would be more accurate than 1 but less accurate than 3, and the thresholding may be difficult to achieve in practice.
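
For ideas 3 and 4, the inputs to solvePnP would be the object's corners in its own frame and the matching image corners. The sketch below prepares those inputs under one reading of the "2 choices" ambiguity: the four image corners arrive in a consistent cyclic order, but it is unknown whether the first edge corresponds to the object's width or its height. The helper names are hypothetical, the corner ordering convention is an assumption, and the actual solvePnP call is left out.

```python
def planar_object_points(width_m, height_m):
    """Corners of a flat width x height object in its own frame.

    The object lies on the Z = 0 plane, centred at the origin. Corners are
    ordered cyclically: (-x, -y), (+x, -y), (+x, +y), (-x, +y).
    """
    hw = width_m / 2.0
    hh = height_m / 2.0
    return [(-hw, -hh, 0.0), (hw, -hh, 0.0), (hw, hh, 0.0), (-hw, hh, 0.0)]


def candidate_correspondences(image_corners):
    """Return the two candidate orderings of four image corners.

    image_corners must be four (u, v) points in a consistent cyclic order
    with the same winding as planar_object_points. The first candidate
    keeps the order as given; the second shifts it by one corner, which
    swaps which image edge is matched to the object's width. Shifts by two
    or three corners are 180-degree turns of these two and describe the
    same pose for a symmetric rectangle.
    """
    corners = list(image_corners)
    if len(corners) != 4:
        raise ValueError("expected exactly four image corners")
    return [corners, corners[1:] + corners[:1]]
```

Each candidate could then be paired with the object points and passed to solvePnP, keeping whichever gives the lower reprojection error. If the object is square, the two candidates are equivalent and no disambiguation is needed.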

kevinh42 self-assigned this Oct 31, 2021