DEPTH VALUE OF EACH PIXEL #261
The midas models output inverse depth maps (or images). So each pixel of the output corresponds to a value like:

pixel_value ≈ 1 / depth

However, the mapping is also only relative, it doesn't tell you the exact (absolute) depth. Aside from noise/errors, the true depth value is shifted/scaled compared to the result you get from the midas output after inverting, so more like:

true_depth ≈ A * (1 / midas_output) + B

Where A and B are an unknown scale and shift that vary from image to image.
According to #171 I believe the equation is:

true_depth = 1 / (A * midas_output + B)
Good point! I was thinking these are the same mathematically, but there is a difference, and having the shifting done before inverting makes more sense.
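As a minimal sketch (not from the MiDaS repo), here is what the two formulations above look like side by side; the values of A and B are hypothetical, just to show that the two orderings give different depths:

```python
import numpy as np

# The two formulations discussed above are NOT the same in general:
#   invert-then-correct:  depth = A * (1 / prediction) + B
#   correct-then-invert:  depth = 1 / (A * prediction + B)   (the #171 form)

prediction = np.array([2.0, 4.0, 8.0])  # raw (relative inverse-depth) output
A, B = 0.5, 0.1                         # hypothetical scale/shift values

depth_invert_first = A * (1.0 / prediction) + B
depth_correct_first = 1.0 / (A * prediction + B)

print(depth_invert_first)   # [0.35   0.225  0.1625]
print(depth_correct_first)  # [0.909  0.476  0.244]
```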
How are A and B calculated for a video? @JoshMSmith44
I believe MiDaS is a single-image method and therefore there is a different A and B for each frame in the video sequence.
But in MiDaS this is calculated by comparing the true depth with the predicted depth. What if we have completely new images and want to find metric depth?
In order to get the true depth using the above method you need to know at least two true depth pixel values for each relative depth image you correct (realistically you want many more). These could come from a sensor, a sparse structure-from-motion point cloud, etc. If you don't have access to true depth and you need metric depth, then you should look into metric depth estimation methods like ZoeDepth, Depth-Anything, and ZeroDepth.
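Here's a rough sketch of what that correction could look like, assuming the #171 form (1 / true_depth = A * prediction + B), so that A and B come from a least-squares line fit in inverse-depth space. The function names and sample values are illustrative, not part of the MiDaS codebase; since MiDaS is single-image, the fit would need to be repeated for every frame of a video:

```python
import numpy as np

def fit_scale_shift(pred_vals, true_depths):
    """Fit A, B from pixels with known metric depth (sensor, sparse SfM, etc.)."""
    pred_vals = np.asarray(pred_vals, dtype=np.float64)
    inv_true = 1.0 / np.asarray(true_depths, dtype=np.float64)
    # Solve [pred, 1] @ [A, B] = 1 / true_depth in the least-squares sense
    M = np.stack([pred_vals, np.ones_like(pred_vals)], axis=1)
    (A, B), *_ = np.linalg.lstsq(M, inv_true, rcond=None)
    return A, B

def to_metric_depth(prediction, A, B):
    # Apply the scale/shift before inverting, per the #171 form
    return 1.0 / (A * prediction + B)

# Hypothetical usage: two (ideally many more) known correspondences
A, B = fit_scale_shift(pred_vals=[8.0, 2.0], true_depths=[1.0, 4.0])
metric = to_metric_depth(np.array([2.0, 4.0, 8.0]), A, B)
```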
Hi, if I just use the MiDaS output, which you said is inverse depth, to train my model, and what I want is the relative depth for an image, am I doing something wrong?
Hi @heyoeyo, I want to know how a metric depth dataset (like DIML) and a relative depth dataset (like RedWeb) can be trained together. Does the metric depth dataset need to be converted to a relative depth dataset first? Can you help me? Thank you very much!!
One of the MiDaS papers describes how the data is processed for training. The explanation starts on page 5, under the section: Training on Diverse Data

There they describe several approaches they considered, which are later compared on plots (see page 7) showing that the combination of the 'ssitrim + reg' loss functions worked the best. These loss functions are both described on page 6 (equations 7 & 11). The explanation just above the 'ssitrim' loss is where they describe how different data sets are handled.

The basic idea is that they first run their model on an input image to get a raw prediction, which is then normalized (using equation 6 in the paper). They repeat the same normalization procedure for the ground truth, and then calculate the error as:

error = | normalized_prediction - normalized_ground_truth |

So due to the normalization step, both relative & metric depth data sources should be able to be processed/trained using the same procedure.
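A rough sketch (not the official training code) of the normalization described above: both the prediction and the ground truth are shifted by their median and scaled by their mean absolute deviation, which removes the unknown scale/shift and lets metric and relative datasets share the same loss:

```python
import numpy as np

def normalize(d, eps=1e-6):
    t = np.median(d)                  # shift: median of the (inverse) depth map
    s = np.mean(np.abs(d - t)) + eps  # scale: mean absolute deviation
    return (d - t) / s

def ssi_mae_loss(prediction, ground_truth):
    # Mean absolute error between the normalized maps. The paper's 'ssitrim'
    # variant additionally discards the largest residuals before averaging,
    # and invalid ground-truth pixels would be masked out (omitted here).
    residuals = np.abs(normalize(prediction) - normalize(ground_truth))
    return residuals.mean()
```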
@heyoeyo, thank you for your reply. I have another question about relative depth evaluation. Why should the output of the model (relative depth) be converted to metric depth and evaluated on metric depth datasets like NYU or KITTI, using metrics such as RMSE and AbsRel? Why not just use a relative depth dataset for evaluation?
I think it depends on what the evaluation is trying to show. Converting to metric depth would have the effect of more heavily weighting errors on scenes that have wider depth ranges. For example a 10% error on an indoor scene with elements that are only 10m away would be a 1m error, whereas a 10% error on an outdoor scene with objects 100m away would have a 10m error, and that might be something the authors want to prioritize (i.e. model accuracy across very large depth ranges).

It does seem strange to me that the MiDaS paper converted some results to metric depth for their experiments section though. Since it seems they just used a least squares fit to align the relative depth results with the metric ground truth (described on pg 7), it really feels like this just over-weights the performance of the model on outdoor scenes. It makes a lot more sense to do the evaluation directly in absolute depth for something like ZoeDepth, where the model is directly predicting the metric values and therefore those 1m vs 10m errors are actually relevant to the model's capability.
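For reference, a sketch of that kind of evaluation procedure: least-squares align the relative (inverse-depth) prediction to the metric ground truth, then score with standard metric-depth metrics. The function and variable names are illustrative, not the official MiDaS evaluation code:

```python
import numpy as np

def align_and_score(prediction, gt_depth):
    # Fit scale/shift in inverse-depth space, mapping prediction -> 1 / gt_depth
    gt_inv = 1.0 / gt_depth
    M = np.stack([prediction.ravel(), np.ones(prediction.size)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(M, gt_inv.ravel(), rcond=None)

    # Convert the aligned prediction back to depth and compute common metrics
    depth = 1.0 / np.clip(scale * prediction + shift, 1e-6, None)
    abs_rel = np.mean(np.abs(depth - gt_depth) / gt_depth)
    rmse = np.sqrt(np.mean((depth - gt_depth) ** 2))
    return abs_rel, rmse
```

Note how the RMSE term in this scheme is dominated by large-depth (outdoor) scenes, which is exactly the 1m vs 10m weighting effect described above.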
How can I access this information in MiDaS?