
Question about L2 computation #47

Open
wljungbergh opened this issue Feb 26, 2024 · 2 comments
wljungbergh commented Feb 26, 2024

Hi, and thank you for your work.

When reviewing your evaluation code, I found that the L2 displacement error (here) is computed as the average displacement error up to and including a given timestep. This differs from how previous works (e.g., UniAD and ST-P3) have defined the metric: they instead report the L2 norm at that particular timestep (see here and here).

I might have misunderstood your code, and if so, please let me know. If not, could you provide the numbers using the metric definition from ST-P3 and UniAD? That would make them more easily comparable.

Can you shed some light on this? Which of the two definitions is considered correct? (It might very well be that UniAD and ST-P3 have defined the metric incorrectly.)
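For concreteness, here is a minimal sketch of the two definitions as I understand them (the array name and example values are mine, purely illustrative):

import numpy as np

# Hypothetical Euclidean distances between the planned and ground-truth
# positions at 0.5s, 1.0s, ..., 3.0s (illustrative values only).
l2_per_step = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 1.2])

t = 3  # index of the 2.0s horizon, as an example

# Definition 1 (what your code appears to use): the average displacement
# error over all timesteps up to and including t.
l2_avg = l2_per_step[: t + 1].mean()  # 0.5

# Definition 2 (the pointwise variant): the displacement error at
# timestep t itself.
l2_at_t = l2_per_step[t]  # 0.8

print(l2_avg, l2_at_t)

For a trajectory whose error grows with the horizon (the usual case), the first definition is systematically lower, so the two definitions give noticeably different numbers at the same horizon.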

Thanks,

rb93dett (Collaborator) commented

Please refer to this issue.

wljungbergh (Author) commented

Thanks a lot for the clarification. I don't know how I missed that issue... sorry about that. I now see that you define the metric similarly to ST-P3.

However, upon digging into UniAD's code, I found that they do not conform to the definition from ST-P3, which they have acknowledged here.

from prettytable import PrettyTable

# Collect the computed planning metrics and format them as a table,
# one column per future timestep.
planning_results_computed = results["planning_results_computed"]
planning_tab = PrettyTable()
planning_tab.field_names = [
    "metrics",
    "0.5s",
    "1.0s",
    "1.5s",
    "2.0s",
    "2.5s",
    "3.0s",
]
for key in planning_results_computed.keys():
    value = planning_results_computed[key]
    row_value = [key]
    for i in range(len(value)):
        # Note: value[i] is used directly, i.e. the pointwise L2 at step i.
        row_value.append("%.4f" % float(value[i]))
    planning_tab.add_row(row_value)

Here, planning_results_computed holds the results from a single PlanningMetric.compute() call (with n_future=6), meaning that they compute the L2 distance as the pointwise norm at each timestep rather than the mean of the norms up to that timestep.

Because of this, the comparison between your method and UniAD is misleading: VAD's numbers use the more lenient (averaged) metric definition, while UniAD's numbers in the same table are computed under a different definition.

It would reduce the confusion if you also listed their performance under your (and ST-P3's original) metric definition.

Here are their displacement values when using your (and ST-P3) metric definition:

| Method | L2 (m) 1s | L2 (m) 2s | L2 (m) 3s |
| --- | --- | --- | --- |
| ST-P3 | 1.33 | 2.11 | 2.90 |
| UniAD (their metric) | 0.48 | 0.96 | 1.65 |
| UniAD (your metric) | 0.42 | 0.64 | 0.91 |
| VAD-Tiny | 0.46 | 0.76 | 1.12 |
| VAD-Base | 0.41 | 0.70 | 1.05 |

I will post these results on their GitHub as well, in case they want to update their numbers (or show them in conjunction).

FYI, to comply with your metric definition, we simply changed the inner loop of the code above to:

for i in range(len(value)):
    # Average the pointwise L2 norms up to and including timestep i
    # (assumes value supports slicing and .mean(), e.g. a NumPy array).
    row_value.append("%.4f" % float(value[: i + 1].mean()))
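For anyone who wants to reproduce the conversion end-to-end, here is a self-contained version of that averaging (a sketch: the per-timestep values below are made up, and I assume value is a 1-D NumPy array of pointwise L2 norms):

import numpy as np

# Hypothetical pointwise L2 norms at 0.5s, 1.0s, ..., 3.0s (made-up values).
value = np.array([0.10, 0.25, 0.48, 0.70, 0.96, 1.30])

row_value = []
for i in range(len(value)):
    # Cumulative average up to and including timestep i, i.e. the
    # VAD / ST-P3 metric definition.
    row_value.append("%.4f" % float(value[: i + 1].mean()))

print(row_value)

If you prefer a vectorized form, np.cumsum(value) / np.arange(1, len(value) + 1) gives the same cumulative means in one line.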

PS. Please let us know if you think we've missed something and wrongly computed UniAD's performance with your metric.
