
Question about L2 computation #47

Open
wljungbergh opened this issue Feb 26, 2024 · 2 comments
wljungbergh commented Feb 26, 2024

Hi, and thank you for your work.

When reviewing your evaluation code, I found that the L2 displacement error (here) is computed as the average displacement error up to and including a given timestep. This differs from how previous works (e.g., UniAD and ST-P3) have defined the metric: they instead report the L2 norm at that particular timestep (see here and here).

I might have misunderstood your code, and if so, please let me know. If not, could you provide the numbers using the metric definition from ST-P3 and UniAD? That would make them more easily comparable.

Can you shed some light on this? Which of the two definitions is considered correct? (It might very well be that UniAD and ST-P3 have defined the metric incorrectly.)
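For concreteness, here is a minimal sketch of the two definitions as I understand them (the array name and example values are mine, purely illustrative):

import numpy as np

# Hypothetical Euclidean distances between the planned and ground-truth
# positions at 0.5s, 1.0s, ..., 3.0s (illustrative values only).
l2_per_step = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 1.2])

t = 3  # index of the 2.0s horizon, as an example

# Definition 1 (what your code appears to use): the average displacement
# error over all timesteps up to and including t.
l2_avg = l2_per_step[: t + 1].mean()  # 0.5

# Definition 2 (the pointwise variant): the displacement error at
# timestep t itself.
l2_at_t = l2_per_step[t]  # 0.8

print(l2_avg, l2_at_t)

For a trajectory whose error grows with the horizon (the usual case), the first definition is systematically lower, so the two definitions give noticeably different numbers at the same horizon.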

Thanks,

rb93dett (Collaborator) commented

Please refer to this issue.

wljungbergh (Author) commented

Thanks a lot for the clarification. I don't know how I missed that issue... sorry about that. I now see that you define the metric similarly to ST-P3.

However, upon digging into UniAD's code, I found that they do not conform to the definition from ST-P3, which they have acknowledged here.

from prettytable import PrettyTable

# Collect the computed planning metrics and format them as a table,
# one column per future timestep.
planning_results_computed = results["planning_results_computed"]
planning_tab = PrettyTable()
planning_tab.field_names = [
    "metrics",
    "0.5s",
    "1.0s",
    "1.5s",
    "2.0s",
    "2.5s",
    "3.0s",
]
for key in planning_results_computed.keys():
    value = planning_results_computed[key]
    row_value = [key]
    for i in range(len(value)):
        # Note: value[i] is used directly, i.e. the pointwise L2 at step i.
        row_value.append("%.4f" % float(value[i]))
    planning_tab.add_row(row_value)

Here, planning_results_computed holds the results from a single PlanningMetric.compute() call (with n_future=6), meaning that they compute the L2 distance as the pointwise norm at each timestep rather than the mean of the norms up to that timestep.

Because of this, the comparison between your method and UniAD is misleading: VAD's numbers use the more lenient (averaged) metric definition, while UniAD's numbers in the same table are computed under a different definition.

It would reduce the confusion if you also listed their performance under your (and ST-P3's original) metric definition.

Here are their displacement values when using your (and ST-P3) metric definition:

| Method | L2 (m) 1s | L2 (m) 2s | L2 (m) 3s |
| --- | --- | --- | --- |
| ST-P3 | 1.33 | 2.11 | 2.90 |
| UniAD (their metric) | 0.48 | 0.96 | 1.65 |
| UniAD (your metric) | 0.42 | 0.64 | 0.91 |
| VAD-Tiny | 0.46 | 0.76 | 1.12 |
| VAD-Base | 0.41 | 0.70 | 1.05 |

I will post these results on their GitHub as well, in case they want to update their numbers (or show them in conjunction).

FYI, to comply with your metric definition, we simply changed the inner loop of the code above to:

for i in range(len(value)):
    # Average the pointwise L2 norms up to and including timestep i
    # (assumes value supports slicing and .mean(), e.g. a NumPy array).
    row_value.append("%.4f" % float(value[: i + 1].mean()))
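For anyone who wants to reproduce the conversion end-to-end, here is a self-contained version of that averaging (a sketch: the per-timestep values below are made up, and I assume value is a 1-D NumPy array of pointwise L2 norms):

import numpy as np

# Hypothetical pointwise L2 norms at 0.5s, 1.0s, ..., 3.0s (made-up values).
value = np.array([0.10, 0.25, 0.48, 0.70, 0.96, 1.30])

row_value = []
for i in range(len(value)):
    # Cumulative average up to and including timestep i, i.e. the
    # VAD / ST-P3 metric definition.
    row_value.append("%.4f" % float(value[: i + 1].mean()))

print(row_value)

If you prefer a vectorized form, np.cumsum(value) / np.arange(1, len(value) + 1) gives the same cumulative means in one line.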

PS. Please let us know if you think we've missed something and wrongly computed UniAD's performance with your metric.
