v4 vs v0 #187

talium0713 · 2021-11-20T05:40:40Z

Hi. I am trying to compare my proposed algorithms using 50M frames because of the high computational cost.
I found that the benchmark uses 'v0', which enables sticky_actions (with 25% probability the previous action is repeated).
However, most of the papers report their score with 'v4', so I wonder if there is a big difference in performance between v0 and v4.
If there is a benchmark in v4, it would be really appreciated if you could share it.
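
To make the difference concrete, here is a minimal sketch of what sticky actions do, as I understand them from Machado et al.: with probability 0.25 the environment ignores the new action and repeats the previous one. `StickyActions` and `EchoEnv` are hypothetical names I made up for illustration, not part of Dopamine or Gym, and the wrapper applies stickiness once per agent step, which simplifies what the emulator does per frame. As far as I know, the 'v4' IDs correspond to `repeat_prob=0.0`.

```python
import random


class EchoEnv:
    """Toy stand-in for an environment (hypothetical): records executed actions."""

    def __init__(self):
        self.executed = []

    def reset(self):
        self.executed = []
        return 0

    def step(self, action):
        self.executed.append(action)
        return 0, 0.0, False, {}  # observation, reward, done, info


class StickyActions:
    """With probability repeat_prob, repeat the previous action instead of the new one."""

    def __init__(self, env, repeat_prob=0.25, seed=None):
        self.env = env
        self.repeat_prob = repeat_prob
        self.rng = random.Random(seed)
        self.prev_action = 0  # assumption: action 0 is NOOP

    def reset(self):
        self.prev_action = 0
        return self.env.reset()

    def step(self, action):
        if self.rng.random() < self.repeat_prob:
            action = self.prev_action  # the new action is ignored
        self.prev_action = action
        return self.env.step(action)


def run(repeat_prob, requested, seed=0):
    """Feed a sequence of requested actions and return the actions actually executed."""
    env = EchoEnv()
    wrapped = StickyActions(env, repeat_prob=repeat_prob, seed=seed)
    wrapped.reset()
    for a in requested:
        wrapped.step(a)
    return env.executed


requested = [1, 2, 3, 1, 2, 3, 1, 2]
print("v4-like (p=0.00):", run(0.0, requested))
print("v0-like (p=0.25):", run(0.25, requested))
```

With `repeat_prob=0.0` the executed actions match the requested ones exactly, so any score gap between the two settings comes from the agent having to cope with occasionally ignored actions.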

By the way, I'm a huge fan of your work and always follow your DRL suggestions.
Thanks.

psc-g · 2021-11-20T14:07:55Z

hi, thanks for your note! indeed, we use v0 by default as suggested by machado et al. <https://arxiv.org/abs/1709.06009>, as it is a more robust evaluation protocol. in our white paper <https://arxiv.org/abs/1812.06110> we do compare v0 with v4 (see figure 6), and you can see there are fairly significant differences. unfortunately we don't have checkpoints for the v4 runs. if computation is a concern, lots of recent papers have been evaluating on just 100K frames. this can be very noisy, but we provide some guidance on how to get more statistically significant results in our recent paper <https://arxiv.org/abs/2108.13264>. hope this helps, and good luck!


-psc
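
a rough sketch of the kind of aggregation that paper discusses: the interquartile mean (IQM) across runs, with a bootstrap confidence interval. this is a simplified, unstratified percentile bootstrap written with the standard library only; the paper's procedure is stratified across tasks, so treat this as an illustration rather than the paper's exact method. the function names and the scores below are made up for the example.

```python
import random
from statistics import mean


def iqm(scores):
    """Interquartile mean: average of the middle 50% of the sorted scores."""
    ordered = sorted(scores)
    cut = len(ordered) // 4  # drop the lowest and highest quarter
    return mean(ordered[cut:len(ordered) - cut])


def bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the IQM."""
    rng = random.Random(seed)
    # Resample the runs with replacement and recompute the IQM each time.
    stats = sorted(
        iqm(rng.choices(scores, k=len(scores))) for _ in range(n_boot)
    )
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Hypothetical normalized scores from 8 independent runs (made-up numbers).
runs = [0.12, 0.35, 0.31, 0.28, 0.90, 0.33, 0.30, 0.05]
point = iqm(runs)
low, high = bootstrap_ci(runs)
print(f"IQM = {point:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

the IQM ignores the single lucky run (0.90) and the single failed run (0.05), which is why it tends to be less noisy than the mean when only a few runs are available; the width of the interval shows how much uncertainty remains.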


talium0713 · 2021-11-21T09:20:26Z

Thanks for the good advice and the great papers!
Honestly, the Atari benchmark is a heavy experiment that requires too many computational resources for an individual researcher like me.
At the same time, many researchers seem to have a hard time because reviewers often require results on the full set of Atari games.

In that regard, your study on evaluation methodology is likely to become all the more important.
I am not sure whether evaluating on only 100K frames is reliable, but I will definitely read your paper and try to understand it.
