More training steps lead to inaccurate matching result #15

4B5F5F4B · 2018-11-12T03:31:10Z

I generated a weight file with -train_steps set to 100 and got pretty good matching results. I then generated another weight file with -train_steps set to 500 and got no matches at all. I thought more training steps should give more accurate matching results. Am I right?

I attached the executable files (libpng 1.2.54 compiled by gcc-6.3, gcc-7.3, and gcc-8.2 with different options) that I used for training, and the input file I used for matching.
ELF.zip
pngtest_libpng_12_54.zip

4B5F5F4B · 2018-11-12T03:38:03Z

thomasdullien · 2018-11-12T09:53:09Z

Hey there, awesome report, thanks for this.

The answer is a bit complicated: more training steps are only guaranteed to give you better matching results on the examples that you train on. For "unseen" examples, overtraining / overfitting can occur; it can be seen in the diagram on this slide: https://docs.google.com/presentation/d/16r_AUSWmtGw0CNxRg60VlTqkjBRxlvjEgxF10O0imk4/edit#slide=id.g427b6e6213_2_37

For the example of "find more variants of a function we already have N examples for" and the training set in the presentation, the training starts making results worse from about 420 steps onward. For the example "get better at recognizing functions you have never seen before, just learn about compilers", this happens much earlier -- before 100 training steps.

One of the steps I want to take in the future to reduce overfitting is to migrate the training code to use either TensorFlow or Julia and switch from L-BFGS to SGD-based algorithms. This should allow increasing the training data significantly, which should help reduce overfitting risk...

Cheers, Thomas

On Mon, Nov 12, 2018 at 04:38, 4B5F5F4B < [email protected]> wrote:

…

[image: matching2] <https://user-images.githubusercontent.com/19218802/48325194-667c9880-e66f-11e8-8ddb-164abb8cee1c.png>
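One way to act on the overfitting point in the reply above is to choose the step count by how well the resulting weights match functions that were kept out of training, instead of assuming more steps is better. The snippet below is only a rough sketch of that selection loop. `pick_train_steps`, `train_and_score`, and `toy_heldout_score` are hypothetical names made up for illustration and are not part of the tool; in practice `train_and_score` would have to train with the given number of steps and return some held-out matching score. The toy curve is invented and is not measured data.

```python
from typing import Callable, Iterable, Optional, Tuple


def pick_train_steps(
    candidates: Iterable[int],
    train_and_score: Callable[[int], float],
) -> Tuple[Optional[int], float]:
    """Return (steps, score) for the candidate with the highest held-out score.

    train_and_score is a hypothetical callback: it is assumed to train with
    the given number of steps and return a score measured on functions that
    were NOT part of the training set (higher is better).
    """
    best_steps: Optional[int] = None
    best_score = float("-inf")
    for steps in candidates:
        score = train_and_score(steps)
        # Strict '>' keeps the earlier candidate when two scores tie.
        if score > best_score:
            best_steps, best_score = steps, score
    return best_steps, best_score


def toy_heldout_score(steps: int) -> float:
    """Invented stand-in curve that peaks at 60 steps. Not real results."""
    return -abs(steps - 60)


if __name__ == "__main__":
    steps, score = pick_train_steps([20, 60, 100, 500], toy_heldout_score)
    print(steps, score)
```

With a real scoring callback, the same loop would show whether 100 or 500 steps is already past the point where held-out matching starts to degrade for a given training set.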
