Thank you for the work. I recently started working on reinforcement learning for mathematical research (with the formal language and deduction system of a proof assistant as the environment); it's not straightforward to design a proper reward, but novelty is certainly a good measure of progress, and your work is inspiring.
One idea I have, which I also intend to apply in my project, concerns the measurement of prediction error; it seems to me that a GAN-style idea is applicable here. The predictor can be seen as a generator, so how about training a discriminator (conditioned on the current state) with the predicted outcomes as negative samples and the actual outcomes as positive samples? Maybe then you could just predict raw pixels, and the discriminator would extract features automatically and ignore any essentially unpredictable details, like the exact locations of tree leaves in a breeze. It would also be unnecessary to distinguish between things the agent can affect or control and things it cannot.
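To make this concrete, here is a rough sketch of what I have in mind, written in PyTorch. This is only an illustration under my own assumptions: all module and variable names (`Predictor`, `Discriminator`, `obs`, `next_obs`, etc.) are placeholders I made up, not from any existing codebase.

```python
# Hypothetical sketch: a discriminator that measures prediction error,
# GAN-style. Actual next observations are positives, the predictor's
# outputs are negatives, and both are conditioned on the current state.
import torch
import torch.nn as nn

class Predictor(nn.Module):
    """Generator analogue: predicts the next observation from (state, action)."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def forward(self, obs, action):
        return self.net(torch.cat([obs, action], dim=-1))

class Discriminator(nn.Module):
    """Scores (state, outcome) pairs: high for real outcomes, low for predicted ones."""
    def __init__(self, obs_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim * 2, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, outcome):
        # Condition on the current state by concatenating it with the outcome.
        return self.net(torch.cat([obs, outcome], dim=-1))

def update(predictor, discriminator, d_opt, obs, action, next_obs):
    bce = nn.BCEWithLogitsLoss()
    pred_next = predictor(obs, action)

    # Discriminator step: actual outcomes are positive samples,
    # predicted outcomes are negative samples.
    real_logits = discriminator(obs, next_obs)
    fake_logits = discriminator(obs, pred_next.detach())
    d_loss = (bce(real_logits, torch.ones_like(real_logits))
              + bce(fake_logits, torch.zeros_like(fake_logits)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Intrinsic reward: how easily the discriminator separates the
    # prediction from reality. Essentially unpredictable detail (leaves
    # in a breeze) should stop separating them once the discriminator
    # learns to ignore it, so the reward would decay for such states.
    with torch.no_grad():
        reward = (torch.sigmoid(discriminator(obs, next_obs))
                  - torch.sigmoid(discriminator(obs, pred_next)))
    return reward.squeeze(-1)
```

The reward here is just one possible choice; the point is only that the discriminator, rather than a hand-picked feature space, decides which differences between prediction and outcome count as error.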
Apart from my participation in the Leela Zero project, I am a beginner in reinforcement learning. I haven't looked much into the details of the various algorithms and NN architectures, and just want some feedback on whether the general idea is promising. Thank you in advance!
My initial thoughts were the same. I've read a few papers that outline the similarities between RL algorithms and GANs, for example https://arxiv.org/pdf/1610.01945.pdf
I'm not sure whether we can augment GANs with RL algorithms or whether it would just complicate things.