Hi, thanks for your work and for releasing the code. I have one question about training the location network with the REINFORCE algorithm. If I understand correctly, the location network in modules.py is the part that implements REINFORCE.
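Roughly, as I understand it, the location network looks like this (my own paraphrase with my own layer names and `std` argument, so it may not match the exact code in modules.py):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal

class LocationNetwork(nn.Module):
    """Paraphrased location network: maps h_t to a stochastic glimpse location."""

    def __init__(self, input_size, output_size, std):
        super().__init__()
        self.std = std
        self.fc = nn.Linear(input_size, input_size // 2)
        self.fc_lt = nn.Linear(input_size // 2, output_size)

    def forward(self, h_t):
        # h_t is detached, so this branch sends no gradient back into the core RNN
        mu = torch.tanh(self.fc_lt(F.relu(self.fc(h_t.detach()))))

        # sample a location around mu; the sample itself carries no gradient
        l_t = Normal(mu, self.std).sample()

        # log-probability of the sampled location; NOT detached,
        # so REINFORCE gradients can reach fc and fc_lt through mu
        log_pi = Normal(mu, self.std).log_prob(l_t).sum(dim=1)

        l_t = torch.clamp(l_t, -1, 1)
        return log_pi, l_t
```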
For calculating loss_reinforce and the reward, the relevant part is the following:
```python
# calculate reward
predicted = torch.max(log_probas, 1)[1]
R = (predicted.detach() == y).float()
R = R.unsqueeze(1).repeat(1, self.num_glimpses)

...

# compute reinforce loss
# summed over timesteps and averaged across batch
adjusted_reward = R - baselines.detach()
loss_reinforce = torch.sum(-log_pi * adjusted_reward, dim=1)  # gradient ascent (negative)
loss_reinforce = torch.mean(loss_reinforce, dim=0)
```
My question is: how do we update the parameters of the fully connected layers if all the related tensors are detached?
I read some examples of REINFORCE implementations, such as the PyTorch documentation and the official PyTorch REINFORCE example, but I still cannot figure out how the detach function works here.
I also saw the similar issues #29 and #20.
Any help would be appreciated and thanks for your time!
Best wishes
Hi, please note that log_pi is never detached from the computational graph. Therefore we can backpropagate through the location network here: loss_reinforce = torch.sum(-log_pi * adjusted_reward, dim=1) # gradient ascent (negative)
Minimizing this loss increases the probability (log_pi) of selecting actions that provided a good reward in the past. (More strictly, it increases or decreases the probability of mapping a particular hidden state vector to a particular action, according to the reward of the whole trajectory.)
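As a quick standalone check (a toy sketch with made-up layer names, not this repo's code), you can see that gradients from this loss reach the location network but not the core that produced h_t:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

core = nn.Linear(8, 8)          # stands in for the core RNN producing h_t
loc_fc = nn.Linear(8, 2)        # stands in for the location network

x = torch.randn(4, 8)
h_t = core(x)

mu = torch.tanh(loc_fc(h_t.detach()))          # h_t detached: no gradient into `core`
l_t = Normal(mu, 0.1).sample()                 # sampled action, carries no gradient
log_pi = Normal(mu, 0.1).log_prob(l_t).sum(1)  # NOT detached: gradient reaches `loc_fc`

adjusted_reward = torch.ones(4)                # pretend reward-minus-baseline
loss_reinforce = torch.mean(-log_pi * adjusted_reward)
loss_reinforce.backward()

print(loc_fc.weight.grad is not None)  # True  -> location net is updated by this loss
print(core.weight.grad is None)        # True  -> core gets nothing from this loss
```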
@malashinroman Hi, thanks for your quick and thorough reply!
As I understand it now, detach() is used as follows:
(1) The two FC layers inside the location network are trained by loss_reinforce through "mu", and "h_t" is not influenced by the location network (because it is detached there).
(2) The FC layer inside the baseline network is trained through loss_baseline, not by REINFORCE.
Besides, "h_t" is not influenced by the baseline network either.
Is this right?
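To convince myself of (2), I also tried a small toy check (again with made-up names, not the repo's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

baseline_net = nn.Linear(8, 1)
h_t = torch.randn(4, 8)                      # pretend hidden state
b_t = baseline_net(h_t.detach()).squeeze(1)  # baseline prediction
R = torch.ones(4)                            # pretend reward

# (a) the REINFORCE loss uses the detached baseline, so it leaves baseline_net untouched
log_pi = torch.randn(4, requires_grad=True)
adjusted_reward = R - b_t.detach()
loss_reinforce = torch.mean(-log_pi * adjusted_reward)
loss_reinforce.backward()
print(baseline_net.weight.grad)  # None -> not trained by REINFORCE

# (b) the MSE baseline loss does train it
loss_baseline = F.mse_loss(b_t, R)
loss_baseline.backward()
print(baseline_net.weight.grad is not None)  # True -> trained by loss_baseline
```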