Hello! I can't understand this (lines 389–407 in run_summarization.py): why does `dqn_best_action` use `state` rather than `state_prime`? I think `dist_q_val = -tf.log(dist) * q_value` (model.py) means we should make `dist` and `q_value` close to each other, right? Shouldn't we use ||Q - q||^2 instead (https://arxiv.org/pdf/1805.09461.pdf, Eq. 29)?
```python
# line 389
q_estimates = dqn_results['estimates']  # shape (len(transitions), vocab_size)
dqn_best_action = dqn_results['best_action']
#dqn_q_estimate_loss = dqn_results['loss']
# use target DQN to estimate values for the next decoder state
dqn_target_results = self.dqn_target.run_test_steps(self.dqn_sess, x=b_prime._x)
q_vals_new_t = dqn_target_results['estimates']  # shape (len(transitions), vocab_size)
# line 407
q_estimates[i][tr.action] = tr.reward + FLAGS.gamma * q_vals_new_t[i][dqn_best_action[i]]
```
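For context, the standard Double DQN target (which the quoted update resembles) selects the greedy action with the *online* network evaluated on the **next** state, then scores that action with the *target* network — which is why evaluating `dqn_best_action` on the current state looks suspicious. A minimal NumPy sketch of that standard formulation (function name, shapes, and the toy numbers are mine, not from the repo):

```python
import numpy as np

def double_dqn_targets(q_online_next, q_target_next, actions, rewards, gamma):
    """Standard Double DQN targets.

    q_online_next: online-network Q-values for the NEXT state, shape (batch, n_actions)
    q_target_next: target-network Q-values for the NEXT state, same shape
    """
    # action selection: argmax from the ONLINE net on the next state
    best_next = np.argmax(q_online_next, axis=1)
    # action evaluation: the TARGET net scores the selected action
    target_vals = q_target_next[np.arange(len(actions)), best_next]
    return rewards + gamma * target_vals

# toy example with 2 transitions and 2 actions
q_online_next = np.array([[1.0, 3.0], [2.0, 0.5]])
q_target_next = np.array([[0.2, 0.8], [1.5, 0.1]])
targets = double_dqn_targets(q_online_next, q_target_next,
                             actions=np.array([0, 1]),
                             rewards=np.array([1.0, 0.0]),
                             gamma=0.9)
# targets = reward + gamma * Q_target(s', argmax_a Q_online(s', a))
```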