Hello! I can't understand this (lines 389–407 in run_summarization.py): why does `dqn_best_action` use `state` rather than `state_prime`? I think `dist_q_val = -tf.log(dist) * q_value` (model.py) means we should make `dist` and `q_value` close to each other, right? Shouldn't we use ||Q - q||^2 instead (https://arxiv.org/pdf/1805.09461.pdf, Eq. 29)?
```python
# line 389
q_estimates = dqn_results['estimates']  # shape (len(transitions), vocab_size)
dqn_best_action = dqn_results['best_action']
#dqn_q_estimate_loss = dqn_results['loss']
# use target DQN to estimate values for the next decoder state
dqn_target_results = self.dqn_target.run_test_steps(self.dqn_sess, x=b_prime._x)
q_vals_new_t = dqn_target_results['estimates']  # shape (len(transitions), vocab_size)
# line 407
q_estimates[i][tr.action] = tr.reward + FLAGS.gamma * q_vals_new_t[i][dqn_best_action[i]]
```
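For context, the standard Double DQN target (which the quoted update resembles) selects the greedy action with the *online* network evaluated on the **next** state, then scores that action with the *target* network — which is why evaluating `dqn_best_action` on the current state looks suspicious. A minimal NumPy sketch of that standard formulation (function name, shapes, and the toy numbers are mine, not from the repo):

```python
import numpy as np

def double_dqn_targets(q_online_next, q_target_next, actions, rewards, gamma):
    """Standard Double DQN targets.

    q_online_next: online-network Q-values for the NEXT state, shape (batch, n_actions)
    q_target_next: target-network Q-values for the NEXT state, same shape
    """
    # action selection: argmax from the ONLINE net on the next state
    best_next = np.argmax(q_online_next, axis=1)
    # action evaluation: the TARGET net scores the selected action
    target_vals = q_target_next[np.arange(len(actions)), best_next]
    return rewards + gamma * target_vals

# toy example with 2 transitions and 2 actions
q_online_next = np.array([[1.0, 3.0], [2.0, 0.5]])
q_target_next = np.array([[0.2, 0.8], [1.5, 0.1]])
targets = double_dqn_targets(q_online_next, q_target_next,
                             actions=np.array([0, 1]),
                             rewards=np.array([1.0, 0.0]),
                             gamma=0.9)
# targets = reward + gamma * Q_target(s', argmax_a Q_online(s', a))
```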