fix link
WWWonderer committed Feb 6, 2024
1 parent 77fd68b commit 735292e
Showing 1 changed file with 2 additions and 2 deletions.
@@ -113,12 +113,12 @@ Below is the key implementation part for the epsilon-greedy:
R_over_t.append(curr_R) # cumulative rewards
{% endhighlight %}

-Some experimentation results on a 3-arm problem can be seen here:
+Some experimentation results on a 3-armed problem can be seen here:

![p15_multibandit_epsilon_greedy_1](https://raw.githubusercontent.com/WWWonderer/tech_blog/main/assets/images/p15_multibandit_epsilon_greedy_1.png){:style="display:block; margin-left:auto; margin-right:auto"}

We can conclude that the value of $\epsilon$ does affect convergence, and a small value works best in the long run.

Reference: [Reinforcement Learning - An Introduction][sutton_book] by Richard S. Sutton and Andrew G. Barto (chapter 2)

-[sutton_book]: https://github.com/tensorflow/nmt/blob/master/nmt/scripts/bleu.py
+[sutton_book]: https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf
