From 735292e5deea725f3fe8729eb674c825572e194a Mon Sep 17 00:00:00 2001
From: Shuo Zhang
Date: Tue, 6 Feb 2024 16:08:54 -0500
Subject: [PATCH] fix link

---
 .../2024-02-01-multi-armed-bandit-and-epsilon-greedy.markdown | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_posts/2024-02-01-multi-armed-bandit-and-epsilon-greedy.markdown b/_posts/2024-02-01-multi-armed-bandit-and-epsilon-greedy.markdown
index d6f0556..7f8df94 100644
--- a/_posts/2024-02-01-multi-armed-bandit-and-epsilon-greedy.markdown
+++ b/_posts/2024-02-01-multi-armed-bandit-and-epsilon-greedy.markdown
@@ -113,7 +113,7 @@ Below is the key implementation part for the epsilon-greedy:
 R_over_t.append(curr_R) # cumulative rewards
 {% endhighlight %}
 
-Some experimentation results on a 3-arm problem can be seen here:
+Some experimentation results on a 3-armed problem can be seen here:
 
 ![p15_multibandit_epsilon_greedy_1](https://raw.githubusercontent.com/WWWonderer/tech_blog/main/assets/images/p15_multibandit_epsilon_greedy_1.png){:style="display:block; margin-left:auto; margin-right:auto"}
 
@@ -121,4 +121,4 @@ We can conclude that the value of $\epsilon$ does affect convergence, and a smal
 
 Reference: [Reinforcement Learning - An Introduction][sutton_book] by Richard S. Sutton and Andrew G. Barto (chapter 2)
 
-[sutton_book]: https://github.com/tensorflow/nmt/blob/master/nmt/scripts/bleu.py
\ No newline at end of file
+[sutton_book]: https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf
\ No newline at end of file