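The success rate in the value_iteration test is 0.638. Value iteration needs to iterate repeatedly, re-evaluating the values on each pass, but the code only performs a single iteration.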
"All of these algorithms converge to an optimal policy for discounted finite MDPs." FYI, this is quoted from Reinforcement Learning: An Introduction. You could try adding a discount.
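For reference, here is a minimal sketch of what "iterate until convergence, with a discount" could look like. It is not the repository's code. The transition table layout `P[s][a] = [(prob, next_state, reward, done), ...]`, the helper `chain_mdp`, and the parameters `gamma`, `theta` and `max_sweeps` are all assumptions made for illustration, and the 4-state chain is a toy environment rather than the one used in the test.

```python
def value_iteration(P, n_states, n_actions, gamma=0.9, theta=1e-8, max_sweeps=10_000):
    """Sweep the Bellman optimality backup until values change by less than theta.

    P is an assumed table: P[s][a] = [(prob, next_state, reward, done), ...].
    Returns (V, greedy_policy, number_of_sweeps_performed).
    """
    V = [0.0] * n_states

    def q_value(s, a):
        # Expected return of taking action a in state s, then following V.
        return sum(
            p * (r + gamma * (0.0 if done else V[s2]))
            for p, s2, r, done in P[s][a]
        )

    sweeps = 0
    for sweeps in range(1, max_sweeps + 1):
        delta = 0.0
        for s in range(n_states):
            best = max(q_value(s, a) for a in range(n_actions))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < theta:
            break  # values have stopped changing

    # Extract the greedy policy from the final values (ties go to the lowest action).
    policy = [
        max(range(n_actions), key=lambda a: q_value(s, a))
        for s in range(n_states)
    ]
    return V, policy, sweeps


def chain_mdp():
    """Hypothetical 4-state chain: action 0 = left, 1 = right, state 3 is the goal."""
    goal = 3
    P = {}
    for s in range(goal):
        right = s + 1
        P[s] = {
            0: [(1.0, max(s - 1, 0), 0.0, False)],
            1: [(1.0, right, 1.0 if right == goal else 0.0, right == goal)],
        }
    # The goal is absorbing and gives no further reward.
    P[goal] = {a: [(1.0, goal, 0.0, True)] for a in (0, 1)}
    return P


P = chain_mdp()
V, policy, sweeps = value_iteration(P, n_states=4, n_actions=2, gamma=0.9)
_, one_sweep_policy, _ = value_iteration(P, n_states=4, n_actions=2, gamma=0.9, max_sweeps=1)
```

On this toy chain, stopping after one sweep (`max_sweeps=1`) leaves state 0 with no value information yet, so the greedy action there is just a tie-break. Running until `delta < theta` lets the reward propagate back to every state, which is the behaviour the issue says is missing.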
johnjim0816