WIP: feature(pu): add Go env and AlphaZero league training #55
puyuan1996 commented on Jul 21, 2023
- add go_env and related unittests
- add go mcts bot and alphazero/muzero config
- add league version of alphazero
- add ctree version of alphazero
…r for tictactoe and gomoku
…option, fix norm_type in az prediction net, fix temperature in az_league
…lish league config
if (action == -1) {
    break;
}
simulate_env.attr("step")(action);
You could print the board after simulate_env executes step, to check that it is correct.
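Here is a minimal sketch of that check, using a toy 3x3 environment. `ToyEnv`, `board_to_string` and `step_and_log` are hypothetical stand-ins for the Python env reached through `simulate_env`, not code from this PR. With pybind11, the same idea would be a `py::print(...)` call on the env's board right after `step`, provided the env exposes its board as an attribute (an assumption).

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the Python env behind simulate_env (3x3 board).
struct ToyEnv {
    std::vector<int> board = std::vector<int>(9, 0);  // 0 = empty, 1/2 = players
    int current_player = 1;

    void step(int action) {
        assert(action >= 0 && action < 9 && board[action] == 0);  // must be a legal move
        board[action] = current_player;
        current_player = 3 - current_player;  // alternate between players 1 and 2
    }
};

// Renders the board row by row ('.' empty, 'X' player 1, 'O' player 2).
std::string board_to_string(const ToyEnv& env) {
    std::ostringstream out;
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col) {
            out << ".XO"[env.board[row * 3 + col]];
        }
        out << '\n';
    }
    return out.str();
}

// Mirrors the reviewed loop: step the env, then dump the resulting board for inspection.
void step_and_log(ToyEnv& env, int action, std::ostream& log) {
    env.step(action);
    log << "after action " << action << ":\n" << board_to_string(env);
}
```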
while (!node->is_leaf()) {
    int action;
    std::tie(action, node) = _select_child(node, simulate_env);
Print the legal actions to check that the output is correct after each select_child call.
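Here is a sketch of such a check, run after each selection step. It logs the legal actions and reports whether the selected action is one of them. `check_selected_action` and `format_actions` are hypothetical helpers. In the real code the legal actions would come from the Python env, and a `legal_actions` attribute is assumed here.

```cpp
#include <algorithm>
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Formats an action list as "[a, b, c]".
std::string format_actions(const std::vector<int>& actions) {
    std::ostringstream out;
    out << '[';
    for (std::size_t i = 0; i < actions.size(); ++i) {
        if (i > 0) out << ", ";
        out << actions[i];
    }
    out << ']';
    return out.str();
}

// Hypothetical debug hook to call right after each _select_child:
// logs the legal actions and returns true if the selected action is among them.
bool check_selected_action(int action, const std::vector<int>& legal_actions, std::ostream& log) {
    log << "legal actions: " << format_actions(legal_actions) << ", selected: " << action << '\n';
    return std::find(legal_actions.begin(), legal_actions.end(), action) != legal_actions.end();
}
```

A `false` result points at either the selection rule or a stale `simulate_env` that was not stepped along with the tree.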
std::vector<std::pair<int, int>> action_visits;
// std::cout << "position11 " << std::endl;
for (int action = 0; action < simulate_env.attr("action_space").attr("n").cast<int>(); ++action) {
    if (root->children.count(action)) {
Why count rather than std::find?
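For an associative container, the member `count`/`find` is the right tool. It searches by key in O(log n) for `std::map` (average O(1) for `std::unordered_map`). `std::find`, by contrast, scans linearly and compares whole `(key, value)` pairs. The sketch below assumes `children` is a `std::map<int, Node*>` keyed by action, since the declaration is not shown in this excerpt. In C++20, `children.contains(action)` expresses the same test more directly.

```cpp
#include <algorithm>
#include <cassert>
#include <map>

struct Node;  // only pointers are stored, so a forward declaration suffices

// Assumed container type for Node::children.
using ChildMap = std::map<int, Node*>;

// Member count(): key lookup in O(log n); the result is always 0 or 1 for a std::map.
bool has_child_count(const ChildMap& children, int action) {
    auto n = children.count(action);
    assert(n <= 1);  // a std::map stores at most one entry per key
    return n > 0;
}

// Member find(): same complexity, and the iterator gives access to the child if needed.
bool has_child_find(const ChildMap& children, int action) {
    return children.find(action) != children.end();
}

// Linear scan: std::find would compare whole pairs, so matching on the key
// alone needs std::find_if with a predicate, and it visits every element.
bool has_child_linear(const ChildMap& children, int action) {
    return std::find_if(children.begin(), children.end(),
                        [action](const ChildMap::value_type& kv) { return kv.first == action; }) !=
           children.end();
}
```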
Node* parent;
float prior_p;
int visit_count;
float value_sum;
Should we use double throughout?
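Here is a sketch of the node fields with every floating-point quantity stored as `double`, so priors, value sums and selection scores are never mixed with `float`. The `value()` and `update()` helpers are hypothetical, added only to show averaging and backup in a single precision.

```cpp
#include <cassert>

struct Node {
    Node* parent = nullptr;
    double prior_p = 0.0;    // was float
    int visit_count = 0;
    double value_sum = 0.0;  // was float

    // Mean backed-up value; 0.0 for an unvisited node.
    double value() const {
        assert(visit_count >= 0);
        return visit_count == 0 ? 0.0 : value_sum / visit_count;
    }

    // Backup step: record one simulation result.
    void update(double leaf_value) {
        ++visit_count;
        value_sum += leaf_value;
    }
};
```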
int action = -1;
Node* child = nullptr;
double best_score = -9999999;
double is a floating-point type, so initialize it with a floating-point value rather than a negative integer.
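Here is a sketch of the selection loop with a floating-point sentinel. Starting `best_score` at negative infinity means any finite score wins. With `-9999999`, `action` would stay at -1 once every score falls below that value. `ChildStats`, `ucb_score` and `select_best` are hypothetical. The score uses the common AlphaZero PUCT form, which may differ from the formula used in this PR.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical flattened view of one child for this sketch.
struct ChildStats {
    int action;
    double prior_p;
    int visit_count;
    double value_sum;
};

// PUCT-style score: Q + c_puct * P * sqrt(N_parent) / (1 + N_child).
double ucb_score(const ChildStats& c, int parent_visits, double c_puct) {
    assert(parent_visits >= 0);  // sqrt of a negative count would be NaN
    double q = c.visit_count == 0 ? 0.0 : c.value_sum / c.visit_count;
    double u = c_puct * c.prior_p * std::sqrt(static_cast<double>(parent_visits)) / (1 + c.visit_count);
    return q + u;
}

// Returns the best action, or -1 when there are no children (the caller's action == -1 check).
int select_best(const std::vector<ChildStats>& children, int parent_visits, double c_puct) {
    int best_action = -1;
    // A double sentinel below every finite score, instead of a magic integer.
    double best_score = -std::numeric_limits<double>::infinity();
    for (const ChildStats& c : children) {
        double score = ucb_score(c, parent_visits, c_puct);
        if (score > best_score) {
            best_score = score;
            best_action = c.action;
        }
    }
    return best_action;
}
```

`std::numeric_limits<double>::lowest()` would also work as the sentinel. Infinity makes the intent explicit.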
// std::cout << "position8 " << std::endl;
_simulate(root, simulate_env_copy, policy_forward_fn);
// std::cout << "position9 " << std::endl;
simulate_env_copy = py::none();
This statement seems redundant.
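The sketch below shows why the reset is likely redundant. It uses `std::shared_ptr` as an analogy, not pybind11 itself. Like `py::object`, a `shared_ptr` releases its reference when it is destroyed or reassigned, so a copy created inside the loop body is freed at the end of each iteration anyway. This assumes `simulate_env_copy` is declared inside the loop, which the excerpt does not show. `EnvCopy` and `run_simulations` are hypothetical.

```cpp
#include <cassert>
#include <memory>

// Stand-in for the copied Python environment.
struct EnvCopy {
    int id;
};

// Hypothetical loop mirroring the reviewed code; `last_copy` observes the most
// recent copy without owning it, so we can see whether it was released.
long run_simulations(int num_simulations, std::weak_ptr<EnvCopy>& last_copy) {
    assert(num_simulations >= 0);
    for (int i = 0; i < num_simulations; ++i) {
        auto simulate_env_copy = std::make_shared<EnvCopy>(EnvCopy{i});
        last_copy = simulate_env_copy;
        // ... _simulate(root, simulate_env_copy, policy_forward_fn) would run here ...
        // No explicit reset: the copy is released when simulate_env_copy
        // goes out of scope at the end of this iteration.
    }
    return last_copy.use_count();  // owners still holding the last copy
}
```

An explicit reset only pays off when the variable lives on past its last use, for example if it is declared outside the loop and expensive work follows before it is reassigned or destroyed.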
while (!node->is_leaf()) {
    int action;
    std::tie(action, node) = _select_child(node, simulate_env);
_select_child returns a pointer, so why is tie still needed here?
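Whether `std::tie` is needed depends on the return type. The call site only compiles if `_select_child` returns something tuple-like, such as `std::pair<int, Node*>`. In that case `std::tie` is how to unpack it into the already-declared `action` and `node`. C++17 structured bindings would declare new variables instead of updating `node`. If it returned only `Node*`, a plain assignment would do. Both variants below are hypothetical sketches, and the toy `Node` is not the PR's.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Toy node: each node has at most one "selected" child for this sketch.
struct Node {
    Node* best_child = nullptr;
    int best_action = -1;  // action that leads to best_child
    bool is_leaf() const { return best_child == nullptr; }
};

// Variant A: returns both the action and the child.
std::pair<int, Node*> select_child_pair(Node* node) {
    return {node->best_action, node->best_child};
}

// Variant B: returns only the child pointer.
Node* select_child_ptr(Node* node) { return node->best_child; }

// With variant A, std::tie assigns into the existing `action` and `node`.
int descend_with_tie(Node* node) {
    assert(node != nullptr);
    int action = -1;
    while (!node->is_leaf()) {
        std::tie(action, node) = select_child_pair(node);
    }
    return action;  // last action taken on the way down
}

// With variant B, std::tie is unnecessary: plain pointer assignment.
int depth_with_ptr(Node* node) {
    assert(node != nullptr);
    int depth = 0;
    while (!node->is_leaf()) {
        node = select_child_ptr(node);
        ++depth;
    }
    return depth;
}
```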
…tago_game_state, polish go_alphazero_league_config
We have a new polished PR.