Is there a problem with the logic used in each policy evaluation under self-play? When opponent_elo is higher than ego_elo, expected_score is small, but actual_score tends to be large in that case. Doesn't that make it impossible to correctly evaluate the agent's performance in the match?
```python
ego_elo = np.array([self.latest_elo for _ in range(self.n_eval_rollout_threads)])
opponent_elo = np.array([self.policy_pool[key] for key in eval_choose_opponents])
expected_score = 1 / (1 + 10 ** ((opponent_elo - ego_elo) / 400))

actual_score = np.zeros_like(expected_score)
diff = opponent_average_episode_rewards - eval_average_episode_rewards
actual_score[diff > 100] = 1         # win
actual_score[abs(diff) < 100] = 0.5  # tie
actual_score[diff < -100] = 0        # lose
```
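For reference, below is a minimal self-contained sketch of the standard Elo update. It assumes the usual convention that actual_score = 1 means the ego agent won (its episode reward exceeded the opponent's), which is the opposite sign of the diff computed in the snippet above. The function name elo_update, the K-factor of 16, and the reward margin of 100 are illustrative assumptions, not taken from the repository. Under this convention, beating a stronger opponent (low expected_score, actual_score = 1) yields a large rating gain, which is exactly the behavior the question is probing.

```python
import numpy as np

K = 16  # illustrative K-factor, not from the repository


def elo_update(ego_elo, opponent_elo, ego_rewards, opponent_rewards, margin=100.0):
    """Standard Elo update for the ego agent across parallel evaluation matches."""
    ego_elo = np.asarray(ego_elo, dtype=float)
    opponent_elo = np.asarray(opponent_elo, dtype=float)

    # Expected score of the ego agent against each opponent.
    expected_score = 1.0 / (1.0 + 10.0 ** ((opponent_elo - ego_elo) / 400.0))

    # Actual score, 1 = ego win, 0.5 = tie, 0 = ego loss: note the diff is
    # ego minus opponent here, the reverse of the snippet quoted above.
    diff = np.asarray(ego_rewards, dtype=float) - np.asarray(opponent_rewards, dtype=float)
    actual_score = np.full_like(expected_score, 0.5)  # default: tie
    actual_score[diff > margin] = 1.0   # ego clearly outscored the opponent
    actual_score[diff < -margin] = 0.0  # ego clearly lost

    # Standard Elo rule: rating moves by K * (actual - expected).
    return ego_elo + K * (actual_score - expected_score)


# Example: ego (1200) beats a stronger opponent (1400) and loses to a
# weaker one (1000); the first match gains far more rating than the
# second match loses.
print(elo_update([1200, 1200], [1400, 1000], [350, 90], [80, 250]))
```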