
Logic issue in policy evaluation during self-play #51

Open
cymmerida123 opened this issue Dec 6, 2024 · 0 comments

@cymmerida123

Could there be a logic problem in how each policy evaluation works under self-play? When opponent_elo is higher than ego_elo, expected_score is small, but in that case actual_score is usually large as well, so wouldn't this fail to evaluate the agent's in-game performance correctly?
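For concreteness, a quick numeric check of the expected-score formula used in the snippet below; the ratings here are made up purely for illustration:

    ego_elo = 1000.0       # hypothetical rating for the ego agent
    opponent_elo = 1200.0  # hypothetical rating for a stronger opponent

    # Standard Elo expected score for the ego agent.
    expected_score = 1 / (1 + 10 ** ((opponent_elo - ego_elo) / 400))
    print(expected_score)  # ~0.24: the ego agent is expected to lose most games

A stronger opponent will usually also out-score the ego agent on average episode reward, which is where the mismatch comes from.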

The relevant code (Update elo):

    ego_elo = np.array([self.latest_elo for _ in range(self.n_eval_rollout_threads)])
    opponent_elo = np.array([self.policy_pool[key] for key in eval_choose_opponents])
    expected_score = 1 / (1 + 10**((opponent_elo-ego_elo)/400))

    actual_score = np.zeros_like(expected_score)
    diff = opponent_average_episode_rewards - eval_average_episode_rewards
    actual_score[diff > 100] = 1 # win
    actual_score[abs(diff) < 100] = 0.5 # tie
    actual_score[diff < -100] = 0 # lose
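Note that diff is measured from the opponent's side: diff > 100 means the opponent's average episode reward exceeds the ego agent's by more than 100, yet that case is labeled a win (actual_score = 1). Against a stronger opponent this pairs a small expected_score with actual_score = 1, so any update of the usual form elo += K * (actual_score - expected_score) would reward the ego agent for losing. A minimal sign-consistent sketch, with example values standing in for the quantities above and a hypothetical K-factor (the repository's actual update step may differ):

    import numpy as np

    # Example stand-ins for the quantities in the snippet above.
    ego_elo = np.array([1000.0, 1000.0])
    opponent_elo = np.array([1200.0, 900.0])
    eval_average_episode_rewards = np.array([-150.0, 120.0])    # ego agent
    opponent_average_episode_rewards = np.array([30.0, -40.0])  # opponent

    expected_score = 1 / (1 + 10 ** ((opponent_elo - ego_elo) / 400))

    # Margin from the ego agent's perspective (the snippet above has the sign flipped).
    diff = eval_average_episode_rewards - opponent_average_episode_rewards

    actual_score = np.zeros_like(expected_score)
    actual_score[diff > 100] = 1             # ego clearly out-scores the opponent: win
    actual_score[np.abs(diff) <= 100] = 0.5  # tie; <= also covers exactly +/-100
    actual_score[diff < -100] = 0            # opponent clearly out-scores ego: lose

    # Hypothetical K-factor update to show why the sign matters.
    K = 16
    new_elo = ego_elo + K * (actual_score - expected_score)
    print(new_elo)  # the loss to the stronger opponent now lowers the rating

Either flipping the sign of diff this way or swapping the win/lose branch assignments would make actual_score consistent with expected_score.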