It is my firm belief that matchmaking based on skill is one of the most important things that fighting games are currently missing. Much is made about fighting games being 'difficult', but fundamentally these are arcade games that you're meant to be able to just pick up and play; and you absolutely can, provided you find a similarly skilled opponent to play with.
I'm not at all the most qualified to talk about this, merely very enthusiastic. For better overviews I highly recommend watching the GDC talks by Josh Menke, who's designed skill and rating systems for a ton of different games, for example Skill, Matchmaking, and Ranking Systems Design.
Most skill systems use some variation of the Bradley-Terry model to predict player interactions (say, Alice should beat Bob 60% of the time) and then update skills according to game results. If Alice beats Bob more than 60% of the time, her skill rating will increase; if she doesn't, it'll drop.
On the Elo scale, you can look at the difference in rating to determine your probability of winning:
Rating difference | Win probability |
---|---|
0 | 50% |
50 | 57% |
100 | 64% |
150 | 70% |
200 | 76% |
250 | 81% |
300 | 85% |
400 | 91% |
500 | 95% |
600 | 97% |
I'd say anything under a 150-point difference in rating is acceptable, since at that point you still have roughly a 50% probability of picking up at least one game in a FT2, even if you're likely to lose the set overall.
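To see where those numbers come from: the table is just the Elo logistic curve. Here's a quick sketch (not from the rating-update codebase) of the per-game win probability, plus the chance of taking at least one game in a FT2:

```rust
//Per-game win probability for a player rated `diff` points above their
//opponent, on the Elo scale.
fn win_probability(diff: f64) -> f64 {
    1.0 / (1.0 + f64::powf(10.0, -diff / 400.0))
}

//Probability of winning at least one game in a first-to-2 set. The first
//two games are always played, so you always get at least two chances.
fn at_least_one_game_ft2(diff: f64) -> f64 {
    let p = win_probability(diff);
    1.0 - (1.0 - p) * (1.0 - p)
}
```

At a 150-point disadvantage, `win_probability(-150.0)` is about 30%, but `at_least_one_game_ft2(-150.0)` comes out right around 50%, which is where the cutoff above comes from.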
On ratingupdate, your skill rating is shown using two numbers in the format `X ±Y`, where `X` is your actual rating and `Y` represents how certain the system is in that rating - specifically, it's 95% certain that you're somewhere between `X - Y` and `X + Y`.
When you first start playing, your uncertainty (or deviation) is set to `700`. This is because the system knows nothing at all about you yet, and it could turn out that you're either completely new to the game with a skill level around `1500 - 700 = 800`, or a seasoned veteran with a skill level around `1500 + 700 = 2200`.
As you play matches and the system learns more about you, its certainty in your rating increases. That certainty also affects how much your rating changes: if the system knows nothing about you yet (high uncertainty), your rating will move a lot, but if you have low uncertainty, your rating won't move much even if you win or lose quite a few games in a row, since that kind of streak is to be expected.
Additionally, how uncertain your opponent's rating is matters - losing to someone with high uncertainty doesn't say much, since you can't really learn whether you're better or worse than a given rating when we're not even sure what rating you just fought against.
Rating-update used to run on Glicko-2, but this new version uses a modified version of the Glicko-1 algorithm that avoids the problem of having rating periods. This means ratings get updated continuously, after each game, so the system is suitable for use in matchmaking.
The algorithm needs to increase deviation over some time in order to have some "play" and to model the fact that if we haven't seen someone play in a month, we really don't know as much about them as if they just played a bunch of sets.
Vanilla Glicko handles decay in connection with rating periods, but since we're not using those we instead decay based on how many "virtual rating periods" have passed since the last time we decayed the deviation. You could probably be smarter and do this without a loop, but this was clearest as a starting point.
`C` here is a constant you can change to tune how quickly players' deviations decay. In principle, you should experiment on your data to see what value gives you the best predictive power, but a simpler starting point is solving `INITIAL_DEVIATION = sqrt(AVERAGE_DEVIATION^2 + NUM_PERIODS * C^2)` for `C`, in order to figure out a value where it takes `NUM_PERIODS` to get from `AVERAGE_DEVIATION` back up to the `INITIAL_DEVIATION` that every new player has.
`INITIAL_DEVIATION` is set to `350` on the original Glicko scale, and that's what rating-update uses too, although deviations are doubled for presentation to achieve 95% certainty.
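Solving that equation for `C` gives a simple closed form. A sketch, using Glicko's original 350 starting deviation (the example numbers in the test are made up):

```rust
//Glicko's starting deviation on the original scale.
const INITIAL_DEVIATION: f64 = 350.0;

//Solve INITIAL_DEVIATION = sqrt(average_deviation^2 + num_periods * c^2) for c.
fn solve_c(average_deviation: f64, num_periods: f64) -> f64 {
    f64::sqrt(
        (INITIAL_DEVIATION * INITIAL_DEVIATION - average_deviation * average_deviation)
            / num_periods,
    )
}
```

For example, if a typical established player sits around a deviation of 50 and you want an absence of 100 rating periods to take them back up to 350, this gives `C ≈ 34.6`.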
```rust
//I apologize for using Rust as syntax for examples, but that's what the
//codebase is written in and I'll try to keep it simple.

//Call this for both players' ratings, using the time since the last time we
//updated this player's rating as time_elapsed.
fn decay_deviation(rating: &mut Rating, time_elapsed: i64) {
    //1. Figure out how many cycles of decay we need to use.
    let decay_count = time_elapsed / RATING_PERIOD_LENGTH;
    //2. Loop that many times, decaying the rating each time.
    for _ in 0..decay_count {
        //3. Update according to Glicko's formula.
        rating.deviation = f64::sqrt(f64::powf(rating.deviation, 2.0) + f64::powf(C, 2.0));
        //4. Clamp the value to prevent it reaching higher than the initial deviation.
        rating.deviation = f64::min(rating.deviation, INITIAL_DEVIATION);
    }
}
```
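As for doing it without a loop: `n` cycles of `sqrt(deviation² + C²)` just add `n * C²` under the square root, and since the value only grows, a single clamp at the end is equivalent to clamping every iteration. A sketch with placeholder constants (the real codebase defines its own `C`, `RATING_PERIOD_LENGTH`, and `Rating`):

```rust
//Placeholder values for illustration only.
const C: f64 = 34.6;
const INITIAL_DEVIATION: f64 = 350.0;
const RATING_PERIOD_LENGTH: i64 = 60 * 60 * 24; //one day, in seconds

struct Rating {
    rating: f64,
    deviation: f64,
}

fn decay_deviation_closed_form(rating: &mut Rating, time_elapsed: i64) {
    //How many whole rating periods have passed since the last update.
    let decay_count = (time_elapsed / RATING_PERIOD_LENGTH) as f64;
    //All the per-cycle additions of C^2 collapse into a single term.
    rating.deviation = f64::min(
        f64::sqrt(rating.deviation * rating.deviation + decay_count * C * C),
        INITIAL_DEVIATION,
    );
}
```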
As mentioned, the original algorithm handles updates in terms of rating periods, where all players are updated simultaneously. This has some upsides, but is unsuitable for realtime applications or anything like MMR where we'd want to be able to quickly find a player's skill and match them to good opponents.
This also means we're only dealing with one game per "rating period", so we can simplify the formulas a bit.
First of all, we need to determine some intermediate values that'll help us calculate new ratings and deviations.
`q` is really just a constant used to transform the rating scale:

```rust
//ln(10) / 400; f64::ln can't be called in a const, so the value is written out.
const Q: f64 = 0.005756462732485114;
```
`g` is a number that represents how a rating's certainty affects outcomes:

```rust
//Hairy looking, I know
fn calc_g(rating_deviation: f64) -> f64 {
    1.0 / f64::sqrt(
        1.0 + (3.0 * f64::powf(Q, 2.0) * f64::powf(rating_deviation, 2.0)) / f64::powf(PI, 2.0),
    )
}
```
This looks very hairy, but if we run a few deviations through it we can roughly see what's going on. As rating deviation increases, we're less and less certain of the rating, and outcomes against it get weighted accordingly.
deviation | g(deviation) |
---|---|
350 | 0.67 |
300 | 0.72 |
250 | 0.78 |
200 | 0.84 |
150 | 0.90 |
100 | 0.95 |
50 | 0.99 |
`e` represents the expected value of a match, ignoring one side's deviation for now. This is the logistic curve used in the Elo system, but using `g` to weight the outcome. Someone being rated 200 higher than you doesn't necessarily mean much if their deviation is through the roof; at best the algorithm can shrug and say "yeah, I guess maybe he'll win, but I'm not that sure".
```rust
fn calc_e(own_rating: Rating, other_rating: Rating) -> f64 {
    let rating_difference = own_rating.rating - other_rating.rating;
    let g = calc_g(other_rating.deviation);
    1.0 / (1.0 + f64::powf(10.0, g * rating_difference / -400.0))
}
```
`d^2` is used to help determine how quickly your rating should move. For example, we get a lot more information from close matches (where the expected outcome is near 50%) than we do from blowouts. If you're expected to beat someone between 80% and 95% of the time, winning doesn't really help us guess whether it's closer to 90% or 95%.
```rust
fn calc_d2(own_rating: Rating, other_rating: Rating) -> f64 {
    let g = calc_g(other_rating.deviation);
    let e = calc_e(own_rating, other_rating);
    1.0 / (f64::powf(Q, 2.0) * f64::powf(g, 2.0) * e * (1.0 - e))
}
```
Now we can finally determine the new rating for a player. `outcome` is `1` for a win or `0` for a loss. If we had ties those would be `0.5`, but those don't apply to fighting games very often.
```rust
fn calc_new_rating(own_rating: Rating, other_rating: Rating, outcome: f64) -> f64 {
    let g = calc_g(other_rating.deviation);
    let e = calc_e(own_rating, other_rating);
    let d_2 = calc_d2(own_rating, other_rating);
    own_rating.rating
        + (Q / ((1.0 / f64::powf(own_rating.deviation, 2.0)) + (1.0 / d_2))) * g * (outcome - e)
}
```
Now that we've seen the player play a game we can update the deviation to represent our increased certainty in whatever rating they have.
```rust
fn calc_new_deviation(own_rating: Rating, other_rating: Rating) -> f64 {
    let d_2 = calc_d2(own_rating, other_rating);
    f64::sqrt(1.0 / ((1.0 / f64::powf(own_rating.deviation, 2.0)) + (1.0 / d_2)))
}
```
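Putting it all together, here's a rough sketch of how one game's update could be wired up, restating the functions above in compact, self-contained form (this is illustrative, not the actual rating-update code):

```rust
use std::f64::consts::PI;

const Q: f64 = 0.005756462732485114; //ln(10) / 400

#[derive(Clone, Copy)]
struct Rating {
    rating: f64,
    deviation: f64,
}

fn calc_g(deviation: f64) -> f64 {
    1.0 / f64::sqrt(1.0 + (3.0 * Q * Q * deviation * deviation) / (PI * PI))
}

fn calc_e(own: Rating, other: Rating) -> f64 {
    let g = calc_g(other.deviation);
    1.0 / (1.0 + f64::powf(10.0, g * (own.rating - other.rating) / -400.0))
}

fn calc_d2(own: Rating, other: Rating) -> f64 {
    let g = calc_g(other.deviation);
    let e = calc_e(own, other);
    1.0 / (Q * Q * g * g * e * (1.0 - e))
}

//Produce a player's post-game rating and deviation from a single game.
//outcome is 1.0 for a win, 0.0 for a loss. Call once per player, each time
//passing the opponent's pre-game rating.
fn update_player(own: Rating, other: Rating, outcome: f64) -> Rating {
    let g = calc_g(other.deviation);
    let e = calc_e(own, other);
    let d_2 = calc_d2(own, other);
    let precision = 1.0 / (own.deviation * own.deviation) + 1.0 / d_2;
    Rating {
        rating: own.rating + (Q / precision) * g * (outcome - e),
        deviation: f64::sqrt(1.0 / precision),
    }
}
```

Note that both players are updated from the same pre-game snapshot: if Alice's new rating were computed first and then fed into Bob's update, the result would depend on update order.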
To calculate the expected outcome of a match taking both deviations into account, you can use the following formula:
```rust
fn calc_expected_outcome(own_rating: Rating, other_rating: Rating) -> f64 {
    let rating_difference = own_rating.rating - other_rating.rating;
    //This is the bit that's different from calc_e!
    let g = calc_g(f64::sqrt(
        f64::powf(own_rating.deviation, 2.0) + f64::powf(other_rating.deviation, 2.0),
    ));
    1.0 / (1.0 + f64::powf(10.0, g * rating_difference / -400.0))
}
```