This page works through the calculations for the rating changes from a single match. We will be following the logic described here, so we advise reading that page first. The focus is on the actual calculations used in the paper [1] that introduces this rating model.
Note
It is not necessary to understand every calculation on this page to get a good sense of what the rating model is doing. This documentation exists primarily for those who are curious about the mathematics, and also so that the rating assignment process is more transparent than only specifying “the ratings are fed into a model and numbers come out of it.”
If you are more interested in the general philosophy of what ratings represent, we highly recommend reading this page instead.
The sample match we will be using is TYC: (I love happy dreams) vs (ben) (osu! link). It was chosen for its relatively short length and smaller format—2v2, team size 3—while still effectively illustrating the main ideas.
We will assume the six players who played in the match had the following ratings and volatilities immediately before this match, and we will calculate their ratings and volatilities after the match ends. Please note that these values are not the actual ratings of these players; these are sample numbers for illustration purposes.
| Player | Rating ($\mu$) | Volatility ($\sigma$) |
|---|---|---|
| thighhigh | | |
| PotjeNutella | | |
| Miori Celesta | | |
| CMeFly | | |
| Piemanray314 | | |
| glixh_hunt3r | | |
Note that all games in this match were verified, so all of them will be used for rating calculation. If there were a game with incorrect lobby sizes (because of a disconnect) or an incorrect beatmap ID (because of a warmup), those would be excluded for the corresponding rejection reasons.
Method A Calculation
In Method A of calculating rating changes, we first look at the four players of each game and rank their scores from highest to lowest. The teams that the players play for are irrelevant. The mods that they use are also irrelevant in this match, since no maps were played with the EZ mod (whose scores would receive a 1.75x multiplier). Thus the rankings are as shown:
| Game | 1st | 2nd | 3rd | 4th |
|---|---|---|---|---|
| 1 | thighhigh | glixh_hunt3r | Miori Celesta | PotjeNutella |
| 2 | CMeFly | PotjeNutella | thighhigh | glixh_hunt3r |
| 3 | PotjeNutella | thighhigh | Miori Celesta | Piemanray314 |
| 4 | PotjeNutella | Piemanray314 | glixh_hunt3r | CMeFly |
| 5 | thighhigh | Miori Celesta | PotjeNutella | Piemanray314 |
| 6 | PotjeNutella | CMeFly | thighhigh | glixh_hunt3r |
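The ranking step can be sketched in a few lines of Python. This is only an illustration: the function name `rank_game`, the tuple layout, and the scores are hypothetical placeholders, not data from this match. The only value taken from this page is the 1.75x multiplier for EZ scores.

```python
EZ_MULTIPLIER = 1.75  # multiplier this page mentions for scores set with EZ


def rank_game(scores):
    """Order players from highest to lowest adjusted score.

    `scores` is a list of (player, raw_score, used_ez) tuples.
    Teams are deliberately not part of the input.
    """
    adjusted = [
        (raw * EZ_MULTIPLIER if used_ez else raw, player)
        for player, raw, used_ez in scores
    ]
    adjusted.sort(key=lambda pair: pair[0], reverse=True)
    return [player for _, player in adjusted]


# Placeholder scores (NOT from the sample match); player B used EZ.
example = [
    ("A", 500_000, False),
    ("B", 300_000, True),
    ("C", 450_000, False),
    ("D", 200_000, False),
]
ranking = rank_game(example)
```

In this placeholder game, B's adjusted score (300,000 x 1.75 = 525,000) moves B ahead of A.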
These rankings are the only information taken into account for rating calculations (individual scores are not considered). For each game, the players who play in it will be given a “game rating change.” All games are calculated in the same way, so we will demonstrate this process for the first game.
Here, we follow the notation and logic of Algorithm 4 of the paper (found on page 21 of the linked PDF [1]). The description also references Algorithm 1, which can be found on page 15.
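For orientation, the per-game quantities of Algorithm 4 can be summarized roughly as follows. This is our transcription of the paper's update rules, not a copy of the expressions displayed on this page, so treat it as a sketch. We assume $r(i)$ is the rank of player $i$ (1 is best), $\beta$ is the paper's performance-uncertainty parameter, $\gamma_q$ is the paper's damping factor, and the probabilities are evaluated at the current ratings $\mu_i$.

```latex
c = \Big( \sum_{i=1}^{k} (\sigma_i^2 + \beta^2) \Big)^{1/2}, \qquad
A_q = |\{ s : r(s) = r(q) \}|, \qquad
C_q = \{ s : r(s) \ge r(q) \}

\hat{p}_{i,q} = \frac{e^{\mu_i / c}}{\sum_{s \in C_q} e^{\mu_s / c}}

\Omega_i = \frac{\sigma_i^2}{c} \Big( \frac{1 - \hat{p}_{i,i}}{A_i}
    - \sum_{q \ne i,\ r(q) \le r(i)} \frac{\hat{p}_{i,q}}{A_q} \Big)

\Delta_i = \frac{\sigma_i^2}{c^2} \sum_{q :\ r(q) \le r(i)}
    \gamma_q \, \frac{\hat{p}_{i,q} (1 - \hat{p}_{i,q})}{A_q}
```

In this notation, $\Omega_i$ plays the role of the suggested rating change and $\Delta_i$ the variance decrease factor discussed in the walkthrough.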
First, we compute an overall uncertainty constant
Here
Next, we calculate two values
First, consider the player who placed highest, thighhigh in this case. The model currently thinks the probability that thighhigh will rank highest is
where we are plugging in the pre-match ratings from our table above. This is roughly
and the factor by which their rating variance (squared volatility) should be decreased is
We do not include the “variance damping factor”
Next, consider the second-highest-ranking player, glixh_hunt3r. The model now cares about both the probability of glixh_hunt3r ranking highest (notice this is slightly lower than
as well as the probability of glixh_hunt3r ranking highest except for thighhigh,
glixh_hunt3r’s suggested rating change from this game is then
and their variance decrease factor is
This suggested rating increase is smaller than thighhigh’s suggested rating increase, essentially because placing second means the formula for
Note
The quantity
is not the overall probability of glixh_hunt3r placing 2nd among the four players, but rather the probability that, given that thighhigh placed first, glixh_hunt3r ranks above the remaining players. We use this slightly less precise language for readability.
Next, to calculate rating changes for the third-highest-ranking player Miori Celesta, the model cares about the probability of Miori Celesta ranking highest, ranking highest except for thighhigh, and also ranking highest except for thighhigh and glixh_hunt3r. The resulting numbers are
From this, we get that Miori Celesta’s suggested rating change is
and their variance decrease factor is
Finally, to calculate rating changes for the fourth-highest-ranking player PotjeNutella, we must compute
Using formulas similar to the ones above (just with an extra term each), we then find that
Remember that all of this was done just for the first game of the match, but the exact same procedure works for all of the other games in the match. Here are the resulting values of
| Game | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
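The per-game Method A procedure can be sketched in Python as follows. This is a minimal sketch based on our reading of Algorithm 4 for a strict ranking with no ties, not the project's actual implementation. The function name, the `beta` argument, and the example numbers are assumptions for illustration, and the numbers are placeholders rather than the ratings from the table at the top of this page.

```python
import math


def plackett_luce_update(mu, sigma, beta):
    """Per-game update for players listed from 1st to last place (no ties).

    Returns (omega, delta): suggested rating changes and variance
    decrease factors, without any variance damping factor.
    """
    k = len(mu)
    # Overall uncertainty constant c.
    c = math.sqrt(sum(s * s + beta * beta for s in sigma))
    # Subtracting max(mu) only rescales the weights; the ratios are unchanged.
    top = max(mu)
    weights = [math.exp((m - top) / c) for m in mu]
    # tail[q] = total weight of the players placed at position q or lower.
    tail = [0.0] * (k + 1)
    for q in range(k - 1, -1, -1):
        tail[q] = tail[q + 1] + weights[q]

    omega, delta = [], []
    for i in range(k):
        o = d = 0.0
        for q in range(i + 1):
            # Probability that player i ranks highest among positions q..k-1.
            p = weights[i] / tail[q]
            o += (1.0 - p) if q == i else -p
            d += p * (1.0 - p)
        variance = sigma[i] ** 2
        omega.append(variance / c * o)
        delta.append(variance / c**2 * d)
    return omega, delta


# Placeholder inputs (hypothetical): four equally rated players.
omega, delta = plackett_luce_update(
    mu=[1000.0] * 4, sigma=[100.0] * 4, beta=50.0
)
```

With equal ratings and volatilities, the suggested changes decrease from first to last place and sum to zero. With unequal volatilities they generally do not sum to zero.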
Method B Calculation
In Method B of calculating rating changes, we again begin by ranking scores from highest to lowest, but we now treat any players who did not play a game as tying for last place. This yields the following table:
| Game | 1st | 2nd | 3rd | 4th | Tied 5th |
|---|---|---|---|---|---|
| 1 | thighhigh | glixh_hunt3r | Miori Celesta | PotjeNutella | CMeFly, Piemanray314 |
| 2 | CMeFly | PotjeNutella | thighhigh | glixh_hunt3r | Miori Celesta, Piemanray314 |
| 3 | PotjeNutella | thighhigh | Miori Celesta | Piemanray314 | CMeFly, glixh_hunt3r |
| 4 | PotjeNutella | Piemanray314 | glixh_hunt3r | CMeFly | thighhigh, Miori Celesta |
| 5 | thighhigh | Miori Celesta | PotjeNutella | Piemanray314 | CMeFly, glixh_hunt3r |
| 6 | PotjeNutella | CMeFly | thighhigh | glixh_hunt3r | Miori Celesta, Piemanray314 |
We will again demonstrate how game rating changes look by calculating the rating changes for the first game. This time, we have the overall uncertainty factor
With this,
The highest-ranking player, thighhigh, now has
so that (very similarly to above)
and
The calculations for players 2 through 4 are very similar. For the second-highest-ranking player glixh_hunt3r, we find
leading to
For the third-highest-ranking player Miori Celesta,
leading to
For the fourth-highest-ranking player PotjeNutella,
leading to
Finally, the model handles last-place ties by essentially equalizing the rating gains they would have in the different positions. For CMeFly, we have
and the suggested rating change is then
(more generally if there is an
Similarly, for Piemanray314 we have
and with the same formulas as CMeFly we find
Here is the table of
| Game | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
Compared with the Method A table, rating changes in Method B are typically more positive for players who played in a game and very negative for players who did not play.
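The Method B variant, with non-participants tied for last place, can be sketched in Python as follows. This follows the tie handling of Algorithm 4 as we read it, where each term is divided by the number of players sharing a rank. Whether this matches the project's implementation exactly is an assumption. The function name, the `ranks` argument, `beta`, and the example numbers are hypothetical placeholders.

```python
import math


def plackett_luce_update_with_ties(mu, sigma, ranks, beta):
    """Per-game update where `ranks[i]` is player i's rank (1 is best).

    Players sharing a rank are treated as tied. Returns (omega, delta)
    without any variance damping factor.
    """
    k = len(mu)
    c = math.sqrt(sum(s * s + beta * beta for s in sigma))
    top = max(mu)  # rescaling for numerical stability; ratios are unchanged
    weights = [math.exp((m - top) / c) for m in mu]

    omega, delta = [], []
    for i in range(k):
        o = d = 0.0
        for q in range(k):
            if ranks[q] > ranks[i]:
                continue  # players ranked below i contribute nothing
            # Number of players tied with q.
            a_q = sum(1 for s in range(k) if ranks[s] == ranks[q])
            # Total weight of the players ranked at q's rank or lower.
            denom = sum(weights[s] for s in range(k) if ranks[s] >= ranks[q])
            p = weights[i] / denom
            o += ((1.0 - p) if q == i else -p) / a_q
            d += p * (1.0 - p) / a_q
        variance = sigma[i] ** 2
        omega.append(variance / c * o)
        delta.append(variance / c**2 * d)
    return omega, delta


# Placeholder inputs (hypothetical): six equally rated players, where the
# last two did not play and are tied for 5th.
omega_b, delta_b = plackett_luce_update_with_ties(
    mu=[1000.0] * 6,
    sigma=[100.0] * 6,
    ranks=[1, 2, 3, 4, 5, 5],
    beta=50.0,
)
```

In this placeholder example the two tied players receive the same negative suggested change, while the fourth-place player's change is slightly positive.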
Overall Changes
Finally, we essentially take a weighted average of all of these numbers to determine the final rating changes for the whole match. For simplicity, we will demonstrate this for just one of the players, Piemanray314.
We first average the values of
Similarly, we first obtain an “effective averaged
Finally, Piemanray314’s final rating and volatility are calculated by using these effective values to modify the initial rating and volatility:
The factor of
| Player | Rating ($\mu$) | Volatility ($\sigma$) |
|---|---|---|
| thighhigh | | |
| PotjeNutella | | |
| Miori Celesta | | |
| CMeFly | | |
| Piemanray314 | | |
| glixh_hunt3r | | |
All players’ volatilities have decreased, indicating that the model is slightly more confident about the updated ratings. The player thighhigh’s rating has significantly increased due to their relatively high participation and placement among players throughout the match. Note that overall rating changes are not precisely zero-sum due to the differences in players’ volatilities.
Footnotes
1. Weng, Ruby & Lin, Chih-Jen. (2011). A Bayesian Approximation Method for Online Ranking. Journal of Machine Learning Research, 12, 267-300. https://jmlr.csail.mit.edu/papers/volume12/weng11a/weng11a.pdf