Sample Match

This page works through the rating changes produced by a single match. We will be following the logic described here, so it is advised to read that page first. This page focuses on the actual calculations used in the paper¹ that introduces this rating model.

Note

It is not necessary to understand every calculation on this page to get a good sense of what the rating model is doing. This documentation exists primarily for those who are curious about the mathematics, and also so that the rating assignment process is more transparent than only specifying “the ratings are fed into a model and numbers come out of it.”

If you are interested more in the general philosophy of what ratings represent, it is highly recommended to read this page instead.

The sample match we will be using is TYC: (I love happy dreams) vs (ben) (osu! link). It was chosen for its relatively short length and smaller format—2v2, team size 3—while still effectively illustrating the main ideas.

We will assume the six players who played in the match had the following ratings and volatilities immediately before this match, and we will calculate their ratings and volatilities after the match ends. Please note that these values are not the actual ratings of these players; these are sample numbers for illustration purposes.

Player	Rating ($\mu$)	Volatility ($\sigma$)
thighhigh
PotjeNutella
Miori Celesta
CMeFly
Piemanray314
glixh_hunt3r

Note that all games in this match were verified, so all of them will be used for rating calculation. Any game with an incorrect lobby size (because of a disconnect) or an incorrect beatmap ID (because of a warmup) would have been excluded for the corresponding rejection reason.

Method A Calculation

In Method A of calculating rating changes, we first look at the four players of each game and rank their scores from highest to lowest. The teams that the players play for are irrelevant, and so are the mods that they use in this match, since no maps were played with the EZ mod (which would receive a 1.75x multiplier). Thus the rankings are as shown, one row per game:

1st	2nd	3rd	4th
thighhigh	glixh_hunt3r	Miori Celesta	PotjeNutella
CMeFly	PotjeNutella	thighhigh	glixh_hunt3r
PotjeNutella	thighhigh	Miori Celesta	Piemanray314
PotjeNutella	Piemanray314	glixh_hunt3r	CMeFly
thighhigh	Miori Celesta	PotjeNutella	Piemanray314
PotjeNutella	CMeFly	thighhigh	glixh_hunt3r
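The ranking step can be sketched in Python as follows. This is a minimal illustration rather than the project's actual code: the function name `rank_game`, the input shape, and the scores are all made up, and the sketch assumes the EZ multiplier is applied to the raw score before sorting.

```python
EZ_MULTIPLIER = 1.75  # multiplier for EZ scores, as stated in the text


def rank_game(scores, ez_players=frozenset()):
    """Return player names ordered from highest to lowest adjusted score.

    scores:     dict mapping player name -> raw score (hypothetical shape).
    ez_players: names of players who used the EZ mod in this game.
    """
    adjusted = {
        name: score * (EZ_MULTIPLIER if name in ez_players else 1.0)
        for name, score in scores.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)


# Made-up scores for illustration; only the resulting order matters.
game_1 = {
    "Miori Celesta": 800_000,
    "thighhigh": 900_000,
    "PotjeNutella": 750_000,
    "glixh_hunt3r": 850_000,
}
print(rank_game(game_1))
```

Team membership never enters the function, which mirrors the point that only the order of the scores is used.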

These rankings are the only information taken into account for rating calculations (individual scores are not considered). For each game, the players who play in it will be given a “game rating change.” All games are calculated in the same way, so we will demonstrate this process for the first game.

Here, we follow the notation and logic of Algorithm 4 of the paper (found on page 21 in the link¹). The description also references Algorithm 1, which can be found on page 15.

First, we compute an overall uncertainty constant $c$, which is given by

$$c = \sqrt{4\beta^2 + \sigma_{\text{thighhigh}}^2 + \sigma_{\text{glixh\_hunt3r}}^2 + \sigma_{\text{Miori Celesta}}^2 + \sigma_{\text{PotjeNutella}}^2}.$$

Here $\beta$ is a constant specified by our constants file, and the other four terms under the square root come from the volatilities $\sigma$ of the four players prior to the match. This is used to compute the predicted probabilities of players placing in various orders. It roughly means that for this game, a difference of $c$ rating points between two players means the higher-rated player has $e \approx 2.72$ times the chance of placing above the lower-rated player.

Next, we calculate two values $\Omega$ and $\Delta$ for each player in the game. These specify an additive rating change and a multiplicative volatility change, respectively. Instead of repeating all of the formulas from the paper, we will work through an example in plain words.

First, consider the player who placed highest, thighhigh in this case. The model currently thinks the probability that thighhigh will rank highest is

$$p = \frac{e^{\mu_{\text{thighhigh}}/c}}{e^{\mu_{\text{thighhigh}}/c} + e^{\mu_{\text{glixh\_hunt3r}}/c} + e^{\mu_{\text{Miori Celesta}}/c} + e^{\mu_{\text{PotjeNutella}}/c}},$$

where $c$ is the uncertainty constant for this game and we are plugging in the pre-match ratings $\mu$ from our table above. This is roughly $1/4$ because all four players have similar pre-match ratings. thighhigh’s suggested rating change from this game is then

$$\Omega = \frac{\sigma_{\text{thighhigh}}^2}{c}\,(1 - p),$$

and the factor by which their rating variance (squared volatility) should be decreased is

$$\Delta = \frac{\sigma_{\text{thighhigh}}^2}{c^2}\,p\,(1 - p).$$

We do not include the “variance damping factor” found on page 26 of the paper¹, because volatility is instead increased by decay outside of matches. Notice that the less likely the model thinks it is for thighhigh to place first, and also the higher their volatility, the higher their suggested rating change.

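Next, consider the second-highest-ranking player, glixh_hunt3r. Write $w_X = e^{\mu_X/c}$ for each player $X$, where $\mu_X$ is their pre-match rating and $c$ is the uncertainty constant for this game. The model now cares about both the probability of glixh_hunt3r ranking highest (notice this is slightly lower than $1/4$ because glixh_hunt3r has the lowest pre-match rating),

$$p_1 = \frac{w_{\text{glixh\_hunt3r}}}{w_{\text{thighhigh}} + w_{\text{glixh\_hunt3r}} + w_{\text{Miori Celesta}} + w_{\text{PotjeNutella}}},$$

as well as the probability of glixh_hunt3r ranking highest except for thighhigh,

$$p_2 = \frac{w_{\text{glixh\_hunt3r}}}{w_{\text{glixh\_hunt3r}} + w_{\text{Miori Celesta}} + w_{\text{PotjeNutella}}}.$$

glixh_hunt3r’s suggested rating change from this game is then

$$\Omega = \frac{\sigma_{\text{glixh\_hunt3r}}^2}{c}\,(1 - p_1 - p_2),$$

and their variance decrease factor is

$$\Delta = \frac{\sigma_{\text{glixh\_hunt3r}}^2}{c^2}\,\bigl(p_1(1 - p_1) + p_2(1 - p_2)\bigr).$$

This suggested rating increase is smaller than thighhigh’s suggested rating increase, essentially because placing second means the formula for $\Omega$ “subtracts off two probabilities instead of one.” But it is not much smaller because glixh_hunt3r’s pre-match volatility is larger than thighhigh’s.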

Note

The second of these probabilities is not the overall probability of glixh_hunt3r placing 2nd among the four players, but instead the probability that, assuming thighhigh placed first, glixh_hunt3r ranks above the other two players. We use this slightly less precise language for readability.

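Next, to calculate rating changes for the third-highest-ranking player Miori Celesta, the model cares about the probability of Miori Celesta ranking highest, ranking highest except for thighhigh, and also ranking highest except for thighhigh and glixh_hunt3r. Writing $w_X = e^{\mu_X/c}$ for each player $X$, where $c$ is the uncertainty constant for this game, these are

$$p_1 = \frac{w_{\text{Miori Celesta}}}{w_{\text{thighhigh}} + w_{\text{glixh\_hunt3r}} + w_{\text{Miori Celesta}} + w_{\text{PotjeNutella}}}, \quad p_2 = \frac{w_{\text{Miori Celesta}}}{w_{\text{glixh\_hunt3r}} + w_{\text{Miori Celesta}} + w_{\text{PotjeNutella}}}, \quad p_3 = \frac{w_{\text{Miori Celesta}}}{w_{\text{Miori Celesta}} + w_{\text{PotjeNutella}}}.$$

From this, we get that Miori Celesta’s suggested rating change is

$$\Omega = \frac{\sigma_{\text{Miori Celesta}}^2}{c}\,(1 - p_1 - p_2 - p_3),$$

and their variance decrease factor is

$$\Delta = \frac{\sigma_{\text{Miori Celesta}}^2}{c^2}\,\sum_{k=1}^{3} p_k (1 - p_k).$$

Finally, to calculate rating changes for the fourth-highest-ranking player PotjeNutella, we must compute the same three kinds of probability with PotjeNutella’s weight in the numerator, plus a fourth probability $p_4$ of ranking highest once the three players above them are removed, which equals $1$.

Using formulas similar to the ones above (just with an extra term each), we then find that

$$\Omega = \frac{\sigma_{\text{PotjeNutella}}^2}{c}\,(1 - p_1 - p_2 - p_3 - p_4), \qquad \Delta = \frac{\sigma_{\text{PotjeNutella}}^2}{c^2}\,\sum_{k=1}^{4} p_k (1 - p_k).$$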

Remember that all of this was done just for the first game of the match, but the exact same procedure works for all of the other games in the match. Here are the resulting values of $c$, $\Omega$, and $\Delta$ for each one, rounded for readability (if a player did not play in a game, then the corresponding cells are left blank):

Game
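The per-game procedure for Method A can be sketched in Python as follows. This is a minimal sketch assuming the Plackett-Luce update of Algorithm 4 with no ties and no variance damping factor. The function name `game_update` and the ratings, volatilities, and `beta` value in the example are made up for illustration and are not the project's real constants.

```python
import math


def game_update(ranked, beta):
    """Per-game Omega/Delta values for a game with no ties.

    ranked: list of (mu, sigma) pairs in finishing order, best first.
    beta:   model constant (the value used below is hypothetical).
    Returns (c, [(omega, delta), ...]) in the same order as `ranked`.
    """
    c = math.sqrt(sum(sigma ** 2 + beta ** 2 for _, sigma in ranked))
    # Shift by the largest rating before exponentiating; the shift cancels
    # in every ratio below and only guards against overflow.
    top = max(mu for mu, _ in ranked)
    weights = [math.exp((mu - top) / c) for mu, _ in ranked]

    changes = []
    for i, (_, sigma) in enumerate(ranked):
        omega_terms = 1.0
        delta_terms = 0.0
        for q in range(i + 1):
            # Chance that player i ranks highest among the players at
            # index q or below (q = 0 means the whole lobby).
            p = weights[i] / sum(weights[q:])
            omega_terms -= p
            delta_terms += p * (1.0 - p)
        changes.append((sigma ** 2 / c * omega_terms,
                        sigma ** 2 / c ** 2 * delta_terms))
    return c, changes


# Made-up ratings, volatilities, and beta, purely for illustration.
lobby = [(1500.0, 250.0), (1450.0, 300.0), (1520.0, 240.0), (1480.0, 260.0)]
c, changes = game_update(lobby, beta=150.0)
for place, (omega, delta) in enumerate(changes, start=1):
    print(f"place {place}: omega={omega:+.1f}, delta={delta:.4f}")
```

When every player has the same volatility, the four rating changes from this sketch sum to zero; unequal volatilities break that symmetry.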

Method B Calculation

In Method B of calculating rating changes, we again begin by ranking scores from highest to lowest, but we now treat any players who did not play a game as tying for last place. This yields the following table:

1st	2nd	3rd	4th	Tied 5th
thighhigh	glixh_hunt3r	Miori Celesta	PotjeNutella	CMeFly, Piemanray314
CMeFly	PotjeNutella	thighhigh	glixh_hunt3r	Miori Celesta, Piemanray314
PotjeNutella	thighhigh	Miori Celesta	Piemanray314	CMeFly, glixh_hunt3r
PotjeNutella	Piemanray314	glixh_hunt3r	CMeFly	thighhigh, Miori Celesta
thighhigh	Miori Celesta	PotjeNutella	Piemanray314	CMeFly, glixh_hunt3r
PotjeNutella	CMeFly	thighhigh	glixh_hunt3r	Miori Celesta, Piemanray314
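Building the Method B ranking from a Method A ranking can be sketched in Python as below. The function name `method_b_ranking` and the "list of rank groups" representation are illustrative choices, not the project's actual data model.

```python
def method_b_ranking(ranked_players, all_players):
    """Return rank groups from best to worst.

    ranked_players: names of those who played the game, best score first.
    all_players:    every player who appeared in the match.
    Each player who played forms a group of one; everyone who sat the
    game out is placed together in a final tied group.
    """
    groups = [[name] for name in ranked_players]
    absent = [name for name in all_players if name not in ranked_players]
    if absent:
        groups.append(absent)
    return groups


roster = ["thighhigh", "PotjeNutella", "Miori Celesta",
          "CMeFly", "Piemanray314", "glixh_hunt3r"]
game_1 = ["thighhigh", "glixh_hunt3r", "Miori Celesta", "PotjeNutella"]
print(method_b_ranking(game_1, roster))
```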

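We will again demonstrate how game rating changes look by calculating the rating changes for the first game. This time, we have the overall uncertainty factor

$$c = \sqrt{6\beta^2 + \sum_{q} \sigma_q^2},$$

where $\beta$ is the same constant as in Method A and the sum runs over the pre-match volatilities of all six players in the match.

With this, $\Omega$ and $\Delta$ values are calculated for all six players based on the rankings in game 1: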

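The highest-ranking player, thighhigh, now has

$$p = \frac{e^{\mu_{\text{thighhigh}}/c}}{\sum_{q} e^{\mu_q/c}},$$

where the sum runs over all six players and $c$ is the uncertainty factor for this method, so that (very similarly to above)

$$\Omega = \frac{\sigma_{\text{thighhigh}}^2}{c}\,(1 - p)$$

and

$$\Delta = \frac{\sigma_{\text{thighhigh}}^2}{c^2}\,p\,(1 - p).$$

The calculations for players 2 through 4 are very similar. A player who placed $k$-th has $k$ probabilities $p_1, \dots, p_k$, where $p_j$ is the probability of that player ranking highest among the six players once the top $j - 1$ finishers are removed, leading to

$$\Omega = \frac{\sigma^2}{c}\,\Bigl(1 - \sum_{j=1}^{k} p_j\Bigr), \qquad \Delta = \frac{\sigma^2}{c^2}\,\sum_{j=1}^{k} p_j (1 - p_j),$$

with $\sigma$ the player’s own pre-match volatility. This applies in turn to the second-highest-ranking player glixh_hunt3r ($k = 2$), the third-highest-ranking player Miori Celesta ($k = 3$), and the fourth-highest-ranking player PotjeNutella ($k = 4$).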

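Finally, the model handles last-place ties by essentially equalizing the rating gains they would have in the different positions. For CMeFly, we have five probabilities $p_1, \dots, p_5$: the probability of CMeFly ranking highest among all six players, and then among the players who remain after removing thighhigh, glixh_hunt3r, Miori Celesta, and PotjeNutella one at a time, so that $p_5$ is the probability of CMeFly ranking above Piemanray314 alone. Each has the form

$$p_j = \frac{e^{\mu_{\text{CMeFly}}/c}}{\sum_{q \in S_j} e^{\mu_q/c}},$$

where $S_j$ is the set of players left after removing the top $j - 1$ finishers,

and the suggested rating change is then

$$\Omega = \frac{\sigma_{\text{CMeFly}}^2}{c}\,\Bigl(\frac{1}{2} - p_1 - p_2 - p_3 - p_4 - p_5\Bigr)$$

(more generally if there is an $n$-way tie, the $\frac{1}{2}$ becomes a $\frac{1}{n}$), while the suggested variance decrease factor has the same form of formula as before:

$$\Delta = \frac{\sigma_{\text{CMeFly}}^2}{c^2}\,\sum_{j=1}^{5} p_j (1 - p_j).$$

Similarly, for Piemanray314 we have the corresponding five probabilities computed from their own pre-match rating,

and with the same formulas as CMeFly we find their $\Omega$ and $\Delta$.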

Here is the table of $\Omega$ and $\Delta$ values under Method B. Note that we no longer have a column for $c$ because it is the same across all games in this method (all six players are considered in all games).

Game

Comparing to the previous table, we can see that rating changes are typically more positive in Method B for players who played in a game, and they are very negative for players who did not play.
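The Method B update, including the tie rule for players who sat out, can be sketched in Python as follows. As before, this is an illustrative sketch assuming the tie handling of Algorithm 4 with no variance damping factor. The function name `game_update_with_ties`, the rank-group input shape, and all numbers are assumptions made for the example.

```python
import math


def game_update_with_ties(groups, beta):
    """Omega/Delta values when some players share a placement.

    groups: rank groups from best to worst; each group is a list of
            (mu, sigma) pairs for players who tied at that placement.
    Returns a nested list of (omega, delta) with the same shape.
    """
    everyone = [player for group in groups for player in group]
    c = math.sqrt(sum(sigma ** 2 + beta ** 2 for _, sigma in everyone))
    top = max(mu for mu, _ in everyone)  # shift only to avoid overflow
    weight = [[math.exp((mu - top) / c) for mu, _ in group]
              for group in groups]

    # tail[k] = combined weight of everyone placed in group k or worse.
    tail = [0.0] * (len(groups) + 1)
    for k in range(len(groups) - 1, -1, -1):
        tail[k] = tail[k + 1] + sum(weight[k])

    result = []
    for k, group in enumerate(groups):
        row = []
        for j, (_, sigma) in enumerate(group):
            probs = [weight[k][j] / tail[q] for q in range(k + 1)]
            # An n-way tie replaces the leading 1 with 1/n.
            omega = sigma ** 2 / c * (1.0 / len(group) - sum(probs))
            delta = sigma ** 2 / c ** 2 * sum(p * (1.0 - p) for p in probs)
            row.append((omega, delta))
        result.append(row)
    return result


# Six made-up players: four who played, then two tied for last.
p = (1000.0, 200.0)
changes = game_update_with_ties([[p], [p], [p], [p], [p, p]], beta=100.0)
print(changes[-1])
```

With identical made-up players, the two tied entries come out equal and strongly negative, matching the observation that players who did not play lose the most under Method B.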

Overall Changes

Finally, we essentially do a weighted average of all of these numbers to determine the final rating changes for the whole match. For simplicity, we will demonstrate this just for one of the players, Piemanray314.

We first average the values of $\Omega$ across Methods A and B at a : ratio to get an “effective average $\Omega$”

Similarly, we obtain an “effective averaged $\Delta$” by calculating

Finally, Piemanray314’s final rating and volatility are calculated by using these effective values to modify the initial rating and volatility:

The factor of increases rating changes for longer games, so because this match ended quickly, the rating changes are slightly dampened. The table below shows the resulting adjustments to ratings and volatilities for all six players of this match, rounded to the nearest tenth.

Player	Rating ($\mu$)	Volatility ($\sigma$)
thighhigh
PotjeNutella
Miori Celesta
CMeFly
Piemanray314
glixh_hunt3r

All players’ volatilities have decreased, indicating that the model is slightly more confident about the updated ratings. The player thighhigh’s rating has significantly increased due to their relatively high participation and placement among players throughout the match. Note that overall rating changes are not precisely zero-sum due to the differences in players’ volatilities.
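The blending and final update can be sketched in Python as below, with several loud assumptions. The weights are hypothetical placeholders, not the model's real ratio. The update itself follows the general form of Algorithm 1 in the paper (add the rating change, shrink the variance by one minus the decrease factor). The game-length factor mentioned above is left out, and the floor `kappa` is an assumed safeguard value.

```python
import math


def blend(value_a, value_b, weight_a, weight_b):
    """Weighted average of a Method A value and a Method B value."""
    return (weight_a * value_a + weight_b * value_b) / (weight_a + weight_b)


def apply_update(mu, sigma, omega_eff, delta_eff, kappa=1e-4):
    """Apply effective Omega/Delta values to a rating and volatility.

    kappa is a small floor that keeps the variance positive (assumed
    value). The game-length factor is intentionally not modelled here.
    """
    new_mu = mu + omega_eff
    new_sigma = sigma * math.sqrt(max(1.0 - delta_eff, kappa))
    return new_mu, new_sigma


# Hypothetical per-method values and weights for one player.
omega_eff = blend(10.0, 20.0, weight_a=3.0, weight_b=1.0)
delta_eff = blend(0.15, 0.31, weight_a=3.0, weight_b=1.0)
print(apply_update(1000.0, 200.0, omega_eff, delta_eff))
```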

¹ Weng, Ruby & Lin, Chih-Jen. (2011). A Bayesian Approximation Method for Online Ranking. Journal of Machine Learning Research, 12, 267-300. https://jmlr.csail.mit.edu/papers/volume12/weng11a/weng11a.pdf
