Margin-of-victory logic appears inverted: 10–0 win moves ratings less than 1–0 win #181

@jpetterson

Description

When using rate with the scores argument and a nonzero margin, I would expect (based on the documentation under “Score Margins”) that a larger score gap produces a "more impressive" win and therefore a larger rating update.

Example expectation: scores=[10,0] should look like a more decisive win than scores=[1,0], so the winner's mu should go higher (and/or the loser's mu should go lower) for [10,0] than for [1,0].
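
To pin down what "larger update" means, here's a minimal sketch of the check I had in mind (the expected behavior per my reading of the docs, not what actually happens; given the output further down, I'd expect both assertions to fail on 6.1.3):

from openskill.models import PlackettLuce

model = PlackettLuce(margin=1.0)
p1, p2 = model.rating(), model.rating()

# Rate the same default-rated matchup under two different margins of victory.
(w1,), (l1,) = model.rate(teams=[[p1], [p2]], scores=[1, 0])
(w10,), (l10,) = model.rate(teams=[[p1], [p2]], scores=[10, 0])

# What I expected from the "Score Margins" section of the docs:
assert w10.mu >= w1.mu   # bigger gap => at least as big a boost for the winner
assert l10.mu <= l1.mu   # bigger gap => at least as big a drop for the loser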

Instead, what I observe is the opposite:

  • As the score gap increases, the magnitude of the update actually shrinks.
  • In some cases the loser is punished less (higher mu, higher sigma) when they lose by more.

This looks like

  1. a bug in how margin/scores are applied in the update step, or
  2. undocumented behavior that contradicts the docs, or
  3. a misunderstanding on my part.

Minimal Reproduction

import pandas as pd  # only used for pretty-printing the results

from openskill.models import PlackettLuce

res = []
model = PlackettLuce(margin=1.0)
pl1 = model.rating()
pl2 = model.rating()

# One unscored game to move both players off the default rating.
(pl1,), (pl2,) = model.rate(teams=[[pl1], [pl2]])

res.append({
    'score0': 'baseline',
    'score1': 'baseline',
    'mu0': pl1.mu,
    'mu1': pl2.mu,
    'sigma0': pl1.sigma,
    'sigma1': pl2.sigma,
})

# Rate the same matchup with increasingly lopsided scores.
for score in [1, 10, 100]:
    r = model.rate(teams=[[pl1], [pl2]], scores=[score, 0])
    res.append({
        'score0': score,
        'score1': 0,
        'mu0': r[0][0].mu,
        'mu1': r[1][0].mu,
        'sigma0': r[0][0].sigma,
        'sigma1': r[1][0].sigma,
    })

print(pd.DataFrame(res))

Output I get (OpenSkill 6.1.3 on a Mac):

     score0    score1        mu0        mu1    sigma0    sigma1
0  baseline  baseline  27.635389  22.364611  8.065901  8.065901
1         1         0  29.656297  20.343703  7.822887  7.822887
2        10         0  28.516660  21.483340  7.921360  7.921360
3       100         0  27.802472  22.197528  8.034383  8.034383

Observations:

  • With a 1–0 win, Player 1's mu jumps the most (29.6563) and Player 2's mu drops the most (20.3437).
  • With a 10–0 or 100–0 blowout, Player 1's mu increases less, and Player 2's mu is actually closer to the baseline again, as if they were punished less for getting stomped.
  • sigma also drifts back up for both players in the extreme blowout, suggesting the update is treating the result as less informative, not more.

This feels backwards: bigger score gap → smaller update.
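
To quantify it, reusing model, pl1, and pl2 from the repro above, the winner's mu gain over the pre-match baseline shrinks monotonically as the gap grows:

# Winner's mu gain relative to the pre-match rating, per score gap.
for score in [1, 10, 100]:
    r = model.rate(teams=[[pl1], [pl2]], scores=[score, 0])
    print(score, round(r[0][0].mu - pl1.mu, 4))
# From the table above: 1 -> ~2.0209, 10 -> ~0.8813, 100 -> ~0.1671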

I see the same qualitative effect with other model types, but focusing on PlackettLuce here for simplicity.
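
For reference, this is roughly how I spot-checked the other models (a sketch; I'm assuming here that the other model classes accept the same margin keyword, which I haven't verified against the docs):

from openskill.models import BradleyTerryFull, ThurstoneMostellerFull

for cls in (BradleyTerryFull, ThurstoneMostellerFull):
    m = cls(margin=1.0)  # assumption: margin is accepted by these models too
    a, b = m.rating(), m.rating()
    for score in (1, 10, 100):
        r = m.rate(teams=[[a], [b]], scores=[score, 0])
        print(cls.__name__, score, round(r[0][0].mu, 4), round(r[1][0].mu, 4))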

Version

v6
