Description
When using rate with the scores argument and a nonzero margin, I would expect (based on the documentation under “Score Margins”) that a larger score gap produces a "more impressive" win and therefore a larger rating update.
Example expectation: scores=[10,0] should look like a more decisive win than scores=[1,0], so the winner's mu should go higher (and/or the loser's mu should go lower) for [10,0] than for [1,0].
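To make that concrete, here is the ordering I expected, written as a small check (a sketch only; it uses the `scores` keyword of `rate` exactly as in the repro below, and only the direction of the comparison matters, not the specific numbers — per the output below, the opposite ordering is what I actually see):

from openskill.models import PlackettLuce

model = PlackettLuce(margin=1.0)
p1, p2 = model.rating(), model.rating()

# Winner's post-game mu after a narrow 1-0 win vs. a 10-0 blowout.
(narrow,), _ = model.rate(teams=[[p1], [p2]], scores=[1, 0])
(blowout,), _ = model.rate(teams=[[p1], [p2]], scores=[10, 0])

# Expectation: the blowout win is at least as impressive as the narrow one.
assert blowout.mu >= narrow.mu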
Instead, what I observe is the opposite:
- As the score gap increases, the magnitude of the update actually shrinks.
- In some cases the loser is punished less (higher `mu`, higher `sigma`) when they lose by more.
This looks like either:
- a bug in how `margin`/`scores` are applied in the update step, or
- undocumented behavior that contradicts the docs, or
- a misunderstanding on my part.
Minimal Reproduction
import pandas as pd  # only used for pretty-printing the results

from openskill.models import PlackettLuce

res = []
model = PlackettLuce(margin=1.0)
pl1 = model.rating()
pl2 = model.rating()

# Baseline: a plain win with no scores supplied.
(pl1,), (pl2,) = model.rate(teams=[[pl1], [pl2]])
res.append({
    'score0': 'baseline',
    'score1': 'baseline',
    'mu0': pl1.mu,
    'mu1': pl2.mu,
    'sigma0': pl1.sigma,
    'sigma1': pl2.sigma,
})

# Same matchup rated again with increasingly lopsided scores.
for score in [1, 10, 100]:
    r = model.rate(teams=[[pl1], [pl2]], scores=[score, 0])
    res.append({
        'score0': score,
        'score1': 0,
        'mu0': r[0][0].mu,
        'mu1': r[1][0].mu,
        'sigma0': r[0][0].sigma,
        'sigma1': r[1][0].sigma,
    })

print(pd.DataFrame(res))
Output I get (OpenSkill 6.1.3 on a Mac):
score0 score1 mu0 mu1 sigma0 sigma1
0 baseline baseline 27.635389 22.364611 8.065901 8.065901
1 1 0 29.656297 20.343703 7.822887 7.822887
2 10 0 28.516660 21.483340 7.921360 7.921360
3 100 0 27.802472 22.197528 8.034383 8.034383
Observations:
- With a 1–0 win, Player 1's `mu` jumps the most (29.6563) and Player 2's `mu` drops the most (20.3437).
- With a 10–0 or 100–0 blowout, Player 1's `mu` increases less, and Player 2's `mu` is actually closer to the baseline again, as if they were punished less for getting stomped.
- `sigma` also drifts back up for both players in the extreme blowout, suggesting the update is treating the result as less informative, not more.
This feels backwards: bigger score gap → smaller update.
I see the same qualitative effect with other model types, but focusing on PlackettLuce here for simplicity.
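For reference, a minimal sketch of the kind of cross-model comparison I mean (assuming the other model classes accept the same `margin` keyword and `scores` argument; I've only verified the exact numbers for PlackettLuce):

from openskill.models import BradleyTerryFull, PlackettLuce

for Model in (PlackettLuce, BradleyTerryFull):
    model = Model(margin=1.0)
    p1, p2 = model.rating(), model.rating()
    winner_mus = []
    for gap in (1, 10, 100):
        (w,), _ = model.rate(teams=[[p1], [p2]], scores=[gap, 0])
        winner_mus.append(round(w.mu, 4))
    # I expected this list to be ascending (bigger gap, bigger boost);
    # qualitatively it comes out descending instead.
    print(Model.__name__, winner_mus)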
Version
v6