trueskill parameter tuning results


Re: trueskill parameter tuning results

Postby Axle » 12 May 2016, 13:54

The next part of trueskill that begs attention is the model of what happens to a player's skill *between* games. Trueskill has this 'tau' parameter, which basically says that if we reckon a player has skill 1800±50 based on the results of game N, then maybe by the time he enters his (N+1)th game he has skill 1800±60, because we know that human skill doesn't stay the same; it keeps changing. Probably something happened in between game N and game N+1 and his skill changed a bit in some unpredictable way.
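To make the tau step concrete, here is a minimal sketch of how canonical trueskill widens the uncertainty between games: tau is added in quadrature to the player's sigma. The function names are mine, for illustration only. Note that the 1800±50 to 1800±60 figures above are just an illustration; taken literally they would need a tau of about 33, which is larger than the fitted values discussed in this thread.

```python
import math

def inflate_sigma(sigma: float, tau: float) -> float:
    """Skill stdev at the start of the next game: tau is added in quadrature."""
    return math.sqrt(sigma ** 2 + tau ** 2)

def tau_for(sigma_before: float, sigma_after: float) -> float:
    """The tau that would take sigma_before to sigma_after in a single step."""
    return math.sqrt(sigma_after ** 2 - sigma_before ** 2)

# With tau=10 a +/-50 rating only widens to about +/-51 between games.
sigma_next = inflate_sigma(50.0, 10.0)

# Going from +/-50 to +/-60 in one step would need a much larger tau.
implied_tau = tau_for(50.0, 60.0)
```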

Well if we imagine reasons why it might have changed, we might be able to go some way towards reducing the unpredictability of it. Two obvious factors that come to mind are rust and experience.

Rust: the longer a player abstains from playing, the more his technical abilities diminish and the more he forgets the nuances of his coveted and polished build orders.

Experience: Having lost one too many games by forgetting to build energy, maybe now, after this game, he can finally remember to build energy. Or maybe he just watched a few replays and learned some tricks. That all ought to be worth a few points of skill at least.

So in order to model rust and experience, I added some stuff to reduce the mean and increase the variance of a player's skill depending on the time since his last game, and also increased the mean a little bit after each game played, on the assumption that players will generally learn something by playing.

As before there was a great reduction in NLML, and also an interesting change in the optimal 'tau' parameter. This reduced from 18 down to 10. Now the situation isn't quite as simple as previously with the 'beta' parameter because while the 'tau' parameter is lower, we are adding variance elsewhere through the rust model. But what we can say is that we're not indiscriminately blanketing all games with the same amount of 'tau'. Only games where the player has had significant amounts of down-time do we add any significant amount of variance beyond tau. And that tau is now much lower so we can say that we've reduced the amount of residual uncertainty in our model - another win!

Again with the distribution of outcome probabilities we see a general, if slight, shift to the right: a reduction in the number of games in the 50-60% region and an increase in the number of games in the >70% region. Player rating progressions show significant differences again. I've included Photon's progression this time because he has an interesting step change after a hiatus just after the 200th game that I can point to. Without the rust model, his loss of points is kind of gradual, whereas with the rust model the loss of points is very rapid. But after that, the subsequent recovery of points is quite similar. And generally the ratings are much less erratic.

btw, it looks like the average player might initially lose points at a rate of 8pts/month of inactivity with uncertainty 30pts(stdev)/month. And he learns maybe 0.6 points just by playing a game.
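For anyone who wants to play with these numbers, here is a rough sketch of a pre-game adjustment using them. The functional form is my assumption, not necessarily what the fitted model does: the mean drops linearly with idle time, the rust variance grows linearly with idle time (so the added stdev goes as the square root of months), and the experience bonus from the previous game is applied in the same step. `Rating` and `pre_game_adjust` are hypothetical names.

```python
import math
from dataclasses import dataclass

RUST_MEAN_PER_MONTH = 8.0    # points lost per month of inactivity (from the post)
RUST_STDEV_PER_MONTH = 30.0  # added stdev for one month of inactivity (from the post)
EXPERIENCE_PER_GAME = 0.6    # points gained just by playing a game (from the post)
TAU = 10.0                   # residual per-game drift (fitted value from the post)

@dataclass
class Rating:
    mu: float
    sigma: float

def pre_game_adjust(r: Rating, months_idle: float) -> Rating:
    """Apply rust, experience and tau before the next game (assumed linear forms)."""
    mu = r.mu - RUST_MEAN_PER_MONTH * months_idle + EXPERIENCE_PER_GAME
    # Variances add: the old uncertainty, tau, and rust growing linearly with time.
    var = r.sigma ** 2 + TAU ** 2 + RUST_STDEV_PER_MONTH ** 2 * months_idle
    return Rating(mu, math.sqrt(var))

active = pre_game_adjust(Rating(1800.0, 50.0), months_idle=0.0)
rusty = pre_game_adjust(Rating(1800.0, 50.0), months_idle=3.0)
```

With these assumptions, a player who plays again straight away only picks up tau's worth of extra uncertainty, while three months away costs 24 points of mean and widens sigma considerably.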

Next to come: What happens if we control for factional imbalance?
Attachments
Axeskill-RustAdjust-progression-Photon.png
Photon rating progression (rust adjustment [blue] versus tau-only adjustment [green])
Axeskill-RustAdjust-progression-TA4Life.png
TA4Life rating progression (rust adjustment [blue] versus tau-only adjustment [green])
Axeskill-RustAdjust-distribution.png
distribution of outcome probabilities (rust adjustment [blue] versus tau-only adjustment [green])
Last edited by Axle on 12 May 2016, 14:15, edited 1 time in total.
Axle
Avatar-of-War
 
Posts: 79
Joined: 02 Apr 2013, 10:14
Has liked: 0 time
Been liked: 3 times
FAF User Name: Axle

Re: trueskill parameter tuning results

Postby Axle » 12 May 2016, 14:03

And the next thing you might wonder about is factional balance. The balance team sometimes makes changes to the game that turn the balance on its head. eg a long long time ago cybran used to suck real bad, they made some changes, and now UEF suck real bad instead. So I figured I want to dynamically track factional imbalance rather than model it with static parameters.

To do this I introduced "virtual-factional-allies" to the rating. Something similar was actually suggested by someone on the forum a few years ago but I can't find a good set of keywords to find the post (if the originator is reading this, shout out to you!). The basic idea is that for each game, each player gets a virtual ally depending on the faction matchup. I'm UEF facing cybran? Then I get the UEF-v-Cybran ally, and my opponent gets the Cybran-v-UEF ally. At the end of the game the spoils (or losses) in terms of rating update are distributed between me and my virtual ally as per canonical trueskill (ie bayesian inference).

Now these virtual-allies get different beta and tau parameters (and are exempt from rust / experience if used in conjunction with my rust model) in recognition that their underlying skill changes very infrequently - only when the balance team makes a change. Also the mirror matchup virtual-allies (eg UEF-v-UEF) never get any rating updates, locked at mean=1500, sigma=0, in recognition that mirror matchups should always cancel as a given.
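To show how the spoils get split, here is a sketch of the standard closed-form trueskill update for two teams with no draws, treating each player plus virtual ally as a 2-member team. It is not the code behind the results in this thread; the ally sigma (15) and ally beta (50) are made-up values for illustration, and `update_teams` is a hypothetical name. Each member's share of the update is proportional to its own variance, so a tightly pinned ally moves very little and a sigma=0 mirror ally does not move at all.

```python
import math
from statistics import NormalDist

N = NormalDist()

def update_teams(winners, losers, betas_w, betas_l):
    """Two-team trueskill update, no draws.

    winners / losers: lists of (mu, sigma); betas_*: one beta per member.
    Returns the updated (winners, losers) lists.
    """
    # Total variance of the performance difference between the two teams.
    c2 = sum(s * s for _, s in winners + losers) + sum(b * b for b in betas_w + betas_l)
    c = math.sqrt(c2)
    t = (sum(m for m, _ in winners) - sum(m for m, _ in losers)) / c
    v = N.pdf(t) / N.cdf(t)   # mean correction factor
    w = v * (v + t)           # variance correction factor

    def upd(team, sign):
        out = []
        for mu, sigma in team:
            var = sigma * sigma
            out.append((mu + sign * var / c * v,
                        math.sqrt(var * (1.0 - var / c2 * w))))
        return out

    return upd(winners, +1), upd(losers, -1)

# A UEF player (with the uef-v-cybran ally) beats a cybran player of equal rating.
new_w, new_l = update_teams(
    winners=[(1800.0, 60.0), (1475.0, 15.0)],
    losers=[(1800.0, 60.0), (1525.0, 15.0)],
    betas_w=[240.0, 50.0],
    betas_l=[240.0, 50.0],
)

# Mirror matchup: both allies are locked at (1500, 0) and receive no update.
mirror_w, mirror_l = update_teams(
    [(1800.0, 60.0), (1500.0, 0.0)], [(1800.0, 60.0), (1500.0, 0.0)],
    [240.0, 50.0], [240.0, 50.0],
)
```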

So that all sounds pretty exciting, but what do the results show? The rating progression for the virtual-factional-allies does indeed show that cybran is OP and UEF sucks balls. I was hoping to see step changes in the progression that I could point to and say "aha! balance change" but it's difficult to point to any definitive step change. Overall the NLML did drop, but not as much as with the skill-as-a-function-of-mapsize and the rust models. Also disappointingly, no change in optimal beta or tau :( Also the distribution of outcome probabilities shows no discernible difference (apart from the aggregate quantitative NLML), and TA4Life's rating progression shows no significant difference either.

It is interesting to note that certain faction matchups do appear to have a significant penalty. eg using uef against cybran may disadvantage you by up to 50 pts of skill! (uef-v-cybran's ally has skill of about 1475, and therefore cybran-v-uef's ally has skill about 1525, for a total of 50 pts difference). If this is in fact a reliable causative measure of imbalance, it is conceivably worthwhile incorporating into the auto-match system for better player experience. For the player that always chooses UEF, there's no difference because his own personal rating will soon enough adjust accordingly and he'll be appropriately matched. However if he (or his opponent) likes to play random, or frequently changes faction, auto-matcher would now be able to instantly compensate.

I can't upload all the faction vs faction rating progressions, so I'll just summarise the results here:

aeon vs cybran: 16pts in cybran favour
aeon vs seraphim: 8pts in seraphim favour
cybran vs seraphim: 8pts in cybran favour
uef vs aeon: 24pts in aeon favour
uef vs cybran: 60pts in cybran favour
uef vs seraphim: 50pts in seraphim favour
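To get a feel for what these gaps mean in win chances, here is a back-of-envelope sketch. It assumes two otherwise equally rated players whose skills are known exactly (sigma ignored) and a per-player beta of 240, which is the value quoted later in this thread rather than anything fitted for this model. `favoured_win_probability` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

BETA = 240.0  # assumption: per-player performance stdev

# (disadvantaged faction, favoured faction) -> gap in rating points, from the list above
FACTION_GAP = {
    ("aeon", "cybran"): 16,
    ("aeon", "seraphim"): 8,
    ("seraphim", "cybran"): 8,
    ("uef", "aeon"): 24,
    ("uef", "cybran"): 60,
    ("uef", "seraphim"): 50,
}

def favoured_win_probability(gap: float, beta: float = BETA) -> float:
    """Win chance of the favoured side between equal players, ignoring sigma and draws."""
    return NormalDist().cdf(gap / (math.sqrt(2.0) * beta))

for (weak, strong), gap in FACTION_GAP.items():
    p = favoured_win_probability(gap)
    print(f"{weak} vs {strong}: {strong} wins about {100 * p:.1f}% of even games")
```

Even the largest gap shifts an even game by only a few percentage points under these assumptions, which fits with the small change seen in the outcome probability distribution.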

Next to come: What if we put it all together?
Attachments
Axeskill-FactionAdjust-progression-uef vs cybran.png
UEF vs cybran rating progression
Axeskill-FactionAdjust-progression-TA4Life.png
TA4Life rating progression (faction adjust)
Axeskill-FactionAdjust-distribution.png
distribution of outcome probabilities (faction adjust)
Last edited by Axle on 13 May 2016, 00:37, edited 1 time in total.

Re: trueskill parameter tuning results

Postby Axle » 12 May 2016, 14:05

So now I enable all the enhancements described above: draw probabilities that depend on map size; player skill as a function of map size; a rudimentary model of player rust; compensation for factional imbalance.

See below the resulting distribution of outcome probabilities, comparing all the enhancements against canonical trueskill (augmented with 3 draw probabilities).

This time the difference is more pronounced than for the individual enhancements by themselves - a reduction in the number of games with 40-60% probabilities and an increase in the number of games >70%. The overall NLML is greatly reduced from 89728 down to 89115. The optimum 'beta' parameter is down from 250 to 150, and 'tau' is down from 17 to 10. All good things indicating a more predictive rating system.
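As a reminder of what the NLML number measures, here is a toy sketch. It assumes NLML is the negative log marginal likelihood, computed as the sum over games of minus the log of the probability the model gave to the outcome that actually happened. The probabilities in the two lists are invented purely to show the direction of the effect; only the two totals come from the results above.

```python
import math

def nlml(predicted_probs):
    """Sum of -log(p) over games; p is the probability given to the observed outcome."""
    return -sum(math.log(p) for p in predicted_probs)

# Invented examples: a model hovering near 50% versus one that is more
# confident about the outcomes that actually occurred.
vague = nlml([0.55, 0.52, 0.58, 0.51])
sharp = nlml([0.75, 0.70, 0.80, 0.72])

# Totals reported above for canonical trueskill versus all enhancements.
improvement = 89728 - 89115
```

A lower total means the model put more probability on what really happened, which is why a shift of the outcome probability distribution to the right goes hand in hand with a lower NLML.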

From a player-centric point of view, see TA4Life's rating progression. It is more stable, but he can expect it to respond more rapidly to his absences. And hopefully he'd be able to switch between UEF and cybran whenever he wants and have a slightly better chance of getting a fair matchup.

That's it for now, it's been a funky adventure working out all the details of factor graphs and putting trueskill on a course of steroids.
Attachments
Axeskill-AllTogether-TA4Life.png
TA4Life rating progression (all enhancements together [blue] versus no enhancements [green])
Axeskill-AllTogether-distribution.png
distribution of outcome probabilities (all enhancements together [blue] versus no enhancements [green])

Re: trueskill parameter tuning results

Postby JaggedAppliance » 09 Oct 2016, 18:17

Hi Axle, I was wondering what figures you would suggest to be used in FAF as of now. This change was made in August: https://github.com/FAForever/server/com ... 2c16ce7eca
Tau was increased from 5 to 10. Beta was decreased from 250 to 240.
Do you think this is a good change? Should we be using numbers from your later posts, for example having a beta value of 150? Hope you will return and respond soon.
"and remember, u are a noob, u don’t have any rights to disagree" - Destructor

My Youtube channel with casts > https://www.youtube.com/channel/UCVukA3 ... xnqxq3YD1g
My Twitch > https://www.twitch.tv/jaggedappliance
JaggedAppliance
Councillor - Balance
 
Posts: 576
Joined: 08 Apr 2015, 14:45
Has liked: 626 times
Been liked: 248 times
FAF User Name: JaggedAppliance

Re: trueskill parameter tuning results

Postby Axle » 07 Jan 2017, 04:29

Hi Jagg! Sorry for the late reply, not getting email notifications and such.

Yes I think beta=240, tau=10 is a good change. Apart from being closer to "optimal", I think players will be happier with it - their rating will continue to be at least a little responsive to changes in their skill (rust, training, practice) after they've played many games. Lack of responsiveness is something I've seen them complain about often.

In terms of predictive power of the rating system, I think changing the draw probability from 0.1 to 0.05 would be a step in the right direction too.
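For reference, a small sketch of what that change does inside trueskill: the draw probability is converted to a draw margin (the band of performance difference that counts as a draw) using the standard formula from the trueskill model. The beta of 240 and the 1v1 setting are assumptions taken from the question above; `draw_margin` is just my name for the helper.

```python
import math
from statistics import NormalDist

def draw_margin(draw_probability: float, beta: float, total_players: int = 2) -> float:
    """Draw margin implied by a draw probability (standard trueskill formula)."""
    z = NormalDist().inv_cdf((draw_probability + 1.0) / 2.0)
    return z * math.sqrt(total_players) * beta

margin_10 = draw_margin(0.10, beta=240.0)  # current setting
margin_05 = draw_margin(0.05, beta=240.0)  # suggested setting
```

Halving the draw probability roughly halves the margin, so fewer evenly matched games are expected to end as draws.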

Axle
