trueskill parameter tuning results

PostPosted: 07 Feb 2016, 14:46
by Axle
In case anyone is interested in "optimum" trueskill parameters for the FAF 1v1 ladder, these might qualify:

- mu = 1500
- sigma = 500
- beta = 240
- tau = 18
- draw_probability = 0.045

You can read slightly more about it in the readme.pdf here: https://github.com/Axle1975/pytrueskill
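
As a rough illustration of what beta means in practice, here is a minimal sketch of the win probability implied by the standard TrueSkill performance model, ignoring draws. It uses only the Python standard library and is not taken from the pytrueskill repository; the function name and the example ratings are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Tuned values from the list above.
BETA = 240.0
MU0, SIGMA0 = 1500.0, 500.0


def win_probability(mu_a, sigma_a, mu_b, sigma_b, beta=BETA):
    """P(A beats B) under the Gaussian performance model, ignoring draws."""
    # The performance difference is Normal(mu_a - mu_b, 2*beta^2 + sigma_a^2 + sigma_b^2).
    spread = sqrt(2 * beta**2 + sigma_a**2 + sigma_b**2)
    return NormalDist().cdf((mu_a - mu_b) / spread)


# Two brand-new players are a coin flip.
p_even = win_probability(MU0, SIGMA0, MU0, SIGMA0)
# Hypothetical settled players 300 points apart (illustrative numbers only).
p_settled = win_probability(1800, 80, 1500, 80)
print(f"{p_even:.3f} {p_settled:.3f}")
```

A smaller beta makes the same rating gap translate into a more lopsided predicted result.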

Re: trueskill parameter tuning results

PostPosted: 07 Feb 2016, 23:27
by yorick
This certainly looks interesting. Once everything has calmed down a bit with the new server, this is worth a deeper look (maybe for team games as well later).
The draw probability in FA depends quite a lot on the map (i.e. it is much higher on 5x5 maps than on bigger maps). I vaguely recall some map stats on FAF that included draw probability as well, but I don't know whether those were used or a global draw probability.

Also, it might be interesting to compare these values to the ones that are currently used. I found some values in the server code, but I don't know whether these are the ones that are actually used (for everything, or for ladder / global rating only): https://github.com/FAForever/server/blob/a9e878ed09eed1cc19dd88518e2ce491d0e860f4/config.py#L26
- mu = 1500
- sigma = 500
- beta = 250
- tau = 5
- draw_probability = 0.10

Re: trueskill parameter tuning results

PostPosted: 08 Feb 2016, 01:03
by Axle
Hi Yorick! Thanks for that feedback.

I think you are very correct that draw_probability depends on the map (and whether your name is Lame or not), probably worth exploring. The draw_probability used here is just a global draw_probability.
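
For reference, here is a small sketch of how a draw probability is usually turned into a draw margin in TrueSkill for a two-player game, and back again. It is standard-library Python and only assumes the usual formula; the function names are made up. A per-map draw probability would simply mean calling it with a different value per map.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_margin(draw_probability, beta, players=2):
    """Performance gap below which a game counts as a draw."""
    return STD_NORMAL.inv_cdf((draw_probability + 1) / 2) * sqrt(players) * beta


def draw_probability_from_margin(margin, beta, players=2):
    """Inverse of draw_margin, useful as a sanity check."""
    return 2 * STD_NORMAL.cdf(margin / (sqrt(players) * beta)) - 1


tuned_margin = draw_margin(0.045, beta=240)    # tuned parameters
current_margin = draw_margin(0.10, beta=250)   # values found in the server config
print(f"tuned: {tuned_margin:.1f}  current: {current_margin:.1f}")
```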

One of the things I originally wanted to explore was the possibility of modelling skill as a function of map and faction matchup too. All the necessary data is available. However, I figured the above results could be immediately applicable to FAF since they require no fiddling with the existing trueskill algorithms.

It's also interesting that the existing tau is so much lower than my tau. I have two comments on that:
- I noticed that there's a rapidly increasing penalty as tau becomes much smaller
- Some players have mentioned that, after playing many games, they've subjectively found trueskill too sluggish to adjust their ratings. This could be a symptom of a too-low tau.
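
To make the "sluggish" point concrete, here is a rough sketch of the textbook 1v1 TrueSkill update without draws, written with the standard library only. It is not the FAF server code or pytrueskill; the helper names and the toy schedule (two equal players alternating wins) are assumptions for illustration. Before each game tau^2 is added to every player's variance, so tau sets a floor that sigma cannot drop below, and a higher floor means ratings keep moving.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def update(winner, loser, beta, tau):
    """One 1v1 update, no draws. Ratings are (mu, sigma) tuples."""
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    # Dynamics step: inflate each variance by tau^2 before the game.
    var_w = sigma_w**2 + tau**2
    var_l = sigma_l**2 + tau**2
    c = sqrt(2 * beta**2 + var_w + var_l)
    t = (mu_w - mu_l) / c
    v = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)  # mean correction factor
    w = v * (v + t)                            # variance correction factor
    new_winner = (mu_w + var_w / c * v, sqrt(var_w * (1 - var_w / c**2 * w)))
    new_loser = (mu_l - var_l / c * v, sqrt(var_l * (1 - var_l / c**2 * w)))
    return new_winner, new_loser


def sigma_after(games, tau, beta=240.0, mu0=1500.0, sigma0=500.0):
    """Sigma of player A after `games` games of alternating wins against B."""
    a = b = (mu0, sigma0)
    for game in range(games):
        if game % 2 == 0:
            a, b = update(a, b, beta, tau)  # A wins
        else:
            b, a = update(b, a, beta, tau)  # B wins
    return a[1]


low_tau_sigma = sigma_after(200, tau=5)
high_tau_sigma = sigma_after(200, tau=18)
print(f"tau=5: {low_tau_sigma:.1f}  tau=18: {high_tau_sigma:.1f}")
```

The size of each rating step scales with the player's variance, so the lower sigma that a small tau allows translates directly into smaller adjustments per game.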

Re: trueskill parameter tuning results

PostPosted: 08 Feb 2016, 02:00
by Sheeo
Amazing work Axle. I'd love to get you set up with more data -- Aulex has been working on an API for searching through game results, it's still being worked on though.

Re: trueskill parameter tuning results

PostPosted: 08 Feb 2016, 02:46
by Axle
Thanks Sheeo! I'd love to get more comprehensive and up-to-date data. There's no knowing how many missing games there are in my existing dataset.

Re: trueskill parameter tuning results

PostPosted: 08 Feb 2016, 04:46
by Aulex
Sheeo wrote: Amazing work Axle. I'd love to get you set up with more data -- Aulex has been working on an API for searching through game results, it's still being worked on though.

Still on hiatus, until I find time.
Nice work Axle. In terms of searching in relation to rating, I only set up rating bounds. Did you want something more specific or will this be sufficient?

Re: trueskill parameter tuning results

PostPosted: 08 Feb 2016, 05:09
by Axle
Hi Aulex, I wouldn't really want to search by rating. I'd be more interested in all 1vs1 games and their results.

Without knowing the FAF db schema exactly (and pls forgive my rusty sql), I suspect I'd like something similar to:

select replay.id, replay.map_name, replay.time_start, replay.time_end, replay.duration, replay.game_type,
       player.name, player.uniqueID, -- uniqueID so as not to be confused by name changes??
       player.score, player.faction, player.rating_mean, player.rating_stdev
from replays as replay
join player on replay.id = player.replayid
where replay.game_type = '1vs1'

But for good measure, I'd like all the custom games too, for later on :D

In fact, maybe it's better if I can just get my hands on a backup of the whole db :D:D
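
For anyone who wants to try the shape of that query, here is a self-contained sketch against an in-memory SQLite database. The table layout, column names and rows are invented for illustration and are not the real FAF schema or real game data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical two-table layout: one row per game, one row per player per game.
conn.executescript("""
CREATE TABLE replay (
    id INTEGER PRIMARY KEY, map_name TEXT, time_start TEXT,
    time_end TEXT, game_type TEXT
);
CREATE TABLE player (
    replay_id INTEGER REFERENCES replay(id), unique_id INTEGER, name TEXT,
    score INTEGER, faction TEXT, rating_mean REAL, rating_stdev REAL
);
INSERT INTO replay VALUES
    (1, 'map_a', '2016-02-01 10:00', '2016-02-01 10:20', '1vs1'),
    (2, 'map_b', '2016-02-01 11:00', '2016-02-01 11:45', 'custom');
INSERT INTO player VALUES
    (1, 101, 'player_one', 10, 'UEF', 1500.0, 500.0),
    (1, 102, 'player_two', -10, 'Cybran', 1450.0, 300.0),
    (2, 101, 'player_one', 5, 'Aeon', 1500.0, 500.0);
""")

QUERY = """
SELECT replay.id, replay.map_name, replay.time_start, replay.time_end,
       replay.game_type, player.unique_id, player.name, player.score,
       player.faction, player.rating_mean, player.rating_stdev
FROM replay
JOIN player ON replay.id = player.replay_id
WHERE replay.game_type = ?
ORDER BY replay.id, player.unique_id
"""

# One row per player per 1vs1 game; the custom game is filtered out.
rows = conn.execute(QUERY, ("1vs1",)).fetchall()
for row in rows:
    print(row)
```

Keying on a stable player id rather than the name is what keeps name changes from splitting one player's history.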

Re: trueskill parameter tuning results

PostPosted: 08 Feb 2016, 21:28
by Softly
I too would like some stats on past games to play with

Re: trueskill parameter tuning results

PostPosted: 09 Feb 2016, 10:12
by Axle
Here's some interesting figures:

Plots of rating progression for the 5 most prolific ladder players. In blue we have the current trueskill parameters. In red, my "optimum" parameters. And in green, halfway between.

So I guess the most obvious thing is that the "optimum" progression is a lot more volatile. The other thing to notice is that for tau=10, the log likelihood (L) isn't really that much worse than for tau=18. As is often the case with maximum likelihood optimisations, the absolute optimum isn't necessarily the only acceptable solution; there are many suboptimal solutions that are almost as good. And if we throw into the mix that we don't like the degree of volatility that tau=18 gives us, well, tau=10 is less volatile and is almost as predictive.

Image
Image
Image
Image
Image

So what I'm saying is, I think maybe tau=10 is better :D

btw, the difference between pdraw=0.1 and pdraw=0.045 is negligible

Re: trueskill parameter tuning results

PostPosted: 23 Feb 2016, 22:04
by Softly
The most important criterion is whether your tuned version of trueskill better predicts results.

What sort of success rates does it get vs the current version?
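
One way to put numbers on that question is to score each parameter set on the same games, using both the success rate (how often the favourite won) and the log likelihood mentioned earlier in the thread. Below is a minimal standard-library sketch; the function name is made up, draws are ignored, and the predictions and outcomes are toy values, not FAF results.

```python
from math import log


def evaluate(predicted, outcomes):
    """Score predictions against results.

    predicted[i] is the model's probability that player 1 wins game i,
    outcomes[i] is 1 if player 1 won and 0 otherwise.
    """
    pairs = list(zip(predicted, outcomes))
    # Success rate: the predicted favourite actually won.
    hits = sum((p > 0.5) == bool(y) for p, y in pairs)
    # Log likelihood: rewards confident correct calls, punishes confident misses.
    log_likelihood = sum(log(p if y else 1 - p) for p, y in pairs)
    return hits / len(pairs), log_likelihood


# Toy numbers for illustration only.
accuracy, log_likelihood = evaluate([0.8, 0.3, 0.6], [1, 0, 0])
print(f"success rate: {accuracy:.2f}  log likelihood: {log_likelihood:.3f}")
```

Two parameter sets can have almost the same success rate and still differ in log likelihood, because the latter also measures how well calibrated the probabilities are.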