Hi Softly! I do believe that the L metric shown above is a direct measure of how well trueskill predicts results.

Success rate is only part of the story. Recall that trueskill doesn't really predict an outcome; it only gives likelihoods for 3 outcomes, e.g. 5% probability of a draw, 47% probability player 1 wins, 48% probability player 1 loses. You could say in this case that it's predicted a loss for player 1, but the level of confidence isn't very high, and that single statement doesn't capture what trueskill is really telling you.

The L metric is just the sum, over all games, of the log of the probability that was assigned to the actual outcome. e.g. with Pwin=0.9 and Ploss=0.1, a WIN outcome would contribute ln(0.9) ≈ -0.1 to L, whereas a LOSS outcome would contribute ln(0.1) ≈ -2.3.
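
To make that concrete, here is a rough sketch of the calculation in Python. The function name, the dict layout and the two example games are made up for illustration, and this is not the actual code used to produce the numbers above.

```python
import math

def log_likelihood(predictions, outcomes):
    """Sum of ln(probability assigned to the outcome that actually happened)."""
    return sum(math.log(probs[actual]) for probs, actual in zip(predictions, outcomes))

# Hypothetical data: two games, both called as Pwin=0.9, Ploss=0.1
# (draws left out to keep the example short).
predictions = [
    {"win": 0.9, "loss": 0.1},
    {"win": 0.9, "loss": 0.1},
]
outcomes = ["win", "loss"]  # what actually happened in each game

L = log_likelihood(predictions, outcomes)
print(round(L, 2))  # ln(0.9) + ln(0.1), roughly -0.1 + -2.3
```

L is always zero or negative, and closer to zero is better. One confidently wrong call (the LOSS here) costs far more than one confidently right call gains.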

In order to get a larger L (i.e. one closer to zero), trueskill must not only make a greater number of correct predictions, but also make those predictions with higher confidence. i.e. if it calls Pwin=0.9, Ploss=0.1, a WIN outcome counts for much more than if it had called Pwin=0.6, Ploss=0.4: ln(0.9) ≈ -0.1 versus ln(0.6) ≈ -0.5.
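
Here is a small sketch of why success rate alone can hide this. The setup is invented for illustration: 10 games where player 1 actually wins 9, two hypothetical parameter settings that both favour player 1 every time, and draws ignored. Both settings get the same 90% success rate but quite different L.

```python
import math

# Hypothetical results: player 1 wins 9 games out of 10.
outcomes = ["win"] * 9 + ["loss"]

def score(p_win, outcomes):
    """Return (success rate, L) for a model that calls the same Pwin every game."""
    probs = {"win": p_win, "loss": 1.0 - p_win}
    favoured = max(probs, key=probs.get)  # the outcome the model thinks is most likely
    hits = sum(1 for actual in outcomes if actual == favoured)
    L = sum(math.log(probs[actual]) for actual in outcomes)
    return hits / len(outcomes), L

rate_confident, L_confident = score(0.9, outcomes)  # calls Pwin=0.9, Ploss=0.1
rate_hedged, L_hedged = score(0.6, outcomes)        # calls Pwin=0.6, Ploss=0.4

print(rate_confident, round(L_confident, 2))  # 0.9 and about -3.25
print(rate_hedged, round(L_hedged, 2))        # 0.9 and about -5.51
```

Same number of correct calls, but the setting that commits to 0.9 ends up with the larger L because its confidence matched how often player 1 really won.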

That said, I can appreciate that the average FAF player doesn't care about probabilities, he just wants to see something he can relate to. When I get time I'll look at % success rate as a function of these tuning parameters and post results.