In case people miss it, though it is on the home page:
Thanks. I usually skip the home page, and go straight to the forums, or to player/rating lookup, etc.
Bill Smythe
So, as I understand it, the volatility of rating changes, which depend on performance, could increase by as much as 40% for those rated 1900-2100. Interesting. It could mean both that more of the old guard crash to their floors, and also that we have more players on their way to becoming masters than ever before. Further, there is a decrease in the K factors for those above 2200, meaning slower growth upward, but also that decline is slowed as well. So many variables to consider. Much food for thought.
Rob Jones
The decrease in K for masters only applies to games played at Dual-rated time controls.
For Regular-rated-only games, K will go up slightly for 2200-2360 players, and stay the same for players above 2360…if I read the chart correctly.
I believe the upper bound is 2355.
Curiosity - what is the motivation for the first part of this change (raising the K-factor for sub-2355 players)?
Under an earlier, much simpler, set of formulae for the ratings system, players under 2100 were essentially in a battle for 16 points (for a win) plus 4% of the ratings difference, up to another 16 points. ((That’s a linear approximation, but it is close enough for discussion purposes, and it was fairly easy to estimate your new rating on the back of an envelope, at least to within a few points. It is difficult to do all the computations in the current formula without a lengthy computer program. I once wrote an Excel spreadsheet to rate a quad, it took me several hours to get it working.)
That total (32) was called the K.
For players 2100 or higher, K was reduced to 24.
For players 2400 or higher, K was reduced to 16.
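For anyone who wants to play with it, here is a rough back-of-the-envelope sketch of that old linear approximation and the three-step K. This is just my own illustration, not official USCF code, and the historical formula differed in details (provisional ratings, draws, exactly how the cap was applied):

```python
# Back-of-envelope sketch of the OLD (pre-2001) approximation described above.
# Illustration only; not the official historical formula.

def old_k(rating):
    """Three-step K schedule: 32 under 2100, 24 for 2100-2399, 16 for 2400+."""
    if rating >= 2400:
        return 16
    if rating >= 2100:
        return 24
    return 32

def rating_change(own, opp, score):
    """Estimated change for one game; score is 1, 0.5, or 0.

    Expected score is the linear approximation 0.5 + (own - opp)/800, clamped
    to [0, 1], which reproduces the '16 points plus 4% of the rating
    difference, up to another 16' rule of thumb for a win at K = 32.
    """
    expected = min(1.0, max(0.0, 0.5 + (own - opp) / 800.0))
    return old_k(own) * (score - expected)

# Example: an 1800 beats a 2000 -> roughly 16 + 4% of 200 = 24 points.
print(round(rating_change(1800, 2000, 1.0)))   # 24
print(round(rating_change(1800, 2000, 0.0)))   # -8 for a loss
```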
When the formulae were changed in about 2001, K became a variable factor based on someone’s ‘effective games’ and rating. Rather than a three-step function for K, the higher your rating, the lower your K.
Higher-rated players (especially those 1700-2099) felt, rightly or wrongly, that it was harder to gain points under the newer, much more complex, set of formulae.
The changes to the ‘effective games’ computation result in a higher K for players under 2355. This will mean bigger rating gains for strong performances and, yes, bigger rating losses for weak performances.
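For concreteness, here is a rough sketch of the variable-K idea. If I’m reading Mark Glickman’s description of the post-2001 system correctly, K is roughly 800 divided by (effective games plus games in the event); the effective-games values below (50 and 30) are made-up numbers, not the actual before/after parameters, and the real effective-games formula is not reproduced here. The point is just that a smaller effective-games value means a larger K, and therefore bigger swings per event:

```python
# Illustrative sketch only: assumes the post-2001 K is roughly
#     K = 800 / (effective_games + games_in_event)
# The effective-games figures below are hypothetical, not the real parameters.

def variable_k(effective_games, games_in_event):
    return 800.0 / (effective_games + games_in_event)

# A 4-round event, with a higher vs. a lower effective-games value:
print(variable_k(50, 4))   # ~14.8 -> smaller per-event swings
print(variable_k(30, 4))   # ~23.5 -> larger per-event swings
```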
Thanks for the alert. I wish this had been implemented a few months ago, before I had a few great tournaments…
Does anyone know when/if the Rating Estimator page (http://www.uschess.org/content/view/9177/679/) will be updated to use the new formula?
Updating the ratings estimator program (written by George John in Javascript) to use the new ‘effective games’ formula should not be too difficult, but it is not clear how to deal with the other change to the formula, which only applies to players rated 2200 or higher, and then only to regular ratings in dual-rated events. (A checkbox to invoke the reductions in K might work.)
We will (presumably) need to create another version of the program with the ‘current’ formulas, with links to past versions for those who want to check events that ended before May 18th, 2013.
We may want to wait until the Ratings Committee has decided if the bonus factor needs to be adjusted again before modifying this program. (That decision is usually made around this time of the year. Perhaps if any adjustment is deemed advisable it too can be made effective on May 18th, so that we don’t need a separate version of the Ratings Estimator to cover a few weeks of time.)
This also may not be the last set of changes made to the ratings formula this year. Any changes to quick or blitz ratings, e.g., to deal with stale ratings, may introduce complications that are beyond the capabilities of the ratings estimator program to deal with.
At first glance this looked kind of crazy. Increasing volatility by 40%, really? But you can also look at it as taking the curve of K-factor vs rating and moving it to the right, rather than moving it upward. So you can either say “a 2000 now has a K 40% bigger than it used to be” or say “a 2000 now has the same K that a 1750 used to have”. I actually hadn’t really been aware of the K curve (I knew that higher-rated players’ ratings changed more slowly but I thought it kicked in at 2200 or something). Now I understand one reason why my rating wasn’t shooting up quite as fast as I expected.
I guess anything that increases volatility will inflate ratings a bit, since players will bounce off their floor more, which donates ratings points to the global pool. I’m sure Glickman et al have taken this into account, though. I know that they were actively trying to inflate ratings for a while to counter some earlier perceived deflation.
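To make the floor-donation mechanism concrete, here is a toy simulation. It uses the simple linear approximation from earlier in the thread, not the real USCF formula, and the floor (1800), K (32), and 40% scoring rate are all made-up numbers; the only point is the direction of the effect:

```python
import random

# Toy illustration of the floor-donation effect: whenever the floored player
# "loses" points they cannot actually give up, the opponent still gains them,
# so the total pool grows.  Not the real USCF formula.

random.seed(1)
K = 32
FLOOR = 1800

def expected(own, opp):
    # Linear approximation of expected score, clamped to [0, 1].
    return min(1.0, max(0.0, 0.5 + (own - opp) / 800.0))

floored, opp = 1800.0, 1800.0      # one player sitting exactly on their floor
start_pool = floored + opp

for _ in range(500):
    score = 1.0 if random.random() < 0.40 else 0.0   # fixed 40% scoring rate
    e_f, e_o = expected(floored, opp), expected(opp, floored)
    floored = max(FLOOR, floored + K * (score - e_f))  # can't drop below floor
    opp = opp + K * ((1.0 - score) - e_o)

# Without the floor, the two per-game changes cancel and the pool is unchanged;
# with the floor, the pool ends up larger (points were 'donated').
print(round(floored + opp - start_pool))
```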
As someone who has looked at the raw data a lot, it is far from clear to me that the 2001 formula changes impacted players in the 1700-2200 range as much as those players think it did. (But perception is often more important than reality, especially when it comes to making decisions with political implications.)
Yes, I’m sure someone could list a dozen players for whom the data suggests they were held down by the variable K. I’m sure someone else could list a dozen players for whom there was no discernible pause in their rating.
The Ratings Committee was charged by the Delegates some years ago with the task of ‘reinflating’ ratings to their 1997 level. That task is not yet complete, hence the annual review of the bonus factor. For more details on how this works, I suggest people look at Mark Glickman’s reports to the Delegates, about 20 years worth of which are on Mark’s website, glicko.net.
Oops, I just checked the 2013 Ratings Committee report, and they were against it:
(Said untested formulas have at least been tested since, apparently.)
Well, as Goichberg pointed out, the effective games formula was tested, just not all the actual parameters for it, since they changed the upper bounds. Whether that change will have much effect, possibly in conjunction with the lowered K for masters in dual-rated events, is still being tested on a backup server, along with, hopefully, a measurement of the impact those changes will have on the reflation task. We hope to have that data to the Ratings Committee by early next week.
Thanks for this explanation. I wonder if the same people who will enjoy seeing faster rating gains will also enjoy the faster losses?
Also interesting to hear that one of the goals of this new system, which seems clearly inflationary as others have pointed out, is to re-inflate ratings back to the pre-1997 level. Is this inflation desired partly (or perhaps entirely) for marketing reasons?
I am reminded of an old-timer at a local chess club who has been inactive in USCF play since 1970, but believes that his 1700 rating from forty years ago would be worth more in today’s rating scale, due to inflation since then. Can anyone here with knowledge of ratings both then and now comment on whether this is actually true? If it is, how much have USCF ratings inflated in the past forty years, and due to what factor(s)?
Depends on the pool. The USCF median is much lower now than in 1970, but there are far more players as well, so it is easier to find your more appropriate ‘point on the curve.’ Becoming a Master in New York City is more likely than in North Dakota because you have a lot more opportunity to do so, both in terms of tournaments and players to get points from.
The task of reflating the rating system to the 1997 levels has been one the ratings committee has been working on for over a decade. The extent to which these new formula changes will impact that task is yet to be determined. It seems likely that they will speed up the rate of inflation, but possibly not (as much) for the core players that are the focal point of their measurement process.
I don’t think there is any objective way to compare the skill set of a 1700 player from 50 years ago to that of a 1700 player today. Was Mickey Mantle a better clutch hitter than [your favorite current player]?
The rating formulas went through many changes over the years, typically alternating between inflation and deflation. Sometimes changes were made based upon misconceptions of how ratings should work. For instance, a not insignificant number of people firmly believed that ratings in the population were supposed to be roughly Normal with a mean of 1500, so when the mean dropped below that, we had to “inflate”. When it became clear that was wrong, there were conscious efforts to deflate.
Prior to the major changes in 2001, the only “counterdeflationary” mechanism in the system was the 200-point floors, which don’t really put many points into the system. The 1997 benchmark was chosen by consensus as a time when ratings seemed to be at the Mama Bear level. (That level couldn’t have been maintained with the rating system in place at the time, because the counterdeflationary force then was the 100-point floors, which caused huge distortions in local pools.)
Tom - I’m sorry - Mama Bear level? Not quite sure I get that – that means ratings were too low? (I would think Papa Bear = too inflated, Mama Bear = too deflated, and the Goldilocks principle (Baby Bear level) = just right.)
Peak-rating-based floors changed from 1xx to 2xx in 1997. (As of 1/1/97, I think.) A chess papa bear of the day who attended the 1996 U.S. Open told me the Delegates approved the change since “lots of folks are banging against their floors real hard, especially the old guys.”
It did not take long for some players to sink below their 1xx floor—including me. I recovered eventually, but I know lots of players roughly my age and playing strength who are stuck between their floor and “floor + 50 points” for good. Would be interesting to see stats on players who peaked in the 2000s or 2100s but now float between 1800-1850 and 1900-1950, respectively.
Some of these guys will jump for joy over the higher K factor… at least till their first terrible result. Of course, that’s what the floors are for.
I guess I watched too much “Fractured Fairy Tales”.