Rating inflation?

Allen · June 4, 2018, 12:23am

sloan:

relyea:

I believe, and please correct me if I’m wrong, that Mr. Smythe is suggesting that if we were to, say, add 200 points to everyone’s rating (so that those lifelong experts could call themselves masters, for example) it would backfire as a marketing tool.

Alex Relyea

Mr Smythe has the advantage of having witnessed history.

This has been tried. It failed. It also created a huge mess, which took nearly a decade to unwind.

Yes - the USCF Rating System is one of the best marketing tools that US Chess has. But, the point which has stood the test of time is: it’s only useful if it is accurate. To the extent that the Rating System is manipulated for short-term political or (perceived) marketing advantage, it becomes the object of derision and DETRACTS from the organization.

In my opinion, the recent changes made by the EB were ill-advised. The main reason that I don’t worry TOO much about them is that we now have an established, understood, and respected mechanism for monitoring and CORRECTING any gross deviations from accuracy and stability.

I don’t think there have been any plans or proposals over the last 7 years to simply add or subtract ratings points in bulk.

I asked Ken what he meant by recent as I was not aware there had been recent changes in the ratings formula. He has told me it was 2013.

BIRDSNAN · June 6, 2018, 6:44pm

I believe that the intent of the current rating system is to get you to your current strength as quickly as possible, to prevent deflation?

I have also seen 2300-2400 players return from the 70’s and settle around 2150. Age, style, ability to change?? Also, I believe it amount of info avail

nolan · June 6, 2018, 10:55pm

To quote from Arpad Elo’s book:

The obvious purpose of any rating system is to provide a ranking list of whatever is being rated. In a competitive activity like chess, tournament standings provide tentative rankings, but because individual performances vary from time to time, a ranking list based upon a single event is not always reliable. Furthermore, it may be necessary to compare performances of players or teams who never met in direct competition.

A rating system therefore attempts to evaluate all the performances of an individual or team on some sort of scale, so that at any given time the competitors may be listed in the probable order of their strength. Furthermore, a proper rating system should go a step beyond mere ranking and should provide some estimate of the relative strengths of the competitors, however strength may be defined.

Individuals with few games, or whose strength is changing (usually improving), will be less reliably ranked. There’s not much that can be done for a player with only a handful of games: statistical theory is based on having a sufficiently large sample, and with a small sample reliability is not assured. For players whose strength is changing (usually improving), there are factors in the current system that can move those players towards more reliable ratings, e.g., the bonus factor and the factors that contribute to a player’s K.

My personal observation, after having worked with the US Chess ratings system for about 15 years, is that it takes about 25-30 games for someone’s rating to catch up to their current strength. The challenge with rapidly improving players is that their strength keeps changing, so you’re chasing a moving target.

The changes that were made to the ratings formula back in 2013 appeared to result in overshooting someone’s new strength (which would result in over-rating an unusually strong performance for established players), which would be somewhat inflationary. Lowering the bonus factor appears to be bringing that under control.

I’ve told this story before, but some years back there was a young player who was clearly about 1300 strength in December, based on his play in one of my tournaments. But by late February he was drawing and occasionally defeating A players and experts, though his rating did not yet reflect that. He went on to finish in the top 10 at the National Elementary in April. By December he had an expert rating and the following spring he was one of the six competitors in the Nebraska Closed Championship. (I don’t think he won that year, though I think he did the following year, when the average rating in the event was around 2176.)

jwiewel · October 1, 2018, 6:07pm

Using the ratings calculator seems to indicate that K is also related to the length of the tournament (going down as the event gets longer). Thus the K is lower in a good 9-round tournament including a round one win over a player more than 650 points below you, versus the same tournament with only eight rateable rounds after getting a forfeit win over the same opponent. The 9-0 result can give less of a rating boost than the 8-0 result that doesn’t include a forfeit win. Since you cannot rate a game for only one of the players it doesn’t look like there is any legitimate change to make to the process.

Also, it would be uncommon for a tournament to be long enough and the K change large enough for the difference to be particularly noticeable. Even using an 8-0 result (instead of 9-0) would only raise a rating by maybe as much as 8 points (before any bonus, so 16 afterwards), and results of that type would probably happen even less often than “very rarely.”

nolan · October 1, 2018, 9:20pm

K is based on both effective games and the number of completed games in the current event.

While K goes down as the number of completed games goes up, the potential for a ratings change often goes up because the maximum actual score goes up more than the expected score.
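
A rough sketch of that trade-off, for anyone who wants to see it in numbers. It assumes K has the form 800 / (N' + m), where N' is effective games and m is the number of completed games in the event; that is my understanding of the published formula for most players, but treat it as an assumption. The function names and the sample values (30 effective games, 0.75 expected score per game) are made up for illustration.

```python
def k_factor(effective_games: float, games_in_event: int) -> float:
    """Assumed form of K: 800 / (N' + m). Not taken from this thread."""
    return 800.0 / (effective_games + games_in_event)


def max_gain(effective_games: float, games_in_event: int,
             expected_per_game: float) -> float:
    """Largest possible pre-bonus change: the player wins every game."""
    k = k_factor(effective_games, games_in_event)
    expected_total = games_in_event * expected_per_game
    return k * (games_in_event - expected_total)


if __name__ == "__main__":
    # Hypothetical player: 30 effective games, expected to score 75% per game.
    for m in (4, 6, 8, 9):
        print(m, round(k_factor(30, m), 2), round(max_gain(30, m, 0.75), 2))
```

Under these assumptions K shrinks with every added game, while the best-case gain keeps growing, because the gap between a perfect score and the expected score widens faster than K falls.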

jwiewel · October 1, 2018, 11:17pm

When the K drops from 21 (8 games, assuming a forfeit win) to 20 (9 games, with an extra win against an expected score of 0.975 in that extra game), the 0.025 by which you exceeded the expected score earns you noticeably less than the extra 1 point of K would have earned you over the remaining 8 games if that first game had been a forfeit win.

This would be an unusual situation and not something you’d normally have to consider worrying about. On top of that, if you do consider worrying about it then one of the likely ways to try to handle it (skipping that outlier game for that person) would run into a brick wall of requiring both players in a game to have a rated result from that game (and I don’t think we even want to begin considering a change in that).

It is, interestingly enough, an outlier that helps reduce rating inflation from unusually good results.
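
The comparison in this post can be checked with a few lines of arithmetic. This is only a sketch: it uses the K values quoted above (21 for 8 rated games, 20 for 9) and the plain K × (actual − expected) change before any bonus. The function names and the sample expected score are hypothetical.

```python
def gain(k: float, score: float, expected: float) -> float:
    """Pre-bonus rating change: K * (actual score - expected score)."""
    return k * (score - expected)


def compare(expected_over_8: float) -> tuple[float, float]:
    """Return (gain with forfeit win, gain with the game played and won).

    expected_over_8 is the total expected score over the other 8 games.
    """
    forfeit = gain(21, 8.0, expected_over_8)          # 8 rated games, K = 21
    played = gain(20, 9.0, expected_over_8 + 0.975)   # 9 rated games, K = 20
    return forfeit, played


if __name__ == "__main__":
    # Hypothetical: the player was expected to score 6.0 from the other 8 games.
    forfeit, played = compare(6.0)
    print(forfeit, played, forfeit - played)
```

With these K values the gap works out to (8 − expected) − 0.5, so the forfeit version only comes out ahead when the player beat expectations by more than half a point over the other eight games, and the gap stays below 7.5 points even in the most extreme case.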

nolan · October 1, 2018, 11:27pm

In order to have an expected winning percentage of .975, you need to be rated about 650 points higher than your opponent. I wouldn’t expect such a win to improve your rating much under any circumstances, so it doesn’t particularly bother me that the gain from actual - expected is less than the drop caused by the decrease in K.
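
For reference, here is a small sketch of where the 650 figure comes from. It uses the standard Elo logistic curve with a 400-point scale; I am assuming the ratings calculator uses the same curve. The function names are illustrative only.

```python
import math


def expected_score(rating_diff: float) -> float:
    """Standard Elo logistic expectancy for a player rated rating_diff higher."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def diff_for_expectancy(p: float) -> float:
    """Inverse of expected_score: rating gap that gives expectancy p (0 < p < 1)."""
    return 400.0 * math.log10(p / (1.0 - p))


if __name__ == "__main__":
    print(round(expected_score(650), 4))
    print(round(diff_for_expectancy(0.975), 1))
```

On this curve a 650-point gap gives an expectancy a little under 0.98, and an expectancy of exactly 0.975 corresponds to a gap in the mid-630s, so "about 650" is a fair round number.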

kevin_bachler · October 2, 2018, 3:22am

I prefer a Special K. It’s great for breakfast.

DENTONCHESS · October 9, 2018, 11:31pm

sloan:

relyea:

I believe, and please correct me if I’m wrong, that Mr. Smythe is suggesting that if we were to, say, add 200 points to everyone’s rating (so that those lifelong experts could call themselves masters, for example) it would backfire as a marketing tool.

Alex Relyea

Mr Smythe has the advantage of having witnessed history.

This has been tried. It failed. It also created a huge mess, which took nearly a decade to unwind.

Yes - the USCF Rating System is one of the best marketing tools that US Chess has. But, the point which has stood the test of time is: it’s only useful if it is accurate. To the extent that the Rating System is manipulated for short-term political or (perceived) marketing advantage, it becomes the object of derision and DETRACTS from the organization.

In my opinion, the recent changes made by the EB were ill-advised. The main reason that I don’t worry TOO much about them is that we now have an established, understood, and respected mechanism for monitoring and CORRECTING any gross deviations from accuracy and stability.

And so has the idea of crashing ratings to satisfy the statistical purists. NO, pristine “accuracy” should NOT be the key goal, for in the pursuit of this, hundreds of seniors quit when their ratings were slashed. I, too, have the advantage of witnessing history. The point is that “balance” is the key.

Rob Jones

mregan · October 9, 2018, 11:59pm

Rob,
Balance is the goal of the rating system. The method for measuring inflation/deflation is explained in the Delegate’s Call. Any recent changes are very small.
Mike

nolan · October 10, 2018, 12:29am

The system has been inflating a bit since the 2013 formula changes, which is why the bonus factor has been increased several times since then.

DENTONCHESS · October 10, 2018, 1:25pm

Numbers and data cannot have a goal. People establish goals based on an itinerary and their personal beliefs.

Rob Jones
