A Rating-Related Question

One of the ideas I’ve shared in the past few years is that chess moves in a game can be viewed purely as information. Players who frequently produce high-quality moves have very refined, exact, high-quality information. At the other extreme are players who make moves essentially at random, so the quality of their information is very low. In this context, ratings can be viewed as a measure of information quality and of consistency from move to move and game to game (or, conversely, of low quality and inconsistency).

This approach, btw, explains why we would expect quick ratings to generally be lower than regular ratings. At the very low end, moves are essentially random at either regular or quick time controls, so we would expect those ratings to be similar. At the very high end, ultra-strong players are generally less affected by faster time controls than more moderate players, so we would expect less impact there. In the middle, players would be more affected by faster time controls, essentially adding more “randomness” to their moves, resulting in lower quick ratings than regular ratings. This is what we actually observe in practice. It is obviously not a proof, but it is an indication.

But when I saw the above quote from another thread, I began to wonder - if we had a computer that generated legal moves in any position and randomly selected one, what kind of rating would it achieve? Would it floor at 100, or could it do better? And if it does better, then what is it that makes some people play worse than random chance? Maybe in the extreme we can prove false the old adage “A bad plan is better than no plan at all.”
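For the curious, such a random mover takes only a few lines to build. Here is a minimal sketch using the third-party python-chess library (the library choice and the 500-ply cutoff are my own assumptions, purely illustrative):

```python
import random
import chess  # third-party python-chess library

def play_random_game(max_plies=500):
    """Both sides pick uniformly at random among the legal moves."""
    board = chess.Board()
    while not board.is_game_over() and board.ply() < max_plies:
        board.push(random.choice(list(board.legal_moves)))
    return board.result(claim_draw=True)  # '1-0', '0-1', '1/2-1/2', or '*'

# Rough experiment: how often does pure randomness stumble into a decisive result?
games = [play_random_game() for _ in range(100)]
print({r: games.count(r) for r in set(games)})
```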

Longest initial drought that I’ve personally seen is a floored 110 (P46) that is now 4-76 and above the 137 floor.

There’s some evidence that the longer you give someone to make a decision, the more likely they are to make the wrong one. Testing services advise students that their first impulse is often right. So do some chess coaches.

But that doesn’t necessarily mean quick ratings should be lower than regular ratings; if anything, it might argue that they should be higher.

IF you could argue that the higher rated player is more likely to win at faster time controls than at slower ones, then you might be able to say that there would be fewer bonus points injected into an independent quick system due to upsets, and thus lower average ratings.

But you could also argue that in faster time controls the higher rated player is more likely to make a costly mistake, in which case that might suggest that there would be more bonus points injected into an independent quick system, and thus higher average ratings.

But I don’t know which, if either, of these arguments is correct, and there are other factors that might be more significant.

While there is a positive probability that a computer randomly selecting moves might luck into a checkmate, there’s also a positive probability that a monkey banging randomly on a typewriter can write Hamlet. The best you could hope for would be that it would draw if it played someone who similarly had no real ability to create a checkmate. That pretty much describes a player on the absolute floor.

You have used this “randomness” argument before, but it has a serious flaw.

You seem to be saying that, at the low end of the rating scale, randomness does not vary much between regular and quick, because most of those moves are random anyway, and at the high end, randomness does not vary much, because most of those moves are stellar anyway, but in the middle, randomness varies considerably between regular and quick.

OK, so what? Some players’ playing strengths suffer more than other players’, as a result of the faster time controls. When a player in the suffer-more region of the rating spectrum is paired against an opponent in one of the suffer-less regions, the former is likely to perform below expectations and lose rating points, while the latter is likely to perform above expectations and gain rating points. On the average, the ratings remain the same, so there is no ratings deflation.

Or, if both players are in the suffer-more region, the effect will balance out, so again the ratings remain the same on the average.

Example 1: Two players, rated 1800 and 1600 regular, play a long quick-rated match. Both of these players play worse at quick than at regular, so their true strengths at quick are actually 1700 and 1500. But the difference is still 200 points, so the higher-rated player still has about 3:1 odds of defeating the lower. Thus, there should be no rating change, in the long run, for either player in this match.

Example 2: Same, but now the players are rated 2400 and 1600. The 2400 suffers less than the 1600, so the 2400 exceeds expectations (gains rating points) while the 1600 falls short (loses rating points). On the average, though, the ratings do not change.
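A quick arithmetic check of both examples, using the standard logistic expectation curve (a sketch; the actual USChess winning-expectancy formula may differ in detail):

```python
def expected_score(r_a, r_b):
    """Standard logistic expectation: the score the r_a player should average."""
    return 1 / (1 + 10 ** (-(r_a - r_b) / 400))

# Example 1: only the 200-point difference matters, so 1800-vs-1600
# and 1700-vs-1500 give identical expectations.
print(expected_score(1800, 1600))  # ~0.76, roughly 3:1 odds
print(expected_score(1700, 1500))  # identical, ~0.76

# Example 2: an 800-point gap leaves almost nothing to chance.
print(expected_score(2400, 1600))  # ~0.99
```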

So, if your argument “proves” anything, it would be that ratings in the middle of the spectrum tend to decline, while those nearer the ends tend to rise.

In other words, because the ratings game is a zero-sum situation, a reduction in overall playing strength should not result in overall ratings deflation.

Before you point out that ratings are not really zero-sum (because of bonus points, multi-pass calculations, differing K-factors, etc.), let me respond that, although this is true, it is also still true that one player’s gain is another’s loss. So, to claim that quick ratings are inherently deflationary, you need a slightly different argument, e.g. that for some reason bonus points don’t work as well with quick ratings as with regular.

Bill Smythe

At the moment, my personal guess (it is not solid enough to be called a theory) is that quick ratings are more likely to suffer from staleness (they are often initialized by scholastic or other new players, who are more likely to start with faster time controls) and also from the small-pool phenomenon (players grow stronger together without much overall rating change because they primarily play each other, and quick tournaments are more likely to be small events with a small pool that regularly gets together).

The staleness happens when players transition from quick/dual ratings to regular-only ratings and only occasionally play additional quick events. That results in a lot of players who have improved in general and are now stronger at quick than their quick ratings reflect (deflation of the quick rating average). The small pool is another source of deflation, and when those players finally play outside the pool, their lower ratings come as a bit of a shock to the other players. If players were as active at quick as they are at regular, then I wouldn’t expect much difference in the averages.

As far as bonus points go, if quick tournaments trend longer than regular tournaments, then the decline in K would mean that a player who did well in a twelve-round event might not gain as many bonus points as he would by doing just as well in three four-round events (in a 12-game event a 2000 has a K of about 19.7 and a 1400 has a K of just under 29.4, while for four games a 2000 has a K of about 24.5 and a 1400 has a K of about 41.6).
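Those K values are consistent with the USChess formula K = 800 / (N' + m), where N' is the player’s “effective number of games” and m is the number of games in the event. A sketch, assuming the effective-games formula from the rating system specification (which applies below the 2355 rating cutoff):

```python
import math

def effective_games(rating):
    """Effective number of games N' (capped at 50; formula for ratings <= 2355)."""
    return min(50 / math.sqrt(0.662 + 0.00000739 * (2569 - rating) ** 2), 50)

def k_factor(rating, event_games):
    """K = 800 / (N' + m) for an m-game event."""
    return 800 / (effective_games(rating) + event_games)

for rating in (2000, 1400):
    for m in (12, 4):
        print(rating, m, round(k_factor(rating, m), 1))
# Reproduces the figures above: 19.7 and 24.5 for the 2000,
# 29.4 and 41.6 for the 1400.
```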

PS If you get rid of dual rating then I’d expect an increase in staleness (not as much recent activity) but a decrease in the lower initialization levels (quick-only events are more likely to be events that are attached to larger tournaments drawing serious players or to be events at clubs that also draw fairly serious players). Thus removing dual rating might show an overall lower average (a lot of stale and unused ratings) while the active quick players have an average closer to the regular average for those same players.

PPS If the bonus point K factor did not go down for longer tournaments then the bonus point threshold would have to increase even more per round than it currently does to avoid longer tournaments being overly inflationary in awarding bonus points.

I suppose that’s correct. The problem becomes simpler to think about if one considers only final-type positions played randomly.

Your flaw is in this last part.

The comparisons I’m making are between systems and explain the differences between systems. You started analyzing within a system, and that is where the analysis became confused.

Here is a corrected version of what you were trying to do…

  • Top vs. Middle: When a player at the TOP end of quick ratings (a “suffer-less” region, comparatively, against standard ratings) competes against a “mid-range” player (a comparatively “suffer-more” region), the mid-range player is more likely to be affected by playing quick and will play below standard-rating expectations, generally resulting in a greater-than-standard loss of rating points and a rating lower than their standard rating. Keep in mind that K’s aren’t equivalent, so top-end players stay roughly the same while mid-range players drop. Also keep in mind that players can have volatile results event by event and game by game (even move by move, although that doesn’t directly impact ratings). I would anticipate a slide from the top and then, at some “cliff point,” a steeper drop.

  • Middle vs. Lower: When a player in the mid-range of quick ratings (a comparatively “suffer-more” region against standard ratings) competes against a “low-end” player (a comparatively “suffer-less” region), the mid-range player is more likely to be affected by playing quick and will play below standard-rating expectations, generally resulting in a greater-than-standard loss of rating points and a rating lower than their standard rating (which is what we observe). HOWEVER, the players at the very low end are so poor as to be essentially random, and there are floors, so while the mid-range drops relative to standard ratings, the bottom will drop relatively little. This is also what we observe.

These are essentially two mid-range players. There’s likely a performance delta between systems, but on a relative basis it tends to be smaller. The result you argue for is close to what I expect.

Incorrect, K is not the same for all, so there isn’t even an even exchange of points.

As a result, your conclusions below are wrong. You even see that, but then hand-wave around it instead of thinking it through.

Also you are confused on another point, highlighted below:

I highlighted in red a very incorrect statement. There is no claim that quick ratings are inherently deflationary. On the contrary, if I were to make any claim, it would be that a portion of the perceived rating decline is REAL, not the result of deflation. (There are deflationary factors - people who play quick, then don’t play for a long time while improving significantly in the interim, and then play again cause the system to realize deflation. This has a definite impact. But it isn’t our focus here.)

Your statement seems to indicate a misunderstanding of deflation. I would suggest you read the analogy here: viewtopic.php?p=273262#p273262

Based on this analogy, each rating system has its own bubble and its own balloon. My argument is that, due to the quicker time control, the overall strength of the quick chess pool is less than the overall strength of the standard chess pool - i.e. the Quick “bubble” is actually smaller than the “Standard” bubble. But since the quick rating system balloon was originally pegged to the standard rating system balloon, the resulting loss of rating points appears to be deflation when it actually is not.

I have yet to see any argument that disproves this theory, and what we have observed indicates the theory is likely correct.

My experience with beginning young players is that at first they struggle with making legal moves. It is when they start to figure out a few simple patterns, like the scholar’s mate, that they start to realize that moves have consequences, both good and bad.

Back in the early days of chess computers, the first challenge they had to solve was generating legal moves. Then they had to assign values to those legal moves, and come up with a plan based on those values. In many ways that’s the same process that new players go through as they learn more about playing chess.

In any given position there are usually no more than about 50 legal moves, often far fewer than that. (This sounds like a ‘Bill Smythe’ question: What chess position has the most possible moves?)
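As a point of reference, counting the legal moves in a position is a one-liner with the same python-chess library sketched earlier in the thread:

```python
import chess

board = chess.Board()                # standard starting position
print(len(list(board.legal_moves)))  # 20 legal moves to start
```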

Larry Christiansen used to give a lecture on ‘random’ moves, but that’s a completely different definition of random. He classified moves into three categories: good moves, random moves, and bad moves. In his system, a random move neither significantly improved nor harmed a position. In some positions there are no good moves, only random ones and bad ones.

Proving or disproving the guess is not going to be an easy task, and the guess may very well be completely incorrect.

It seems to be based on ratings being a measure of actual strength, as opposed to being a measure of relative strength.

However, the rating system is not designed to measure some absolute strength. It is measuring relative differences in strength. I remember somebody making an argument that in a closed system where everybody is improving at the same steady rate, ratings would fluctuate slightly but remain pretty much the same (some bonus points would occasionally be injected from a tournament where a player happens to have his own mistakes overlooked by all of his opponents).

The actual improvement in average strength (not average rating) is only noticed when an active rated player from outside the pool plays against people in the pool (and the outsider’s rating may drop significantly even though the outsider’s strength is stable relative to the general populace). Illinois has a number of high school students who obtain K-8 ratings, play a great deal of non-USChess-rated but serious chess (improving their strength significantly), and then return to USChess-rated chess after high school while retaining their stale ratings, with the result that rating points are shifted until the returnees reach a rating that stabilizes at their new strength. Bonus points help them reach that new level more quickly and also help inject points back into the overall system to keep it fairly stable.
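That closed-pool thought experiment is easy to simulate. A toy sketch (fixed K, win/loss games only, logistic expectations - not the actual USChess algorithm) shows the average rating holding steady while the average strength climbs:

```python
import random

def expected(diff):
    """Logistic expectation for a rating or strength difference."""
    return 1 / (1 + 10 ** (-diff / 400))

K = 24
strengths = [1300.0, 1500.0, 1700.0, 1900.0]  # hidden true strengths
ratings = list(strengths)                      # pool starts calibrated

for season in range(200):
    strengths = [s + 1 for s in strengths]    # everyone improves in lock step
    for i in range(4):                         # a round-robin of rated games
        for j in range(i + 1, 4):
            # result driven by true strengths, rating update driven by ratings
            win_i = random.random() < expected(strengths[i] - strengths[j])
            e_i = expected(ratings[i] - ratings[j])
            ratings[i] += K * ((1.0 if win_i else 0.0) - e_i)
            ratings[j] += K * ((0.0 if win_i else 1.0) - (1 - e_i))

print("average strength:", sum(strengths) / 4)  # up 200 points
print("average rating:  ", sum(ratings) / 4)    # still 1600: equal-K updates are zero-sum
```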

Getting back to the quick system, we have a lot of ratings that were initialized in dual-rated scholastic events by players who then transitioned to primarily regular-rated events. That gave a current measure of their improving relative strength (the changes to their regular ratings) while they retained lower quick ratings (sometimes very stale and sometimes only slowly changing). They are like players who played in a private regular-rated pool (nationwide in size, but comparatively private in relation to other, more seasoned players) and periodically played against players who were still dual-active. From a ratings standpoint it is the reverse of adults being hesitant to play against under-rated kids, since in a quick-only event it is actually kids with fairly current quick ratings facing adults with stale, undervalued quick ratings.

If regular ratings didn’t exist, then the ratings committee would only have quick ratings to deal with and would be tailoring the bonus threshold to inject points back into the system to maintain a stable average for a specifically tracked age group. Since regular ratings do exist, that bonus threshold is already set and is apparently too low to maintain quick ratings (but maintaining that system is not the goal of the ratings committee).

If ratings were ever changed to measure absolute strength instead of relative strength, then nobody would expect them to be the same between regular and quick (people would expect both systems to rise as chess theory and training improve). As long as they continue to measure relative strength, there will be questions about why they are different - and I feel the primary answer is the relative staleness of many players’ quick ratings, which were initialized and first stabilized when those players were weaker.

This is not at all the case. The analogy above, which demonstrates the underlying thought process, shows that.

Our experience with closed or nearly-closed pools has been that they tend to be somewhat inflationary, occasionally very inflationary. That’s consistent with the statistical theory behind the rating system, which assumes a large pool of players.

We’ve also seen ratings manipulation in which lower-rated players, especially new ones, gain a lot of points by defeating higher-rated players, then start losing games to give other players more points. Variable K and bonus points both factor into that. Several of these manipulation schemes occurred in QR-only games, probably because faster time controls allowed more games to be squeezed in. (That assumes the games were actually being played, which in at least one instance was questionable.)

The underlying premise of the balloon and bubble analogy seems to be that a particular strength should be tied to a specific rating. That is the premise behind one of the analogy’s four rock-solid 1500s improving to 1700 strength and thinking there is something wrong with final ratings of 1650, 1450, 1450, 1450. For that matter, if the four 1500s started playing quick chess and three of the four were unable to maintain their 1500 strength and played like 1300s the final ratings would again gravitate towards 1650, 1450, 1450, 1450. That is because the rating system measures relative strength, not actual strength. It is an erroneous assumption that the players should be recalibrated by something outside the system to match a certain strength.
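For what it’s worth, the 1650/1450/1450/1450 outcome falls straight out of the zero-sum constraint; a quick check of the arithmetic:

```python
# Four players start at 1500, so the pool's total rating is fixed at 6000.
# If one player's strength rises 200 points above the others, equilibrium
# requires x - y = 200 and x + 3*y = 6000.
y = (6000 - 200) / 4   # 1450.0
x = y + 200            # 1650.0
print(x, y, y, y)      # 1650 1450 1450 1450, matching the analogy
```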

Variable K and bonus points are used to get a targeted age group to have a particular average rating (an average relative strength - different from an average actual strength). That helps counteract the natural deflation caused by improving players who were initialized at low ratings. I would expect that the current opportunities for players to improve rapidly, in conjunction with the additional opportunities experienced players have (computer analysis and on-line play), mean that an 1800 player of today is noticeably stronger in actual strength than an 1800 player of 25 years ago, and that if ratings were based on some fixed strength then the adult average would be steadily increasing.

Now it is time to look at why the quick system has a lower average rating than the regular system even though the rating system only measures relative strength.
In the quick system there is a higher percentage of active players who are young and still moving up from their initially low quick ratings, and a significant percentage of players who have become comparatively inactive at quick while remaining active at regular, so their quick ratings are more stale than their regular ratings. If the quick system were used as much as the regular system, it would work well as a measure of relative ability. Granted, that measure of relative ability may mean that the average quick-time-control 1800 plays as accurately as the average regular-time-control 1800 from 1995, and significantly less accurately than the average regular-time-control 1800 of today.

If we looked only at the players who initialized their quick ratings as adults, who primarily played other adults in quick events, and who remained as active in quick as in regular, then the quick and regular ratings would likely vary a lot less than the average variance. Two years ago my quick rating was 37 points higher than my regular rating, and now, after I increased my regular rating by 32 points and after the infusion of some rapidly improving kids into my quick events, it is still only 27 points lower.

Variable K and bonus points aren’t really age-related factors, because age doesn’t directly figure into their computation. It does figure in VERY indirectly, with regard to initializing ratings based on age.

It is true that younger players are more likely to have lower ratings, in part because of the age-based initialization formula, but also because they’re still learning.

Variable K also turns ratings into a non-zero-sum system, because the lower-rated player will have a higher K.
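A single hypothetical game shows the effect (the K values of 40 and 24 here are made-up illustrations, not taken from a USChess table):

```python
def expected(diff):
    """Logistic expectation for a rating difference."""
    return 1 / (1 + 10 ** (-diff / 400))

# Upset: a 1400 with K=40 beats an 1800 with K=24.
e_1400 = expected(1400 - 1800)        # ~0.09 expected score for the 1400
gain_1400 = 40 * (1 - e_1400)         # ~+36.4 points gained
loss_1800 = 24 * (0 - (1 - e_1400))   # ~-21.8 points lost
print(gain_1400 + loss_1800)          # ~+14.5 net points created by the upset
```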

However, there are players who don’t come into the ratings system until they’re adults and they can wind up with low ratings as well.

I was focusing on the still-learning aspect because young players are more likely to start when they still have a lot to learn, and thus end up receiving rating points from the players they beat as they improve. On average, starting adults are more likely to have already learned a fair amount and to start at a higher skill level, or are less likely to have the time available to spend on improving.

I have no idea how you got there. That’s completely false. The bubble represents the aggregate strength of a pool. There’s no breakdown to individual ratings.

We can consider an illustrative case of one to explain the concept of deflation, even though within the current context that’s clearly not the actual situation. Considering one is illustrative only.

There are times, Jeff, when I need to explain the concept of mortality risk to someone. If we do it with many people, it’s impossible to follow all the resulting cash flows - there are too many. So the concept is demonstrated on one person - but clearly we don’t have 10% of a person dying in a year. Sometimes concepts are illustrative.

Your second argument misses the point. There is a second rating system for the four players. The second system was only initially pegged to the first; it isn’t replicating the first (if it were, there would be no need for a second system).

Because of this, it’s not at all erroneous that the players’ ratings should be recalibrated by an outside impact - although I have to say, I’m not sure that I completely understand where you are going with this point.

I’m failing to see how the above is relevant to the discussion.

Except that what you describe doesn’t match what we see, Jeff. Look at the graphs Nolan did long ago.

If the data changes, I’ll need to change my opinion. Right now what you suggest doesn’t fit. I’ve already acknowledged that it is a contributing factor - but it can’t be the sole factor.

I think it would be very challenging to re-calibrate ratings via an external chess knowledge testing process. That’s not to say it would be an impossible task.

Part of the challenge is that people don’t learn chess skills in any particular order.

I once knew someone who could do the knight-bishop-king vs king mate with ease, but he’d consistently lose to 1800 players who couldn’t do that mate.

I’ve moved a discussion about the largest number of legal moves in a position to Largest possible number of legal moves in All Things Chess.

Everyone in this discussion should read Elo’s book. Especially the parts on calibration and the duty of a national rating system to maintain that calibration.

Cliff’s Notes version: Elo tried to establish the rating of 2000 as the approximate boundary between “amateur” and “professional” players. This is the ONLY anchor point in a system which otherwise floats freely. He left undefined precisely HOW to do this - but it’s clear that he intended this point on the scale to be “obvious” to the skilled observer. He also clearly stated that maintaining this anchor point (and, more generally, keeping ratings stable over time) was one of the responsibilities of a national rating system. Again, he didn’t say how to do that.

Note the use of the word “national” which implies a large rating pool, with a long history. Small pools are much harder to calibrate and keep stable. The main practical solution is to calibrate against a larger, more stable pool.

I don’t know if Elo anticipated what computers would do to chess knowledge. I think it is likely that today’s 2000 players know more about the game than did the Experts of his day.

I remember an article in Chess Life on Art Bisguier, describing him as an ‘ordinary’ grandmaster. I never actually played a game against Art, but I watched him hold court in the skittles room; he knew the game VERY well.