I believe there are probably some rated players who are better than their rating and are competing at a lower level than they should. I think the USCF should create a rating test to determine what a person’s real rating should be, then take the score and adjust their rating so it reflects their actual skill, so that we don’t have to deal with mis-rated players and such.
USCF already administers such tests. They are more commonly known as “tournaments”.
Even if creating such a test were possible, how could it be administered? What would keep a player from deliberately answering questions incorrectly or just randomly filling in the blanks? How do you address the problems that ordinary standardized tests face? The skill sets of individual players in every class are so diverse that it is not easy to establish clear criteria to identify a player’s class or performance except through competition.
Each player has strengths, weaknesses, and gaps in his/her knowledge. For example, a class B player might be very good at tactics or have a good understanding of the opening but be poor in endgames. A C player might be poor in tactics but comfortable in simplified positions, enough so to defeat that B player, yet he might go down heavily against peers who know his weaknesses. Ratings and class designations are imperfect measures of the individual’s knowledge. So are standardized tests. At best, they indicate a wide macro range of knowledge and skill. Not only are they imperfect measures of where you stand in the present, they are equally imperfect at predicting future performance. That is why we play the games rather than fill in bubbles.
Moderator Mode: Off
The rating system we use is a performance rating system. It measures a person’s performance in rated Chess games. Chess performance in a game is just that and nothing different.
A person may have a lot of knowledge about Chess. He may know the best way to analyze a position or find the sharpest and best tactics in a position presented to him. This same person may not be able to perform in a complete chess game to the level of his knowledge. His performance rating will be lower than his knowledge would predict. This person would perform much better on a test than he would in a Chess game.
There are some that play very well and have little knowledge of the game. These players have higher ratings than their knowledge would predict. This person would perform much worse on a test than he would in a Chess game.
Because Chess game performance is not the same as Chess knowledge, a Chess Rating Test cannot accurately assess a person’s rating.
After I read the top post in this thread, I thought surely Micah Smith must be at it again.
Bill Smythe
Yes, but we could have mock tournaments to determine the true ratings.
Ratings reflect performance, not strength. While there are always some issues with any rating system (deflation, for example) that need to be addressed by monitoring and correcting the system, the concept that someone is “better than their rating” is fundamentally flawed. They ARE what they ARE.
That doesn’t mean that someone’s performance measures up to their knowledge, or tactical ability, etc. Performance encompasses many factors - of which knowledge is just one.
Back when I played chess regularly, twenty years ago, I would occasionally catch much higher rated players napping with move sequences which were over the horizon they credited me with. Today that happens much less often. I credit the near-universal custom of playing blitz and rapid chess intensively, starting at an early age and continuing into middle age, for the change. The pattern recognition skill of the average player has improved strongly.
I think there is a big difference between the over-the-board skill of players and the ability to reason through complicated positions. Both are important in playing chess well. Knowledge of the “books” adds a third factor. I believe that we will always have days when we hit a sequence of games that fall in our ability zone, and days when our opponents make strong moves we just don’t understand, to our regret.
The current rating system may not be perfect, but it does have the virtue of continuously updating as results come in. To apply a “test” would be like using correspondence ratings to assign over-the-board ratings. Apples and oranges, anyone???
Lockley
Do you know somebody who is underrated?
Nominate them for higher rating!
/end sarcasm
I can see the cause of OP’s discomfort. I have the answer to this vexing problem.
At the end of each game, the players mark three results (in different columns of course) on the pairing sheet:
- the game result
- the rating each player played at: you mark how well you played and how well the opponent played by assigning a rating to each.
We’ll use the game results for determining tournament winners, but to avoid any appearance of “conflict of interest”, we will use only the subjective ratings in determining a player’s performance rating. For simplicity, consider the performance rating to be calculated on a game-by-game basis as
Rp = (Rs + 2 Ropp)/3
where
Rp is performance rating
Rs is self-report of your own playing strength in the game
Ropp is opponent’s report of your playing strength in the game.
These will be averaged and used to update the published rating according to the old Elo formula. As players learn how to estimate playing strengths and the volatility of their ratings decreases, their k factor will decrease, same as it does now.
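Just to make the arithmetic concrete, here is a quick Python sketch. The per-game formula is the one above; how the averaged performance feeds back into the published rating is left vague, so the fractional step toward the average below is purely my own guess, not part of the proposal.

# Toy sketch of the proposal above. The per-game formula is from the post;
# the k_fraction step toward the averaged performance is my own assumption.

def game_performance(self_report, opp_report):
    # Rp = (Rs + 2*Ropp) / 3 -- the opponent's estimate counts double
    return (self_report + 2 * opp_report) / 3

def update_rating(old_rating, self_reports, opp_reports, k_fraction=0.25):
    # Average the per-game subjective performances, then nudge the
    # published rating a fraction of the way toward that average.
    perfs = [game_performance(s, o) for s, o in zip(self_reports, opp_reports)]
    avg_perf = sum(perfs) / len(perfs)
    return old_rating + k_fraction * (avg_perf - old_rating)

# Example: a 1600 who reports he played like 1800/1750/1800,
# while his three opponents score him at 1850/1700/1900.
print(round(update_rating(1600, [1800, 1750, 1800], [1850, 1700, 1900])))  # ~1651

Run it with a 1600 whose opponents think he played like an 1800 and you can watch benefit (2) in action.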
The new approach has several benefits:
(1) It’s real easy to calculate.
(2) It is expected to be mildly inflationary, in case we need any more rating inflation. Players tend to overestimate their rating, and especially if the player won, his opponent the loser will probably overestimate the winner’s rating as well.
(3) It is expected to be immune to sandbagging, since the player’s self-report is only half as important as his opponent’s estimate.
(4) It avoids the nonlinearity, the curved response to rating difference in the current rating formulas or the sharp cutoff at a rating difference of 350 in the old formulas.
(5) and too many more to mention.
The EB hasn’t said exactly when they are scheduling the formal vote on this plan, but maybe they want to let the guys on the Ratings Committee get their expected objections out of their system first (this will cancel out a lot of what those stats guys have built, and they’ll probably tell us that we have to listen to FIDE’s objections, and whatever) before they go ahead and pass this much-needed reform.
Strictly as a tangent…
I am surprised that Fritz does not already have a feature that reports its estimate of the Elo ratings for both White and Black for any one game that Fritz is asked to analyze via “Blunder Check” or “Compare Analysis” etc.
I feel confident that Fritz will have such a feature eventually.
The feature would be less complicated than many programming feats they have already accomplished, and it could leverage games and data from actual tournaments, where the Elo ratings are known.
There would be some value in being able to objectively compare the ratio of Elo rating to Fritz-assessed playing skill over a decade or two, to check for Elo inflation, etc. Of course, this would be possible only if the feature offered, in 2042, to analyze with the same algorithm that was first used in, say, 2027.
An adult who is considering a return to tournament chess, two decades after retiring from chess as a new father, might like Fritz to tell him how much his chess skills have declined before he decides when to reenter tournament play.
(in all seriousness this time …)
This is an interesting question, a sort of obvious-sounding one. An obvious-sounding research question that you hadn’t thought of, and that is useful to answer, is a fantastic research topic! But if this one has been answered in useful form, I haven’t seen it:
How can a computer estimate your playing strength in one game or a few games?
That requires evaluating your rating from your moves, whereas our rating system is much easier and just estimates it from your results.
CL’s solitaire chess column purports to estimate your rating strength based on your score in guessing the moves of a game. That’s just what we need, but how do you automate it?
In the case of Fritz or another computer program, the most obvious input data would be one or a few games against Fritz. Again, it’s not enough to say “you beat me, so you get my rating + 400”, and most especially not “I beat you, so you get my rating - 400”, because there are a lot of people who can only dream of being a mere 400 points weaker than Fritz. It has to evaluate your moves the way Pandolfini does.
One approach is to use the computer’s position evaluation ability, then equate its conclusions to a rating strength. The computer does this evaluation already in its internal calculations, for pruning the variation tree and evaluating positions at the end of variations. The evaluation is supposed to be holistic, incorporating “positional” considerations (pawn structure, piece activity, etc.) as well as material, though in some programs it might be very rudimentary but compensated by pure brute force of calculation speed.
So anyway, if you’re playing Fritz, and before your move your position (as evaluated by Fritz) is -0.5 pawns (you have a 0.5-pawn disadvantage), and after your move it has dropped to -1.3 pawns, then you’ve lost 0.8 pawns with that move. If you lost 0.8 pawns on every move until you were checkmated, what rating level of play would that correspond to? I don’t know, but it might be a way to start on this question.
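To give that idea a shape, here is a toy sketch in Python: average the per-move eval drops and look the result up in a calibration table. The thresholds and ratings in the table are completely invented; building a real calibration from games with known ratings is exactly the open problem.

# Toy sketch of "rating from average eval loss per move". The calibration
# table is invented for illustration only.

def average_loss(evals_before, evals_after):
    # Mean evaluation drop (in pawns) across a player's moves, from the
    # player's point of view; an improvement counts as zero loss.
    losses = [max(0.0, before - after)
              for before, after in zip(evals_before, evals_after)]
    return sum(losses) / len(losses)

# Hypothetical calibration: (maximum average loss per move, estimated rating).
CALIBRATION = [(0.05, 2500), (0.10, 2200), (0.20, 1900),
               (0.35, 1600), (0.60, 1300), (float("inf"), 1000)]

def estimate_rating(evals_before, evals_after):
    loss = average_loss(evals_before, evals_after)
    for threshold, rating in CALIBRATION:
        if loss <= threshold:
            return rating

# The example above: one move takes the position from -0.5 to -1.3 pawns.
print(estimate_rating([-0.5], [-1.3]))  # 0.8 pawns lost per move -> 1000 here

A real version would also have to deal with the fact that one huge blunder and a steady drip of small errors can produce the same average.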
How about a series of individual tests:
Knowing how the knight moves: 400 points
Calling it a ‘horsie’: - 300 points
Fritz could today be fed a .PGN of thousands of class level games played in 2012, where the Elo ratings of all players are included in the .PGN.
For each move, in automated analysis mode, Fritz could compare its eval (position evaluation number, in centipawns) of its best move versus the eval of the actual move notated in the .PGN.
The eval differences would show relatively distinctive distributions and other patterns at different rating levels. Fritz would be programmed to recognize those patterns, along various scales or whatever.
These results would form an eternal standard benchmark database.
Then decades later…
A chess player gives Fritz a .PGN of his latest 10 games and clicks “Guess My Elo”. His eval-difference patterns are calculated and then compared against the standard benchmark database of patterns generated 50 years earlier.
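As a back-of-the-envelope sketch of what “Guess My Elo” might look like in Python, assuming the PGN parsing and the engine analysis happen elsewhere and each game reduces to a list of per-move eval differences in centipawns: bucket the reference games by their known Elo, store the mean eval difference per bucket, and later match a new player to the nearest bucket. The bucket size and the sample numbers below are made up.

from collections import defaultdict

# Toy benchmark: each reference "game" is (known_elo, [per-move eval
# differences in centipawns]); real PGN parsing and engine analysis are
# assumed to happen elsewhere.

def build_benchmark(reference_games, bucket_size=100):
    # Map Elo bucket -> mean centipawn difference for that bucket.
    buckets = defaultdict(list)
    for elo, diffs in reference_games:
        buckets[(elo // bucket_size) * bucket_size].extend(diffs)
    return {b: sum(d) / len(d) for b, d in buckets.items()}

def guess_my_elo(benchmark, my_diffs):
    # Return the Elo bucket whose mean centipawn loss is closest to mine.
    my_mean = sum(my_diffs) / len(my_diffs)
    return min(benchmark, key=lambda b: abs(benchmark[b] - my_mean))

# Benchmark built from 2012 games with known ratings (toy numbers)...
benchmark = build_benchmark([(2200, [10, 20, 5]), (1600, [60, 90, 40]),
                             (1200, [150, 200, 120])])
# ...queried decades later with a new player's eval differences.
print(guess_my_elo(benchmark, [55, 70, 80, 45]))  # -> 1600 with this toy data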
That’s an interesting line of research with several possible fruitful outcomes. You need several because you need a couple breakthroughs to make this work.
At the gross level, I’d say chess is too complex for this to work. So patterns in the patterns will have to be identified, by a lot of data mining. Those patterns will be interesting.
Evaluating player strength by doing computer evaluation of their individual moves is exactly what Ken Regan has been working on for the last few years. Search for “Ken Regan IPR” and you’ll find a bunch of papers.