ELO Rating Scale

Without getting too technical, is there an absolute or theoretical maximum rating achievable in the ELO rating system?

Nope, not unless the people running the rating system impose it arbitrarily.

If a new player defeats three 2600’s out of four, his rating comes out about 2800. If a new player defeats three 2800’s out of four, his rating comes out about 3000. And so on: 3200, 3400, 3600, 3800, etc.

Bill Smythe
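As a rough sketch of the arithmetic behind the example above, here is the common linear performance-rating approximation (average opponent rating plus 400 times wins minus losses, divided by games played). This is only a rule of thumb, not the exact formula any particular federation uses.

```python
def performance_rating(opponent_ratings, wins, losses):
    """Linear approximation: average opponent rating + 400 * (W - L) / N."""
    n = len(opponent_ratings)
    return sum(opponent_ratings) / n + 400 * (wins - losses) / n

# Three wins out of four against 2600-rated opposition -> about 2800.
print(performance_rating([2600] * 4, wins=3, losses=1))  # 2800.0

# The same score against 2800s -> about 3000, and so on up the ladder.
print(performance_rating([2800] * 4, wins=3, losses=1))  # 3000.0
```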

If you’re wondering about the Elo strength of high-level chess engines, here’s a bit of a primer.

  1. Chess engines are no longer rated in engine-vs-human play to get an official ELO rating. As far as I know, that hasn’t really been done since the ’80s or maybe the early ’90s. When Fritz 5 came out, it kinda closed the door on engine-vs-human matches, and when Rybka and other similar engines came out, it slammed the door shut. A couple of interesting notes here: I think it was a British grandmaster who played the last “human-vs-engine” match, and he lost badly. I’m not sure of the year, but it was at least 5 or 6 years ago, maybe even more. I forget who played, but another GM played a series of games in which he got pawn odds, and he lost badly also. I remember that was Rybka 3, though.

  2. Chess engines are run with various time controls, but there are a few parameters in place to make the relative ELOs somewhat comparable. Generally, the time controls are fairly quick; although there are some slower controls, none are nearly as slow as what you’d find in a high-level FIDE tournament.

  3. In order to measure the relative strength of a chess engine, there has to be some sort of yardstick as a “base” ELO. Whichever chess engine gets that particular job is given an arbitrary ELO, and that can differ between the various websites that rank chess engines.

2900 ELO seems to be the preferred yardstick. So programs that do better than the yardstick will have a higher ELO, and programs that do worse will have a lower one.
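As a rough sketch of what anchoring to a yardstick means: if an engine scores a fraction s of the points against opposition pinned at the arbitrary anchor value, the Elo model places it -400 * log10(1/s - 1) points away from that anchor. Real engine rating lists use large round-robin pools and proper fitting tools; this is just the two-player special case, with the 2900 figure taken from the post above.

```python
import math

YARDSTICK = 2900  # arbitrary anchor value, per the post above

def rating_from_score(score_fraction, anchor=YARDSTICK):
    """Elo rating implied by a long-run score against the anchor:
    rating = anchor - 400 * log10(1/s - 1)."""
    s = min(max(score_fraction, 1e-6), 1 - 1e-6)  # keep the log finite
    return anchor - 400 * math.log10(1 / s - 1)

print(round(rating_from_score(0.50)))  # scores 50% -> 2900, same as the anchor
print(round(rating_from_score(0.64)))  # scores 64% -> roughly 3000
print(round(rating_from_score(0.36)))  # scores 36% -> roughly 2800
```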

The Sagarin ratings for sports teams in USA Today use an Elo Chess system with 100 at the top and 0 at the bottom. There is no real reason they couldn’t use 100,000. The numbers themselves don’t mean anything; it’s the relationship on the scale that does.
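A small demonstration of that point, assuming the standard Elo expected-score formula: predictions depend only on rating differences, so shifting every rating by the same constant changes nothing, and a compressed 0-to-100 scale works just as well if the spread constant (400 on the chess scale) is compressed along with it.

```python
def expected_score(r_a, r_b, spread=400):
    """Expected score for player A; depends only on (r_a - r_b) / spread."""
    return 1 / (1 + 10 ** ((r_b - r_a) / spread))

# 1500 vs 1700 on the familiar chess scale...
print(expected_score(1500, 1700))        # ~0.24
# ...is identical after shifting everyone up by 99,000 points...
print(expected_score(100500, 100700))    # ~0.24
# ...and on a 0-100 scale with a proportionally smaller spread.
print(expected_score(15, 17, spread=4))  # ~0.24
```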

In his book, Elo described his rating system as “an open-ended floating scale” (p. 18) and thus theoretically a rating can go below zero and above 3,000.

Steven Craig Miller

At one time I do believe there were negatively rated players in USCF. I believe that’s part of what caused us to introduce a floor of 100.

During the scholastic boom, several big clubs (I know the Rochester Chess Center was one) would have waves of players come into the system in batches. Each batch would lose to the more experienced players above them for the most part, leading to lower and lower ratings.

I remember asking at a NYS Chess Association meeting what the difference was in playing strength between a player rated 50 and one rated 185. There simply wasn’t any to speak of.

Of course, but then both players fall in the same class. Elo wrote: “When all the participants in a tournament fall in the range of one class, good all-around competition results. No one is badly outclassed, and no one badly outclasses the field. In such a class the poorest player on a good day will play about as well as the best player on an off day” (p. 17).

Furthermore, technically speaking the Elo rating system doesn’t measure “strength” (as an internal quality of a person) but rather it measures performance. Thus a player with a rating of 185 simply performed better than a player with a rating of 50 (assuming that both players’ ratings are based on a sufficient number of games).

Steven Craig Miller

I don’t have a statistical argument to back this up, but I would guess that at a low enough rating, the players are making almost random moves, and the outcome of a game is more a random event than a demonstration of greater chess skill by one of the participants.

In looking at ChessBase’s Mega Database and the games between players rated 100 to 200, there was nothing “almost random” about those games. Actually, I was surprised. I teach high school chess, and I’ve seen my share of “almost random” games. Perhaps the games in this database are not truly reflective of most people with a rating below 200. When I looked at games played between players rated between 200 and 400, I saw many more truly horrible games, but I would not characterize them as “almost random moves.” On the other hand, I can understand you wanting to call the outcome of many of these games “a random event.” If a player drops a queen and then a dozen moves later his or her opponent drops a queen, the result of such a game doesn’t seem to be based on any demonstration of chess skill.

In tests that have been run, the statistical assumptions regarding performance (ie, the expected performance function) appear reasonably valid for play against higher rated players all the way down to players rated around 200, using pre-event ratings.

In other words, a 200 player will do about as well against a 500 player as a 1000 player does against a 1300 player. The curve flattens somewhat for players rated above 1400, probably because there are fewer rapidly improving players above 1400, so fewer upset wins due to being underrated.

Below 400, the large number of players with ratings between 100 and 150 has an impact on the actual performance against lower rated players.
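To spell out the equal-gap claim above numerically: under the Elo expected-score formula, a 300-point gap predicts the same result whether the pairing is 200 vs 500 or 1000 vs 1300, and that prediction is what the actual results get compared against.

```python
def expected_score(rating, opponent):
    return 1 / (1 + 10 ** ((opponent - rating) / 400))

# Both pairings are 300-point gaps, so the model predicts the same score
# for the lower-rated player (about 0.15).
print(expected_score(200, 500))    # ~0.151
print(expected_score(1000, 1300))  # ~0.151
```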

What type of tests?

Actually, most of the gap between 185 and 50 was due to the difference in the ratings of the opponents they played during the provisional period.
Changing the original number to 250 or even 300 doesn’t change the argument much at all.

If you have seen the actual games, random is pretty close to what was happening.

Most recently, by looking at players grouped in 100-point intervals and how they do against opponents within 10 points of being 0, 100, 200, or 300 points higher or lower.
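A sketch of what such a test might look like over a database of rated games. The `games` list of (rating_a, rating_b, score_a) tuples is a hypothetical placeholder for whatever data source is actually used, and the real analysis no doubt differs in detail.

```python
from collections import defaultdict

def expected_score(rating, opponent):
    return 1 / (1 + 10 ** ((opponent - rating) / 400))

def score_by_gap(games, targets=(0, 100, 200, 300), tolerance=10):
    """games: iterable of (rating_a, rating_b, score_a) tuples.
    Returns {gap: (actual avg score, predicted avg score)} for the
    higher-rated player, keeping only games whose rating gap falls
    within `tolerance` points of one of the target gaps."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # gap -> [actual, predicted, count]
    for ra, rb, score_a in games:
        hi, lo = max(ra, rb), min(ra, rb)
        hi_score = score_a if ra >= rb else 1 - score_a
        gap = hi - lo
        for target in targets:
            if abs(gap - target) <= tolerance:
                totals[target][0] += hi_score
                totals[target][1] += expected_score(hi, lo)
                totals[target][2] += 1
    return {g: (a / n, p / n) for g, (a, p, n) in totals.items() if n}

# Hypothetical usage against a list of rated games:
# for gap, (actual, predicted) in sorted(score_by_gap(games).items()):
#     print(gap, round(actual, 3), round(predicted, 3))
```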

Have you tested the provisional ratings of adult players? I’ve seen at least one horrible example that might have distorted not only that player’s rating, but his opponents’ ratings as well.

The USCF does not and should not change the ratings formula because of one or even a handful of anomalous cases.

There have been some problems noted with provisional ratings, especially ones involving players who win all their games in their first tournament or two. This can result in ratings that are obviously too high. The ratings committee has not yet made any formal recommendations about them, so what the office has been doing is handling them under the ED’s discretionary authority as such cases are brought to the attention of the ratings department.

You should notify the ratings department about the example you have in mind.
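For illustration only, here is why a perfect first-event score is awkward for any performance-based estimate (this is the simple linear approximation again, not necessarily the formula the USCF office actually uses): an all-win record only bounds the rating from below, so whatever finite number the formula produces is an extrapolation and can easily land too high.

```python
def provisional_estimate(opponent_ratings, wins, losses):
    """Linear performance approximation, used here purely for illustration."""
    n = len(opponent_ratings)
    return sum(opponent_ratings) / n + 400 * (wins - losses) / n

# A newcomer who sweeps four 1200-rated opponents gets pegged at 1600
# no matter how strong he really is; a 4-0 score against this field is
# equally consistent with being a 1700 or a 2300.
print(provisional_estimate([1200] * 4, wins=4, losses=0))  # 1600.0
```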

That was Mickey Adams losing 5.5-0.5 to Hydra in 2005.

Rybka won 5.5-2.5 against Jaan Ehlvest in 2007, giving odds of a different pawn each game.

I assume the a- and h-pawns were two of the Rybka wins, because of the open files.

Bill Smythe

Rybka won with the h-, g-, f-, and d-pawns missing, drew with the e-, b-, and a-pawns missing, and lost with the c-pawn missing. The order of the games started with the h-pawn and ended with the a-pawn, so perhaps the fact that it did much worse without the queenside pawns is more due to the fact that Ehlvest learned how to play better in these situations than to something inherent in the starting positions.

There is lots of information about the match at rybkaforum.net/cgi-bin/rybkaforu … pl?tid=519.

I’ve seen problems with provisional ratings on the low end. A player might play in his first tournament after playing online chess. His chess skills are growing, but he has never had to keep a scoresheet or punch a clock. He goes to a local low-cost tournament, held at a G/30 time control, and does terribly because he can’t manage his time and is horribly distracted by the scoresheet. He starts out with an incredibly low rating. However, in his second tournament he is not so flustered, and he beats people rated hundreds of points higher.

(This phenomenon led to two different 1000+ point upsets at my last tournament.)

I don’t think this is actually a “problem” that needs to be “fixed”. It’s just one of those cases where, with insufficient data, the algorithm makes a poor estimate of future performance based on past performance. For the most part, the system works.

To directly address the OP: there is no limit to how high a rating can go using the ELO system, but there is a practical limit. It can only be a few hundred points above your nearest rivals, and then only if you consistently beat those rivals.
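A sketch of that practical limit, assuming a fixed K-factor of 16 and rivals whose ratings are held fixed at 2800 (both simplifications): even winning every single game, the per-game gain shrinks as the gap opens, so the rating races a few hundred points clear of the field and then crawls.

```python
def expected_score(rating, opponent):
    return 1 / (1 + 10 ** ((opponent - rating) / 400))

K = 16          # illustrative fixed K-factor
RIVALS = 2800   # nearest rivals, held at a fixed rating for simplicity
rating = 2800.0

for game in range(1, 1001):
    rating += K * (1 - expected_score(rating, RIVALS))  # win every game
    if game in (10, 100, 1000):
        print(game, round(rating - RIVALS))
# Roughly +70 ahead after 10 straight wins, +370 after 100,
# and still under +800 after 1000.
```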