If the Ratings Committee wants to run tests, it may already have the data it needs, it can request that data, or it can request that someone else perform those tests.
I’m not doing anything more on this issue unless directed to do so by the office. I’m trying to reduce my post-retirement involvement with US Chess, not increase it.
Aside from the staleness issue, I don’t see any problem with differences between our 6 ratings systems or with FIDE ratings.
We do occasionally deal with staleness issues, primarily for foreign players who have a stale US Chess rating from years ago. In such cases, we generally reset the player’s rating manually to a higher value based on the FIDE rating, using a FIDE-to-USChess conversion formula.
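The thread doesn’t give the actual conversion formula, so the coefficients below are purely illustrative placeholders, not the formula US Chess actually uses; this is just a sketch of what a piecewise-linear FIDE-to-USChess conversion might look like:

```python
def uschess_from_fide(fide: int) -> int:
    """Approximate a US Chess rating from a FIDE rating.

    NOTE: the breakpoints and coefficients here are hypothetical
    placeholders; the real US Chess conversion formula is not given
    in this thread.
    """
    if fide <= 2000:
        return round(0.94 * fide + 180)  # hypothetical low-range fit
    return round(1.02 * fide + 20)       # hypothetical high-range fit
```

A piecewise form is plausible because rating-scale differences tend to vary by strength band, but again, the numbers above are invented for illustration.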
I think P0 is a TERRIBLE way to try to indicate that information, and its impact on ratings calculations is probably different from what you think it is: it would cause those players to use the special ratings formula until they got back to having 9 or more games behind their pre-event rating.
It also resets a player’s status to provisionally rated, which might affect their ability to play in certain types of events.
A rating of 1550S (to indicate stale) might be better, but I don’t think it really does much. Any TD who wants to check how stale someone’s rating is can already look that up on MSA, and it doesn’t indicate what, if anything, is being done to make the player’s rating non-stale.
I thought that, when a player with an established regular rating plays in his first quick tournament, his pre-event rating was initialized to his regular rating P/10 (provisional based on 10 games). That would mean that the special formula would never be used for that player.
That same policy could be implemented for a player listed as P/0 due to staleness.
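The initialization policy described above could be sketched roughly as follows (this is an assumption about the mechanics as described in the post, not confirmed US Chess code; the 26-game threshold is the usual cutoff for an established rating):

```python
def initial_quick_rating(regular_rating: int, regular_games: int):
    """Initialize a player's pre-event quick rating for his first
    quick tournament, per the policy described in the thread.

    An established regular player (26+ games) starts at his regular
    rating, provisional based on 10 games, so the special provisional
    formula (used below 9 games) is never triggered for him.
    Returns (rating, P number).
    """
    if regular_games >= 26:          # established regular rating
        return regular_rating, 10    # "regular rating P/10"
    return regular_rating, regular_games
```

The same initialization could then be applied to a player reset to P/0 due to staleness, as the post suggests.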
I wouldn’t expect you to. Right now I wouldn’t expect anybody else to do so, either, in these pandemic times that are also causing financial problems. But it can’t hurt to talk about it and exchange ideas, for future reference.
There’s currently no way to tell P0 due to staleness from P0 due to other reasons. And setting to P10 still means the player is considered provisionally rated even if there is a history of hundreds of games in that ratings system. I don’t think people will like that.
This is all true. So, maybe the P number should be divorced from the staleness flag in some way. Perhaps your idea of adding “s” to the rating (when stale) would do at least part of the job.
For that matter, there could be a new way to calculate the P number for quick ratings. For a player in his first quick tournament, his quick P number could be initialized to his regular P number or 26, whichever is less. For each of that player’s subsequent quick tournaments, the P number would (of course) be increased by the number of games played in that tournament.
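That proposal is simple enough to state in code; a minimal sketch (the function and parameter names are mine, not US Chess’s):

```python
def quick_p_number(regular_p: int, quick_tournament_games: list[int]) -> int:
    """Proposed quick P number: initialize to the regular P number or
    26, whichever is less, at the player's first quick tournament,
    then add the games played in each subsequent quick tournament."""
    p = min(regular_p, 26)
    for games in quick_tournament_games:
        p += games
    return p
```

So a player with 40 regular games who then plays two quick events of 4 and 5 games would carry a quick P number of 26 + 4 + 5 = 35.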
A player with a stale rating could be allowed to keep his old P number when he returns to the scene. The player’s “staleness factor” could still be used, as in my original post, to determine the extent to which his regular and/or quick rating would be used to adjust his pre-event rating at the start of the rating process.
Has anyone considered that the root cause of the staleness of the quick ratings is simply that quick activity never grew to the levels we see in regular ratings? I would assume that postal would have a similar issue, but I don’t see people complaining that those ratings aren’t in sync.
Correspondence players are a different breed; they seem to have more patience than most members. They’re also a small but fairly stable population, unlike the OTB population, where we have 25K or so new members come in each year, most of them not sticking around more than a year or two. Whether the online chess population is a stable crowd isn’t clear yet, and how many of the players active in online chess during the shutdown will stay active once OTB chess becomes more available is another unknown.
Here’s a table showing the number of unique IDs in the various ratings systems by calendar year:
[code]rs     2020   2019   2018   2017
R     41286  77623  76538  76742
Q     35576  69327  68424  68065
4      4897    782    713    595
B      2344   8105   8230   7711
5      2289    972    863    698
S       667      0      0      0
C       332    449    446    450
[/code]
I think a very noticeable case is the youth player who plays at fast time controls and later, when much stronger, plays at long time controls, resulting in 600-point gaps between the regular rating, which shows the player’s improvement, and the quick rating, which does not.
Whether this is actually the primary or most important case I don’t know, but because of the extreme differences apparent under the situation, it is highly memorable.
It is worth noting that even with huge increases in the number of online players this year, the number of online-quick players is only about 12% of the number of OTB-regular players this year, and 6% of what we saw in calendar 2019.
Yes. The rating system intentionally has an upward bias to deal with improving players. Fewer games means less uptick. Larger bonuses (from larger absolute changes) partially, but not fully, correct for that.
A rating system for correspondence games would not need that level of bias built in, because the players are generally more mature both in age and in playing strength.
The bonus system does an OK (but IMHO not great) job keeping up with rapidly improving players who remain active in that ratings system, but it appears to take anywhere from 25 to 50 games to deal with a major ratings gap (say, 400 points) due to staleness. And we’ve seen staleness gaps much larger than that.
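To see why closing a large gap takes dozens of games, here is a rough back-of-the-envelope simulation using a plain constant-K Elo update (an assumption for illustration only; the actual US Chess formula uses a variable K plus bonus points, so real convergence differs and can be faster or slower):

```python
def games_to_close_gap(gap: float, k: float = 32.0, threshold: float = 50.0) -> int:
    """Count games needed to shrink a staleness gap below `threshold`
    rating points, assuming a constant-K Elo update and a player who
    scores at the level of his true (higher) strength.

    K=32 and the 50-point threshold are illustrative assumptions,
    not US Chess parameters."""
    games = 0
    while gap > threshold:
        # Expected score of the underrated player against opposition
        # at his nominal (stale) rating level.
        expected = 1.0 / (1.0 + 10.0 ** (gap / 400.0))
        # Each game scored above expectation shrinks the gap.
        gap -= k * (1.0 - expected)
        games += 1
    return games
```

With these toy parameters a 400-point gap takes on the order of 15 games to mostly close; with the smaller per-game gains of the real formula at established-player K values, the 25-to-50-game range mentioned above is plausible.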
Correcting staleness gaps seems to me to be a two-stage process. The first stage is identification. The second is closing the gap as quickly and accurately as possible.
In many cases, we have data from other ratings systems, including FIDE, that would help us identify staleness gaps in specific ratings systems, but we don’t always have that to help us. It might be possible to infer a staleness gap retroactively based on multiple very strong performances in one or more ratings systems. I know Mark Glickman has some thoughts on that matter, but it is basically a data analysis task.
There are a number of possible alternatives to the second stage, but that’s not my area of expertise.