Sample XML file for ratings supplements

Here’s my first cut on a sample XML file for rating supplements. I’m posting this one here as opposed to putting it on the web for now.

I haven’t changed the ‘Quick’ system to something else yet, since there isn’t any agreement as to what to call it yet.

I haven’t written a data dictionary file for this one yet; I think I’ll wait until it’s been hacked at a bit before I try to write that. :slight_smile:

Some other notes:

I’ve added a field for the Title/Norm concept that was approved in 2003 and will be implemented later this fall. This would show the highest norm-based title that someone has earned, but it would not show his status towards higher titles because that’s not relevant to the ratings supplement. (We may show that information on MSA at some point, though.)

I’ve also included fields for someone’s floor and their peak rating, though we only have peak ratings information back to late 1991.

I’ve included a field for someone’s FIDE ID but not for someone’s current FIDE rating, country of registry or FIDE title, because those can change and people really should get that information directly from FIDE.

I’m not showing OLM status, is that important in a supplement? (Technically I suppose someone could figure it out in most cases since OLMs have 2200 floors, but due to a variety of policy shifts over the years some people have 2200 floors without necessarily having played 300 games as a master, and I don’t have the authority to change those floors.)

I’ve added a field to show when someone’s USCF status is something other than active. That’s also different from their expiration date. When someone’s status is not active, that will always generate a validation error, even in a membership-exempt event.

In the specific case of an ID that is inactive because that person has more than one USCF ID, for the duplicate IDs I have a field for the correct USCF ID to use, but I am NOT showing any rating or expiration date for the duplicate IDs. (I think it’s better to include those IDs and indicate that they are not to be used, and why, than to just leave them off the list.)
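To make the status and duplicate-ID handling concrete, here is a rough Python sketch of how a program reading the list might treat those records. The record layout, the key names (`status`, `correct_id`), the sample IDs, and the function name are all made up for illustration; none of them are settled field names.

```python
# Hypothetical player records keyed by USCF ID. The keys "status" and
# "correct_id" are assumptions, not the final field names.
players = {
    "22222222": {"name": "Brown, Max", "status": "Active"},
    "22222221": {"name": "Brown, Max", "status": "Duplicate",
                 "correct_id": "22222222"},
    "22222223": {"name": "Marks, Jack", "status": "Deceased"},
}

def check_id(players, uscf_id):
    """Return (id_to_use, error); error is None when the ID is usable."""
    rec = players.get(uscf_id)
    if rec is None:
        return None, "unknown ID"
    status = rec.get("status", "Active")
    if status == "Active":
        return uscf_id, None
    if status == "Duplicate" and "correct_id" in rec:
        # Point the TD at the ID that should be used instead.
        return rec["correct_id"], "duplicate ID, use " + rec["correct_id"]
    # Any other non-active status is an error, even in a
    # membership-exempt event.
    return None, "status is " + status
```

Keeping the duplicate IDs in the list is what lets a program give the second kind of answer instead of just reporting an unknown ID.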

Would it make more sense to deconstruct the name fields into 4 parts, something like:
and

There may be typos in this, I haven’t tried running it through an XML parser yet:

1.0 09/01/2007 08/03/2007 11111111 Smith, Joe NE 2008/05/31 1602 15 22222222 Brown, Max NE 2008/05/31 1944 1800 A 1944 1800 1967 22222222 Duplicate 22222221 Brown, Max NE 22222223 Deceased Marks, Jack NE 2008/05/31 1944 1800 22222224 Inactive Owens, Ricky NE 2006/05/01 1801 1800 22222225 111111 Pawn, Gerald NE Life 2415 2200

To start with a trivial question, why is the main element called a “supplement”? It is a rhetorical question because I know the answer, which is based on the history of the various rating publications that the USCF has released.

Maybe you, me, and lots of TD’s know what a “supplement” is, but actually this is a feed containing information about USCF-rated chessplayers.

Some more questions:

  1. Why is there a quickPeak but no regPeak?
  2. Why does Joe Smith have regRatingGames (of 15) but none of the other
    players have this? Is that a provisional rating?
  3. Why does quickRatingGames not appear anywhere?
  4. What does titleNorm (of A) mean?

It seems to me that you are defining the format on the assumption that someone will only ever have at most three ratings, a “regular” Rating, a “quick” Rating, and a “FIDE” Rating. There could be more. ICC has a half-dozen ratings for everybody for various time controls and different variants of chess. People have proposed a “scholastic” rating. Speaking of ICC, people have proposed an “internet” rating. The CCA maintains its own rating list. The USCF might become an aggregator for various ratings from other organizations besides its own.

In short, a USCF member could have a lot of different ratings. It might be good to design the format with the possibility of more types of ratings in mind.

For example, something like

<player>
<name>Brown, Max</name>
<state>NE</state>
<uscfExpDate>2008/05/31</uscfExpDate>
<uscfID>22222222</uscfID>
<rating org="USCF" type="Quick" current="2030" peak="2118" floor="2000" />
<rating org="USCF" type="Regular" current="1944"  peak="1980" floor="1800"  />
<rating org="USCF" type="Scholastic" current="650" peak="700" floor="400"  />
<rating org="CCA"  current="2420" />
<rating org="FIDE" id="12345"  current="2430" title="IM" />
<rating org="ICC" id="whiteknight" type="Regular" current="2500" />
</player>
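A program reading that kind of record would not need to know in advance which ratings exist. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name is hypothetical, and the element and attribute names are only the ones from the example above, not an agreed format.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example record (attribute names are from the
# example, not from any approved schema).
SAMPLE = """
<player>
  <name>Brown, Max</name>
  <uscfID>22222222</uscfID>
  <rating org="USCF" type="Quick" current="2030" peak="2118" floor="2000" />
  <rating org="USCF" type="Regular" current="1944" peak="1980" floor="1800" />
  <rating org="FIDE" id="12345" current="2430" title="IM" />
</player>
"""

def ratings_by_kind(player_xml):
    """Map (org, type) -> current rating for every <rating> element."""
    player = ET.fromstring(player_xml)
    result = {}
    for r in player.findall("rating"):
        key = (r.get("org"), r.get("type"))  # type is None when absent
        result[key] = int(r.get("current"))
    return result

ratings = ratings_by_kind(SAMPLE)
```

Adding a sixth or seventh kind of rating changes the data, not the loop.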

The word ‘supplement’ may not mean anything to you, but that’s one word people will recognize; there are others.

This was not intended to be a complete and comprehensive example, just one to show the various fields, especially to someone familiar with the current formats (there are two of them, not quite compatible with each other.)

If there is a quickPeak there should also be a regPeak; I just didn’t show one.

You are correct that the game count was only shown for a provisional rating, there are other ways that status could be indicated.

As to what the norm-based titles are, I refer you to the document defining them:

math.bu.edu/people/mg/ratings/titles2003.pdf

I think it would be inappropriate (and possibly a violation of intellectual property rights) for a list issued by the USCF to include ratings issued by some other organization. There’s also the issue of knowing whether those ratings are current or even accurate.

I wasn’t advocating that the USCF should aggregate ratings without permission, and this was an example of how the need to distribute more than three ratings could arise. I gave other examples. I was trying to anticipate future requirements.

The thing about XML is that element and attribute names are fixed by the schema and these tend to get put into code. However element text and attribute values are not necessarily defined in the schema, although of course they may have semantics defined in a specification.

Thus,

<entity>
<field1>value1</field1>
<field2>value2</field2>
</entity>

is not as flexible as

<entity>
<field name="1">value1</field>
<field name="2">value2</field>
</entity>

With the second approach, it is no problem to add another “field”. With the first approach, it would require an upgrade to the schema because field names are represented by XML elements, and thus new tags would have to be added to the schema as new subelements of “entity”. This would likely require changes to the parsers, too, depending on how they had been written.
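The difference shows up in the reading code. In this Python sketch (standard library only; both function names are hypothetical), the first reader has the element names baked in, while the second picks up whatever fields are present, including a third one that the first form has no tag for.

```python
import xml.etree.ElementTree as ET

FIXED = "<entity><field1>value1</field1><field2>value2</field2></entity>"
GENERIC = ('<entity><field name="1">value1</field>'
           '<field name="2">value2</field>'
           '<field name="3">value3</field></entity>')

def read_fixed(xml_text):
    """Reader tied to known element names; a new <field3> would be
    silently ignored until this tuple is edited."""
    root = ET.fromstring(xml_text)
    return {tag: root.findtext(tag) for tag in ("field1", "field2")}

def read_generic(xml_text):
    """Reader driven by the data; new field names need no code change."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): f.text for f in root.findall("field")}
```

The trade-off is that a schema can no longer check field names for you in the generic form, so the allowed names have to be documented somewhere else.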

So, if we are going to distribute up to three ratings (Regular, Quick, and FIDE) per USCF member, and there is the possibility that it might ever be more than these three, or a different three, or perhaps a transition period to a different three (for example, a new interpretation of Quick), where one might want to provide both the old and new values, then it would be good to have the more flexible approach.

I have NO PLANS to put FIDE ratings in the USCF supplement files, for the same reasons I stated earlier: permission and timing/accuracy issues.

FIDE issues their lists just before the 1st of the month they take effect (January, April, July, October); the USCF now issues its monthly lists 3-4 weeks before they take effect. So when we issue the April 2008 USCF list in early March, FIDE will not have their corresponding April list available until several weeks later.

Moreover, since the July 2007 FIDE Ratings List came out, there have been 2 or 3 updates to it, mostly to deal with corrections and missing data. (Though the last time I checked they STILL hadn’t gotten Darwin Yang’s FM title noted on the July FIDE List, though he earned it nearly two years ago.)

We can show the FIDE ID because it shouldn’t change–much.

I’m guessing that if a player’s rating is still provisional then a peak would not be shown, and any peak actually shown would be for a non-provisional rating. Otherwise you run the risk of a provisional peak of 1950 (after a first-time tournament when beating players rated 700, 1000, 1100 and 1550 after the 1550 blundered), a current rating of 1325 and a floor of 100. People would be asking why the floor isn’t 1700.

Showing the number of provisional games rather than simply the status helps if somebody wants to do a calculation of the post-event rating.

Having a provision for multiple USCF ratings might be useful. The pairing programs that pull in supplement data would need to be changed to handle that. I wonder if an additional small XML document would need to be sent that just listed all of the various ratings so that the pairing programs could look at that to see which ones were available.
Some possible USCF ratings would be: Regular, Quick+Plus, Blitz, On-line, Correspondence.

Would suspended be one of the statuses? Or would that be one of the possible reasons for an inactive status?

I noticed that there is a “supplementEffectiveDate” in the definition, but I didn’t realize that this could be a date in the future until you made the comment above. Why is there a lapse between the computation date/issue date and the effective date? Why isn’t the supplement effective right away? Aren’t the ratings in the supplement the latest and greatest when they are computed and issued?

In the period after the supplement is “issued” but before it is “effective”, I gather that TD’s are expected to use the rating from the previous supplement. Why? As I understand it, if the current supplement is issued in Week 1, it becomes effective in Week 4, and not long after that, say in Week 5, the next supplement comes out. So from Week 5 through Week 9, when that next supplement finally becomes “effective”, TD’s are required to use ratings computed in Week 1 (up to two months old), even though they have in their hands ratings that are only between 1 and 4 weeks old. Why? This doesn’t make sense to me. Can you explain it please?
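For reference, the issued/effective distinction amounts to the selection rule sketched below in Python. The function name and the July dates are made up for illustration; the August 3 / September 1 pair is taken from the sample header earlier in the thread.

```python
from datetime import date

def list_in_effect(supplements, on):
    """supplements: list of (issued, effective) date pairs.
    Return the pair with the latest effective date not after `on`,
    or None if no list is in effect yet."""
    usable = [s for s in supplements if s[1] <= on]
    return max(usable, key=lambda s: s[1]) if usable else None

supplements = [
    (date(2007, 7, 6), date(2007, 8, 1)),   # hypothetical earlier list
    (date(2007, 8, 3), date(2007, 9, 1)),   # dates from the sample file
]

# On August 20 the September list has already been issued, but the
# August list is still the one in effect.
current = list_in_effect(supplements, date(2007, 8, 20))
```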

You’re not being realistic. In the first place, there is the matter of letting the player know what rating will be used. To take an extreme case, if a player spends a thousand dollars flying to the World Open, only to discover that his rating has gone up ten points overnight, kicking him up to the next section – well, the results won’t be pretty. Also, how does the TD obtain the “new” rating? He has to download and install the update, or perhaps even obtain a hard copy of the list. Most TDs do not live on line.

TDs do have the option of using “on-line current” ratings if they have Internet access at the tournament, but this must be announced in advance. This is almost a textbook case of a “major variation that might affect a player’s decision to enter.”

Well I don’t know how unrealistic I am being. Suppose a supplement became effective immediately upon issue. You’ve raised two objections. The first is that this would interfere with sandbagging. A player who has recently made Class A, or might very soon, will enter a tournament if he knows he will be put in Class B, but won’t if the new supplement is used and he doesn’t know whether he might already be in Class A in that supplement. My answer to this is: too bad. As you say, TD’s can already use the current rating from the web site if this is “announced” in advance. So the USCF “announces” on behalf of all TD’s that the most current rating available should be used in all USCF tournaments, and players can rely on their ingenuity to figure out how to sandbag without any cooperation from the USCF or TD’s. Players interested in sandbagging will have to keep track of their ratings on the web site and predict what will be in the current rating supplement on the day of the tournament. If they predict incorrectly, that is, again, too bad for them. Cry me a river. As you can tell, I don’t have much tolerance for this kind of calculation, and if using the most current rating on the day of the tournament interferes with it and makes it a bit unpredictable as to whether a player’s attempt to sandbag will be successful, I say: fine.

Your other objection is that TD’s need time to download the latest list. Downloading the list takes minutes at most. One of the duties of the TD before a tournament is to download the current list a day or two before the tournament. This doesn’t seem onerous. Certainly they don’t need to be given 4 weeks to get around to it.

We have this whole elaborate, presumably accurate, rating system designed to facilitate fair pairings, and then we come up with reasons why we should use two months out-of-date ratings when the current ratings are instantly available. Sheesh.

I asked above what A means, and you referred me to the title document. As it happens, I have previously read that document, and I did recognize that the titleNorm tag had something to do with this unimplemented title spec.

However, I still have the question. To be more precise: does A mean that Max Brown has obtained all the norms for Class A, and is therefore a “titled” A player, or does it mean that Max has one norm towards being a titled A player? If the former, the tag seems a little misnamed and should perhaps be just title. If the latter, how would it appear if Max had two norms towards A, or if he had norms towards several different titles? How would it appear when Max had actually obtained the title?

There can be an argument that a two month span between a tournament result and its applicability to a current rating might actually be too short.

If you are scheduling participation in a major tournament with the expectation of playing in a specific section then you would need to skip all tournaments that might affect your rating prior to the tournament. Since you might need to book a flight and reserve a hotel room multiple months in advance that means that you may be skipping a number of local tournaments. If there is a speeding up of how quickly rating changes become official then there will be a longer period of time that local tournament participation can be impacted by players unwilling to wreck their plans to attend a major event.

Also, there are TDs that are not internet savvy. Some do not use pairing programs. Considering that doing an on-line results upload helps with the speed of getting tournaments rated and problems resolved, and also gets a discount on the rating fee, the percentage of tournaments that are still sent in via the U.S. Snail indicates that there are a lot of TDs that do not go on line every week to update their rating lists.

Occasionally there is an error discovered in a rating supplement. The delay between the supplement first being available and when it becomes official gives time for such errors to be corrected before they are used to adversely impact a player’s tournament.

All things considered, the one to two month lag between a tournament being played and affecting a supplement seems fine. By the way, it used to be one to three months when there were only bi-monthly supplements.
If we ever move to semi-monthly supplements then I could see a half-month to one month delay being reasonable.

  1. The question isn’t sandbagging, it’s what the players will tolerate. Your argument is really against the existence of large class prizes. I have some sympathy with this point of view, but the chance of it being adopted in the foreseeable future is just about zero. And it’s pretty much irrelevant anyway, since you will get the same complaints (though perhaps not as loudly) in a scholastic tournament with trophy prizes.

  2. Even if I granted that instantly downloading new ratings was the TD’s “duty” (which I don’t), you are overlooking the still significant number of TDs who don’t use a computer, don’t use the Internet, or still use dial-up connections. TDs are mostly volunteers, not employees supervised by your IT department.

  3. You’re confusing precision with accuracy. Short-term fluctuations in a player’s rating mean almost nothing (in most cases they represent statistical noise). Rating differences have little or no predictive value below 100 points. Some weeknight tournaments last a month or more, so you can easily have a “current rating” which does not include the player’s last ten or twenty games. All your proposal would do is select a different (and less convenient) arbitrary snapshot point.

  4. You seem to be making the implicit assumption that tournaments and the rating system can be treated as mathematical abstractions. They’re not. Things that operate in the real world are messy.

What you are arguing is that large class prizes and Goichberg-style tournaments have so corrupted things that TD’s are compelled to make up sections that are 100 points, or so, different than they appear to be. That “Under 1800” section actually runs from 1650-1700 to around 1850-1900. The players over 1800 in the “Under 1800” section are there by virtue of the TD using a two months out-of-date rating list, which (courtesy of the USCF) is deemed to be the current list. And, of course, the players under 1700 don’t enter because they don’t want to spend big bucks if they don’t seem to have a good chance to win. Any 1700 or under player who does enter will be there only because he thinks he has some angle, and that his real rating is higher. There is no shortage of Class B competition and if all the players were interested in was playing Class B chess, they could do that at a local club for a dollar or two per game or on the Internet for nothing. So take away the lure of the big prize, and the illusion that they have some angle to win it, and they aren’t interested in entering a tournament. The people with the ratings above the section limit think they have gained an edge and are enticed to fork over the high entry fees. Actually they are suckers because everybody in the section thinks he has some kind of edge, and the wise guy with the 1850 or 1900 rating getting to enter the “Under 1800” section will actually be competing against a bunch of other wise guys – a couple of whom are undoubtedly wiser than he and in the section by virtue of even shadier ploys than arbitraging the effective dates of ratings.

And you are saying that this effective rating date arbitraging is the reality, and one of the engines behind tournament entries. TD’s and the USCF have to go along with it, you say, or else they won’t have any players. Indeed they have to enable it, by continuing to maintain a two month lag between actual ratings and “official” ratings – even though technically the USCF computes new ratings every night and posts them on its website continuously.

But, of course, nothing stops a TD from making up sections in his tournaments any way he wants. He could base them on player weight if he wanted to, and organize a fat section and a thin section. So, even if the USCF issued a new list on its web site every week and declared it effective immediately, a TD could declare the list current on the day entries open to be the one that will be used for determining classes. Qualification for the Under-1800 section for a September 1 tournament could be based on having a rating Under 1800 on August 1, when entries open, per the list current on August 1. This would meet the need for the players to “plan” which class they will be in. I don’t see why it is necessary for the USCF to play the “effective date” game. But I guess the TD’s want some cover from the USCF when they use an out-of-date list.

Regarding the “duty” to instantly download the new list: I didn’t say that. The only duty is if you are conducting a tournament to use the current list. You don’t instantly have to download every new list. For each tournament, you have to download the current list a reasonable time (like a day or two) before the tournament. In fact, you don’t really have to do that. As a TD, you can use any ratings you want, or none at all, to pair a tournament, as long as you announce what you are doing. Of course you might get a little heat from the players if you are using two months out-of-date ratings, because the USCF will no longer be giving you any excuse by delaying the “effectiveness” of the newer ratings.

The XML rating list format will not change existing USCF policy, so debating any policy changes in a thread dealing with the format of the list is likely to be ineffective and detract from the primary issues. (It appears to me that less than half of the comments in this thread have to do with how the data is to be formatted.)

Moreover, any proposed policy changes will take months if not years to get approved and implemented.

The change from bi-monthly lists to monthly lists was not without controversy and confusion and is still generating policy ripples.

For example, the Scholastic Council has been looking at what ratings lists to use for Nationals, I believe they’re going with something like the following:

If the tournament begins on or after the 10th day of the month, that month’s ratings list will be used, otherwise the previous month’s list will be used.

This is to give coaches a month (or so) from the point at which the USCF issues the list that will be used at that event to put their teams together based on the ratings that they know will be used, giving them time to make travel plans, etc. This may also affect what section the team is in.
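As a sketch of that rule in Python (the function name and the `cutoff_day` parameter are mine, and the 10th-of-the-month cutoff is only what I believe the Council is leaning towards, not adopted policy):

```python
from datetime import date

def list_month_for_event(start, cutoff_day=10):
    """Return (year, month) of the ratings list to use for an event
    whose first day is `start`."""
    if start.day >= cutoff_day:
        # Starts on or after the cutoff: use that month's list.
        return (start.year, start.month)
    # Otherwise fall back to the previous month's list,
    # rolling over the year boundary in January.
    if start.month == 1:
        return (start.year - 1, 12)
    return (start.year, start.month - 1)
```

For example, an event starting December 9 would use the November list, while one starting December 10 would use the December list.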

I agree that peaks and floors should only apply to established ratings. Similarly, game counts only apply to provisional ratings.

There is some value to knowing the game count for a provisional rating beyond attempting to compute someone’s updated rating. For example, a rating based on 4 games is not as reliable as one based on 24 games, and I have seen events where a player is not permitted to enter an ‘under’ section with a rating based on a small number of games but is required to play up.

We will probably have the capability of issuing (at least) 4 types of ratings lists, as shown below. The first two would be created by the USCF office and made available for download from the website; the other two would be generated on demand only.

I’m not sure of the delivery mechanism for on-demand lists yet. (I’ve seen it take as long as a half hour to generate an on-demand list, depending on the time of day and system workload, most browsers would time out before then and I suspect most TDs would run out of patience waiting that long anyway.)

We could send it by email (as we do for custom lists today), but some custom lists in this format could be 20 MB or larger, and that might be too big for some email providers. I suppose we could store the list on the website for some time period (say, 3 days) and send the person who requested it a URL to download it. Since this would (probably) be behind the TD/Affiliate Support area login, I don’t see that causing problems with web crawlers and web archive sites.

The list types are:

  1. A monthly supplement update

  2. A Gold Master comprehensive list

  3. An update to a Gold Master list, giving only those players for whom any information has changed since a specified date.

This would be so a TD could get a shorter file that updates his database rather than download and process the latest Gold Master file. Because it would be generated on demand, it would also be somewhat more current than the latest Gold Master file, ie it would show memberships added or renewed since that Gold Master was generated.

  4. A ‘custom’ list. (This might, for example, list only the players in one state and would eventually replace the custom ratings list feature that has been available through the TD/Affiliate Area for the past two years or so.)
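On the TD's side, applying an update list (the third type) to a local copy of a Gold Master could be as simple as the Python sketch below. The record layout, key names, and sample data are assumptions for illustration only; how removed or merged IDs would be signalled in an update is not something I've defined yet, so the sketch ignores that case.

```python
def apply_update(master, update):
    """Merge changed player records (keyed by USCF ID) into a copy of
    the local Gold Master data; the original dict is left untouched."""
    merged = dict(master)
    merged.update(update)  # changed records replace old ones, new IDs are added
    return merged

# Made-up sample data, keyed by USCF ID.
master = {
    "11111111": {"name": "Smith, Joe", "regular": 1602},
    "22222222": {"name": "Brown, Max", "regular": 1944},
}
update = {
    "22222222": {"name": "Brown, Max", "regular": 1967},  # rating changed
    "33333333": {"name": "New, Member", "regular": None},  # new membership
}

local = apply_update(master, update)
```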

Here’s one way of indicating in the headers of a list what type of list it is:

Here’s a monthly supplement list:

1.0 USCF Monthly 09/01/2007 08/05/2007 00:05:15

Here’s a Gold Master list:

1.0 USCF Complete 09/01/2007 08/05/2007 00:05:15 ......

Here’s an update to a Gold Master list; in this case it would include only those players whose information has changed since July 15, 2007:

1.0 USCF Update 09/01/2007 08/12/2007 14:21:30 07/15/2007 ......

Here’s a custom generated list:

1.0
USCF
Custom
09/01/2007
08/12/2007 14:21:30
list generation criteria go here

Notes:

The element would always show the date and time when the file was generated.

I’ve dropped the element (for now), in part because I’m not sure it has any relevance or value and we may not always have records to generate that information anyway.

The element would always show how current the ratings shown are, eg, does this list go through the August 2007 list or the September 2007 list?

The element name seems somewhat incorrect at this point, but I’m not sure what’s better.

I wrote:

You then asked:

It seems to me like the answer to your question was in my earlier statement.

Sorry, I overlooked your previous statement. My question was answered. I suggest that you change the titleNorm tag to title.

I think a plain title tag by itself might be confusing, because for years we have referred to ‘Expert’ and ‘Master’ ratings as titles, as well as ‘Original Life Master’. There are also commonly used and well known FIDE titles, even if we are not currently planning to include them in this document.

My goal in using titleNorm was to make it clear that this was a norm-based title that was earned in accordance with that 2003 white paper.

Even though they aren’t being computed yet, we may want to include tags in the standard. These titles are earned based on playing the requisite number of games at the designated rating and are an extension of the (Original) Life Master concept.