Call for Volunteers: Drafting XML standards

We don’t require reporting game scores (and 99% of the time TDs wouldn’t have them anyway), so the issue of whether or not to use PGN seems irrelevant to me. Both Tom Doan and Thad Suits had earlier responded that they felt XML was the way to go.

I really don’t care what format dates are in, because the code I use will parse almost any reasonably valid date format and turn it into the internal database format (yyyy-mm-dd) anyway. I just specified mm/dd/yyyy to make it clear that we want 4-digit years rather than 2-digit years, even though we’re long past most Y2K issues (for another 93 years, at least).
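To illustrate the kind of lenient date handling described above, here is a minimal sketch in Python. The list of accepted layouts and the function name are assumptions made for this example; the actual import code is not shown in this thread.

```python
from datetime import datetime

# Hypothetical list of accepted layouts (an assumption for illustration).
# Every layout uses %Y, which requires a 4-digit year when parsing.
ACCEPTED_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%Y/%m/%d")


def normalize_date(text):
    """Return the date as yyyy-mm-dd, or raise ValueError if no layout matches."""
    cleaned = text.strip()
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"unrecognized date: {text!r}")


print(normalize_date("11/04/2006"))
print(normalize_date("2006/11/04"))
```

Because none of the layouts accepts a 2-digit year, a value such as 11/04/06 is rejected rather than guessed at.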

Oddly enough, the FIDE rating report format (which is a fixed-field-size format, not XML) uses 4-digit years (yyyy/mm/dd) in some places and 2-digit years (yy/mm/dd) in others.

FIDE accepts PGN files of game scores (and a strict reading of the FIDE handbook almost implies that game scores must be submitted), but requires the rating report data itself be in their format.

You may not consider things like byes important, but I assure you that our players do, because if we don’t have them correct we WILL hear about it.

I’ve made a couple of minor changes to the files, to show that the field can be included at the group level and to change , the player’s USCF rating (as used for that tournament), to an optional (though recommended) field.

Something to consider:

As currently written, the XML format does not deal with reporting things like double round robins, etc.

It would be fairly easy to add that by adding one or more fields, such as:

Result code for game 2
Result code for game 3
Result code for game 4
etc.

The field would need to have appropriate codes added to it so that I know whether to expect more than one result in each round.

I guess the assumption is that colors alternate from one game to the next, though based on a recent thread over in the TD forum that may not be a valid assumption, in which case we would probably also need:

Color assigned for game 2
Color assigned for game 3
Color assigned for game 4
etc.

Whether matches should then be reported in this fashion or as separate rounds is unclear.
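As a rough sketch of how numbered result and color fields could be read, assuming colors alternate unless a color is given explicitly: every element name below (round, pairing, color, result, result2, color2) is hypothetical, chosen only for this example, and is not taken from the published format.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, for illustration only.
SAMPLE = """
<round number="1">
  <pairing>
    <opponent>3</opponent>
    <color>W</color>
    <result>W</result>
    <result2>D</result2>
    <color2>W</color2>
  </pairing>
</round>
"""


def games_in_pairing(pairing):
    """Return one (color, result) tuple per game found in the pairing."""
    games = []
    color = pairing.findtext("color")
    n = 1
    while True:
        suffix = "" if n == 1 else str(n)
        result = pairing.findtext(f"result{suffix}")
        if result is None:
            break  # no more games reported for this pairing
        explicit = pairing.findtext(f"color{suffix}")
        if explicit is not None:
            color = explicit
        elif n > 1:
            # No color given for this game: assume colors alternate.
            color = "B" if color == "W" else "W"
        games.append((color, result))
        n += 1
    return games


pairing = ET.fromstring(SAMPLE).find("pairing")
print(games_in_pairing(pairing))
```

An alternative design would repeat a single game element inside each pairing instead of numbering the fields, which avoids having to decide in advance how many games a round can contain.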

Also, I’m not sure exactly how an RR report should work. Reporting by rounds seems necessary even if the event is an RR, especially for FIDE-rated events.

While PGN has a way of specifying that a game between two named players was forfeited, it was not designed to handle byes because in a bye there aren’t two players. However, this should not be difficult to work around. A variety of ideas occur to me, but a bit of experimentation would be needed to see what would have the least impact on existing PGN parsers. Worst case, they could be handled by the ‘%’ escape mechanism.

USCF rating and Elo of the players are part of the standard PGN tag set spelled out in the PGN spec. WhiteTeam and BlackTeam are not in the standard tag set, but are widely used. PGN parsers generally recognize (and at least ignore) tags that are syntactically correct but not in the semantically standard set (much like XML). In a similar way, DirectorID/ArbiterID and Name are simple additions, as are the USCF IDs of the players.
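A small sketch of the point about extra tags: a tag-pair reader that accepts any syntactically valid tag, so additions beyond the standard set pass through untouched. The tag names WhiteUSCFId and DirectorID and all of the sample values are assumptions made up for this example, not an agreed extension.

```python
import re

# One PGN tag pair per line: [Name "value"], with backslash escapes in the value.
TAG_PAIR = re.compile(r'^\[(\w+)\s+"((?:[^"\\]|\\.)*)"\]\s*$')

# The Seven Tag Roster from the PGN spec.
SEVEN_TAG_ROSTER = {"Event", "Site", "Date", "Round", "White", "Black", "Result"}

SAMPLE = '''[Event "Example Open"]
[Site "?"]
[Date "2006.11.04"]
[Round "1"]
[White "Doe, John"]
[Black "Roe, Jane"]
[Result "1-0"]
[WhiteUSCF "1850"]
[WhiteUSCFId "00000000"]
[DirectorID "00000000"]

1. e4 e5 1-0
'''


def read_tags(pgn_text):
    """Collect every well-formed tag pair, whether or not its name is standard."""
    tags = {}
    for line in pgn_text.splitlines():
        match = TAG_PAIR.match(line.strip())
        if match:
            tags[match.group(1)] = match.group(2)
    return tags


tags = read_tags(SAMPLE)
extras = sorted(set(tags) - SEVEN_TAG_ROSTER)
print(extras)
```

A reader that only needs the roster tags can simply ignore the extras, which is what makes this kind of extension backwards-compatible.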

It really does not seem advisable to me to invent a completely new standard when PGN is the overwhelmingly dominant format and needs to be extended in only a minor and backwards-compatible way to cover the requirements.

Regarding what the SwissSys and WinTD authors said, did they independently volunteer that XML was what was required, or is that what you asked them? What other alternatives were suggested? Was PGN discussed?

Brian, there’s probably no reason why in the long term we cannot support both XML and PGN, but the pairing software developers asked that we give them an XML format, so that’s what the plan has been for many months.

MonRoi introduced the FIDE-certified, ECU-recommended, MonRoi system that also lets you upload and download any game in PGN.

If there were a separate format for each industry, I’d be making a lot more money right now.

Just because PGNs are used for chess doesn’t mean you should use them. The same is true if you’re talking about sending credit card orders to a bank or title requests to the BMV: you want a common format to send these items so any computer can understand it.

If we were to race to get this done right now, XML would be quicker to do. The tools exist for just about any programming language; I’ve yet to see a reusable PGN API.

You keep saying this, and you imply that you are working with XML, but I really wonder whether this can be so. I have also been working with XML from nearly the time that it was invented, meaning more than ten years. And it is not a “common format” that any “computer can understand”, as you claim. What is your experience with XML?

XML specifies an extensible markup syntax. It allows data to be marked up with tags in a way which can be parsed by any XML parser. It is in the SGML/HTML line of technologies. But parsing an XML data stream is not the same as “understanding” it. For that, there must be a detailed specification of the tags to be used, their semantics and higher-level structure, as well as the syntax of the tag attributes and the marked up data itself. There are literally hundreds of XML-based standards and specifications. Some of these specs are thousands of pages long, and extremely detailed and complex. XML is only the starting point. You make it sound like all somebody has to do is define a bunch of XML tags and decide whether they should be upper or lower case and the work is over.
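The gap between parsing and understanding can be shown with a short sketch. The document below is well-formed, so any XML parser accepts it, yet it violates the rules of a made-up vocabulary. The element names, the required fields, and the result codes are all assumptions invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary rules, for illustration only.
REQUIRED_PLAYER_FIELDS = ("id", "name")
VALID_RESULTS = {"W", "L", "D"}

WELL_FORMED_BUT_WRONG = (
    "<event><player><name>Doe, John</name><result>X</result></player></event>"
)


def semantic_errors(xml_text):
    """Parse the text, then check it against the vocabulary rules above."""
    root = ET.fromstring(xml_text)  # fails only if the text is not well-formed
    errors = []
    for i, player in enumerate(root.iter("player"), start=1):
        for field in REQUIRED_PLAYER_FIELDS:
            if not (player.findtext(field) or "").strip():
                errors.append(f"player {i}: missing <{field}>")
        for result in player.iter("result"):
            if result.text not in VALID_RESULTS:
                errors.append(f"player {i}: unknown result code {result.text!r}")
    return errors


for problem in semantic_errors(WELL_FORMED_BUT_WRONG):
    print(problem)
```

The parser does the first step for free; the second step exists only once somebody has written down what the tags mean.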

It is true that there is a variety of tools, besides parsers, that make working with XML easier than arbitrary formats. For example, there are libraries that will convert simple XML structures into rows in database tables, and vice-versa.

But these tools don’t replace the need to write code, as you imply, and abandoning a widely used industry-standard format, such as PGN, simply because it isn’t XML, is not smart.
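For a sense of scale, here is roughly what converting a simple XML structure into database rows looks like when done by hand with Python’s standard library. The element names, table layout, and sample data are assumptions for this example; the rating is treated as optional.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; the second player has no rating element.
SAMPLE = """<players>
  <player><id>1</id><name>Doe, John</name><rating>1850</rating></player>
  <player><id>2</id><name>Roe, Jane</name></player>
</players>"""


def load_players(xml_text, conn):
    """Insert one row per player element and return the number of rows added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS player "
        "(id INTEGER PRIMARY KEY, name TEXT NOT NULL, rating INTEGER)"
    )
    rows = []
    for player in ET.fromstring(xml_text).iter("player"):
        rating = player.findtext("rating")
        rows.append((
            int(player.findtext("id")),
            player.findtext("name"),
            int(rating) if rating else None,  # a missing rating becomes NULL
        ))
    conn.executemany("INSERT INTO player VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
count = load_players(SAMPLE, conn)
print(count, conn.execute("SELECT name, rating FROM player ORDER BY id").fetchall())
```

Even in this toy case the mapping from elements to columns, the type conversions, and the handling of missing fields are decisions somebody has to code.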

The real race, David, is to see who can get a non-trivial real-world event (say 100 players) ready for uploading in the XML format, and whether they can do so before I’m ready to try parsing them. :slight_smile:

I spent 10 years on BISAC, the book industry committee responsible for maintenance of the ASC X12 EDI standard for that industry. A standard is only a starting point.

Brian,

I honestly don’t see any reason to use the PGN format for this. The PGN format was designed to convey game information, not tournament information. To revise this standard to convey tournament information would be hideous and extremely time-consuming at best. If we used this standard, you would always have to have the game annotations, and in order to extract the results of a tournament, you would have to write custom parsers and go through each PGN file. PGN was not designed for this; XML is. It should not be surprising that nearly every modern application uses XML for this.

Brian, instead of arguing the merits of PGN, may I suggest creating your own PGN standard and developing a web interface to translate the adopted XML standard into your own PGN standard?

Gregory

Is there really a point in arguing this? I’ve been a consultant for almost 9 years. I’ve written my own chess software, I’ve parsed PGNs, I’ve used XML in hospitals, insurance, and education, and most new technologies coming out are XML based.

Agree to disagree, who cares…

Let’s get to one format; we had this discussion for a month. If you want to compare pocket protectors, have fun.

Within the next few days I expect to publish on the website XML examples and data dictionaries for:

  1. Submitting USCF Memberships

  2. Rating Supplement Files

(I’ll probably write a program to generate Rating Supplement Files using the XML standard first, since I already have a program I can use for that. Also, I’m curious to see how big a Gold Master file is in XML.)

BTW, I mentioned the idea of FIDE using XML for their reporting standard (as opposed to their latest fixed field version) when I was at the FIDE Technical Commission meeting in Torino last June. It was not well-received.

It’s good to see that this is moving forward. I think this is a big step into integration and scalability for the organization!

If you need any testing on this, let me know; I should have some of my old stuff around.

As we speak, I am developing a web interface to get the results from SwissSys and WinTD wall charts, web pages, and Excel documents into your XML format. If you publish the interface for posting results, I will write a web service for public use that will allow others to use the page to post results.

I hope to win this race :wink:

Thank you, Mike!

Gregory

Great. Let me know if you come up with any holes in the format, Gregory.

Will do, Mike.

Surprisingly, we made a very similar format. I see a few things missing from your list, as well as from mine.

Some missing attributes to consider are:

team
school
eventInfo
site
tournamentSoftware
tournamentUrl
tournamentSourceDocument

I will update you on my progress and can help to build utilities and place them on the college chess site.

Take care,

Gregory

I have ‘Team ID’, ‘Team Name’ and ‘Team Info’ fields, I have ‘City’, ‘State’, ‘Zip’ and ‘Country’ fields, and I have a “Program Version” field, so the only things that aren’t already covered are your EventInfo, URL and Source Document fields. I’m not sure how much more event information is needed, and I don’t see the other two as having much value in the rating/MSA system. (I’m pretty sure we do NOT want to include external URLs, for security reasons, since we cannot verify what they point to.)

BTW, I don’t know that a webservice is going to be much use, since uploading of events is done behind the TD/A user login.

Contrary to what others have stated here, the format should be able to accommodate actual game scores. This is another reason why the format should be based on PGN.

While game scores are not captured in the traditional weekend OTB tournament, they are captured in higher-level “pro” events which attract an audience on the internet. New technologies, such as the MonRoi and the DGT board, are being introduced to facilitate the capture of the game scores. Moreover, it is short-sighted to consider OTB tournaments as the only source of tournament games which the USCF might wish to rate or capture in its database. What about Internet-based events, in which game scores are easily and routinely captured? Events on both real-time and correspondence-style servers are increasingly popular. The number of games being played on these servers already vastly outnumbers OTB games. This is a format for OTB tournament results, but isn’t that a declining part of the chess “market”? We don’t have to worry only about WinTD and SwissSys. What about FICS, or ICC?

One can also foresee the need for both “live” or in-progress results from a tournament and “final” results. Wouldn’t we like to stream information out of in progress tournaments and display both game scores (including in-progress games) and current standings from the tournaments? What about pairings for yet-to-be-played rounds? In many tournament formats, such as round-robins, the pairings for the entire tournament are known before the first round.

There should not be one format for capturing a set of game scores and another format for capturing “tournament” information. There should not be one format for reporting “live” or in-progress results and another for final results. There should not be one format for capturing OTB events, another for correspondence events, and yet another for real-time Internet events.

Some of the posters in this thread have insisted that “tournament” information and “game” information are fundamentally different somehow and that PGN would be a “hideous” mismatch with the requirements. They have not given any technical arguments for this insistence, perhaps hoping that their vehemence, sarcasm, and condescension towards any other point of view will carry the day. (A rather typical rhetorical move on this forum, and amongst technical people generally.)

But what is the profound difference between “tournament results” and a set of related game results? What is a tournament if not a set of related games? There are indeed some tournament level attributes, such as the name and USCFId of the TD, but there is much to be gained from propagating these values into each game, since each game in a file containing multiple game results can then potentially stand alone and be split out, stored independently in a database like Chessbase, or processed independently without any dependencies on external data sources containing the “tournament-level” information. Thus, putting the Arbiter or TD’s name into each game does not bother me. Indeed, it seems to me to have advantages. In database terms, this PGN style approach results in data being denormalized in the feed. The name of the Chief TD doesn’t change during the tournament but with the game-centric approach it will occur multiple times in the feed, in each game. But there is no harm from that in a computer-generated data export, and some advantages. If we have to worry about file size (which is unlikely) it can be taken care of by compression, and anyway, XML isn’t a great choice either if byte-efficient transmission is an issue.
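The denormalized, game-centric feed described above can be sketched in a few lines. The field names and sample values are assumptions for this example (DirectorID is borrowed from earlier in the thread); the point is only that each resulting record stands alone.

```python
def denormalize(tournament_fields, games):
    """Copy every tournament-level field into each game record.

    A game-level value wins if the same key appears in both places.
    The input dictionaries are left unchanged.
    """
    return [{**tournament_fields, **game} for game in games]


# Hypothetical sample data, for illustration only.
tournament = {"Event": "Example Open", "DirectorID": "00000000"}
games = [
    {"Round": "1", "White": "Doe, John", "Black": "Roe, Jane", "Result": "1-0"},
    {"Round": "2", "White": "Roe, Jane", "Black": "Poe, Ann", "Result": "1/2-1/2"},
]

records = denormalize(tournament, games)
for record in records:
    print(record)
```

The cost is the repetition of the tournament fields in every record; the benefit is that any single record can be split out and processed without consulting a separate tournament-level source.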

The only problem I see in the PGN approach of propagating most of the tournament-level information into each game is that there are some fields that are truly tournament level and which can only be awkwardly propagated to the game level. For example, there are the final tournament standings of the players (with the tie-break calculations), prizes, etc. It seems fairly obvious that you would not want the information about all the winners of the tournament being carted around in each game (though the pre- and post-game tournament standings of the two players in a game do seem like very interesting per-game information and could be represented without problem in a PGN format). Another example is team standings. While the team names of the players can be propagated to the games, the standings of all the teams in a team-oriented tournament are not information that you want to see each game carting around.

However, I don’t see anything about tie-break information in the proposed XML format, or indeed very many fields at all that could not reasonably be propagated to the game level, and of those very few fields, some of them seem like frills.

These fields that are truly at the tournament level and which can’t be carted around in the games are problems for the PGN approach. But the advantages of a single format capable of representing game scores and “tournament” information argue, for me, in favor of propagating everything reasonable to a PGN-formatted game level, with the truly tournament-level information handled through the PGN escaping mechanism.

I don’t see tiebreaks as having a lot of positive value (and more than a few drawbacks). What do others think?

It seems like you’re willing to acknowledge that PGN is NOT a hierarchical data structure, but don’t agree that much of the information we need to rate and report on an event is hierarchical in nature.

Also, the format standard is an issue that was debated extensively on and off the Forums some months ago (and has been debated previously elsewhere.)

For now, XML is the standard that is being implemented.

I’ve already heard back from Thad Suits; he didn’t see any serious problems in the sample format and dictionary. I had a discussion with Tom Doan about XML a couple of months ago, and I doubt he’ll have any major issues with it either.

Out of curiosity, what fields do you see as being frills?

Hi Brian,

Your paragraph is a stretch and I think that you are getting carried away. I stated ‘hideous’ in relation to trying to programmatically query PGN fields in every PGN document in order to get the entire picture of a tournament. It was not meant sarcastically, but as a statement of what I consider a fact. It would be a hideous query. Furthermore, I would have to develop custom parsers for each query type.

Again, I believe that it is more productive to develop something and share it instead of re-hashing this debate.

Regards,

Gregory

Tiebreaks are chosen arbitrarily by the user from a fairly long list. Would your output be set to display any combination of them in any order? If it displayed all of them, it wouldn’t be telling you anything useful about the tournament. If it displayed none of them, what’s the point?