Call for Volunteers: Drafting XML standards

One problem with the dash is that it needs to be wrapped carefully in the calling program, and dashes in XML are a frequent source of errors.

Type ‘xml em dash’ into Google, or try these searches:
xml dash problem
google.com/search?hl=en&q=xml+dash+problem
xml hyphen problem
google.com/search?hl=en&q=xml+hyphen+problem
etc.

It is your call, Mike. We can find ways around the dash problems, but I second avoiding dashes and using underscores instead.

I think that I have approached my posting limit.

The mdash doesn’t matter in an XML document. That problem is usually caused by copying and pasting text from Word.

Posting limits are one of the reasons why we probably need to move this discussion off the forums for a while.

Unless someone has a major objection, let’s use underscores as a separator in element names.
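For what it is worth, a hyphen inside an element name is legal XML; the trouble tends to show up in the calling program, where a hyphenated name cannot be used as an identifier. Here is a minimal Python sketch of that difference. The element names are made up for illustration and are not taken from any USCF format.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, for illustration only.
doc = ET.fromstring(
    "<event>"
    "<chief-td>12345678</chief-td>"
    "<chief_td>12345678</chief_td>"
    "</event>"
)

# Both spellings are well-formed XML and parse without complaint.
hyphen_value = doc.find("chief-td").text
underscore_value = doc.find("chief_td").text

# The difference appears when element names are mapped to identifiers:
# "chief_td" is a valid Python identifier, "chief-td" is not.
hyphen_ok = "chief-td".isidentifier()
underscore_ok = "chief_td".isidentifier()

print(hyphen_value, underscore_value, hyphen_ok, underscore_ok)
```

With underscores, a program that turns element names into attribute or column names needs no special wrapping.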

Fine with me. My email is dbfrey (at) gmail.com

Microsoft SQL Server Analysis Services has published the XML conventions they use for interfaces, and I heartily recommend them:

msdn2.microsoft.com/en-us/library/ms128591.aspx

For element names they recommend Pascal casing, so the illegible DEPUTYFIDEARBITER would become

DeputyFideArbiter

IMHO, that is by far the easiest format to read–certainly easier than DEPUTY_FIDE_ARBITER, simply because a word with mixed case is easier to read than a word with all caps.

My $.02

Chris, the speed reading people will tell you that the absence of any kind of break in the letters makes it VERY hard to skim text looking for words. An underscore or dash lets the eye pick up FIDE much faster in DEPUTY_FIDE_ARBITER than in DeputyFideArbiter.

Once you’ve thrown in the underscores, also using capitalization doesn’t really improve things much: Deputy_Fide_Arbiter, especially since in this case FIDE is an acronym, not a common usage word, and should probably be all caps anyway.

Unfortunately (and stupidly, IMHO) element names are case-sensitive. Thankfully the SQL standard folks got that right.
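To show what case sensitivity means in practice, here is a small sketch using Python's standard ElementTree parser. The document and the name in it are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only the casing of the element name matters here.
root = ET.fromstring(
    "<Event><DeputyFideArbiter>Smith</DeputyFideArbiter></Event>"
)

# A lookup succeeds only when the case matches exactly.
exact = root.find("DeputyFideArbiter")
wrong_case = root.find("DEPUTYFIDEARBITER")

print(exact is not None, wrong_case is None)
```

Whatever casing is chosen, every program that produces or reads the file has to match it exactly.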

I realize that’s not the Microsoft Way, but I’ll match my error rate in coding against that of most programmers at Microsoft. :slight_smile:

(BTW, back around 1995 I was mildly famous among some of the programmers at Redmond because just before Windows 95 came out I found a bug in the most used program in Microsoft history–Windows Solitaire.)

I personally use this convention in all my coding.

This issue is not that important to me, as it is a matter of personal preference. Someone I completely respect pulled me aside one day and told me not to get mixed up in ‘programming religious wars’. He said that each person will advocate the particular style, programming language, platform, etc. that works for them. However, as long as the system works and you are following logical guidelines, be flexible and don’t get hung up on the details. This person is by far the most intelligent person that I have ever met, and he is one of the top pioneers in Genome Sciences, and I always try to heed his wise advice. There are many ‘right’ ways to go about things, and it is the end result, not the path to get there, that matters.

Mike will be maintaining this code, so whatever decision he makes is fine by me. Just don’t use the dash :wink:
Take care,

Gregory

Any progress?

How did the XML document project work out?

I’ve just made two files available on the USCF website:

uschess.org/TD_Affil/sample_tournament.xml

This is a sample rating report in XML format.

uschess.org/TD_Affil/xml_dictionary.txt

This is a data dictionary describing the fields in the XML format, most of which are used in the sample.

This is not a full-blown XML standard, but should be a fairly complete illustration of a proposed way to submit tournament results in XML.

Hopefully I haven’t forgotten any closing tags; the simple XML parser I ran it through doesn’t complain about any.
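A check of this kind can be done with any XML parser. The sketch below uses Python's standard library and is only an illustration; it is not the parser that was actually used, and the sample strings are invented.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for problems such as a missing or mismatched closing tag.
        return False
    return True


good = is_well_formed("<section><round>1</round></section>")
bad = is_well_formed("<section><round>1</section>")  # <round> never closed
print(good, bad)
```

Note that this only checks that the file is well-formed; it says nothing about whether the right fields are present.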

The event doesn’t make a lot of sense (it’s just a 4 player event), but I was more concerned with showing where and how the various elements are used than coming up with a real-world event.

I’ve added a number of fields that are needed by the USCF or by FIDE that aren’t in the current upload format (and removed a few that are no longer needed), and I’ve indicated in the data dictionary file which fields (and field groups) I think should be mandatory.

I’ve also included fields to report prizes won (for players as well as for teams), plus a field for reporting class prize floors that need to be assigned.

I’ve included some fields to report information about teams, with support for both board-order teams, like the Amateur Teams, and for team/individual events, like the National Scholastics.

Comments and suggestions appreciated.

If no serious problems are present, I hope to start work on a parser this month as part of the rewrite of the TD/Affiliate Support Area, which I hope to have completed by the end of September.

I’ve also sent this message to Tom Doan (author of WinTD) and Thad Suits (author of SwisSys.)

Mike, is it a given that tournament data will be submitted in an XML format? Why not PGN? That way, a TD/pairing program that had the ability to capture and dump the games in PGN would meet the USCF feed requirements with no additional work, provided the defined header tags were included.

The file could be PGN headers only. A tournament management system capable of uploading games to a live feed site would support the USCF feed format with no additional programming, provided the USCF feed were PGN.

I know XML-encoded data interchange is all the rage, but chess already has two widely accepted formats for interchanging game data, PGN and Chessbase. Of these, PGN is less proprietary, and is very widespread.
Why not use it, rather than inventing a new format, even if the new format is XML-encoded?

Incidentally, the software most commonly used for rating computer chess tournaments, such as BayesElo and elostat, already uses PGN files for the input of game results, with the move lists being optional. (That is, they just look at the PGN header tags and ignore the move lists, which may not be present.)
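A headers-only reader of the kind described here might look like the sketch below. It is not how BayesElo or elostat actually do it; the tag-pair pattern and the sample games are assumptions made for illustration.

```python
import re

# One PGN tag pair per line, e.g. [White "Doe, John"].
TAG_PAIR = re.compile(r'^\[\s*(\w+)\s+"((?:[^"\\]|\\.)*)"\s*\]$')

# Invented sample: one game with moves, one with headers only.
SAMPLE_PGN = "\n".join([
    '[Event "Sample Open"]',
    '[White "Doe, John"]',
    '[Black "Roe, Jane"]',
    '[Result "1-0"]',
    '',
    '1. e4 e5 2. Nf3 1-0',
    '',
    '[Event "Sample Open"]',
    '[White "Roe, Jane"]',
    '[Black "Poe, Pat"]',
    '[Result "1/2-1/2"]',
])


def read_headers(pgn_text):
    """Collect the header tags of each game and skip any move text."""
    games = []
    in_tags = False
    for line in pgn_text.splitlines():
        match = TAG_PAIR.match(line.strip())
        if match:
            if not in_tags:
                games.append({})  # first tag after a gap starts a new game
                in_tags = True
            games[-1][match.group(1)] = match.group(2)
        else:
            in_tags = False  # blank line or move text ends the tag section
    return games


games = read_headers(SAMPLE_PGN)
for game in games:
    print(game["White"], game["Black"], game["Result"])
```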

PGN doesn’t contain all the required fields. Plus, XML makes development much easier than converting from PGN to whatever type is used in development.

PGN has an extension mechanism for header tags. What are the extra header tags that are needed over and above the ones that are standard in PGN? There is a lot of open source code around for parsing PGN files.

While it is true that there are many libraries for parsing XML, for practically all programming languages, you still have to write code. Compared to PGN, which has 90% or more of what is required, and can be extended, an XML-based standard has either 0% or 100% of what is required, depending on how you look at it. 100%, AFTER it is implemented. 0% UNTIL it is implemented.

Format Standard, Program Version, Tracking ID … Do you really want to append these to every PGN game?

I can take a valid XML file and have it parsed into a database in seconds. It’s just another way of taking proprietary data and making it separate. If PGNs were in XML format now, I’d be quite happy. But I had to write my own program to parse them into a database. It’s more work than what’s out there now.
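As a rough idea of what "parsed into a database" involves, here is a minimal sketch using only the Python standard library. The element names, table layout, and sample data are all hypothetical and do not come from the proposed USCF format.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample data; element names are assumptions for this sketch.
SAMPLE = """
<tournament>
  <player><id>10000001</id><name>Doe, John</name><score>2.5</score></player>
  <player><id>10000002</id><name>Roe, Jane</name><score>1.5</score></player>
</tournament>
"""


def load_players(xml_text, conn):
    """Insert one row per <player> element and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS players "
        "(id TEXT PRIMARY KEY, name TEXT, score REAL)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        (p.findtext("id"), p.findtext("name"), float(p.findtext("score")))
        for p in root.iter("player")
    ]
    conn.executemany("INSERT INTO players VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
count = load_players(SAMPLE, conn)
total = conn.execute("SELECT SUM(score) FROM players").fetchone()[0]
print(count, total)
```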

Thank-you Mike!

I will use your conventions and will have a few helpful utilities developed for the community using your standard. I will apply it and add comments this week.

Take care,

Gregory

PS Brian M.: XML is far superior to PGN, as the XML format is a de facto web standard and all of the tools are available for it.

If the aim is to make it easier on the USCF programmer, then an XML-encoded format is better than PGN. However, PGN is not exactly hard to parse. There is a lot of code out there already for parsing it, and it is the de facto standard for the interchange of chess games. There is a lot of software already written that can generate or consume PGN files.

The USCF can try to get tournament pairing/management program authors to support a new feed format. In the US, it has the power to do that. The programs already support the USCF’s current feed format, which is dBase files. But why make third parties do work that will only be useful to the USCF rather than putting their effort into something generally useful, such as the ability to export PGN (assuming this isn’t already a feature of the programs)?

Also, some of the fields in the spec are not necessary and are basically just bureaucracy, such as Format Standard, Program Version, etc. Don’t make these an argument for XML over PGN.

If those fields, or others like Tracking ID, are actually needed, they do not have to be in the actual PGN file but can be provided in form data or a file that accompanies the game file upload. That is, they could come from the context. Other fields, such as information about the event, can be placed in the preamble of the PGN file using the “%” escape mechanism. If neither of those is acceptable, it is not ridiculous to define new tags for the PGN headers. Most PGN programs will ignore header tags they do not recognize, and if they don’t it is trivial to filter out fields that cannot be recognized in a preparatory step. Perhaps the feed ends up being a PGN superset, but that is OK as long as it can be trivially reduced to PGN. That is not the case with the XML format.
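The preparatory filtering step could be as small as the sketch below. The extra tag name (WhiteUSCFId) and the content of the "%" line are hypothetical; the allowed set is the standard PGN Seven Tag Roster.

```python
import re

# The seven mandatory tags of the PGN export format.
SEVEN_TAG_ROSTER = {"Event", "Site", "Date", "Round", "White", "Black", "Result"}
TAG_NAME = re.compile(r'^\[\s*(\w+)\s+"')

# Invented feed: one "%" escape line and one non-standard header tag.
FEED = "\n".join([
    '% Section: Open',
    '[Event "Sample Open"]',
    '[WhiteUSCFId "10000001"]',
    '[Result "1-0"]',
    '',
    '1. e4 e5 1-0',
])


def reduce_to_plain_pgn(text, allowed=SEVEN_TAG_ROSTER):
    """Drop "%" escape lines and header tags outside the allowed set."""
    kept = []
    for line in text.splitlines():
        if line.startswith("%"):
            continue  # escape line: feed-only data
        match = TAG_NAME.match(line)
        if match and match.group(1) not in allowed:
            continue  # header tag that plain PGN tools may not recognize
        kept.append(line)
    return "\n".join(kept)


plain = reduce_to_plain_pgn(FEED)
print(plain)
```

A real filter would probably allow more of the commonly used optional tags, but the reduction step stays this simple.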

The argument for PGN is that it is THE DE FACTO STANDARD for chess game interchange. Why invent a new standard?

Because PGN isn’t really a standard.

You’re creating a new type of PGN if you want to submit byes and forfeits. Just because scripts exist for parsing PGNs doesn’t mean they work right.

We’re not exchanging chess games; this is about a universal data format that lets different systems communicate. That’s what XML is for.

There’s no point in creating separate files for exceptions; that’s what’s happening now with the three files you have to send the USCF.

Have you actually ever tried to parse PGNs before? The data isn’t exactly consistent, even if you download from TWIC or export from Mega Database. But that should be irrelevant since this isn’t even close to what you’re trying to accomplish. Even the expecting fields will crash PGN parsers.

Regarding consistency of PGN: yes, the import format gives a fair amount of latitude, and people take even more than they are given by the standard. That can happen with XML-based standards also.

I don’t know what you mean by PGN not “really” being a standard. There is a specification, which is generally recognized. If you mean adopted by an international standards body, PGN has not been adopted by an international standards body. But neither has a hypothetical new USCF standard using XML. XML is a standard, to be sure, but contrary to your statement, XML is not a “universal encoding format”. As the name indicates, it is an extensible markup language affording the possibility of defining arbitrary sets of tags. But the semantics and other constraints on the usage of those tags are not defined by XML, and must be given by some other specification or standard. There are countless interchange formats that are based on XML, and therefore are susceptible to being processed by various XML toolkits, but those formats don’t automatically become standards just by virtue of being defined using XML.

An XML-based specification adopted by the USCF wouldn’t be a standard either. If you want to argue differently, then I will say that PGN is a national standard because it is based on ASCII.

So, I come back to my basic point. The de facto standard for representing chess games and game results is PGN, and the USCF should use that in preference to inventing something new. This will probably require using the PGN escape mechanism to represent some things, such as points scored through non-games like byes and forfeits, since PGN is a format for game results.

When you add items to a format that already exists, that pretty much defeats the purpose. A universal language for different systems is what XML is for.

If you don’t include Forfeit Wins, Losses and Byes, how are results going to be submitted properly? What about team information? USCF IDs? Director IDs?

The Dates in a PGN are technically not in the American format. There are so many reasons why you wouldn’t want to do it this way.
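For reference, the PGN Date tag is written as YYYY.MM.DD, with question marks standing in for unknown parts. A converter to the American MM/DD/YYYY style could be sketched as below; the function name and the choice to return None for incomplete dates are assumptions made for this example.

```python
from datetime import datetime


def pgn_date_to_us(pgn_date):
    """Convert a PGN date such as 2007.08.15 to 08/15/2007.

    Returns None when any part of the date is unknown ("??").
    """
    if "?" in pgn_date:
        return None
    return datetime.strptime(pgn_date, "%Y.%m.%d").strftime("%m/%d/%Y")


full = pgn_date_to_us("2007.08.15")
partial = pgn_date_to_us("2007.??.??")
print(full, partial)
```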