Call for Volunteers: Drafting XML standards

nolan · April 2, 2007, 10:13pm

I agree, let’s keep it an informal ‘working group’ for now.

BTW, it isn’t necessary to be a ‘super geek’ to be involved in this. I’m pretty much a beginner when it comes to XML myself (though I’ve written a few things to generate XML when I had a model to work from).

thunderchicken · April 2, 2007, 11:38pm

Sure,

I’ll get something together tomorrow when I get a chance. Do you want to continue this here or via email conversations?

nolan · April 2, 2007, 11:44pm

Let’s keep something going here in the hopes that we get a few more people, but transition over to email and see what develops. (The primary reason I don’t think we should post it all here is because that would likely bump too many people into overposting status.)

thunderchicken · April 3, 2007, 12:06pm

Here’s a start… If someone wants to give me an example of a tournament, I could probably parse right through it or write a script to parse it. I just made stuff up and of course, the players don’t match, etc.

This should align with (secure.uschess.org/TD_Affil/fileformat.php)

<?xml version="1.0" encoding="utf-8" ?>
<TOURNAMENTHEADER>
  <FORMAT>2C</FORMAT>
  <PROGRAM>Chicken1</PROGRAM>
  <EVENTID>200704030032</EVENTID>
  <NAME>THUNDERCHICKEN OPEN</NAME>
  <NUMSECTIONS>1</NUMSECTIONS>
  <BEGINDATE>20070403</BEGINDATE>
  <ENDDATE>20070403</ENDDATE>
  <AFFILIATEID>A6023648</AFFILIATEID>
  <CITY>INDIANAPOLIS</CITY>
  <STATE>IN</STATE>
  <ZIP>46202</ZIP>
  <COUNTRY>US</COUNTRY>
  <CHIEFTD>12461707</CHIEFTD>
  <ASSISTANTTD></ASSISTANTTD>
  <OTHERID>12464524,16542434</OTHERID>
  <TOURNAMENTSECTION>
    <SECTIONNAME>OPEN</SECTIONNAME>
    <RATINGTYPE>D</RATINGTYPE>
    <TIMECONTROL>G/60</TIMECONTROL>
    <TOTALROUNDS>5</TOTALROUNDS>
    <NUMPLAYERS>10</NUMPLAYERS>
    <BEGINDATE>20070403</BEGINDATE>
    <ENDDATE>20070403</ENDDATE>
    <SECTIONTYPE>N</SECTIONTYPE>
    <GRANDPRIX>N</GRANDPRIX>
    <FIDERATED>N</FIDERATED>
    <FIDEARBITER></FIDEARBITER>
    <FIDEARBITER2></FIDEARBITER2>
    <TOURNAMENTDETAIL>
      <PLAYER ID="12461707" PAIR="1">
        <NAME>DAVID B FREY</NAME>
        <STATE>IN</STATE>
        <RATING>1625</RATING>
        <ROUNDS>
          <ROUND>W5</ROUND>
          <ROUND>W2</ROUND>
          <ROUND>D3</ROUND>
          <ROUND>L5</ROUND>
          <ROUND>W4</ROUND>
        </ROUNDS>
      </PLAYER>
      <PLAYER ID="12461708" PAIR="2">
        <NAME>DAVID Q FREY</NAME>
        <STATE>IN</STATE>
        <RATING>1144</RATING>
        <ROUNDS>
          <ROUND>L2</ROUND>
          <ROUND>W2</ROUND>
          <ROUND>D12</ROUND>
          <ROUND>L5</ROUND>
          <ROUND>W4</ROUND>
        </ROUNDS>
      </PLAYER>
      <PLAYER ID="12461709" PAIR="3">
        <NAME>DAVID M FREY</NAME>
        <STATE>IN</STATE>
        <RATING></RATING>
        <ROUNDS>
          <ROUND>W8</ROUND>
          <ROUND>W3</ROUND>
          <ROUND>D1</ROUND>
          <ROUND>L5</ROUND>
          <ROUND>W4</ROUND>
        </ROUNDS>
      </PLAYER>
    </TOURNAMENTDETAIL>
  </TOURNAMENTSECTION>
</TOURNAMENTHEADER>
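
For anyone who wants to try reading a file shaped like this draft, here is a minimal sketch in Python (an assumption; the thread does not pick a language). It uses only the standard library and an abbreviated copy of the sample above. The helper names `parse_round` and `read_players` are hypothetical, and the round-code pattern only covers the W/L/D codes that appear in the sample, not byes, forfeits, or anything else the real upload format may allow.

```python
import re
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample above (one player, two rounds).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8" ?>
<TOURNAMENTHEADER>
  <NAME>THUNDERCHICKEN OPEN</NAME>
  <TOURNAMENTSECTION>
    <SECTIONNAME>OPEN</SECTIONNAME>
    <TOURNAMENTDETAIL>
      <PLAYER ID="12461707" PAIR="1">
        <NAME>DAVID B FREY</NAME>
        <RATING>1625</RATING>
        <ROUNDS>
          <ROUND>W5</ROUND>
          <ROUND>D3</ROUND>
        </ROUNDS>
      </PLAYER>
    </TOURNAMENTDETAIL>
  </TOURNAMENTSECTION>
</TOURNAMENTHEADER>"""

# Assumption: a round code is one result letter followed by the opponent's pairing number.
ROUND_CODE = re.compile(r"([WLD])(\d+)")


def parse_round(code):
    """Split a code such as 'W5' into ('W', 5); raise ValueError on anything else."""
    match = ROUND_CODE.fullmatch((code or "").strip())
    if match is None:
        raise ValueError(f"unrecognized round code: {code!r}")
    return match.group(1), int(match.group(2))


def read_players(xml_bytes):
    """Return one dict per PLAYER element found anywhere in the document."""
    root = ET.fromstring(xml_bytes)
    players = []
    for player in root.iter("PLAYER"):
        rating_text = (player.findtext("RATING") or "").strip()
        players.append({
            "id": player.get("ID"),
            "pair": int(player.get("PAIR")),
            "name": player.findtext("NAME"),
            # An empty RATING element (as for the third player above) becomes None.
            "rating": int(rating_text) if rating_text else None,
            "rounds": [parse_round(r.text) for r in player.iter("ROUND")],
        })
    return players


if __name__ == "__main__":
    for entry in read_players(SAMPLE):
        print(entry)
```

Keeping the round code as a (result, opponent) pair makes it easier to add color or per-game data later without changing the reader's callers.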

gregory · April 3, 2007, 12:58pm

Off the top of my head, is it possible to add a few optional tags?

…
…
…
…
…
…

…
…
…
…

thunderchicken · April 3, 2007, 1:00pm

We can add what we want, but I’m assuming it won’t matter what those fields are when trying to rate those tournaments / retrieve data.

nolan · April 3, 2007, 2:40pm

While the most important thing is to get the fields we NEED nailed down, we should build in flexibility for the future.

Things like team names and rosters would be nice for a variety of reasons. That could enable us to pull team standings from the upload file, for example.

The fields that FIDE is requiring are:

FIDE-ARBITER
FIDE-DEPUTY-ARBITER
COLOR
ROUND-DATE
TIME-CONTROL (USCF needs this, too, but for different reasons.)

BTW, I prefer dashes rather than upper/lower case or two words just crammed together; I think it improves readability. Your mileage may vary, but I guess I get to make the rules on that.

Color obviously has to be at the individual result level, and the arbiters are for the entire section. (However, an event that has more than one FIDE rated section could have different arbiters for each section, and FIDE treats them as separate events anyway.)

Most of the time, the time control will be fixed for the entire section, so having it at the ‘section’ level is usually going to be sufficient. However, some events have faster time controls in the early rounds, so having the ability to have that information at the ‘round’ level would help report that.

But, for multiple schedule events sometimes some players will play round 1 at a different time control (and on a different date) than other players, so it might be desirable to be able to include both the time control and the round date at the individual results level.
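
One way to read the three levels described above (section, round, individual result) is "the most specific value wins". The sketch below shows that lookup in Python. Only `TIMECONTROL` at the section level comes from the sample posted earlier; the `ROUNDINFO` element, its `NUMBER` attribute, and the `RESULT` element are hypothetical names invented for illustration, not part of any agreed format.

```python
import xml.etree.ElementTree as ET


def effective_time_control(section, round_no, result=None):
    """Return the most specific time control that applies to one game, or None."""
    # 1. Individual result level (hypothetical: TIMECONTROL inside a result element).
    if result is not None:
        value = (result.findtext("TIMECONTROL") or "").strip()
        if value:
            return value
    # 2. Round level (hypothetical ROUNDINFO element with a NUMBER attribute).
    for info in section.findall("ROUNDINFO"):
        if info.get("NUMBER") == str(round_no):
            value = (info.findtext("TIMECONTROL") or "").strip()
            if value:
                return value
    # 3. Section level, as in the sample posted earlier.
    return (section.findtext("TIMECONTROL") or "").strip() or None


SECTION = ET.fromstring("""
<TOURNAMENTSECTION>
  <TIMECONTROL>40/2, SD/1</TIMECONTROL>
  <ROUNDINFO NUMBER="1"><TIMECONTROL>G/60</TIMECONTROL></ROUNDINFO>
</TOURNAMENTSECTION>
""")
RESULT = ET.fromstring("<RESULT><TIMECONTROL>G/45</TIMECONTROL></RESULT>")

if __name__ == "__main__":
    print(effective_time_control(SECTION, 1))          # round-level override
    print(effective_time_control(SECTION, 2))          # falls back to the section
    print(effective_time_control(SECTION, 1, RESULT))  # result-level override
```

The same fallback would work for the round date, so a file only needs to state a value at the lower levels when it differs from the section default.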

nolan · April 3, 2007, 2:43pm

Something else we need at the player level is an element to indicate that the player had a result that will earn him a class prize floor.

A way to indicate FIDE norms would be nice, too, though the paperwork will still have to be filled out for FIDE.

More optional elements and groups: prize lists. (Again, this is a convenience rather than something needed for either the USCF or FIDE to rate the event.)

thunderchicken · April 3, 2007, 2:45pm

Seems to me the more information the better. If you guys want to take the data, then we can throw in everything as a standard.

gregory · April 3, 2007, 2:50pm

Hi Thunder,

I would personally like to take this conversation offline in order to brainstorm. However, since we don’t have everyone’s email address, and Nolan wants to discuss this here for the time being, I will write my comments here.

I think that it would be best to draft a full specification to share this important data and make it as complete as possible. I understand that this particular format is only intended to be used in the USCF. However, there are tons of custom data transport programs that are used in chess to manipulate custom data. We all write our own custom code to try to parse data from other programs. For example, SwissSys has its own internal codes and I need to parse the SwissSys text files for our own tournaments. There are many things that I can’t find out in SwissSys though. The documentation is not available for the internal workings of the program, and I can’t understand the format of some of the internal files. The same is true for ICC. It would be very nice to have a standard published format that we all use for general purpose data-transport.

It is my hope that if we create a standard and published format with a bit more custom data, others may eventually take advantage of it so that we are all in the same boat. If we can get most of the major data elements mapped out in this schema, and add several optional custom elements, this format can be used to transfer data between all programs, not just the USCF.

I am not sure if my ideas are valid for others, but let’s continue to brainstorm. But again, maybe it is better to take our ‘geek talk’ offline.

Take care, and thanks for your work!

nolan · April 3, 2007, 2:55pm

Tom Doan has committed to having a version of WinTD that can prepare the new upload format within a few weeks of when we have a completed standard. I haven’t heard back from Thad Suits about SwissSys yet.

We need to think about a validation suite of some kind.
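
As a starting point for that discussion, here is a rough Python sketch of the kind of consistency checks a validation suite might run on one section. It assumes the element names from the draft sample posted earlier (`NUMPLAYERS`, `TOTALROUNDS`, `TOURNAMENTDETAIL`, `PLAYER`, `ROUNDS`, `ROUND`) and the W/L/D round codes used there; the function name `validate_section` and the specific checks are illustrative guesses, not USCF rules.

```python
import re
import xml.etree.ElementTree as ET


def validate_section(section):
    """Return a list of human-readable problems found in one TOURNAMENTSECTION."""
    errors = []
    players = section.findall("./TOURNAMENTDETAIL/PLAYER")
    pair_numbers = {p.get("PAIR") for p in players}

    # Check 1: the declared player count matches the PLAYER elements present.
    declared = (section.findtext("NUMPLAYERS") or "").strip()
    if declared != str(len(players)):
        errors.append(
            f"NUMPLAYERS is {declared!r} but {len(players)} PLAYER elements were found"
        )

    total_rounds = (section.findtext("TOTALROUNDS") or "").strip()
    for player in players:
        pair = player.get("PAIR")
        rounds = [(r.text or "").strip() for r in player.findall("./ROUNDS/ROUND")]

        # Check 2: every player has one ROUND entry per round of the section.
        if str(len(rounds)) != total_rounds:
            errors.append(
                f"player {pair}: {len(rounds)} rounds listed, TOTALROUNDS is {total_rounds!r}"
            )

        # Check 3: every round code is well formed and names an existing opponent.
        for number, code in enumerate(rounds, start=1):
            match = re.fullmatch(r"[WLD](\d+)", code)
            if match is None:
                errors.append(f"player {pair}, round {number}: bad code {code!r}")
            elif match.group(1) not in pair_numbers:
                errors.append(
                    f"player {pair}, round {number}: opponent {match.group(1)} not in section"
                )
    return errors


SECTION = ET.fromstring("""
<TOURNAMENTSECTION>
  <TOTALROUNDS>1</TOTALROUNDS>
  <NUMPLAYERS>3</NUMPLAYERS>
  <TOURNAMENTDETAIL>
    <PLAYER ID="12461707" PAIR="1"><ROUNDS><ROUND>W2</ROUND></ROUNDS></PLAYER>
    <PLAYER ID="12461708" PAIR="2"><ROUNDS><ROUND>L5</ROUND></ROUNDS></PLAYER>
  </TOURNAMENTDETAIL>
</TOURNAMENTSECTION>
""")

if __name__ == "__main__":
    for problem in validate_section(SECTION):
        print(problem)
```

Returning a list of messages rather than stopping at the first failure lets a TD see every problem in a file in one pass.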

thunderchicken · April 3, 2007, 2:57pm

I could take a stab at helping, but I’m pretty much limited to Microsoft technologies. Unless USCF had a place for this, I’d have to learn PHP.

gregory · April 3, 2007, 2:57pm

This is great! It would make my life sooo much easier.

So cool…

gregory · April 3, 2007, 3:00pm

I have exposure to PHP, and did a few minor projects in it. If we don’t find another PHP expert, I can try my hand at the PHP code. I can also make a web service on my end. If you can get a web service going, Thunder, let’s test it using our own databases.

gregory · April 3, 2007, 3:15pm

I believe that I read that you should not use the dash, as it can cause quite a few errors in different programming languages. I believe that the issue is related to some programs interpreting the variable name as a mathematical formula. Although there are ways to wrap names containing dashes to get around the problems, I believe that most people recommend using underscores instead of dashes in XML. I might be wrong, though; I will do some research later today.

g

nolan · April 3, 2007, 3:18pm

Our ISP runs Sun/Solaris servers and the USCF internal servers are running Linux/Fedora Core 5. Both the ISP and the USCF run Apache 2 for webservers.

MSA uses a MySQL 5 database (for now); the internal database is PostgreSQL 8.

We’re about to move up to version 8.2.3 of PostgreSQL tonight when we switch over to the new server. The performance enhancements from the combination of faster hardware (mostly faster disks) and software upgrades are significant: on a non-loaded machine, rerates are taking between 5 and 15 minutes per month compared to 45-90 minutes on the current server.

BTW, this morning we upgraded the USCF’s DSL connection to 6 megabit inbound, 1.5 megabit outbound. We may not be taking full advantage of it yet: our firewall unit has only a 10 megabit port on the ‘Internet’ side. I’m looking into reconfiguring a different firewall unit (one that was salvaged from the NY office) that has all 100 megabit Ethernet ports. I may work on that while I’m transferring the database over to the new server (a task that takes 4-5 hours).

nolan · April 3, 2007, 3:25pm

I hadn’t seen any cautions about using a hyphen, but SQL forbids them since they’re treated as mathematical operators, so I guess I’m not surprised.

Here’s what I found on element names at w3.org/TR/REC-xml/#NT-Name

However, a little later on it says there are problems with using colons. I use underscores all the time in PHP code; I’d be comfortable with using them in the XML standard, or with a full stop (aka a period). (Some people don’t like underscores in URLs, because they tend to get lost when the URL is displayed by most browsers, but these aren’t URLs.)
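
A quick way to see what an XML parser actually accepts is to hand it candidate names and watch for errors. The sketch below does that with Python's standard `xml.etree.ElementTree`; the helper name `parses_as_element_name` is made up for illustration. Per the Name rule linked above, hyphens, underscores, and periods are all legal after the first character, while the colon is reserved for namespace prefixes, so a namespace-aware parser like this one rejects a colon name when no prefix has been declared.

```python
import xml.etree.ElementTree as ET


def parses_as_element_name(name):
    """Return True if this parser accepts <name/> as a complete document."""
    try:
        ET.fromstring(f"<{name}/>")
    except ET.ParseError:
        return False
    return True


CANDIDATES = [
    "DEPUTY-FIDE-ARBITER",   # hyphens
    "DEPUTY_FIDE_ARBITER",   # underscores
    "DEPUTY.FIDE.ARBITER",   # periods
    "DEPUTY:FIDE",           # colon: read as an undeclared namespace prefix
    "2C",                    # a name may not start with a digit
    "-DEPUTY",               # ...or with a hyphen
]

if __name__ == "__main__":
    for candidate in CANDIDATES:
        print(candidate, parses_as_element_name(candidate))
```

This only tests well-formedness in one parser; whether a name is convenient in PHP, SQL, or other tools on either side is a separate question.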

thunderchicken · April 3, 2007, 3:30pm

I’d recommend just keeping it as simple as possible. To me, it’s not about what you name things, it’s about consistency and following a standard when you’re developing.

The dash and SQL stuff shouldn’t matter since all it would be doing is matching up XML nodes to your data, and not looking column for column. Each side is going to have to map out its fields anyway.
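
To illustrate the mapping idea, here is a small Python sketch in which the element names never touch the database directly: a lookup table translates them to whatever column names the receiving side uses. The element names are taken from the FIDE field list earlier in the thread; the column names and the `to_row` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Element name in the file -> column name on the receiving side (hypothetical columns).
FIELD_MAP = {
    "FIDE-ARBITER": "fide_arbiter",
    "FIDE-DEPUTY-ARBITER": "fide_deputy_arbiter",
    "TIME-CONTROL": "time_control",
}


def to_row(section):
    """Collect the mapped child elements of a section into a column -> value dict."""
    row = {}
    for child in section:
        column = FIELD_MAP.get(child.tag)
        if column is not None:  # elements without a mapping are simply skipped
            row[column] = (child.text or "").strip()
    return row


SECTION = ET.fromstring(
    "<SECTION>"
    "<FIDE-ARBITER>12461707</FIDE-ARBITER>"
    "<TIME-CONTROL>G/60</TIME-CONTROL>"
    "<PRIZE-LIST>not mapped</PRIZE-LIST>"
    "</SECTION>"
)

if __name__ == "__main__":
    print(to_row(SECTION))
```

With a table like this, the choice between dashes, underscores, or periods in the file affects readability only, not the SQL on either end.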

nolan · April 3, 2007, 3:38pm

One of the stated goals of the XML format is to make the data at least somewhat human readable.

Thus the element naming convention is significant:

1. DEPUTYFIDEARBITER
2. DEPUTY-FIDE-ARBITER
3. deputyFideArbiter
4. DEPUTY_FIDE_ARBITER
5. DEPUTY.FIDE.ARBITER

Which of these is easiest to read? Personally, I think 2, 4, and 5 are much easier to read than 1 or 3, especially if you’re looking for an element that has FIDE data in it. (Based on my PHP style, I probably can skim through code looking for the one with the underscores in it the quickest.)

Of course, they’re all easier to read than things like:

DEPFARB

thunderchicken · April 3, 2007, 4:03pm

I’d say go with the underscores.

What would you like me to do now to help? I could probably take a SwissSys output or whatever and parse it programmatically.

Any XML uploading is pretty straightforward and can be checked quickly.
