Certification of Tournament Pairing Software

Brian, just a suggestion… if you want the delegates to support your motions, it would be beneficial to discuss issues with adjectives that don’t put someone’s back up or imply that a viewpoint different from yours is to be less valued.

If you wanted to explain why you feel there could be a better format than the one currently used for the US Open, I think you would find more open minds than you will by categorizing something they like as “ridiculous”. Approximately 100 players traditionally play in that event each year, and many of them do so because they actually prefer that “ridiculous” format.

If you would truly like to see your software program accepted, I would sincerely recommend that you vet it through the Rules Committee and also ask the TD Committee for input and beta testing before the annual meeting. If you don’t, then more likely than not the delegates will simply refer your motion for any rules changes to one or both of those committees, and you will lose a year in development.

Also, TD’s tend to prefer to work with individuals who are willing to listen to why the rules were written as they were, rather than with individuals who tell them that the rules in place are poorly written. A lot of people gave a lot of input to the current rulebook. Again, many of those rules were developed by consensus of TD’s who had issues with the way they were formerly written. If you were at this year’s meeting, it would be easy to see how a rule comes out of the Rules Committee as Version A, the delegates change it to Version A minus or plus other things, and the result has unintended consequences.

I don’t personally like a lot of things in it, but I respect the time and thought that many individuals put into getting a set of guidelines in place. Also, those rules do change by ADM’s, but they don’t change quickly or easily. Much of that has to do with people needing to understand why a rule change occurs and what its benefit is.

Although I use and prefer Swiss Sys, I find it gets a bit glitchy when there are fewer than eight players in a Swiss. I believe that probably has to do with circular references when an initial pairing doesn’t work out and there are multiple parameters to be decided as one travels down the ladder. The program seems to weight colors heavily. Color preferences in a 4-round event are a lot more significant than in a 5-round event, where an imbalance is going to occur anyway. Still, like other TD’s, I find the program a significant improvement over doing the pairings by hand. I actually tend to work them out myself and then see whether the computer is “thinking” the same way I am or whether I’ve missed something. If I don’t have time, I run the pairings, look at the standings, and check whether the player with 3 points is playing the 2.5 player or someone with only 1 point - basically, whether they “look” right.

USCF has difficulty obtaining, training, and retaining TD’s for an obvious reason: most people who are involved in chess tournaments are there because they love the game and would simply prefer to be playing. Hence the view of USCF as a whole tends to favor educating TD’s, so that they will stick with the program and improve their skill sets, rather than “disciplining” them for rules infractions.

Bad pairings, etc. tend to happen more when the TD has been at an event for several hours or days, has had other things “go wrong” during the event, and stress levels are high. Anything that can be done to make the job less tedious and to eliminate confusion - such as explaining why the computer chooses pairing A when the TD would choose pairing B - would be helpful.

Sincere good wishes that you are able to improve USCF as a delegate. There are a lot of high powered individuals in this organization and if it were possible to work together as a group with the same mission in mind, a lot more could be accomplished.

Donna Alarie
Just a member

Without comment on any merits - or not - of your comment, I would humbly suggest that in my experience as a delegate this is not the best way to begin to work to convince other delegates that something ought to be changed.

I said that Bill Smythe proposed (in the forum, not an ADM) having a point system. However Bill’s proposal was not acted upon.

If you turn on the pairing logic window in WinTD you can see that it goes through iterations that seem to be somewhat similar to a point-count to determine the “best” way to modify the raw pairings.

One thing that I’ve noticed is that people often set the WinTD pairing difficulty level at 10 or lower. They’ve stared at me aghast when I bump it up to 50 or 100 (on an old Pentium II machine, and during the later rounds of 100+ player sections) because they think the pairings will take f-o-r-e-v-e-r, and then they are surprised again when it only adds another 5 or 10 seconds to the process. At the higher level, WinTD appears to do more calculating and seems to better mimic the look-ahead version of transpositions rather than a simpler top-down version.

Getting back to TDs that have lots of experience with using pairing cards, it seems that we are more likely to recognize when a pairing program has a suboptimal setting and thus correct the “computer-blessed” pairings before they go out (my favorite example is one where the interchange option had been turned off and thus the top score group in the final round of a state open championship was initially paired poorly - that was reviewed and corrected before that pairing was seen by any player).

Thanks for the warning about the tender egos of the Delegates. I figured that already. In the same spirit as my previous comment (which was joking, in fact) let me just say that the amount accomplished on timescales less than decades by schmoozing and flattering the Delegates versus the amount accomplished by confrontation is probably about the same: nothing, in both cases. But flattery and schmoozing aren’t my style.

In any case, you can relax. I am not seriously planning to attempt to change the US Open format, as ludicrous as it may be (like all tournaments with big class prizes).

Considering the speed of computers, I wonder why WinTD even provides a “difficulty level” control at all. Why not just always use the “best” setting, even if it takes a few seconds more? Surely the few extra seconds cannot matter. How long does WinTD take to pair a 500-player section on the highest setting?

I would suggest a cardinal, rather than ordinal, approach to pairing quality. In other words, as WinTD apparently does, some sort of absolute points system for evaluating pairing quality.

Technical Discussion:
It is less efficient to think in terms of “preferences” (this pairing is better than that one for player A). It is simpler just to assign points to the quality of each possible pairing for A and let the point totals determine the preferences. The optimal pairing might be 0 points: an opponent he’s never played, with the same rating, the same score, and the opposite color due. Then start adding penalty points for each deviation.

Like the ordinally based “Stable Roommates Problem”
en.wikipedia.org/wiki/Stable_roommates_problem
the problem I described requires O(n^2) evaluations. For each of the n players in a section (not just the score group, because we want the program to decide the dropping among score groups for us), we evaluate all n-1 possible opponents in each round.

But unlike Stable Roommates, the cardinal (points-based) formulation always has a solution, and finds an optimal solution rather than just any one that is stable to one-step perturbations. (The optimal solution will almost always be unique, but not always. Given a finite set of finite real numbers, there is always at least one with the smallest value, but there could be a tie for smallest.) I see Brian mentioned the Gale-Shapley algorithm for the related Stable Marriage Problem (which is related to Stable Roommates) so presumably he has had related thoughts already.

Brief Implementation:
We want each player to have the best pairing, which individually is an opponent with the same score, the same rating, and the opposite color due. We pair score groups the way we do because we’d rather have two games with a 100-rating-point difference than one game with equal ratings and another with a 200-point difference. So the penalty for rating difference should go up more than linearly with the difference. Add a penalty for the wrong color or, more generally, an unpleasant color sequence; a large penalty for different scores that is almost always bigger than the rating-difference penalty (and goes up more than linearly with score difference, so that one gets incremental drops rather than a single drop from the top group all the way to the bottom); and an infinite penalty for a previous opponent.
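As a sketch only - the weights below are invented for illustration, and the quadratic terms are just one plausible reading of “more than linear”:

```python
import math

# Illustrative weights -- not from any rulebook.
RATING_WEIGHT = 1.0       # per squared 100-point rating gap
COLOR_PENALTY = 40.0      # both players due the same color
SCORE_WEIGHT = 10_000.0   # per squared point of score difference

def pairing_penalty(a, b, previous_opponents):
    """Penalty points for pairing players a and b, each a dict with
    'id', 'rating', 'score' and 'due_color'. Lower is better; 0 is ideal."""
    if b["id"] in previous_opponents[a["id"]]:
        return math.inf               # never repeat a pairing
    penalty = 0.0
    # Quadratic in rating gap: two 100-point games (cost 2) beat one
    # equal-rated game plus one 200-point game (cost 4).
    penalty += RATING_WEIGHT * ((a["rating"] - b["rating"]) / 100.0) ** 2
    if a["due_color"] == b["due_color"]:
        penalty += COLOR_PENALTY      # someone gets the wrong color
    # Quadratic in score gap and dominating the rating term, so drops
    # happen one score group at a time rather than top to bottom.
    penalty += SCORE_WEIGHT * (a["score"] - b["score"]) ** 2
    return penalty
```

The particular constants and the choice of squaring are, again, only assumptions; the point is the shape: rating and score penalties superlinear, score dominating rating, rematches forbidden outright.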

I think that’s it for the requirements of standard pairing. Did I forget anything?

One can add penalties for things like coming from the same team, club, city etc. and add “negative penalties” (bonuses) for being in contention for the same prize in the last round.

I have been looking at the Stable Roommates and Stable Marriages problems, as you guessed. However, right now, my pairing engine is based on weighted perfect matching (See S Olafsson, Weighted Matching in Chess Tournaments. J. of Operational Research Society, Vol 41, No 1 (Jan 1990), pp 17-24.) I am using a C++ implementation of Edmonds’ Algorithm that was developed by Vladimir Kolmogorov.

But a lot of the mechanics are similar to those in your discussion. For each possible pairing, one assigns a cost. The cost is a sum of various penalties associated with such things as: the number of transpositions and interchanges, versus the ideal “top half versus bottom half” pairing, required to make the pairing; the score distance between the two players (ideally zero); penalties for not alternating and equalizing colors for one of the players; etc. Edmonds’ Algorithm finds the least-cost perfect matching, a perfect matching being one in which all nodes (players) are matched.

Unlike Olafsson’s paper, which paired each score group separately, I am pairing an entire tournament/section in one fell swoop, so the effect is “full lookahead” not just within one score group but through the entire tournament table. As you indicated, the program is left to find the drop-downs that have the best knock-on effects in lower score groups. I don’t think anybody else has attempted this, mainly because the USCF and FIDE rules are so score-group oriented, and also because people generally do not appreciate just how fast computers have become, and fret too much about performance before they even test it. Kolmogorov’s implementation of Edmonds’ Algorithm is very fast: on my desktop system a 500-player tournament round is paired in 17 milliseconds, which seems like it ought to be fast enough. This is a graph of 500 nodes and 250,000 weighted edges.
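For concreteness, the shape of the computation in miniature: a pruned brute-force search stands in for Blossom5 here (fine for toy sizes, hopeless for 500 players), and the cost matrix is whatever penalty scheme one adopts:

```python
import math

def min_cost_perfect_matching(cost):
    """Minimum-cost perfect matching over players 0..n-1, given a
    symmetric cost matrix. Brute force with pruning: recursively pair
    the first unmatched player with each remaining candidate. This
    explores up to (n-1)!! pairings, so it is only a stand-in for
    Blossom5, usable for small n."""
    n = len(cost)
    assert n % 2 == 0, "need an even number of players"
    best = [math.inf, None]

    def search(unmatched, total, games):
        if total >= best[0]:
            return                      # prune: already no better than best
        if not unmatched:
            best[0], best[1] = total, list(games)
            return
        a, rest = unmatched[0], unmatched[1:]
        for i, b in enumerate(rest):
            search(rest[:i] + rest[i + 1:], total + cost[a][b],
                   games + [(a, b)])

    search(list(range(n)), 0.0, [])
    return best[0], best[1]
```

In the real engine each cost[a][b] would be the summed penalties discussed above, and Blossom5 would replace the recursive search; only the input/output contract is the same.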

With proper tuning of the weights, I believe this approach can be made to operate as I think was intended by Rule 29, though it is hard to tell. The behaviour can be dramatically changed just by adjusting the weights. At present, the weights on the top score group are the same as those elsewhere so this means that the program accepts problems in the top score group in order to avoid greater problems later. I am not certain that this is what was intended by Rule 29, but it could be changed. At present, I have not implemented the 80/200 rule under Rule 29. I like the idea of bonuses for pairings between players “in contention” for prizes other than the “place” prizes which, notwithstanding my sarcasm earlier, is pretty cool.

Incidentally, the Go world has been using Edmonds’ Algorithm for pairing Swiss System tournaments (generally referred to as the McMahon system in the Go world) for some time. Olafsson’s approach was used to pair the Reykjavik Grand Open as long ago as 1986, and there was a commercial chess pairing program (MATCHMAKER) which used it for a while, though it no longer seems to be available.

Speaking of Stable Roommates in connection with chess tournament pairing, you might also want to check out emis.de/journals/DM/v71/art3.pdf There is a short discussion of Olafsson in that paper, also.

I suggest that this thread be transferred to the Tournament Forum as this now interests all TD’s as well as players and not just the restricted forum readers of USCF politics.

Ah-hah, I spill a little technical mumbo jumbo on the page and Brian reveals that he actually has working code! :smiley:

I agree that the computer should handle the assignment (dropping) across score groups. I’ve clarified my writeup a bit also and that’s more explicit.

I see that the Edmonds “blossom” algorithm operates on graphs. Not having Olafsson’s paper handy (and not seeing this answered in the Kujansuu et al paper), could you describe in a few words why that is a suitable formulation for a chess (or go) tournament? Or is it just used in the hope of reducing complexity somewhat vs. listing out all O(n^2) possible pairing sheets, evaluating each one and choosing the best, which is no worse than O(n^2) anyway?

I guess the weights in weighted matching would correspond to penalty-point totals, so the Edmonds algorithm is essentially cardinal. But I’d like to see how the tournament pairing problem is mapped to the graph before having any confidence in that statement. Am I right?

Well, I have working code for the pairing program, but the Swiss pairing side is Rube Goldberg-esque. Basically, we have been using this program to pair our twice-yearly scholastic tournaments for the last four years or so. It is written in JavaScript and runs inside a web browser, using HTML and CSS for the user interface. It isn’t very full-featured, in that it does not have a lot of tournament management features (byes, teams, etc.). The pairing system has not been Swiss System, but rather “Australian Draw”: that is, 1 vs 2 pairings, with replays allowed after N rounds. I developed it because we wanted a program that would make it easy to stage a tournament with “rolling pairings”, in which players do not play in every round and a new round can be started at will. This turns out to be ideal for scholastic tournaments.

However, I recently decided I might do more with the program, and that meant implementing Swiss pairings. So the “pairing engine” at this point is simply a process that generates input for Kolmogorov’s “Blossom5” implementation of Edmonds’ Algorithm, assigning the weights. The pairings are the output from Blossom5, which I examine by hand. It isn’t hooked up to the pairing program at all yet.

A matching for a graph is a set of edges of the graph such that no two edges share a common vertex. A perfect matching is a matching in which every vertex of the graph is incident to exactly one edge; that is, each node is associated by the matching with exactly one edge, and therefore with the one other node at the other end of that edge. Edmonds’ algorithm will find a perfect matching for a graph. Given a set of weights on the edges, it will find the maximum-weight perfect matching, and it is trivial to make this minimum weight instead. There are efficient algorithms for this, even more efficient now than when Olafsson wrote his paper.

So, in the tournament pairing context, the vertexes represent players, and each edge in the matching represents a game in the pairing.

That’s it.
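In code, the defining property is easy to state. This small helper (my own illustration, not from Olafsson or Kolmogorov) checks that a proposed pairing sheet is a perfect matching:

```python
def is_perfect_matching(games, num_players):
    """True iff the list of (a, b) games covers players 0..num_players-1
    exactly once each: no player on two boards, nobody left unpaired."""
    seen = set()
    for a, b in games:
        if a == b or a in seen or b in seen:
            return False          # self-pairing or a player on two boards
        seen.add(a)
        seen.add(b)
    return seen == set(range(num_players))
```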

Sounds like it should produce results identical to just listing possible pairing sheets with total penalty-point values and choosing the best (lowest penalty total). Maybe it’s faster for a small tournament, but it would be slower for a very large one (how large?), because I saw it’s something like O(n ln(n) m), where n is the number of vertices and m is the number of edges. If all possible pairings are edges, m = O(n^2), so the complexity is O(n^3 ln(n)), much worse than the O(n^2) of my brute-force approach. Maybe I’m missing something or computing the complexity wrong …

I think this might be designed modularly as software in two parts as you imply: a pairing engine called by a tournament management engine. The TD interacts only with the TM engine directly.

The TM engine lets you force score adjustments (awarding byes) and exclude / include players (again, byes, withdrawals or new entries) and keeps track of prior round pairings and results. Old pairings and results must be kept by the TM engine because they must be adjusted before being fed to the pairing engine, say by removing a player who withdrew from the list of players for pairing.

The pairing engine implements whatever pairing algorithm you want, using a standard input format so the pairing engine can be swapped out for another.

I don’t think this is right. If each “pairing sheet” for n players is a permutation of 1…n, and the brute force approach consists of enumerating the permutations, computing the total penalty, and determining which pairing sheet has the lowest penalty, then isn’t it O(n!) rather than O(n^2)? What am I missing?

As regards the complexity of the Edmonds algorithm, keep in mind that chess tournaments are relatively small N, say N<2000, and will generally be much smaller. About N=500 is sufficient to pair the largest section at the Supernationals. Practically speaking, the computation is fast for N=500.

You’ve described 1vs2 pairings rather than Swiss pairings. Tournaments can be run that way, but there is a bit of a risk of early draws knocking the strong players out of the lead. The final results might end up the same either way. In an 18-player, 4-round event with two strong players who draw each other and 16 weak players who always have decisive games, both systems end with a weak player winning at 4-0 and the two strong players sharing 2nd-3rd at 3.5-0.5; the draw simply occurs in round 1 under 1vs2 and in round 4 under Swiss. Also, the two strong players would be paired each round against perfect scores in a Swiss, as opposed to 1vs2 seeing them paired in rounds 2-4 against players with one loss. However, the current mindset of players would generally find it dissatisfying to have the strong players immediately knocking each other off the pace.

I think you’re right, and n! is much worse than n^3 ln(n) even for moderate sized chess tournaments. I don’t know if the difference is seconds or minutes but anyway since you’ve got code that does it with the Edmonds algorithm, I now think you should stick with that.
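Strictly speaking, the count of distinct pairing sheets is (n-1)!! = (n-1)(n-3)···3·1 rather than n!, since neither the order of the games on the sheet nor the order of the two players within a game matters - but that is still far beyond any polynomial:

```python
def num_pairing_sheets(n):
    """Number of distinct perfect matchings of n players: (n-1)!!."""
    assert n % 2 == 0, "need an even number of players"
    count = 1
    for k in range(n - 1, 0, -2):
        count *= k
    return count

# 4 players: 3 sheets; 8 players: 105; 20 players: 654,729,075 --
# already hopeless to enumerate, while 20^3 is only 8,000.
```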

Nearly 30 years ago, Bruce Diesen pioneered a pairing system in Minnesota, if I recall correctly, that made a virtue of 1vs2-style pairings.

The examples circulated by Gerry Dullea in the Binfos at that time indicated that single-section, four-round tournaments with well over 16 players produced reasonable winners and lots of games between players close to one another in rating. Pairings of “same score, same rating” happened in the middle of groups. Higher-rated players with low scores got paired up against lower-rated players with higher scores.

This is not a Swiss-style pairing system, of course. But I believe Diesen’s system should be dusted off and considered for some uses. Does anyone here know more about it?

Good point!

If one were to have no larger concerns, it’s incontrovertible that the optimal individual pairing is the same score, same rating, opposite color due. Yet this has problems because of … well, I guess, the structure of Swiss tournaments. (It might be fruitful to identify this problem quantitatively somehow – the real solution may lie that way.)

I don’t want to mandate a procedure (split the score group in half, pair top half vs. bottom half) because that gets modified and we’re back to the world of rule-of-thumb rather than any sort of objective optimization. But maybe we can optimize something different.

Searching for an objective to optimize over a score group, I thought of maximizing the sum of (R * E) where R is rating and E is expected score. (Instead of minimizing the sum of rating differences.) How does it work? Let’s see:

Approximate E = (R - Ropp) / 800 + 0.5. For a given game, the total for the two opponents is

R * E + Ropp * Eopp
= R * [(R - Ropp) / 800 + 0.5] + Ropp * [(Ropp - R) / 800 + 0.5]
= [(R - Ropp)^2]/800 + 0.5*(R+Ropp)

and since the second term, summed across all games (and therefore including each player in the score group once), will always come to the same amount (the sum of ratings in the score group times 0.5), we end up concerned with the first term: the sum of squared rating differences over the games in the score group. This won’t be maximized by the usual top half vs. bottom half pairings, but by the bizarre top half vs. reversed bottom half.

(If on the other hand we minimize the quantity, we get the 1 vs. 2 pairings.)
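The algebra is easy to spot-check numerically with the uncapped linear approximation (the 1800/1500 ratings below are just sample values):

```python
def expected(r, r_opp):
    """Uncapped linear approximation to the expected score."""
    return (r - r_opp) / 800.0 + 0.5

def game_total(r, r_opp):
    """R*E + Ropp*Eopp for a single game."""
    return r * expected(r, r_opp) + r_opp * expected(r_opp, r)

# Closed form claimed above: (R - Ropp)^2 / 800 + 0.5 * (R + Ropp)
r, r_opp = 1800, 1500
closed_form = (r - r_opp) ** 2 / 800.0 + 0.5 * (r + r_opp)
assert abs(game_total(r, r_opp) - closed_form) < 1e-9   # both are 1762.5
```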

In other words, for a 6 player group, 1 vs. 6, 2 vs. 5, 3 vs. 4. That will indeed protect 1 and knock out 6 – I’ve never heard of it but it makes a certain weird sort of sense – has it been tried?

I left something out, though: the bounds 0 <= E <= 1. So let’s include those too. This “truncates” the objective function somewhat by squeezing the extremes toward the middle. Early in the tournament we are most likely to have a wide rating range in a score group and hence to hit those limits. Consider a 6-person score group where #1 is rated 2000, #4 is rated 1600 and #6 is rated 1000. It won’t make sense to pair 1 vs. 6, because the rating difference beyond 400 is “wasted” in terms of the truncated objective. Better to pair 1 vs. 4; then instead of 3 vs. 4 we can get 3 vs. 6, which will contribute more to the total. So for this score group it’s possible that the (truncated) optimization will give the desired top half vs. bottom half pairings 1 vs. 4, 2 vs. 5, 3 vs. 6, or something close to it.
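To make this concrete in code: only the ratings of #1, #4 and #6 were given above, so the ratings assigned to #2, #3 and #5 below are invented for the example:

```python
def capped_expected(r, r_opp):
    """Linear expected-score approximation, clamped to [0, 1]."""
    return min(1.0, max(0.0, (r - r_opp) / 800.0 + 0.5))

def objective(pairing, rating):
    """Sum of R*E over both sides of every game, with E clamped."""
    total = 0.0
    for a, b in pairing:
        total += rating[a] * capped_expected(rating[a], rating[b])
        total += rating[b] * capped_expected(rating[b], rating[a])
    return total

# Ratings for #2, #3 and #5 are assumed values for illustration.
rating = {1: 2000, 2: 1900, 3: 1700, 4: 1600, 5: 1500, 6: 1000}

reversed_bottom = [(1, 6), (2, 5), (3, 4)]   # 1v6, 2v5, 3v4
half_vs_half = [(1, 4), (2, 5), (3, 6)]      # 1v4, 2v5, 3v6

# The clamp "wastes" the 1000-point gap in 1v6, so top half versus
# bottom half scores higher under the truncated objective here.
assert objective(half_vs_half, rating) > objective(reversed_bottom, rating)
```

With different invented ratings the comparison could of course come out the other way; this only illustrates the truncation effect described above, not a general theorem.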

Late in the tournament or if the rating range in a score group is small for whatever reason, it will tend to produce top vs. reversed bottom though. Hm, in the later rounds, we don’t want top vs. bottom so much anyway – we’re ready to start knocking out some contenders. So maybe in the final rounds, switch from maximizing the sum of R*E to minimizing it (or just going to the individually optimal pairings where we want the opponents to have close ratings) and, either way, we’re likely to get 1 vs. 2 pairings.

Even though we’ve discussed the behavior of this objective within scoregroups, our actual algorithm doesn’t have to go scoregroup by scoregroup. As long as we have a large penalty for a game where the opponent has a different score (increasing faster than linear in that score difference) the scoregroups should stay intact, automagically.

But I wonder how the objective works when we have to drop someone down. Have to think about that …

The more complex you make the pairing system, the more likely you are to have players complain that the weights are biased or random. When players have difficulty figuring out rational pairings, they assume the TD is biased and is ripping them off.

One of the reasons for the “rules” on pairings was biased pairings in a number of tournaments. In the bios of some players from the 1950s and 1960s (Joseph Platz is one example), one of the complaints was the bias of some TD’s who would pair an “outsider” against a local champion to guarantee that that player would not win the event or earn a prize. With the advent of ratings and the establishment of clear rules on pairings, the local biases were inhibited. I was told by a former VP of the USCF, William Byland, that in some places in the northeast (MA, CT, NY), as well as in other states, the pairings in tournaments were scandalously bad. One local TD in my area used to take delight in making odd pairings whenever he could. It was not uncommon for him to find a way to put bitter enemies against each other, or father vs. son, or two players who had traveled together from out of state, even when he could easily have avoided the consequent complaints. I am sure he could have shown how he “weighted” his pairings. The USCF rules and the examples of how to do pairings belong in the rulebook for the benefit of TD’s and players, so that they can understand how pairings are supposed to be made. They are not always perfect, but they have a reasonable simplicity to them.

So far I see only one person wanting to change the pairing rules. I have seen no evidence or examples which demonstrate the necessity for a change. I doubt that the delegates will approve any change, especially when the running of their favorite event is termed as “ridiculous.”

Once again, this thread belongs under the Chess Tournaments forum which is open to all to see.

In Olafsson’s paper, which was published in 1990 and seems to report work done in the mid-eighties, the weights are: (1) the score difference, (2) “the color number” (the number of times player 1 has played White minus the number of times player 2 has played Black), and (3-4) two weights designed to prevent players from floating up or floating down again if they did so in the previous two rounds. (Repeatedly being the up- or down-floater is something the FIDE pairing rules strove to avoid in the 1980s, and still do, unlike USCF Rule 29, apparently.)

Clearly, there is nothing in these weights to drive the solution towards “top half versus bottom half” pairings. This means that within score groups the pairings will be driven by color alternation and equalization issues and will otherwise be random. Color considerations aside, in a 6-player group, 1vs4, which is the ideal “top half versus bottom half” pairing, will be as likely as 1vs2 or 1vs6.

I don’t know whether that was an oversight by Olafsson or simply a reflection of the fact that ratings were not used in FIDE tournaments of the eighties. The Dutch System (ratings-based Swiss) rules of FIDE were first adopted in 1990, the same year Olafsson’s paper was published, so I suspect that it simply was not an issue for Olafsson.

However, this is not difficult to remedy. In my implementation, there is a “transposition/interchange” penalty. For example in a situation where 1-8 have the same score and 9-10 are in the next score group down, the weights for player 1 pairings are as follows:

1vs2    I+2T
1vs3    I+T
1vs4    I
1vs5    0
1vs6    T
1vs7    2T
1vs8    3T
1vs9    SD
1vs10   SD+T

where T is the “Transposition Weight”, I is the “Interchange Weight” and SD is the score difference weight. SD is very large compared to the others, and I is greater than T. The ideal pairing (1vs6) has no penalty. (Detail: actually each player in the group has a different SD so that player 1 is less likely to drop down than player N. Since these are all player 1 pairings, this is not shown.)
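As a sanity check, the table’s pattern can be generated mechanically. This sketch returns symbolic (I, T, SD) multiples for player 1’s pairings, with the per-player SD detail omitted, as in the table:

```python
def player1_weight(opponent, group_size=8):
    """(interchanges, transpositions, score_drops) for pairing player 1
    against the given opponent. Positions 2..group_size are in the same
    score group; positions beyond it are in the next group down."""
    ideal = group_size // 2 + 1               # top half vs bottom half: 1v5
    if opponent > group_size:                 # dropped to the next group
        return (0, opponent - group_size - 1, 1)   # SD plus T per step
    if opponent >= ideal:                     # bottom half of the group
        return (0, opponent - ideal, 0)       # T per step away from ideal
    return (1, ideal - 1 - opponent, 0)       # top half: I plus T per step

# Reproduces the table: 1vs2 -> I+2T, 1vs5 -> 0, 1vs8 -> 3T, 1vs10 -> SD+T
```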

The effect of this is to push the solution towards ideal “top half versus bottom half” within a score group, although the color assignment weights may push it in other directions. The relative importance of color assignment versus rating rank can be set by adjusting these weights.

I’m not asking the Delegates to change the pairing rules. I am asking for rulebook wording that makes it clear what the pairing rules are, and what features a pairing must have in order to be “legal” or “acceptable”. Rule 29 neither specifies an algorithm nor specifies what is an acceptable outcome of the pairing process. It is basically a generalized approach to pairing, with a large amount of tutorial material and chatty discussion of examples. Some of this is labelled as “TD Tips”, but much that is not so labelled is essentially also “tips”. Each TD (or programmer) is left to turn that into a concrete procedure and is pretty much on his own as to whether that procedure is “compliant”.

According to you, the current Rule 29 was a response to an anything-goes environment where TD’s did pretty much as they pleased with pairings, not always for respectable reasons. Rule 29 may constrain TD’s more than they were in the past, but the rule still does not allow one to deterministically say that a pairing is correct.

I don’t necessarily wish to change the actual rules content, to the extent that is present, but to make it more rigorous – more like what belongs in a rulebook. (Of course, I would be pushing for a rewording that didn’t lock pairing program developers into algorithms based on cards.)

As for my comments about the US Open format being “ridiculous”: I take it from what you are saying that the Delegates don’t make decisions about proposals on the merits of the proposals, but on how friendly they feel towards the proposer and whether they feel insulted or not by something the proposer has said on unrelated questions. You aren’t the first to take this line in this thread, but don’t you think this is rather to the discredit of the Delegates? It may be true, but is it really something you should be admitting in public?