Optimal pairings. USCF rules, WinTD, and a program I wrote.

After doing some struggling with WinTD, and noting the program’s age, I decided it was time to program my own. It’s coming along quite nicely, and I expect it to be good enough for me to use before my next tournament, in January. After that, I might try to polish it up for sale at a later time.

Of course, the hard part was the Swiss Style pairings. The various constraints for players not playing twice, and matching points among players and then, especially, top/half bottom half pairings made things very difficult. (Top half/bottom half is the hard part, because after you satisfy the other constraints, you do top half/bottom half, and then you have violated the other constraints. It’s not very easy,

However, I came up with an algorithm that I thought captured the essences of the Swiss Style system. It would, first and foremost, avoid pairing players who have already played. Its second goal would be to keep players with the same score playing each other, or if not possible, to minimize score differences. Finally, it would favor matches between opponents with wide rating disparities. (I won’t go into technical details, but there are distance functions with sums of squares going on.) This is done because, if there are enough rounds, they have to meet anyway, but their meeting is postponed. The goal is to have the “championship game” between the top two seeded player played in the last round. The algorithm also treats the bottom players the same way. The games for the low ranked players will gradually get easier (assuming they don’t upset their opponents) as the day goes on.

(Edit: When I first posted this note, I gave an example that compared WinTD with USCF rules. It was pointed out that I had made an error in interpreting USCF rules, and WinTD did, in fact, match the rules in that example. I have corrected this below, which also makes the example shorter.)

When it came time for testing, I got stable results, but not what I expected. After doing some evaluation, though, I realized that it was doing exactly what I programmed, but not what USCF rules or WinTD said to do. I’m currently examining things to see if I ought to make changes, or if I like my method better.

For the first rounds, the algorithm produces the top half/bottom half pairings as expected. When “odd players” and previously matched players start showing up, it diverges from WinTD and USCF procedures.

The goal of this thread is to present an example and ask for feedback on whether or not my program is “better” than other method.

(Warning: long example ahead)

The easiest way to show the problem is to consider a small tournament with three rounds and six players. They are all rated. Player 1 has the highest rating, and Player 6 has the lowest rating. Assume that they play, and there are no draws and no upsets, and no transpositions necessary for due color.

The first round is easy

1 vs 4
2 vs 5
3 vs 6

1, 2, and 3 win their games.

Now, in the second round, if we follow USCF procedures, there are three players in the top scoregroup, with one point each. The lowest player of the scoregroup is therefore paired against the top player in the next scoregroup. The remaining scoregroups are then paired, which is easy in this case, because there is only one pair in each group.

1 vs 2
3 vs 4
5 vs 6

1, 3, and 5 win. That leaves players 1 and 3 with two points. Players 2 and 4 have one point. The final round is thus:

2 and 5 have 1 point. 4 and 6 have zero points. However, two and five have played. WinTD swaps 4 and 6, and the result is.

1 vs 3
2 vs 6
5 vs 4.

My algorithm begins the same in round 1. However, it then diverges. In round 2, it is favored for 1 to meet player 3 instead of player 2, because that postpones the final round challenge. (The algorithm doesn’t take that into account. It actually just tries to masimize total distance between players. Having the final matchup in the last round is a consequence of that.) Player 2 will have to play someone with 0 points, and he has already played 5. The matchup 2 v 6 and 4 v 5 is better than 2 vs 4 and 5 vs 6, because there’s more distance between 2 and 6 than between 2 and 4. (The program treats the pairing of the two lowest players as undesirable until there is no choice, just as it does with the highest players.) Therefore, the round is paired

1 vs 3
2 vs 6
4 vs 5.

1, 2, and 4 win. The final round is

1 vs. 2
3 vs 4
5 vs 6.

The round 2 pairing looks a little bit odd, but this algorithm accomplishes the goal of postponing the championship game until the final round better than the standard way. Experimenting with larger tournaments, I find that the “middle” players move around a bit more, but the climactic games at the top and bottom are postponed as long as possible. In general, the most evenly matched games happen in the later rounds, all up and down the tourney tree.

I like mine, frankly. The games should grow more evenly matched as time goes on, which is the point of the Swiss Style pairing system. USCF rulese tend to force players closer to the top more quickly. My algorithm was also fairly simple to implement, although I haven’t done team allocations yet, and I haven’t built in accelerated pairings yet.

The other question is, if I use it at my next tournament, do I have to advertise it in advance because I’m not using a standard method? In some ways, I think I might be using a method that is even more standard than the standard method. It seems to accomplish the goal of the Swiss System better than the USCF rules for the Swiss System.

So, any thoughts on the subject?

Read rule 29D1a. Then, read rule 29D1a again. The odd player from the higher score group is normally paired against the top player in the next score group. The TD is allowed to choose a different opponent to improve overall color allocation. The TD is not allowed to choose a different opponent because he likes the evenness of the matches better.

There is nothing wrong with the “age” of WinTD. The pairing engine has been well debugged and makes spectacularly good pairings. More than once, I have looked at WinTD pairings with a reaction of “what ARE you doing here?” And then my look goes from befuddlement to slack-jawed amazement at how well WinTD has resolved color allocation problems.

You have a long road ahead of you if you intend to write a program that produces pairings that are as good as those from WinTD.

Yes. This isn’t a Swiss System, and players entering an event which claims to use Swiss Pairings expect (rightly) to play who the Swiss System tells them they should play, whether or not the pairings are optimal.

Alex Relyea

Yes if you are not going to follow USCF rules and you intend to advertise and rate the event , then you need to explain how you are deviating from those rules. You need to do that in the advance publicity AND at the event.

There are several rules that you are not following.

I stand corrected. When I was evaluating the program, I had a summary of the Swiss System in front of me, instead of the USCF rulebook. WinTD does appear to follow USCF rules in this case.

Note, though, that the USCF rules do not define the “Swiss System”. My algorithm is most definitely a Swiss System algorithm. It just isn’t the USCF implementation of the Swiss System.

What I was hoping to get feedback on was what players thought of the differences. Regardless of what the rule book says, are these pairings better, or worse?

The real test for the program will come when I’m fully finished with the pairing system, including teams and color matching. At that point, I want to compare what mine produces versus WinTD for my real tournaments. I haven’t been happy with the WinTD pairings, precisely because it tends to shove the high rated players against each other too quickly. When I have a more elaborate example, I’ll post it.

I like what you are trying to do.

For instance, in my club championship, going on this month, the number 1 & 2 players are meeting as the only perfect scores in the 3rd round of 4. That creates a huge incentive for the winner to take a short draw in the last round against a lower rated player.

I kinda like his ideas. When I was on the final downhill slide of my tournament career, after the second round it didn’t matter how well or poorly I had played–I was invariably paired up. It wasn’t that it happened to me sometimes and other people sometimes, I had to climb a mountain to be in contention for a prize (tied for 3rd probably) while the people with whom I was tied were playing each other. My position was TPSSG (Top Player in Second Scoring Group) more times than I care to count.

I had never examined WinTD pairing so closely as I have in the last few days. It does seem to resolve color allocations spectacularly well. Almost too well, I think, but I’ll have to take a closer look. In a sample tourney I created with 19 players with ratings ranging from 105 to 2225, color allocation was almost perfect, but top half/bottom half considerations were very skewed. Since the rules, and WinTD defaults, only allow 80 point swaps for altrnating colors and 200 points for equalization, I would expect to see some due color “failures”. I’ll retry the same entries, but with eliminating default values, and see if the top half/bottom half pairings are more consistent.

My program, meanwhile, once again had a final round matchup with top andbottom players, andgenerally better "sorting of players in the middle, but with more color clusters.

Oh, and as far as the “aging” of WinTD, the biggest thing I see is a single point of data entry. I know score reporting has been a bottleneck at a couple of tournaments. Also, being able to print reports is great, but I would like to look at and/or project them on the wall without printing them.

Not to mention tweeting followers with tourney status updates. OK. So, I don’t “tweet” or “follow” myself, but it seems like it is all the rage these days.

Projecting isn’t really a program thing. I’ve been to tournaments where the pairings have been projected and they were using WinTD. And I know Ron is doing that with SwissSys at his scholastic chess classes.

Going through this exercise has really forced me to think about what constitutes “good” pairings. Yesterday and today, I spent a lot of time studying a set of randomly selected players in an 18 person, five round, tournament, and comparing my program to WinTD.

The due color pairings from WinTD were much, much, better. Or that’s what I thought at first. My color pairings were very skewed. Two players (of 18) ended up playing the same color 4 out of 5 times, whereas WinTD had everyone 3 and 2. That’s bad, right? Then, I started pouring over why my program did that, and I realized that when it came to games “that mattered”, my program didn’t do so bad. What I mean by “mattered” was that the ratings involved were within the 200 point limit to consider color swapping. I actually had fewer cases in which one player received an undue advantage as a consequence of color allocation, if you throw out games in which one player had a 200 point advantage in rating.

Meanwhile, looking at the final round pairings, my program had an average rating difference of 347 in the final round. WinTD had an average rating difference of 569.

I’m not ready to throw out USCF rules and/or WinTD just yet, but it does make me really stop and consider what “good” pairings mean.

You will need to quantify “games that mattered”. Mattered in regards to effect on the top 3 or four places? Cool. But believe me, color allocation will matter to ALL of the players, whether or not they are in the running for a prize.
Also, your sample is still pretty small. Do it for a 50 player section and your problems should decrease.

Do you understand what the 200 point rule means?

Alex Relyea

By “games that mattered” I meant “games where color allocation was likely to affect the outcome”. If there’s a 400 point rating difference, it matters little who plays the white pieces.

I started this exercise after WinTD paired the number 1 and 2 players against each other in round 2 of a ten player, five round, tournament. USCF rules just weren’t producing pairings appropriate for such a tournament. My goal was to push that game until at least round 4, but without any artificial restrictions on who could play whom. I created a function with that goal, but it had a side effect that I realized was even more important. It works by minimizing the variation in disparity among games in any particular round.

In other words, in that 18 person tournament, in the first round everyone will play someone halfway down/up the tree. In the second round, there will be 9 winners and 9 losers. In eight of those games, USCF rules would have opponents playing one fourth of the way down/up the tree. However, the ninth game would have player 9 playing player 10. In other words, eight very easy to predict outcomes, and one very nearly even match, the most even that those players will play in the tournament. I would prefer a system where, in round 2, opponents are still widely separated, but that separation generally decreases as time goes on.
USCF would pair:
1vs 5
2 vs 6
3 vs 7
4 vs 8
9 vs 10
11 vs 15
12 vs 16
13 vs 17
14 vs 18

My program, instead, paired
1 vs 6
2 vs 7
3 vs 8
4 vs 9
5 vs 15
10 vs 14
11 vs 16
12 vs 17
13 vs 18.

Looks screwy, but what it does is eliminate that close match between 9 and 10, and puts a little extra distance between the other matches. It took someone out of the middle of the top scoregroup and someone out of the middle of the lower scoregroup, and paired them. Mathematically, it’s the best way to achive minimum variation in rank differences. There’s a little skew in the bottom scoregroup because the optimal pairing to minimize difficulty variation is 5 and 14, but those players played in round 1, so 14 and 15 get swapped. 9 and 10 will get back to each other if all goes as expected, but not until round 4 or 5, when most players are playing evenly matched games, unlike what would happen with standard pairings.
A lot of people would hate this because it doesn’t follow “the rules”. As Chess players, I suppose we like rules a lot. If you have been playing tournaments for 30 years using a particular method, this method must seem odd. However, rules exist for a reason, and I think my method is actually an improvement.
My program isn’t ready for use, even by me, at the moment. I’m just running test cases without a user interface. When it is, and if I am still satisfied by the results, I’ll make it available for test. Meanwhile, I would like some feedback. What are the characteristics of a “good” set of pairings? I want to see how I do compared to the traditional methods. Here are some things I want:

Tournament drama: You shouldn’t know the standings, especially the winner, until the last round.

Minimize advantages from “lucky” pairings, including color allocation advantages.

“Good” games, i.e. as few as possible obvious blowouts. (Might conflict with other requirements.) If there must be blowouts, put them at the front of the tournament instead of the back.

“Correct” ranks. In the absence of upsets, the final rankings should be close to rating order. No one should play meaningless games that won’t affect the final results.

Two things I don’t care about:

Follows USCF rules/FIDE rules/the traditional way of doing things.

Can be done without a computer. Nope. This is an inherently computationally intense process. If you let your battery die, you’ll have to finish manually, using something like USCF rules that are a bit more amenable to processing with index cards.

Thanks for the feedback so far, and thanks in advance for future input…

You could have just used the rulebook’s Harkness variation (which is available in WinTD and probably in SwissSys) and you would have had (barring color switches):
1 vs 6
2 vs 7
3 vs 8
4 vs 9
5 vs 10
11 vs 15
12 vs 16
13 vs 17
14 vs 18

. . . this round. But color allocation will influence what color you draw next round, perhaps against a player with whom you’re evenly matched. So anything that results in a large number of lopsided color allocations is going to be bad for the system overall.

I like it. It avoids the large 5-15 gap. On the other hand, it pushes 10 into the 0-2 bracket, which might result in unfortunate pairings later.
I’ll have to include this variation in comparisons. What I will be looking for is minimizing overall disparity between players. i.e., lots of close games and few blowouts. My system starts out with more blowouts, but ends with closer games. I haven’t done the stats to see what the overall effect will be.

Yes. I’ve thought of that. I’m toying with changing the way colors are allocated based on “color advantage” as opposed to just what color was played. In other words, if you play white against someone rated far above or below you, it doesn’t have much influence on whether you get white next round. That’s for the future, though. I’ll be evaluating the statistics on how various methods result in players getting advantages from being given white in games where they are playing against a closely rated player.

I have never understood why the rulebook endorses pairing down the middle player (Harkness variation) or pairing down the bottom player (main-line rule) while overlooking everything in between. What about pairing down a player who is somewhere between the middle and the bottom?

I also don’t understand why the mirror-image possibility is not mentioned – pairing up a player from the lower score group other than the top player.

The idea behind pairing the lowest player from group A against the highest from group B is simple. The two score groups usually overlap, rating-wise. Thus, the player from group A will have a higher-rated opponent (which he expects, since he is at the bottom of his group), while the player from group B will have a lower-rated opponent (which he also expects, being at the top of his group).

The Harkness variation is less likely to achieve this logical result. The middle player from group A is likely to be higher-rated than the top player from group B. This is OK for the group A player (who, since he is in the middle, can go either up or down), but not so great for the group B player (who, despite being at the top of his group, will now be paired against an opponent who is not only higher-rated, but higher-scoring as well).

If the traditional bottom-vs-top pairing would result in too large a rating spread for your taste, try something a little more moderate. Pair a player other than the lowest from group A against a player other than the highest from group B. Perhaps the second- or third-lowest from group A could be paired against the second- or third-highest from group B. Play it case by case, without rigidly following any specific rule such as “pair down the lowest” or “pair down the middle player”.

I should mention that the rulebook already allows such moderate transpositions, if done in order to improve colors (29D1b). To do so to improve perceived rating spreads is a enough of a stretch so that you should probably announce this possibility ahead of time, via a posting at the tournament site during registration.


And now for another monkey wrench.

The idea of maintaining a consistent rating spread (though not in the rulebook) is appealing, when both players have the same score. But it is a bit flawed when two score groups are involved. A score difference should be worth something, say 100 rating points for each half-point difference in the score.

If, for example, the rating spread tends to average about 300 points in the same-score pairings, then you might want to strive for a 400-point difference in any pairing where the lower-rated player has the higher score.

Bill Smythe

I was under the impression that the Swiss System tried to get rid of (multiple) perfect scores as quickly as possible. Pairing the lowest rated with a perfect score with the highest rated in the next score group seems to be the most effective way of doing that. I can’t speak to either the Harkness or the Lame systems.

Alex Relyea