Best Tournament Pairing Systems

As an aside, ratings-based SS tournaments in fact try to maximize the rating differences between players. People of similar ratings will probably not meet in a SS tournament until the last round or two of the tournament. This is by design. This means that half the games, at least, in a SS tournament are, as you say, a waste of everyone’s time. Sometimes narrow rating ranges for sections may mitigate this, but this only brings in different problems.

A ratings-based Swiss doesn’t try to maximize the difference in players’ ratings; it is, rather, trying to ensure that players who have the same score in the tournament play each other as often as possible, while not eliminating any players. This can be seen simply by reviewing the pairing rules, where score has primary importance ahead of all other factors.
The Swiss system was created to find a way to handle a chess competition when field size makes it impossible to have a round-robin.

I cannot understand how this poster, and the previously quoted poster, just casually dismiss early-round games in an open Swiss. Sure, the higher-rated player is the favorite. But I’ve been on both sides of enough early-round upset wins and draws to know that the games aren’t irrelevant. Besides, it makes a Swiss tournament much more interesting to watch when the top players are seeded to play late. It’s much better than if, say, you pair the top two players in round 1 - a possibility if you don’t initially seed the players by some performance-based criteria.

Class or rating-restricted sections obviously make the potential rating differences between opponents much lower. There are not, as far as I am aware, any “problems” with this. It’s just a different type of tournament than an “open” event.

This year’s Tata Steel had several amateur events where the field was <1600 FIDE. (I accept your general point; just saying that it does happen.)

Also, since FIDE’s floor for rating publication is now 1200, more US players are obtaining published FIDE ratings just by playing in FIDE-rated sections of multi-section events. I see this as a good thing; not everyone agrees.

A ratings-based Swiss is still a Swiss tournament, which means that the main pairing criteria are that players should meet other players with the same score (irrespective of rating) and that there shouldn’t be any rematches. However, what “ratings-based” means (as opposed to a non-ratings-based Swiss) is that within the “Swiss” constraints, players’ rating differences are maximized. “Top-half versus bottom-half” is all about maximizing the overall rating differences of the games within a score group, so as to increase the probability that the highest rated players will win, and the lowest rated players will lose, delaying the meeting of the highest rated players (and indeed all similarly-rated players) until the last round or rounds. As you say later, this may make the final rounds more interesting for the spectators (and maybe even for the players), but what it doesn’t do is provide a lot of competitive games early in the tournament. You have to wait until the end for the interesting and meaningful games.

The problems I have in mind include (1) sandbagging; (2) the fact that the section boundaries are rather arbitrary and artificial, with excessive significance given to small rating differences. If the section boundary is at 1900, it means that the 1850 player does not meet the 1950 player. What is the meaning of being the best “Under 1800 player” this weekend, going up a few rating points over 1800, then being merely an also-ran “Under 2000 player” at the next tournament? Apart from players interested in the lottery aspect, what most players want, I think, are the maximum number of opportunities for competitive, interesting games, which allow them to develop their skills, reflected in increasing ratings. Doing that is not a goal for the SS tournament.

You’ll have to go a far piece to convince me that it does this better than the “NCAA Tournament” method of pairing 1 vs. n, 2 vs. n-1, etc.

Alex Relyea

(Bolding in above quote added by me.)

The statements in bold are simply not universally true, especially given the multi-section construction of most Swiss events today. In fact, almost the only large tournament in the US where that would be (more or less) true is the US Open, where there is only one section.

In a typical multi-section event, the Open section is fairly treacherous for all but the strongest players, even in the first few rounds. This is even more true when you consider that a lot of fairly strong juniors populate the lower half of Open section crosstables. And, as you move down into lower sections, initial seeding becomes irrelevant, because the top of the section is about 1.5 STDEV away from the bottom.

The rough equating of “interesting and meaningful” with “higher rated” is more likely to be true in the Open (or top) section of smaller regional or local events, especially if there is one player who is clearly a cut above the rest of the field. However, even then, the statement doesn’t always hold water. Exhibit A: the 2012 Cardinal Open.

Problem 1 is essentially eliminated if (a) the prizes offered are low, or (b) the tournament offers significant protection against sandbagging.

Local events don’t offer enough incentive to make sandbagging worthwhile. (a), therefore, is covered.

As for (b), CCA runs the large majority of lower-section events where sandbaggers come to make their “score”. However, there are so many safeguards in place against CCA sandbaggers that it isn’t worth it to try stalling in local events. Examples of those safeguards include the CCA minimum rating list (which takes precedence over USCF rating list for entry into CCA events), prize limits for provisional players, section restrictions for unrated players, extremely low prizes in unrated sections, severe prize limits for players who have been more than 30 points above their section limit at any point in the last year, and extensive efforts before/during an event to find complete rating information on any unknown player - especially foreign players.

(I note here that a player who simply does not play in any tournaments for an extended period, opting to study and improve, should not be confused with a true sandbagger. There is nothing unethical about taking time off to work on your game.)

Finally, there is a HUGE risk in sandbagging, which is obvious when one considers the composition of a typical class section at a big tournament. For ease of analysis, let’s take the World Open U2000 section as an example. In a 9 round event, playing players who are all in your rating class, you are expected to score 4.5/9. To score significant money (defined here as $1000 or more), you need to score 7/9, at a minimum (probably more in the biggest class sections, like U2000). And, to win the serious cash, you typically need 8/9.

So, to score 8/9 in that section, you need a performance that is about 2300 strength. If you are rated 1900, that means 400 points above your rating - or more than 3 STDEV above. (And that doesn’t even factor in the reduced K that players over 2100 experience. It is definitely easier to go from 2000 to 2100 than it is to go from 2100 to 2200.)
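For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes the standard logistic Elo expectation curve with a 400-point scale (the official rating formulas differ in their details), and the 1900 average opponent rating is a made-up figure for illustration. The function names are hypothetical.

```python
import math

def expected_score(rating_diff):
    """Expected score per game for a player rated rating_diff points
    above the opponent (logistic Elo curve, 400-point scale: an assumption)."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))

def performance_rating(avg_opponent, points, games):
    """Rating at which the expected score against avg_opponent
    equals points/games. Undefined for 0% or 100% scores."""
    fraction = points / games
    if not 0 < fraction < 1:
        raise ValueError("performance is undefined for perfect or zero scores")
    return avg_opponent + 400.0 * math.log10(fraction / (1.0 - fraction))

# Playing opponents of your own strength, you expect half the points.
print(9 * expected_score(0))                   # 4.5 out of 9

# Hypothetical field averaging 1900: what does 8/9 take?
print(round(performance_rating(1900, 8, 9)))   # 2261
```

Under these assumptions 8/9 works out to roughly 360 points above the average opponent, so a field averaging in the mid-1900s puts the required performance at about 2300.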

Moreover, you have to do all this with virtually no margin for error. One loss, and you’re basically done. If you are really sandbagging, you’re hoping all the while that none of the TDs or players recognize you. We do check every single sandbagger report that comes in. Of course, we also start checking for prize restrictions in every section during round 6 or 7, and Bill Goichberg gets a full list of every player who could win a large prize but falls under one of those restrictions. More than one player has come up expecting a large check, only to find their payout drastically reduced.

NOTE: It may be that the use of lifetime titles to determine section eligibility might be the best way to avoid sandbagging. Since lifetime titles only deal with positive results, they force a player to play in the correct section, and can’t be manipulated downward. Of course, this may discourage entries from players whose strength has legitimately dropped from their peak historical performance. YMMV.

As for problem (2), my experience is that most players want competitive games - but also want a real chance to win. Class or rating sections offer this, where a single-section tournament realistically doesn’t. I certainly appreciate and sympathize with the “purist argument”…but in anything outside of the smallest local events, you will lose entries if you don’t offer ratings- or class-based sections. (Note that, even in small single-section tournaments, there are usually class prizes.)

Also, in a tournament whose sections are defined by rating, you can “play up” if you want. So, if that 1850 player wants to play up one (or more) sections, he is welcome to do so. In many such tournaments, it costs more for such players to play up into the top section; this is especially true if those sections are FIDE rated or norm-eligible, and keeping very low-rated players out is a good thing for both FIDE ratings and norm eligibility.

I think most people understand the plusses and minuses of ratings-based Swiss tournaments, but that wasn’t the point I was making. (FWIW, at the last tourney I hosted, I even had two 1000+ point upsets, and 7 500+ point upsets. So, I understand that no game is truly a giveaway. Nevertheless, that was a very, very weird day.)

The point is that ratings just aren’t very important to Blitz Chess the way they are to regular Chess. Suppose you have six hours available for a tournament, and you like G/60 Chess, and 100 players show up. If there are no ratings, that is going to be one awful day of Chess. With players of mixed abilities all jumbled together, players will be lucky to get 1 or 2 games of decent, competitive Chess. On the other hand, with ratings you can play a ratings-based quad, and most players get 2 or 3 games of decent, competitive Chess.

With Blitz, though, you could play 24 rounds in that six hour period, or 12 rounds in 3 hours. Who needs ratings?

Well, I suppose that isn’t how most people think about ratings. They think about them as a reward for a job well done, or a way to make people other than the top players eligible for cash prizes, or something. I, personally, don’t think much of those other alternative uses for ratings, but most people who are in USCF do think about ratings that way. Therefore, if you like that sort of thing, Blitz should have its own rating system, because it is very, very different from G/30, or even G/15.

The NCAA system is also “top half versus bottom half” (THVBH). The order of the bottom half (or the top half) is simply reversed. But the average “rating difference” would be the same as in the version of THVBH that we use in chess tournaments. To see why, consider that in the chess form of THVBH, the average rating difference is (T1-B1 + T2-B2 + … + TN-BN)/N, where T1 is the rating of the first player in the top half, B1 is the rating of the first player in the bottom half, etc. In the NCAA form of THVBH the average “rating difference” is (T1-BN + T2-B(N-1) + … + TN-B1)/N. These two expressions are the same, differing only by a reordering of the terms in the sum in the numerator: both equal (T1 + … + TN - B1 - … - BN)/N. By the commutative law of addition that reordering is unimportant.

While the average rating differences of the two systems are the same, the distribution is not. The NCAA system gives the very top players even easier opponents than the chess form of THVBH does, while making the games further down the pairing list more even. So the NCAA system doesn’t produce any more slaughter overall: there are more extreme differences, to the benefit of the top players, but the overall average rating difference is the same.
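To make the comparison concrete, here is a small sketch of the two folds applied to one score group. The eight ratings are invented for illustration, the function names are hypothetical, and the code assumes an even number of players.

```python
def chess_fold(ratings):
    """Chess THVBH: top of upper half vs top of lower half, and so on."""
    ordered = sorted(ratings, reverse=True)
    half = len(ordered) // 2
    return list(zip(ordered[:half], ordered[half:]))

def ncaa_fold(ratings):
    """NCAA THVBH: same halves, but the bottom half is reversed (1 vs n)."""
    ordered = sorted(ratings, reverse=True)
    half = len(ordered) // 2
    return list(zip(ordered[:half], reversed(ordered[half:])))

def rating_gaps(pairings):
    """Rating difference on each board, top board first."""
    return [high - low for high, low in pairings]

field = [2400, 2200, 2000, 1900, 1800, 1600, 1400, 1200]  # made-up ratings

chess_gaps = rating_gaps(chess_fold(field))
ncaa_gaps = rating_gaps(ncaa_fold(field))

print(chess_gaps, sum(chess_gaps) / len(chess_gaps))  # [600, 600, 600, 700] 625.0
print(ncaa_gaps, sum(ncaa_gaps) / len(ncaa_gaps))     # [1200, 800, 400, 100] 625.0
```

The averages match, as the reordering argument says they must, but the NCAA fold spreads the gaps from 1200 points on board one down to 100 points on the last board, while the chess fold keeps every board at about the same gap.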

I am not sure whether there might not be a greater average rating difference sometimes available than that produced by some form of THVBH, but it would surely be more complicated to find it.

Has anyone tried running events using this pairing method? (It is a major rules variant, so it would need to be advertised in all pre-event publicity, and there might be more than the usual number of questions and complaints about pairings. It would also probably not be FIDE ratable.)

How would unrated players figure into this? Normally in the first round the unrateds are paired against the middle of the pack; this would put the unrateds on the top boards in round 1.

FIDE had its world championship in a knockout format for a few years. I don’t see why you wouldn’t be able to play a single- or double-elimination tournament, even over a weekend. I also don’t see why it wouldn’t be FIDE ratable, as long as the time control was acceptable. The only thing you’d have to do is come up with some tiebreaking method in case one match/game ended in a draw. The solution would probably involve some sort of Armageddon deal.

Unrated players would probably go at the bottom of the pack, as with normal Swiss seedings. In theory, they are the weakest opponents, so they would be led off to face the expected early slaughter, much like the #16 seeds in the NCAA basketball tournaments.

The USCF had a knockout format for the US Championships in 1990, with the US Championships rounds during the day and the US Open rounds in the evening.

If it had been successful and/or well received, wouldn’t we still be using it?

Knockout formats in ‘ordinary’ USCF rated events have never been popular, either, and that’s probably not what was being suggested for USCF Swiss events instead of ‘top of the upper half versus top of the lower half’.

Well, the NCAA method would have one definite advantage, at least. It would avoid the discontinuity that normally occurs at the midpoint of the scoregroup.

Under the conventional system, if the midpoint occurs at, say, 1750, then the 1749 plays a master, while the 1751 plays a foregone conclusion.

And if the TD or pairing program transposes (actually interchanges) these two players in order to improve colors, the 1751 may squawk that he is in the top half and is “entitled” to play a lower-rated opponent. Baloney, of course, but that doesn’t stop the squawking.

I have often argued for the elimination of Stupid Discontinuities in other areas of chess (as in life, as Mike Ditka would say). This particular discontinuity may not be Stupid, but elimination of it still makes a small point in favor of the NCAA method.

Bill Smythe

If the goal is to “eliminate” players and to increase the probability that the last round features match-ups between the highest-rated players, with perfect scores, an argument in favor of the usual method of doing THVBH pairings is that it economizes on the available rating differences. A pairing between the GM at the top and the 1200 player at the bottom is overkill. For the goals of ratings-based Swiss to be satisfied, the GM at the top does not need a 99.99% chance of winning. 99% against somebody near the middle suffices. So the traditional method makes more efficient and economic use of the available first round weak players, and protects more top half players from upsets than just the very top players.

This is not my argument, mind you. (I don’t like THVBH pairing, in case this isn’t obvious). But it’s an argument.

What sort of pairing do you like?

(Asked in a genuine effort to provoke discussion of new alternatives, not as an attempt to be snarky.)

Bill Smythe

Well, I happen to know that he very much likes his wife.

How did a discussion of a blitz rating system turn into one about pairing systems?

I don’t know. The Swiss System is very unsatisfactory, but it is not easy to find a reasonable alternative that satisfies all the different goals which tournament players have.

In the Carlisle Club, we have been experimenting with the Australian Draw system, which includes 1 versus 2 pairing, and in the final rounds, allows rematches (the so-called “King of the Hill” mode).

I thought Greg Shahade’s arguments had merit. He argued in favor of abandoning the ratings-based SS and just using random pairing within score groups, with color adjustments. His argument was on the grounds of fairness. Everybody is paying the same entry fee. In a tournament everybody should be treated the same, or if people must be treated differently then everybody should have an equal chance at receiving the better (or worse) treatment. The ratings-based SS is not blind to who the players are, and is deliberately biased towards smoothing the path of the highest rated players at the beginning, so that they arrive in the final rounds with perfect scores and meet each other. That results in the big last round finish, in theory. But when those players are likely to just agree to a last round GM draw and split up the money, is that really worth the unfairness at the beginning? This just makes most players want to play in sections, where they aren’t cannon fodder for the highest rated players, and have some chance of winning prizes. However the section boundaries are artificial.

I’m afraid I’m largely responsible.

I noted that in my opinion the only really good reason for ratings to exist in the first place was to facilitate reasonable pairings, but that wasn’t necessary for blitz tournaments.

Everyone took off running with a discussion of what constitutes reasonable pairings, instead of the part about how they weren’t necessary for blitz tournaments.

I like the NCAA method in principle. As someone else noted, one really nice feature of it is you don’t get a giant discontinuity of opponent rating halfway through the entrant list. (In an 8-person tournament, the opponents of seeds 1-8 would be 8, 7, 6, 5, 4, 3, 2, 1 rather than 5, 6, 7, 8, 1, 2, 3, 4. As someone who usually plays either the top or bottom seed in the first round of my local tournaments, I’m particularly affected by this.)

It may be suited better to a knockout structure (where players are eliminated and only one player is left standing) than a Swiss, though. I can think of a few aspects that are perhaps non-ideal about it when you use it to run a Swiss:

  1. The players are getting pretty radically different pairing types from each other in the first round. Instead of everyone getting a mismatch, we range all the way from biggest possible mismatch to adjacent seeds.

  2. Players’ pairings will still jump around in a weird way, just at different times in the tournament. In that 8-person tournament, the winner of the 4/5 pairing will probably play seed 1 while the loser plays seed 8.

  3. The bottom seeds are going to get beat on even more than in a regular Swiss. (Assuming the higher-rated player always wins, in that 8-person tournament the #8 seed will play seeds 1, 5, and 7 rather than 4, 6, and 7. In a 16-person tournament, the #16 seed will play seeds 1, 9, 13, and 15 rather than 8, 12, 14, and 15.)
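The opponent lists in points 2 and 3 can be reproduced with a short simulation. This is only a sketch under loud assumptions: the higher seed always wins, there are no draws, and rematches, colors, and byes are ignored (an odd score group just floats its lowest seed down). All function names are hypothetical.

```python
from collections import defaultdict

def fold_chess(group):
    """Top half vs bottom half in order: 1 v N+1, 2 v N+2, ..."""
    half = len(group) // 2
    return list(zip(group[:half], group[half:]))

def fold_ncaa(group):
    """Top half vs reversed bottom half: 1 v 2N, 2 v 2N-1, ..."""
    half = len(group) // 2
    return list(zip(group[:half], reversed(group[half:])))

def opponents_faced(n_players, n_rounds, fold):
    """Return {seed: [opponent seeds in round order]} for a toy Swiss
    in which the higher seed (lower number) wins every game."""
    scores = {seed: 0 for seed in range(1, n_players + 1)}
    faced = defaultdict(list)
    for _ in range(n_rounds):
        by_score = defaultdict(list)
        for seed, score in scores.items():
            by_score[score].append(seed)
        floater = []
        for score in sorted(by_score, reverse=True):
            group = sorted(floater + by_score[score])
            # Odd group: float the lowest seed down (a simplification).
            floater = [group.pop()] if len(group) % 2 else []
            for high, low in fold(group):
                faced[high].append(low)
                faced[low].append(high)
                scores[high] += 1  # assumption: the higher seed always wins
    return dict(faced)

print(opponents_faced(8, 3, fold_ncaa)[8])     # [1, 5, 7]
print(opponents_faced(8, 3, fold_chess)[8])    # [4, 6, 7]
print(opponents_faced(16, 4, fold_ncaa)[16])   # [1, 9, 13, 15]
print(opponents_faced(16, 4, fold_chess)[16])  # [8, 12, 14, 15]
```

Looking up seeds 4 and 5 in the 8-player NCAA run shows the jump described in point 2: seed 4 meets 5 and then 1, while seed 5 meets 4 and then 8.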

As long as we’re talking about the difference in pairing systems, I have tried accelerated pairings in a small tournament with the idea of getting closer round one matchups and avoiding making it a blow-out round. Because of the broadening of the center that causes, I found that after four or five rounds the players not at the top or bottom edges actually averaged larger mismatches than with non-accelerated pairings.

So I feel that as long as you are pairing within a scoregroup, you are better off using normal Swiss pairings as opposed to other non-round-robin pairing systems. At least, barring a need to accelerate to have a better chance to reduce the field to at most a single perfect score.