Swiss Pairing Thought

A (ratings-based) Swiss actually has a tendency to make predictions which do worse as the tournament progresses. By the way it works, later rounds tend to have quite a few pairings between overrated players in the top half of a score group with underrated players in the bottom half.
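The top-half-vs-bottom-half mechanism can be shown with a tiny sketch. The names and ratings below are made up; assume all eight players have the same underlying strength (say 1500), so every rating's deviation from 1500 is pure noise. Sorting by rating and pairing the halves then matches each overrated player against an underrated one.

```python
def swiss_pair_group(group):
    """Pair one score group Swiss-style: sort by rating, then pair
    the top half against the bottom half in order.
    group: list of (name, rating) tuples (hypothetical data)."""
    by_rating = sorted(group, key=lambda p: p[1], reverse=True)
    half = len(by_rating) // 2
    return list(zip(by_rating[:half], by_rating[half:]))

# Eight equally strong players whose published ratings are pure noise
# around 1500: the overrated four sort into the top half, and each is
# paired against one of the underrated four.
players = [("A", 1580), ("B", 1560), ("C", 1540), ("D", 1520),
           ("E", 1480), ("F", 1460), ("G", 1440), ("H", 1420)]
pairings = swiss_pair_group(players)
```

Here every pairing is a nominal 100-point mismatch even though, by construction, the players are dead equal in strength.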

Practically, a rating is the most “objective” method of ranking that is not prone to tampering (it can be verified by a third party – the rating list). Going alpha by name is straightforward but some people think it’s inherently unfair somehow if they have a name starting with A or Z…and M & N are always playing each other :slight_smile: . Random pairings are OK…as long as you trust that the TD is indeed being random…which is more difficult as the tournament gets larger.

By now players have an expectation that TDs will follow the established pairing rules using the rating. It can take a few minutes to explain when they don’t understand a transposition or interchange they didn’t account for in their mental estimations.

So my third/final practical point on it is…using the rating is the easiest method at this point b/c it’s straightforward and requires little/no explanation to players. I would feel like jabbing my eye out with a dull pencil if I had to calculate tiebreaks after every round!

I would say that is the fault of the ratings system’s inaccuracy.

We’ve done this at least once: Ken Sloan showed slides at a Ratings Workshop several years ago showing the actual vs expected results curve for several subsets of the data.

I think we did one for players under 1200, another for players between 1200 and 2000 and one for 2000 or up, and I think we also did separate ones for regular-only games, dual-rated games and quick-rated-only games, though the third of these was somewhat sparsely populated.

What all the graphs showed was for each subset there’s a very good match between the expected and actual curves, with the lower rated players generally doing slightly better than predicted. And if we used post-event ratings rather than pre-event ratings, the expected vs actual curves were nearly identical, which makes sense to me, although as I recall Mark Glickman thought that graph was meaningless.
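For readers who want the "expected" curve being compared here: assuming the standard logistic expectancy with a 400-point scale (the usual Elo-style formula), a minimal sketch is:

```python
def expected_score(r_a, r_b):
    """Logistic expectancy: expected score for the player rated r_a
    against a player rated r_b, on a 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
```

So equal ratings predict an even score, and a 200-point edge predicts roughly 76% for the higher-rated player; an "actual vs expected" graph compares observed score percentages against this curve, bucketed by rating difference.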

One of the odder bits of data that came out of this was that we see slightly more games between players rated about 80-90 points apart than between players with closer ratings. I consider this an artifact of the ‘upper half vs lower half’ aspect of the Swiss System.

I completely understand the practical reasons. “That’s the way it has always been done” and “It is easy to explain” have powerful inertia.

However, if there’s no theoretical or practical justification for the ‘normal’ method, then it behooves us to question that method, without fear or prejudice.

I have two thoughts on this suggestion of randomizing the pairings within the score groups, one as a TD, the other as a tournament player. As a TD, I find the theoretical discussion interesting, fascinating, and intriguing in a hypothetical sort of way. I am sure that this could be done with pairing cards or programmed to function on a computer. It would break the monotony of watching the printer spitting out normal pairings. It would add surprise to pairings and be amusing for the TD. However, I don’t think I would like to have to explain my pairings every round to players. Which pairing program I used, the logic behind one particular pairing over another, whether these pairings are within the parameters of the rules, and whether I was forcing pairings to prevent someone from winning a prize are all questions I would expect to field each round. This would slow down the running of the tournament. If I were impatient and abrasive to the players, then I as the TD could shrug, post the pairings and go to my tournament room to plot more devilish ways to befuddle the players.

As a tournament player, I would hate it. Between rounds, players usually go over to the crosstables to determine who their next opponent will likely be. This gives you a chance to think about your openings, maybe skim notes, or check out some things on your computer before the round. This ability to predict an opponent is important to players. Randomizing might seem fun for the TD, but tournaments are not for the TD, they are for the players. I think players would get very suspicious that there was some bias or other hanky panky afoot. Some of the reasons the Swiss System became popular were that it regularized the process of pairings and made them more transparent for the players. There had often been disputes about whether TDs were making favorable pairings for friends or locals to guarantee that they would win a prize. Monkey around with the pairings, and you end up with some angry players on your hands and smaller attendances in the future.

What would be interesting to me is a table that has the following…

Rows: Difference in rating between players (in 50 point intervals; 0-49, 50-99, 100-149, etc.)
Cols: Average rating of two players in a game (in 200 point intervals; 100-299, 300-499, 500-699, etc.)
Two entries per cell of this matrix: (1) percentage of games won by higher rated player, (2) percentage of draws

Can you produce such a table for me? :slight_smile:

I remember that several years ago there was a graph showing that when the average rating of the players was close to 100, there was a HUGE percentage of draws (which can be explained by their being unable to checkmate each other)! The percentage of draws dropped to its lowest point when the players were rated about 1300, then went back up as the average ratings increased into the master range. It was a really interesting result.

For the record, I am totally not in favor of this idea. Your reasoning why this is a silly idea is spot on.

You could say that. You would be wrong. It may come as a great shock to you, but no one’s “true” rating is stamped on their forehead. People play chess; from the results, we infer their ratings. It’s actually good that a Swiss tends to locate under- and over-rated players, as it will give them quicker paths to more appropriate ratings.

If one is over-rated or under-rated, then by definition the rating system is inaccurate for that individual.

I suspect using ‘average rating’ makes no sense in your table. A game between a 200 player and a 600 player would show up in the 400 group, where it clearly does not belong, since neither player is within 200 points of that number.

That’s why we prefer such requests come from the Ratings Committee.

Again, incorrect. The rating is inaccurate, which is why it’s adjusted by the rating system when the results show that to be necessary.


Rows: Difference in rating between players (in 50 point intervals; 0-49, 50-99, 100-149, etc.)
Cols: Average rating of two players in a game (in 200 point intervals; 100-299, 300-499, 500-699, etc.)
Two entries per cell of this matrix: (1) percentage of games won by higher rated player, (2) percentage of draws

I’m not sure you understood my request. For any particular game, there are TWO variables of interest.

  1. The average rating of the two players
  2. The difference of the rating of the two players.

So, for a 200 versus a 600, it would fall in the (1) average rating 300-499 category, and (2) the “difference” 400-449 category.

The measurements of interest are the percentage of wins by higher rated player, and percentage of draws.

There would be some very sparse entries in this matrix, especially at the larger “difference” categories.
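The two-way bucketing described above could be tallied along these lines. This is only a sketch: the input format (a list of `(rating_a, rating_b, score_a)` tuples) and the function name are hypothetical, and equal-rated games are arbitrarily credited as if player A were the higher-rated one.

```python
from collections import defaultdict

def tally(games):
    """games: iterable of (rating_a, rating_b, score_a), where score_a is
    1.0 / 0.5 / 0.0 from player A's side (hypothetical data format).
    Returns {(diff_bucket, avg_bucket): (higher_win_pct, draw_pct)}."""
    cells = defaultdict(lambda: [0, 0, 0])  # [games, higher-rated wins, draws]
    for ra, rb, score_a in games:
        diff_bucket = (abs(ra - rb) // 50) * 50                    # 0-49, 50-99, ...
        avg_bucket = (((ra + rb) // 2 - 100) // 200) * 200 + 100   # 100-299, 300-499, ...
        cell = cells[(diff_bucket, avg_bucket)]
        cell[0] += 1
        if score_a == 0.5:
            cell[2] += 1                       # draw
        elif (score_a == 1.0) == (ra >= rb):
            cell[1] += 1                       # the higher-rated player won
    return {k: (wins / n, draws / n) for k, (n, wins, draws) in cells.items()}
```

Per the 200-vs-600 example above, such a game lands in the key `(400, 300)`: difference bucket 400-449, average bucket 300-499.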

P.S. I am on the Ratings Committee.

And my point is that I don’t think the ‘average rating of the players’ is a useful measure, certainly nothing we’ve EVER used in the past!

But since you’re on the committee, you can send in a request through Mark for research data, I’m sure Mark will clean up any problems with it.

Done! I’ve made the request of Mark.

And there can be multiple reasons for the inaccuracy.

  1. A rising player has not yet had the rating catch up to the increasing strength
  2. A player is having a good or bad day and playing differently from normal for that player (sick, tired, confident, whatever)
  3. The time control is better or worse for the player than the rating would indicate
  4. etc.

In addition to players who have good days and bad days, there are players who have good opponents (ones they routinely do well against regardless of the ratings difference) and bad opponents.

I have sometimes suspected that with younger players, their playing strength in any given game could be very highly correlated with how recently they ate, especially something like a candy bar, though I would not be surprised if a sugar rush actually resulted in lower performance.

Perhaps this could be tested next month in Nashville. But I suspect that any player assigned to the control group will be unhappy, almost regardless of their result.

Good luck getting an informed consent form from the parents of the players!

AND, even if

  1. All players have a static underlying playing strength and
  2. The predicted probabilities of the logistic are correct and
  3. Results of games within a tournament are independent of each other

that is, even if the underlying assumptions governing the Elo rating system are mathematically “perfect”, so that the only deviation in a rating from the “truth” is random, it is still the case that the ratings-based Swiss, by its nature, tends to produce matchups between overrated and underrated players in the late rounds.