Team Tournament Pairings

For team tournament pairings, the major pairing programs currently just take the average rating of each team and use that to pair the teams like a normal individual Swiss. However, a team with a higher rating average might still have lower-rated players on the majority of the boards and, for example, be expected to lose 4-1 to another team with a lower rating average. What do you think about factoring this into the pairings?

I’m not sure I understand the question. Are you saying that the pairings should be done in such a way that the team with the higher average rating is paired against a lower-rated team (with the same match score) only if a majority of the higher-rated team’s boards are also higher rated than the corresponding boards on the lower-rated team?

I don’t know about SwissSys, but WinTD has the option of using an entered rating rather than an average rating, and it also has the option of using local ratings (if they exist) instead of USCF ratings. So a TD could assign a lower local rating to reflect anticipated performance (a team with a GM parent and young kids having ratings of 2700, 1100, 900, 300 could be assigned a rating of 1000 instead of 1250). However, seeing how much the lower ratings reduce the average, I’m not sure how often such a TD-instituted reduction would plausibly be needed.

This proposal looks to modify Rule 31E in some way, as WinTD and SwissSys both derive their pairing algorithms directly from USCF rules. Ergo, changing the pairing algorithms would first require a rules change, which means a proposal would have to be made to the Rules Committee. This means that said proposal will need to be quite detailed, including an argument as to why this should be considered in the first place. My initial thought is that pairing teams by anything other than average team rating is just begging for trouble.

I don’t think SwissSys has that option if you’re doing a fixed-roster team event (at least, I haven’t seen it). More to the point, though, I can’t imagine a reasonable scenario in which a director might assign a team a lower rating than its actual average. I also don’t think there’s anything in Rule 31 that allows such an adjustment. Rule 31C1 is the only variant in that section that mentions anything about adjusting ratings, and it only has to do with an alternate method for factoring an unrated player into his team’s average rating.

Sure, a TD can use his judgment - but woe be unto the director who drops a team like the hypothetical one above into a U1200 prize category and then has other U1200 teams demand justification.

The only times I’ve done it is in IHSA (not USCF) team events with only overall prizes where some teams are forfeiting boards. A simple average does not take into account the impact of the forfeited boards. The U1200 issue doesn’t arise because there are no U## sections or prizes.

An example of what he is saying:

Consider if these two teams met:

Team A:
1: 2800
2: 1000
3: 1000
4: 1000
Avg: 1450

Team B:
1: 2850
2: 1400
3: 1400
4: 100
Avg: 1438

Based on average rating, the likely result would be a draw between these two teams. But if we calculated winning expectancy board by board, we would solidly expect team B to win the match by a score of 2.5 - 1.5. In effect, using the average rating “smushes” expectancy scores across the boards, while doing it board by board keeps clear that each board is a discrete unit.

How one would take this into account doesn’t easily occur to me.
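To make the board-by-board arithmetic concrete, here is a minimal sketch, assuming the plain Elo logistic expectancy curve (the USCF’s actual win-expectancy table is similar but not identical); the function name is just illustrative:

```python
# A minimal sketch of the board-by-board calculation above, assuming the
# plain Elo logistic expectancy curve (the USCF's published win-expectancy
# table is similar but not identical).

def win_expectancy(rating_a: float, rating_b: float) -> float:
    """Expected score for a player rated rating_a against one rated rating_b."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

team_a = [2800, 1000, 1000, 1000]   # Avg 1450
team_b = [2850, 1400, 1400, 100]    # Avg 1437.5

# Pairing by average rating: essentially a dead-even match.
avg_a = sum(team_a) / len(team_a)
avg_b = sum(team_b) / len(team_b)
print(f"Expectancy from team averages: {win_expectancy(avg_a, avg_b):.2f}")

# Board by board: Team B is a clear favorite.
per_board = [win_expectancy(a, b) for a, b in zip(team_a, team_b)]
score_a = sum(per_board)
print("Team A per-board expectancies:", [f"{e:.2f}" for e in per_board])
print(f"Expected match score: Team A {score_a:.1f} - Team B {4 - score_a:.1f}")
```

Under that curve, the averages say roughly 52% for Team A, while the board-by-board sum gives Team A only about 1.6 game points, in line with the 2.5-1.5 estimate above.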

That makes sense. But then again, I wasn’t thinking of an event like the IHSA team championship, because it’s not USCF rated. :slight_smile:

(Side note: in the 1994 version of the event, there were approximately 100 teams entered. That year, I was helping to coach my alma mater, East St. Louis Senior HS. We were seeded eighth going in. Belleville East HS, which plays in the same conference as ESLHS, was seeded seventh. Both teams were 2-1 after three rounds, and they were paired against each other in round 4. Oddly enough, no one could provide me with anything resembling a logical justification for this pairing. After that, I officially gave up on IHSA. If you’re directing the tournament now, I have renewed hope.)

I’m not involved with the IHSA state championship, but rather two to four other IHSA tournaments that many schools use to qualify for the state championship.
However, it looks like the pairings for the most recent state tournaments have been done decently based on the seedings that were assigned. I’ve never been involved with the seeding process, so I don’t know how that is done.

I actually asked the late Ola Bundy about it at the '94 tournament. I couldn’t understand how my team could be seeded 18th in the state the previous year, with me (~1900) on first board, and then be seeded 8th in the state the next year, with me having graduated and the remaining team being rated ~1300 or less.

IIRC, the procedure she laid out was something like this…

  1. Teams were organized into seven-team “pods”, based on the past year’s conference records, etc.
  2. Each pod was then ranked.
  3. The teams in each pod were then randomly ordered from 1-7.
  4. The teams were finally placed in their wallchart order, based on their pod ranking and random intra-pod sort (so pod 1 had teams 1-7, pod 2 had teams 8-14, etc.).

The details of the explanation left my head reeling. I wanted to debate it with her, but decided not to, as the Ron Burgundy Rule usually applied to interacting with her. I had to retreat to the Baskin-Robbins at NIU for emergency therapy. I’ve managed to block most of it out, thanks to intensive counseling.

Have directed dozens of team tournaments and a chess league for quite a few years. In each case team average was used for pairing purposes. It is possible to adjust the initial team averages before the event if the lowest board is significantly lower than the others to account for board stacking. With an unrated player on board 4, that rating can be treated arbitrarily as 400 points, or whatever figure you choose beforehand, lower than the board above. Once again, this takes into account board stacking and evens up the competition. The greatest difficulty is to assign ratings to teams with two or more unrated players.

It is my understanding the USAT rules were changed at one time to do the above, to lessen the chance of having a team with 3 GMs and a player rated under 1000 still fit under the U2200 cap. I’m not sure if this rule is true for all regions. In any event, unless there is a U2200 cutoff for all teams, I would treat the teams as having the average of their top 4 and continue normal pairings. Teams with a high-rated first board and three low-rated boards suffer, of course, but they could have set themselves up differently. Teams like this are usually there more for fun than for trying to win a prize.

In most of the team events I have done, the teams with a consistent rating across all four boards and the teams with the best internal chemistry had the most success. Trying to do pairings to account for the internals during the event is messy, causes technical problems and questions of TD bias, and creates odd pairing scenarios. I have tested this and found that one would have to break pairing rules and come up with an intricate and hard-to-follow set of criteria for pairings. There were too many inconsistencies evident from using the actual ratings rather than just the team average.

Then there is team C that is 1425 across the board, expected to beat team A 3-1 and to be a slight favorite against team B (maybe averaging an edge of something like 2.07-1.93).

The following might work:

Define a team’s Caveman Average Rating (CAR) as that rating which would give that team a 50% winning expectancy against a team consisting of four identically-rated players with that rating.

For example, Team Ninja with players rated 2001, 2000, 1999, 400 would have a CAR of about 1870, because Team Ninja would have a winning expectancy, board by board, of about 66%, 66%, 66%, 1% (averaging 50%) against Team Pure rated 1870, 1870, 1870, 1870.

By contrast, the (linear) team average rating for Team Ninja would be only 1600, which doesn’t “feel” right at all.

By averaging the expectancies rather than the ratings, the CAR does a better job of discounting ratings far from the average, which tend to distort the picture a great deal.

In practice, it may be difficult to calculate the CAR, at least with a four-function calculator. One would have to guess the CAR, run it through the rating system, and see how close the result comes to 50%. If too low, guess higher and try again. If too high, guess lower and try again. After a few iterations, you might be able to nail it, at least within 5 rating points or so.

Bill Smythe
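For anyone who wants to skip the four-function calculator, here is a sketch of that guess-and-check loop written as a bisection search, again assuming the plain Elo logistic curve; the names caveman_average_rating and win_expectancy are illustrative only, not anything WinTD or SwissSys provides:

```python
# A sketch of the guess-and-check procedure above, automated as a bisection
# search. It assumes the plain Elo logistic expectancy curve; the function
# names are illustrative only and are not features of WinTD or SwissSys.

def win_expectancy(rating_a: float, rating_b: float) -> float:
    """Expected score for a player rated rating_a against one rated rating_b."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def caveman_average_rating(team, tolerance=1.0):
    """Rating R such that the team averages a 50% expectancy against a
    team of identically R-rated players."""
    low, high = min(team), max(team)     # the CAR always lies between these
    while high - low > tolerance:
        guess = (low + high) / 2.0
        avg = sum(win_expectancy(p, guess) for p in team) / len(team)
        if avg > 0.5:
            low = guess    # expectancy too high -> the guess was too low
        else:
            high = guess   # expectancy too low -> the guess was too high
    return (low + high) / 2.0

team_ninja = [2001, 2000, 1999, 400]
print(f"Linear average: {sum(team_ninja) / len(team_ninja):.0f}")   # 1600
print(f"CAR:            {caveman_average_rating(team_ninja):.0f}")
```

Under that curve, Team Ninja’s CAR comes out around 1880, close to Bill’s estimate of 1870 and far above the linear average of 1600.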

It’s similar now, but the top two groups (16 teams this year because of the number of entrants) are seeded directly as 1-16, while 17-24, 25-32, etc. are randomized within groups. The seedings at the top specifically, and (with a few exceptions) all the way down, are generally fairly accurate now. At the 2012 event, the 4-0s after Friday were seeds 1 to 7 and 9. What happened to 8? It was Harkness pairings, and 8 played 1 in round 4, with 9 dropping because of color.

Is it possible that, in the case you mentioned, they were re-randomizing within groups of 7 each round (for pairing purposes)? 7 and 8 would be in different seed packs, and so could be 1 and 14 in one particular round.

RE: factoring recalculated average team ratings based on win expectancy into pairings…

I believe there are serious practical problems with implementing anything like this for the purpose of making pairings. (Remember, that was the original idea.)

First, even if you figure a team’s “effective” average rating against another team, taking win expectancy on a board-by-board basis into account, you can’t use that “effective” average rating to make pairings, because it isn’t calculated until a pairing is made.

Second, I imagine the win expectancy percentages have changed over time, because the standard deviation (STDEV) of the rating system has changed. If one wanted to do this accurately, therefore, the win expectancy calculations would have to be adjusted on a regular basis along with the STDEV. (This assumes that you’d be able to get a reasonable “effective” average rating to make pairings with in the first place, which is a pretty big assumption.)

Third, even if you can overcome the first two problems, you would be looking at regular adjustments to the pairing algorithms.

It is possible that they were being re-randomized each round - though that almost seems to defeat the purpose of wallchart order. Why not just throw darts to pair each round? :laughing: My main point to Mrs. Bundy was, if IHSA believes it has seeded teams correctly, normal Swiss pairings are in order. And, if IHSA isn’t sure it has seeded teams correctly, it needs to do a better job of seeding them. From what you describe, and from what Jeff Wiewel said earlier in the thread, the pairing process seems to have improved a good deal since I was last involved with the event.

If you have players with identical ratings (or multiple unrateds), the order on the wall chart is irrelevant to the pairing process - they are supposed to be re-randomized each round. (Maintaining a fixed wall chart order is useful for record keeping only). So if your seeding process says these are 1-7 in some unspecified order, they should probably be treated as if they had identical ratings.

I think Smythe’s idea has some merit, though, Boyd.

His approach asks:

What single average rating, if applied individually to each member of a hypothetical team, would provide a 50/50 expectancy against this actual team? In other words, a “hypothetically equivalent average” rating is calculated using a hypothetical team.

For years, teams in the Amateur Team have tried to have three strong players and then a vastly under-rated player on board 4. (If they are able to find someone under-rated for boards 1-3, so much the better.) First, this provides a chance of winning 3-1 or 2.5-1.5; second, because board 4 is under-rated, that board may perform better than expected. Bill’s approach deals with the first half of this issue. (The second half is probably a stale-rating discussion, and I so look forward to those. :slight_smile:)

So, it’s not a matter of doing the calculation against a REAL team (I was thinking that way originally, and couldn’t see a way to do it) but rather against a hypothetically equivalent team.

It would be interesting to run a comparison against some past amateur team regional winners and see if in general their HAR (Hypothetical Average Rating) exceeded their AAR (Actual Average Rating).

An Excel spreadsheet with Goal Seek should be able to calculate HAR. An online HAR calculator might become necessary, because once people became aware of this, the Amateur Team entry rules would likely change.
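As a stand-in for Excel’s Goal Seek, a numerical root finder does the same job. A sketch, under the same assumption of a plain Elo logistic expectancy curve, with a made-up sample lineup:

```python
# A stand-in for Excel's Goal Seek using scipy's root finder, under the same
# assumption of a plain Elo logistic expectancy curve. The sample lineup is
# made up for illustration.
from scipy.optimize import brentq

def win_expectancy(rating_a, rating_b):
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def har(team):
    """Hypothetical Average Rating: the single opponent rating at which the
    team's average board-by-board expectancy is exactly 50%."""
    def avg_minus_half(r):
        return sum(win_expectancy(p, r) for p in team) / len(team) - 0.5
    return brentq(avg_minus_half, min(team), max(team))

team = [2200, 2150, 2100, 1200]                  # hypothetical USAT-style lineup
aar = sum(team) / len(team)
print(f"AAR: {aar:.0f}   HAR: {har(team):.0f}")  # HAR lands well above AAR
```

Feeding past regional winners’ rosters through something like this would make the HAR-vs-AAR comparison straightforward.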

Hasn’t anyone made a proposal banning identical ratings?

Throw “stale ratings” and provisional ratings into the mix. That should impact the expected scores and make pairings even more of a mess. So much easier just to use match and game points to pair and ignore the internal team ratings.

What to do to account for those who play better in a team event than in an individual event? Or those who play the USAT every year as their only event, thus having “stale” ratings to start? Can we include biorhythms, playing on one’s birthday, and other factors in making pairings? For example, I always play terribly on my birthday, well below expected score. So many variables, so little time for the TD to calculate them all in order to make perfect pairings.

Not that it will happen, but to make the theoretical point, consider this team:

1 Kamsky 2845
2 Nakamura 2834
3 Gareev 2756
4 McDonald 355 (last played rated chess in 1996)

AAR 2198