Tuesday, March 09, 2010

Tourney Predictions

We here at TPS have a healthy appreciation for college football rankings, but it's college basketball time, so let's turn our attention in that direction. In my life, the number of people who have filled out a bracket far outweighs the number who have not, but what interests me are the folks who attempt to predict what the bracket itself is going to look like. The most popular practitioner is probably Joe Lunardi, whose Bracketology has shown a remarkable gain in popularity in recent years.

Naturally, I wondered how accurate these predictions end up being. There are four smaller brackets of sixteen teams each and, all put together, there are 65 teams invited to the tournament (there is a play-in game for one of the 16-seed lines). That means four 1-seeds, four 2-seeds, etc., all the way through the 16-seeds. So the question was: How accurate is Joe Lunardi?

Whenever I have sports questions, I fire off an email to TPS Sports Correspondent Rob Holub, and he sent along the following (excellent) assessment of bracket guesses.

It turns out there are a lot more people besides Joe Lunardi guessing! But there's something I found interesting in that rundown. Note that every guess is scored and then compared to the others within that year. Some years may be harder to figure than others, so the variance calculation controls for that. But note the variance of the variance: the dispersion of within-year performance by guesser. Ultimately, people don't consistently perform better (or worse) than average. To be sure, when you average these variances, you can generate an order of who has done best. (You can argue about how appropriate a simple mean is for determining the most successful guesser from the underlying ratings, as well as about the scoring matrix in the first place, but it's a legitimate first stab at the issue.) Outside of a few individuals, though, most people have comparatively successful years followed immediately by comparatively unsuccessful years. The range within each guesser is generally over 10, and (just glancing) more often than not over 20.
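For anyone who wants to poke at this themselves, here is a rough sketch of the calculation as I understand it: score each guesser relative to that year's average, then look at each guesser's mean and range across years. The guesser names and scores below are made up purely for illustration (they are not from Rob's rundown), and the function names are my own.

```python
from statistics import mean

# Hypothetical data: scores[year][guesser] = points earned that year.
# These numbers are invented for illustration only.
scores = {
    2007: {"A": 310, "B": 298, "C": 322},
    2008: {"A": 301, "B": 325, "C": 304},
    2009: {"A": 330, "B": 318, "C": 306},
}

def deviations_from_year_mean(scores):
    """Return {guesser: {year: score minus that year's average score}}."""
    out = {}
    for year, by_guesser in scores.items():
        year_avg = mean(by_guesser.values())
        for guesser, score in by_guesser.items():
            out.setdefault(guesser, {})[year] = score - year_avg
    return out

def summarize(devs):
    """Per guesser: average deviation across years, and best-minus-worst range."""
    return {
        guesser: {
            "mean": mean(by_year.values()),
            "range": max(by_year.values()) - min(by_year.values()),
        }
        for guesser, by_year in devs.items()
    }

summary = summarize(deviations_from_year_mean(scores))
for guesser, stats in sorted(summary.items()):
    print(guesser, stats)
```

In this toy data every guesser's average deviation sits within a couple of points of zero while the ranges all exceed 20, which is the same pattern I'm describing above: the ordering by mean exists, but the year-to-year swings dwarf it.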

How to account for that?

- The NCAA selection process is notoriously secretive, though strides have been made in recent years to provide a bit more clarity. (Individuals have been invited to mock selection meetings, though I don't have any links at the moment.) Could it be that there's a degree of information concerning the selection process that can never be incorporated? If the selection process remained the same from year to year (namely, the standards for selection into the tournament and the individuals making the decisions), then this information should be accessible. Yes, the teams and the distribution and nature of their performance change from year to year, but so long as the standards stay the same, this shouldn't matter in the long run. The problem, as I see it, is that the standards may stay de jure similar, but the individuals making the decisions change, and the former is necessarily a function de facto of the latter. So as long as new people are continually cycled into the selection process, there may be reason to believe that there's a degree of uncertainty that can't be overcome (and if we believe the previous calculations, it could be a significant amount).

- The bracket itself is very sensitive to errors. In theory, you could have seeded every single team correctly yet have none of the match-ups correct. Also, you could have correctly identified every team selected for the tourney but actually earn zero points according to this ranking system. This could be a factor in the large dispersion in variance: very minor errors can lead to big point differences.

- As a side note, I am curious whether there is a betting market for bubble teams (teams on the cusp of making or not making the tourney), and how well it incorporates information as Selection Sunday gets closer. (Though that's not exactly what the above rankings measure.)
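To make the second point above concrete, here is a small sketch of how a guess can get every seed right and still miss every first-round match-up. It assumes a simplified 64-team field (ignoring the play-in game) with the usual 1-vs-16, 2-vs-15, ... pairings inside each region. The team names are placeholders built from seed and region, and the helper names are my own.

```python
REGIONS = ["East", "South", "Midwest", "West"]

def first_round_matchups(bracket):
    """bracket: {region: {seed: team}} -> set of unordered first-round pairings."""
    games = set()
    for seeds in bracket.values():
        for high in range(1, 9):
            # Seed n plays seed 17 - n within the same region.
            games.add(frozenset((seeds[high], seeds[17 - high])))
    return games

def seed_by_team(bracket):
    """Flatten a bracket into {team: seed}, dropping the region."""
    return {team: seed for seeds in bracket.values() for seed, team in seeds.items()}

# Placeholder "actual" bracket: team "S3-East" is the 3-seed in the East.
actual = {r: {s: f"S{s}-{r}" for s in range(1, 17)} for r in REGIONS}

# Hypothetical guess: every team keeps its true seed, but seeds 9-16
# are each placed one region over from where they really landed.
guess = {}
for i, region in enumerate(REGIONS):
    donor = REGIONS[(i + 1) % len(REGIONS)]
    guess[region] = {
        s: actual[region][s] if s <= 8 else actual[donor][s]
        for s in range(1, 17)
    }

seeds_right = seed_by_team(actual) == seed_by_team(guess)
shared = first_round_matchups(actual) & first_round_matchups(guess)
print("all seeds correct:", seeds_right)
print("first-round match-ups in common:", len(shared))
```

How many points such a guess would actually lose depends on the scoring matrix, which I haven't tried to reproduce here; the sketch only shows that seed accuracy and match-up accuracy can come apart completely.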

Any further thoughts?
