Thurstone models

In 1927, L. L. Thurstone proposed a deceptively simple idea: each item being compared has a latent discriminal process — a random variable representing its momentary psychological magnitude — and one item is judged greater than another whenever its draw exceeds the other's. He framed this purely in pairwise terms, leading to his Law of Comparative Judgment, and laid out five cases distinguished by what is assumed about the variances and correlations of the discriminal processes.

Case V, the most restrictive, takes the processes to be independent and identically distributed up to a translation (typically normal with a common variance). It reduces the pairwise comparison probability to a probit: $\Pr(i \succ j) = \Phi\!\bigl(\tfrac{\mu_i - \mu_j}{\sigma\sqrt 2}\bigr).$ The natural extension to N-entrant contests — whoever draws the smallest (fastest, best) wins, so winning probabilities are probabilities involving the minimum of several i.i.d.-up-to-translation random variables — is not in Thurstone's 1927 paper but is now the workhorse generalisation we'll call a Thurstone Class V model throughout this site. The same lattice machinery also handles copula-correlated variants (loosely, Cases I–IV) when the modeller wants performance dependence between contestants — we return to those in the later chapters on rank probabilities.

The pairwise case

For two contestants with independent performances $X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$, where the lower draw wins as in a race, $\Pr(\text{$i$ wins}) = \Pr(X_i < X_j) = \Phi\!\left(\tfrac{\mu_j - \mu_i}{\sqrt{\sigma_i^2 + \sigma_j^2}}\right)$ where $\Phi$ is the standard normal CDF. With $\sigma_i = \sigma_j = \sigma$ this is Thurstone's Case V (equal variances); the sign is flipped relative to the formula above only because here the smaller value wins. The link function is probit.
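The pairwise formula is short enough to check by hand. The following is a minimal sketch using only the Python standard library; the function names are illustrative and are not taken from any package.

```python
import math

def norm_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pairwise_win_prob(mu_i: float, sigma_i: float, mu_j: float, sigma_j: float) -> float:
    """Pr(X_i < X_j) for independent normal performances (lower draw wins)."""
    return norm_cdf((mu_j - mu_i) / math.sqrt(sigma_i ** 2 + sigma_j ** 2))

# Hypothetical inputs chosen only for illustration.
p_equal = pairwise_win_prob(0.0, 1.0, 0.0, 1.0)   # identical contestants
p_fav = pairwise_win_prob(-0.5, 1.0, 0.5, 1.0)    # i is faster on average
p_rev = pairwise_win_prob(0.5, 1.0, -0.5, 1.0)    # roles swapped
```

Swapping the two contestants gives complementary probabilities, which is a quick sanity check on the sign convention.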

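Bradley–Terry (1952) is the same story with a Gumbel performance distribution: the difference of two independent Gumbel draws is logistic, giving the logit link $\Pr(\text{$i$ wins}) = \dfrac{e^{\mu_i}}{e^{\mu_i} + e^{\mu_j}}.$ Note that $\mu$ is read here as a strength, so larger is better. The logit is easier to fit and nearly indistinguishable from the probit in practice on pairs.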
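How close the two links are on pairs can be seen numerically. The sketch below compares the logit with a rescaled probit over a range of ability gaps; the scaling constant 1.702 is the conventional value for aligning the logistic and normal CDFs and is an assumption of this example, not something the model prescribes.

```python
import math

def norm_cdf(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_win_prob(mu_i: float, mu_j: float) -> float:
    """Bradley-Terry probability that i beats j (larger mu is stronger)."""
    return 1.0 / (1.0 + math.exp(-(mu_i - mu_j)))

SCALE = 1.702  # conventional logistic-to-normal scaling constant (assumed here)

# Ability gaps from -6 to 6 in steps of 0.01.
gaps = [k / 100.0 for k in range(-600, 601)]
max_gap = max(abs(logit_win_prob(d, 0.0) - norm_cdf(d / SCALE)) for d in gaps)
```

With this rescaling the two curves stay within about one percentage point of each other everywhere, which is why pairwise data alone rarely separates them.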

The multi-entrant case: a hard problem

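With $N \ge 3$ independent contestants the Thurstone winning probability is $\Pr(\text{$i$ wins}) = \int_{-\infty}^{\infty} f_i(x) \prod_{j\ne i} \big(1 - F_j(x)\big)\,dx.$ The integral itself is one-dimensional, but for normal performances it generally has no closed form, it must be re-evaluated for every contestant whenever the field changes, and rank or combination probabilities nest further integrals on top of it. Historically this is where the trade-off bit: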
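For a small field the integral can simply be evaluated by brute force. The sketch below applies the trapezoid rule to independent normal performances; it is a naive reference calculation, quadratic in the field size at every grid point, and is not the lattice algorithm discussed on this site. The integration range and step count are arbitrary choices for this example.

```python
import math

def norm_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def norm_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def win_probs(mus, sigmas, lo=-10.0, hi=10.0, steps=4000):
    """Trapezoid estimate of Pr(i wins) for each i; the lowest draw wins."""
    h = (hi - lo) / steps
    probs = [0.0] * len(mus)
    for k in range(steps + 1):
        x = lo + k * h
        weight = h * (0.5 if k in (0, steps) else 1.0)
        # Survival function 1 - F_j(x) of every contestant at this x.
        surv = [1.0 - norm_cdf(x, m, s) for m, s in zip(mus, sigmas)]
        for i, (m, s) in enumerate(zip(mus, sigmas)):
            # Probability that every rival of i draws above x.
            rest = math.prod(sj for j, sj in enumerate(surv) if j != i)
            probs[i] += weight * norm_pdf(x, m, s) * rest
    return probs

# Hypothetical three-runner field and a two-runner check case.
field = win_probs([-0.5, 0.0, 0.5], [1.0, 1.0, 1.0])
pair = win_probs([0.0, 0.5], [1.0, 1.5])
```

The two-runner case should reproduce the closed-form pairwise probit, and the probabilities for any field should sum to one up to quadrature error.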

Thurstone (probit): theoretically pure, computationally heavy beyond pairs.
Bradley–Terry / Luce (logit): efficient, but inherits the rigid Gumbel-family shape it implicitly assumes, and its independence of irrelevant alternatives (IIA) leads to outright logical inconsistencies when used to price combinatorial outcomes — the same event computed forward versus backward gives different probabilities (see the reversibility demo).
Harville / Plackett–Luce: fast listwise formulas, but inherit IIA-style weaknesses and handle ties poorly.
Mallows / RIM: elegant priors on permutations, no latent performance scale, scaling problems with $N$ (Boehmer et al., 2024).

The tractability vs purity dilemma: until recently practitioners had to choose between believable random-utility semantics and fast multi-entrant computation.

Why ties matter (even when they don't)

In the continuous Thurstone problem two performances tie on a set of measure zero; bookkeepers and theorists can ignore them. The moment you discretise performance onto a lattice, however, ties become events with positive probability — the same grid that makes the algorithm fast also forces neighbours onto the same cell from time to time. Pretend they aren't there (or split 50/50 and move on) and the lattice answer drifts away from the underlying continuous Thurstone model.

The trick at the heart of the fast ability transform is to keep a conditional multiplicity $m(j)$ alongside the field-min density: at every lattice point $j$, $m(j)$ is the expected number of rivals sharing the winning score given that the winning score lands at $j$. A contestant who also lands on $j$ then takes a tie share of $1/(1+m(j))$, which is the exact dead-heat payoff, applied to every grid cell and summed into the expected payoff. This is what makes the discrete approximation faithful to the continuous model, even when dead-heats in your application are vanishingly rare. The real-world fairness of paying dead-heats by fractional shares falls out as a bonus.
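The effect of the tie share can be illustrated on a small lattice. The sketch below is a naive illustration and not the fast ability transform itself: it rebuilds the rest-of-field quantities separately for each contestant, and it takes $m(j)$ to be the expected number of rivals on cell $j$ given that the rivals' best score is $j$. Plugging that expected count into $1/(1+m(j))$ can differ slightly from averaging over the full tie count when three or more contestants can share a cell. All function names and the grid are assumptions of this example.

```python
import math

def lattice_pmf(mu, sigma, grid):
    """Discretise a normal density onto the grid and renormalise to sum to 1."""
    w = [math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in grid]
    total = sum(w)
    return [v / total for v in w]

def survival(pmf):
    """S[j] = Pr(score lands strictly above cell j)."""
    out, tail = [0.0] * len(pmf), 0.0
    for j in range(len(pmf) - 1, -1, -1):
        out[j] = tail
        tail += pmf[j]
    return out

def payoffs(pmfs, use_tie_share=True):
    """Expected payoff per contestant; the lowest cell wins, ties are shared."""
    n, size = len(pmfs), len(pmfs[0])
    survs = [survival(p) for p in pmfs]
    result = []
    for i in range(n):
        rivals = [k for k in range(n) if k != i]
        total = 0.0
        for j in range(size):
            above = math.prod(survs[k][j] for k in rivals)
            at_or_above = math.prod(survs[k][j] + pmfs[k][j] for k in rivals)
            p_tie = at_or_above - above  # the rivals' best score is exactly j
            share = 0.0
            if use_tie_share and p_tie > 0.0:
                # m: expected number of rivals on cell j given their best is j.
                m = sum(
                    pmfs[k][j]
                    * math.prod(survs[l][j] + pmfs[l][j] for l in rivals if l != k)
                    for k in rivals
                ) / p_tie
                share = 1.0 / (1.0 + m)
            total += pmfs[i][j] * (above + p_tie * share)
        result.append(total)
    return result

# Hypothetical lattice and field, for illustration only.
grid = [-6.0 + 0.1 * k for k in range(121)]
pmfs = [lattice_pmf(mu, 1.0, grid) for mu in (-0.5, 0.0, 0.5)]
strict_only = payoffs(pmfs, use_tie_share=False)  # ties pay nothing
shared = payoffs(pmfs, use_tie_share=True)        # ties pay 1 / (1 + m)
two_runner = payoffs(pmfs[:2])
```

Counting only strict wins leaves the payoffs summing to visibly less than one, because the mass on tied cells is dropped. With the tie share they sum to one for two runners and very nearly one for three.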

Why the base distribution matters (realism)

Bradley–Terry and Luce's logit aren't “distribution-free”: they are mathematically equivalent to assuming that contestants' latent performances are independent Gumbel (extreme-value) draws, identical up to a shift in location. Pick BT/Luce and you have picked exponential tails and a fixed shape, whether you meant to or not. That is a strong assumption about how the world generates results.
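The equivalence can be checked by simulation. The sketch below draws independent Gumbel noise around each contestant's location, awards the win to the largest score, and compares the win frequencies with the closed-form Luce probabilities. The locations, trial count, and seed are arbitrary choices for this example.

```python
import math
import random

def luce_probs(mus):
    """Closed-form Luce / multinomial-logit winning probabilities."""
    weights = [math.exp(m) for m in mus]
    total = sum(weights)
    return [w / total for w in weights]

def simulate_gumbel_winners(mus, n_trials=200_000, seed=0):
    """Share of trials in which each contestant has the largest mu + Gumbel noise."""
    rng = random.Random(seed)
    wins = [0] * len(mus)
    for _ in range(n_trials):
        # If E ~ Exp(1), then -log(E) is a standard Gumbel (maximum) draw.
        scores = [m - math.log(rng.expovariate(1.0)) for m in mus]
        wins[scores.index(max(scores))] += 1
    return [w / n_trials for w in wins]

mus = [0.0, 0.5, 1.0]  # hypothetical strengths, larger is better
closed_form = luce_probs(mus)
simulated = simulate_gumbel_winners(mus)
```

The simulated frequencies agree with the softmax formula up to Monte Carlo noise, which is the sense in which choosing the logit link is choosing the Gumbel shape.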

Real performances rarely look Gumbel. Finishing times have soft floors (nobody runs a negative time) and fat right tails (catastrophic blowups). Player ratings are often skewed. Markets carry context-specific shape. Thurstone's random-utility framework is genuinely distribution-general: choose normal, skew-normal, heavy-tailed Cauchy, a mixture, or an empirical bootstrap of historical performances, and the lattice algorithm just convolves whatever density you hand it. The model is as realistic as the base distribution you can defend, not as restrictive as the link function you can solve in closed form.

Why coherence under scratches matters

Add a horse, scratch one: the model should still be the same kind of model on the new field. Luce/BT struggles here because of IIA: removing a runner rescales every remaining probability by the same factor, regardless of which rivals that runner was actually taking wins from. Thurstone-class models naturally recompute the field minimum.
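A small numerical comparison makes the contrast concrete. The sketch below scratches the favourite from a hypothetical three-runner field and looks at the ratio of the two remaining runners' win probabilities before and after. The Thurstone side uses unit-variance normal performances and brute-force quadrature purely for illustration; the Luce strengths are made-up numbers.

```python
import math

def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def thurstone_win_probs(mus, lo=-10.0, hi=10.0, steps=4000):
    """Unit-variance normal performances, lowest draw wins (trapezoid rule)."""
    h = (hi - lo) / steps
    probs = [0.0] * len(mus)
    for k in range(steps + 1):
        x = lo + k * h
        weight = h * (0.5 if k in (0, steps) else 1.0)
        surv = [1.0 - norm_cdf(x - m) for m in mus]
        for i, m in enumerate(mus):
            rest = math.prod(s for j, s in enumerate(surv) if j != i)
            probs[i] += weight * norm_pdf(x - m) * rest
    return probs

def luce_win_probs(strengths):
    total = sum(strengths)
    return [s / total for s in strengths]

# Runner order: favourite, middle runner, outsider (all values hypothetical).
thurstone_full = thurstone_win_probs([-0.5, 0.0, 0.5])
thurstone_scratched = thurstone_win_probs([0.0, 0.5])   # favourite removed
luce_full = luce_win_probs([5.0, 3.0, 1.5])
luce_scratched = luce_win_probs([3.0, 1.5])             # favourite removed

# Ratio of the middle runner's chance to the outsider's chance.
thurstone_ratio_before = thurstone_full[1] / thurstone_full[2]
thurstone_ratio_after = thurstone_scratched[0] / thurstone_scratched[1]
luce_ratio_before = luce_full[1] / luce_full[2]
luce_ratio_after = luce_scratched[0] / luce_scratched[1]
```

Under Luce the ratio is pinned at the ratio of strengths whoever else is in the field. Under the normal model it moves when the favourite leaves, because winning no longer requires such a low draw and the outsider benefits more than proportionally.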

Where this lands today

The lattice algorithm we describe on the next page resolves the dilemma: $O(N)$ per race for a fixed grid with arbitrary base distributions, exact tie handling, set-coherence, and a tractable inverse (prices → abilities). It is the basis for the thurstone package and (since you are reading this) for the JavaScript port that powers the interactive demos.