Two of the more numerate shops in the prediction business published 2026 World Cup probabilities within days of each other. Goldman Sachs ran 50,000 Monte Carlo tournaments off an Elo-anchored Poisson goal model. Nate Silver’s Silver Bulletin ran 100,000 off PELE, the successor to the soccer ratings he built in his ESPN days. In addition, we have the wisdom of crowds: roughly half a billion dollars of real money on Polymarket and Kalshi, alongside the sportsbooks.
Almost every gap between the models and the betting line traces back to specific modeling decisions. The forecasts aren’t fighting over the teams. They’re fighting over models.
Here is the board, in championship probability:
| Team | Market (%) | Goldman (%) | Silver Bulletin (%) |
| --- | --- | --- | --- |
| France | ~17 | 18.9 | 11.7 |
| Spain | ~16 | 25.7 | 18.5 |
| England | ~11–12 | 5.0 | 10.4 |
| Argentina | ~9–10 | 14.3 | 18.7 |
| Brazil | ~7–8 | 7.6 | 6.1 |
| Portugal | ~6 | 4.8 | 4.9 |
| Germany | ~5 | 4.5 | 6.6 |
| Netherlands | ~3–4 | 5.2 | 2.9 |
| Norway | ~3 | 1.6 | 4.0 |
A caveat before the autopsy: “the market” isn’t a single number. France trades anywhere from about 12 to 17 percent depending on whether you check Polymarket, Kalshi, or a devigged sportsbook. That five-point spread, on the favorite, turns out to be roughly the size of the “edges” the models claim over the line — a fact worth holding onto until the end.
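For readers unfamiliar with the term, devigging means stripping the bookmaker's margin out of quoted odds so that the implied probabilities sum to one. A minimal sketch using proportional normalization, which is only one of several methods in use; the decimal odds below are invented for illustration and are not real World Cup prices:

```python
def devig_proportional(decimal_odds):
    """Convert decimal odds to probabilities that sum to 1.

    Proportional normalization is one common devig method; other
    methods exist, so treat this as an illustration only.
    """
    # Raw implied probability of each outcome is 1 / decimal odds.
    raw = {team: 1.0 / odds for team, odds in decimal_odds.items()}
    # The raw values sum to more than 1; the excess is the book's margin.
    overround = sum(raw.values())
    return {team: p / overround for team, p in raw.items()}


# Hypothetical three-outcome book (made-up numbers).
example_odds = {"Team A": 2.0, "Team B": 3.0, "Team C": 4.5}
fair = devig_proportional(example_odds)
for team, p in fair.items():
    print(f"{team}: {p:.3f}")
```

Different devig methods spread the margin differently across favorites and longshots, which is one reason the "market" number for a single team can vary by venue.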
The disagreements are a map of assumptions
Spain is Goldman’s Elo bet. Goldman builds its goal expectations on Elo, the rating system devised for chess, and Elo loves Spain — 52 points clear of Argentina, 84 clear of France. So Goldman lands at 25.7 percent, about ten points over the market, and the report candidly describes the position as overweight Spain. PELE blends ratings with squad market value and a “tilt” parameter for attacking-versus-defensive lean, and that blend pulls Spain back to 18.5 percent — basically the market. The largest single disagreement in the whole exercise is, at bottom, a referendum on one input.
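To give a sense of scale for those rating gaps, the textbook Elo expected-score formula converts a point difference into an expected result for a single match, with a draw counted as half a win. This is the standard chess-style formula, not Goldman's goal model; the bank's actual mapping from Elo to Poisson goal rates is not reproduced here, and the neutral-field setup is an assumption:

```python
def elo_expected_score(rating_gap, scale=400.0):
    """Expected score (win = 1, draw = 0.5) for the higher-rated side.

    Standard Elo logistic curve with the conventional 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


# Rating gaps quoted in the text: Spain's lead over each rival.
for rival, gap in {"Argentina": 52, "France": 84}.items():
    print(f"Spain vs {rival}: {elo_expected_score(gap):.3f}")
```

Gaps of that size work out to expected scores of roughly 0.57 and 0.62 per match. The per-match edge is modest, but it compounds across every round of a simulated tournament, which is how a rating lead becomes a large gap in title probability.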
England is Goldman’s grudge. Goldman gives England 5 percent, less than half the market. This isn’t an input; it’s a coefficient. The model hands a “mentality” boost to major footballing nations and then explicitly exempts England, stacking on a geographic penalty for the likely altitude-and-heat ordeal against Mexico in Mexico City. PELE has no England-chokes term, so it leaves the Three Lions at 10.4 percent, on the market. As historical priors go, Goldman’s isn’t crazy: the last time England’s World Cup died at altitude in the Estadio Azteca — the 1986 quarterfinal — it died at the Hand of God, Diego Maradona’s fist getting to the ball before Peter Shilton’s gloves, with Argentina the beneficiary. Whether you can hard-code divine intervention into a Poisson regression is another matter.
France is Silver’s draw bet. Silver does to France what Goldman does to England, from the opposite direction. PELE rates France around third and then docks it to 11.7 percent, well under the market’s 17, largely for the brutal group it drew alongside Norway and Senegal. Goldman lands at 18.9 percent, near the line. The France gap is about bracket geometry, not talent: Silver thinks the draw is underpriced; Goldman and the market think France is good enough not to care.
Two kinds of forecaster
It helps to know who is doing the forecasting, because the two camps were trained in different schools and it shows.
Goldman’s are the bank’s economists. Their professional habitat is the factor model and the benchmark: decompose the world into exposures, attribute every deviation to an interpretable variable, manage the risk, and stand behind a coherent house view. The language gives them away. They describe their forecast as overweight Spain and Argentina, underweight England and Portugal — the vocabulary of a portfolio positioned against an index. To a portfolio manager, the market is the benchmark, and skill is expressed as deliberate, attributable active bets away from it.
That habitat carries an incentive. A research desk has to have a view. A model that reproduced the betting line would be unpublishable — there’s no product in “we agree with Polymarket.” So institutional gravity pulls toward manufacturing a differentiated, narratable position: Elo says Spain, so plant the flag on Spain. And the feedback is weak. No money rides on the call, and you cannot meaningfully grade a 26-percent Spain forecast against a 16-percent market from a single tournament. The discipline, then, is internal coherence and not-looking-foolish — not calibration.
Silver came up through a different door: baseball projections, election models, and — the part that matters here — poker and sports betting. The bettor’s native posture is respect for the closing line. In a liquid market the closing price is famously hard to beat; “closing line value” is the gold-standard scorekeeper precisely because beating it is rare. So a sharp’s default is humility toward the price, with disagreement reserved for specific, nameable mispricings. Skin in the game and fast, merciless feedback breed calibration the way nothing else does. The bettor is also fluent in the single most underrated move in forecasting: no bet.
Now the irony. You’d expect the gambler to make aggressive, bold bets and the bankers to rely on cautious tilts from the market. Instead the bank takes the largest absolute swing on the board — Spain, plus ten — while the bettor sits near the market on the headline favorite and saves his deviations for elsewhere. But the flavor of each one’s deviations fits the training perfectly. Goldman’s big bet is a commitment to a factor: Elo as ground truth, the model-builder trusting the variable over the price. Silver’s bets are about spots: a vicious group for France, a loaded roster for Argentina, Haaland dragging Norway upward — the bettor’s focus on matchups and lineups. One trusts the model; the other trusts the read.
Neither is obviously the better instinct. The quant’s discipline is a genuine guard against overfitting and storytelling; the bettor’s market-respect is a genuine guard against falling in love with your own model. I’ve spent many decades on both sides of the desk and like to hear from both before making bets.
When experts agree, trust the crowd more — not less
Here is the part that runs against intuition. The natural reading of the table is that Argentina is the live contrarian play: two expert models, built on entirely different machinery, both sit well above the market on the defending champion. Two experts agree against the crowd. That's confirmation, isn't it? Back the experts? Not so fast.
Start with how the gaps are distributed:
| Team | Market | Goldman − mkt | Silver − mkt | Pattern |
| --- | --- | --- | --- | --- |
| Spain | ~16 | +10 | +3 | Both above |
| Argentina | ~9.5 | +5 | +9 | Both above |
| England | ~11.5 | −7 | −1 | Both below |
| France | ~17 | +2 | −5 | Split |
| Netherlands | ~3.5 | +2 | −1 | Split |
| Germany | ~5 | ~0 | +2 | Split |
| Norway | ~3 | −1 | +1 | Split |
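The gap table can be rebuilt from the board with a few lines of arithmetic. The sketch below uses the approximate market midpoints quoted in the table and labels each team by the signs of the two gaps; the function name and the tuple layout are choices made for this illustration:

```python
# Championship probabilities in percent: (market midpoint, Goldman, Silver).
# Market values are the approximate midpoints from the gap table.
board = {
    "Spain":       (16.0, 25.7, 18.5),
    "Argentina":   (9.5, 14.3, 18.7),
    "England":     (11.5, 5.0, 10.4),
    "France":      (17.0, 18.9, 11.7),
    "Netherlands": (3.5, 5.2, 2.9),
    "Germany":     (5.0, 4.5, 6.6),
    "Norway":      (3.0, 1.6, 4.0),
}


def classify(market, goldman, silver):
    """Label a team by which side of the market each model sits on."""
    g_gap = goldman - market
    s_gap = silver - market
    if g_gap > 0 and s_gap > 0:
        return "Both above"
    if g_gap < 0 and s_gap < 0:
        return "Both below"
    return "Split"


for team, (mkt, gs, sb) in board.items():
    print(f"{team:<12} {gs - mkt:+5.1f} {sb - mkt:+5.1f}  {classify(mkt, gs, sb)}")
```

Note that Germany counts as a split only by sign: Goldman's gap there is about half a point, which the table rounds to zero.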
Take the clear agreement cases first: Spain and Argentina, where both models lean the same way against the market. The two models are not independent witnesses. Both lean heavily on historical and roster strength: Elo and ratings, transfer values, past results. Agreement among positively correlated estimators is weak evidence, not strong, because their shared inputs guarantee shared blind spots. Meanwhile the market has seen everything the models have seen, including the models' own results, and more: inside information; details like weather forecasts and travel schedules, too small for modelers to bother with but valuable to specialized bettors across many betting markets; and price action on the prediction markets and bookmaker lines. So when both models float above the market on Argentina, the likelier explanation isn't that they've jointly spotted an edge. It's correlated error, with both models overweighting paper strength while the crowd prices the things they structurally miss: Messi at 39, current form, the well-documented discount the betting public applies to reigning champions. The same logic quietly indicts Spain, where both models again sit above the line. When the experts cluster on one side, the market's refusal to follow is itself the signal.
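How much a second, correlated model adds can be put in rough numbers. For n equally noisy, unbiased estimates with pairwise correlation rho, the average carries the information of n / (1 + (n − 1)·rho) independent estimates. The sketch below applies that standard result; the correlation values are assumptions chosen for illustration, since nothing here measures how correlated the two models' errors actually are:

```python
def effective_sample_size(n, rho):
    """Number of independent estimates that n correlated ones are worth.

    Assumes equal variances and a common pairwise correlation rho.
    """
    return n / (1.0 + (n - 1) * rho)


# Two models, under several hypothetical error correlations.
for rho in (0.0, 0.5, 0.8, 1.0):
    print(f"rho = {rho:.1f}: worth {effective_sample_size(2, rho):.2f} independent models")
```

At a correlation of 0.8, two agreeing models are worth barely more than one.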
This is a Bayesian lean, not a law, and it is only as strong as the market is efficient. The World Cup winner market is half a billion dollars deep, which makes it about as sharp as sports markets get, which makes the lean strong here. In a thin, sleepy market a good model genuinely can beat the price — that’s how sharps eat. Know which kind of market you’re standing in before you decide whom to trust.
Now the clear split cases — France, the Netherlands and Norway — where Goldman is above the market and Silver below, or vice versa. Here the market price is doing something much less impressive than aggregating wisdom. It’s sitting at the centroid of a disagreement you can already see: the crowd is averaging the same two camps you’re staring at. The price adds no information you don’t already have. You cannot outsource the judgment to the market, because in the split case the market has handed the judgment back to you. To take a position on France you have to decide, yourself, whether you believe Elo’s verdict on team strength or PELE’s verdict on the draw. The midpoint reflects no consensus; it’s the resting point of a tug-of-war.
So the asymmetry, stated plainly: trust the crowd most when the experts agree against it, because their agreement is probably correlated error the crowd has already corrected. Trust the crowd least when the experts straddle it, because the price is just the midpoint of a fight you’ll have to adjudicate on the merits. That is exactly backwards from the instinct that expert consensus is a strong signal and expert disagreement leaves you stuck with a noisy average.
The precision is hiding the noise
Step back and the disagreements shrink. Goldman’s 26-percent Spain against a 16-percent market looks like a confident, money-making call. But recall that the market itself disagrees with itself by about five points on the favorite, depending on the venue — roughly the magnitude of most of these “edges.” Over a single 104-game tournament you cannot distinguish a 26-percent forecast from a 16-percent one; the gaps live comfortably inside the noise.
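One way to see how little a single tournament can settle is to compute the likelihood ratio between the two forecasts for each possible outcome. The sketch below treats "Spain wins the title" as a single yes-or-no event and uses the rounded 26 and 16 percent figures; the function name is a label invented for this illustration:

```python
def likelihood_ratio(p_model, p_market, event_happened):
    """How strongly one outcome favors the model's forecast over the market's."""
    if event_happened:
        return p_model / p_market
    return (1.0 - p_model) / (1.0 - p_market)


lr_if_spain_wins = likelihood_ratio(0.26, 0.16, True)
lr_if_spain_loses = likelihood_ratio(0.26, 0.16, False)
print(f"Spain wins:  evidence ratio {lr_if_spain_wins:.3f} in the model's favor")
print(f"Spain loses: evidence ratio {lr_if_spain_loses:.3f} in the model's favor")
```

If Spain wins, the result favors the 26-percent forecast by a factor of about 1.6; if Spain loses, it favors the market by a factor of about 1.1. Neither outcome shifts the odds between the two forecasts by even a factor of two.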
Which is the whole point. When you prefer one model to another, you are not buying a better estimate. You’re buying a prior — Elo supremacy, an England jinx, an underpriced draw — and a bet that your prior matters more than a crowd that has already half-priced it. The models are good. The probabilities are sincere. But the disagreements among them map their builders’ assumptions, not anyone’s edge.
Enjoy the tournament. Bet if you are so inclined and can afford to lose. Bet with the bankers or the poker player, or pick for yourself. And if England really does go out to Mexico in the Azteca, resist the urge to call it destiny. It was only ever a coefficient.