World Cup 2026 · Ratings Investigation
A two-lens statistical read of the 48-team field, stress-tested through an owl, a potato, and an owl until only the true claims were left standing.
The spreadsheet's "ratings" are Jeff Fogle's market-consensus goal-supremacy ratings: higher means stronger, the scale runs from Curacao at −1.1 to Spain and France at 2.8, and 2.5 is roughly championship caliber. Crucially, it is a goal scale: the spread between two teams is a predicted goal margin.
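A minimal sketch of what "goal scale" means in practice, using only the three ratings quoted above. The function name `predicted_margin` is illustrative, and reading the difference as a plain margin with no home-advantage adjustment is an assumption, not something the source specifies.

```python
# Ratings quoted in the text; the other 45 teams are omitted here.
RATINGS = {"Spain": 2.8, "France": 2.8, "Curacao": -1.1}

def predicted_margin(team_a: str, team_b: str, ratings: dict = RATINGS) -> float:
    """Predicted goal margin of team_a over team_b (assumes no venue adjustment)."""
    # Ratings sit on a 0.1 grid, so round away floating-point noise.
    return round(ratings[team_a] - ratings[team_b], 1)

print(predicted_margin("Spain", "Curacao"))  # 3.9
print(predicted_margin("Spain", "France"))   # 0.0
```

The 3.9 between the top and bottom of the scale is the same number reported as the range in the descriptive statistics.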
The Confederation and Host columns and the "Congo → Congo DR" fix were added by the analyst; they are not in the source. Eight of the fourteen columns are arithmetic functions of the one Fogle number, so the sheet holds far less independent information than its width suggests.
Click any column header to sort. Filter by confederation, or isolate the three hosts. The Lean column compares each team's rank under Fogle with its rank under Elo, both computed inside the 35-team set that has both ratings, so it is an apples-to-apples comparison (see the audit for why that matters).
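One way to sketch the Lean computation is shown below. The team names and values are invented for illustration. The tie rule (tied teams share the best rank) and the sign convention (positive means Fogle is more bullish) are assumptions rather than details taken from the sheet.

```python
def rank_within(values: dict) -> dict:
    """Rank 1 = highest value; tied teams share the best rank (an assumption)."""
    ordered = sorted(values.values(), reverse=True)
    return {team: ordered.index(v) + 1 for team, v in values.items()}

def lean(fogle: dict, elo: dict) -> dict:
    """Rank gap computed only over teams that have both ratings."""
    shared = [t for t in fogle if elo.get(t) is not None]
    f_rank = rank_within({t: fogle[t] for t in shared})
    e_rank = rank_within({t: elo[t] for t in shared})
    # Positive: Fogle ranks the team higher than Elo does (assumed sign convention).
    return {t: e_rank[t] - f_rank[t] for t in shared}

# Hypothetical toy data: team C has no Elo rating, so it is excluded entirely.
fogle = {"A": 2.5, "B": 1.8, "C": 1.2, "D": 0.4}
elo = {"A": 2000, "B": 1700, "C": None, "D": 1750}
print(lean(fogle, elo))  # {'A': 0, 'B': 1, 'D': -1}
```

Dropping team C before ranking is the point: if C stayed in the Fogle ranking but not the Elo one, every team below it would pick up a spurious one-place gap.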
Tier dots: Elite (≥2.3), Strong (1.9–2.2), Solid (1.4–1.8), Fringe (0.7–1.3), Minnow (<0.7). ★ = host. Elo rank is a global world rank; Lean re-ranks within the shared 35.
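The tier cut-offs in the legend can be written as a small lookup. This is a sketch: the function name `tier` is illustrative, and rounding to the 0.1 grid before comparing is an assumption made so that the bands leave no gaps.

```python
def tier(rating: float) -> str:
    """Map a Fogle rating to the tier named in the legend."""
    r = round(rating, 1)  # ratings are published on a 0.1 grid
    if r >= 2.3:
        return "Elite"
    if r >= 1.9:
        return "Strong"
    if r >= 1.4:
        return "Solid"
    if r >= 0.7:
        return "Fringe"
    return "Minnow"

for value in (2.8, 1.8, 1.5, -1.1):
    print(value, tier(value))
```

By these bands, all three hosts (1.8, 1.8, 1.5) fall in the Solid tier.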
Descriptive shape, normality, outliers, correlation, and group tests on the 48-team cross-section.
Mean 1.367, median 1.60, SD 0.835, range 3.9. The mean sitting below the median signals moderate left skew (−0.72): a dense cluster of decent teams with a thin tail of weak ones dragging the average down. Natural tiers fall out of the 0.1 grid, though heavy ties mean the tier lines are softer than they look.
Shapiro p = 0.062, D'Agostino p = 0.067, and Jarque–Bera p = 0.114 all fail to reject; Anderson–Darling rejects (A² = 0.79 > 0.73). Verdict: approximately normal with a real mild left tail. Honest caveat surfaced by the audit: running normality tests on a variable rounded to 0.1 with 36 ties is close to decorative. The descriptive skew is the real takeaway; the p-values are not.
Only one team falls outside the 1.5×IQR fence: Curacao (−1.1), which also has the most extreme z-score (−2.95). Every flagged value is at the weak end. The elite at 2.8 sit comfortably inside the fence: the ceiling is compressed and the floor runs long.
Mean rating by confederation: UEFA 1.97 and CONMEBOL 1.97 are statistically tied at the top, well above CAF 1.06, AFC 0.81, and CONCACAF 0.62. Because variances differ wildly (Levene p = 0.001; CONCACAF alone spans Curacao to Mexico), the right test is Kruskal–Wallis: H = 24.6, p = 6e-05, a large effect (ε² = 0.49). Caveat: this is an ecological result. It describes pools, not individual teams.
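The z-score can be checked directly from the summary statistics reported above. The fence check is sketched on a made-up ten-value sample, not the 48-team data, and the quartile method (`inclusive`) is an assumption, since the source does not say which convention was used.

```python
from statistics import quantiles

def z_score(x: float, mean: float, sd: float) -> float:
    return (x - mean) / sd

def tukey_outliers(values, k: float = 1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Curacao's z-score from the reported mean (1.367) and SD (0.835).
print(round(z_score(-1.1, 1.367, 0.835), 2))  # -2.95

# Hypothetical sample: long weak tail, compressed top.
sample = [-1.1, 0.3, 0.9, 1.2, 1.5, 1.6, 1.8, 2.0, 2.3, 2.8]
print(tukey_outliers(sample))  # [-1.1]
```

In the toy sample, as in the real field, the top value at 2.8 stays inside the upper fence while the weak tail does not.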
Hosts average 1.70 vs 1.34 for the rest of the field, and a Welch t-test gives p = 0.048. Do not trust it. With n = 3 and two hosts identical at 1.8, the host SD collapses to 0.17 and manufactures the "significance". Under a permutation null, a random three-team draw beats the host mean 26.6% of the time, Mann–Whitney gives p = 0.47, and Canada (1.5) is below the field median. There is no provable host effect here.
Across all 12 groups the means are statistically indistinguishable (ANOVA F = 0.16, p = 0.999). That is not boredom; it is the fingerprint of FIFA's pot system balancing every group. The difficulty lives in the variance, as the next section shows.
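The host arithmetic can be reproduced from the three host ratings given in the text (1.8, 1.8, 1.5). The permutation test below is a sketch on a hypothetical four-value pool: it enumerates every possible draw exactly rather than sampling, and counting ties as "at least as extreme" is an assumption about how the 26.6% figure was computed.

```python
from itertools import combinations
from statistics import mean, stdev

hosts = [1.8, 1.8, 1.5]
print(round(mean(hosts), 2), round(stdev(hosts), 2))  # 1.7 0.17

def exact_permutation_p(pool, observed_group):
    """Share of same-size draws from pool whose sum is >= the observed sum."""
    k = len(observed_group)
    observed = sum(observed_group)
    hits = total = 0
    for draw in combinations(pool, k):
        total += 1
        # Small tolerance so float noise does not drop exact ties.
        if sum(draw) >= observed - 1e-9:
            hits += 1
    return hits / total

# Hypothetical pool of four ratings; the "hosts" are the top two.
toy_pool = [3.0, 2.0, 1.0, 0.0]
print(exact_permutation_p(toy_pool, [3.0, 2.0]))
```

For the real field the enumeration would cover all 17,296 three-team subsets of 48, which is small enough to run exactly.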
Profiling, structure, and the two pictures that carry the most meaning: the Fogle–Elo scatter and the group-difficulty spread.
Hover any point. The diagonal marks rough agreement: teams above the line are rated higher by Elo than by Fogle, and teams below the line the reverse. Both axes are ranked inside the 35-team set that has both ratings, which is the only fair way to compare.
The biggest real gaps once the rank scales are matched. Green means Fogle is more bullish than Elo, blue means Elo is more bullish. USA is the cleanest signal (host optimism the market prices and Elo ignores).
Groups F and I tie on mean rating (1.675), but they are nothing alike. Group F (Netherlands, Japan, Sweden, Tunisia) is a balanced grinder with SD 0.37 and no soft team. Group I (France, Norway, Senegal, Iraq) is top-heavy with SD 1.03: France strolls while three teams fight for second, and Iraq absorbs the hardest single schedule on the board.
Missingness is not random. Elo is absent for 13 teams, none of them in the top six: six upper-middle UEFA sides (Norway, Czech Republic, Turkey, Sweden, Scotland, Bosnia), four AFC sides, and the three weakest minnows. Polymarket exists only for nine favorites. Nothing was imputed.
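The mean-versus-spread distinction can be sketched with two invented groups that share the mean of 1.675. These are not the actual Group F and Group I ratings, which this page does not list team by team, and using the sample SD is an assumption.

```python
from statistics import mean, stdev

# Hypothetical groups, chosen only so that both sum to 6.7 (mean 1.675).
balanced = [1.9, 1.7, 1.6, 1.5]
top_heavy = [2.8, 1.8, 1.6, 0.5]

for name, group in (("balanced", balanced), ("top-heavy", top_heavy)):
    # Identical means, very different spreads.
    print(name, round(mean(group), 3), round(stdev(group), 2))

def opponent_mean(group, index):
    """Average rating of the three opponents a given team faces."""
    others = group[:index] + group[index + 1:]
    return mean(others)

# The weakest side in the top-heavy group faces the toughest opponents on average.
print(round(opponent_mean(top_heavy, 3), 2), round(opponent_mean(top_heavy, 0), 2))
```

The last line shows why the weakest team in a top-heavy group draws the hardest schedule: its three opponents include the favorite, while the favorite's opponents include the weakest team.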
The process Michael loves. First read the numbers like an owl. Then turn into the potato and attack with no mercy. Then return to the owl to weigh which blows landed and render the real verdict.
Spawned as an independent agent with no stake in the analysis, told to verify every attack against the actual files before making it.
Every potato hit was re-checked against the data before being accepted. Most landed.
UEFA and CONMEBOL dominance, the left-skewed expansion field, strong Fogle–Elo order agreement, the variance-not-mean reading of the "group of death," and one actionable market gap (USA). The gauntlet earned its keep: it killed a headline that was a bug, downgraded three stories from insight to texture, and left a smaller, truer set of claims.
Tied back to the project's own betting principles.
The pipeline ran exactly as requested. One canonical cleaned dataset was built first (single source of truth, reconciled to control totals: 48 teams, 12 groups of 4, six confederations, three hosts). Two skill agents then analyzed it in parallel and blind to each other via the parallel-skill-orchestrator: one applying statistical-analysis, one applying exploratory-data-analysis. Their findings were synthesized, read by an owl, attacked by an independently spawned potato, and reconciled by a second owl. This page is the output-html step.
ratings_analysis/
A correction worth keeping: the statistical agent's rank-divergence table compared a 1–48 Fogle rank against a global Elo world rank, which manufactured a false "Fogle loves CAF" signal. The EDA agent computed it correctly within the shared set. When two agents disagree on a derived number, that disagreement is the audit doing its job.