A Review of Superforecasting by Philip Tetlock

“All who drink of this treatment recover in a short time, except those whom it does not help, who all die. It is obvious, therefore, that it fails only in incurable cases.”

– Galen

Before the advent of evidence-based medicine, most physicians took an attitude like Galen’s toward their prescriptions. If their remedies did not work, surely the fault was with their patient. For centuries scores of revered doctors did not consider putting bloodletting or trepanation to the test. Randomized trials to evaluate the efficacy of a treatment were not common practice. Doctors like Archie Cochrane, who fought to make them part of standard protocol, were met with fierce resistance. Tetlock contends that the state of forecasting in the 21st century is strikingly similar to that of medicine in the 19th. Initiatives like the Good Judgment Project, a website that allows anyone to make predictions about world events, have shown that even a discipline that is largely at the mercy of chance can be put on a scientific footing.

More than once the author reminds us that the key to success in this endeavor is not what you think or what you know, but how you think. For Tetlock, pundits like Thomas Friedman are the “exasperatingly evasive” Galens of the modern era. In the footnotes he lets the reader know he chose Friedman as a target strictly because of his prominence. There are many like him. Tetlock’s academic work comparing random selections with those of professionals led media outlets to publish, and a portion of their readers to conclude, that expert opinion is no more accurate than a dart-throwing chimpanzee. What the undiscerning did not consider, however, is that not all of the experts who participated failed to do better than chance.

Daniel Kahneman hypothesized that “attentive readers of the New York Times…may be only slightly worse” than the experts whom corporations and governments so handsomely recompense. This turned out to be a conservative guess. The participants in the Good Judgment Project outperformed all control groups, including one composed of professional intelligence analysts with access to classified information. This hodgepodge of retired bird watchers, unemployed programmers, and news junkies did 30% better than the “pros.” More importantly, at least to readers who want to gain a useful skillset as well as general knowledge, the managers of the GJP have identified qualities and ways of thinking that separate “superforecasters” from the rest of us. Fortunately, they are qualities we can all cultivate.

While the merits of his macroeconomic theories can be debated, John Maynard Keynes was an extremely successful investor during one of the bleakest periods in international finance. This was no doubt due in part to his willingness to make allowance for new information and his grasp of probability. Superforecasters share these traits: open-mindedness, an ability and willingness to repeatedly update their forecasts, a talent for neither underreacting nor overreacting to new information by putting it into a broader context, and a predilection for mathematical thinking (though those interviewed admitted they rarely used an explicit equation to calculate their answers). The figures they give also tend to be more precise than those of their less successful peers. This “granularity” may seem ridiculous at first. I must confess that when I first saw estimates on the GJP of 34% or 59% I would chuckle a bit. How, I asked myself, is a single percentage point meaningful? Aren’t we just dealing with rough approximations? Apparently not.

Tetlock reminds us that the GJP does not deal with nebulous questions like “Who will be president in 2027?” or “Will a level 9 earthquake hit California two years from now?” However, there are questions that are not, in the absence of unforeseeable Black Swan events, completely inscrutable. Who will win the Mongolian presidency? Will Uruguay sign a trade agreement with Laos in the next six months? These are parts of highly complex systems, but they can be broken down into tractable subproblems.

Using numbers instead of words like possibly, probably, unlikely, etc. seems unnatural. To some it may even seem silly, pedantic, or presumptuous. Words give us wiggle room and plausible deniability, but they cannot be put on any sort of record to keep score of how well we’re doing. If the Joint Chiefs of Staff had given Kennedy the exact figure they had in mind (3 to 1 against success) instead of a “fair chance,” the Bay of Pigs debacle might never have transpired. Because they represent ranges of values instead of single numbers, words can be retroactively stretched or shrunk to make blunders seem a little less avoidable. This is good for advisors looking to cover their hides by hedging their bets, but not so great for everyone else.
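To make “keeping score” concrete, here is a minimal sketch of one common scoring rule, the Brier score, in its simple binary form (the mean squared difference between the stated probability and what actually happened, where 0 is perfect and lower is better). The review itself does not name a scoring rule, and the forecast records below are invented purely for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean squared gap between stated probabilities and 0/1 outcomes.

    0.0 is a perfect record; always answering 50% earns 0.25.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally long, non-empty lists")
    squared_gaps = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_gaps) / len(squared_gaps)


# Hypothetical track record: 1 means the event happened, 0 means it did not.
outcomes = [1, 0, 1, 1, 0]

# A forecaster who commits to specific numbers (made-up figures).
committed = [0.8, 0.2, 0.7, 0.9, 0.1]

# A hedger whose "fair chance" always amounts to a coin flip.
hedger = [0.5, 0.5, 0.5, 0.5, 0.5]

committed_score = brier_score(committed, outcomes)
hedger_score = brier_score(hedger, outcomes)

print(f"committed forecaster: {committed_score:.3f}")
print(f"perpetual hedger:     {hedger_score:.3f}")
```

A phrase like “fair chance” cannot be fed into a function like this at all, which is the point: only a number can be held to account after the fact.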

If American intelligence agencies had presented the formidable but vincible figure of 70% instead of a “slam dunk” to Congress, a disastrous invasion and costly occupation might have been prevented. At this point it is hard to see the invasion as anything but a mistake, but even amidst these emotions we must be wary of hindsight. Still, a 70% chance of being right means there is a 30% chance of being wrong. It is hardly a “slam dunk.” No one would feel completely at ease if an oncologist told them they were 70% sure the growth was not malignant. There are enormous consequences to sloppy communication. However, those with vested interests are more than content with it if it agrees with them, even if it ends up harming them.

When Nate Silver put the odds of the 2008 election in Obama’s favor he was panned by Republicans as a pawn of the liberal media. He was quickly reviled by Democrats when he foresaw a Republican takeover of the Senate. It is hard to be a wizard when the king, his court, and all the merry peasants sweeping the stables would not know a confirmation bias from their right foot. To make matters worse, confidence is widely equated with capability. This seems to be doubly true of groups of people, particularly when they are choosing a leader. A mutual fund manager who tells his clients they will see great returns on a company is viewed as stronger than a poindexter prattling on about Bayesian inference and risk management.

The GJP’s approach has not spread far — yet. At this time most pundits, consultants, and self-proclaimed sages do not explicitly quantify their success rates, but this does not stop corporations, NGOs, and institutions at all levels of government from paying handsomely for the wisdom of untested soothsayers. Perhaps they have a few diplomas, but most cannot provide compelling evidence for expertise in haruspicy (sans the sheep’s liver). Given the criticality of accurate analyses to saving time and money, it would seem as though a demand for methods to improve and assess the quality of foresight would arise. Yet for the most part individuals and institutions continue to happily grope in the dark, unaware of the necessity for feedback when they misstep — afraid of having their predictions scrutinized or having to take the pains to scrutinize their predictions.

David Ferrucci is wary of the “guru model” of settling disputes. No doubt you’ve witnessed or participated in this kind of whimpering fracas: one person presents a Krugman op-ed to debunk a Niall Ferguson polemic which is then countered with a Tommy Friedman book, which was recently excoriated by the newest leader of the latest intellectual cult to come out of the Ivy League. In the end both sides leave frustrated. Krugman’s blunders regarding the economic prospects of the internet, deflation, and the “imminent” collapse of the euro (said repeatedly between 2010 and 2012) are legendary. Similarly, Ferguson, who strongly petitioned the Federal Reserve to reconsider quantitative easing, lest the United States suffer Weimar-like inflation, has not yet been vindicated. He and his colleagues responded in the same way as other embarrassed prophets: be patient, it has not happened, but it will! In his defense, more than one clever person has criticized the way governments calculate their inflation rates…

Paul Ehrlich, a darling of the environmentalist movement, has screeched about the detonation of a “population bomb” for decades. Civilization was set to collapse between 15 and 30 years from 1970. During the interim 100 to 200 million would annually starve to death, by the year 2000 no crude oil would be left, the prices of raw materials would skyrocket, and the planet would be in the midst of a perpetual famine. Tetlock does not mention Ehrlich, but he is, particularly given his persisting influence on Greens, at least as deserving of a place in this hall of fame as anyone else. Larry Kudlow continued to assure the American people that the Bush tax breaks were producing massive economic growth. This continued well into 2008 when he repeatedly told journalists that America was not in a recession and the Bush boom was “alive and well.” For his stupendous commitment to his contention in the face of overwhelming evidence to the contrary he was nearly awarded a seat in the Trump cabinet.

This is not to say a mistake should become the journalistic equivalent of a scarlet letter. Kudlow’s slavish adherence to his axioms is not unique. Ehrlich’s blindness to technological advances is not uncommon, even in an era dominated by technology. By failing to set a timeline or give detailed causal accounts, many believe they have predicted every crash since they learned how to say the word. This is likely because they begin each day with the same mantra: “the market will crash.” Yet through an automatically executed routine of psychological somersaults they do not see they were right only once and wrong dozens, hundreds, or thousands of times. This kind of person is much more deserving of scorn than a poker player who boasts about his victories, because he is (likely) also aware of how often he loses. At least he’s not fooling himself. The severity of Ehrlich’s misfires is a reminder of what happens when someone looks too far ahead while assuming all things will remain the same. Ceteris paribus exists only in laboratories and textbooks.

Axioms are fates accepted by different people as truth, but the belief in Fate (in the form of retroactive narrative construction) is a nearly ubiquitous stumbling block to clear thinking. We may be far removed from Sophocles, but the unconscious human drive to create sensible narratives is not peculiar to fifth century Athens. A questionnaire given to students at Northwestern showed most believed things had turned out for the best even if they had not gotten into their first pick. From an outsider’s perspective this is probably not true. In our cocoons we like to think we are in the right place either through the hand of fate or through our own choices. Atheists are not immune to this Panglossian habit. Our brains are wired for stories, but the stories we tell ourselves about ourselves seldom come out without distortions. We can gain a better outside view, which allows us to see situations from perspectives other than our own, but only through regular practice with feedback. This is one of the reasons groups are valuable.

Francis Galton asked 787 villagers to guess the weight of an ox hanging in the market square. The average of their guesses (1,197 lbs) turned out to be remarkably close to its actual weight (1,198 lbs). Scott Page has said “diversity trumps ability.” This is a tad bold, since legions of very different imbeciles will never produce anything of value, but there is undoubtedly a benefit to having a group with more than one point of view. This was tested by the GJP. Teams performed better than lone wolves by a significant margin (23% to be exact). This was partly a result of members encouraging one another and building a culture of excellence, and partly a result of the power of collective intelligence.
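Why an average can beat most of the people who contribute to it is easy to sketch. The simulation below is not Galton’s data: it assumes, purely for illustration, that each villager’s guess is the true weight plus independent, unbiased random noise with a made-up spread of 75 lbs. Real crowds are rarely that well behaved, so treat it as a best case.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

TRUE_WEIGHT = 1198   # lbs, the figure quoted above
CROWD_SIZE = 787     # number of guessers quoted above
NOISE_SD = 75        # assumption: spread of individual guesses, in lbs

# Assumption: every guess is unbiased and independent of the others.
guesses = [random.gauss(TRUE_WEIGHT, NOISE_SD) for _ in range(CROWD_SIZE)]

# Error of the crowd's average guess.
crowd_estimate = statistics.mean(guesses)
crowd_error = abs(crowd_estimate - TRUE_WEIGHT)

# Error of a typical (median) individual guesser.
individual_errors = [abs(g - TRUE_WEIGHT) for g in guesses]
typical_individual_error = statistics.median(individual_errors)

print(f"crowd average is off by         {crowd_error:.1f} lbs")
print(f"typical individual is off by    {typical_individual_error:.1f} lbs")
```

Individual errors in opposite directions cancel when averaged, which is why the crowd lands much closer than its typical member. If everyone shared the same bias, the cancellation would vanish, which is one way to read the case for diverse points of view.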

“No battle plan survives contact with the enemy.”

– Helmuth von Moltke

“Everyone has a plan until they get punched in the mouth.”

– Mike Tyson

When Archie Cochrane was told he had cancer by his surgeon he prepared for death. System 1 thinking grabbed hold of him and he did not doubt the diagnosis. A pathologist later told him the surgeon was wrong. Even the best of us, under pressure, fall back on habitual modes of thinking. This is another reason why groups are useful (assuming all their members do not also panic). Organizations like the GJP and the Millennium Project are showing how well collective intelligence systems can perform. Helmuth von Moltke and Mike Tyson aside, a better motto, substantiated by a growing body of evidence, comes from Dwight Eisenhower: “plans are useless, but planning is indispensable.”

Adam Alonzi