Christian Holland

Why Superforecasting Works: One Simple Trick To Out-Forecast the CIA

There's a man called Philip Tetlock who says he can predict the future.

More specifically, he says he's discovered a set of techniques that allow ordinary people to forecast future events with incredible accuracy across nearly any domain, from politics to sport and from pandemics to finance. These ordinary people are so good that they outperform the CIA in forecasting revolutions, wars and other geopolitical events, even when the CIA have access to secret intelligence reports that they don't share with the ordinary forecasters.

Tetlock calls these techniques superforecasting, claims anyone can learn them and has written a book explaining exactly how they work.

You're probably sceptical of this.

That's reasonable - there are lots of people who think they're smarter than the CIA, but many of them also think Elvis is still performing in South America and the moon landing never happened. I certainly wouldn't recommend reading their books.

But the difference is that Tetlock isn't a fraud or conspiracy theorist: he's telling the truth. Occasionally we do come up with simple new ways of training people to think that, over time, generate staggeringly powerful results. Sometimes there is just 'one simple trick' that, when applied consistently and properly, can help you out-forecast the CIA.

A Difficult Pitch

So how can superforecasting cut through the noise and reasonable scepticism to get people to take it seriously - beyond the minority who've read the book and papers today? I think we should certainly expect it to be hard. Imagine you were an early scientist trying to convince your local ruler that this new thing called the scientific method was a good idea. You carefully explain that it's a very simple process. All you have to do is build an understanding of the world, use it to make falsifiable predictions, test those predictions and update your understanding of the world based on the results.

You explain that if you go through this loop enough times you'll learn so much about the world that you'll be able to start changing it in a way you never could before. This will culminate in you being able to fly all the way to the moon (maybe that's a bit of a stretch for your first sales pitch but it's not that far off). You'd probably be laughed out of the room. At the very least you'd come second to the very convincing alchemist and witch burner, who offer much more intuitive and immediate solutions to your ruler's problems. [1]

But it's definitely worth trying to make more people understand superforecasting because better forecasting matters so much. Being able to accurately predict the future and the likely results of our actions is vital in nearly every area of human activity, from development and philanthropy to government planning and investment. Superforecasting could be transformative in almost all of these areas. But it first needs to be understood.

Superforecasting Under the Hood

So how does superforecasting actually work?

At its core it's a very simple set of rules. If you want to forecast something accurately you take a lot of people, ask them to forecast lots of things and keep track of whose forecasts are the most accurate.

In order to do this you set forecasting questions with clear outcome conditions. For example: 'Will there be more than 100,000 deaths from COVID, as reported by source X, in location Y, by date Z?' or 'Will Google's total global revenues, as reported by source X, exceed $Y on date Z?'.
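
To make that concrete, here's a minimal sketch of what such a question might look like as a data structure. The ForecastQuestion class and its field names are illustrative assumptions, not part of any real forecasting platform.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ForecastQuestion:
    """A question with an unambiguous resolution rule: what counts,
    who adjudicates it, and when it resolves."""
    text: str                    # the question as posed to forecasters
    resolution_source: str       # the agreed source that settles the outcome
    resolution_date: date        # when the question resolves
    resolved_yes: Optional[bool] = None  # filled in once the outcome is known

# A hypothetical question in the spirit of the examples above
question = ForecastQuestion(
    text="Will reported COVID deaths in location Y exceed 100,000?",
    resolution_source="source X",
    resolution_date=date(2022, 12, 31),
)
```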

Then you start to work out who the best forecasters are by comparing their forecasts to what actually happens. The best forecasters are the most well-calibrated: the things they say have an 80% chance of happening do actually happen about 80% of the time. [2] You use a simple bit of maths that converts this difference into a number called a Brier score. [3]
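
As an illustration, here's that calculation in a few lines of Python, using the common binary form of the Brier score (the squared difference between the forecast probability and the 1/0 outcome, averaged over questions - see note 3 for the caveats). The forecasters and numbers are made up.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between predicted probabilities and outcomes.

    forecasts: probabilities assigned to the event happening (0.0 to 1.0)
    outcomes:  1 if the event happened, 0 if it didn't
    Lower is better: 0.0 is perfect, 0.25 is what always saying 50% earns.
    """
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Two hypothetical forecasters answering the same five questions
outcomes = [1, 0, 1, 1, 0]
alice = [0.9, 0.2, 0.8, 0.7, 0.1]   # confident and well calibrated
bob = [0.6, 0.5, 0.5, 0.6, 0.4]     # hedges everything towards 50%

print(brier_score(alice, outcomes))  # ~0.04 - the better track record
print(brier_score(bob, outcomes))    # ~0.20
```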

Once you've started to work out who the best forecasters are you can generate the best possible forecast for any future question. You do this by averaging together the forecasts of all the forecasters while giving more weight to those with the stronger track records. This is called an aggregate forecast.

For example, imagine that two forecasters predict that there will be no new lockdown in the UK in 2022 with 80% and 60% certainty respectively. If the first forecaster has a better Brier score than the second, the final aggregate forecast will be closer to 80% than 60%. If their Brier scores are identical the aggregate forecast will be 70%. If the second forecaster has a better Brier score the aggregate forecast will be closer to 60%. The greater the difference in Brier scores the more the aggregate forecast will be weighted towards that of the better forecaster.
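
Here's a sketch of that weighting in code. The weighting scheme (weights proportional to the inverse of each Brier score) is purely illustrative - real platforms use more sophisticated aggregation - and the numbers are just the lockdown example above.

```python
def aggregate_forecast(forecasts, brier_scores):
    """Weighted average of individual forecasts, giving more weight to
    forecasters with better (lower) Brier scores.

    Weighting by 1 / Brier score is illustrative only; a real system would
    use a more careful scheme (and guard against a perfect score of 0).
    """
    weights = [1.0 / b for b in brier_scores]
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, forecasts)) / total

# The lockdown example: forecasts of 80% and 60%
print(aggregate_forecast([0.8, 0.6], [0.10, 0.10]))  # identical track records -> 0.70
print(aggregate_forecast([0.8, 0.6], [0.05, 0.25]))  # first forecaster better -> ~0.77
```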

And that's it. That's superforecasting. As you add more and more forecasters and as their track records get longer and longer the aggregate forecast gets better and better and the strikingly impressive results outlined above start to emerge.

It's worth pausing here because I think it actually feels a bit disappointing to pull back the veil. Behind the word superforecasting it feels like there should be unimaginably complex processes that are too difficult for a mere mortal to understand. You might imagine hundreds of AI models churning away on supercomputers with the results being fed to a genius in the middle of a situation room that looks like something you'd find at NASA. [4]

But when you break it down there are really just two core ideas upon which superforecasting is built. The first is a tight feedback loop between forecast and result, and the second is a clever twist on collective intelligence, or the wisdom of the crowd.

The Feedback Loop

The tight feedback loop means you don't focus on what you think a good forecaster should look like and then try to find these people. And you don't try to work out what a good forecast should look like and then train people to make forecasts this way.

Instead you allow as many varied people as possible to make forecasts in whatever way they'd like and then let the results speak for themselves. You then close the loop by using these results both to identify good forecasters and to identify techniques that produce good forecasts. [5]

This obsession with results over process and credentials is important because a good forecast can often look like a bad forecast. And a good forecaster can often look like a bad forecaster. Therefore judging a forecast or forecaster by anything other than accuracy is a risky business. It's likely to bias you in favour of people who seem convincing and confident and have an impressive-sounding process over those who are actually accurate. [6]

By instead focusing only on results you increase the talent density of a pool of forecasters, both by identifying the best forecasters and by identifying and sharing the best forecasting techniques.
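
As a toy sketch of closing that loop: score everyone on the questions that have resolved, rank the pool by track record, and keep the stronger forecasters. The names, numbers and cut-off here are all hypothetical.

```python
def brier(forecasts, outcomes):
    """Mean squared error between probabilities and 1/0 outcomes; lower is better."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

def rank_forecasters(track_records, outcomes):
    """Rank forecasters from best (lowest Brier score) to worst on resolved questions."""
    scores = {name: brier(fs, outcomes) for name, fs in track_records.items()}
    return sorted(scores, key=scores.get)

# Hypothetical pool: the probabilities each person gave to five resolved questions
pool = {
    "alice": [0.9, 0.2, 0.8, 0.7, 0.1],
    "bob":   [0.6, 0.5, 0.5, 0.6, 0.4],
    "carol": [0.3, 0.8, 0.4, 0.2, 0.9],   # confidently wrong
}
outcomes = [1, 0, 1, 1, 0]

ranked = rank_forecasters(pool, outcomes)
print(ranked)       # ['alice', 'bob', 'carol']
print(ranked[:2])   # keep the stronger forecasters for the aggregate
```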

A Wiser Crowd

The second thing that's striking about superforecasting is the way it puts a clever twist on the wisdom of the crowd in producing aggregate forecasts.

The wisdom of the crowd isn't a new idea. We've known for a long time that averaging guesses or opinions from lots of people can give surprisingly accurate results. The classic example, described by Francis Galton after a 1906 country fair, is getting a whole village to guess the weight of an ox: the average of all the guesses came within about 0.5kg of the actual weight of 544kg. [7]

But not all crowds are wise. Mobs certainly aren't. And diluting the forecast of a superforecaster by averaging it with forecasts from a bunch of dart-throwing monkeys would only make it worse.

The clever twist is introducing a qualifying step - not just anyone can get into the crowd and not everyone has an equal voice. Bad forecasters are excluded and the best forecasters receive the greatest weight in the aggregate forecast. Forecasters are also asked to reason independently, before comparing their predictions, to avoid groupthink creeping in.

Together these small adjustments are enough to reduce the risk of a wise crowd losing its collective mind and instead ensure that we wring every last drop of intelligence out of the heads of those involved. [8]

Once we've removed this risk we can also benefit from increasing the size of the crowd: as the number of forecasters in the crowd increases so does the accuracy of the aggregate forecast. Every new member has a chance of having some new information about the problem that no other member has (signal), while they will also likely introduce some random error (noise) into the mix. But the beauty of the crowd is that while the noise is random, i.e. equally likely to overshoot or undershoot the truth, the signal is not. So as you add more and more people to the crowd the noise that they introduce tends to cancel out while the signal tends towards the truth. [9]
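
A quick toy simulation of that point, under the (strong) assumption that each forecaster's error is independent: as the crowd grows, the average of the noisy guesses converges on the true probability.

```python
import random

def crowd_estimate(truth, n_forecasters, noise_sd=0.15, seed=42):
    """Average of n noisy guesses around a true probability.

    Toy model: each forecaster sees the truth plus independent random noise,
    clipped to [0, 1]. Real crowds also share systematic biases, which this
    ignores (see note 9).
    """
    rng = random.Random(seed)
    guesses = [min(1.0, max(0.0, truth + rng.gauss(0, noise_sd)))
               for _ in range(n_forecasters)]
    return sum(guesses) / n_forecasters

truth = 0.7
for n in (1, 10, 100, 10_000):
    print(n, round(crowd_estimate(truth, n), 3))
# Typical output: single guesses wander, large crowds land very close to 0.7
```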

There's a lot more detail to superforecasting than these two ideas. But these details all drop out of the core feedback loop. For example we know that the best forecasters (called superforecasters) are often found in the most junior levels of organisations. We know that these superforecasters are usually accurate across lots of different subject areas - i.e. the core skill of forecasting is generally more important than subject matter expertise. And we know that anyone can be trained to become strikingly more accurate at forecasting very quickly. But the only reason we know these things is because of Tetlock's obsession with tracking what works in forecasting over a long period of time.

This is why I think most summaries of superforecasting get the emphasis wrong - rather than trying to memorise a bunch of seemingly unconnected details about what superforecasters look like and where they're found, it's much more important to focus on why we know these things in the first place and how the core ideas of superforecasting work. Once you understand those then the details follow naturally.

Unrecognised Simplicities

It's worth zooming out for a second, because at some level all of this feels a bit too good to be true. At its core so much of superforecasting does just boil down to 'Make specific forecasts and test them, use the feedback to get better', and that still seems a bit underwhelming, especially compared to the incredible results we get from it.

But this tension is actually very exciting as it points at a much bigger idea: people consistently underestimate how simple powerful ideas can be.

There isn’t one novel thought in all of how Berkshire [Hathaway] is run. It’s all about … exploiting unrecognized simplicities - Charlie Munger

Some of the most incredible parts of human progress have very simple ideas at their core. From the scientific method (make specific predictions and then test them) we eventually got space stations and vaccines. From Colonel Boyd's OODA loop we got high performance teams and far more effective military action. And from superforecasting we get retired schoolteachers from Idaho outperforming the CIA. There's just one simple loop at the centre of each of these ideas and all that matters is how fast you can hurtle around it, spinning off these incredible results. [10]

But, unlike Charlie Munger, we consistently underestimate the power of these simple ideas because we don't recognise how unclear and muddled most thinking is.

That's why we look for the extra special sauce when trying to work out what makes science and superforecasting so effective, and find it so hard to believe that one simple idea can generate such incredible results.

I always found the phrase 'evidence-based medicine' amusing. It's so simple and obvious that you want to ask what the opposite is. Bullshit-based medicine? Conjecture-based medicine? Flip-a-coin-and-see based medicine?

Superforecasting is similar. The opposite of superforecasting is making imprecise forecasts without any idea of how accurate you are and not checking if you were right or not.

And that's the average standard of forecasting in politics, punditry and most of life! It's less that superforecasting is super, and more that most forecasting is awful, that explains why superforecasting is so impressive. And that's a large part of why it can be explained by such a simple idea: the average standard of forecasting is so bad that the results of one clear feedback loop are spectacular by comparison. [11]

What's truly exciting is that most of us don't realise this. We might have a slight sense of unease that maybe our predictions aren't all that accurate but we don't get anywhere near understanding how bad we truly are.

It's only once we're exposed to the gold standard of superforecasting that it throws the poor current standard of forecasting into sharp relief. And then we suddenly wonder why we never noticed it before.

While it might feel depressing to realise you've been walking around with such a huge blind spot for so long it's actually incredibly exciting because it implies the existence of other blind spots. And if one of these is large enough to revolutionise the whole field of forecasting then what might the others contain?

Which leaves us with one final intriguing question:

What other incredible low hanging fruit is sitting in plain view that most people are missing and that the next Tetlocks will soon discover?

And hints at least one way to find them:

Start poking around at ideas where you feel a slight sense of unease that something isn't right but no one seems to acknowledge it. See where it gets you.

Perhaps unsurprisingly these aren't new ideas - they're part of the core thinking of two of the most successful startup founders and investors of all time. [12]

Notes

1 - There's a parallel to the situation today where I think most people would be more willing to accept 'AI/ML/Big Data' as a solution to a forecasting problem over superforecasting, even when the problem really isn't well suited to these methods.

2 - Note that there's a bit of nuance here - being a good forecaster doesn't mean being very certain about everything, it means being very certain about things that you should be very certain about and very uncertain about things you should be very uncertain about.

3 - Strictly some of what I say here about Brier scores is a simplification. There are some caveats about how exactly Brier scores and other metrics are used in different contexts in forecasting but that's a rabbit hole for another day.

4 - There is definitely a place for using AI/ML in superforecasting. Generally it takes the place of either (A) a source of information a forecaster can consider when making their own forecast or (B) a forecast in its own right that is added into the aggregation or (C) both. In forecasting problems that are full of machine-legible data and are a good fit for this kind of technique, this can be a very useful boost to forecasting accuracy.

5 - If you're noticing a similarity between superforecasting and science then I agree with you. At its simplest, superforecasting is the application of the scientific method to the field of forecasting.

6 - This is particularly important because the best techniques for forecasting actually sound quite counter-intuitive and the best forecasters often look unconvincing and unimpressive.

7 - The broad idea of collective intelligence goes back far further, at least to the 4th century BC when Aristotle wrote this: "it is possible that the many, though not individually good men, yet when they come together may be better, not individually but collectively, than those who are so, just as public dinners to which many contribute are better than those supplied at one man's cost"

8 - Note that these adjustments also rely on the feedback loop of forecasting. You need a tight feedback loop between forecast and result to tell who's good in order to (A) decide whether to let them into the crowd and (B) decide how loud their voice should be.

9 - This is a simplification and doesn't cover systemic biases that mean all people are likely to be wrong in the same direction (e.g. things like the planning fallacy where we consistently underestimate how long something will take). But it is still a useful way to make the wisdom of the crowd intuitive.

10 - Again, the distinction between superforecasting and the scientific method feels arbitrary because it is. Superforecasting is what happens when you apply the scientific method to forecasting. The OODA loop is not dissimilar.

11 - I wonder if this is what it feels like to be thinking about a field when science crashes into it for the first time? Maybe doctors throughout the 20th century would have felt the same. Most medicine was awful before evidence-based medicine started to appear, simply because it involved doing various things to patients and not rigorously checking whether this was making them better or worse. In other words, there was no feedback loop.

12 - Peter Thiel is the more explicit of the two in his thinking. He's built a large part of his success around the search for what he calls 'secrets', areas where the prevailing wisdom is wrong, around which successful businesses can be built. Nowhere is this more clearly summarised than in his now famous interview question: "What important truth do very few people agree with you on?" In his words "This question sounds easy because it’s straightforward. Actually, it’s very hard to answer. It’s intellectually difficult because the knowledge that everyone is taught in school is by definition agreed upon. And it’s psychologically difficult because anyone trying to answer must say something she knows to be unpopular. Brilliant thinking is rare, but courage is in even shorter supply than genius."

If you're interested in this idea then you might enjoy the rest of Zero to One, where he develops his thinking further. Chapter 8 is particularly interesting, where he also describes why he thinks most people believe the world is full of solved problems, with no secrets left to discover - an idea he calls 'flatness'. For a shorter intro to the idea try this article by David Perell.

I think similar ideas run through Paul Graham's writing, although he's less explicit about them than Thiel. He writes particularly well about ideas that are overlooked in many fields, not just business, because they're in the shadow of heresies, or in areas where you're not meant to think that hard. Try this essay, and this one, particularly the section beginning 'Great work tends to grow out of ideas that others have overlooked'.

There are, of course, many other investors who share the same ideas but Thiel and Graham were some of the earliest and clearest in presenting them to a wide audience, not to mention some of the most successful in using them.

Thanks to Ruairidh Forgan, Jamie Strachan, Jules von Nehammer, Frank Hawes and Lydia Field for comments.

If you're interested in reading more essays like this I post new writing to my mailing list here