Michael Mauboussin uses a very simple exercise to illustrate how difficult it is to statistically distinguish skill from luck. Investors are often deemed geniuses or fools based on small samples. Think of those hedge fund legends who got 2008 ‘right’, raised vast amounts of assets, and have subsequently got virtually everything wrong. His example asks what the results of a short series of coin tosses would look like. If we relied on small samples, the coin would initially behave as if it had a tendency to produce either heads or tails. Only after a large number of tosses would it be clear that the probability of heads or tails with every toss was 0.5.
Mauboussin is illustrating the law of large numbers, discussed with relentless insight by Paul Samuelson.
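The coin-toss exercise is easy to reproduce. What follows is a minimal sketch, not Mauboussin's own procedure: the function name `running_heads_share`, the seed and the sample sizes are all illustrative assumptions. It tracks the share of heads after each toss of a simulated fair coin, so the early, misleading readings can be compared with the later ones.

```python
import random

def running_heads_share(n_tosses, seed=0):
    """Share of heads observed after each toss of a simulated fair coin."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    heads = 0
    shares = []
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1, False as 0
        shares.append(heads / toss)
    return shares

shares = running_heads_share(100_000)

# Compare the observed share of heads at a few illustrative sample sizes.
for n in (10, 100, 1_000, 100_000):
    print(f"after {n:>7} tosses: share of heads = {shares[n - 1]:.4f}")
```

After the very first toss the share is necessarily 0 or 1, which is the small-sample problem in its purest form.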
When a colleague of mine presented Mauboussin’s observations, I was struck by a different interpretation: the inefficiency and misleading use of frequency-based statistics in finance. Without any statistical ‘evidence’ I am willing to take a coin out of my pocket – or yours – and bet that the probability of heads or tails is 0.5. And I don’t think anyone would take me on and bet otherwise. My view has been formed with a sample of none.
For certain problems, relying on statistical frequencies is highly inefficient and misleading. Worse still, the law of large numbers is often misused to analytical detriment. Applying this law to coin tosses produces the correct result because each observation is independent (the probability of any coin toss is unaffected by a prior toss) and the probability of heads or tails is the same with every toss. Accumulating a large series of coin tosses should replicate the true distribution. But the same conditions do not hold in most of the things we care about in finance. Using monthly rather than 5-yearly frequencies in order to increase the sample size is useless unless the portfolio materially changes every month. A manager could have a single bet on for the entire sample period. Certain properties of that portfolio may be apparent, but ‘skill’ will be empirically elusive.
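One narrow way to see why slicing the same period more finely adds little is sketched below. It rests on assumptions that are not in the argument above: returns are measured as log returns, and the monthly figures are simulated with hypothetical parameters (0.5% mean and 4% volatility per month). Under those assumptions, sixty monthly observations and a single five-year observation give the same estimate of the average annual return, because the monthly figures simply add up to the five-year figure. It does not capture the broader point about how many independent decisions a manager has actually made.

```python
import random

def simulate_monthly_log_returns(n_months, seed=1):
    """Simulated monthly log returns; the mean and volatility are hypothetical."""
    rng = random.Random(seed)
    return [rng.gauss(0.005, 0.04) for _ in range(n_months)]

def annualised_mean(log_returns, periods_per_year):
    """Average log return per period, scaled to a yearly figure."""
    return sum(log_returns) / len(log_returns) * periods_per_year

monthly = simulate_monthly_log_returns(60)

# Log returns add over time, so the whole five years collapse to one observation.
five_year = [sum(monthly)]

from_monthly = annualised_mean(monthly, 12)         # 60 observations
from_five_year = annualised_mean(five_year, 1 / 5)  # 1 observation

print(f"annualised mean from 60 monthly observations: {from_monthly:.6f}")
print(f"annualised mean from 1 five-year observation: {from_five_year:.6f}")
```

The two printed figures agree up to floating-point rounding. The larger sample says more about month-to-month variability, but nothing more about the average return over the period.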
Even though frequentist statistical methods are valid when applied to coin tosses, they are still inefficient. There are competing approaches. One could weigh the coin, measure its edge, carry out a series of scientific analyses to empirically confirm consistency with the properties necessary to ensure fairness, perhaps by checking if it is physically identical to a known fair coin. But the most efficient approach is purely deductive. Did anyone know we were going to be testing coins for fairness when I took the coin out of my pocket? This is highly unlikely if I had no prior knowledge that we would be discussing Mauboussin’s example. If I have selected the coin from my own pocket and am willing to bet on it being fair, I have no incentive to cheat. Relatively straightforward deduction is more efficient and accurate than a statistical series of repeated coin tosses.
Now these are relatively trivial examples of a recurring problem in statistical finance. I think it applies to beliefs about mean-reverting PE ratios (false), style effects (‘the only style at Berkshire is “Smart”’), and Value-at-Risk (VAR) analysis. VAR is the prevailing approach to risk analysis in financial markets, which has not just survived the financial crisis, but thrived. For the uninitiated, VAR analysis is a statistical attempt to estimate the probability of a given percentage loss over a defined period, such as a day or a month. Most VAR models use daily frequencies and estimate probabilities for daily and monthly losses. Now virtually no investor, other than day traders, or the dying breed of bank prop trading desks and specialist hedge funds, would claim to have a one-day investment horizon. So why do they use VAR models with daily frequencies? It is a misapplication of the law of large numbers. Daily frequencies increase sample size, but volatility clusters, so daily observations are not independent. Nor are the correlations, which are assumed on the basis of 6 or 12 months of data, stable.
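The clustering problem can be illustrated with a rough sketch. Everything in it is an assumption made for illustration: the daily returns are simulated, volatility simply alternates between calm and turbulent 50-day blocks (a crude stand-in for real clustering), and `historical_var` is a bare empirical quantile rather than any particular vendor's VAR model. The sketch compares the lag-1 autocorrelation of squared returns with that of the same values shuffled, and shows how far a short-window VAR estimate depends on which window it happens to use.

```python
import random

def simulate_clustered_returns(n_days, block=50, calm_vol=0.005,
                               wild_vol=0.02, seed=7):
    """Daily returns whose volatility alternates between calm and turbulent blocks.

    All parameters are hypothetical and chosen only to make clustering visible.
    """
    rng = random.Random(seed)
    returns = []
    for day in range(n_days):
        turbulent = (day // block) % 2 == 1
        returns.append(rng.gauss(0.0, wild_vol if turbulent else calm_vol))
    return returns

def historical_var(returns, level=0.95):
    """Approximate empirical loss quantile: the loss near the `level` rank."""
    losses = sorted(-r for r in returns)
    index = min(len(losses) - 1, int(level * len(losses)))
    return losses[index]

def lag1_autocorrelation(values):
    """Sample correlation between each value and the one that follows it."""
    n = len(values)
    mean = sum(values) / n
    variance_sum = sum((v - mean) ** 2 for v in values)
    covariance_sum = sum((values[i] - mean) * (values[i + 1] - mean)
                         for i in range(n - 1))
    return covariance_sum / variance_sum

returns = simulate_clustered_returns(10_000)
squared = [r * r for r in returns]

# Shuffling keeps the same values but destroys their ordering in time.
shuffled = squared[:]
random.Random(11).shuffle(shuffled)

clustered_acf = lag1_autocorrelation(squared)
shuffled_acf = lag1_autocorrelation(shuffled)

# The same VAR calculation on two adjacent 50-day windows.
var_calm = historical_var(returns[:50])
var_turbulent = historical_var(returns[50:100])

print(f"lag-1 autocorrelation of squared returns: {clustered_acf:.3f}")
print(f"same values, shuffled:                    {shuffled_acf:.3f}")
print(f"95% one-day VAR, calm window:             {var_calm:.4f}")
print(f"95% one-day VAR, turbulent window:        {var_turbulent:.4f}")
```

If daily observations were independent, the ordered and shuffled autocorrelations would be similar. In this simulation they are not, and the estimated VAR depends heavily on the window chosen.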
The question, ‘how much can you lose investing in the S&P500 over one month or one day?’ can be addressed very pragmatically. As can more important questions, such as ‘what is likely to determine sustained losses?’ Show me a time series which answers that.
Does any of this matter? Straightforward deductive reasoning has been neutered in an industry seduced by data and blinded by frequentist induction. How else can pension funds be making record purchases of gilts, when the entire universe of UK government bonds has negative real yields? Do we really need a long time series of historic returns and correlations to know how that will end?
Statisticians will be aware of the distinction between frequentism and Bayesianism. That’s not my point. Bayesian reasoning is often presented as a procedure which is particularly relevant to situations where there is a shortage of observations. A prior probability based on ‘reasonable expectation’ derived from other objective or subjective sources, can be adjusted in response to new data. I am arguing that frequency-based judgements are inefficient and misleading in certain contexts, and the true probability distribution, or a more accurate estimation, may be readily identifiable by other methods. For those of a deeper philosophical bent, Quine might reasonably dispute the entire basis of the distinction – but that’s another story.
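For readers unfamiliar with the Bayesian procedure mentioned above, a minimal sketch follows. It assumes a Beta prior on the probability of heads, which is a standard textbook choice rather than anything specified here, and the prior weights and toss counts are hypothetical. A strong prior centred on 0.5 barely moves after seven heads in ten tosses, whereas the raw frequency reads 0.7.

```python
def posterior_mean(prior_heads, prior_tails, heads, tails):
    """Posterior mean of P(heads) under a Beta(prior_heads, prior_tails) prior.

    With a Beta prior and coin-toss data, the posterior is
    Beta(prior_heads + heads, prior_tails + tails).
    """
    return (prior_heads + heads) / (prior_heads + prior_tails + heads + tails)

heads, tails = 7, 3  # hypothetical small sample

raw_frequency = heads / (heads + tails)
strong_prior = posterior_mean(500, 500, heads, tails)  # confident the coin is fair
flat_prior = posterior_mean(1, 1, heads, tails)        # no view either way

print(f"raw frequency:           {raw_frequency:.4f}")
print(f"strong prior around 0.5: {strong_prior:.4f}")
print(f"flat prior:              {flat_prior:.4f}")
```

The strong prior plays the role of the ‘reasonable expectation’ about a coin taken from a pocket. Where that expectation comes from is a separate question, and the sketch takes no position on it.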