### From the fat cats to long tails: when all is not normal

Like many people, I found James Joyce’s masterpiece, Ulysses, tough going. So I am full of admiration for George Zipf, who sat down 50 years ago to count Joyce’s words. The main reason for admiration, however, is not his accomplishment of a tedious task. What the professor discovered changes the way we think about business and finance, and illuminates both the commercial strategy of Amazon and the way banks messed up their risk models.

The most common word in the English language is “the”. Zipf’s thesis was that this word appears twice as often as the second most common word, “of”; and 10 times as often as the 10th most common word, “was”, and so on. We know now that Zipf was misled by the exceptional diversity of Joyce’s vocabulary, but the basic idea was right.
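Zipf's rule of thumb is easy to put into numbers. The sketch below assumes the simplest form of the law, in which a word's frequency is the top word's frequency divided by its rank. The function name, the `exponent` parameter and the count of 10,000 for "the" are illustrative assumptions, not figures from Zipf's study.

```python
def zipf_frequency(rank, top_count, exponent=1.0):
    """Expected count of the word at a given rank under a simple Zipf law.

    exponent=1.0 gives the textbook form: count = top_count / rank.
    """
    return top_count / rank ** exponent


# Hypothetical count for the most common word, "the".
top_count = 10_000

for rank in (1, 2, 10):
    print(rank, zipf_frequency(rank, top_count))
```

With an exponent of 1, the second-ranked word gets half the count of the first and the tenth-ranked word gets a tenth. A different author, or a different language, would be described by a different exponent.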

Those who are not very interested in Zipf’s, or any, critical study of Ulysses may prick up their ears when they learn that his analysis helps determine how many items a retailer should stock, or what investors should expect when they buy derivative securities. His theory of the distribution of word use was an early example of what we call a power law. The language of other authors also follows a power law, but a different one. So does the incidence of earthquakes.

More than two centuries ago, mathematicians discovered that a small group of statistical distributions – the normal, or bell, curve is the most famous – had very wide application. These distributions are now used almost everywhere risk is quantified in business and finance – in quality and inventory control, in building investment portfolios, in calculating value at risk. Some people talk about “six sigma management” without realising that “sigma” is a parameter of the normal distribution.

But – and this is where Zipf’s contribution was important – we now know that this group of distributions is drawn from a larger family that includes power laws. And power law distributions, as Zipf showed, have different properties from those that modellers have become used to.

In classical statistics, the extremes generally do not matter very much to the aggregate outcome. In many power law distributions, they do. Benoit Mandelbrot, the mathematical scientist, argues that security price movements follow a power law rather than the bell curve. That explains why it is so damaging to performance to be out of the market even for a few days, if they happen to be the wrong days. And, more relevant in current circumstances, why it can be so damaging to be in the market when things fall apart. “Fat tails” proved the downfall of fat cats. Extreme events, especially extreme adverse events, happen much more often than they are supposed to in the world of classical statistics.
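A rough way to see why extremes matter is to compare how quickly the two kinds of tail thin out. The sketch below sets a standard normal tail beside a Pareto (power law) tail. The tail exponent of 3 and the minimum value of 1 are assumptions chosen purely for illustration, and the two distributions are not calibrated to the same scale, so only the rate of decay is comparable, not the absolute probabilities.

```python
import math


def normal_tail(k):
    """P(Z > k) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def pareto_tail(x, alpha=3.0, x_min=1.0):
    """P(X > x) for a Pareto (power law) variable; alpha and x_min are illustrative."""
    if x < x_min:
        return 1.0
    return (x_min / x) ** alpha


# How much rarer does an event become when the threshold doubles from 5 to 10?
normal_ratio = normal_tail(5.0) / normal_tail(10.0)
pareto_ratio = pareto_tail(5.0) / pareto_tail(10.0)

print(f"normal tail shrinks by a factor of {normal_ratio:.3g}")
print(f"power law tail shrinks by a factor of {pareto_ratio:.3g}")
```

Under these assumptions, doubling the threshold makes the power law event only eight times rarer, while the bell curve event becomes rarer by many orders of magnitude. A model built on the bell curve will therefore treat as near-impossible a move that a power law regards as merely unusual.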

Fat tails are rarely long tails, and vice versa. But power laws are relevant to both. If book sales are governed by a power law, then if 10 American books sell more than 1m copies in a year, and 400 sell more than 100,000, then about 16,000 titles will sell more than 10,000 copies. As with Zipf’s conjecture that the power law could be applied to Ulysses, this seems to be roughly true.

But in practice, the power law runs out in the “long tail”. The rule would predict there would be 640,000 books selling more than 1,000 copies. There are not, and for an obvious reason. Most titles that might sell 100,000 copies get published but most titles that would only sell 1,000 do not. Chris Anderson’s acclaimed book, The Long Tail, explores the hypothesis that technological change in distribution has lowered the threshold at which publication becomes viable. This dynamic makes previously unsustainable strategies feasible.
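The book sales arithmetic can be reproduced by fitting a power law through the two stated points (10 titles above 1m copies, 400 above 100,000) and extrapolating. This is a minimal sketch assuming the number of titles above a sales level s takes the form c × s^(−alpha); the function names are hypothetical, and the fit ignores the truncation that the text goes on to describe.

```python
import math


def fit_power_law(s1, n1, s2, n2):
    """Fit N(>s) = c * s**(-alpha) through two (sales level, title count) points."""
    alpha = math.log(n2 / n1) / math.log(s1 / s2)
    c = n1 * s1 ** alpha
    return alpha, c


def titles_above(sales, alpha, c):
    """Number of titles predicted to sell more than `sales` copies."""
    return c * sales ** (-alpha)


# The two data points given in the text.
alpha, c = fit_power_law(1_000_000, 10, 100_000, 400)

for level in (1_000_000, 100_000, 10_000, 1_000):
    print(f"more than {level:>9,} copies: about {titles_above(level, alpha, c):,.0f} titles")
```

Each tenfold drop in the sales threshold multiplies the number of titles by 40, which is where the figures of 16,000 and 640,000 come from. The last of these is the prediction that fails in practice, because the real distribution is cut off before it gets there.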

Many distributions – such as book sales – are truncated. The long tail is docked. Companies that would have only a few thousand pounds of sales do not continue to exist; people who would have incomes below a certain level are supported by social benefits. To choose appropriate models you need to understand both the maths and the business environment. Media industries and financial institutions have both been unsuccessful in marrying these two skills.