Sex, lies and pitfalls of overblown statistics

August 24, 2011

A visit to the International Festival of Statistics in Dublin (yes, really) prompted me to offer advice to young scholars on the interpretation and use of economic data.

Always ask yourself the question: “where do those data come from?”. “Long distance rail travel in Britain is expected to increase by 96 per cent by 2043.” Note how the passive voice “is expected” avoids personal responsibility for this statement. Who expects this? And what is the basis of their expectation? For all I know, we might be using flying platforms in 2043, or be stranded at home by oil shortages: where did the authors of the prediction acquire their insight?

“On average, men think about sex every seven seconds.” How did the researchers find this out? Did they ask men how often they thought about sex, or when they last thought about sex (3½ seconds ago, on average)? Did they give their subjects a buzzer to press every time they thought about sex? How did they confirm the validity of the responses? Is it possible that someone just made this statement up, and that it has been repeated frequently and without attribution ever since? Many of the numbers I hear at business conferences have that provenance.

In more intellectual environments, the figures presented may be the product of serious analysis and calculation. Always ask of such data “what is the question to which this number is the answer?”. “Earnings before interest, tax, depreciation and amortisation on a like-for-like basis before allowance for exceptional restructuring costs” is the answer to the question “what is the highest profit number we can present without attracting flat disbelief?”.

Beware explanations that are tautological: “gross domestic product is a measure of the income of the nation”, “movements of the consumer prices index reflect changes in the cost of a basket of commodities compiled by the Office for National Statistics”. Always probe descriptions – “GDP is not a measure of output, or of welfare” – that define what a statistic is not, rather than what it is. “These figures are not forecasts, and should not be relied on by prospective investors.” If they are not forecasts, then what are they, and if they are not to be relied on by prospective investors what purpose was intended in distributing the information to them?

Be careful of data defined by reference to other documents that you are expected not to have read. “These accounts have been compiled in accordance with generally accepted accounting principles”, or “these estimates are prepared in line with guidance given by HM Treasury and the Department for Transport”. Such statements are intended to give a false impression of authoritative endorsement. A data set compiled by a national statistics organisation or a respected international institution such as the Organisation for Economic Co-operation and Development or Eurostat will have been compiled conscientiously. That does not, however, imply that the numbers mean what the person using them thinks or asserts they mean.

When the data seem to point to an unexpected finding, always consider the possibility that the problem is a feature of the data, rather than a feature of the world. I recently saw a study of comparative productivity in financial services in which Italy came top and Britain and the US bottom. You might have thought alarm bells would ring, but no: the authors went on to comment that this divergence was serious because of the size of the financial services sectors of Britain and the US. A little thought might have directed the researchers’ attention to questions such as “what is meant by output of financial services?”.

But it is now easy to import data into a computer program without thought. The unwarranted precision of the projected growth in rail traffic – a 96 per cent increase, rather than a doubling – is a clue that the number was generated by a computer, not a skilled interpreter of evidence.

Statistics are only as valid as the sources from which they are drawn and the abilities of those who use them. When I discover something surprising in data, the most common explanation is that I made a mistake.