How big data lets us see a little further into the unknown


It is Monday, and it is raining again in the south of France. But it was sunny yesterday. And it was also dry last Wednesday, although it then rained almost continuously from Thursday through to Saturday.

A small consolation is that these intervals of storm and sun have been accurately predicted several days in advance. That is why I am writing this column today rather than yesterday. The quality of weather forecasting has improved considerably.

The BBC has re-released the worst weather forecast in its history. In 1987 Michael Fish went on television to reassure viewers that rumours of an imminent hurricane were unfounded. A few hours later the most severe winds in decades lifted roofs and felled trees all over Britain.

But such a blunder is much less likely now. Short-term weather forecasting is one of the triumphs, perhaps the greatest triumph, of big data – the opportunity supercomputers provide to process data sets of unbelievable size and complexity. I understand that the latest machines can handle an exabyte of data, which is about 20m times the capacity of my Apple Mac. The British Meteorological Office claims that its three-day forecasts today are as accurate as its one-day forecasts were in the heyday of Mr Fish (which is perhaps not the most reassuring way of describing their improved performance).

It is still true, however, that accuracy declines rapidly as you look further ahead. There is a clear contrast between the ability of weather forecasters to give us a reasonably accurate description of today and tomorrow and their continued inability to make good longer-term forecasts. The exceptional weather conditions of this winter were not anticipated.

Short-term weather forecasting is possible because most of the factors that determine tomorrow’s weather are, in a sense, already there. If you turn to the YouTube video of Mr Fish’s disastrous message, you can see on his charts the area of extreme low pressure that delivered that 1987 hurricane. The forecasters simply made a mistake in analysing the available data, and greater analytic capacity can make that mistake less likely. But when you look further ahead you encounter the intractable problem that, in non-linear systems, small changes in initial conditions can lead to cumulatively larger and larger changes in outcomes over time. In these circumstances imperfect knowledge may be no more useful than no knowledge at all.
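
The point about non-linear systems can be illustrated with a toy calculation. The sketch below is not a weather model: it uses the logistic map, a standard textbook example of chaotic behaviour, as a stand-in, and the starting value, the size of the initial error and the function name are all assumptions chosen for illustration. Two runs begin almost identically, and the gap between them is tracked step by step.

```python
def logistic_gaps(x0, delta, steps, r=4.0):
    """Track the gap between two runs of the logistic map x -> r*x*(1-x).

    x0    : starting value of the first run (illustrative assumption)
    delta : tiny error added to the second run's starting value
    steps : number of iterations
    r     : map parameter; r = 4.0 puts the map in its chaotic regime
    Returns a list of absolute gaps, one per step, including step 0.
    """
    a, b = x0, x0 + delta
    gaps = [abs(a - b)]
    for _ in range(steps):
        a = r * a * (1.0 - a)
        b = r * b * (1.0 - b)
        gaps.append(abs(a - b))
    return gaps


if __name__ == "__main__":
    # Two "forecasts" whose starting measurements differ by one part in 10^10.
    gaps = logistic_gaps(x0=0.2, delta=1e-10, steps=100)
    for step in (0, 5, 10, 20, 30, 40, 60, 100):
        print(f"step {step:3d}: gap = {gaps[step]:.3e}")
```

Over the first few steps the two runs are practically indistinguishable, which is the regime in which tomorrow's forecast works. Within a few dozen steps the gap is typically as large as the quantity being forecast, and the nearly perfect starting measurement no longer tells you anything.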

Much the same is true in economics and business. What gross domestic product will be tomorrow is, like tomorrow’s rain or the 1987 hurricane, more or less already there: tomorrow’s output is already in production, tomorrow’s sales are already on the shelves, tomorrow’s business appointments already made. Big data will help us analyse this. We will know more accurately and more quickly what GDP is, we will be more successful in predicting output in the next quarter, and our estimates will be subject to fewer revisions.

Hedge fund managers will be able to achieve the privately profitable but socially useless goal of predicting accurately what the Office for National Statistics will announce before the ONS itself knows. Big data will give them access to information as comprehensive as the Monetary Policy Committee has when it fixes interest rates. But big data will not help them to know what the Monetary Policy Committee will decide. Or how US Treasury secretary Hank Paulson and Lehman Brothers’ chief Dick Fuld will react to the imminent prospect of the US bank’s collapse.

Big data can help us understand the past and the present but it can help us understand the future only to the extent that the future is, in some relevant way, contained in the present. That requires a constancy of underlying structure that is true of some physical processes but can never be true of a world that contains Hitler and Napoleon, Henry Ford and Steve Jobs; a world in which important decisions or discoveries are made by processes that are inherently unpredictable and not susceptible to quantitative description.

Nor can it be true of a world in which a battle can be lost for want of a nail – a world in which minor differences in problem specification can have discontinuous effects on the results which emerge. But, thanks to big data, I know that it will be sunny again tomorrow.
