
How big data lets us see a little further into the unknown

It is Monday, and it is raining again in the south of France.  But it was sunny yesterday.  And it was also dry last Wednesday, although it then rained almost continuously from Thursday through Saturday.

A small consolation is that these intervals of storm and sun have been accurately predicted several days in advance.  That is why I am writing this column today rather than yesterday.  The quality of weather forecasting has improved considerably.  

The BBC has recently re-released the worst weather forecast in its history. Michael Fish went on television in 1987 to reassure viewers that rumours of an imminent hurricane were unfounded.  A few hours later the most severe winds in decades lifted roofs and felled trees all over Britain.

But such a blunder is much less likely now.  Short-term weather forecasting is one of the triumphs – perhaps the greatest triumph – of ‘big data’: the capacity of modern supercomputers to process data sets of unprecedented size and complexity.  I understand that the latest machines can handle an exabyte of data, which is about 20 billion times the capacity of my Apple Mac.  The British Meteorological Office claims that its three-day forecasts today are as accurate as its one-day forecasts were in the heyday of Michael Fish (which is perhaps not the most reassuring way of describing its improved performance).

But it is still true that forecasting accuracy declines rapidly as you look further ahead.  There is a clear contrast between the ability of weather forecasters to give us a reasonably accurate description of today and tomorrow and their continued inability to make good longer-term forecasts.  The exceptional weather conditions of this winter were not anticipated.

Short-term weather forecasting is possible because most of the factors that determine tomorrow’s weather are, in a sense, already there.  If you watch the YouTube video of Michael Fish’s disastrous broadcast, you can see on his charts the area of extreme low pressure that delivered the 1987 hurricane.  The forecasters simply made a mistake in analysing the available data, and more detailed information about the impending storm, together with greater analytic capacity, makes such a mistake less likely.  But when you look further ahead, you encounter the intractable problem that in non-linear systems, small changes in initial conditions can lead to cumulatively larger changes in outcomes over time.  In these circumstances imperfect knowledge may be no more useful than no knowledge at all.

Much the same is true in economics and business.  What GDP will be tomorrow is, like tomorrow’s rain or the 1987 hurricane, more or less already there: tomorrow’s output is already in production, tomorrow’s sales are already on the shelves, tomorrow’s business appointments are already made.  Big data will help us analyse this.  We will know more accurately and more quickly what GDP is, we will be more successful in predicting output in the next quarter, and our estimates will be subject to fewer revisions.

Hedge fund managers will be able to achieve the privately profitable but socially useless goal of predicting accurately what the Office for National Statistics will announce before the ONS itself knows.  Big data will give them access to information as comprehensive as that available to the Monetary Policy Committee when it sets interest rates.  But big data will not help them to know what the Monetary Policy Committee will decide.  Or how Dick Fuld and Hank Paulson will react to the imminent prospect of Lehman’s bankruptcy.

Big data can help us understand the past and the present, but it can help us understand the future only to the extent that the future is, in some relevant way, contained in the present.  That requires a stable underlying structure, which holds for some physical processes but can never hold for a world that contains Hitler and Napoleon, Henry Ford and Steve Jobs – a world in which important decisions or discoveries are made by processes that are inherently unpredictable and not susceptible to quantitative description.  Nor can it hold for a world in which a battle can be lost for want of a nail – a world in which minor differences in problem specification can have discontinuous effects on the results that emerge.  But, thanks to big data, I know that it will be sunny again tomorrow.