The use of algorithms to award examination grades has led to chaos. Readers of the chapter on Probability and the Law in Radical Uncertainty, the recent book by Mervyn King and myself, would have understood why. Justice and statistics do not mix easily – either in the courts or at the examination board.

The standard of criminal proof is demanding because the injustice from punishing the innocent is perceived to be greater than the injustice of freeing the guilty. And the injustice of denying someone a deserved university place is more serious than the injustice of admitting someone who should not really be there. We might all agree that in judging guilt it is more important to minimise the statistician’s type II error – accepting the hypothesis of guilt when it is false – even at the expense of the type I error – rejecting that hypothesis when it is true. Medical testing similarly distinguishes the sensitivity (avoidance of false negatives) and specificity (avoidance of false positives) of its procedures, and attempts to measure them.
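To make the two medical measures concrete, here is a minimal sketch in Python. The counts are invented purely for illustration and the names (`TestCounts`, `sensitivity`, `specificity`) are hypothetical; nothing here describes any real test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestCounts:
    """Outcomes of a test checked against the known truth (hypothetical data)."""
    true_pos: int   # condition present, test says present
    false_neg: int  # condition present, test says absent
    true_neg: int   # condition absent, test says absent
    false_pos: int  # condition absent, test says present


def sensitivity(c: TestCounts) -> float:
    # Share of real cases the test catches: high when false negatives are rare.
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: TestCounts) -> float:
    # Share of non-cases the test clears: high when false positives are rare.
    return c.true_neg / (c.true_neg + c.false_pos)


# Invented figures: 100 people with the condition, 1,000 without.
counts = TestCounts(true_pos=90, false_neg=10, true_neg=950, false_pos=50)
print(f"sensitivity = {sensitivity(counts):.2f}")
print(f"specificity = {specificity(counts):.2f}")
```

If a guilty verdict is treated as the 'positive' result, the false positive is the conviction of an innocent person, and a demanding standard of proof amounts to insisting on very high specificity even at some cost in sensitivity.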

The injustice of convicting the innocent is greater, but how much greater? Some lawyers talk of ‘Blackstone’s ratio’; the great jurist wrote ‘better that ten guilty persons escape than that one innocent person suffer’. But why should Blackstone’s ratio be ten to one? Rather than five to one as proposed by another great jurist, Sir Matthew Hale, or one hundred to one, as favoured by Benjamin Franklin? The French scholar Condorcet concluded that the probability of an innocent man being convicted should be exactly one in 144,768 and he believed that this ratio could be achieved if guilt were determined by a panel of thirty judges of whom twenty-three must vote in favour. (Condorcet himself committed suicide to escape the less considered but hardly less arbitrary justice of the revolutionary guillotine.)

The ‘rodeo problem’ set by Oxford philosopher Jonathan Cohen in 1977, and the similar ‘blue bus’ problem described by the American legal scholar Laurence Tribe, illustrate why courts can never base decisions on algorithms alone.[1] There are 1,000 seats at the rodeo and 499 tickets are sold. But there is a hole in the fence and the arena is full. Most attendees did not buy a ticket, so the rodeo organiser sues each of the 1,000 attendees and wins every case on the balance of probabilities.
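The arithmetic behind the organiser's claim can be set out in a few lines. This is only a sketch: the 0.5 threshold is one common reading of 'balance of probabilities', assumed here for illustration, and the variable names are hypothetical.

```python
SEATS = 1000          # the arena is full
TICKETS_SOLD = 499    # figure given in the rodeo problem
THRESHOLD = 0.5       # assumed reading of "more likely than not"

# Everyone in the arena without a ticket came through the hole in the fence.
gatecrashers = SEATS - TICKETS_SOLD

# With no other evidence, each attendee is a random draw from the crowd.
p_gatecrasher = gatecrashers / SEATS

# The same probability applies to every defendant, so every suit goes the same way.
liable_on_statistics = p_gatecrasher > THRESHOLD

# If all 1,000 suits succeed, every genuine ticket holder loses one.
wrongly_liable = TICKETS_SOLD if liable_on_statistics else 0

print(f"P(gatecrasher) = {p_gatecrasher:.3f}")
print(f"Organiser wins each suit: {liable_on_statistics}")
print(f"Paying customers held liable: {wrongly_liable}")
```

On these numbers the same statistical evidence succeeds against every attendee, including the 499 who paid.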

But no court would make such a ruling, and we imagine few people believe that it should – though legal scholars have debated the issue for forty years.[2] As Cohen observed, ‘the advancement of truth in the long run is not necessarily the same thing as the dispensation of justice in each individual case. It bears hard on an individual like the non-gatecrasher at the rodeo if he has to lose his own particular suit in order to maintain a stochastic probability of success for the system as a whole. So if the system exists for the benefit of individual citizens, and not vice versa, the … argument fails’.[3] Tribe makes a similar point: ‘tolerating a system in which perhaps one innocent man in a hundred is erroneously convicted despite each jury’s attempt to make as few mistakes as possible is in this respect vastly different from instructing a jury to aim at a 1% rate (or even a 0.1% rate) of mistaken convictions’.[4] And we might give exactly the same advice to an examination board or university admissions office.

The issue identified by Cohen and Tribe has wider significance than its – important – application to the judicial process. “Statistical discrimination” is the term used to describe the practice of judging people by reference to the overall characteristics of the group to which they belong. For example, the once common practice of redlining – charging more for services, such as credit, to people who live in a particular area without regard to their own specific credit history – was outlawed in the United States by the Community Reinvestment Act of 1977.

Injustice to individuals is inherent in any application of statistical discrimination. Even if it is true that the redlined district displays higher rates of default than the general population, some, perhaps many, individuals who live there could be relied on to pay their debts. Moreover, redlining certainly had the effect, and may have had the intention, of discriminating against African-Americans. Statistical discrimination may in practice be a mechanism for indirectly implementing policies which, if instituted openly, would be illegal or otherwise unacceptable. We may wish the police to be more effective in clearing up crimes, but we don’t want them to do this by “rounding up the usual suspects”.[5] A civilised society treats people as individuals, not as drawings from a statistical distribution.[6]

The availability of big data, which enable us to learn far more about correlations – although not necessarily about causation – creates new opportunities for statistical discrimination and new dangers from its use.[7] Related issues arise from the development of machine learning – computers trained on historical data will develop algorithms reflecting past patterns of selection which may no longer be either appropriate or acceptable.[8] Even if the explicit use of information such as gender or race is prohibited, the algorithms may still discriminate on those lines – without anyone at all consciously intending it.

And yet it would be impossible to abandon statistical discrimination. Employers need to select job candidates from hundreds of applications; universities choose students from thousands who would wish to attend. They identify CVs on the basis of criteria that have in the past been correlated with success. Employers look for relevant experience, universities for high examination grades. Profiling and stop and search techniques in policing have undesirable consequences, but no one could reasonably quarrel with the need to focus police resources on locations where crimes are likely to be committed.

I recall the complaint – made in a university of course – that it was ‘discriminatory’ to advertise for a qualified accountant. It was not too difficult to secure agreement with the proposition that people with an accounting qualification were more likely to have the relevant skills for an accounting position than people who did not, even though some qualified accountants are incompetent and there are people without accounting qualifications who are nevertheless knowledgeable about accounting. Most of us prefer to consult a qualified doctor rather than interview a random selection of individuals to assess their medical knowledge, and the complaint that this discriminates against shamans, quacks, anti-vaxxers and witch doctors, although true, is not persuasive.

In fact such discrimination is the point. We benefit from prior selection of medical advisers by people who are better qualified to administer a test of professional knowledge and competence than we are, although that begs the question of how the people who administer these tests are in turn selected. Discrimination is unavoidable – discrimination is the essential purpose of university admission procedures – and the issue is how to prevent inappropriate discrimination. But what kinds of discrimination are and are not appropriate is a controversial issue and views may change over time – and plainly have.

So it proved harder to deal with the argument that people with accounting qualifications are not representative of the population as a whole in respect of gender, ethnicity, age and other characteristics. They are not, and there is a history, now mostly behind us, of inappropriate discrimination in the selection of those who were trained to become qualified accountants. But the omission of what should have been an uncontroversial requirement for an accounting position would have wasted the time of those making the appointment and of many unsuccessful applicants. And an intention to exclude minority groups was not remotely in the minds of those who drafted the advertisement.

There is little alternative to a pragmatic approach which evaluates cases on their merits. Statistical information and probabilistic reasoning will often give guidance to those merits, though they do not absolve us from the overriding requirement to ask, ‘what is going on here?’ And that enquiry tells us where the finger of blame for the current fiasco should be pointed. Statistical reasoning and artificial intelligence are useful tools, but can never achieve individualised justice. Sensitivity to context is what students, not unreasonably, expect. And it is the type of judgment which politicians, rather than statisticians or computers, are deputed to make.

[1] Cohen (1977) pp. 74-81 and Tribe (1971).

[2] For a survey of the response of legal scholars to the rodeo problem see Nunn (2015).

[3] Cohen (1977) p. 120.

[4] Tribe (1971), p. 1374, footnote 143.

[5] Captain Renault in Casablanca (1942).

[6] There is a large and growing literature on statistical discrimination as applied to legal issues – for example, Harcourt (2007) and Monahan (2006).

[7] The use of statistical discrimination to infer the likelihood of a convicted person committing another crime – both to incarcerate high-risk and release low-risk persons – has expanded significantly, especially in the United States where risk assessment is used in sentencing and parole decisions. See Monahan (2006), Monahan and Skeem (2016), and the tools produced by the Laura and John Arnold Foundation (2016).

[8] See, for example, O’Neil (2016) and Noble (2018).