Algorithms and fairness: lessons from the class of 2020

In this most strange of years, the problems with A-Level and GCSE results may seem like just another short-term political crisis.

But the combination of big data and algorithms, and their potential discriminatory effects on individuals, gave us a powerful insight into one possible (dystopian) future. Algorithms are becoming part of our everyday lives, employed by many businesses and, increasingly, by governments. Used appropriately, they can improve decision-making and increase efficiency. But when they go wrong, they can have a profound adverse effect on individuals, as the class of 2020 has found out.

The A-Level and GCSE results problems affected hundreds of thousands of young people across the UK. When the coronavirus pandemic forced the closure of schools and the cancellation of exams, a new system was needed to allow students who would have been sitting their A-Levels or GCSEs to be graded. The authorities proposed collecting teacher assessments, which would then be moderated centrally to ensure a consistent approach and to prevent so-called ‘grade inflation’. An algorithm was developed which would amend the teacher assessments to ensure that the 2020 grades were broadly comparable with those of previous years, using information including the past performance of schools and colleges.

The algorithm appeared to work perfectly at this macro level, ensuring that broadly the same percentage of students received the top grades as in previous years. But it proved catastrophic for individual students, as around 40% of grades were lowered, and some individuals received grades substantially below their teacher assessments. This seemed to particularly affect high-achieving students in schools which had traditionally performed less well, heightening the appearance of unfairness.
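To make the tension between cohort-level and individual-level outcomes concrete, here is a minimal sketch in Python. It is an illustration only, and not the regulators' actual model: the four-grade scale, the function name moderate_by_rank, the student labels and the historical percentages are all assumptions invented for this example. The sketch simply hands out grades in teacher rank order until the school's grade mix matches its past results.

```python
from typing import Dict, List

# Simplified four-grade scale, best first (an assumption for illustration).
GRADES = ["A", "B", "C", "D"]


def moderate_by_rank(ranked_students: List[str],
                     historical_pct: Dict[str, int]) -> Dict[str, str]:
    """Assign grades so the school's grade mix matches its historical mix.

    ranked_students: students ordered best-first by their teacher.
    historical_pct:  whole-number percentage of past students per grade.
    """
    n = len(ranked_students)
    result: Dict[str, str] = {}
    cumulative = 0
    start = 0
    for grade in GRADES:
        cumulative += historical_pct.get(grade, 0)
        end = round(cumulative * n / 100)  # how many students reach this grade or better
        for name in ranked_students[start:end]:
            result[name] = grade
        start = end
    # Anyone left over because of rounding receives the lowest grade.
    for name in ranked_students[start:]:
        result[name] = GRADES[-1]
    return result


# Hypothetical school that has never awarded an A in this subject.
historical_pct = {"A": 0, "B": 20, "C": 50, "D": 30}

# Hypothetical teacher assessments, already in rank order.
teacher = {
    "s01": "A", "s02": "A", "s03": "B", "s04": "B", "s05": "B",
    "s06": "C", "s07": "C", "s08": "C", "s09": "C", "s10": "D",
}
ranked = list(teacher)  # dicts preserve insertion order

moderated = moderate_by_rank(ranked, historical_pct)

# Students whose moderated grade is worse than their teacher assessment.
downgraded = [s for s in ranked
              if GRADES.index(moderated[s]) > GRADES.index(teacher[s])]

for s in ranked:
    print(s, teacher[s], "->", moderated[s])
```

In this toy school the moderated grade mix matches the historical one exactly, yet the two students assessed at A cannot receive an A whatever their own ability, because the school has no history of awarding one. That is the pattern described above: a result that looks consistent across the cohort while being hard to justify to the individual.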

In the face of overwhelming political pressure, the four governments across the UK all decided to revert to teacher assessments.

Some of these problems were obvious with hindsight. Because schools had been shut since March, no one had been able to drop out or underperform against expectations, so the algorithm was always going to have to downgrade some students to compensate. And whilst this downgrading rightly reflected the fact that some students would have underperformed had the exams gone ahead, it felt cruel and unfair to the actual individuals whose grades were lowered.

Before the governments changed their minds, several legal challenges to the grades allocated by the algorithm were launched. Data protection law, which was updated across Europe as recently as 2018, when the General Data Protection Regulation took effect, contains specific provisions around automated decision-making and profiling. Article 22 of the GDPR provides individuals with a right not to be subject to decisions based solely on automated processing, including profiling, which produce legal effects concerning them or similarly significantly affect them. This right is little known and rarely comes before the courts.

England’s exams regulator, Ofqual, argued that decisions about this year’s grades did not engage Article 22, because the decisions involved a human element and therefore were not ‘solely’ made by automated means. Many commentators have disputed this claim. It would have been interesting to see how the courts interpreted the right had the legal challenges proceeded. As automated decision-making becomes more prevalent, Article 22 challenges are likely to become commonplace.

More widely, data protection law requires organisations to process personal data fairly. The concept of fairness is often subjective and can be difficult to define. Nevertheless, it is hard to argue that downgrading an individual, not because of their own weaknesses but because of the past performance of the school they attend, meets this basic test of fairness. The algorithmic results may have been fair to the whole cohort, but they were deeply unfair to some individuals.

Again, we will never know whether a legal challenge under data protection law would have succeeded. Still, there is a lesson here for all organisations that use algorithms to make decisions about individuals: the decision-making must be fair at an individual level. There are parallels with another controversial and ever-growing technology: automated facial recognition software. Whilst such software has important uses, allegations persist that it performs poorly for certain ethnic minority groups. This can lead to very significant individual unfairness, which should not be overlooked.

In a business context, automated decision-making is beginning to be used more widely, especially in recruitment and selection. This creates enormous opportunities for businesses to improve their efficiency, make better hiring decisions and ultimately increase their profitability. But it comes with risks. Algorithms are not magic. They can only ever be as good as their design and the data that goes into them. Errors in either can introduce or amplify unexpected biases and lead to flawed decisions. A considerable amount of work went into getting the exams algorithm right. Still, ultimately it suffered both from a design bias, in that the goal of ensuring fairness at a cohort level led to unfairness at an individual level, and from a lack of robust data, which meant that schools with smaller class sizes appeared to benefit at the expense of larger centres.

Automated decision-making is undoubtedly here to stay, and algorithms are only likely to get more sophisticated. The 2020 exam results scandal doesn’t mean we should give up entirely on automated decision-making. But it should make all businesses pause to consider whether their automated decisions are fair and what impact they may have on individuals. Otherwise, they could face not only legal challenges but also significant reputational damage.