Data, algorithms and the 2020 SQA national assessment results
Prof. Chris Dent on the issues around award of marks for national high school exams in the UK under Covid.
This article gives my perspective on the issues around award of marks for national high school exams in the UK under Covid. It is directly about the situation in Scotland (where I live and work, and where results are already published), but will also have relevance to the other home nations. While not a specialist in education data (though I work in a university), I have observed and been involved in a wide range of uses of data for decision support in practical circumstances, through my main role as Professor of Industrial Mathematics, and as Director of the Statistical Consulting Unit at the University of Edinburgh and a Turing Fellow at the Alan Turing Institute.
My intent is not to present a complete logical picture – I do not have access to enough information for this, and it is probably too early for anyone to put forward definitive views – but rather to provide a series of thoughts on the present replacement assessments under Covid which I hope people will find useful. I also hope that readers will not regard any of this as being pro-pupils or anti-pupils (much political and media comment has been based on such a dichotomy), my intention being to comment from a balanced perspective.
So here goes…
1. Timeline and available information
The SQA and Scottish Government were in a horrendously difficult situation. They had limited information at their disposal and a tight time schedule, plus much of the data they did have came with considerable uncertainty or bias. Thus, there were almost certainly no solutions that would have met with universal acclaim, and possibly no solutions which would have been free from major controversy.
2. What is being estimated?
In some sense we wish to assess candidates’ underlying ability in a subject – but we cannot do that directly, not least as underlying ability is not a precisely defined concept. Conventionally we use the national assessment systems as a proxy for underlying ability. It seems reasonable to think that there will be considerable uncertainty ahead of time in some candidates’ exam performance due to the topics which come up, or simply how the candidates perform on the day. Sitting in the background, of course, is the truism that sitting exams and performing related tasks in the real world are not the same thing.
3. Estimated Grades
One advantage of estimated grades over actual exam-based grades is that the former avoid the uncertainty around performance on the day. On the other hand, they introduce uncertainty arising from centres having different levels of optimism, and possibly also from different teachers within centres having varying levels of optimism. Making the situation still more difficult and delicate, there seems to be evidence that the degree of optimism bias is on average greater at centres in less advantaged geographical areas, though of course there will be much variability about this trend. There is also variation between centres in practices around prelims (Scotland) and mocks (England), and thus while results from these might on first thought seem like hard evidence, comparison between centres is not straightforward.
4. Significance of the design of algorithm
I think the importance of the detail of the algorithm used by SQA can be overstated. Any such algorithm is likely to produce poor results on a minority of cases which are in some sense difficult – data-driven algorithms can turn into random number generators for inputs where there is little similar data on which to train, and it is very hard to cover all possibilities in manually designed scaling algorithms such as that used. Thus exception handling (identifying and dealing with cases where the algorithm performs poorly) is vital.
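The point about sparse data can be illustrated with a small simulation. This is only a sketch on synthetic numbers, not SQA's algorithm: the function name, the assumed spread of results (grade_sd) and the treatment of grades as points on a numeric scale are all assumptions made for illustration. It shows how much a centre-level average, of the kind an adjustment might rely on, moves around by chance when it is based on few candidates.

```python
import numpy as np

def spread_of_centre_average(n_candidates, n_repeats=2000, grade_sd=1.5, seed=0):
    """Chance variation (standard deviation) of a centre's average result.

    Hypothetical setup: each candidate's result is the centre's true level
    (set to 0 here) plus independent noise with standard deviation grade_sd.
    """
    rng = np.random.default_rng(seed)
    # Each row is one simulated cohort of n_candidates at the same centre.
    cohorts = rng.normal(0.0, grade_sd, size=(n_repeats, n_candidates))
    # How much does the cohort average vary from one simulated cohort to the next?
    return cohorts.mean(axis=1).std()

small_centre = spread_of_centre_average(n_candidates=5)
large_centre = spread_of_centre_average(n_candidates=200)
print(f"chance spread of the average with   5 candidates: {small_centre:.2f}")
print(f"chance spread of the average with 200 candidates: {large_centre:.2f}")
```

Under these assumptions the average for the small cohort is several times noisier than for the large one, which is why cases with little supporting data are natural candidates for exception handling.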
5. Exception handling
In SQA’s original plan, exception handling was to be dealt with through the appeals process, in which evidence such as coursework would be considered. I am not sure why it was necessary to wait for appeals to bring in this evidence for individual candidates or centres where there had been substantial rescaling – it would be obvious that appeals would come, so some form of checking of difficult cases using additional evidence might have been implemented before issue of results.
N.B. After I thought of this point, I discovered that Lindsay Paterson, Professor of Education Policy at the University of Edinburgh, had made a similar point.
6. The interests of candidates
This is more nuanced than has been acknowledged in much of the media and political discussion. Clearly candidates who have university or other offers this year based on their grades will be more likely to meet the offers if grades are overall higher, as the official grades are being compared with offers previously made. However, “users” of the grades such as employers and universities will adjust their behaviour in the future – unless they believe that the 2020 cohort is exceptionally strong across the country, and that this drove substantially higher attainment. Employers might place less reliance on grades in recruitment processes – if there is a move to them setting their own performance tests, that might be beneficial. Universities might change their offers in response to a cohort with overall higher grades – and moreover it is not necessarily beneficial for someone to be enrolled on a course at one institution on which they might struggle and maybe not graduate, versus a course at a different institution which could suit them better.
7. Choice of methodology
SQA’s methodology report raises a specific question about why regression modelling was rejected as an approach: “A further limitation of multiple linear regression is that it cannot account for systematic over- or under-estimation and/or bias at a centre level.” (page 15). The quoted statement is not supported by the information they provide: given the right data, regression could do what they say it cannot. It seems to me that this statement could only be justified if there are other issues not mentioned in that section of the report.
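To make the claim about regression concrete, here is a minimal sketch of one way a linear regression can carry a centre-level bias term: add one indicator column per centre to the design matrix. It is not the model SQA considered, and everything in it is an assumption for illustration: the data are synthetic, grades are treated as points on a numeric scale, and "the right data" is taken to mean paired teacher estimates and actual results from earlier years. The names fit_centre_offsets and true_bias are hypothetical.

```python
import numpy as np

def fit_centre_offsets(estimates, actuals, centre_ids, n_centres):
    """Least-squares fit of: actual = slope * estimate + offset[centre]."""
    estimates = np.asarray(estimates, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    centre_ids = np.asarray(centre_ids)
    # One indicator column per centre; no separate intercept, so the
    # design matrix has full column rank.
    indicators = np.zeros((estimates.size, n_centres))
    indicators[np.arange(estimates.size), centre_ids] = 1.0
    design = np.column_stack([estimates, indicators])
    coef, *_ = np.linalg.lstsq(design, actuals, rcond=None)
    return coef[0], coef[1:]

# Synthetic historical data (assumed, not real): each centre over- or
# under-estimates by a fixed amount, plus candidate-level noise.
rng = np.random.default_rng(0)
n_centres, per_centre = 5, 200
true_bias = np.array([-0.5, 0.0, 0.3, 0.8, 1.2])
centre_ids = np.repeat(np.arange(n_centres), per_centre)
estimates = rng.uniform(1.0, 9.0, size=centre_ids.size)
actuals = estimates - true_bias[centre_ids] + rng.normal(0.0, 0.3, size=centre_ids.size)

slope, offsets = fit_centre_offsets(estimates, actuals, centre_ids, n_centres)
print("fitted slope:", round(float(slope), 2))
print("fitted centre offsets:", np.round(offsets, 2))
print("minus the true bias:  ", -true_bias)
```

In this synthetic setting the fitted offsets come out close to minus the simulated bias, so the regression does pick up systematic over- or under-estimation at centre level. Whether suitable historical data existed for every centre and subject is a separate question that this sketch does not address.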
Finally to conclude… or rather to explain why I do not have a grand conclusion saying what I think would have been the only correct way to proceed, as is often the case with comment pieces. I regard my role in decision support analysis as being to identify with decision owners what the goals are and what issues matter, and then to provide an assessment of how well different options meet these needs, possibly also being involved in design of options. It is certainly not my role in this situation to decide what matters – so I hope, in that spirit, readers will find this article helpful.
Acknowledgments: I have benefited from reading the Scottish Qualifications Authority’s own methodology report, and comment from the Royal Statistical Society and Guy Nason, as well as various discussions with colleagues and reports in the general media.
Chris Dent is Professor of Industrial Mathematics; Director of the Statistical Consulting Unit at the University of Edinburgh; and a Turing Fellow at the Alan Turing Institute