As the lockdown starts to ease in the UK, reflection on the nation’s response to the COVID-19 pandemic has begun in earnest. A particular focus has been Imperial’s epidemiological model, originally created by Neil Ferguson, which was widely credited with convincing the government to implement lockdown measures after it forecast up to half a million COVID-19 deaths in the UK if a mitigation strategy was not implemented. Criticism of the model has been levelled by the press, particularly outlets with an editorial stance against strict lockdown measures and a fondness for the word ‘draconian’ (the Daily Mail, the Telegraph).
What does seem clear is that the Imperial model drastically over-estimated deaths, was originally intended for influenza, and was written in poor-quality code. The code, written in C, was over 13 years old, undocumented, and thousands of lines long. When it was made public on GitHub, there was an uproar from professional software engineers. The code was labelled “horrible” and “a buggy mess” in issues created on the repository (which also lacked unit tests). Software engineers were appalled that code of this quality was being used to inform the national response to COVID-19. Some even went as far as to call for the retraction of all papers based on the model.
There were also concerns after external researchers said they were unable to reproduce the results described by Imperial. This further undermined faith in the model and was leapt on by the press. The scientific community has since concluded that these differences were due to differing computing environments, and the results are now deemed reproducible. This is unlikely to sway the opinion of people who read the many articles questioning whether the model was reproducible: newspapers are not going to run stories covering the model being shown to be reproducible with the same enthusiasm they covered fears that it was not.
In addition to reducing trust in the model, the poor code quality inevitably made editing and contributing to the code difficult. Ferguson justified using a model intended for influenza predictions on the grounds of urgency: models were needed immediately, and developing a COVID-19-specific model would have taken considerable time. Whilst this is a valid point, the poor code quality surely inhibited how quickly the model could be adapted for COVID-19. Contributions from GitHub, Microsoft, and individual developers have since enabled the model to become specialised for COVID-19, but this has required substantial time and effort.
It would be easy to place the blame for the code quality squarely at the feet of Ferguson, but I think we should instead blame an academic culture which does not value code. In academia, we commonly do not see code as being anywhere near as important as the research paper it is based on. Code in academia is commonly used simply to validate a model’s effectiveness, and in many cases it is still not made public at all. A current research interest of mine is joint models for longitudinal and time-to-event data: despite numerous papers proposing implementations of such methods, the public packages which implement joint models can be counted on one hand. I do not think it is surprising that many academics spend little time ensuring high code quality. Documenting and refactoring code takes time and arguably brings little reward for the effort required. Why go through this process when you could instead be working on a new paper which will earn you citations? Some may argue that producing code others can easily use will increase the citation count of the paper the code complements, but failure to cite software is already believed to be a problem rife in academia. I believe more needs to be done to promote writing high-quality code in academia. The two propositions I’ve presented below may seem naive, but hopefully they at least start a discussion.
It should become commonplace to incorporate software downloads as a measure of a researcher’s output, in addition to metrics based on citations. Downloads of software hosted on public repositories can be tracked easily and give a good indication of the impact an academic is making on the community (a rough sketch of one way to do this is given below).
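As an illustration only, here is a minimal Python sketch of how such tracking might work: GitHub’s public REST API reports a cumulative download_count for each release asset, so a few lines of code can tally downloads for a given repository. The owner and repository names below are hypothetical placeholders, and package registries such as CRAN and PyPI publish comparable download statistics of their own.

    # Sketch: tally release-asset downloads for a repository via the
    # public GitHub REST API. The repository named here is hypothetical;
    # substitute your own. Unauthenticated API rate limits apply.
    import json
    import urllib.request

    OWNER, REPO = "example-lab", "example-model"  # hypothetical repository
    url = f"https://api.github.com/repos/{OWNER}/{REPO}/releases"

    with urllib.request.urlopen(url) as response:
        releases = json.load(response)

    # Each release lists its downloadable assets, and each asset carries
    # a cumulative download_count field.
    total = sum(
        asset["download_count"]
        for release in releases
        for asset in release.get("assets", [])
    )
    print(f"Total release-asset downloads for {OWNER}/{REPO}: {total}")

Download counts are a crude proxy, of course, but unlike citations they are cheap to collect and reflect use by practitioners who may never publish a paper.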
Some argue that poor code quality is inevitable when code is written by people who are not software engineers. I therefore suggest that universities should hire software engineers to work alongside researchers to ensure code is well written, suitably tested, and adequately documented.
I think these approaches would at least go some way towards improving code quality in academia and result in more useful output, allowing more methods to actually be utilised instead of simply being left in papers and supplementary material. And who knows: maybe your code will one day influence government policy without provoking vocal opposition.