The precision of prediction

In March this year, I began a secondment to Information Services (one day a week), in which I shall be evaluating the impact of learning analytics data on students’ academic performance. The courses I am involved with are all online, part-time Masters programmes, delivered via bespoke virtual learning environments (http://www.essqchm.rcsed.ac.uk/). We enrol around 400 students annually, which allows for robust data sets, some dating back to 2007. I plan to share my progress and reflections on the Learning Services Team blog over the next few months.

To be, or not to be [making predictions]: that is the question…

Recently, I’ve been performing regression analysis on student data from our oldest and largest Masters programme, with a view to developing predictive models of academic performance. Initially I wondered: of all the ‘digital breadcrumbs’ collected from student activities over the past seven years, which ones are most meaningful? However, as I have done more number-crunching and reading around the subject, I’m now asking myself whether predictive analytics has a place in higher education. The answer, of course, depends on what you are trying to achieve.

Source: http://commons.wikimedia.org/wiki/File%3AFireResearch_007.jpg

Predictive analytics is as varied as it is ubiquitous. From computer models of fires designed in the 1980s by Walter Jones (above) for predicting smoke and toxic gas movement in buildings, to today’s supermarkets providing us with money-off coupons tailored to our lifestyle choices using data from loyalty cards, our everyday lives are influenced by predictive modelling. The educational arena is no exception. A Google search for “predictive analytics in higher education” returns just 1,130 hits for January to December 2004, compared with 67,100 hits for the 12 months to July 2014.

Most of the institutions endorsing the burgeoning use of predictive analytics have issues with high student drop-out. There are numerous examples of institutional efforts to increase student recruitment and retention, e.g. Signals at Purdue, USA, and the Open Academic Analytics Initiative led by Marist College, USA. The goal of these predictive models is to target those candidates most likely to apply to the university and then, once enrolled, to identify students deemed to be at risk of failing. These ‘at risk’ students are alerted to increase their engagement and undertake some form of remedial activity, such that their performance improves and they remain on-programme.

Some institutional-level pursuits hark back to the commercial sector roots of analytics (improving operational efficiency) and are not without problems for actual students. Clearly there are dangers in pigeon-holing any student to a predicted trajectory of failure, not least the potential for self-fulfilling prophecy. Another hazard is encouraging the strategic learner simply to jump through the significant hoops to achieve a successful outcome, without experiencing the breadth and depth of the intended learning model or acquiring key transferable skills along the way. Moreover, it would be foolish to rely exclusively on a predictive model to identify ‘at risk’ students.
A metric or variable that isn’t statistically significant in a regression analysis may still be academically significant, and so should still be displayed to the student. Equally, some things of key importance are not measured, such as those tacit things students are doing that we cannot monitor easily (e.g. peer communications outside of the learning environment).

What is significant?

If the purpose of predictive modelling is to improve student performance in individual courses, then it must deliver actionable information from the outset. Unfortunately, that just isn’t the case with data from the postgraduate programmes I’m involved in. For one course, the only significant metric associated with end-of-year score in multivariate regression analysis was exam performance. This is of little surprise given that the examination is a summative, heavily weighted assessment taken at the end of the academic year. It also highlights the problem of statistical versus academic significance when informing students of their progress in relation to in-course formative assessment and levels of participation.

The model I’m now implementing is one of descriptive analytics, with simple linear regression used to rank the various metrics. A dashboard will enable students (and staff) to monitor their performance and compare it to peers in real time, with the aim of improving their academic engagement and supporting them to take ownership of their learning and development. To create this dashboard, one has to be mindful of whether we are monitoring actual learning or simply activity; thus a range of metrics (aptitude data, utilisation of services and resources, student-tutor interactions, and virtual learning environment data) will be examined, not just a single number.
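
For readers curious about what ranking metrics with simple linear regression might look like in practice, here is a minimal sketch. It is an illustration only, not the analysis I ran: the metric names and all of the numbers are made up, and I have assumed that each metric is fitted against the end-of-year score on its own and then ordered by R² (the proportion of variance in the score that the metric explains).

```python
# Illustrative sketch only: metric names and data below are invented.
# Each metric is regressed on the outcome separately (simple linear
# regression) and the metrics are ranked by the resulting R^2.


def r_squared(x, y):
    """R^2 of a simple linear regression of y on x.

    For a single predictor this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant metric (or outcome) explains no variance.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def rank_metrics(metrics, outcome):
    """Return (metric name, R^2) pairs, strongest association first."""
    scores = {name: r_squared(values, outcome) for name, values in metrics.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical end-of-year scores for eight students.
final_score = [52, 58, 61, 65, 70, 74, 80, 85]

# Hypothetical metrics for the same eight students, in the same order.
metrics = {
    "exam_mark": [50, 55, 60, 66, 69, 75, 82, 84],
    "forum_posts": [3, 10, 4, 12, 8, 15, 9, 14],
    "logins_per_week": [5, 2, 6, 3, 4, 5, 3, 4],
}

ranking = rank_metrics(metrics, final_score)
for name, r2 in ranking:
    print(f"{name}: R^2 = {r2:.2f}")
```

A ranking like this says nothing about academic significance: a metric with a low R² may still be worth showing to students, which is why the dashboard will display a range of metrics rather than only the top-ranked one.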
We must look for real relationships, with relevant markers/proxies, and remember that each parameter we monitor in the pursuit of learning analytics is simply one piece of the jigsaw. It’s important to see the bigger picture, i.e. the students as people, not mere numbers, and not adopt a “one size fits all” approach. I particularly like the Open University’s ethical framework for learning analytics, one of the principles being “Students are not wholly defined by their visible data or our interpretation of those data”.

Descriptive and predictive analytics can’t themselves influence academic performance and successful outcomes; rather, they can act as a catalyst, identifying ‘at risk’ students and prompting interventions by programme teams, to help those students achieve their true potential. Additionally, efforts should be made to monitor the progress, and tailor the learning activities, of all students in a cohort; we mustn’t fall into the trap of focussing our energies exclusively on poorly performing students. I will finish with this very salient quote:

“Data, by itself, does not mean anything and it depends on human interpretation and intervention. Analytics may provide a valuable insight into a student’s learning, but if teachers do not take actions to intervene, then it will not help to improve the academic performances.” Li Yuan, CETIS

… and I hope to discuss intervention strategies, and their effectiveness, in a future post.

Paula Smith, IS-TEL Secondee