In this post, Dr Sarah Chan and Dr Lee Baylis share the background of their CoDI Fringe show, which considers what happens when data takes on a life of its own…
Data is everywhere in the modern world. Almost every action in our increasingly digitised environment leaves behind data footprints that can potentially be observed, collected and used. But by whom, how and for what?
Health is one area in which data science promises transformative benefits. In Scotland, a world-leading hub for health data research, studies using linked health data have already shown tremendous value in, for example, understanding the public health benefits of anti-smoking policy. Combining health data with genomic information can help us further to understand the complex determinants of health and disease. At the same time, however, we may have concerns about who can access potentially sensitive information contained in our health records.
Should you worry about your data being used in research? First, health data science in Scotland involves rigorous measures and security infrastructure to protect patients’ privacy and ensure good governance, minimising the risk of unauthorised access or misuse. Second, however, the nature of data-intensive research means that thinking about information just in terms of individuals, ‘your’ data and ‘my’ data, is misleading. Much data analysis relies on processing data not at an individual level but as part of larger sets. ‘Big Data’ techniques use statistical analysis to look for patterns across an entire population dataset in order to build up pictures of groups as a whole. These techniques require large datasets to yield valid results. Ideally, therefore, to produce the greatest health benefits, research needs as many people as possible to participate.
Moreover, if your data is not included in the analysis, the results are less likely to be applicable to, and to benefit, you. This is a well-recognised phenomenon in medical research, where the under-representation of women, children and ethnic minority populations in clinical trials has led to the development of drugs that can be less effective (and in some cases more dangerous) for such patients.
So perhaps, rather than worrying only about our rights to withdraw our data from research, we should also be concerned about our rights to be included!
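To give a rough feel for why population-level analysis needs large datasets, the sketch below uses a standard textbook approximation for the margin of error around an estimated proportion. The 10% prevalence figure and the sample sizes are made-up assumptions chosen purely for illustration; they are not taken from any real health study.

```python
import math

def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate 95% margin of error for an estimated proportion.

    Uses the normal approximation z * sqrt(p * (1 - p) / n).
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Hypothetical figure: suppose 10% of a population has some condition.
ASSUMED_PREVALENCE = 0.10

# Compare how precise the estimate is for small and large datasets.
for n in (100, 10_000, 1_000_000):
    moe = margin_of_error(ASSUMED_PREVALENCE, n)
    print(f"{n:>9,} people: 10% give or take {moe * 100:.2f} percentage points")
```

Under these assumptions, the uncertainty shrinks in proportion to the square root of the number of participants, so each person who takes part makes the overall picture a little sharper.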
Another area in which data-intensive science can be applied is the emerging field of Learner Analytics. Universities are routinely able to access and collect data about students’ activity, for example the time they spend interacting with online learning resources, or the extent to which they use the library. In Learner Analytics, educational institutions review this activity in order to improve their understanding of teaching, learning and student behaviour. Questions they might ask include whether there is a better way to design or timetable courses; which trigger behaviours indicate that a student is struggling academically and may benefit from some help; and even whether there are student welfare issues, such as depression, which can be identified from a student’s activity data and potentially addressed.
The idea that an institution should strive to be a responsible education provider and has a duty of care to its students is nothing new. The insight from Learner Analytics, though, is that hidden in the day-to-day data that institutions already log about their students is a wealth of extra information, just waiting to be unlocked via data analysis. Once an institution has access to this, there is so much more to be done from the perspectives of quality, retention and welfare.
Educators are beginning to realise that they cannot neglect the analysis of this data when it would be of such benefit to students. Students, meanwhile, have both an interest in participating, where this research could benefit them personally by enhancing their learning experience, and perhaps a moral obligation to support research that could benefit their peers.
But of course, we can also imagine more dubious uses of Learner Analytics and its insights. What if, rather than universities using this information to improve the student experience, it were used to control access to education, for example by determining which students should be accepted onto courses? In the area of health data, ethics and governance strive to ensure that research serves the public interest, while individual interests are protected. We should also be aware of the possibility of damaging or prejudicial uses of other types of data and act to prevent these.
The moral of the story? Data, like all things, has many uses, good and bad. Perhaps we should all be more aware of whom we are giving our data to and exactly what they propose to do with it. Hopefully, though, you might now think twice before assuming that withholding your data by default is always the best approach!
The University of Edinburgh is a pioneer in Learning Analytics and has its own department in the field. Dr Lee Baylis works for Jisc’s “Effective learning analytics” project, and Dr Sarah Chan is a Chancellor’s Fellow in Bioethics with the Usher Institute for Population Health Sciences and Informatics and Director of the Mason Institute for Medicine, Life Sciences and the Law.
You can hear Lee & Sarah discuss more ideas surrounding the use of data at ‘The Naked Blind Data Show,’ our Cabaret of Dangerous Ideas Fringe performance on Monday August 20th at 13:30 at the Stand’s New Town Theatre, 96 George St. Book your tickets now!