Data Retention in Learn Ultra

Background
Priorities around the storage of end user data have shifted over the last decade, particularly for services hosted by DLAM. Security has always been a constant, but there is now an increasing expectation that we retain only the minimum amount of data that our services and end users require. This is in line with the rights of our users under GDPR and allows us to make our services more efficient. Our services have been a victim of their own success: use of Blackboard Learn has grown significantly year on year since its adoption at the University in 2012. Over this period we have accumulated data, primarily Learn user accounts and Learn courses, which has aged out but has not been completely deleted from our systems. The work to remove this data will bring the following benefits to the institution:
- Lower licence costs
- Less data in AWS
- Compliance & end user expectations being met
- Ability to focus resources on new features and tools
- Simplified administration
- Reduced carbon footprint
What we did
In 2024, we began the process by tackling older courses in Learn. The first step was to review and update our data retention policy (DRP) for the service. We then carried out manual deletions, using the Learn database to generate feed files and uploading these to Learn via the built-in Student Information System integration. In consultation with the wider service team and end users, we agreed on a 5 year retention schedule for EUCLID courses in Learn, with the added safeguard that no course can be deleted if it contains an active student enrolment. In addition, any course marked for deletion would generate an email to all course organisers, course secretaries and instructors, so that courses are not removed without prior warning.
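The two rules above (a 5 year retention period, and no deletion while an active student enrolment exists) can be sketched in a few lines of Python. This is an illustration only, not the process actually used: the `Course` record, its field names and the choice to measure the 5 years from a course end date are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date

RETENTION_YEARS = 5  # retention schedule agreed for EUCLID courses


@dataclass
class Course:
    course_id: str
    end_date: date                  # assumed field: date the course finished
    active_student_enrolments: int  # assumed field: count of active student enrolments


def years_before(day: date, years: int) -> date:
    """Return the date `years` calendar years before `day`."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to 28 February.
        return day.replace(year=day.year - years, day=28)


def is_eligible_for_deletion(course: Course, today: date) -> bool:
    """Apply both rules: past the retention period and no active students."""
    past_retention = course.end_date <= years_before(today, RETENTION_YEARS)
    return past_retention and course.active_student_enrolments == 0


# Hypothetical sample data, not real course records.
courses = [
    Course("OLD-EMPTY", date(2018, 7, 31), 0),
    Course("OLD-ACTIVE", date(2018, 7, 31), 3),
    Course("RECENT", date(2023, 7, 31), 0),
]
eligible = [c.course_id for c in courses if is_eligible_for_deletion(c, date(2024, 6, 1))]
print(eligible)
```

Only the first sample course passes both checks: the second is old enough but still has active students, and the third is within the retention period.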
Impact so far
Before course deletions began in 2024, there were 91,156 courses in our live instance of Learn. We initially identified 28,864 courses eligible for deletion, and a further deletion later in the year identified another 11,459, bringing the total number of courses removed to over 40,000. Removing a large number of courses over the summer allowed us to look more closely at the remaining data. This is when we identified a number of courses which had been skipped because demo accounts were enrolled with the student role, and we were able to remove these courses too.
Before any deletions took place, the flagged courses were backed up and transferred to AWS storage. This gave us a longer backup duration than that offered within Learn itself and allowed us to plan for any potential issues, given that this was the first, and a particularly large, removal of data from Learn, involving several years' worth of courses. Going forward, we will soft delete courses for a period of 3 months at a time. This will allow us to streamline the process and remove the need to purchase external storage for backups.
Future Work
Going forward, we will analyse the data remaining in Learn to see if further efficiencies can be made. Examples include reducing the number of system logs held within Learn and targeting data types which use a particularly large amount of storage, such as videos and multimedia files. This can be partially achieved by reinforcing best practice and promoting the use of integrations to external storage, such as Media Hopper Create for videos.
We are currently developing a tool which will allow courses to be deleted automatically once a year. It will include a bulk emailer to ensure that instructors on courses marked for deletion receive an email 4 weeks beforehand listing their courses to be removed. The tool will also remove Learn user accounts belonging to users who have left the University and whose accounts have been in a disabled state for at least 12 months.
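As a rough illustration of the rules such a tool would need to apply, the sketch below works out the notice date, groups courses so that each instructor receives a single list, and checks whether a user account qualifies for removal. It is not the tool under development: every function name and input shape here is hypothetical, and "12 months" is treated as one calendar year.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Optional

NOTICE_PERIOD = timedelta(weeks=4)  # instructors are emailed 4 weeks before deletion
DISABLED_YEARS = 1                  # 12 months, treated here as one calendar year


def notification_date(deletion_date: date) -> date:
    """Date on which the warning email should be sent."""
    return deletion_date - NOTICE_PERIOD


def group_courses_by_instructor(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Turn (instructor_email, course_id) pairs into one course list per instructor."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for email, course_id in pairs:
        grouped[email].append(course_id)
    return dict(grouped)


def account_eligible_for_removal(
    has_left: bool, disabled_since: Optional[date], today: date
) -> bool:
    """True only if the user has left and has been disabled for at least 12 months."""
    if not has_left or disabled_since is None:
        return False
    try:
        cutoff = today.replace(year=today.year - DISABLED_YEARS)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to 28 February.
        cutoff = today.replace(year=today.year - DISABLED_YEARS, day=28)
    return disabled_since <= cutoff


# Hypothetical sample data for illustration.
send_on = notification_date(date(2025, 8, 1))
mailing = group_courses_by_instructor(
    [("a@example.org", "C1"), ("b@example.org", "C1"), ("a@example.org", "C2")]
)
print(send_on, mailing)
```

Grouping before sending means an instructor attached to several courses gets one email with the full list rather than one email per course.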
Running these processes annually will ensure that we do not accumulate redundant data at the same rate moving forward.