Anonymising data using R – George Kinnear's Blog

I often work with student data for research purposes, and one of the first steps I take before doing any analyses is to remove identifying details.

Each of our students has a “University User Name” (UUN): an S followed by a 7-digit number (e.g. “S1234567”). The UUN can be used to link different datasets together. I need to replace these identifiers with new IDs, so that the resulting datasets contain no personal identifiers.

I’ve written a simple R script that can read in multiple .csv files, and replace the identifiers in a consistent way. It also produces a lookup table so that I can de-anonymise the data if needed. But the main thing is that it produces new versions of all the .csv files with all personal identifiers removed!
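To give a flavour of the approach, here is a minimal sketch in base R. It is not the script itself: the function name `anonymise_csvs`, the `UUN` column name, the `ANON0001`-style IDs and the folder layout are all assumptions made for illustration.

```r
# A minimal sketch, not the actual script. The function name, the "UUN"
# column name and the "ANON" prefix are hypothetical choices for illustration.
anonymise_csvs <- function(input_dir, output_dir, lookup_path, id_col = "UUN") {
  files <- list.files(input_dir, pattern = "\\.csv$", full.names = TRUE)
  stopifnot(length(files) > 0)
  datasets <- lapply(files, read.csv, stringsAsFactors = FALSE)

  # Collect every identifier that appears in any of the files
  all_ids <- unique(unlist(lapply(datasets, function(d) d[[id_col]])))

  # Hand out new IDs in random order, so they reveal nothing about the
  # order in which the original identifiers were encountered
  lookup <- data.frame(
    original = all_ids,
    anon_id = sprintf("ANON%04d", sample.int(length(all_ids))),
    stringsAsFactors = FALSE
  )

  dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
  for (i in seq_along(files)) {
    d <- datasets[[i]]
    # The same original identifier maps to the same new ID in every file
    d$anon_id <- lookup$anon_id[match(d[[id_col]], lookup$original)]
    d[[id_col]] <- NULL  # drop the original identifier column
    write.csv(d, file.path(output_dir, basename(files[i])), row.names = FALSE)
  }

  # The lookup table is what allows de-anonymisation, so it should be
  # stored separately from the anonymised files
  write.csv(lookup, lookup_path, row.names = FALSE)
  invisible(lookup)
}
```

Calling something like `anonymise_csvs("data-raw", "data-anon", "lookup.csv")` (hypothetical paths) would write an anonymised copy of each file to `data-anon`. Because the lookup table is built from all files before any replacement happens, a student who appears in several datasets gets the same new ID in each of them, so the datasets can still be linked.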

Code on GitHub

May 12, 2022

Anonymising data using R / George Kinnear's Blog by blogadmin is licensed under a Creative Commons Attribution CC BY 3.0
