Informatics and the CSE Data Security Plan
As you read Johanna’s email last month on the encryption of personal computing devices, you may have noticed that this advice is just one part of a College of Science and Engineering action plan on data security (the plan was included in Johanna’s message). The plan requires Schools to carry out several actions. Johanna’s email, and the accompanying documentation to be found on computing.help, are intended as the response to one of these actions: pointing out to School members the importance of the security of data and devices.
Another substantial requirement the plan places on Schools is to ‘produce initial registers of datasets, websites, servers and services, their primary locations and ownership, highlighting the most sensitive or vulnerable for special attention’. To tackle this requirement, the College Computing Professionals Advisory Group (CCPAG) set up a sub-group, of which I am a member, to establish how best to create and populate this register and then establish the sensitivity of the data it contains. After much debate, this group has concluded that there are two types of object this register needs to track: datasets and services.
Datasets are the actual collections of related data, examples of which might be files on disks, text written on a piece of paper or entries in a database. Services are the ways in which this data is accessed, for instance a filesystem, a locked filing cabinet or a database front end. Datasets have characteristics relating to the type of data they contain, and services have characteristics relating to how access to the data is controlled.
For example, a dataset might contain anonymised details of the eye colour of 10 people (this would not be considered terribly sensitive data) or it might contain the medical records of 10,000 identifiable people (this would be very sensitive). Equally, a dataset might be accessed via an unauthenticated web form available worldwide, or it might be accessible only on a single machine, not attached to a network, kept in a locked room, and accessed via a hardware security device such as a fingerprint scanner. Obviously, if the medical records were accessible via the unauthenticated website, this would be highlighted in the register as being very high risk and flagged for attention. If, on the other hand, the eye colour dataset were accessible via the website, this might not be regarded as a cause for alarm: although the website is an insecure access method, the dataset itself does not contain terribly sensitive data. To look at the opposite example, if the medical records dataset were accessible only via the machine in the locked room, then although the dataset itself is highly sensitive, the overall risk might still be assessed as acceptable because access to the dataset is so tightly controlled.
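The reasoning in these three examples can be sketched as a small function. This is only an illustration: the three-level scale, the names `Level` and `overall_risk`, and the rule of taking the lower of the two levels are assumptions made here, not the criteria the sub-group has adopted.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-point scale used for both inputs and the result."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def overall_risk(sensitivity: Level, exposure: Level) -> Level:
    """Combine how sensitive a dataset is with how exposed a service is.

    Assumed rule (not the sub-group's): the overall risk is the lower of
    the two levels, so sensitive data behind tight controls and harmless
    data behind weak controls both come out low.
    """
    return min(sensitivity, exposure)


# Dataset sensitivity, taken from the examples in the text.
eye_colour = Level.LOW        # anonymised eye colour of 10 people
medical_records = Level.HIGH  # medical records of 10,000 identifiable people

# Service exposure, taken from the examples in the text.
open_web_form = Level.HIGH    # unauthenticated and available worldwide
locked_room = Level.LOW       # offline machine, locked room, fingerprint scanner

examples = [
    ("medical records", medical_records, "open web form", open_web_form),
    ("eye colour", eye_colour, "open web form", open_web_form),
    ("medical records", medical_records, "locked room", locked_room),
]
for dataset_name, sensitivity, service_name, exposure in examples:
    risk = overall_risk(sensitivity, exposure)
    print(f"{dataset_name} via {service_name}: {risk.name}")
```

A real assessment would weigh many more factors than a single level for each side; the point of the sketch is only that neither the dataset nor the service determines the risk on its own.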
Since datasets and services may have a many-to-many relationship (datasets may be accessed in more than one way and services may make use of more than one dataset), overall risk can only be assessed for a particular dataset/service combination. When datasets and services are added to the register, they are matched against a set of criteria establishing just how sensitive the data is (for datasets) and how secure access is (for services). These factors are then combined to assess the overall risk factor for each dataset/service combination. Examples of the criteria being used to establish whether a dataset contains medium or high risk data can be found in this document. Service characteristics include:
• whether the service or data is hosted on a fully supported server or file store
• what the filestore is (e.g. datastore, dropbox, AFS, an un‐supported device, etc.)
• what authentication to the service is used
• whether role‐based access controls to parts of the service are in place and managed appropriately
• what the total (approximate) number of users of the service is
• what the number of administrators with access to all data within the service is
• the ease with which bulk export access is enabled by the service
• whether there are any users external to the university, and what limits on access they have
• likely end‐user patterns of usage, especially in respect of exports
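One way to picture the many-to-many relationship is as a register that stores links between datasets and services, with one assessment needed per link. The sketch below is purely illustrative: the class names, the fields (a small subset of the characteristics listed above) and the example entries are all hypothetical, and it says nothing about how the real register will be implemented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    name: str
    owner: str


@dataclass(frozen=True)
class Service:
    # A few of the service characteristics listed above (hypothetical fields).
    name: str
    filestore: str
    authentication: str
    approx_users: int
    administrators: int
    external_users: bool


@dataclass
class Register:
    # Each link records that one dataset is reachable through one service.
    links: set = field(default_factory=set)

    def link(self, dataset: Dataset, service: Service) -> None:
        self.links.add((dataset, service))

    def combinations(self) -> list:
        """Every dataset/service pair that needs its own risk assessment."""
        return sorted((d.name, s.name) for d, s in self.links)


# Hypothetical entries: one dataset reached two ways, one service with two datasets.
survey = Dataset("survey responses", owner="hypothetical owner A")
marks = Dataset("course marks", owner="hypothetical owner B")
web = Service("web front end", "database", "university login", 200, 2, False)
share = Service("shared file store", "AFS", "university login", 20, 3, False)

register = Register()
register.link(survey, web)
register.link(marks, web)
register.link(survey, share)

for dataset_name, service_name in register.combinations():
    print(f"assess: {dataset_name} via {service_name}")
```

Storing links rather than attaching a single service to each dataset means the same dataset can appear with a different overall risk for each way it is accessed.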
As well as identifying the difference between a dataset and a service, the sub-group also identified two different classes of datasets and services. Datasets and services managed, maintained and curated by members of the administration and computing staff are class A. Computing and administration staff have been working together to identify class A datasets and services held by the School and this process is well advanced.
Datasets and services managed, maintained and curated by academic and research staff are defined as class B. Many more examples of these datasets and services exist, so gathering and evaluating this data will be a considerable task, one which we hope to begin in Informatics soon after the start of the new year. Further information about how this is to be done will be circulated closer to the time. In the meantime, it would be very helpful if each of you could start thinking now about what data you own and which services you are responsible for, so that you have the information to hand when the time comes.