Using Administrative Data in a Clinical Trial

In this post, written for the Early Career Researchers Using Scottish Administrative Data (eCRUSADers) blog, Catriona Keerie, Senior Statistician at the Edinburgh Clinical Trials Unit (ECTU), talks about her work at ECTU and her involvement in a rare Scottish trial that used administrative health data. She provides some great diagrams to help along the way, which I can tell you are essential if you want to understand the complicated structure of the data! Catriona also highlights some of the key challenges the team faced in accessing and using the data, and reflects on lessons from the project that could help similar trials in the future.

Can you tell us a little about your role in ECTU? 

My role involves a variety of tasks, but primarily it is the statistical reporting of trials run from within ECTU. I typically have up to eight active trials throughout the year, and my role varies across them: I am Trial Statistician for approximately half of them, and the ‘reporting’ statistician for the other half. When I have my reporting statistician hat on, I’m responsible for the statistical programming and for generating the analysis and results.

How many trials have you worked on that have involved using administrative data?

Since I joined ECTU in 2014, I have worked on three trials using administrative data. Two of them used solely routine healthcare data and the third one is running currently, based on a blend of routine data plus data captured within the trial.

Is the use of administrative data in trials becoming more common over time?

The use of administrative data in the trials setting is definitely becoming more common, as clinical trials are known to be expensive and time-consuming. Administrative healthcare data are viewed as a more efficient means of understanding the health of the population, because the data are already available. However, there is a trade-off in terms of the quality of the data captured.

What was the High-STEACS trial?

High-Sensitivity Troponin in the Evaluation of patients with suspected Acute Coronary Syndrome (High-STEACS) was a stepped-wedge, cluster-randomised controlled trial. In plain English, this means…

It’s a relatively recent study design that is increasingly being used to evaluate service-delivery interventions. In this design, clusters (usually hospitals or other healthcare settings) cross over from control (standard care) to the intervention at randomly assigned time points, until all the clusters are exposed to the intervention. This differs from a traditional parallel design, where only half of the clusters receive the intervention and the other half receive the control. This diagram helps to demonstrate the difference in designs:
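To make the stepped-wedge idea concrete, here is a minimal Python sketch of how such an allocation could be generated. The function name, the baseline period and the equal-sized groups crossing over at each step are assumptions for illustration; this is not the High-STEACS randomisation procedure.

```python
import random


def stepped_wedge_schedule(clusters, n_steps, seed=None):
    """Return {cluster: [0/1 for each period]}, where 0 = control, 1 = intervention.

    Period 0 is a baseline in which every cluster is on control. At each of
    the n_steps subsequent steps, a randomly chosen group of clusters crosses
    over and stays on the intervention for the rest of the study.
    """
    rng = random.Random(seed)
    order = list(clusters)
    rng.shuffle(order)  # randomise the order in which clusters cross over
    n_periods = n_steps + 1
    schedule = {}
    for i, cluster in enumerate(order):
        # Spread clusters as evenly as possible across steps 1..n_steps
        step = 1 + (i * n_steps) // len(order)
        schedule[cluster] = [1 if period >= step else 0 for period in range(n_periods)]
    return schedule


if __name__ == "__main__":
    hospitals = [f"Hospital {i}" for i in range(1, 11)]  # hypothetical labels
    for name, row in stepped_wedge_schedule(hospitals, n_steps=5, seed=1).items():
        print(f"{name:>12}: {row}")
```

Every row starts on control and ends on the intervention; only the timing of the switch is randomised. In a parallel design, by contrast, half of the rows would be all 1s and the other half all 0s for the whole study.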

The population of interest was patients presenting to hospital with heart attack symptoms. The trial sought to test a new high-sensitivity cardiac troponin assay against the contemporary assay used in standard care; specifically, whether the new assay could detect heart attacks earlier and diagnose them more accurately.

How were patients enrolled into the trial and how does this differ from a standard trial?

Stepped-wedge trials usually randomise at cluster (hospital) level rather than randomising individual patients, and this was the main difference from a standard trial: patients were enrolled, rather than randomised, into the trial. Standard trials require patient consent before randomisation, but in this context individual patient consent was not needed, because randomisation was performed at hospital level. Appropriate approvals were instead sought through the hospitals.

If patients presenting with heart attack symptoms at any of the hospitals were eligible for the trial (based on our pre-specified inclusion/exclusion criteria), then we had permission (at hospital level) to include them in the study and use their securely anonymised data.

How many patients were enrolled into the trial?

Approximately 48,000 patients were enrolled from 10 hospital sites in NHS Lothian (3 sites) and NHS Greater Glasgow and Clyde (7 sites), over a period of just under three years.

Which administrative data sets were used?

We used a total of 12 distinct data sources, combining general administrative datasets with datasets more specific to our area of research from locally held electronic health records. Prescribing data were obtained from the Prescribing Information System, and we also used ECG data and general patient demographics. Trial-specific outcome data were obtained from the Scottish Morbidity Record (SMR01) and from the register of deaths (National Records of Scotland).

All data were captured separately for each Health Board; there is currently no amalgamated data source that holds all of the data. Health Boards are the owners of their own data.

The main linking mechanism for these 12 data sources was the patient CHI (Community Health Index) number. To ensure patient anonymity, CHI numbers were securely encrypted prior to use.
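The post does not say which encryption scheme was applied to the CHI numbers. One common way to pseudonymise an identifier while keeping it usable for linkage is a keyed hash, so the sketch below assumes HMAC-SHA256 with a secret key held outside the research team; the function name is hypothetical.

```python
import hashlib
import hmac


def pseudonymise_chi(chi: str, secret_key: bytes) -> str:
    """Map a CHI number to a stable token using keyed hashing (HMAC-SHA256).

    The same CHI number and key always give the same token, so records from
    different sources can still be joined, but analysts never see the CHI
    number itself. The key is assumed to be held by a trusted third party.
    """
    digits = "".join(ch for ch in chi if ch in "0123456789")  # drop spaces/separators
    if len(digits) != 10:
        raise ValueError("expected a 10-digit CHI number")
    return hmac.new(secret_key, digits.encode("ascii"), hashlib.sha256).hexdigest()
```

A plain, unkeyed hash would be a weak choice here: CHI numbers begin with the date of birth, so the space of possible values is small enough to enumerate, whereas without the key the tokens cannot be recomputed.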

How did you get approval for these data sets? How long did this approvals process take?

Approvals were required at a number of levels. We required ethics approval, approval to use patient data without consent, and Health and Social Care approval (through the Privacy Advisory Committee, predecessor to the Public Benefit and Privacy Panel). Health-board-specific approvals were also required before local data could be released. In addition, we required data supplier approval. Finally, approval was needed for the data to be hosted on the Safe Haven platform.

This process was long! It was ongoing throughout the duration of the trial. Although the data were being captured automatically via routine records, the final dataset wasn’t confirmed until relatively late in the process, because of the complexity of mapping locally held healthcare records. One of the advantages of the national datasets is that they are the same across all health boards.

Where were the data sets stored?

Datasets from NHS Lothian and NHS GG&C were each supplied separately via their own Safe Haven. The combined dataset was hosted in the NHS Lothian Safe Haven space on the National Safe Haven analysis platform.

How did the linkage of the data sets happen?

The data sources from both health boards were combined and hosted on the National Safe Haven analysis platform. This wasn’t a straightforward process. Although we’d anticipated capturing exactly the same patient data across both health boards, the reality was quite different.

Data were captured in different formats, with different variable names and different definitions, so an unexpected amount of data cleaning was required before the data could be merged into one large analysis dataset.
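As an illustration of that cleaning step, here is a minimal Python sketch that renames, reformats and recodes each board's records into one common schema before stacking them. All column names, date formats and codes below are hypothetical; the real extracts and their definitions are not described in the post.

```python
from datetime import datetime

# Hypothetical source column names for each board, mapped onto one common schema.
COLUMN_MAPS = {
    "lothian": {"enc_chi": "chi_token", "adm_date": "admission_date", "sex": "sex"},
    "ggc": {"ENCRYPTED_CHI": "chi_token", "AdmissionDate": "admission_date", "Gender": "sex"},
}
# Hypothetical per-board date formats and a shared recoding of sex.
DATE_FORMATS = {"lothian": "%Y-%m-%d", "ggc": "%d/%m/%Y"}
SEX_CODES = {"M": "male", "F": "female", "1": "male", "2": "female"}


def harmonise(records, board):
    """Rename, reformat and recode one board's records into the common schema."""
    colmap = COLUMN_MAPS[board]
    out = []
    for rec in records:
        row = {new: rec[old] for old, new in colmap.items()}
        row["admission_date"] = datetime.strptime(
            row["admission_date"], DATE_FORMATS[board]
        ).date()
        # An unrecognised code raises KeyError, so surprises fail loudly
        row["sex"] = SEX_CODES[str(row["sex"]).strip().upper()]
        row["board"] = board  # keep provenance for later checks
        out.append(row)
    return out


# Usage: combined = harmonise(lothian_rows, "lothian") + harmonise(ggc_rows, "ggc")
```

Keeping the mappings in explicit tables, rather than scattered through the code, makes it easier for the clinical team to review whether each board's variables really mean the same thing.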

The final linkage was done using the securely encrypted CHI number for each patient.
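A minimal sketch of that final linkage step, assuming every source already carries the same encrypted CHI token in a column hypothetically named `chi_token`. It attaches outcome records to each enrolled patient and also returns outcome records that did not match anyone, which is a useful check on linkage quality.

```python
from collections import defaultdict


def link_by_token(cohort, outcomes, key="chi_token"):
    """Left-join outcome records onto cohort rows via the encrypted CHI token.

    Returns (linked, unmatched): every cohort row with a list of its outcome
    records, plus any outcome records whose token is not in the cohort.
    """
    by_token = defaultdict(list)
    for rec in outcomes:
        by_token[rec[key]].append(rec)
    cohort_tokens = {row[key] for row in cohort}
    # dict.get does not trigger defaultdict's factory, so missing tokens give []
    linked = [{**row, "outcomes": by_token.get(row[key], [])} for row in cohort]
    unmatched = [rec for rec in outcomes if rec[key] not in cohort_tokens]
    return linked, unmatched
```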

What do you see as the major benefits of using administrative data in this setting?

Using administrative data in this context is more efficient: fewer resources are spent on the administrative aspects of trial enrolment, such as capturing demographic details (age, sex, postcode) or medical history.

Using administrative data also gave us the opportunity to study a large, representative patient population, in contrast to a conventional RCT, where a strictly pre-specified population, not necessarily representative of the target population, is studied.

Overall, what were the major challenges of the study?

From the data side of things, ensuring the correct data were extracted was difficult. The diagram above is a very over-simplified view of what happened! In reality, picking up the required variables from two separate health boards, which capture data very differently, was a challenge.

Another challenging aspect was ensuring that a patient wasn’t enrolled more than once in the study. Patients can present at any hospital with heart attack symptoms more than once, so we needed to ensure they weren’t included in the study each time they came to hospital. This required a de-duplication algorithm that worked on encrypted, de-identified patient data.
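The trial's actual de-duplication rules are not described here, so the following is only a sketch of the simplest version of the idea, assuming each patient's earliest presentation is treated as their index (enrolling) presentation. The field names are hypothetical, and a real algorithm would likely need extra rules, for example for transfers between hospitals.

```python
def index_presentations(presentations, key="chi_token", time_field="arrival_time"):
    """Keep only each patient's earliest presentation (a hypothetical index rule).

    Works on pseudonymised tokens only; the CHI number itself is never needed.
    Any comparable timestamp works (datetime objects or ISO-8601 strings).
    """
    first = {}
    for rec in presentations:
        token = rec[key]
        # Replace the stored record only if this presentation is strictly earlier
        if token not in first or rec[time_field] < first[token][time_field]:
            first[token] = rec
    # Return one record per patient, in chronological order of enrolment
    return sorted(first.values(), key=lambda r: r[time_field])
```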

However, I think the biggest challenge was for those in the team tasked with obtaining the correct approvals. The complexity of this was underestimated. While approval for the national datasets was straightforward and the eDRIS team were very helpful, processes for accessing locally held data had not yet been established when the trial was set up. Legislation around patient data confidentiality was continually changing, so we had to keep abreast of new legislation as time progressed. The Safe Haven networks are now more established and, hopefully, the processes are more straightforward.

Is there anything you would do differently next time?

I think the data validation aspect of the trial is crucial. Ideally, we would have spent more time on this to ensure the data were as accurate as possible. Involving the clinical team much sooner in this process would have helped: they have a really important role to play in ensuring the data picked up make sense from a clinical perspective.

For High-STEACS, access to the data was highly restricted and did not extend to the clinical team. Many of the data discrepancies were only picked up at the final review stage, once data and results had been released from the Safe Haven.
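One way to surface discrepancies earlier is to run simple plausibility checks inside the Safe Haven and share the resulting report with the clinical team. The sketch below is hypothetical: the field names and the two rules (an age range and discharge-after-admission) are illustrative assumptions, not the trial's validation plan.

```python
def validation_report(rows):
    """Return (row_index, message) pairs for records failing simple plausibility checks.

    The rules are illustrative placeholders; real rules would be agreed with
    clinicians. Dates can be date objects or ISO-8601 strings.
    """
    issues = []
    for i, row in enumerate(rows):
        age = row.get("age")
        # Flag missing or biologically implausible ages
        if age is None or not 0 <= age <= 120:
            issues.append((i, f"implausible or missing age: {age!r}"))
        adm, dis = row.get("admission_date"), row.get("discharge_date")
        # Only compare dates when both are present
        if adm and dis and dis < adm:
            issues.append((i, "discharge date before admission date"))
    return issues
```

Because the report contains only row positions and short messages, it is also a more natural candidate for release from the Safe Haven than the data themselves, subject to the usual disclosure checks.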

Working within the Safe Haven environment creates time lags at both ends of the process: importing data into the Safe Haven at the start and exporting results at the end both take time. We hadn’t allowed for this lag when working to tight timelines.

Do you know if anyone is using the learning from this trial for future trials of this kind?

The High-STEACS trial was directly followed by the HiSTORIC trial, which addressed similar research questions and used many of the same data sources. So we have been through the loop again, which has made for a more streamlined process.

Other trials within ECTU are also making use of the lessons from High-STEACS, particularly on the governance and approvals side.


Thanks for sharing this with us, Catriona! It is great to see that administrative data are being used alongside clinical trials in Scotland. It is also interesting to hear that, despite being part of a trials unit like ECTU, the High-STEACS team still faced many of the same challenges that researchers and eCRUSADers have experienced when using administrative data for research. In particular, we can relate to the issues of permissions, timing and working within the Safe Haven environment. Overall, it seems that the timing issues arose from the use of locally held data rather than the national data.