Reading Time: 3 minutes
Alternatively titled : Why am I Being Given Money for the Next 3.5 Years?

Welcome to my blog, PREdiCCting IBD Outcomes, and thank you for finding this site! I am a PhD student and a biomedical data scientist based at the MRC Human Genetics Unit at the University of Edinburgh. I am going to be building statistical models which predict when a person with inflammatory bowel disease (IBD) is going to have a flare. This will improve our understanding of Crohn’s disease, ulcerative colitis, and inflammatory bowel disease unclassified (IBDU).

I have a long way to go first, however. I am incredibly fortunate to have joined an ongoing study, PREdiCCt. This meant I did not have to worry about starting a study: writing ethics proposals or considering many of the logistics involved in setting up a study. It did mean I had an awful lot of information thrown at me as soon as I started, however. PREdiCCt is a very large UK-wide study with involvement from 42 hospitals, the Wellcome Sanger Institute (sequencing), the University of Aberdeen (dietary expertise), and the University of Edinburgh.  As such, getting my head around the complexities of PREdiCCt has not been an easy task.

PREdiCCt is recruiting 3100 people with IBD who are in remission (1550 with Crohn’s, 1550 with colitis or IBDU). Shortly after recruitment, participants are asked to complete a questionnaire which informs us of the participant’s history, and forms the baseline: how the participant feels when they are in remission. Participants also provide stool and saliva samples which are sequenced so we can investigate genetics and metabolites. Participants are then asked to complete short monthly questionnaires which allow us to follow their progress. If a participant reports a flare, they are asked to provide another stool sample. Participants are followed across the course of two years. If you would like to find out more about PREdiCCt, please click on the About PREdiCCt tab at the top of this blog or visit the PREdiCCt website.

Having some of the data available almost immediately has been a huge boon. I have spent most of the first couple of weeks of my PhD understanding the data, and writing scripts (I primarily use the statistical programming language R). These scripts should become more useful once I have more data available. I have also been investigating where data is currently missing to assist the brilliant Spyros Siakavellas who is currently chasing up missing data and working on improving how data gets from various sources to our central databases. I have been experimenting with replacing the missing data with predicted values using a mixture of machine learning and classical statistical techniques.

My next goal is to conduct a review of past attempts to predict IBD flare. Ultimately, researchers stand on the shoulders of giants and it would not be wise to conduct research without knowing what work has been undertaken previously with the same goal. As an added bonus, this review will likely be included in my thesis: meaning I will already be chipping away at that daunting document! If you are interested in reading about my findings, please revisit this site in two weeks to read my next post. You can follow me on Twitter to be reminded when I post, or subscribe to my mailing list at the bottom of this page to get emailed when I post.

At present, I am still awaiting the data from the monthly follow-up questionnaires. Even without this data, it is expected there will be some interesting results from the data I do have: the baseline questionnaires completed by a participant when they join the study. There are many claims and beliefs about IBD, and this data should allow us to validate or dispute these claims.

However, the main focus of my project is building statistical models which will predict when an IBD patient will have a flare, and I should be able to create such models once I start getting the data from the monthly follow up questionnaires. I will work on making these models as accurate as possible whilst also finding which variables are most important for improving accuracy in my models. At a later date, I will get the results of the sequencing and can incorporate genetics and metabolites into my models. This should lead to an increased understanding in the role genetics plays in the progression of IBD.

The final goal of my project is to translate my research into medical interventions in collaboration with the Lees research group. This is certainly my most ambitious goal, but it would be a remarkable product of my PhD studies.

That is everything I have done thus far, and my expectations for the future of my project, in a nutshell. Of course, as always, “the best-laid plans of mice and men often go awry” and I am sure the plan for my project will change considerably due to unexpected factors. If you are interested in hearing about my research as I do it (no matter how according to plan it goes!) please do revisit this blog in two weeks (or subscribe for email alerts/ follow me on Twitter for reminders).

Share