A project funded by the STFC Hartree Centre Discovery Accelerator predicts patient response to treatment for ailments like ulcerative colitis and Crohn’s disease with 95% accuracy.

Artificial intelligence could soon help the more than 6 million [1] people around the world who suffer from inflammatory bowel disease (IBD) choose the best drug for their condition. Research just published [2] in PLOS ONE describes how an explainable AI pharmacogenomics workflow we developed accurately predicted how patients would respond, positively or poorly, to an IBD drug 95% of the time.

Chronic IBDs like ulcerative colitis and Crohn’s disease result from clinical, genetic, and environmental factors, such as diet and lifestyle. No single treatment is effective for all IBD patients, even those with the same symptoms, so choosing the best treatment is still a process of trial and error between doctor and patient.

With the backing of the STFC Hartree Centre’s Discovery Accelerator, our teams at IBM Research in the UK and REPROCELL, a stem cell and fresh tissue research company, combined IBD patient data and explainable AI techniques to investigate drug responses. This work was funded by the UK’s Hartree National Centre for Digital Innovation (HNCDI), an ongoing collaboration between IBM Research and the STFC Hartree Centre. Our goal was to help take the guesswork out of finding the best drugs for IBD treatments. The resulting set of algorithms showed it was possible to unlock the black box of IBD data, and to understand, predict, and explain how people suffering from IBD may respond to different drugs on the market, as well as in development.

Read the press release: REPROCELL co-author paper with IBM showcasing a novel ML driven precision medicine strategy for drug development.

Designing an explainable AI pharmacogenomics workflow

For our AI algorithms to generate explanations for their predictions, we needed IBD patient and drug data. REPROCELL provided tumor necrosis factor alpha (TNFα) release data from IBD patients’ fresh tissue samples, taken during preclinical drug candidate testing. TNFα measurements indicate levels of inflammation; the higher the TNFα level, the more inflammation — which translates to a worse drug response.
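
The link between TNFα release and drug response can be sketched as a simple labeling rule. This is only an illustration and not the procedure from the paper: the function name, the 50% reduction cutoff, and the sample values are all hypothetical.

```python
def label_response(tnf_without_drug, tnf_with_drug, min_reduction=0.5):
    """Label a tissue sample by how much the drug lowered TNF-alpha release.

    min_reduction is a hypothetical cutoff (fraction of baseline),
    not a value taken from the paper.
    """
    if tnf_without_drug <= 0:
        raise ValueError("baseline TNF-alpha release must be positive")
    reduction = (tnf_without_drug - tnf_with_drug) / tnf_without_drug
    return "positive" if reduction >= min_reduction else "poor"


# Made-up measurements in arbitrary units: (baseline, with drug)
samples = {"patient_a": (120.0, 30.0), "patient_b": (110.0, 95.0)}
labels = {
    pid: label_response(baseline, treated)
    for pid, (baseline, treated) in samples.items()
}
```

Here `patient_a` shows a 75% drop in TNFα release and is labeled a positive responder, while `patient_b` shows a drop of about 14% and is labeled a poor responder.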

Our model used the data from patients with lower TNFα levels, in the presence or absence of a drug, to “learn” from positive responses. It then combined the IBD patients’ multi-omic, demographic, and pharmacological data into an “explainable AI pharmacogenomics workflow,” highlighting which features had the most impact in predicting the effectiveness of different IBD drugs. “Multi-omic” refers to a large dataset that combines a number of genetic markers, including DNA and RNA sequencing of a patient’s transcriptome, which identifies which genes are being used or read.
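
One generic way to highlight which features matter most to a model is permutation importance: shuffle one feature at a time and measure how much accuracy drops. The sketch below is a stand-in for illustration and is not necessarily the explainability method used in the paper. The toy model, the feature names `marker_expression` and `age`, and the data are all hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, feature, repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature's values across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / repeats


# Hypothetical stand-in model: predicts a good response (1) when a
# made-up marker is low. It ignores age entirely.
def toy_model(row):
    return 1 if row["marker_expression"] < 0.5 else 0


rows = [{"marker_expression": i / 10, "age": 30 + 3 * i} for i in range(10)]
labels = [toy_model(r) for r in rows]  # labels the toy model fits perfectly

importances = {
    name: permutation_importance(toy_model, rows, labels, name)
    for name in ("marker_expression", "age")
}
```

Because the toy model never reads `age`, shuffling it changes nothing and its importance is zero, while shuffling `marker_expression` breaks the predictions and yields a positive importance.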

Overcoming overfitting

To overcome our data’s “high dimensionality” problem of having tens of thousands of genomic features describing a small set of 25 patients, we used two approaches for feature selection prior to model training.

  • First, we reduced the dimensionality of our dataset from 33,590 features down to about 40 genetic, demographic, and medicinal features using statistical association tests.
  • Second, we used biological domain knowledge and literature to select genetic variations known to be associated with Crohn’s and ulcerative colitis as input features. This, together with the application of standard cross-validation techniques, prevented overfitting.
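
The combination of association-based feature selection and cross-validation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's pipeline: the scaled difference in class means stands in for the statistical association tests, a nearest-centroid rule stands in for the actual model, and the data are synthetic (20 samples and 200 features, with only the first two features carrying signal). Features are re-selected inside every fold so the held-out patient never influences which features are chosen.

```python
import random
from statistics import mean, pstdev


def association_score(values, labels):
    """Absolute difference in class means, scaled by the overall spread."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pstdev(values) or 1.0  # avoid dividing by zero on constant features
    return abs(mean(pos) - mean(neg)) / spread


def select_features(X, y, k):
    """Return the indices of the k features most associated with the label."""
    scores = [association_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def nearest_centroid_predict(X_train, y_train, x, features):
    """Predict the class whose mean (on the selected features) is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, y in zip(X_train, y_train) if y == label]
        dist = sum((x[j] - mean(r[j] for r in rows)) ** 2 for j in features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_error_rate(X, y, k):
    """Leave-one-out cross-validation; features are re-selected in every fold."""
    errors = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        features = select_features(X_train, y_train, k)
        errors += nearest_centroid_predict(X_train, y_train, X[i], features) != y[i]
    return errors / len(X)


# Synthetic data: only features 0 and 1 depend on the label.
rng = random.Random(0)
y = [i % 2 for i in range(20)]
X = [[rng.gauss(10.0 * label if j < 2 else 0.0, 1.0) for j in range(200)] for label in y]

selected = sorted(select_features(X, y, k=2))
error = loo_error_rate(X, y, k=2)
```

Selecting features on the full dataset before cross-validating would leak information from the held-out samples and make the error estimate look better than it is, which is why the selection step sits inside the loop.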

Our results showed variations in drug effectiveness, based on TNFα levels, between patients who differed in demographic, medicinal, and genomic features. For example, we linked new genetic variations to patient responses after treatment with the anti-inflammatory drug BIRB 796, also known as doramapimod.

Our best model predicted the correct drug response, for better or worse, with an error rate of only 4.98% on unseen patients. These promising results put REPROCELL on the path toward their goal of eliminating the 72% of adverse drug reactions considered avoidable, which could help lower costs and reduce patient risk.
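
The 4.98% error rate and the headline figure of roughly 95% accuracy describe the same result, assuming accuracy is simply one minus the error rate. A quick check:

```python
error_rate = 0.0498            # reported error rate on unseen patients
accuracy = 1.0 - error_rate    # assumes accuracy = 1 - error rate
summary = f"{accuracy:.2%}"
```

Here `summary` is `"95.02%"`, which rounds to the 95% quoted above.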

Not only did our explainable AI accurately predict patient drug response, based on existing REPROCELL-provided data, but we used it to tell us why a patient might respond better or worse to certain treatments. For example, we identified which combination of genetic, medical, or demographic factors might make them respond in a particular way. We could see these predictive factors being developed into future biomarkers to help screen patients so they have a higher chance of getting the best drug for their IBD at the start of their treatment.

Our paper goes on to explain how the explainable AI workflow could be applied to predict other targets, and test other drugs or mechanisms of interest. Future work will involve the extension of this analysis to a larger set of patients to investigate the wider adoption of our approach and further showcase its impact.


  1. Silangcruz K, Nishimura Y, Czech T, Kimura N, Hagiya H, Koyama T, Otsuka F. Impact of the World Inflammatory Bowel Disease Day and Crohn’s and Colitis Awareness Week on Population Interest Between 2016 and 2020: Google Trends Analysis. JMIR Infodemiology. 2021;1(1)
  2. Gardiner LJ, Carrieri AP, Bingham K, Macluskie G, Bunton D, et al. (2022) Combining explainable machine learning, demographic and multi-omic data to inform precision medicine strategies for inflammatory bowel disease. PLOS ONE 17(2): e0263248.


By Anna Paola Carrieri | Laura-Jayne Gardiner | Graeme Macluskie
Source IBM
