How Data Sharing Can Upgrade AI for Pharma
AI for drug discovery has enormous potential; to unlock it, we need to reset our mindset
Robin Röhm | 5 min read | Opinion
It feels like we have been talking about the world-changing possibilities of using AI in drug discovery for decades. Yes, machine learning could potentially transform the way new medicines are developed – but it is fair to say that progress has been slow.
Discussions around AI typically focus on the technology and the need for more data to build the applications that can drive drug discovery forward. Although this is true up to a point, it is also misleading. To truly unlock the power of AI, we need a general shift in the collective mindset around data collaboration. The issue is not a shortage of data, but access to the data that already exists. There are terabytes of high-quality data that could advance the application of AI in drug discovery – the problem is that much of this information is siloed within pharmaceutical companies and clinical research organizations. In my view, we need more collaboration to accelerate the use of AI in the industry.
It goes without saying that the better and more diverse the data, the better the results – and the more efficient the entire drug discovery process becomes (with better outcomes for patients too). According to frequently cited research from the Tufts Center for the Study of Drug Development,(1) it takes up to 15 years to develop a new medicine, at a cost of around $2.6 billion from initial discovery through to approval. Worse still, only a small fraction of the drugs in development ever make it to patients.
Research from McKinsey found that the efficiencies attainable from scaling the impact of advanced analytics were the equivalent of between 15 percent and 30 percent of EBITDA (a measure of profit) over five years.(2) This positive impact increased to 70 percent over the course of a decade thanks to “predictive modeling in discovering and optimizing new blockbuster therapies”.
Enhanced data analytics through greater data collaboration could have a dramatic effect at the preclinical stage. For example, instead of relying on animal models, researchers could use more accurate prediction models that combine molecular and clinical data. This approach would allow medicine makers to plan clinical studies more accurately and glean, at an earlier stage, whether trials might fail. To get there, we need collaboration between pharma companies that have already run clinical trials – and that, unfortunately, is where commercial rivalry becomes a serious problem.
Despite these challenges, we are seeing more collaboration across the industry. For example, the MELLODDY project brought together 10 of the biggest pharmaceutical companies to “enhance predictive machine learning models on distributed data in a privacy-preserving way”, while the Pistoia Alliance has been advocating for greater collaboration since 2009. The alliance, which has more than 100 members including Pfizer, AstraZeneca, and GlaxoSmithKline, has created a Centre of Excellence in AI to help the pharmaceutical industry overcome obstacles such as access to data and skills as companies continue to incorporate AI into their businesses.
However, these examples are still relatively isolated. Why? Failure to collaborate is typically the result of fears over privacy or the sharing of commercially sensitive data with rivals, but the cost and time commitment required to share data across multiple jurisdictions can also be a factor.
Federated data platforms for the creation of collaborative data ecosystems would enable multiple organizations to share and extract value from each other’s decentralized datasets in a way that helps them overcome regulatory, technical, and commercial challenges. In a healthcare context, this means organizations can safely work with each other’s data – including sensitive patient information – without it ever leaving its secure environment.

Take, for example, a pharma company in partnership with a genomic laboratory that holds data from across a given jurisdiction. Much of this data cannot be shared because of patient privacy laws. On a federated data platform, however, the data never leaves its own secure server – so the two partners could still develop models to better identify targets for new medicines. At the next stage of the value chain – lead generation – collaboration enables researchers to better predict target structure and binding, using external data and proprietary models to complement their existing libraries, reducing trial and error and, therefore, cutting time and cost.
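The core idea – that each partner trains locally and shares only model parameters, never raw patient data – can be illustrated with a minimal federated-averaging sketch. Everything here is hypothetical (the two "sites", their data, the simple least-squares model); it is not the API of any real federated platform, only an illustration of the pattern.

```python
# Minimal sketch of federated training: each partner computes an update
# on its own private data, and only model weights leave each site.
# All names and data below are hypothetical illustrations.

def local_update(weights, local_data, lr=0.1):
    """One gradient-descent step on a partner's private (x, y) samples,
    fitting the simple model y ~ weights * x."""
    grad = sum(2 * (weights * x - y) * x for x, y in local_data) / len(local_data)
    return weights - lr * grad

def federated_round(global_weights, partners):
    """Each partner trains locally; the coordinator averages the weights."""
    updates = [local_update(global_weights, data) for data in partners]
    return sum(updates) / len(updates)  # federated averaging

# Two hypothetical sites whose private samples follow y = 2x
site_a = [(1.0, 2.0), (2.0, 4.0)]
site_b = [(3.0, 6.0), (4.0, 8.0)]

w = 0.0
for _ in range(50):
    w = federated_round(w, [site_a, site_b])

print(round(w, 2))  # -> 2.0: the shared slope, learned without pooling data
```

The point of the sketch is the information flow: the coordinator only ever sees weights (here, a single number), never the `(x, y)` records held at each site. Real platforms layer secure aggregation and access controls on top of this basic loop.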
Access to this data – whether sourced from pharma companies, genomic laboratories, or hospitals – allows filtration and prioritization. Advanced analytics are necessary to recognize patterns and identify the information contained in the various data points, but to properly understand what matters we need varied, high-quality datasets. That is where multiple data sources are advantageous.
Participating in a data ecosystem enables companies to work with more data than they hold in-house, which supports their AI and analytics programs. And by using federated data platforms, it is possible to collaborate in a controlled way, opening up possibilities for commercial rivals and other organizations to work together, while ensuring that the owner of the data retains granular control over who has access and for what purpose. Data collaboration during clinical trials can help pharma companies identify the most suitable patients and understand where in the world trials will be most effective. Such clinical operations data can help businesses understand their patients better and plan accordingly.
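The "granular control over who has access and for what purpose" mentioned above is, at its simplest, a registry of purpose-bound grants that every query is checked against. The sketch below is a hypothetical illustration of that idea – the names, classes, and purposes are invented, not the interface of any actual federated platform.

```python
# Hypothetical sketch of purpose-bound access control: the data owner
# grants each partner access to a dataset only for a named purpose,
# and every request is checked against those grants.
from dataclasses import dataclass

@dataclass(frozen=True)
class Grant:
    partner: str
    dataset: str
    purpose: str

class PolicyRegistry:
    def __init__(self):
        self._grants = set()

    def allow(self, partner, dataset, purpose):
        """Data owner records a grant for one partner, dataset, and purpose."""
        self._grants.add(Grant(partner, dataset, purpose))

    def is_permitted(self, partner, dataset, purpose):
        """A request succeeds only if an exactly matching grant exists."""
        return Grant(partner, dataset, purpose) in self._grants

policy = PolicyRegistry()
policy.allow("pharma_co", "trial_cohort", "target-identification")

print(policy.is_permitted("pharma_co", "trial_cohort", "target-identification"))  # True
print(policy.is_permitted("pharma_co", "trial_cohort", "marketing"))              # False
```

Because permission is keyed on purpose as well as identity, the same partner can be allowed to use a dataset for target identification while being refused access to it for anything else – which is the kind of control that makes rivals willing to participate at all.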
In short, enhanced access to data will drive AI analytics development, improving the drug discovery process and building a better healthcare ecosystem for all.