Why You Should Care More About Bioinformatics
Applied correctly, data can do almost anything
Angus Stewart | 6 min read | Interview
Dan Elgort has worked at the crossroads of biology and computer science for his entire career – even as far back as his academic studies, where he focused on applying computational science to medical imaging technology. A first job with Philips Research Labs eventually led to his becoming the head of a healthcare data and analytics team, largely focused on oncology. In January 2022, Elgort became Chief Data and Analytics Officer for M2GEN.
We sat down with Dr. Elgort to discuss the basics of bioinformatics in pharma – and why it matters to personalized medicine, cancer therapy, and data security.
Could you give us a quick bioinformatics 101?
In short, bioinformatics is the use of computational techniques to collect, store, organize, and analyze biological and healthcare data. Perhaps the most common and recognizable form of biological data is derived from DNA sequencing. Most people reading this will know that DNA consists of four bases, abbreviated as A, C, G, and T. Since the completion of the Human Genome Project in 2003, laboratory scientists have been able to generate sequences with hundreds of billions of As, Cs, Gs, and Ts.
Though genomic and associated biomolecular data provided a starting point for bioinformatics, the incorporation of healthcare data has greatly expanded the overall potential of the field. Medical informatics includes patient records, healthcare claims, clinical outcomes, and so on. By integrating this with biological data, we can reveal key insights to inform patient care and invent new treatments.
As technology has improved, so has our ability to collect, store, and analyze data. That has led to significantly larger datasets, which in turn allow researchers to ask increasingly sophisticated questions. We are at something of an inflection point and must seriously consider the infrastructure supporting our datasets: how do we use modern technology to scale the methods and tools we need to deploy in this context?
One major challenge in the field is standardization. With the exponential growth of bioinformatics data and analytics, a sophisticated and rigorous approach to organizing this data is absolutely essential. Without a standardized approach, we will struggle to leverage and scale datasets for meaningful analysis and decision making. The bioinformatics field, as a whole, needs to develop rigorous definitions of medical/healthcare data elements.
What does bioinformatics look like when applied to personalized medicine?
Personalized medicine is the creation of unique therapy recommendations based on the available data from the individual patient – made in the context of data collected across large patient populations.
For personalized medicine to work, you need access to a very large dataset composed of patients that span a diverse set of profiles and cohorts. Only after the acquisition of considerable volumes of data can you start to tease out the patterns and structures that can be applied at the individual level.
At that point, you can ask interesting questions; for example, what characteristics distinguish two patients who had similar diagnoses and received the same kind of care, but experienced different outcomes?
To answer that question, you often need to bring in genetic data. It has been well established that many forms of cancer are driven by genetic mutations that lead to cellular overgrowth and tumor development. By collecting genetic data from cancer patients, we can map the causative mutations for tumor growth. As we collect more data from patients, our confidence in this mapping process increases. This process is ongoing and continuously informed by new data.
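As a purely illustrative sketch of the comparison described above (not M2GEN's method), the Python below groups toy patient records by diagnosis and treatment, then counts which mutations appear under each outcome. Every record, field name, and gene label is made up for the example.

```python
from collections import defaultdict

# Made-up toy records; every field name and value here is an assumption.
patients = [
    {"id": "P1", "diagnosis": "NSCLC", "treatment": "drug_a",
     "outcome": "responded", "mutations": {"GENE_X"}},
    {"id": "P2", "diagnosis": "NSCLC", "treatment": "drug_a",
     "outcome": "progressed", "mutations": {"GENE_Y"}},
    {"id": "P3", "diagnosis": "NSCLC", "treatment": "drug_a",
     "outcome": "responded", "mutations": {"GENE_X", "GENE_Z"}},
    {"id": "P4", "diagnosis": "melanoma", "treatment": "drug_b",
     "outcome": "responded", "mutations": {"GENE_Z"}},
]


def mutation_counts_by_outcome(records):
    """For each (diagnosis, treatment) cohort, count how often each
    mutation appears under each outcome."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for record in records:
        cohort = (record["diagnosis"], record["treatment"])
        for gene in record["mutations"]:
            counts[cohort][record["outcome"]][gene] += 1
    return counts


counts = mutation_counts_by_outcome(patients)
for cohort, by_outcome in counts.items():
    if len(by_outcome) > 1:  # only cohorts where outcomes actually differed
        for outcome, genes in by_outcome.items():
            print(cohort, outcome, dict(genes))
```

Counts from a handful of records say nothing on their own. The sketch only shows the shape of the question, which becomes meaningful at the dataset sizes discussed above and with proper statistical testing.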
Ultimately, as we become more informed – and deliberate – about the treatments we prescribe to patients, we improve prospective treatment decisions and patient outcomes.
What role does M2GEN play in identifying patients who are eligible for targeted cancer therapies?
M2GEN serves as the operational arm for the Oncology Research Information Exchange Network (ORIEN) – an alliance of 18 US cancer centers recognized by the National Cancer Institute. ORIEN members span the US – from Florida to California, and from New Mexico to New Hampshire. When patients are treated at these centers, they are given the opportunity to opt into the ORIEN program via our Total Cancer Care (TCC) protocol. TCC is a uniform research protocol in which patients agree to contribute tissue samples for sequencing and any relevant clinical data. To date, over 325,000 patients have given lifetime consent to participate in this longitudinal study.
It is through this partnership and the consented arrangement with patients that we have been able to build one of the largest resources of clinico-genomic data in oncology. This resource consists of extensive clinical findings matched with deep genomic information, including whole exome genetic sequencing from germline and tumor tissue and RNA expression data.
Right now, our main priority is to use this resource to help our partners advance oncology drug development and discovery research. We are also starting to use this data and the ORIEN consortium to support clinical trials.
What form does your software take – and what data privacy concerns must you address?
Our software is primarily deployed as custom analytics scripts, used either to support oncology researchers working with M2GEN’s data or to manage M2GEN and ORIEN’s data operations and curation. Our team also provides analysis services and other products to deliver insights needed to drive research and drug development.
Protecting patient healthcare data is extremely important to us. Patients entrust us with their information, and it is our responsibility to keep these data secure. Unfortunately, everyone is a potential target for hackers. Broadly speaking, a security arms race has emerged: strategies to breach IT networks have become more advanced, necessitating stronger security protocols.
One important way we address this concern is by de-identifying the data captured from patients at the start of the process. Patients are assigned unique identifying numbers, so no database used for oncology research holds records containing personal information. Our team is dedicated to protecting all data assets, and we align with the ISO 27001:2013 ISMS framework – an internationally recognized compliance framework – to manage and mitigate risks.
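To make the idea of assigning unique identifying numbers concrete, here is a minimal Python sketch of one way it could work. It is an assumption-laden illustration, not a description of M2GEN's pipeline: the field names, the `deidentify` helper, and the choice of random hexadecimal study IDs are all hypothetical.

```python
import secrets

# Hypothetical list of direct identifiers; a real schema will differ.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address", "phone"}


def deidentify(records):
    """Split records into research rows keyed by a random study ID and a
    separate linkage table mapping study IDs back to the identifiers."""
    research_rows = []
    linkage = {}
    for record in records:
        study_id = secrets.token_hex(8)  # random, not derived from the patient
        while study_id in linkage:  # regenerate on the (unlikely) collision
            study_id = secrets.token_hex(8)
        linkage[study_id] = {
            key: value for key, value in record.items()
            if key in DIRECT_IDENTIFIERS
        }
        row = {
            key: value for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS
        }
        row["study_id"] = study_id
        research_rows.append(row)
    return research_rows, linkage


# Made-up example records.
patients = [
    {"name": "Jane Doe", "date_of_birth": "1960-01-01",
     "diagnosis": "NSCLC", "stage": "II"},
    {"name": "John Roe", "date_of_birth": "1955-05-05",
     "diagnosis": "melanoma", "stage": "III"},
]
research_rows, linkage = deidentify(patients)
for row in research_rows:
    print(row)
```

In this sketch the research rows carry only the study ID and clinical fields, while the linkage table would be stored separately under much tighter access control. Real de-identification covers far more than a few named fields, including indirect identifiers that can re-identify a patient in combination.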
Can you share any case studies of M2GEN’s work in the field?
One of the most prominent case studies involves the expansion of Merck’s PD-1 checkpoint inhibitor drug, Keytruda. When Keytruda was approved for non-small cell lung cancer (NSCLC) in 2016, M2GEN partnered with Merck to determine other forms of cancer in which patients would benefit from Keytruda therapy. Our work identified over 30 tumor types to pursue, leading to additional clinical studies to confirm these findings. Keytruda is now approved for 19 unique forms of cancer.