A Little Byte of Life
Industry and finance went global decades ago. Has the time come for genome data to do the same?
Thorben Seeger, Chief Business Development Officer at UK-based precision medicine software company Lifebit, wants to democratize the data held by the world’s leading genomic institutions. But who exactly is partnering with Lifebit? What does the word “democratize” mean here? And, more importantly still, how can pharma and bioscience benefit? Here’s what Seeger had to say.
How did you join Lifebit?
I came to London in 2008, in the heat of the financial crisis. Actually, I was working right in the thick of it – on the trading floors of companies like Morgan Stanley. There, my focus was on using data to work out solutions for major financial institutions in Germany and Austria.
Many years later, I moved into the biotech space to join Lifebit. Before this move, I saw the financial industry make very effective use of widely available big data, which we often refer to as "democratized" data. In the life sciences it is harder to see such democratization in action, and I believe that lack is holding the industry back. At Lifebit, I'm doing my small part to help bridge science and medicine by working to connect and open up all the data that could be available to scientists developing potential treatments for unmet medical needs.
Lifebit was founded in 2017 by Maria Chatzou Dunford and Pablo Prieto Barja, who had worked together at the Center for Genomic Regulation (Centro de Regulación Genómica, CRG). They wanted to help end the siloing of genomics data, and felt that the best vehicle to tackle this difficult and rather complex problem would be a private company.
Where does Lifebit sit in the wider context of science and technology?
In recent years, a good proportion of the life sciences has moved into the “dry lab.” This marks quite a shift from the traditional approach, in which researchers conduct experiments in labs, make potentially useful findings, and take them into clinical trials.
Today, whole genome sequencing at affordable prices has generated a vast amount of data, which now exists alongside clinical data. Time and time again, it has been shown that drug programs backed by this kind of human genetic evidence are roughly twice as likely to win regulatory approval. Bioinformatics, writ large, can combine data science and large-scale computing with traditional biomedical approaches to vastly improve our understanding of diseases.
Where does this huge amount of data come from, and what can it be used for?
I’ll lay out a very real example – the UK government’s research response to COVID-19.
A partnership between the GenOMICC COVID-19 Study and Genomics England was set up and publicly funded in 2020 (1). It was tasked with collecting 20,000 whole genomes from patients with severe COVID-19 in over 200 intensive care units across the country, and then combining those with 15,000 whole genomes from patients with milder cases of the disease. (As it happens, I had a mild case of COVID myself, so my data is in that collection.)
Genomics England is owned by the British government's Department of Health and Social Care. It safeguards its participants' data and regulates access to it. The company's purpose, of course, is to look into the genome, especially its "spelling mistakes," such as mutations, and their role as "biomarkers" for specific diseases. These data will help isolate the factors behind why, for example, some people develop such severe cases of COVID-19 while others have only mild symptoms. In 2021, work by GenOMICC and Genomics England published in Nature demonstrated one such breakthrough, in which five novel biomarkers contributing to the severity of COVID were identified (2).
Such research into genome data is relevant to many diseases and could help the industry develop better treatments. The pipeline should start with the patient, in the form of data collection, and end with the patient, in the form of a treatment.
Why does Lifebit partner with governments?
Generating large cohorts of patient data requires serious financial investment, effort, and cohesion. It's a large-scale affair that carries heavy burdens of trust and responsibility. That's why most of the world's best genomics institutes are owned and funded by governments. Authorities like these can act as protectors of their citizens' data, earning a degree of initial trust that private companies might find harder to establish.
Genomics England, for example, was set up to deliver the UK-wide 100,000 Genomes Project, which then Prime Minister David Cameron announced in 2012, three years after his son Ivan died of a rare disease. Its aim was to create evidence and research assets that would help humanity tackle these diseases more effectively. At Lifebit we partner with organizations like this, but also with other international public sector organizations, as in our work with the Hong Kong Genome Institute, which is funded by the Hong Kong Special Administrative Region government.
Conversely, I should point out that we are also seeing the emergence of private biobank initiatives in underserved and developing areas of the world; for example, parts of Africa and Asia where national governments don’t have the resources to manage such projects themselves. We see this as a positive development, as our mission is to connect the world’s datasets and bring about global connectivity. Part of that goal necessarily involves gathering genome data that represents populations across the world. Multicultural countries like the UK might be a start, but, to really achieve ethnic diversity, we need to be gathering data worldwide – from a diverse range of institutions. Right now, we’re talking to standout organizations all over the world to make sure this happens.
Do you draw on alternative sources for gathering genomic data?
We have a long-term AI partnership with Boehringer Ingelheim that allows us to make use of real-world data. Such data is not "locked away" in the traditional sense; rather, it is typically rendered unusable by its sheer volume. The upside of this data is that much of it is publicly available on the Internet. One can find it in everything from scientific publications and specialist forums to the depths of Twitter and Reddit. Of course, the scale and distribution of such information is far too overwhelming for any one person, or even a team, to absorb in any reasonable timeframe. But technologies such as natural language processing and deep learning can step in to rapidly parse potential sources and understand the context of particular elements within them. For example, Boehringer Ingelheim has used our AI technology to detect when a new disease emerges, or when a disease spreads between two different global regions. Such information allows the company to run an early warning system of sorts, fueled by vast amounts of non-sensitive public information.
Looking ahead, what are your hopes for biotechnology?
I foresee exponential development. I expect exponential research to leverage exponential data, producing exponential efficacy and data for novel drug development. I expect faster diagnostics and more personalized therapeutics to drive more national programs, such as Genomics England and the Hong Kong Genome Project, and I expect even more action in the private sector. Acceleration is underway, and seeing it fills me with hope.
All that said, we should remember that acceleration is just a means to an end. Nobody needs sequencing machines. Nobody even needs data. People need the insights to create better diagnostics, treatments, and drugs: the end that the means should arrive at. Simple. Democratization through the dissemination of data can achieve this. Giant pharma companies do incredible things, but the smaller and more diverse biotechs of this world could work wonders too, if given proper access to the right data.
I know where I’m pinning my hopes.
- UKRI, "Five genes identified that could be key to new COVID-19 treatments," ukri.org (2020). Available at: https://bit.ly/OMICC2020
- E Pairo-Castineira et al., "Genetic mechanisms of critical illness in COVID-19," Nature, 591, 92-98 (2021). DOI: 10.1038/s41586-020-03065-y