The 100,000 Genomes Project
An ambitious UK sequencing project aims to learn more about patients with cancer and rare diseases
The Human Genome Project was declared complete in 2003 to great applause from the scientific community. But then a big question quickly presented itself: how can we use the data? Time to think big.
The 100,000 Genomes Project was launched by Genomics England in 2014 with the aim of sequencing and analyzing 100,000 genomes from patients and their families affected by cancer or rare diseases. We found out more from David Bentley, vice president and chief scientist at Illumina, who is leading a team at Illumina in Cambridge to help bring genome sequencing to the bedside, in partnership with Genomics England.
It can’t be as easy as it sounds – can it?
The time is right to do it and the concept is easy to grasp, but we must remember this is the first time in the world that a project of this scale has been attempted. Several countries and organizations have been deliberating on this idea for some time, but the UK is first to take the plunge. It’s difficult being right at the forefront because every problem you come across is new – and you have to solve it.
The technology we’re using is Illumina’s HiSeq X Ten sequencing system. Our sequencing technology launched in 2006, and eight years is not long for a technology to mature, even though from the outset it was much faster and cheaper than anything available before. It’s still early days, and the technology has never been used at this sort of production level. The transition from academic, pure research to robust production testing for healthcare and population research is another first.
It’s not just about instrumentation. The project requires a huge infrastructure to track the samples collected from hospitals and regional centres, log every process and QC step, and monitor how we analyse the data afterwards. The scale of the 100,000 Genomes Project demands a level of process engineering well beyond what the original research pipeline required. It’s at the cutting edge, though, and that is what is so exciting about it.
It’s a brave move from the UK government, and the Prime Minister, David Cameron, deserves some credit. He understands the importance of this project and the personal impact it can have on families affected by genetic diseases whose cause no one understands. Genome sequencing could help solve that.
How did the project get started?
Almost all disease has some genetic component. In some cases this is obvious because the disease runs in families, as in classic familial genetic disorders; in others it is more complex, like the genetics underlying breast cancer. Because genetics plays some part in almost every disease, we would ultimately have to develop an almost infinite number of different tests to cover them all. Instead, the idea behind this project is to sequence the whole genome of each patient and learn how to extract the clinically useful (or actionable) information for each case.
The starting points for the 100,000 Genomes Project are the collection of patients and their clinical information, the sequencing technology, and the information that came from the human reference sequence created by the Human Genome Project (HGP). The HGP promised a great deal, and many said early on that it had not delivered on this promise, but I believe people need to understand that it can take a long time to develop the necessary understanding and all the tools needed to make proper use of the reference sequence. We have a fantastic human genome sequence; it’s just that we didn’t have the right tools to use it at the beginning. Our sequencing technology can help by enabling us to apply the knowledge of the reference genome to any individual in one or a few days, for about $1000 per genome.
And as for exactly how we got started, two very influential people (although there are many others too) were Sir John Bell, who is the Regius Chair of Medicine at Oxford University, and Dame Sally Davies, who is Chief Medical Officer for England. My own group was involved in doing some clinical collaborations with them and others to investigate whether you could find genetic mutations by whole genome sequencing and explain cases of undiagnosed diseases or cancer. The examples developed with various laboratories in the UK and the US captured the imagination of all of us, and it all started to come together.
The government formed a company, Genomics England, to take direct responsibility for moving the 100,000 Genomes Project forward. This was a very astute move. Genomics England is a non-profit company charged with getting the project up and running. Since its formation, Genomics England has scoured the world for technologies and systems that could be useful, compared them, and managed the contracting processes. In August 2014, Genomics England agreed a partnership with Illumina to deliver the genome sequencing for the project.
How has technology advanced since the Human Genome Project?
When I was a PhD student I did manual sequencing using Fred Sanger’s method. I sequenced one piece of DNA in a test tube, and if I wanted to sequence four pieces, I used four test tubes. The number of sequences I could run at once was limited by the number of tubes I could handle. Fast forward to the Human Genome Project, which used machines that could process a hundred fragments at a time. Now, with our technology, we can sequence five billion fragments at once in a single run on one HiSeq X machine. We take the DNA from an individual sample and break it into pieces; then we wash that solution across a microscope slide, and the DNA molecules attach to the glass surface. We can get billions of them on one surface. Then we sequence all of the molecules in parallel in a single reaction.
The chemistry behind our sequencing technology hasn’t really changed, but we have gone to much higher densities. When we first launched the technology it could sequence a million fragments; now it can sequence five billion. You essentially sequence all the fragments in parallel on the microscope slide. As each new strand is synthesized, a colour tag is added to represent each base, and then you simply take a photograph of the microscope slide at every cycle. Each photograph captures all five billion fragments at once. It’s a big step up from those four tubes I used to handle...
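To make the "one photograph per cycle" idea concrete, here is a toy sketch of how a read could be built up from those images. It assumes a simple four-colour scheme in which each channel corresponds to one base, and it calls the brightest channel for each fragment cluster in each cycle. The cluster names and intensity values are invented for illustration; real base calling also corrects for phasing, channel cross-talk and signal decay, none of which is modelled here.

```python
from typing import Dict, List

# Assumed four-colour scheme: channel 0..3 corresponds to A, C, G, T.
CHANNEL_TO_BASE = ("A", "C", "G", "T")


def call_bases(cycles: List[Dict[str, List[float]]]) -> Dict[str, str]:
    """Build one read per fragment cluster from per-cycle images.

    cycles[i][cluster_id] holds the four channel intensities measured for
    that cluster in the photograph taken at cycle i. The brightest channel
    is taken as the base added in that cycle.
    """
    reads: Dict[str, List[str]] = {}
    for image in cycles:  # one photograph covers every cluster at once
        for cluster_id, intensities in image.items():
            brightest = max(range(4), key=lambda ch: intensities[ch])
            reads.setdefault(cluster_id, []).append(CHANNEL_TO_BASE[brightest])
    return {cid: "".join(bases) for cid, bases in reads.items()}


# Hypothetical intensities for two clusters over three cycles.
cycles = [
    {"cluster1": [0.9, 0.1, 0.0, 0.1], "cluster2": [0.0, 0.1, 0.8, 0.2]},
    {"cluster1": [0.1, 0.1, 0.1, 0.7], "cluster2": [0.9, 0.0, 0.1, 0.0]},
    {"cluster1": [0.0, 0.95, 0.1, 0.0], "cluster2": [0.1, 0.2, 0.1, 0.6]},
]
reads = call_bases(cycles)
print(reads)  # {'cluster1': 'ATC', 'cluster2': 'GAT'}
```

The outer loop mirrors the instrument: every cycle adds one base to every cluster at the same time, so the cost grows with read length rather than with the number of fragments.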
How difficult is data interpretation?
A genome has three billion bases, and between three and four million of those differ from one person to the next, so we don’t have to analyse everything. What we need to look at are the bases that differ between diseased and non-diseased individuals. Computer systems and software can then attach meaning to those differences, so you can discover which mutations occur within cancer genes or genes that may cause a genetic disease.
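The two steps described here, finding where an individual differs from the reference and then asking whether those differences fall in genes of interest, can be sketched in a few lines. This is a deliberately simplified illustration, not the project’s pipeline: it assumes the sample sequence is already aligned base-for-base to the reference with no insertions or deletions, and the gene name and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Variant = Tuple[int, str, str]  # (position, reference base, sample base)


def find_snvs(reference: str, sample: str) -> List[Variant]:
    """List every position where the sample base differs from the reference.

    Assumes the sample is already aligned base-for-base to the reference,
    with no insertions or deletions (real pipelines cannot assume this).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(pos, r, s) for pos, (r, s) in enumerate(zip(reference, sample)) if r != s]


def annotate(variants: List[Variant],
             genes: Dict[str, Tuple[int, int]]) -> List[Tuple[int, str, str, List[str]]]:
    """Attach the names of genes whose [start, end) interval contains each variant."""
    annotated = []
    for pos, ref, alt in variants:
        hits = [name for name, (start, end) in genes.items() if start <= pos < end]
        annotated.append((pos, ref, alt, hits))
    return annotated


reference = "ACGTACGTACGTACGT"
sample = "ACGTACCTACGTACGA"
genes = {"GENE_X": (4, 10)}  # hypothetical gene name and coordinates
for pos, ref, alt, hits in annotate(find_snvs(reference, sample), genes):
    print(pos, f"{ref}>{alt}", hits if hits else "outside known genes")
```

Only the differing positions are carried forward, which is why a three-billion-base genome reduces to a few million candidates before any interpretation begins.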
We can sequence a genome in a day and then analyse and interpret the results in eight hours. In some situations, we can make the diagnosis in about 10 minutes, for example if it is a disease we know a lot about and it fits a clear inheritance pattern. Clearly, it’s not always so easy: cancer and many genetic diseases are highly complex, and in many cases we know much less about the genetic factors that influence disease onset.
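Why does a clear inheritance pattern speed things up so much? One common approach, sketched here as an assumption rather than a description of the project’s software, is to sequence an affected child together with both parents and keep only the variants whose genotypes fit the suspected pattern. The sketch assumes the parents are unaffected and that each person’s genotype has already been called as 0, 1 or 2 copies of the variant; the variant IDs are hypothetical. Real filtering would also weigh population frequency, predicted effect on the protein, and compound heterozygosity.

```python
from typing import Callable, Dict, List, NamedTuple


class TrioGenotype(NamedTuple):
    """Number of copies (0, 1 or 2) of the variant allele carried by each person."""
    child: int   # affected
    mother: int  # assumed unaffected
    father: int  # assumed unaffected


def fits_recessive(g: TrioGenotype) -> bool:
    # Affected child has two copies; each unaffected parent is a carrier.
    return g.child == 2 and g.mother == 1 and g.father == 1


def fits_de_novo(g: TrioGenotype) -> bool:
    # Child has one copy that neither parent carries (a new mutation).
    return g.child == 1 and g.mother == 0 and g.father == 0


def filter_variants(calls: Dict[str, TrioGenotype],
                    model: Callable[[TrioGenotype], bool]) -> List[str]:
    """Keep only the variant IDs whose trio genotypes fit the inheritance model."""
    return [vid for vid, g in calls.items() if model(g)]


calls = {
    "var1": TrioGenotype(child=2, mother=1, father=1),
    "var2": TrioGenotype(child=1, mother=1, father=0),
    "var3": TrioGenotype(child=1, mother=0, father=0),
}
print(filter_variants(calls, fits_recessive))  # ['var1']
print(filter_variants(calls, fits_de_novo))    # ['var3']
```

Passing the inheritance model in as a function keeps the filter itself unchanged when a clinician wants to test a different hypothesis about how the disease is inherited.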
The difficult bit is interpreting the genome for the patient, coming up with a definite diagnosis or a few options to follow up on, and then trying to arrive at a definitive action. That action may be explaining the findings to the family, counselling about family planning for future children, or prescribing a particular drug; genomes contain all kinds of actionable information.
The project is in the ‘pilot stage’ at the moment. What’s next?
There are two areas we will be focusing on: rare diseases and cancer. We have already collected data from some patients, and the results are now being eagerly reviewed. They will serve as an initial data set to work out how best to do both diagnosis and discovery. They also help us put in place the high-throughput pipeline that is needed on top of the instruments themselves. This pipeline, which has to take in DNA and produce an annotated genome, has undergone initial validation. The data are being deposited at a secure data centre managed by Genomics England.
Is it difficult to find the people with the right skills?
I hope this project will motivate many more people to take up training and education programmes in these areas. Genomics England and the 100,000 Genomes Project have the chance to really stimulate training, and possibly the creation of new training courses. Just as informatics grew and became much more prominent in education some years ago, the same needs to happen with genomics and genomic medicine.
Will the project kickstart R&D in the pharma industry?
The pharma industry has the potential to benefit hugely from the 100,000 Genomes Project. The project has two main goals: diagnosing the conditions we already know about, and a discovery programme to learn more about the diseases we don’t yet understand well. Initially, many of the findings in genomes will be difficult to explain or understand, but the research community needs access to the data so that the discovery process takes off, adds value to the project and derives important new insights from it. When researchers start to sift through the data as they become available, they will definitely find new patterns of genetic mutation, some of which may identify new targets for drug development and so generate new leads for the pharma industry.
Providing pharma companies with access to the HGP reference sequence or the 100,000 genome sequences is not enough. It is also really important to provide the clinical information associated with each genome; this is the role of the Genomics England clinical interpretation network that is part of the 100,000 Genomes Project. Of course, the clinical information needs to be provided in anonymised form to protect the privacy of all the individuals involved, while still being useful for research.
What are your hopes?
I really do believe that this project will achieve a long-held aspiration: introducing precision medicine and making it an effective part of healthcare. Using information from each genome and each patient, together with the aggregate results of the 100,000 Genomes Project, will massively increase the precision with which we understand and diagnose diseases of all kinds, and it will help doctors every day when they need to make diagnoses and clinical decisions.
I’ve been involved in sequencing technology and development for around 35 years. Before I joined the industry I was an academic medical researcher for 25 years. For part of that time, I worked in a hospital, using molecular biology to identify mutations that cause genetic disease. Now I’ve come full circle and I’m seeing our technology being applied directly to help patients and doctors. It is always very exciting to be so close to medical care.
Meeting patients makes it all feel very real; I can see a real benefit to my work. Every one of those 100,000 genomes is a person, not a number. It’s very stimulating, and it makes you realise why you are doing a project like this.
For more information about the 100,000 Genomes Project, visit www.genomicsengland.co.uk