Illuminating the “Dark Regions” of the Genome

Uncovering the most complex parts of the genome could accelerate diagnoses, support the development of personalized treatment plans, and identify broader trends in population health. Only in the past few years have we been able to shine a light on these dark regions. Until 2022, the human genome sequence was just 92 percent complete; the remaining 8 percent was sequenced by the Telomere-to-Telomere (T2T) consortium, an achievement made possible by new “third generation” genomic sequencing technology. As sequencing technologies become more accurate and more accessible to research institutions of all sizes and budgets, the dark side of the genome will be further illuminated, driving innovation across human health.

Unlocking the dark regions
 

The completion of the human genome can be credited to a type of technology known as long-read sequencing. This technique preserves the integrity of DNA molecules by sequencing them in long fragments that span thousands of base pairs. Traditional short-read sequencing, by contrast, typically reads fragments of only 50 to 300 base pairs. By avoiding fragmentation, long-reads capture the larger, more complex stretches of DNA that short-reads miss – illuminating previously “dark” regions. Because the pieces of DNA are much larger, assembly and data interpretation are also simpler, providing a more accurate and complete genomic picture.

While short-reads were once favoured for their lower price and faster turnaround, advances in long-reads mean the technology not only matches these benefits but also delivers the high level of accuracy needed for dark regions. The latest sequencing machines have made it possible to sequence a whole human genome for less than $500 – a more than five-million-fold decrease from the $2.7 billion cost of the first human genome. Additionally, long-read sequencers once required significant lab space and high-quality DNA samples. Today’s machines are far more compact and need only a quarter as much DNA to produce the same quality of genomic data.

Tandem repeat expansions are one example of a dark and challenging type of genetic variation. These repeating units of DNA sequence have been linked to several diseases, including amyotrophic lateral sclerosis, Friedreich’s ataxia, and Huntington’s disease. Short-read technologies struggle to span the full length of tandem repeats, making it difficult to reconstruct their true size and sequence. Think of it as solving a jigsaw puzzle where many pieces look identical. Without unique landmarks, the fragments of the genome can get misaligned or merged, leading to inaccurate analyses.
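The jigsaw problem can be made concrete with a toy simulation. The sketch below is illustrative only and is not how any real sequencing pipeline works: the anchor sequences, the CAG repeat unit, the repeat count of 200, and the read lengths are all made-up assumptions, and no sequencing errors are simulated. It shows that a read can only size a repeat when it contains the unique sequence on both sides of it.

```python
from typing import List, Optional, Set

# All sequences and sizes below are made up for illustration.
LEFT_ANCHOR = "ACGTTGCA"    # hypothetical unique sequence just before the repeat
RIGHT_ANCHOR = "TGCATCGA"   # hypothetical unique sequence just after the repeat
UNIT = "CAG"                # repeat unit
TRUE_COUNT = 200            # 200 units x 3 bp = 600 bp of repeat

# Toy locus: plain flanking sequence, then anchor, repeat, anchor, flank.
locus = "G" * 3000 + LEFT_ANCHOR + UNIT * TRUE_COUNT + RIGHT_ANCHOR + "T" * 3000


def tile_reads(sequence: str, read_length: int, step: int = 50) -> List[str]:
    """Cut error-free reads of a fixed length at regular offsets."""
    last_start = len(sequence) - read_length
    return [sequence[i:i + read_length] for i in range(0, last_start + 1, step)]


def repeat_count(read: str) -> Optional[int]:
    """Return the number of repeat units if the read spans both anchors, else None."""
    start = read.find(LEFT_ANCHOR)
    end = read.rfind(RIGHT_ANCHOR)
    if start == -1 or end == -1:
        return None  # the read does not reach both unique landmarks
    inner_start = start + len(LEFT_ANCHOR)
    if end < inner_start:
        return None
    inner = read[inner_start:end]
    units, remainder = divmod(len(inner), len(UNIT))
    if remainder != 0 or inner != UNIT * units:
        return None  # the sequence between the anchors is not a clean repeat
    return units


def sized_counts(read_length: int) -> Set[int]:
    """Collect every repeat count that reads of this length can determine."""
    counts = set()
    for read in tile_reads(locus, read_length):
        count = repeat_count(read)
        if count is not None:
            counts.add(count)
    return counts


short_counts = sized_counts(150)    # typical short-read length
long_counts = sized_counts(5000)    # a modest long-read length
print("150 bp reads:", short_counts)    # prints: 150 bp reads: set()
print("5,000 bp reads:", long_counts)   # prints: 5,000 bp reads: {200}
```

In this toy, no 150 bp read can contain both anchors around a 600 bp repeat, so none of them can report its size, whereas every 5,000 bp read spans the whole repeat and recovers the true count.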

Areas that will benefit from advances in long-reads
 

Many areas of research stand to benefit as advances in long-reads enable the investigation of complex genetic variations.

Consider cancer research, which has progressed significantly in recent years, partly driven by genomic insights that allow a precision medicine approach to the disease. The accuracy and completeness of long-reads are providing a deeper understanding of tumor biology by identifying large structural variants and tandem repeats. For example, one study used long-reads to uncover 3,059 breast tumor-specific splicing events – 35 of which are significantly associated with patient survival.

Long-reads also allow researchers to explore layers of biology beyond the genome, potentially leading to earlier or more accurate cancer diagnoses. The epigenome – a collection of chemical modifications influencing how and when genes are expressed – plays a crucial role in cancer development. Since many genetic changes related to cancer first appear in the methylation layer, understanding epigenomic changes is essential to detect cancer before solid tumors begin to grow.

Rare disease research could also benefit. Each condition might be individually rare, but together they are common across the global population. Around 300 million people worldwide have a rare disease, but around 60 percent don’t receive a diagnosis. This is because many rare diseases stem from complex genetic variants, including tandem repeat expansions, that cannot be detected through short-read sequencing alone.

Current diagnostic pathways involve multiple tests, and it can take years for patients to receive answers. Long-reads could replace this multi-test approach with a single comprehensive test that can capture more of the complex variants associated with rare diseases. A recent clinical study at Radboud University Medical Center tested this idea using a type of long-read sequencing called HiFi. The study showed the technology successfully identified 93 percent of pathogenic variants in a single test for cases that previously required complex genetic testing. Maintaining the equipment and trained staff needed for a broad range of tests is a significant overhead, one that Radboud hopes to remove in the future if a move to a single genomic test proves possible.

In addition, the growing accessibility of long-read technology has contributed to increased investment in national rare disease initiatives, such as Germany’s GenomDE National Genome Strategy and Sweden’s national long-read sequencing study.

A third area that stands to benefit from advances in long-read technology is pharmacogenomics. Understanding which genes are involved in drug response should lead to more effective prescribing of medicines, improved patient outcomes, and reduced healthcare costs. For instance, variation of the CYP2D6 gene affects the metabolism of around 20 percent of the most prescribed medications, such as opioids and antidepressants, impacting their efficacy and the risk of side effects.

However, many genes associated with drug response are particularly challenging to study because they are highly polymorphic and rich in structural variants. In such cases, highly accurate long-read sequencing is necessary to identify pharmacogenomic markers, difficult-to-sequence pseudogenes, and complex variants.

Pharmacogenomics research is expected to accelerate over the next decade, with significant projects already underway. For instance, the Estonian Biobank project plans to sequence 10,000 whole genomes using long-read sequencing to unlock insights into its population’s health and adopt personalized medicine at scale, including precision prescribing.

With long-read sequencing becoming more affordable and scalable, researchers are no longer limited by incomplete or inaccurate sequencing. The latest generation of long-read technologies provides an unparalleled view of the genome, uncovering dark regions that were once hidden. Continued investment in the most accurate sequencing technologies will unlock the power of genomics in disease prevention, diagnosis, and treatment.

About the Author
Neil Ward

VP and General Manager, PacBio EMEA
