Scientists achieve first complete assembly of human X chromosome

By Tim Stephens

UC Santa Cruz – July 14, 2020 -- The first end-to-end (‘telomere-to-telomere’)
completely gapless DNA sequence of a human chromosome is a major milestone for
genomics research.

Although the current human reference genome is the most accurate and complete vertebrate
genome ever produced, there are still gaps in the DNA sequence, even after two
decades of improvements. Now, for the first time, scientists have determined
the complete sequence of a human chromosome from one end to the other
(‘telomere to telomere’) with no gaps and an unprecedented level of accuracy.

The publication of the telomere-to-telomere assembly of a complete human X
chromosome July 14 in Nature is a landmark achievement for genomics
researchers. Lead author Karen Miga, a research scientist at the UC Santa Cruz
Genomics Institute, said the project was made possible by new sequencing
technologies that enable “ultra-long reads,” such as the nanopore
sequencing technology pioneered at UC Santa Cruz.

Repetitive DNA sequences are common throughout the genome and have always posed a challenge
for sequencing because most technologies produce relatively short “reads” of
the sequence, which then have to be pieced together like a jigsaw puzzle to
assemble the genome. Repetitive sequences yield lots of short reads that look
almost identical, like a large expanse of blue sky in a puzzle, with no clues
to how the pieces fit together or how many repeats there are.

“These repeat-rich sequences were once deemed intractable, but now we’ve made leaps
and bounds in sequencing technology,” Miga said. “With nanopore sequencing, we
get ultra-long reads of hundreds of thousands of base pairs that can span an
entire repeat region, so that bypasses some of the challenges.”
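The jigsaw problem, and the way a spanning read sidesteps it, can be illustrated with a toy example. The sketch below is not the team's assembly software: the flanking sequences, the repeat unit, and the function names are all made up for illustration, and it ignores read depth and sequencing errors. It only shows that two "genomes" with different repeat counts can produce exactly the same set of short reads, while a single read anchored in unique sequence on both sides reveals the count directly.

```python
# Toy illustration only: sequences and names below are hypothetical.
LEFT = "ACGTTGCA"    # made-up unique sequence to the left of the repeat
RIGHT = "TGACCTGA"   # made-up unique sequence to the right of the repeat
UNIT = "GATTACA"     # made-up repeat unit


def distinct_reads(seq, length):
    """Return the set of all distinct substrings ("reads") of a given length."""
    return {seq[i:i + length] for i in range(len(seq) - length + 1)}


def copies_from_spanning_read(read):
    """Count repeat units in a read anchored by both unique flanks."""
    start = read.index(LEFT) + len(LEFT)   # first base after the left flank
    end = read.rindex(RIGHT)               # first base of the right flank
    return (end - start) // len(UNIT)


genome_a = LEFT + UNIT * 5 + RIGHT   # five copies of the repeat
genome_b = LEFT + UNIT * 9 + RIGHT   # nine copies of the repeat

# Short reads: the two genomes yield the same set of distinct 10-base reads,
# so the reads alone cannot say how many copies are present.
same_short_reads = distinct_reads(genome_a, 10) == distinct_reads(genome_b, 10)

# A read long enough to span the whole region resolves the copy number.
copies_a = copies_from_spanning_read(genome_a)
copies_b = copies_from_spanning_read(genome_b)
```

In this toy case `same_short_reads` is true, while the spanning reads give 5 and 9 copies. Real data is far messier, but the principle of anchoring a long read in unique sequence on either side of a repeat is the same one described above.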

Filling in the remaining gaps in the human genome sequence opens up new regions of the
genome where researchers can search for associations between sequence
variations and disease and for other clues to important questions about human
biology and evolution.

“We’re starting to find that some of these regions where there were gaps in the reference
sequence are actually among the richest for variation in human populations, so
we’ve been missing a lot of information that could be important to
understanding human biology and disease,” Miga said.

Telomere to telomere

Miga and Adam Phillippy at the National Human Genome Research Institute (NHGRI),
both corresponding authors of the new paper, co-founded the Telomere-to-Telomere
(T2T) consortium to pursue a complete genome assembly after working together
on a 2018 paper that demonstrated the potential of nanopore technology to
produce a complete human genome sequence. That effort used the Oxford Nanopore
Technologies MinION sequencer, which sequences DNA by detecting the change in
current flow as single molecules of DNA pass through a tiny hole (a
“nanopore”) in a membrane.

The new project built on that effort, combining nanopore sequencing with other
sequencing technologies from PacBio and Illumina, and optical maps from BioNano
Genomics. Using these technologies, the team produced a whole-genome assembly
that exceeds all prior human genome assemblies in terms of continuity,
completeness, and accuracy, even surpassing the current human reference genome
by some metrics.

Nevertheless, there were still multiple breaks in the sequence, Miga said. To finish the X
chromosome, the team had to manually resolve several gaps in the sequence. Two
segmental duplications were resolved with ultra-long nanopore reads that
completely spanned the repeats and were uniquely anchored on either side. The
remaining break was at the centromere, a notoriously difficult region of
repetitive DNA found in every chromosome.

In the X chromosome, the centromere encompasses a region of highly repetitive DNA
spanning 3.1 million base pairs (the bases A, C, T, and G form pairs in the DNA
double helix and encode genetic information in their sequence). The team was
able to identify variants within the repeat sequence to serve as markers, which
they used to align the long reads and connect them together to span the entire
centromere.

“For me, the idea that we can put together a 3-megabase-size tandem repeat is just
mind-blowing. We can now reach these repeat regions covering millions of bases
that were previously thought intractable,” Miga said.

Polishing strategy

The next step was a polishing strategy using data from multiple sequencing
technologies to ensure the accuracy of every base in the sequence.

“We used an iterative process over three different sequencing platforms to polish
the sequence and reach a high level of accuracy,” Miga explained. “The unique
markers provide an anchoring system for the ultra-long reads, and once you
anchor the reads, you can use multiple data sets to call each base.”
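The idea of using several data sets to call each base can be sketched as a simple per-position vote. This is only an illustration under strong assumptions: the reads are hypothetical, they are taken to be already aligned and of equal length, and a plain majority vote stands in for the statistical models that real polishing tools use. It is not the consortium's pipeline.

```python
from collections import Counter


def call_consensus(aligned_reads):
    """Majority-vote each column of equal-length, already-aligned reads.

    A "-" marks a gap in a read and does not count as a vote.
    """
    consensus = []
    for column in zip(*aligned_reads):
        votes = Counter(base for base in column if base != "-")
        # Take the most frequent base; keep a gap if no read covers the column.
        consensus.append(votes.most_common(1)[0][0] if votes else "-")
    return "".join(consensus)


# Hypothetical reads covering the same eight bases, one per platform.
aligned = [
    "ACGTTAGC",  # stand-in for a nanopore read
    "ACGTAAGC",  # stand-in for a PacBio read, disagreeing at one position
    "ACGTTAGC",  # stand-in for an Illumina-derived read
]

polished = call_consensus(aligned)
```

Here the single disagreement is outvoted and `polished` comes out as `ACGTTAGC`. Repeating such a pass with each platform's data in turn is one way to picture the iterative process Miga describes.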

Nanopore sequencing, in addition to providing ultra-long reads, can also detect bases
that have been modified by methylation, an “epigenetic” change that does not
alter the sequence but has important effects on DNA structure and gene
expression. By mapping patterns of methylation on the X chromosome, the team
was able to confirm previous observations and reveal some intriguing trends in
methylation patterns within the centromere.

The new human genome sequence, derived from a human cell line called CHM13, closes
many gaps in the current reference genome, known as Genome Reference Consortium
build 38 (GRCh38).

The T2T consortium is continuing to work toward completion of all of the CHM13
chromosomes. “It’s an open consortium, so in many respects this is a
community-driven project, with a lot of people dedicating time and resources to
it,” Miga said.

In addition to Miga and Phillippy, the authors of the paper include co-first
author Sergey Koren at the National Human Genome Research Institute and
scientists at nearly two dozen institutions in the U.S. and U.K., including the
University of Washington, Johns Hopkins University, UC San Diego, and the
Wellcome Sanger Institute. This work was supported by the U.S. National Institutes
of Health.