A new reference genome includes genetic sequences collected from people around the globe, plugging major gaps in the current one.
The current reference, first published in 2001, is a cornerstone of human genetics research. It has undergone 20 revisions and updates but still does not represent the full range of genetic diversity. About 70 percent of it is based on DNA from one person; the rest comes from another 10 people.
This is a major problem for whole-genome sequencing, a technique increasingly used in autism research. Many people have segments in their genome that do not match the reference, and researchers often discard these segments. But the discarded portions may play important roles in genetic conditions such as autism.
Previous work has shown that 37 percent of the discarded sections are transcribed into RNA that carries instructions for building proteins. This suggests the regions perform biological functions that whole-genome studies do not capture [1].
The updated reference incorporates data from 338 people, including 169 women. It encompasses 52 samples from the 26 global populations included in the 1000 Genomes Project, as well as 154 samples from Han Chinese donors in Taiwan. The remaining samples come from East Asian, Hispanic and South Asian donors.
The researchers used a sequencing technology that reads short segments of the genome and maps their position within the full sequence. The team determined where the new segments belong by lining the new sequences up against the previous reference genome.
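The idea of placing a short segment by lining it up against a reference can be illustrated with a toy sketch. This is not the researchers' pipeline, which the article does not describe in detail; the function names, the example sequences and the mismatch threshold are all hypothetical, and real studies use dedicated alignment software that also handles insertions and deletions.

```python
# Toy illustration only: slide a short read along a reference and keep the
# placement with the fewest mismatches. All names and data are hypothetical.

def best_placement(reference: str, read: str) -> tuple[int, int]:
    """Return (position, mismatches) for the best ungapped placement."""
    if not read or len(read) > len(reference):
        raise ValueError("read must be non-empty and no longer than the reference")
    best_pos, best_mismatches = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


def place_or_flag(reference: str, read: str, max_mismatches: int = 1):
    """Return the placement position, or None if the read matches too poorly."""
    pos, mismatches = best_placement(reference, read)
    return pos if mismatches <= max_mismatches else None


reference = "ACGTTGCAAGGCTA"                    # made-up reference sequence
exact = place_or_flag(reference, "TGCAAG")      # matches the reference exactly
variant = place_or_flag(reference, "TGCTAG")    # differs at one base
unplaced = place_or_flag(reference, "CCCCCC")   # no good match anywhere
```

In this sketch, a read that fits nowhere within the threshold comes back as `None`, which loosely mirrors the segments that fail to match a reference and end up discarded.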
The researchers presented the new dataset in October at the 2019 American Society of Human Genetics meeting in Houston, Texas. The results are in peer review and are expected to be freely available after publication.