Researchers have mapped unique identifiers in the regions around human genes that are at risk for duplication or deletion, allowing precise sequencing of nearly 1,000 genes for the first time, according to a paper published today in Science.
Duplications and deletions of gene regions — called copy number variations (CNVs) — are known to contribute to autism and other neurological disorders. Large duplicated regions of the genome, called segmental duplications, can lead to CNVs by promoting DNA rearrangement within the gene region they flank.
Accurately counting gene copies and sequencing segmental duplications have both been challenging with standard techniques. Sequencing relies on matching short snippets of DNA to reference points within the genome, which is difficult in highly repetitive regions because a snippet can match more than one location equally well. For this reason, these sequences are often ignored, even though they may represent some of the most informative regions of the genome.
Evan Eichler and his colleagues have partially solved this problem by identifying single DNA base changes unique to each highly repetitive region flanking 990 genes. The researchers were able to accurately map CNVs as small as 1.9 kilobases, allowing for precise sequencing of these genes.
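To make the idea concrete, here is a minimal Python sketch of how single-base differences could tell near-identical copies apart. It is an illustration only, not the method used in the paper: the function names, the three toy sequences and the assumption that the copies are already aligned and of equal length are all hypothetical.

```python
from collections import Counter


def unique_positions(paralogs):
    """Find bases that occur in exactly one copy at a given aligned position.

    `paralogs` maps a (hypothetical) copy name to its sequence. All
    sequences are assumed to be pre-aligned and of equal length.
    """
    length = len(next(iter(paralogs.values())))
    markers = {name: [] for name in paralogs}
    for pos in range(length):
        column = Counter(seq[pos] for seq in paralogs.values())
        for name, seq in paralogs.items():
            if column[seq[pos]] == 1:  # base seen in this copy only
                markers[name].append((pos, seq[pos]))
    return markers


def assign_read(read, start, markers):
    """Return the one copy whose unique bases the read carries, else None.

    `start` is the read's offset in the shared alignment. A read that
    covers no unique position stays ambiguous and yields None.
    """
    hits = []
    for name, sites in markers.items():
        covered = [(p, b) for p, b in sites if start <= p < start + len(read)]
        if covered and all(read[p - start] == b for p, b in covered):
            hits.append(name)
    return hits[0] if len(hits) == 1 else None


# Toy data (invented): three copies that differ by one base each.
paralogs = {
    "copyA": "ATGTACGTAC",
    "copyB": "ACGTTCGTAC",
    "copyC": "ACGTACGTGC",
}
markers = unique_positions(paralogs)
print(markers)
# {'copyA': [(1, 'T')], 'copyB': [(4, 'T')], 'copyC': [(8, 'G')]}
print(assign_read("GTTCG", 2, markers))  # copyB
print(assign_read("CGT", 5, markers))    # None: no unique base covered
```

Only reads that overlap a distinguishing base can be assigned to a single copy. The rest remain ambiguous, which is why such markers matter in highly repetitive regions.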
The researchers identified the unique nucleotides in 159 human genomes, part of the 1000 Genomes Project. Initial results from the project, which aims to sequence the DNA of people from different ethnic populations, were published yesterday in Nature.
Using an algorithm to predict the number of duplicated genes in large CNVs, Eichler’s team found significant variation among human populations. For example, about 25 percent of Asians, but almost no Europeans or Africans, have six copies of the NSF gene, which is expressed in the brain.
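The article does not describe the algorithm itself. One common way to predict copy number from sequencing data is to compare read depth over a gene with depth over a region known to be present in two copies. The Python sketch below assumes that approach; the function name and the depth values are hypothetical and are not taken from the study.

```python
def estimate_copy_number(region_depth, control_depth, control_copies=2):
    """Estimate copy number from average read depth.

    Assumes depth scales linearly with copy number and that the control
    region is present in `control_copies` copies (two in a diploid genome).
    """
    if control_depth <= 0:
        raise ValueError("control_depth must be positive")
    return round(control_copies * region_depth / control_depth)


# Invented depths: a gene covered three times as deeply as a two-copy control.
print(estimate_copy_number(region_depth=90.0, control_depth=30.0))  # 6
```

In practice, raw depth would need correction for sequencing biases before a ratio like this could be trusted.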
The researchers also identified the genes with the most CNVs, and some that show no variation in primates. Several of the human-specific variable genes — such as CHRNA7, which is duplicated in some people with schizophrenia and epilepsy — have neurological roles.
The researchers also found 173 duplications that are not identified in the reference human genome. This information can help create an optimized human reference genome and aid future sequencing projects.