Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays

Genome Res. 2006 Dec;16(12):1575-84. doi: 10.1101/gr.5629106. Epub 2006 Nov 22.

Abstract

Recent reports indicate that copy number variations (CNVs) within the human genome contribute to nucleotide diversity to a larger extent than single nucleotide polymorphisms (SNPs). In addition, the contribution of CNVs to human disease susceptibility may be greater than previously expected, although the phenotypic consequences of CNVs are not yet fully understood. We have recently reported a comprehensive view of CNVs among 270 HapMap samples using high-density SNP genotyping arrays and BAC array CGH. In this report, we describe a novel algorithm using Affymetrix GeneChip Human Mapping 500K Early Access (500K EA) arrays that identified 1203 CNVs ranging in size from 960 bp to 3.4 Mb. The algorithm consists of three steps: (1) intensity pre-processing, which improves the resolution between pairwise comparisons by directly estimating the allele-specific affinity and reduces signal noise by incorporating probe and target sequence characteristics via an improved version of the Genomic Imbalance Map (GIM) algorithm; (2) CNV extraction using an adapted SW-ARRAY procedure to automatically and robustly detect candidate CNV regions; and (3) copy number inference, in which all pairwise comparisons are summarized to more precisely define CNV boundaries and accurately estimate CNV copy number. Independent testing of a subset of CNVs by quantitative PCR and mass spectrometry demonstrated a >90% verification rate. The use of high-resolution oligonucleotide arrays relative to other methods may allow more precise boundary information to be extracted, thereby enabling a more accurate analysis of the relationship between CNVs and other genomic features.
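
As a rough illustration of the idea behind step (2), the sketch below finds the highest-scoring contiguous run of markers in a vector of pairwise log2 intensity ratios, in the spirit of a Smith-Waterman local-alignment scan. It is not the authors' adapted SW-ARRAY procedure: the function name `max_scoring_segment`, the fixed `threshold` value, and the toy data are all assumptions made for illustration, and the published method involves further steps (such as significance assessment) that are omitted here.

```python
from typing import List, Tuple


def max_scoring_segment(
    log_ratios: List[float], threshold: float = 0.2
) -> Tuple[int, int, float]:
    """Return (start, end, score) of the best contiguous segment (end inclusive).

    Each marker contributes (log_ratio - threshold) to a running sum.
    The sum is reset to zero whenever it is no longer positive, which is
    the one-dimensional analogue of Smith-Waterman local alignment.
    Returns (-1, -1, 0.0) if no marker scores above zero.
    `threshold` is a hypothetical tuning parameter, not a published value.
    """
    best_score = 0.0
    best_start, best_end = -1, -1
    running = 0.0
    start = 0
    for i, value in enumerate(log_ratios):
        if running <= 0.0:
            # Start a fresh candidate segment at this marker.
            running = 0.0
            start = i
        running += value - threshold
        if running > best_score:
            best_score = running
            best_start, best_end = start, i
    return best_start, best_end, best_score


if __name__ == "__main__":
    # Toy log2 ratios: markers 3..5 mimic a copy number gain.
    ratios = [0.0, 0.1, -0.1, 0.9, 1.1, 1.0, 0.05, -0.05]
    start, end, _ = max_scoring_segment(ratios, threshold=0.2)
    print(start, end)  # → 3 5
```

Under these assumptions, candidate losses could be scanned the same way by negating the ratios before calling the function. Subtracting a threshold from every marker is what keeps the segment from extending across long stretches of near-zero signal.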

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alleles
  • Chromosomes, Human
  • DNA / genetics*
  • Gene Deletion
  • Gene Dosage*
  • Genetic Variation*
  • Genome, Human*
  • Homozygote
  • Humans
  • Mass Spectrometry
  • Oligonucleotide Array Sequence Analysis*
  • Polymerase Chain Reaction
  • Polymorphism, Single Nucleotide

Substances

  • DNA

Associated data

  • GEO/GSE5013
  • GEO/GSE5173