Computational inference of homologous gene structures in the human genome

R F Yeh; L P Lim; C B Burge

doi:10.1101/gr.175701

Genome Res. 2001 May;11(5):803-16. doi: 10.1101/gr.175701.

Authors

R F Yeh¹, L P Lim, C B Burge

Affiliation

¹ Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

Abstract

With the human genome sequence approaching completion, a major challenge is to identify the locations and encoded protein sequences of all human genes. To address this problem we have developed a new gene identification algorithm, GenomeScan, which combines exon-intron and splice signal models with similarity to known protein sequences in an integrated model. Extensive testing shows that GenomeScan can accurately identify the exon-intron structures of genes in finished or draft human genome sequence with a low rate of false-positives. Application of GenomeScan to 2.7 billion bases of human genomic DNA identified at least 20,000-25,000 human genes out of an estimated 30,000-40,000 present in the genome. The results show an accurate and efficient automated approach for identifying genes in higher eukaryotic genomes and provide a first-level annotation of the draft human genome.
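The abstract describes GenomeScan only at a high level: signal models and protein similarity are combined in one probabilistic model. The toy sketch below illustrates that general idea. It is not GenomeScan's actual model. The two-state hidden Markov model (non-coding "N" vs. exon "E"), its probabilities, the similarity intervals `hits`, and the additive log-score `bonus` for the exon state inside those intervals are all hypothetical choices made for illustration. A real gene finder also models splice sites, reading frame, and exon/intron length distributions.

```python
import math
from typing import Dict, List, Sequence, Tuple

STATES = ("N", "E")  # N = non-coding (intergenic/intron), E = exon (coding)

# Hypothetical toy parameters, NOT GenomeScan's trained model.
START = {"N": math.log(0.9), "E": math.log(0.1)}
TRANS = {
    "N": {"N": math.log(0.95), "E": math.log(0.05)},
    "E": {"N": math.log(0.05), "E": math.log(0.95)},
}
EMIT = {
    "N": {b: math.log(p) for b, p in zip("ACGT", (0.3, 0.2, 0.2, 0.3))},
    "E": {b: math.log(p) for b, p in zip("ACGT", (0.2, 0.3, 0.3, 0.2))},
}


def homology_bonus(n: int, hits: Sequence[Tuple[int, int]], bonus: float) -> List[float]:
    """Per-position log-score bonus for the exon state inside hits [start, end)."""
    extra = [0.0] * n
    for start, end in hits:
        for i in range(max(0, start), min(n, end)):
            extra[i] = bonus
    return extra


def viterbi(seq: str, hits: Sequence[Tuple[int, int]] = (), bonus: float = 2.0) -> str:
    """Most probable N/E labelling of seq, with exon scores boosted inside hits."""
    if not seq:
        return ""
    extra = homology_bonus(len(seq), hits, bonus)

    def emit(state: str, i: int) -> float:
        # Signal-model emission plus the (hypothetical) similarity bonus.
        return EMIT[state][seq[i]] + (extra[i] if state == "E" else 0.0)

    # score[s] = best log-score of any labelling of seq[:i+1] ending in state s
    score = {s: START[s] + emit(s, 0) for s in STATES}
    back: List[Dict[str, str]] = []
    for i in range(1, len(seq)):
        ptr: Dict[str, str] = {}
        new: Dict[str, float] = {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + TRANS[p][s])
            ptr[s] = prev
            new[s] = score[prev] + TRANS[prev][s] + emit(s, i)
        back.append(ptr)
        score = new
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    seq = "AT" * 15
    print(viterbi(seq))                   # no similarity evidence
    print(viterbi(seq, hits=[(5, 15)]))   # hypothetical protein hit over positions 5-14
```

With no similarity evidence, this AT-rich toy sequence is labelled entirely non-coding. Adding a hypothetical hit over positions 5 to 14 makes the exon state win exactly inside that interval. The bonus outweighs the cost of switching states there, but not outside it. Expressing the similarity evidence as a score term inside the decoding, rather than as a post-filter, is what lets it reshape the whole parse.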

Publication types

Comparative Study
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms
Chromosomes, Artificial, Bacterial
Computational Biology / methods*
Genes / genetics*
Genome, Human*
Humans
Sequence Analysis, DNA / methods
Sequence Homology, Nucleic Acid*
