A new approach for gene annotation using unambiguous sequence joining

Proc IEEE Comput Soc Bioinform Conf. 2003:2:353-62.

Abstract

This paper addresses accurate, automatic gene annotation: the precise identification of exon and intron boundaries in biologically verified nucleotide sequences by aligning human genomic DNA to curated mRNA transcripts. We provide a detailed description of a new cDNA/DNA homology gene annotation algorithm that combines the results of BLASTN searches and spliced alignments. Compared with other programs currently in use, annotation quality is significantly increased through the unambiguous joining of genomic DNA sequences. We also address the annotation of genes with non-canonical splice sites and short exons. The approach has been tested on the Genie learning subset as well as on full-scale human RefSeq, and has demonstrated performance as high as 97%.
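The abstract does not give the algorithm's details, so the following is only an illustrative sketch of the general problem of ambiguous exon junctions, not the authors' method. When a cDNA is aligned to genomic DNA, an intron can often be slid a few bases left or right without changing the spliced sequence; one common way to settle on a single junction is to prefer the placement with the canonical GT...AG intron boundaries. The function names, the 0-based half-open coordinates, and the toy sequence are all assumptions made for illustration.

```python
CANONICAL = ("GT", "AG")  # canonical donor / acceptor dinucleotides


def equivalent_junctions(genome, start, end):
    """List intron placements (start, end) that yield the same spliced sequence.

    The intron is genome[start:end] (0-based, half-open). Sliding it by one
    base keeps the spliced sequence unchanged only when the base leaving the
    intron equals the base entering it on the other side.
    """
    placements = [(start, end)]
    s, e = start, end
    while s > 0 and genome[s - 1] == genome[e - 1]:  # slide left
        s, e = s - 1, e - 1
        placements.append((s, e))
    s, e = start, end
    while e < len(genome) and genome[s] == genome[e]:  # slide right
        s, e = s + 1, e + 1
        placements.append((s, e))
    return placements


def pick_junction(genome, start, end):
    """Return the first equivalent placement with GT...AG boundaries.

    If no such placement exists, the reported coordinates are kept and
    labelled non-canonical. A real tool would need further rules when
    several canonical placements are possible.
    """
    for s, e in equivalent_junctions(genome, start, end):
        if (genome[s:s + 2], genome[e - 2:e]) == CANONICAL:
            return (s, e), "canonical"
    return (start, end), "non-canonical"


# Toy example: exon 1 + intron + exon 2 (sequence invented for illustration).
genome = "ATGCCAG" + "GTAAGTTTTTCAG" + "GATTC"
# Suppose an aligner reported the intron one base too far left, at (6, 19).
junction, label = pick_junction(genome, 6, 19)
print(junction, label)
```

In this toy case the reported placement (6, 19) has GG...CA boundaries, while the equivalent placement (7, 20) restores GT...AG and produces the same spliced sequence.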

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Base Sequence
  • Chromosome Mapping / methods*
  • DNA / genetics*
  • Database Management Systems*
  • Databases, Genetic*
  • Documentation / methods*
  • Information Storage and Retrieval / methods*
  • Molecular Sequence Data
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*

Substances

  • DNA