Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation

Bioinformatics. 2003 Jul 1;19(10):1275-83. doi: 10.1093/bioinformatics/btg153.

Abstract

Motivation: Many bioinformatics data resources hold data not only as sequences but also as annotation. In the majority of cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper we investigate the use of ontological annotation to measure the similarity in knowledge content, or 'semantic similarity', between entries in a data resource. Such measures allow a bioinformatician to compare annotations in a manner analogous to the comparison of sequences. A measure of semantic similarity for the knowledge component of bioinformatics resources should afford a biologist a new tool in their repertoire of analyses.

Results: We present the results from experiments that investigate the validity of using semantic similarity by comparison with sequence similarity. We show a simple extension that enables a semantic search of the knowledge held within sequence databases.

Availability: Software available from http://www.russet.org.uk.
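
Illustrative note: the abstract does not state which semantic similarity measure is used, so the following is only a minimal sketch of one common choice, an information-content (Resnik-style) measure, in which two terms are as similar as their most informative shared ancestor. The toy ontology, the term names, and the annotation counts are all hypothetical and are not taken from the Gene Ontology or from the paper's software.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (is-a edges).
# These are NOT real Gene Ontology terms.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
    "kinase": {"catalysis"},
}

# Hypothetical number of database entries annotated directly with each term.
DIRECT_COUNTS = {
    "root": 0,
    "binding": 2,
    "catalysis": 2,
    "dna_binding": 3,
    "rna_binding": 1,
    "kinase": 2,
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent edges."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_probabilities():
    """p(t): fraction of all annotations made to t or to any descendant of t."""
    totals = {t: 0 for t in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        # An annotation to a term also counts towards each of its ancestors.
        for anc in ancestors(term):
            totals[anc] += count
    grand_total = sum(DIRECT_COUNTS.values())
    return {t: totals[t] / grand_total for t in PARENTS}


def resnik_similarity(a, b, probs):
    """Information content (-log p) of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(-math.log(probs[t]) for t in common)


probs = term_probabilities()
sim_close = resnik_similarity("dna_binding", "rna_binding", probs)
sim_far = resnik_similarity("dna_binding", "kinase", probs)
```

In this sketch the two binding terms share the ancestor "binding", so `sim_close` is greater than `sim_far`, whose only shared ancestor is the uninformative root. Scores of this kind over annotation could then be compared with sequence similarity scores for the same pairs of entries, which is the kind of comparison the Results describe.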

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Artificial Intelligence
  • Databases, Factual
  • Databases, Genetic*
  • Documentation*
  • Gene Expression Profiling / methods
  • Humans
  • Information Storage and Retrieval / methods*
  • Natural Language Processing*
  • Phylogeny
  • Proteins / chemistry*
  • Proteins / classification*
  • Proteins / genetics
  • Reproducibility of Results
  • Semantics
  • Sensitivity and Specificity
  • Sequence Alignment
  • Sequence Analysis, Protein / methods*
  • Statistics as Topic
  • Terminology as Topic*

Substances

  • Proteins