A metadata framework for interoperating heterogeneous genome data using XML

Proc AMIA Symp. 2001:110-4.

Abstract

Rapid advances in the Human Genome Project and in genomic technologies have produced massive amounts of data, stored in a large number of network-accessible databases. These advances and the associated data can have a great impact on biomedicine and healthcare. To answer many biologically or medically important questions, researchers often need to integrate data from several independent but related genome databases. One common practice is to download data sets (text files) from various genome Web sites and process them with local programs. The main problem with this approach is that the programs must be written case by case, because the data sets involved are heterogeneous in structure. To address this problem, we define metadata that maps these heterogeneously structured files into a common eXtensible Markup Language (XML) structure to facilitate data interoperation. We illustrate the approach by interoperating two sets of essential yeast genes stored in two yeast genome databases (MIPS and YPD).
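The idea of metadata-driven mapping can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the metadata layout (`METADATA`), the source names, the XML element names (`genes`, `gene`, `orf`, `name`), and the sample records are all hypothetical and do not reflect the actual MIPS or YPD file formats.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical metadata: for each source, the field delimiter and a mapping
# from column position to a common XML element name.
METADATA = {
    "source_a": {"delimiter": "\t", "columns": {0: "orf", 1: "name"}},
    "source_b": {"delimiter": ",", "columns": {1: "orf", 0: "name"}},
}


def to_common_xml(source, text):
    """Convert one delimited text file into the common XML structure."""
    meta = METADATA[source]
    root = ET.Element("genes", source=source)
    for row in csv.reader(io.StringIO(text), delimiter=meta["delimiter"]):
        if not row:
            continue  # skip blank lines
        gene = ET.SubElement(root, "gene")
        for index, tag in meta["columns"].items():
            ET.SubElement(gene, tag).text = row[index].strip()
    return root


def orf_set(root):
    """Collect the ORF identifiers found in a common-structure document."""
    return {gene.findtext("orf") for gene in root.findall("gene")}


# Made-up sample files with different layouts (placeholder identifiers).
file_a = "ORF1\tgeneA\nORF2\tgeneB\n"
file_b = "geneB,ORF2\ngeneC,ORF3\n"

xml_a = to_common_xml("source_a", file_a)
xml_b = to_common_xml("source_b", file_b)

# Once both sources share one structure, comparing them needs no
# source-specific code.
shared = orf_set(xml_a) & orf_set(xml_b)
```

With this design, supporting another source would mean adding a metadata entry rather than writing a new parser, which is the benefit the abstract describes.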

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Databases, Genetic*
  • Genome, Fungal*
  • Internet
  • Programming Languages*
  • Yeasts / genetics*