Deciphering Arabidopsis thaliana gene neighborhoods through bibliographic co-citations

A Louis; H Chiapello; C Fabry; E Ollivier; A Hénaut

doi:10.1016/s0097-8485(02)00011-6

Comput Chem. 2002 Jul;26(5):511-9. doi: 10.1016/s0097-8485(02)00011-6.

Authors

A Louis¹, H Chiapello, C Fabry, E Ollivier, A Hénaut

Affiliation

¹ Laboratoire Génome et Informatique, Tour Evry 2, France. louis@genopole.cnrs.fr

PMID: 12144179

Abstract

In genome annotation, the scientific literature is the major source of biological knowledge. The aim of the work described in this paper is to exploit this source of data for the model plant Arabidopsis thaliana. The first step consisted of assembling a relevant dataset of bibliographic references for plant genomic research. Gene co-citations were then systematically annotated in this reference dataset, starting from the simple idea that genes cited in the same publication probably share some related functional properties. To deal with the problem of synonymous gene names, a reference list of gene names was built from A. thaliana SwissProt entries. This list was used to build clusters of co-cited genes by a single linkage procedure, such that any gene in a given cluster has at least one co-cited partner in the same cluster. Analysis of the clusters demonstrates the biological consistency of this approach, with only very few fortuitous links. As an example, a cluster including genes related to flowering time is described in more detail in the paper. Finally, a graphical representation of each cluster was produced, which provides a convenient way to retrieve the genes (the nodes of the graphs) and the references in which they were co-cited (the edges of the graphs). All the results can be accessed at the URL http://chlora.Igi.infobiogen.fr:1234/bib_arath/.
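
The clustering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that single linkage over co-citation links amounts to finding connected components of the co-citation graph, which matches the stated property that every gene in a cluster has at least one co-cited partner in that cluster. The function names, the synonym table, the reference identifiers and the toy input are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical synonym table (alias -> reference name). In the paper the
# reference list is derived from A. thaliana SwissProt entries; this small
# dictionary is only a stand-in for it.
SYNONYMS = {"FHA": "CRY2"}


def canonical(name, synonyms):
    """Return the reference name for a gene name (case-insensitive)."""
    key = name.upper()
    return synonyms.get(key, key)


def cocitation_edges(papers, synonyms):
    """Map each co-cited gene pair to the set of references citing both.

    `papers` maps a reference identifier to the gene names found in it.
    Each pair is an edge of the graph, labelled by its references.
    """
    edges = defaultdict(set)
    for ref_id, names in papers.items():
        genes = sorted({canonical(n, synonyms) for n in names})
        for a, b in combinations(genes, 2):
            edges[(a, b)].add(ref_id)
    return dict(edges)


def single_linkage_clusters(edges):
    """Group genes into clusters: connected components, via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for gene in parent:
        clusters[find(gene)].add(gene)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Invented toy input: reference identifiers and gene lists are placeholders.
toy_papers = {
    "ref1": ["CO", "FT"],
    "ref2": ["FT", "FHA"],
    "ref3": ["CRY2", "CO"],
    "ref4": ["CAB1", "CAB2"],
}
toy_edges = cocitation_edges(toy_papers, SYNONYMS)
for cluster in single_linkage_clusters(toy_edges):
    print(cluster)
```

In this sketch the alias in "ref2" is resolved to its reference name before pairs are formed, so the three flowering-related toy genes end up in one cluster while the unrelated pair forms its own. Because the references are kept on the edges, each link can be traced back to the publications that support it, in the spirit of the graphical representation described above.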

MeSH terms

Arabidopsis / genetics*
Arabidopsis Proteins / genetics
Cluster Analysis
Computational Biology / methods*
Databases, Bibliographic*
Databases, Protein
Genes, Plant / genetics
Genome, Plant*
Internet
Knowledge
Molecular Sequence Data
Physical Chromosome Mapping / methods*
Research
Species Specificity
Terminology as Topic

Substances

Arabidopsis Proteins