Improving the prediction of pharmacogenes using text-derived drug-gene relationships

Yael Garten; Nicholas P Tatonetti; Russ B Altman

doi:10.1142/9789814295291_0033

Pac Symp Biocomput. 2010:305-14. doi: 10.1142/9789814295291_0033.

Authors

Yael Garten¹, Nicholas P Tatonetti, Russ B Altman

Affiliation

¹ Stanford Biomedical Informatics Training Program, Stanford University, Stanford, CA 94305, USA.

Abstract

A critical goal of pharmacogenomics research is to identify genes that can explain variation in drug response. We have previously reported a method that creates a genome-scale ranking of genes likely to interact with a drug. The algorithm uses information about drug structure and indications of use to rank the genes. Although the algorithm performs well, it depends on a curated set of drug-gene relationships that is expensive to create and difficult to maintain. In this work, we assess the utility of text mining for automatically extracting a network of drug-gene relationships. This network provides a valuable aggregate source of knowledge, which is then used as input to the algorithm that ranks potential pharmacogenes. We built a drug-gene network from sentence-level co-occurrence in the full text of scientific articles and compared its performance to that of a network created by manual curation of those articles. Under a wide range of conditions, we show that a knowledge base derived from text-mining the literature performs as well as, and sometimes better than, a high-quality, manually curated knowledge base. We conclude that relationships mined automatically from the literature can serve as a knowledge base for pharmacogenomics relationships. Additionally, when relationships are missed by text mining, our system can accurately extrapolate new relationships with 77.4% precision.
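
As an illustration of the sentence-level co-occurrence idea mentioned in the abstract, the following is a minimal sketch, not the authors' pipeline. The abstract does not describe the lexicons, sentence splitter, or any edge weighting, so everything here is an assumption: the function names (`split_sentences`, `find_terms`, `cooccurrence_network`) are hypothetical, sentence splitting is a naive regular expression, term matching is a case-insensitive whole-word lookup, and an edge weight is simply the number of sentences in which a drug and a gene appear together.

```python
import re
from collections import Counter
from itertools import product


def split_sentences(text):
    """Naively split text into sentences after '.', '!' or '?' (assumption)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def find_terms(sentence, lexicon):
    """Return the lexicon terms that occur as whole words in the sentence."""
    lowered = sentence.lower()
    return {
        term
        for term in lexicon
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    }


def cooccurrence_network(documents, drugs, genes):
    """Count, for each (drug, gene) pair, the sentences mentioning both."""
    edges = Counter()
    for doc in documents:
        for sentence in split_sentences(doc):
            drugs_here = find_terms(sentence, drugs)
            genes_here = find_terms(sentence, genes)
            # Every drug/gene pair sharing a sentence gets one more count.
            for pair in product(sorted(drugs_here), sorted(genes_here)):
                edges[pair] += 1
    return edges


# Toy input, invented purely for illustration.
documents = [
    "Warfarin is metabolized by CYP2C9. VKORC1 variants alter warfarin dose.",
    "Clopidogrel requires CYP2C19 activation. CYP2C9 was not studied.",
]
drugs = {"warfarin", "clopidogrel"}
genes = {"CYP2C9", "VKORC1", "CYP2C19"}

network = cooccurrence_network(documents, drugs, genes)
for (drug, gene), count in sorted(network.items()):
    print(drug, gene, count)
```

In this sketch a gene mentioned in a sentence without any drug (the last sentence of the toy input) contributes no edge, which is the defining property of sentence-level, as opposed to document-level, co-occurrence.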

Publication types

Research Support, N.I.H., Extramural
Validation Study

MeSH terms

Algorithms
Computational Biology
Data Mining / statistics & numerical data
Humans
Knowledge Bases
Pharmacogenetics / statistics & numerical data*
