Learning a Markov Logic network for supervised gene regulatory network inference

Céline Brouard; Christel Vrain; Julie Dubois; David Castel; Marie-Anne Debily; Florence d'Alché-Buc

doi:10.1186/1471-2105-14-273

BMC Bioinformatics. 2013 Sep 12:14:273. doi: 10.1186/1471-2105-14-273.

Authors

Céline Brouard¹, Christel Vrain, Julie Dubois, David Castel, Marie-Anne Debily, Florence d'Alché-Buc

Affiliation

¹ IBISC EA 4526, Université d'Évry-Val d'Essonne, 23 Boulevard de France, 91037, Évry, France. celine.brouard@ibisc.univ-evry.fr.

Abstract

Background: Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge about a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier that assigns a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLNs), which combine features of probabilistic graphical models with the expressivity of first-order logic rules.
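
For readers unfamiliar with MLNs, the following is the standard formulation from the MLN literature, not something stated in this abstract. Each first-order formula i carries a real-valued weight w_i, and the probability of a possible world x (a truth assignment to all ground atoms) is log-linear in the number of true groundings n_i(x) of each formula. The symbols w_i, n_i and Z are notation introduced here for illustration.

```latex
P(X = x) \;=\; \frac{1}{Z} \exp\!\Big( \sum_{i} w_i \, n_i(x) \Big),
\qquad
Z \;=\; \sum_{x'} \exp\!\Big( \sum_{i} w_i \, n_i(x') \Big)
```

Under this reading, a rule concluding on "regulates" with a large positive weight makes worlds in which the rule holds for many gene pairs more probable, without forbidding exceptions.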

Results: We propose to learn a Markov Logic Network, i.e. a set of weighted rules that conclude on the predicate "regulates", starting from a known gene regulatory network involved in the proliferation/differentiation switch of keratinocyte cells, a set of experimental transcriptomic data, and various descriptions of genes, all encoded in first-order logic. As the training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation is then obtained by averaging the predictions of the individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. The first test measures the average performance on a balanced edge prediction problem; the second assesses the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally, our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only discretized numerical gene expression data, does not perform as well as a pairwise SVM in terms of AUPR (area under the precision-recall curve). However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a pairwise SVM while providing relevant insights into the predictions.
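
The asymmetric bagging step described above can be sketched roughly as follows. This is only an illustration under stated assumptions. The abstract does not give the bag size or number of bags, so the sketch assumes each bag keeps all positive pairs and draws an equally sized random sample of negatives. The base learner here is a toy centroid scorer standing in for an MLN, and all function names and data are hypothetical.

```python
import random
from statistics import mean


def asymmetric_bags(positives, negatives, n_bags, seed=0):
    """Yield balanced training sets: every positive example plus an
    equally sized random sample of negatives (assumed sampling scheme)."""
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    for _ in range(n_bags):
        yield positives, rng.sample(negatives, k)


def fit_centroid_scorer(pos, neg):
    """Toy stand-in for the base learner (NOT an MLN): scores a feature
    vector by its relative closeness to the positive class centroid."""
    dim = len(pos[0])
    c_pos = [mean(x[d] for x in pos) for d in range(dim)]
    c_neg = [mean(x[d] for x in neg) for d in range(dim)]

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        total = d_pos + d_neg
        # Value in [0, 1]; higher means closer to the positive centroid.
        return 0.5 if total == 0 else d_neg / total

    return score


def bagged_predict(models, x):
    """Average the scores of the individual models."""
    return mean(m(x) for m in models)


# Hypothetical toy data: each tuple describes one ordered gene pair.
positives = [(0.9, 0.8), (1.0, 0.7), (0.8, 0.9)]
negatives = [((i % 5) / 10, (i % 7) / 20) for i in range(30)]

bags = list(asymmetric_bags(positives, negatives, n_bags=10))
models = [fit_centroid_scorer(p, n) for p, n in bags]

score_pos = bagged_predict(models, (0.95, 0.85))
score_neg = bagged_predict(models, (0.1, 0.1))
```

Because every bag is balanced, each individual model sees the rare "Regulation" class as often as the majority class, and averaging over bags reduces the variance introduced by subsampling the negatives.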

Conclusions: The numerical studies show that MLN achieves very good predictive performance while opening the door to some interpretability of the decisions. Besides its ability to suggest new regulations, such an approach allows experimental data to be cross-validated against existing knowledge.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computer Simulation
Databases, Genetic
Gene Regulatory Networks*
Humans
Logic*
Markov Chains*
Models, Statistical
ROC Curve
Support Vector Machine
Systems Biology / methods*