Motivation: DNA microarrays are routinely applied to study diseased or drug-treated cell populations. A critical challenge is distinguishing the genes directly affected by these perturbations from the hundreds of genes that are indirectly affected. Here, we developed a sparse simultaneous equation model (SSEM) of mRNA expression data and applied Lasso regression to estimate the model parameters, thus constructing a network model of gene interaction effects. This inferred network model was then used to filter data from a given experimental condition of interest and predict the genes directly targeted by that perturbation.
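To illustrate the filtering idea described above, the following is a minimal sketch, not the authors' implementation. It assumes that each gene is regressed on all other genes with an L1 penalty, and that a gene's standardized residual in a new condition serves as its target score. The function names (`lasso_cd`, `fit_network`, `rank_targets`), the penalty value `lam=0.05`, the small coordinate-descent solver and the five-gene simulated chain network are all hypothetical choices made for illustration.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2m))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    m, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / m
    r = y.copy()                          # residual y - X @ b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]           # take predictor j out of the fit
            rho = X[:, j] @ r / m
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]           # put the updated predictor j back
    return b

def fit_network(X, lam=0.05):
    """Regress each gene on all others; return weights A and residual std devs."""
    m, n = X.shape
    A = np.zeros((n, n))                  # A[i, j]: estimated effect of gene j on gene i
    for i in range(n):
        others = np.delete(np.arange(n), i)
        A[i, others] = lasso_cd(X[:, others], X[:, i], lam)
    resid_sd = (X - X @ A.T).std(axis=0)
    return A, resid_sd

def rank_targets(x, A, resid_sd):
    """Filter one expression profile through the network and rank genes by |z|."""
    z = (x - A @ x) / resid_sd            # expression not explained by other genes
    return np.argsort(-np.abs(z)), z

# Simulated example (assumed setup): a chain 0 -> 1 -> 2 -> 3 plus an isolated gene 4.
rng = np.random.default_rng(0)
n, m = 5, 200
A_true = np.zeros((n, n))
A_true[1, 0] = A_true[2, 1] = A_true[3, 2] = 0.8
G = np.linalg.inv(np.eye(n) - A_true)     # steady state of x = A x + u is x = G u
X_train = rng.normal(size=(m, n)) @ G.T   # training compendium, zero mean by construction

A_hat, resid_sd = fit_network(X_train)

u = np.zeros(n)
u[0] = 10.0                               # direct perturbation of gene 0 only
x_test = G @ u                            # genes 1-3 respond indirectly
order, z = rank_targets(x_test, A_hat, resid_sd)
print("genes ranked by filtered score:", order)
```

In this toy setting the raw profile `x_test` is non-zero for genes 0 to 3, whereas the filtered scores `z` are expected to concentrate on the directly perturbed gene, because the downstream changes are largely explained by the inferred network.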
Results: Our proposed SSEM-Lasso method demonstrated a substantial improvement in sensitivity compared with other tested methods for predicting the targets of perturbations in both simulated datasets and microarray compendia. In simulated data, for two different network types, and over a wide range of signal-to-noise ratios, our algorithm demonstrated a 167% increase in sensitivity on average for the top 100 ranked genes, compared with the next best method. Our method also performed well in identifying targets of genetic perturbations in microarray compendia, with up to a 24% improvement in sensitivity on average for the top 100 ranked genes. The overall performance of our network-filtering method shows promise for identifying the direct targets of genetic dysregulation in cancer and other diseases from expression profiles.
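As a reading aid for the sensitivity figures above, the sketch below shows one plausible way to compute sensitivity among the top-ranked genes and a relative improvement over a baseline. The exact definition used in the study is not given here, so the formula (fraction of true targets recovered in the top k), the helper names `sensitivity_at_k` and `percent_increase`, and the gene labels are assumptions; the numbers are made up and are not results from the paper.

```python
def sensitivity_at_k(ranked_genes, true_targets, k=100):
    """Fraction of the true targets that appear among the top-k ranked genes."""
    targets = set(true_targets)
    if not targets:
        raise ValueError("true_targets must not be empty")
    hits = sum(1 for gene in ranked_genes[:k] if gene in targets)
    return hits / len(targets)

def percent_increase(new, baseline):
    """Relative improvement of `new` over `baseline`, expressed in percent."""
    return 100.0 * (new - baseline) / baseline

# Hypothetical ranking and target set, for illustration only.
ranked = ["g3", "g1", "g7", "g2", "g9"]
targets = {"g1", "g2", "g5"}

sens_top4 = sensitivity_at_k(ranked, targets, k=4)   # g1 and g2 found, g5 missed
gain = percent_increase(0.5, 0.25)                   # doubling the sensitivity
print(sens_top4, gain)
```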
Availability: Microarray data are available at the Many Microbe Microarrays Database (M3D, http://m3d.bu.edu). Algorithm scripts are available at the Gardner Lab website (http://gardnerlab.bu.edu/SSEMLasso).