Sequence-based Protein-Protein Interaction Prediction Optimized for Target Selection in Biological Experiments

Conf Proc IEEE Eng Med Biol Soc. 2005:2006:236-9. doi: 10.1109/IEMBS.2005.1616387.

Abstract

A set of protein pairs predicted to interact, with a high proportion of true positives, is valuable for target selection in experiments such as protein structure determination. Our goal in this paper is to investigate the problem of finding such a set of protein pairs in an organism using machine learning methods. The yeast genome was used for this study, and a support vector machine (SVM) was adopted as the classification model. Domain information for each protein was extracted and transformed into features of a protein pair. We specifically analyzed the effect of selecting negative samples according to different principles. We also evaluated the feasibility of adjusting the intercept parameter of a trained SVM model to increase the proportion of true positives among the predicted pairs. Our results show that the approximately 1:3 ratio of positive to negative samples in the testing data can be significantly improved to a 2:1 ratio of positives to negatives in the predicted data.
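
The intercept adjustment mentioned in the abstract can be illustrated with a minimal sketch. An SVM scores a sample as w·x + b and predicts "interacting" when the score is positive, so lowering the intercept b lowers every score by the same amount and keeps only the most confidently scored pairs. The scores below are invented toy values, not data from the paper, and the helper names (`count_predicted_positives`, `precision`) and the shift of -0.5 are assumptions made for illustration. The abstract does not say how the authors chose their adjustment.

```python
# Toy illustration only: these decision scores are invented, not taken from the paper.
# Each score stands for w.x + b from an already-trained SVM.
# 4 interacting pairs and 12 non-interacting pairs give the 1:3 test ratio.
POSITIVE_SCORES = [1.8, 1.2, 0.6, -0.4]
NEGATIVE_SCORES = [1.0, 0.3, 0.1, -0.2, -0.5, -0.7,
                   -0.9, -1.1, -1.3, -1.5, -1.8, -2.0]


def count_predicted_positives(intercept_shift):
    """Return (true positives, false positives) after shifting the intercept."""
    tp = sum(1 for s in POSITIVE_SCORES if s + intercept_shift > 0)
    fp = sum(1 for s in NEGATIVE_SCORES if s + intercept_shift > 0)
    return tp, fp


def precision(tp, fp):
    """Fraction of predicted positives that are true (0.0 if none are predicted)."""
    return tp / (tp + fp) if tp + fp else 0.0


if __name__ == "__main__":
    # Compare the unmodified intercept with a hypothetical lowered one.
    for shift in (0.0, -0.5):
        tp, fp = count_predicted_positives(shift)
        print(f"shift={shift:+.1f}: {tp} true vs {fp} false positives, "
              f"precision={precision(tp, fp):.2f}")
```

In this toy set, the predicted pairs go from 3 true to 3 false positives with the original intercept to 3 true to 1 false after the shift. The cost of a lower intercept is that fewer pairs are predicted overall, which is acceptable when the aim is a short, reliable target list rather than full coverage.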