Implementation of PCA enabled Support Vector Machine using cytokines to differentiate smokers versus nonsmokers

Proc (Int Conf Comput Sci Comput Intell). 2021 Dec:2021:312-317. doi: 10.1109/csci54926.2021.00125. Epub 2022 Jun 22.

Abstract

The role of cytokines in severe smoking-associated illnesses such as COPD, cancer, and cardiac disease is currently being explored to enable preemptive diagnosis and timely delivery of treatment interventions. We are investigating the connection between elevated inflammatory plasma cytokines and smoking status (smokers versus nonsmokers). Disease-indicator cytokines can be used to monitor disease progression, which can help in the crucial tasks of prognosis and definitive diagnosis. Machine Learning algorithms can be leveraged to extract insights that cannot be obtained manually. We applied a Support Vector Machine (SVM) to 65 plasma cytokines and other traditional biomarkers to differentiate smokers from nonsmokers. To optimize class separability, we used the following techniques: Principal Component Analysis (PCA), 10-fold cross-validation, and variable importance analysis. The primary evaluation metric is the Area Under the Receiver Operating Characteristic Curve (AUROC), though we additionally recorded and compared prediction accuracy across classifiers. The results are promising. The AUROC achieved by SVM using the selected predictor variables is 89.2%, with a 95% CI of (85.4%, 93.1%). The most prominent predictors contributing to the classification, in order of importance, are: I-TAC, Age, TG, G-CSF-CSF-3, MDC-CCL22, Eotaxin-3, LIF, IL-2, Eotaxin-2, MIP-3alpha. The AUROC improved to 93%, with a 95% CI of (90.1%, 99.5%), upon choosing the five most prominent cytokines. Machine Learning algorithms such as Support Vector Machine can translate pioneering molecular discoveries into actionable insights that can be applied in translational and precision medicine to save lives.

Keywords: Classification; Cytokines; Prediction Accuracy; Principal Component Analysis; Support Vector Machine.
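
As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of how an AUROC and a 95% confidence interval might be computed from classifier scores. The abstract does not state how its interval was obtained, so the percentile bootstrap used here is an assumption, and the function names `auroc` and `bootstrap_ci` as well as the toy labels and scores are hypothetical and not taken from the study.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (assumed method, not
    necessarily the one used in the paper). Resamples that contain only
    one class are skipped because the AUROC is undefined for them."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (hypothetical): 1 = smoker, 0 = nonsmoker, scores from any classifier.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))  # 0.75
print(bootstrap_ci(toy_labels, toy_scores, n_boot=200))
```

In practice the scores would come from the SVM decision function on held-out folds of the 10-fold cross-validation, so that the AUROC reflects performance on samples the model did not see during training.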