Objective: To screen for long non-coding RNA (lncRNA) molecular markers characteristic of osteoarthritis (OA) by utilizing the Gene Expression Omnibus (GEO) database combined with machine learning.
Methods: Samples from 185 OA patients and 76 healthy individuals (normal controls) were included in the study. GEO datasets were screened for differentially expressed lncRNAs. Three algorithms, least absolute shrinkage and selection operator (LASSO) logistic regression, support vector machine recursive feature elimination (SVM-RFE), and random forest (RF), were used to screen for candidate lncRNA models, and receiver operating characteristic (ROC) curves were plotted to evaluate the models. Peripheral blood samples were collected from 30 clinical OA patients and 15 healthy controls, and immunoinflammatory indicators were measured. RT-PCR was performed to quantify the expression of the lncRNA molecular markers in peripheral blood mononuclear cells (PBMCs). Pearson correlation analysis was performed to examine the correlation between the lncRNAs and the immunoinflammatory indicators.
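As a minimal illustration of the evaluation steps named above (overlap of the three feature sets, ROC AUC for a single marker, and Pearson correlation), the following sketch uses only the Python standard library. It is not the study's pipeline: the function names, the gene labels `GENE_A` to `GENE_E`, and all numeric values are hypothetical placeholders chosen for illustration.

```python
from math import sqrt

def overlap(*gene_sets):
    """Genes selected by every algorithm (the centre of a Venn diagram)."""
    return set.intersection(*map(set, gene_sets))

def auc(case_values, control_values):
    """ROC AUC as the fraction of (case, control) pairs in which the case
    value is higher; ties count as half a pair."""
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in case_values
        for n in control_values
    )
    return wins / (len(case_values) * len(control_values))

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical feature sets standing in for the LASSO, SVM-RFE, and RF outputs.
lasso_hits = {"GENE_A", "GENE_B", "GENE_C"}
svm_rfe_hits = {"GENE_B", "GENE_C", "GENE_D"}
rf_hits = {"GENE_B", "GENE_C", "GENE_E"}
shared = overlap(lasso_hits, svm_rfe_hits, rf_hits)

# Hypothetical relative expression values for one marker (not study data).
oa_expr = [2.1, 1.8, 2.6, 1.2]
ctrl_expr = [1.0, 1.3, 0.9]
marker_auc = auc(oa_expr, ctrl_expr)

# Hypothetical marker expression vs. an inflammation indicator (not study data).
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

With this definition, a marker that is lower in cases than in controls yields an AUC below 0.5; its discriminative ability is then read as 1 minus that value, or the sign of the expression values is flipped before scoring.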
Results: A total of 14 key markers were identified with LASSO, 6 genes with SVM-RFE, and 24 genes with RF. A Venn diagram of the genes identified by the three algorithms showed HOTAIR, H19, MIR155HG, and NKILA to be the overlapping genes. The ROC curves showed that each of these four lncRNAs had an area under the curve (AUC) greater than 0.7. RT-PCR revealed elevated relative expression of HOTAIR, H19, and MIR155HG and decreased expression of NKILA in the PBMCs of OA patients compared with the normal control group (all P<0.01). These results were consistent with the bioinformatics predictions. Pearson correlation analysis showed that the selected lncRNAs were correlated with clinical immunoinflammatory indicators.
Conclusion: HOTAIR, H19, MIR155HG, and NKILA can be used as molecular markers for the clinical diagnosis of OA and are correlated with clinical immunoinflammatory indicators.
Keywords: Diagnostic markers; Immune inflammation; Long non-coding RNA; Machine learning strategy; Osteoarthritis.
Copyright© by Editorial Board of Journal of Sichuan University (Medical Sciences).