Chinese Journal of Stroke, 2024, Vol. 19, Issue 8: 924-930. DOI: 10.3969/j.issn.1673-5765.2024.08.010

• Original Article •

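Research on Prediction of Recurrence of Minor Ischemic Stroke Based on Interpretable Machine Learning Models

MO Qiuhong1, DING Xiaobo1, LI Jing1,2, ZHANG Yanbo1,2, LI Weirong1,3

  1 School of Public Health, Shanxi Medical University, Taiyuan 030000

    2 Shanxi Key Laboratory of Risk Assessment of Major Diseases

    3 Department of Neurology, Shanxi Cardiovascular Hospital

  • Received: 2024-01-14; Published: 2024-08-20; Online: 2024-08-20
  • Corresponding author: LI Weirong, E-mail: weironglee@163.com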

Research on Prediction of Recurrence of Minor Ischemic Stroke Based on Interpretable Machine Learning Models

MO Qiuhong1, DING Xiaobo1, LI Jing1,2, ZHANG Yanbo1,2, LI Weirong1,3   

  1 School of Public Health, Shanxi Medical University, Taiyuan 030000, China; 2 Shanxi Key Laboratory of Risk Assessment of Major Diseases, Taiyuan 030000, China; 3 Department of Neurology, Shanxi Cardiovascular Hospital, Taiyuan 030000, China
  • Received: 2024-01-14; Online: 2024-08-20; Published: 2024-08-20
  • Contact: LI Weirong, E-mail: weironglee@163.com

Abstract:

Objective  To explore risk factors associated with recurrence of minor ischemic stroke (MIS) within 2 years using interpretable machine learning models.

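Methods  General data, laboratory results, imaging findings, and other data were retrospectively collected from patients with MIS treated in the Department of Neurology, Shanxi Cardiovascular Hospital, from July to December 2020. Candidate risk factors for recurrence were screened by univariate analysis, and class imbalance was handled with the synthetic minority oversampling technique-nominal continuous (SMOTE-NC). The data set was split into a training set and a test set at a ratio of 8:2. Light gradient boosting machine (LightGBM) and support vector machine (SVM) models were built using grid search with 10-fold cross-validation and compared with a logistic regression (LR) model. Discrimination and calibration were evaluated with the area under the ROC curve (AUC) and calibration curves, respectively, and the best-performing model was interpreted with Shapley additive explanations (SHAP).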
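The abstract names SMOTE-NC (synthetic minority oversampling for nominal and continuous features) for class imbalance but gives no settings. The sketch below is a minimal pure-Python illustration of the general idea, not the authors' pipeline: the function name `smote_nc`, the neighbour count `k`, the number of synthetic rows `n_new`, and the use of the population standard deviation are all assumptions. Continuous features are interpolated between a minority row and one of its nearest minority neighbours; each nominal feature takes the most frequent value among those neighbours; and every mismatched nominal value adds a penalty (the median standard deviation of the continuous features) to the distance.

```python
import math
import random
import statistics
from collections import Counter


def smote_nc(minority, cont_idx, cat_idx, n_new, k=5, seed=0):
    """Return n_new synthetic rows built from minority-class rows only (illustrative sketch).

    minority : list of rows (lists); every row belongs to the minority class
    cont_idx : column positions of continuous features
    cat_idx  : column positions of nominal (categorical) features
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows")
    rng = random.Random(seed)

    # Penalty for each mismatched nominal feature: median of the (population)
    # standard deviations of the continuous features within the minority class.
    stds = [statistics.pstdev([row[j] for row in minority]) for j in cont_idx]
    med = statistics.median(stds) if stds else 1.0

    def distance(a, b):
        d2 = sum((a[j] - b[j]) ** 2 for j in cont_idx)
        d2 += sum(med ** 2 for j in cat_idx if a[j] != b[j])
        return math.sqrt(d2)

    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base row, excluding the row itself.
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: distance(base, row))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)

        new_row = list(base)  # copy so the input rows are never modified
        for j in cont_idx:
            # Continuous features: interpolate between base and partner.
            new_row[j] = base[j] + gap * (partner[j] - base[j])
        for j in cat_idx:
            # Nominal features: most frequent value among the k neighbours.
            new_row[j] = Counter(row[j] for row in neighbours).most_common(1)[0][0]
        synthetic.append(new_row)
    return synthetic
```

imbalanced-learn ships a maintained implementation (`SMOTENC`). A common precaution with any oversampler is to apply it to the training portion only, so that synthetic rows do not leak into the test set.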

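Results  A total of 520 patients with MIS were included, of whom 93 (17.9%) had a recurrence within 2 years. In the test set, the AUCs of LightGBM, SVM, and LR for predicting recurrence within 2 years were 0.935 (95%CI 0.896-0.973), 0.833 (95%CI 0.770-0.896), and 0.764 (95%CI 0.691-0.835), respectively; the accuracies were 0.890, 0.773, and 0.693, and the Brier scores were 0.105, 0.167, and 0.200, respectively. LightGBM performed best. In the SHAP-based interpretation of the LightGBM model, the five most important features were diastolic blood pressure, age, diabetes mellitus, LDL-C, and smoking.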
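Accuracy and the Brier score are straightforward to compute once each test patient has a predicted recurrence probability. The helpers below are an illustrative sketch; the 0.5 classification threshold and the equal-width calibration bins are assumptions, since the abstract does not state how predictions were dichotomized or how the calibration curves were binned. A lower Brier score means predictions closer to the observed outcomes, which matches the ordering LightGBM (0.105) < SVM (0.167) < LR (0.200).

```python
def accuracy(y_true, probs, threshold=0.5):
    """Share of cases whose thresholded prediction matches the observed 0/1 label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(y == yhat) for y, yhat in zip(y_true, preds)) / len(y_true)


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def calibration_bins(y_true, probs, n_bins=10):
    """Calibration-curve points: (mean predicted, observed rate, count) per bin.

    Uses equal-width bins on [0, 1] and skips empty bins.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[b].append((y, p))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            points.append((mean_pred, observed, len(members)))
    return points


# Toy labels and probabilities, invented for illustration only.
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.7]
print(accuracy(y, p))     # 0.75
print(brier_score(y, p))  # approximately 0.175
```

Plotting `observed` against `mean_pred` for each bin gives the calibration curve; points near the diagonal indicate well-calibrated probabilities.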

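Conclusions  The LightGBM model developed in this study showed good predictive performance and may serve as a reference for predicting recurrence within 2 years in patients with MIS. SHAP-based interpretation can help clinicians better understand the reasons behind the model's predictions and make more personalized and rational clinical decisions for patients with MIS.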
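SHAP attributes an individual prediction to the input features using Shapley values from cooperative game theory. The brute-force sketch below computes exact Shapley values for a hypothetical three-feature toy model (`toy_model` is invented for illustration and is unrelated to the study's data). The value of a coalition is taken as the model output when features outside it are set to a fixed baseline, which is one simplifying assumption. The SHAP library's tree explainer instead uses a polynomial-time algorithm specialized for tree ensembles such as LightGBM rather than enumerating coalitions.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight: |S|! * (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model: an additive term in z0 plus a z1*z2 interaction.
def toy_model(z):
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 6) for v in phi])  # [2.0, 0.5, 0.5]
```

The additive term is credited entirely to feature 0, while the interaction is split equally between features 1 and 2. The attributions sum to `toy_model(x) - toy_model(baseline)` (here 3.0), the efficiency property that lets SHAP values be read as contributions to one patient's predicted risk.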

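Article highlights:

This study developed LightGBM and SVM machine learning models to predict 2-year recurrence in patients with MIS and showed that their predictive performance exceeded that of a model built with conventional LR. The interpretable LightGBM-based model can help clinicians better understand the reasons behind the model's predictions and make more personalized and rational clinical decisions for patients with MIS.

Key words: Minor ischemic stroke; Recurrence; Light gradient boosting machine; Shapley additive explanation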

Abstract:

Objective  To explore the risk factors related to the recurrence of minor ischemic stroke (MIS) within two years by using an interpretable machine learning model.

Methods  General data, laboratory results, imaging findings, and other data of patients with MIS treated in the Department of Neurology, Shanxi Cardiovascular Hospital, from July to December 2020 were retrospectively collected. Candidate risk factors for recurrence were screened by univariate analysis, and class imbalance was handled with the synthetic minority oversampling technique-nominal continuous (SMOTE-NC). The data set was divided into a training set and a test set at a ratio of 8:2. Light gradient boosting machine (LightGBM) and support vector machine (SVM) models were built using grid search with 10-fold cross-validation and compared with a logistic regression (LR) model. Discrimination and calibration were evaluated with the AUC and calibration curves, respectively, and the best-performing model was interpreted with Shapley additive explanations (SHAP).
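The split-and-tune workflow (an 8:2 train/test split, then a grid search scored by 10-fold cross-validation on the training portion) can be sketched in plain Python as follows. Stratifying by outcome, the round-robin fold assignment, the seeds, and the example grid keys `num_leaves` and `learning_rate` (real LightGBM parameter names, but hypothetical values) are assumptions, as the abstract does not report the grids used. `evaluate` is a hypothetical callback that would train a LightGBM or SVM model on `fit_idx` and return a validation score such as AUC.

```python
import itertools
import random


def stratified_split(labels, test_frac=0.2, seed=0):
    """Split row indices into (train, test), keeping each class's share roughly equal."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def stratified_kfold(labels, indices, k=10, seed=0):
    """Yield (fit_idx, val_idx) pairs; each class is dealt round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted({labels[i] for i in indices}):
        idx = [i for i in indices if labels[i] == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for fold in folds:
        held_out = set(fold)
        yield [i for i in indices if i not in held_out], sorted(fold)


def grid_search(grid, labels, train_idx, evaluate, k=10):
    """Return (best_params, best_mean_score) over every combination in grid.

    evaluate(params, fit_idx, val_idx) must fit a model on fit_idx and
    return a validation score where higher is better (e.g. AUC).
    """
    keys = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = [evaluate(params, fit, val)
                  for fit, val in stratified_kfold(labels, train_idx, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

Reusing the same fold seed for every parameter combination gives each candidate identical folds, so differences in mean score reflect the parameters rather than the fold assignment. The held-out test set is touched only once, after the best parameters are chosen.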

Results  A total of 520 patients with MIS were included, of whom 93 (17.9%) had a recurrence within 2 years. In the test set, the AUCs of LightGBM, SVM, and LR for predicting recurrence within 2 years were 0.935 (95%CI 0.896-0.973), 0.833 (95%CI 0.770-0.896), and 0.764 (95%CI 0.691-0.835), respectively; the accuracies were 0.890, 0.773, and 0.693, and the Brier scores were 0.105, 0.167, and 0.200, respectively. LightGBM performed best. In the SHAP-based interpretation of the LightGBM model, the five most important features were diastolic blood pressure, age, diabetes mellitus, LDL-C, and smoking.
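The 95% CIs for the AUCs are reported without naming the method (DeLong's method and bootstrapping are both common). A minimal sketch of one option, a percentile bootstrap over test-set patients, follows; `n_boot`, the seed, and the function names are assumptions. The AUC itself is computed as the probability that a randomly chosen recurrence case receives a higher score than a randomly chosen non-recurrence case, with ties counted as one half.

```python
import random


def auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) confidence interval.

    Assumes 0/1 labels; patients are resampled with replacement.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(y_true, scores), lower, upper
```

The pairwise AUC loop is quadratic in the number of patients, which is negligible for a test set of roughly a hundred cases; for larger sets a rank-based formula is faster.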

Conclusions  The LightGBM model developed in this study showed good predictive performance and may provide a reference for predicting recurrence within 2 years in patients with MIS. SHAP-based interpretation helps clinicians understand the reasons behind the model's predictions and make more personalized and rational clinical decisions for patients with MIS.

Key words: Minor ischemic stroke; Recurrence; Light gradient boosting machine; Shapley additive explanation
