Research on Prediction of Recurrence of Minor Ischemic Stroke Based on Interpretable Machine Learning Models

doi:10.3969/j.issn.1673-5765.2024.08.010

Abstract

Abstract (translated from the Chinese):

Objective To use interpretable machine learning models to explore risk factors associated with recurrence within 2 years after minor ischemic stroke (MIS).

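Methods General information, laboratory results, and imaging data were retrospectively collected for patients with MIS treated in the Department of Neurology, Shanxi Cardiovascular Hospital, from July to December 2020. Univariate analysis was used to screen candidate risk factors for recurrence, and the synthetic minority oversampling technique for nominal and continuous features (SMOTE-NC) was used to address class imbalance. The data set was split into a training set and a test set at a ratio of 8:2. Light gradient boosting machine (LightGBM) and support vector machine (SVM) models were built using grid search with 10-fold cross-validation and compared with a logistic regression (LR) model. Discrimination was evaluated by the area under the ROC curve (AUC) and calibration by calibration curves. The predictions of the best-performing model were interpreted with Shapley additive explanations (SHAP).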
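The splitting and tuning steps described in the Methods can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names (`stratified_split`, `stratified_folds`, `grid_search_cv`), the toy threshold "model", and the toy data are all hypothetical, and stratifying by class is an assumption, since the abstract does not say how the 8:2 split or the folds were drawn.

```python
import random
from itertools import product

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test, keeping class proportions roughly equal."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def stratified_folds(labels, indices, k=10, seed=0):
    """Deal the indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels[i] for i in indices)):
        idx = [i for i in indices if labels[i] == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def grid_search_cv(fit, score, X, y, train_idx, grid, k=10, seed=0):
    """Return the parameter combination with the best mean validation score."""
    folds = stratified_folds(y, train_idx, k, seed)
    keys = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = []
        for held_out in folds:
            held = set(held_out)
            fit_idx = [i for i in train_idx if i not in held]
            model = fit([X[i] for i in fit_idx], [y[i] for i in fit_idx], **params)
            scores.append(score(model, [X[i] for i in held_out],
                                [y[i] for i in held_out]))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Toy stand-ins (hypothetical): one feature, a fixed-threshold "model".
def fit_threshold(X, y, threshold):
    return lambda x: int(x >= threshold)

def accuracy(model, X, y):
    return sum(model(x) == label for x, label in zip(X, y)) / len(y)

X = list(range(100))
y = [int(x >= 60) for x in X]
train_idx, test_idx = stratified_split(y, test_fraction=0.2, seed=0)
best_params, best_score = grid_search_cv(
    fit_threshold, accuracy, X, y, train_idx, {"threshold": [20, 60, 80]}, k=10)
test_accuracy = accuracy(fit_threshold(None, None, **best_params),
                         [X[i] for i in test_idx], [y[i] for i in test_idx])
print(best_params, best_score, test_accuracy)
```

The test set is held back until the end, so the reported test score is not used to choose hyperparameters. In practice `fit` and `score` would wrap a real learner and a metric such as AUC.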

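Results A total of 520 patients with MIS were included, of whom 93 (17.9%) had a recurrence within 2 years. In the test set, the AUCs of the LightGBM, SVM, and LR models for predicting recurrence within 2 years were 0.935 (95%CI 0.896-0.973), 0.833 (95%CI 0.770-0.896), and 0.764 (95%CI 0.691-0.835), respectively; accuracy was 0.890, 0.773, and 0.693, and the Brier score was 0.105, 0.167, and 0.200. The LightGBM model performed best. In the SHAP-based interpretable LightGBM model, the five most important features were diastolic blood pressure, age, diabetes mellitus, LDL-C, and smoking.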

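Conclusions The LightGBM model established in this study predicted recurrence well and may serve as a reference for predicting recurrence within 2 years in patients with MIS. SHAP-based interpretability helps clinicians better understand the reasons behind the model's predictions and make more individualized and rational clinical decisions for patients with MIS.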

Highlights:

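This study built LightGBM and SVM machine learning models to predict 2-year recurrence in patients with MIS and showed that their predictive performance exceeded that of a conventional LR model. The interpretable LightGBM-based model can help clinicians better understand the reasons behind the model's predictions and make more individualized and rational clinical decisions for patients with MIS.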

Keywords: Minor ischemic stroke; Recurrence; Light gradient boosting machine; Shapley additive explanation

Abstract:

Objective To explore the risk factors related to the recurrence of minor ischemic stroke (MIS) within two years by using an interpretable machine learning model.

Methods General data, laboratory results, and imaging data of patients with MIS in the Department of Neurology, Shanxi Cardiovascular Hospital, from July to December 2020 were retrospectively collected. Candidate risk factors for recurrence were screened by univariate analysis. The synthetic minority oversampling technique for nominal and continuous features (SMOTE-NC) was used to address class imbalance. The data set was divided into a training set and a test set at a ratio of 8:2. Light gradient boosting machine (LightGBM) and support vector machine (SVM) models were built using grid search with 10-fold cross-validation and compared with a logistic regression (LR) model. Discrimination and calibration were evaluated by the AUC and the calibration curve, respectively. The predictions of the best-performing model were interpreted with Shapley additive explanations (SHAP).
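To make the oversampling step concrete, the following is a minimal pure-Python sketch of the SMOTE-NC idea. Continuous features are interpolated between a minority sample and one of its nearest minority neighbours, and nominal features take the majority value among those neighbours. The function name `smote_nc`, the toy rows, the choice of `k`, and the use of the population standard deviation are assumptions made for illustration; this is not the authors' code and it omits details that a maintained library implementation would handle.

```python
import random
import statistics
from collections import Counter

def smote_nc(minority, cont_idx, nom_idx, n_new, k=3, seed=0):
    """Generate n_new synthetic minority rows with mixed feature types."""
    rng = random.Random(seed)
    # A nominal mismatch is penalised by the median of the standard
    # deviations of the continuous features in the minority class.
    med = statistics.median(
        [statistics.pstdev([row[j] for row in minority]) for j in cont_idx]
    )

    def distance(a, b):
        squared = sum((a[j] - b[j]) ** 2 for j in cont_idx)
        squared += sum(med ** 2 for j in nom_idx if a[j] != b[j])
        return squared ** 0.5

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: distance(base, row))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        new_row = list(base)
        for j in cont_idx:
            new_row[j] = base[j] + gap * (mate[j] - base[j])
        for j in nom_idx:
            # Majority vote among the k nearest neighbours.
            new_row[j] = Counter(row[j] for row in neighbours).most_common(1)[0][0]
        synthetic.append(tuple(new_row))
    return synthetic

# Hypothetical minority-class rows:
# (diastolic BP in mmHg, age in years, diabetes 0/1, smoking 0/1)
minority = [
    (95.0, 70.0, 1, 1),
    (90.0, 66.0, 1, 0),
    (100.0, 72.0, 1, 1),
    (88.0, 61.0, 0, 1),
    (92.0, 68.0, 1, 1),
]
synthetic = smote_nc(minority, cont_idx=(0, 1), nom_idx=(2, 3), n_new=6)
for row in synthetic:
    print(row)
```

Because each synthetic row is an interpolation, its continuous values stay within the range observed in the minority class, and its nominal values are always categories that were actually observed.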

Results A total of 520 patients with MIS were included in this study, of whom 93 (17.9%) had a recurrence within 2 years. In the test set, the AUCs of the LightGBM, SVM, and LR models for predicting recurrence within 2 years were 0.935 (95%CI 0.896-0.973), 0.833 (95%CI 0.770-0.896), and 0.764 (95%CI 0.691-0.835), respectively. The accuracy was 0.890, 0.773, and 0.693, and the Brier score was 0.105, 0.167, and 0.200, respectively. The LightGBM model had the best performance. The five most important features in the SHAP-based interpretable LightGBM model were diastolic blood pressure, age, diabetes mellitus, LDL-C, and smoking.
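For readers less familiar with the three reported metrics, here is a small sketch of how AUC, accuracy, and the Brier score are computed from predicted probabilities. The labels and probabilities are made-up toy values, not study data, and the 0.5 classification threshold is an assumption, since the abstract does not state the cut-off used. The last line simply re-derives the reported recurrence proportion from the counts in the text.

```python
def auc(y_true, y_prob):
    """Chance that a random positive scores above a random negative (ties count 1/2)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Toy labels (1 = recurrence) and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1]

toy_auc = auc(y_true, y_prob)
toy_accuracy = accuracy(y_true, y_prob)
toy_brier = brier_score(y_true, y_prob)

# Recurrence proportion from the counts reported in the Results.
recurrence_percent = round(93 / 520 * 100, 1)

print(toy_auc, toy_accuracy, toy_brier, recurrence_percent)
```

A higher AUC indicates better discrimination, whereas a lower Brier score indicates better-calibrated probabilities. A model that always predicts 0.5 scores 0.25 on the Brier score.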

Conclusions The LightGBM model established in this study showed good predictive performance and may serve as a reference for predicting recurrence within 2 years in patients with MIS. SHAP interpretability helps clinicians better understand the reasons behind the model's predictions and make more personalized and rational clinical decisions for patients with MIS.
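The idea behind SHAP can be illustrated by computing exact Shapley values for a tiny model. The sketch below enumerates every feature coalition and replaces "absent" features with a single baseline value. That is a simplification: SHAP implementations for tree models use faster algorithms and a background data set. The toy risk function, its coefficients, the baseline, and the patient values are all hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(present):
        # Features outside the coalition are set to their baseline value.
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical risk score with one interaction term (diabetes x smoking).
def toy_risk(z):
    dbp, age, diabetes, smoking = z
    return 0.01 * dbp + 0.005 * age + 0.2 * diabetes + 0.1 * diabetes * smoking

features = ["diastolic BP", "age", "diabetes", "smoking"]
patient = [95, 70, 1, 1]    # hypothetical patient
baseline = [80, 60, 0, 0]   # hypothetical reference patient

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in zip(features, phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets each prediction be read as a sum of per-feature effects. The interaction term is shared equally between diabetes and smoking.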

Key words: Minor ischemic stroke; Recurrence; Light gradient boosting machine; Shapley additive explanation


MO Qiuhong, DING Xiaobo, LI Jing, ZHANG Yanbo, LI Weirong. Research on Prediction of Recurrence of Minor Ischemic Stroke Based on Interpretable Machine Learning Models [J]. Chinese Journal of Stroke, 2024, 19(8): 924-930.
