1-Year Functional Outcome Prediction of New-Onset Acute Ischemic Stroke Patients Based on Machine Learning

doi:10.3969/j.issn.1673-5765.2022.03.008

Abstract

Abstract (translated from the Chinese):

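Objective To establish a machine-learning-based model for predicting 1-year outcome in patients with new-onset acute ischemic stroke (AIS), and to provide a reference for related research and clinical practice.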

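Methods Patients with new-onset AIS in the China National Stroke Registry (CNSR) database were studied. The predictors entering the models were identified by logistic regression. Prediction models for poor 1-year outcome (mRS≥3) in new-onset AIS patients were then built with machine learning [CatBoost, XGBoost, gradient boosted decision trees (GBDT), and random forest models] and with a traditional logistic regression model. Enrolled patients were randomly divided into a training set and a test set at a ratio of 7:3. The training set was used for model training and parameter optimization, and the test set was used to evaluate model performance. Discrimination in predicting poor outcome was assessed mainly by the AUC, and calibration by the Brier score.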

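Results A total of 8230 patients with new-onset AIS were included, with a mean age of 64.4±12.8 years; 3113 were female, and 2360 had a poor outcome at 1 year. Logistic regression on the training set showed that age, female sex, pre-stroke mRS≥3, NIHSS score at admission and at discharge, limb motor dysfunction, history of peripheral vascular disease, blood glucose at admission, lipid-regulating drugs (prescribed at discharge), and antiplatelet drugs (1-year medication adherence) could serve as predictors of poor 1-year outcome. The AUCs of the CatBoost, XGBoost, GBDT, random forest, and logistic regression models for predicting 1-year functional outcome in new-onset AIS patients were 0.857 (0.850-0.864), 0.856 (0.850-0.863), 0.856 (0.848-0.864), 0.853 (0.846-0.859), and 0.846 (0.837-0.855), respectively. The machine learning models, namely CatBoost (P=0.0130), XGBoost (P=0.0133), GBDT (P=0.0229), and random forest (P=0.0429), all outperformed the logistic regression model; all prediction models were well calibrated.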

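Conclusions The machine-learning-based models for predicting 1-year functional outcome in new-onset AIS patients had high predictive value, and the CatBoost model performed best.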

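Article highlights: Because machine learning models can process massive amounts of data and optimize parameter configuration, they have become an emerging technique in research on stroke risk and outcome prediction. This study built four machine learning models (CatBoost, XGBoost, GBDT, and random forest) for predicting 1-year outcome in AIS patients and confirmed that their predictive performance was superior to that of a model built with traditional logistic regression.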

Key words: Ischemic stroke; 1-year functional prognosis; Prediction model; Machine learning

Abstract:

Objective To establish a 1-year functional outcome prediction model for new-onset acute ischemic stroke (AIS) patients based on machine learning algorithms, and to provide a reference for related research and clinical work.

Methods This study was based on the data of new-onset AIS patients from the China National Stroke Registry (CNSR) database. Prediction models for 1-year poor prognosis (mRS≥3) in new-onset AIS patients were constructed with machine learning [CatBoost, XGBoost, GBDT, and random forest models] and with a traditional logistic regression model. The patients were randomly divided into a training set and a test set at a ratio of 7:3. The training set was used for model training and parameter optimization, and the test set was used to evaluate the predictive value of the models. Discrimination was evaluated mainly by the AUC, and calibration by the Brier score.
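The evaluation scheme described in the Methods (a random 7:3 split, AUC for discrimination, Brier score for calibration) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `split_7_3`, `auc`, and `brier_score` are hypothetical, and the labels and probabilities are toy values rather than CNSR data.

```python
import random


def split_7_3(records, seed=0):
    """Shuffle records and split them into 70% training and 30% test sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def auc(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy example (hypothetical values): 1 = poor outcome (mRS>=3), 0 = good outcome.
y_true = [1, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.1]
toy_auc = auc(y_true, y_prob)
toy_brier = brier_score(y_true, y_prob)
train, test = split_7_3(range(10))
```

A higher AUC indicates better discrimination, while a lower Brier score indicates predicted probabilities closer to the observed outcomes.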

Results A total of 8230 eligible patients were included, with a mean age of 64.4±12.8 years; 3113 were female (38.7%), and 2360 had a poor prognosis at 1 year. Multivariate analysis showed that older age, female sex, mRS≥3 before stroke onset, NIHSS score at admission and at discharge, limb motor dysfunction, history of peripheral vascular disease, blood glucose at admission, lipid-regulating drugs (prescribed at discharge), and antiplatelet drugs (1-year medication compliance) were predictors of 1-year poor prognosis. The AUCs of the CatBoost, XGBoost, GBDT, random forest, and logistic regression models for predicting 1-year functional prognosis of new-onset AIS patients were 0.857 (0.850-0.864), 0.856 (0.850-0.863), 0.856 (0.848-0.864), 0.853 (0.846-0.859), and 0.846 (0.837-0.855), respectively. The prediction performance of each machine learning model was superior to that of the logistic regression model (CatBoost vs. logistic, P=0.0130; XGBoost vs. logistic, P=0.0133; GBDT vs. logistic, P=0.0229; random forest vs. logistic, P=0.0429), and the calibration of each model was good.
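The abstract reports each AUC with an interval but does not state how the intervals were obtained. One common option is a percentile bootstrap over the test set, sketched below. This is an assumption for illustration only: the helper `bootstrap_auc_ci` and its parameters are hypothetical, and the data are toy values.

```python
import random


def auc(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_prob, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC.

    Resamples patients with replacement; resamples that contain only one
    outcome class are skipped because the AUC is undefined for them.
    """
    if len(set(y_true)) < 2:
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [y_prob[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


# Toy example (hypothetical values): perfectly separated scores.
y_true = [0, 0, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ci_low, ci_high = bootstrap_auc_ci(y_true, y_prob, n_boot=200)
```

Comparing two models on the same test set would need a paired procedure (for example, bootstrapping the AUC difference on the same resamples), which this sketch does not cover.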

Conclusions The 1-year functional prognosis prediction models for new-onset AIS patients based on machine learning algorithms had high predictive value, and the CatBoost model performed best.

Key words: Ischemic stroke; 1-year functional prognosis; Prediction model; Machine learning

CHEN Siding, YU Weiran, HUANG Xinying, LIU Huan, JIANG Yong, WANG Yongjun. 1-Year Functional Outcome Prediction of New-Onset Acute Ischemic Stroke Patients Based on Machine Learning[J]. Chinese Journal of Stroke, 2022, 17(03): 265-271.
