Chinese Journal of Stroke, 2023, Vol. 18, Issue 12: 1397-1404. DOI: 10.3969/j.issn.1673-5765.2023.12.009

Development and Validation of a Predictive Model for In-Hospital Recurrence Risk in Ischemic Stroke Patients

CHEN Siding1, JIANG Yingyu1, WANG Chunjuan2, YANG Xin2, LI Zixiao2, JIANG Yong1,3, WANG Yongjun1,4,5, GU Hongqiu1,2   

  • Received: 2023-12-01; Online: 2023-12-20; Published: 2023-12-20

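  • Author affiliations:
    1  China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
    2  National Center for Healthcare Quality Management in Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University
    3  Beijing Advanced Innovation Center for Big Data-Based Precision Medicine (Beihang University and Capital Medical University)
    4  Advanced Innovation Center for Human Brain Protection, Capital Medical University
    5  Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences
  • Corresponding authors: GU Hongqiu, guhongqiu@yeah.net; WANG Yongjun, yongjunwang@ncrcnd.org.cn
  • Funding:
    National Natural Science Foundation of China (72004146)
    Beijing Hospitals Authority "Qingmiao" (Youth) Talent Program (QML20210501)
    Beijing Hospitals Authority "Peiyu" (Incubating) Talent Program (PX2021024)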

Abstract: Objective  To develop a machine learning-based predictive model for the in-hospital recurrence risk of ischemic stroke patients and to validate it externally, providing a reference for related research.
Methods  The development cohort was the China Stroke Center Alliance (CSCA) cohort; ischemic stroke patients in this cohort were randomly split into training and internal validation sets at an 8:2 ratio. The external validation cohort was the Third China National Stroke Registry (CNSR-Ⅲ) cohort. Candidate predictive factors were identified from guidelines, literature, and data, and then selected using least absolute shrinkage and selection operator (LASSO) regression. Predictive models for the in-hospital recurrence risk of ischemic stroke patients were developed using logistic regression and machine learning algorithms [random forest, eXtreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM)]. Models were evaluated on discrimination (C-statistic) and calibration (Brier score).
Results  The CSCA cohort included 1 587 779 ischemic stroke patients, of whom 99 085 (6.2%) had an in-hospital recurrence. The CNSR-Ⅲ cohort included 14 146 ischemic stroke patients, of whom 623 (4.4%) had an in-hospital recurrence. LASSO regression selected age, gender, stroke history, hypertension, diabetes, lipid metabolism disorders, atrial fibrillation, heart failure, coronary heart disease, peripheral vascular disease, LDL-C, fasting blood glucose, serum creatinine, and in-hospital antithrombotic therapy as predictors of in-hospital recurrence in ischemic stroke patients. In internal validation, the discrimination (AUC) of each model was approximately 0.75, with the XGBoost model slightly outperforming the others (AUC 0.765, 95%CI 0.759-0.770), and the Brier scores of all models were approximately 0.05. In external validation, the predictive performance of all models was relatively low (AUC<0.60), and the Brier scores of all models were below 0.08.
Conclusions  With a limited number and dimensionality of predictive factors, both logistic regression and machine learning algorithms showed relatively low performance in predicting the in-hospital recurrence risk of ischemic stroke. Future work should explore additional predictive factors and algorithms.

Key words: Ischemic stroke; In-hospital recurrence; Predictive model; Machine learning
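
The two evaluation metrics named in the abstract, and the 8:2 random split, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the toy labels and the toy probabilities are hypothetical, and the pairwise C-statistic shown here is only practical for small samples.

```python
import random


def train_validation_split(records, train_fraction=0.8, seed=0):
    """Randomly split records into training and internal validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def c_statistic(y_true, y_prob):
    """Discrimination: share of (event, non-event) pairs in which the event
    case received the higher predicted risk; ties count as one half."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def constant_prediction_brier(event_rate):
    """Brier score obtained by predicting the event rate for every patient."""
    return event_rate * (1.0 - event_rate)


# Hypothetical toy data: 1 = in-hospital recurrence, 0 = no recurrence.
toy_outcomes = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0]
toy_risks = [0.02, 0.05, 0.30, 0.10, 0.08, 0.04, 0.12, 0.03, 0.25, 0.06]

train, validation = train_validation_split(range(len(toy_outcomes)))
print("train size:", len(train), "validation size:", len(validation))
print("C-statistic:", c_statistic(toy_outcomes, toy_risks))
print("Brier score:", brier_score(toy_outcomes, toy_risks))

# Event rates computed from the counts reported in the abstract.
for name, events, total in [("CSCA", 99085, 1587779), ("CNSR-III", 623, 14146)]:
    print(name, "constant-prediction Brier:", constant_prediction_brier(events / total))
```

For scale, a constant prediction equal to the observed event rate p gives a Brier score of p(1-p), which is roughly 0.059 at the CSCA event rate and roughly 0.042 at the CNSR-Ⅲ event rate.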