Chinese Journal of Stroke ›› 2020, Vol. 15 ›› Issue (06): 587-594. DOI: 10.3969/j.issn.1673-5765.2020.06.003

Comparison of Prediction Models for In-hospital Stroke Recurrence in Patients with Ischemic Stroke Based on Logistic Regression and XGBoost Methods

  

  • Received: 2020-03-01 Online: 2020-06-20 Published: 2020-06-20

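A Preliminary Comparative Study of Risk Prediction Models for In-hospital Recurrence of Ischemic Stroke Built with Logistic Regression and XGBoost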

GU Hongqiu (谷鸿秋), WANG Chunjuan (王春娟), LI Zixiao (李子孝), WANG Yilong (王伊龙), WANG Yongjun (王拥军), JIANG Yong (姜勇)

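  1. Beijing Tiantan Hospital, Capital Medical University, Beijing 100070; China National Clinical Research Center for Neurological Diseases
  2. National Center for Healthcare Quality Management in Neurological Diseases
  3. Beijing Advanced Innovation Center for Big Data-Based Precision Medicine (Beihang University & Capital Medical University)
  • Corresponding author: JIANG Yong (姜勇), jiangyong@ncrcnd.org.cn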
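  • Funding:

    National Key Research and Development Program of China, 13th Five-Year Plan (2016YFC0901001, 2017YFC1310901, 2016YFC0901002, 2017YFC1307905, 2015BAI12B00)

    Chinese Academy of Medical Sciences Innovation Unit of Artificial Intelligence in Cerebrovascular Disease (2019RU018)
    Beijing Municipal Science and Technology Commission, research on artificial intelligence-based clinical diagnosis and treatment decision-making for cerebrovascular disease (Z201100005620010)
    Beijing Talents Project (2018A13)
    Beijing Young Top-notch Talent Program (2018000021223ZK03)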

Abstract:

Objective To compare prediction models for in-hospital stroke recurrence in patients with ischemic stroke built with logistic regression and with XGBoost.
Methods Data on ischemic stroke inpatients who were discharged according to medical advice were retrospectively analyzed from the China National Stroke Registry Ⅱ (CNSR Ⅱ) database. Logistic regression and XGBoost were each used to develop a model for predicting in-hospital stroke recurrence. Candidate predictors included demographic characteristics, stroke severity, medical history, medication history, and clinical measurements. Model performance was assessed with the area under the receiver operating characteristic curve (AUC), calibration intercept, calibration slope, and Brier score. All statistical analyses were performed in R (version 3.6.2).
Results A total of 17 227 eligible patients were included. The mean age was 64.72±11.84 years, and 6317 (36.7%) were female. A total of 14 482 (84.1%) patients had a pre-onset mRS score of 0 or 1, and the NIHSS score at admission was 4 (2-6). A total of 444 (2.6%) patients had a recurrent stroke during hospitalization. The three strongest predictors were pre-onset mRS score, atrial fibrillation, and stroke history in the logistic regression model, and pre-onset mRS score, atrial fibrillation, and total cholesterol in the XGBoost model. The AUC did not differ significantly between the logistic regression model and the XGBoost model (0.63, 95%CI 0.58-0.68 vs 0.64, 95%CI 0.59-0.68, P = 0.9229). The calibration intercept, calibration slope, and Brier score were -0.81, 0.76, and 0.03, respectively, for the logistic regression model, and -1.37, 1.20, and 0.38 for the XGBoost model. The logistic regression model was better calibrated than the XGBoost model.
Conclusions For in-hospital stroke recurrence models built on CNSR Ⅱ data, discrimination did not differ significantly between the logistic regression-based and XGBoost-based prediction models, while the logistic regression-based model was better calibrated.

Key words: Ischemic stroke; In-hospital stroke recurrence; Prediction model

Abstract (translated from the Chinese):

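Objective To build risk prediction models for in-hospital recurrence of ischemic stroke using logistic regression and XGBoost, and to make a preliminary comparison of the two.
Methods Using data on ischemic stroke patients discharged according to medical advice from the China National Stroke Registry Ⅱ (CNSR Ⅱ) database, risk prediction models for in-hospital recurrence of ischemic stroke were built with logistic regression and with XGBoost. Candidate predictors included demographic characteristics, stroke severity, medical history, medication history, and clinical measurements. The models were evaluated by the area under the ROC curve (area under the curve, AUC), calibration intercept, calibration slope, and Brier score. All statistical analyses were performed in R (version 3.6.2).
Results A total of 17 227 eligible patients were included. The mean age was 64.72±11.84 years, 6317 (36.7%) were female, 14 482 (84.1%) had a pre-onset mRS score of 0 or 1, the NIHSS score at admission was 4 (2-6), and 444 (2.6%) had an in-hospital stroke recurrence. The three strongest predictors identified were pre-onset mRS score, atrial fibrillation, and stroke history in the logistic regression model, and pre-onset mRS score, atrial fibrillation, and total cholesterol in the XGBoost model. The AUCs of the logistic regression and XGBoost prediction models did not differ significantly (0.63, 95%CI 0.58-0.68 vs 0.64, 95%CI 0.59-0.68, P = 0.9229). The calibration intercept, calibration slope, and Brier score were -0.81, 0.76, and 0.03 for the logistic regression model, and -1.37, 1.20, and 0.38 for the XGBoost model. The logistic regression model was better calibrated.
Conclusions Among risk prediction models for in-hospital recurrence of ischemic stroke built on CNSR Ⅱ data, the XGBoost-based model did not differ significantly in discrimination from the logistic regression-based model, but its calibration was slightly lower.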

Key words: Ischemic stroke; In-hospital recurrence; Prediction model
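
The four performance measures named in the Methods (AUC, calibration intercept, calibration slope, and Brier score) can be illustrated with a short sketch. The study itself was run in R (version 3.6.2); what follows is Python using only the standard library, and it is not the authors' code. The function names (`auc`, `brier_score`, `calibration_intercept_slope`) and the toy data are hypothetical. The sketch also assumes that the calibration intercept and slope are estimated jointly by regressing the outcome on the logit of the predicted risk, which may differ from the definition used in the paper.

```python
import math


def auc(y_true, y_prob):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Over all (event, non-event) pairs, counts how often the event received
    the higher predicted risk; ties count as half. Quadratic time, which is
    fine for a sketch but slow for large cohorts.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def _logit(p, eps=1e-12):
    # Clip to avoid log(0) for predictions of exactly 0 or 1.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def calibration_intercept_slope(y_true, y_prob, n_iter=50, tol=1e-10):
    """Fit logit(P(y=1)) = a + b * logit(y_prob) by Newton-Raphson.

    Assumption of this sketch: a and b are estimated jointly.
    Returns (a, b); perfect calibration corresponds to a = 0 and b = 1.
    """
    x = [_logit(p) for p in y_prob]
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        # Score vector (g0, g1) and information matrix (h00, h01, h11).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("singular information matrix")
        # Newton step: solve the 2x2 system (information) * delta = score.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Toy data (hypothetical, not from CNSR II).
outcomes = [0, 0, 1, 0, 1, 1, 0, 1]
risks = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
print("AUC:", round(auc(outcomes, risks), 3))            # 0.75
print("Brier:", round(brier_score(outcomes, risks), 3))  # 0.2
print("Calibration (a, b):", calibration_intercept_slope(outcomes, risks))
```

With perfectly calibrated predictions the fit returns an intercept near 0 and a slope near 1. A slope below 1 suggests predictions that are too extreme, and a slope above 1 suggests predictions that are too conservative. As a reference point for the Brier score, a model that assigns every patient the overall recurrence rate p scores p(1 - p), which is about 0.025 at the 2.6% rate reported here.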