Chinese Journal of Stroke ›› 2022, Vol. 17 ›› Issue (07): 730-736. DOI: 10.3969/j.issn.1673-5765.2022.07.009

• Original Article •

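A Study of Machine Learning-Based Models for Predicting Prognosis in Acute Ischemic Stroke Patients with Abnormal Blood Glucose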

  

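  1. Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; China National Clinical Research Center for Neurological Diseases
  • Received: 2022-02-20 Published: 2022-07-20 Online: 2022-07-20
  • Corresponding author: Wang Yongjun, yongjunwang@ncrcnd.org.cn
  • Funding:

    General Program of the National Natural Science Foundation of China (82171269)

    Beijing Nova Program (Z201100006820076)

    China Postdoctoral Science Foundation Special Grant (2020T130437)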

Prediction of Clinical Outcome of Acute Ischemic Stroke Patients with Hyperglycemia Based on Machine Learning Model

  • Received: 2022-02-20 Online: 2022-07-20 Published: 2022-07-20

Abstract:

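Objective  To develop machine learning-based models for predicting prognosis in acute ischemic stroke patients with abnormal blood glucose, and to compare the predictive performance of a traditional logistic model with that of machine learning models.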

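Methods  Patients with acute ischemic stroke and abnormal blood glucose from the Third China National Stroke Registry (CNSR-Ⅲ) were studied. Case report forms were used to collect clinical data, including demographic information, medical history, laboratory tests, head imaging findings, and stroke etiological classification. Stratified 10-fold cross-validation was used to split the data into a training set (3325 patients) and a test set (369 patients). Machine learning methods, including random forest, gradient boosted decision trees (GBDT), and eXtreme Gradient Boosting (XGBoost), together with traditional logistic regression, were used to build models predicting poor functional outcome at 3 months (mRS ≥3). The area under the ROC curve (AUC) was used to assess discrimination and the Brier score to assess calibration; the F1 score, accuracy, sensitivity, and specificity were also used to evaluate the predictive performance of the different models.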
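The stratified split described above can be illustrated with a minimal sketch. It is not the authors' code: the function `stratified_k_fold`, the seed, and the round-robin assignment are assumptions made for illustration, and the labels are synthetic, with only the cohort size (3694) and the number of poor outcomes (585) taken from the abstract.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Return k lists of indices whose class proportions roughly match the full data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Synthetic labels: 1 = poor outcome (mRS >= 3), 0 = otherwise.
# Only the two counts below come from the abstract.
labels = [1] * 585 + [0] * (3694 - 585)
folds = stratified_k_fold(labels, k=10, seed=0)

for fold_id, test_idx in enumerate(folds):
    n_test = len(test_idx)
    n_train = len(labels) - n_test
    n_events = sum(labels[i] for i in test_idx)
    print(f"fold {fold_id}: train={n_train}, test={n_test}, poor outcomes in test={n_events}")
```

Each fold serves once as the test set while the remaining nine form the training set, which is why a cohort of 3694 yields test sets of roughly 369 patients and training sets of roughly 3325.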

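Results  A total of 3694 acute ischemic stroke patients with abnormal blood glucose were included, with a mean age of 62.4±10.4 years; 2408 (65.2%) were male, and 585 (15.8%) had a poor outcome at 3 months. The AUCs of the logistic regression, random forest, GBDT, and XGBoost models for predicting poor outcome at 3 months were 0.843 (0.814-0.872), 0.847 (0.823-0.871), 0.845 (0.819-0.871), and 0.848 (0.820-0.876), respectively; the sensitivities were 0.373 (0.340-0.405), 0.679 (0.629-0.728), 0.426 (0.383-0.468), and 0.634 (0.583-0.686), respectively. The AUCs of the machine learning models tended to be higher than that of the logistic regression model, but the differences were not statistically significant. The sensitivity of the machine learning models was better than that of the traditional logistic regression model (all P<0.05), and the Brier scores of all models indicated good calibration (0.094-0.138).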
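The threshold-dependent metrics reported here (sensitivity, specificity, accuracy, F1 score) all derive from a confusion matrix. The sketch below shows the standard definitions on a small hypothetical example. The data, the helper names `predict_labels` and `classification_metrics`, and the cut-offs 0.5 and 0.3 are assumptions for illustration; the abstract does not state which cut-off the authors used.

```python
def predict_labels(probs, threshold):
    """Turn predicted probabilities into 0/1 labels at a given cut-off."""
    return [1 if p >= threshold else 0 for p in probs]


def classification_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, accuracy, and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(y_true),
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


# Hypothetical outcomes (1 = poor outcome) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.35, 0.4, 0.2, 0.1, 0.05, 0.3, 0.55, 0.15]

for threshold in (0.5, 0.3):
    metrics = classification_metrics(y_true, predict_labels(probs, threshold))
    print(threshold, {name: round(value, 3) for name, value in metrics.items()})
```

Unlike the AUC, these metrics change with the chosen cut-off: lowering it in this toy example raises sensitivity at the cost of specificity.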

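Conclusions  Both the traditional logistic regression model and the machine learning models showed high predictive value for poor outcome at 3 months in acute ischemic stroke patients with abnormal blood glucose, with no significant difference in discrimination. These findings need to be validated in cohorts with larger sample sizes.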

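Article highlights: Using multicenter, large-sample data covering most provincial-level administrative regions of China, this study built machine learning-based models. Validation showed that the machine learning models discriminated well in predicting 3-month outcome after onset in acute ischemic stroke patients with abnormal blood glucose, with a trend toward outperforming the traditional logistic regression model.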

Keywords: Ischemic stroke; Functional outcome; Machine learning; Prediction model

Abstract:

Objective  To establish machine learning-based models for predicting the prognosis of acute ischemic stroke patients with hyperglycemia, and to compare the predictive performance of a traditional logistic model with that of machine learning models.

Methods  This study included patients from the Third China National Stroke Registry (CNSR-Ⅲ). Baseline information, including demographic characteristics, medical history, laboratory tests, head magnetic resonance imaging results, and stroke etiology classification, was collected using case report forms. The cases were divided into a training set (3325 patients) and a test set (369 patients) using stratified 10-fold cross-validation. Poor clinical outcome was defined as a modified Rankin Scale score of 3-6 at 3-month follow-up. Machine learning methods, including random forest, gradient boosted decision trees (GBDT), and eXtreme Gradient Boosting (XGBoost), as well as a traditional logistic model, were used to construct models predicting poor prognosis at 3 months. The area under the receiver operating characteristic curve (AUC) was used to evaluate discrimination, and the Brier score was used to evaluate calibration.
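The two headline measures named above have simple standard definitions, sketched below in plain Python. The AUC is computed as the probability that a randomly chosen patient with a poor outcome receives a higher predicted risk than a randomly chosen patient without one (ties count one half), and the Brier score is the mean squared difference between predicted probability and observed outcome. The data and the function names `roc_auc` and `brier_score` are hypothetical and are not taken from the study.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative cases (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


# Hypothetical outcomes (1 = poor outcome) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.35, 0.4, 0.2, 0.1, 0.05, 0.3, 0.55, 0.15]

print("AUC:", round(roc_auc(y_true, probs), 3))
print("Brier score:", round(brier_score(y_true, probs), 3))
```

A higher AUC indicates better discrimination, whereas a lower Brier score indicates predicted probabilities that lie closer to the observed outcomes.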

Results  A total of 3694 acute ischemic stroke patients with hyperglycemia were included, with a mean age of 62.4±10.4 years; 2408 (65.2%) were male. There were 585 patients (15.8%) with poor prognosis at 3 months. The AUCs of the logistic, random forest, GBDT, and XGBoost models for predicting poor prognosis at 3 months were 0.843 (0.814-0.872), 0.847 (0.823-0.871), 0.845 (0.819-0.871), and 0.848 (0.820-0.876), respectively. The sensitivities of the logistic, random forest, GBDT, and XGBoost models were 0.373 (0.340-0.405), 0.679 (0.629-0.728), 0.426 (0.383-0.468), and 0.634 (0.583-0.686), respectively. Although the AUCs of the machine learning models were numerically higher than that of the logistic model, the differences were not statistically significant (P>0.05). The sensitivity of the machine learning models was better than that of the logistic model (all P<0.05), and the calibration of all models was good (Brier scores 0.094-0.138).

Conclusions  Both the traditional logistic model and the machine learning models showed high predictive value for poor prognosis at 3 months in acute ischemic stroke patients with hyperglycemia, with no significant difference in discrimination. These results need to be validated in a larger cohort.

Key words: Ischemic stroke; Functional prognosis; Machine learning; Prediction model