Chinese Journal of Stroke ›› 2021, Vol. 16 ›› Issue (09): 895-900. DOI: 10.3969/j.issn.1673-5765.2021.09.005

• Original Article •

Machine Learning-based Models for Prediction of Functional Outcome of Ischemic Stroke

SHANGGUAN Yi, WANG Meng, WANG Chunjuan, GU Hongqiu, ZHAO Xingquan, WANG Yilong, WANG Yongjun, LI Zixiao

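  1. Neurology Center, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
  2. China National Clinical Research Center for Neurological Diseases
  3. Research Unit of Artificial Intelligence in Cerebrovascular Disease, Chinese Academy of Medical Sciences
  4. Chinese Institute for Brain Research, Beijing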
  • Received: 2021-05-06 Online: 2021-09-20 Published: 2021-09-20
  • Corresponding author: LI Zixiao, lizixiao2008@hotmail.com
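  • Supported by:
    Beijing Natural Science Foundation (Z200016)
    National Natural Science Foundation of China (92046016)
    National Key R&D Program of China during the 13th Five-Year Plan period (2017YFC131901)
    Beijing Municipal Science and Technology Commission Special Project for Collaborative Science and Technology Innovation in Medicine (Z201100005620010)
    Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences (2019-I2M-5-029)
    Beijing Young Top-notch Talent Program (2018000021223ZK03)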

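Abstract: Objective To develop machine learning-based models for predicting functional outcome after ischemic stroke and to provide a scientific basis for stratified patient management. Methods Patients with ischemic stroke within 7 days of onset were selected from the China National Stroke Registry Ⅱ (CNSR Ⅱ) database. Candidate predictors were screened by stepwise regression for the logistic regression analysis and by the Boruta algorithm for the machine learning models. Functional outcome prediction models were built with logistic regression and with three machine learning methods (CatBoost, XGBoost and LightGBM), and the four models were compared for their value in predicting 3-month functional outcome (poor outcome defined as a modified Rankin Scale [mRS] score >2). Results A total of 14 885 patients with ischemic stroke were included, with a mean age of 64.34±11.71 years; 63.96% (9521/14 885) were male. Patients were randomly divided at a ratio of 8:2 into a training set (n=11 908) and a test set (n=2977); the rates of poor 3-month functional outcome in the two sets were 17.36% and 17.06%, respectively (P=0.7045). Multivariable analysis identified the following predictors for the model: age (OR 1.05, 95%CI 1.04-1.05, P<0.0001), male sex (OR 0.77, 95%CI 0.69-0.86, P<0.0001), diabetes (OR 1.16, 95%CI 1.00-1.35, P=0.0497), history of cerebrovascular disease (OR 1.53, 95%CI 1.37-1.70, P<0.0001), concomitant pneumonia (OR 2.45, 95%CI 2.03-2.95, P<0.0001), NIHSS score at admission (OR 1.14, 95%CI 1.13-1.15, P<0.0001), premorbid mRS score (OR 3.11, 95%CI 2.67-3.63, P<0.0001), LDL-C (OR 1.07, 95%CI 1.02-1.12, P=0.0057), fasting blood glucose (OR 1.03, 95%CI 1.01-1.06, P=0.0072) and white blood cell count (OR 1.07, 95%CI 1.05-1.09, P<0.0001). The AUCs of the logistic regression, CatBoost, XGBoost and LightGBM models for predicting functional outcome of ischemic stroke were 0.815 (0.801-0.829), 0.828 (0.814-0.841), 0.826 (0.812-0.839) and 0.822 (0.808-0.836), respectively. The CatBoost (P=0.0023) and XGBoost (P=0.0182) models predicted better than the conventional logistic regression model. Conclusions Machine learning-based models for predicting functional outcome of ischemic stroke have high predictive value.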
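
The odds ratios reported above can be read as multiplicative effects on the odds of a poor outcome. The following is a minimal sketch of that arithmetic, not the authors' code. It assumes that the ORs for age and NIHSS score are per one year and per one point (the abstract does not state the units) and that effects are linear on the log-odds scale, as in a standard logistic regression. The names `REPORTED_OR`, `odds_multiplier` and `combined_multiplier` are hypothetical. Because the model intercept is not reported, only relative odds can be computed, not absolute risks.

```python
import math

# Odds ratios copied from the abstract; the per-unit interpretation
# (one year of age, one NIHSS point) is an assumption.
REPORTED_OR = {
    "age_years": 1.05,
    "nihss_points": 1.14,
}

def odds_multiplier(or_per_unit: float, delta: float) -> float:
    """Factor by which the odds change when a predictor shifts by `delta` units,
    assuming a linear effect on the log-odds scale."""
    return math.exp(delta * math.log(or_per_unit))

def combined_multiplier(deltas: dict) -> float:
    """Combine several predictor shifts by summing their log-odds contributions."""
    log_odds_shift = sum(
        delta * math.log(REPORTED_OR[name]) for name, delta in deltas.items()
    )
    return math.exp(log_odds_shift)

# Hypothetical comparison: a patient 10 years older with an NIHSS score 5 points higher.
print(round(odds_multiplier(REPORTED_OR["nihss_points"], 5), 2))  # 1.93
print(round(odds_multiplier(REPORTED_OR["age_years"], 10), 2))    # 1.63
print(round(combined_multiplier({"age_years": 10, "nihss_points": 5}), 2))
```

Under these assumptions, a 5-point higher NIHSS score at admission roughly doubles the odds of a poor 3-month outcome, holding the other predictors fixed.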

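Article highlights: Using large-sample data from the CNSR Ⅱ database, which covers 219 centers, this study tested the predictive performance of 3-month outcome models for ischemic stroke built with conventional multivariable logistic regression and with machine learning. The results suggest that the models built with CatBoost and XGBoost predicted better than the conventional logistic regression model.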
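
Predictive performance in this study is summarized by the AUC. As an illustration only, the sketch below computes an AUC in its rank-based form: the probability that a randomly chosen patient with a poor outcome receives a higher predicted score than a randomly chosen patient with a good outcome. The function name `auc` and the toy labels and scores are hypothetical and are not study data. Deciding whether one model's AUC is significantly higher than another's additionally requires a statistical test for correlated AUCs; the abstract does not say which test was used, so none is shown here.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and
    negative cases; tied scores count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = poor outcome (mRS > 2), 0 = good outcome.
y_true = [0, 0, 1, 1]
model_a = [0.10, 0.40, 0.35, 0.80]  # one negative case is ranked above a positive case
model_b = [0.10, 0.30, 0.35, 0.80]  # every positive case is ranked above every negative case

print(auc(y_true, model_a))  # 0.75
print(auc(y_true, model_b))  # 1.0
```

The pairwise loop is quadratic in the number of patients, which is fine for a demonstration; for a test set of several thousand patients a sort-based implementation or an established library routine would be the usual choice.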

Key words: Ischemic stroke; Outcome; Predictive model; Machine learning
