Chinese Journal of Stroke ›› 2020, Vol. 15 ›› Issue (03): 243-249. DOI: 10.3969/j.issn.1673-5765.2020.03.004

• Original Article •

A Study of Prediction Models for Intracerebral Hemorrhage-Associated Pneumonia Based on Machine Learning Algorithms

王孟,覃露,王春娟,李姣,王伊龙,赵性泉,王拥军,李子孝   

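  1. Center of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070; China National Clinical Research Center for Neurological Diseases; Center of Stroke, Beijing Institute for Brain Disorders; Beijing Key Laboratory of Translational Medicine for Cerebrovascular Disease
    2. National Center for Healthcare Quality Management in Neurological Diseases
    3. Institute of Medical Information, Chinese Academy of Medical Sciences / Peking Union Medical College
  • Received: 2020-01-10 Published: 2020-03-20 Online: 2020-03-20
  • Corresponding author: 李子孝 lizixiao2008@hotmail.com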

Machine Learning-based Models for Prediction of Intracerebral Hemorrhage Associated Pneumonia

  • Received: 2020-01-10 Online: 2020-03-20 Published: 2020-03-20

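Abstract:

Objective To develop machine learning-based models for predicting intracerebral hemorrhage-associated pneumonia.

Methods Inpatients with acute intracerebral hemorrhage within 7 days of onset were selected from the China National Stroke Registry Ⅱ (CNSR Ⅱ) database. Patients were registered between May 2012 and January 2013 at 219 hospitals across China. The subjects were randomly split into a training set and a test set at a ratio of 8:2. Multivariable logistic regression was used to screen candidate predictors. Diagnostic prediction models were then built with the logistic regression, CatBoost, XGBoost and LightGBM algorithms, and the value of the four resulting models for predicting intracerebral hemorrhage-associated pneumonia was compared.

Results A total of 2303 patients were included (mean age 62.1±12.7 years; 62.1% male). Patients were randomly assigned to the training set (n=1841) and the test set (n=462), in which the incidence of intracerebral hemorrhage-associated pneumonia was 15.6% and 15.8%, respectively (χ²=0.007, P=0.934). In the multivariable logistic regression analysis, the candidate predictors were age (OR 1.03, 95%CI 1.02-1.04), NIHSS score (OR 1.02, 95%CI 1.00-1.04), white blood cell count (OR 1.11, 95%CI 1.07-1.16) and dysphagia (OR 6.85, 95%CI 5.01-9.39). For the logistic regression, CatBoost, XGBoost and LightGBM models, sensitivity was 75.34%, 50.68%, 80.82% and 80.82%; specificity was 68.64%, 86.12%, 52.96% and 57.33%; and the area under the ROC curve was 0.776, 0.692, 0.736 and 0.767, respectively. The logistic regression and LightGBM models performed significantly better than the CatBoost and XGBoost models (DeLong test, P<0.05).

Conclusions Machine learning-based risk prediction models for intracerebral hemorrhage-associated pneumonia have relatively high diagnostic value. Age, NIHSS score, white blood cell count and dysphagia were the candidate predictors, and the models could be incorporated into diagnostic decision-making for intracerebral hemorrhage-associated pneumonia. The clinical applicability of these results remains to be validated in a larger external cohort.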

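Article highlights: Machine learning methods, combined with laboratory test indicators, can optimize prediction models for intracerebral hemorrhage-associated pneumonia.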

Keywords: Stroke-associated pneumonia; Prediction model; Machine learning
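
The reported training/test comparison (15.6% vs 15.8%, χ²=0.007, P=0.934) can be checked with a Pearson chi-square test on a 2×2 table. The abstract gives only percentages, so the event counts used below (288 of 1841 and 73 of 462) are assumptions back-calculated from the rounded rates, and the function name is illustrative. A minimal sketch, assuming no continuity correction:

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumed counts (not reported in the abstract), chosen to match 15.6% and 15.8%.
train_n, train_events = 1841, 288
test_n, test_events = 462, 73

chi2, p = pearson_chi2_2x2(
    train_events, train_n - train_events,
    test_events, test_n - test_events,
)
print(f"chi2 = {chi2:.3f}, P = {p:.3f}")
```

Under these assumed counts the script prints `chi2 = 0.007, P = 0.934`, which agrees with the values in the abstract. This suggests, but does not confirm, that an uncorrected Pearson test was used.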

Abstract:

Objective To establish machine learning-based models to predict intracerebral hemorrhage-associated pneumonia.

Methods Inpatients with intracerebral hemorrhage within 7 days of onset, enrolled in the China National Stroke Registry Ⅱ (CNSR Ⅱ) study at 219 hospitals between May 2012 and January 2013, were selected as the study subjects. The subjects were randomly divided into a training set (80%) and a test set (20%). Multivariable logistic regression analysis was applied to screen the candidate predictors in the training set. Four diagnostic prediction models were then constructed using four machine learning methods (logistic regression, CatBoost, XGBoost and LightGBM), and the predictive value of the four models was compared.

Results A total of 2303 patients (mean age 62.1±12.7 years, 62.1% male) were enrolled and randomly divided into a training set (n=1841) and a test set (n=462). The incidence of intracerebral hemorrhage-associated pneumonia in the two groups was 15.6% and 15.8%, respectively (χ²=0.007, P=0.934). According to the multivariable logistic regression analysis, the candidate predictors were age (OR 1.03, 95%CI 1.02-1.04), NIHSS score (OR 1.02, 95%CI 1.00-1.04), white blood cell count (OR 1.11, 95%CI 1.07-1.16) and dysphagia (OR 6.85, 95%CI 5.01-9.39). The sensitivities of the logistic regression, CatBoost, XGBoost and LightGBM models were 75.34%, 50.68%, 80.82% and 80.82%, respectively; the specificities were 68.64%, 86.12%, 52.96% and 57.33%, respectively. The areas under the ROC curve were 0.776, 0.692, 0.736 and 0.767, respectively. The logistic regression and LightGBM models performed significantly better than the CatBoost and XGBoost models (DeLong test, P<0.05).

Conclusions The machine learning-based predictive models for intracerebral hemorrhage-associated pneumonia have high diagnostic value and can be applied in diagnostic decision-making for intracerebral hemorrhage-associated pneumonia. Age, NIHSS score, white blood cell count and dysphagia were the candidate predictors used to construct the predictive models. The clinical value of the results is yet to be validated in an external cohort with a larger sample size.

Key words: Intracerebral hemorrhage associated pneumonia; Predictive model; Machine learning
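
The sensitivities and specificities reported in the abstract follow from confusion-matrix counts on the test set. The abstract does not report those counts. The sketch below assumes 73 pneumonia cases and 389 non-cases in the test set (consistent with n=462 and a 15.8% incidence) and back-calculates true-positive and true-negative counts that reproduce the reported percentages. All counts and names here are illustrative assumptions, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # pneumonia cases predicted as pneumonia
    fn: int  # pneumonia cases the model missed
    tn: int  # non-cases predicted as non-cases
    fp: int  # non-cases predicted as pneumonia

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


CASES, NON_CASES = 73, 389  # assumed, from n = 462 and 15.8% incidence

# Assumed (TP, TN) counts back-calculated from the reported percentages.
assumed_counts = {
    "Logistic regression": (55, 267),
    "CatBoost": (37, 335),
    "XGBoost": (59, 206),
    "LightGBM": (59, 223),
}

matrices = {
    name: Confusion(tp=tp, fn=CASES - tp, tn=tn, fp=NON_CASES - tn)
    for name, (tp, tn) in assumed_counts.items()
}

for name, cm in matrices.items():
    print(f"{name}: sensitivity {cm.sensitivity():.2%}, "
          f"specificity {cm.specificity():.2%}")
```

Read this way, the trade-off in the abstract is easier to see: under the assumed counts, CatBoost misses 36 of 73 cases but raises few false alarms, while XGBoost and LightGBM miss only 14 cases at the cost of many more false positives.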