Machine Learning-based Models for Prediction of Intracerebral Hemorrhage Associated Pneumonia

doi:10.3969/j.issn.1673-5765.2020.03.004

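Abstract

Abstract (translated from the Chinese):

Objective To establish machine learning-based prediction models for intracerebral hemorrhage associated pneumonia. Methods Inpatients with acute intracerebral hemorrhage within 7 d of onset in the China National Stroke Registry Ⅱ (CNSR Ⅱ) database were selected as study subjects. Registration ran from May 2012 to January 2013 and covered 219 hospitals in China. The subjects were randomly divided into a training set and a test set at a ratio of 8:2. Multivariable Logistic regression analysis was used to screen candidate predictors. Logistic regression, CatBoost, XGBoost and LightGBM algorithms were then used to build diagnostic prediction models, and the predictive and diagnostic value of the four models for intracerebral hemorrhage associated pneumonia was compared. Results A total of 2303 patients were included (mean age 62.1±12.7 years, 62.1% male). The patients were randomly divided into a training set (n=1841) and a test set (n=462), in which the incidence of intracerebral hemorrhage associated pneumonia was 15.6% and 15.8%, respectively (χ²=0.007, P=0.934). According to multivariable Logistic regression analysis, the candidate predictors were age (OR 1.03, 95%CI 1.02-1.04), NIHSS score (OR 1.02, 95%CI 1.00-1.04), white blood cell count (OR 1.11, 95%CI 1.07-1.16) and dysphagia (OR 6.85, 95%CI 5.01-9.39). The sensitivities of the Logistic regression, CatBoost, XGBoost and LightGBM models were 75.34%, 50.68%, 80.82% and 80.82%, respectively; the specificities were 68.64%, 86.12%, 52.96% and 57.33%; and the areas under the ROC curve were 0.776, 0.692, 0.736 and 0.767. The Logistic regression and LightGBM models performed significantly better than the CatBoost and XGBoost models (DeLong test, P<0.05). Conclusions The machine learning-based risk prediction models for intracerebral hemorrhage associated pneumonia have high diagnostic value. Age, NIHSS score, white blood cell count and dysphagia are the candidate predictors, and the models can be incorporated into diagnostic decision-making for intracerebral hemorrhage associated pneumonia. The clinical value of these results remains to be validated in an external cohort with a larger sample.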

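The balance check between the training and test sets reported above (χ²=0.007, P=0.934) can be approximately reproduced with a Pearson chi-square test on a 2×2 table. The sketch below is illustrative only. The abstract gives percentages rather than counts, so the pneumonia counts used here (288 of 1841 and 73 of 462) are assumptions back-calculated from 15.6% and 15.8%. The function name `pearson_chi2_2x2` is hypothetical, and the absence of a continuity correction is also an assumption.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Shortcut formula for a 2x2 table: n * (ad - bc)^2 / (product of the margins).
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# ASSUMED counts, inferred from the reported percentages (not stated in the abstract).
train_pneumonia, train_total = 288, 1841
test_pneumonia, test_total = 73, 462

chi2, p_value = pearson_chi2_2x2(
    train_pneumonia, train_total - train_pneumonia,
    test_pneumonia, test_total - test_pneumonia,
)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

A large P value here only indicates that the outcome rate is similar in the two sets; it says nothing about whether the predictors are equally distributed.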
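Article highlights: Machine learning methods, combined with laboratory test indicators, can improve prediction models for intracerebral hemorrhage associated pneumonia.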

Keywords: Stroke-associated pneumonia; Predictive model; Machine learning

Abstract:

Objective To establish machine learning-based models to predict intracerebral hemorrhage associated pneumonia. Methods Inpatients with intracerebral hemorrhage within 7 days of onset, enrolled in the China National Stroke Registry Ⅱ (CNSR Ⅱ) study at 219 hospitals between May 2012 and January 2013, were selected as study subjects. The subjects were randomly divided into a training set (80%) and a test set (20%). Multivariable logistic regression analysis was applied to screen the candidate predictors in the training set. Four diagnostic prediction models were then constructed using four machine learning methods (Logistic regression, CatBoost, XGBoost and LightGBM), and the predictive value of the four models was compared. Results A total of 2303 patients (mean age 62.1±12.7 years, 62.1% male) were enrolled and randomly divided into a training set (n=1841) and a test set (n=462). The incidence of intracerebral hemorrhage associated pneumonia in the two sets was 15.6% and 15.8%, respectively (χ²=0.007, P=0.934). According to multivariable logistic regression analysis, the candidate predictors were age (OR 1.03, 95%CI 1.02-1.04), NIHSS score (OR 1.02, 95%CI 1.00-1.04), white blood cell count (OR 1.11, 95%CI 1.07-1.16) and dysphagia (OR 6.85, 95%CI 5.01-9.39). The sensitivities of the Logistic regression, CatBoost, XGBoost and LightGBM models were 75.34%, 50.68%, 80.82% and 80.82%, respectively; the specificities were 68.64%, 86.12%, 52.96% and 57.33%, respectively. The areas under the ROC curve were 0.776, 0.692, 0.736 and 0.767, respectively. The Logistic regression and LightGBM models were significantly more effective than the CatBoost and XGBoost models (DeLong test, P<0.05). Conclusions The machine learning-based predictive models for intracerebral hemorrhage associated pneumonia have high diagnostic value and can be applied in diagnostic decision-making for intracerebral hemorrhage associated pneumonia. Age, NIHSS score, white blood cell count and dysphagia were the candidate predictors used to construct the models. The clinical value of the results is yet to be validated in an external cohort with a larger sample size.

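As a worked check of how the sensitivity and specificity figures above relate to test-set counts, the sketch below computes both from confusion-matrix cells. The counts are assumptions: the abstract reports only percentages, and the values used here (73 pneumonia cases and 389 non-cases among the 462 test patients) were back-calculated because they reproduce the reported percentages to two decimals. The Youden index is added for illustration and is not reported in the source.

```python
# ASSUMED test-set counts per model: (true positives, true negatives).
# Back-calculated from the reported percentages; not given in the abstract.
POSITIVES, NEGATIVES = 73, 389  # 73 + 389 = 462 test patients

assumed_counts = {
    "Logistic regression": (55, 267),
    "CatBoost": (37, 335),
    "XGBoost": (59, 206),
    "LightGBM": (59, 223),
}


def sensitivity_specificity(tp, tn, positives, negatives):
    """Return (sensitivity, specificity) as fractions between 0 and 1."""
    return tp / positives, tn / negatives


metrics = {}
for model, (tp, tn) in assumed_counts.items():
    sens, spec = sensitivity_specificity(tp, tn, POSITIVES, NEGATIVES)
    youden = sens + spec - 1  # Youden index J, shown only for illustration
    metrics[model] = (sens, spec, youden)
    print(f"{model}: sensitivity {sens:.2%}, specificity {spec:.2%}, J = {youden:.3f}")
```

These figures describe each model at a single decision threshold, whereas the area under the ROC curve summarizes performance over all thresholds, so the two rankings need not agree.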
Key words: Intracerebral hemorrhage associated pneumonia; Predictive model; Machine learning

WANG Meng, QIN Lu, WANG Chun-Juan, LI Jiao, WANG Yi-Long, ZHAO Xing-Quan, WANG Yong-Jun, LI Zi-Xiao. Machine Learning-based Models for Prediction of Intracerebral Hemorrhage Associated Pneumonia[J]. Chinese Journal of Stroke, 2020, 15(03): 243-249.

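The odds ratios reported in the abstract are per one unit of each predictor. The abstract does not state the units, so the sketch below assumes one year of age, one NIHSS point, and one unit of white blood cell count. Under a Logistic model an odds ratio corresponds to a coefficient β = ln(OR), so the odds ratio for a k-unit change is OR^k. The function names and the 10-unit step are illustrative choices and are not part of the study.

```python
import math

# Reported odds ratios: (point estimate, 95% CI lower bound, 95% CI upper bound).
reported_or = {
    "age": (1.03, 1.02, 1.04),
    "NIHSS score": (1.02, 1.00, 1.04),
    "white blood cell count": (1.11, 1.07, 1.16),
    "dysphagia": (6.85, 5.01, 9.39),
}


def log_odds_coefficient(odds_ratio):
    """Logistic regression coefficient implied by an odds ratio."""
    return math.log(odds_ratio)


def scaled_odds_ratio(odds_ratio, k):
    """Odds ratio for a k-unit change: exp(k * beta), which equals OR ** k."""
    return math.exp(k * log_odds_coefficient(odds_ratio))


def approx_standard_error(lower, upper, z=1.96):
    """Standard error of beta recovered from a 95% CI (assumes a Wald interval)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


for name, (estimate, lower, upper) in reported_or.items():
    beta = log_odds_coefficient(estimate)
    se = approx_standard_error(lower, upper)
    print(f"{name}: beta = {beta:.3f}, approximate SE = {se:.3f}")

# Hypothetical 10-unit change in age (10 years, if age is recorded in years).
or_age_10 = scaled_odds_ratio(1.03, 10)
print(f"OR for a 10-unit increase in age: {or_age_10:.2f}")
```

Because each odds ratio is expressed per unit of its own scale, the magnitudes are not directly comparable across predictors measured in different units.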
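... chronic obstructive pulmonary disease, mRS score, NIHSS score, GCS score and dysphagia, 11 indicators in total. That model achieved an AUC of 0.76, indicating good predictive performance. However, it did not include laboratory indicators, although previous studies have shown that high-sensitivity C-reactive protein, white blood cell count and similar indicators are positively correlated with the severity of stroke-associated pneumonia (SAP)[26-27]. It also included too many indicators, which adds to clinicians' workload in practice. The present study included a laboratory indicator, using white blood cell count as a predictor. The results showed that the effect of white blood cell count on the occurrence of SAP (OR 1.11, 95%CI 1.07-1.16) was greater than that of age (OR 1.03, 95%CI 1.02-1.04) and NIHSS score (OR 1.02, 95%CI 1.00-1.04). Moreover, with only 4 predictors, the Logistic regression (AUC=0.776) and LightGBM (AUC=0.767) models both outperformed the model in the above study, giving more accurate predictions.

This study has three strengths. First, few prediction models exist for intracerebral hemorrhage associated pneumonia; this study used machine learning to predict the risk of SAP in patients with intracerebral hemorrhage, and the approach can be reused in subsequent studies. Second, white blood cell count is easy to obtain clinically and is strongly associated with SAP, so the models achieved good predictive performance with only 4 predictors, which is convenient for clinicians in practice. Third, the study population was randomly divided into two parts and the models were internally validated, which supports the reliability of the results. The study also has a limitation: the models were not externally validated and still need to be validated in a large, multicenter external population to ensure their accuracy and reliability.

In summary, the machine learning-based risk prediction models for intracerebral hemorrhage associated pneumonia have high diagnostic value. Age, NIHSS score, white blood cell count and dysphagia are the candidate predictors, and the models can be incorporated into diagnostic decision-making for intracerebral hemorrhage associated pneumonia. The clinical value of these results remains to be validated in an external cohort with a larger sample.

References

[1] NAGHAVI M, ABAJOBIR A A, ABBAFATI C, et al. Global, regional, and national age-sex specific mortality for 264 causes of death, 1980-2016: a systematic analysis for the Global Burden of Disease Study 2016[J]. The Lancet, 2017, 390(10100): 1151-1210.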

[2] ZHOU M, WANG H, ZENG X, et al. Mortality, morbidity, and risk factors in China and its provinces, 1990-2017: a systematic analysis for the Global Burden of Disease Study 2017[J]. The Lancet, 2019, 394(10204): 1145-1158.

[3] VOS T, ABAJOBIR A A, ABATE K H, et al. Global, regional, and national incidence, prevalence, and years lived with disability for 328 diseases and injuries for 195 countries, 1990-2016: a systematic analysis for the Global Burden of Disease Study 2016[J]. The Lancet, 2017, 390(10100): 1211-1259.

[4] WESTENDORP W F, NEDERKOORN P J, VERMEIJ J D, et al. Post-stroke infection: a systematic review and meta-analysis[J]. BMC Neurol, 2011, 11: 110.

[5] KWAN J, HAND P. Infection after acute stroke is associated with poor short-term outcome[J]. Acta Neurol Scand, 2007, 115(5): 331-338.

[6] YANG Lan, ZHANG Xia, YANG Xia, et al. Analysis of risk factors for stroke-associated pneumonia in patients with acute intracerebral hemorrhage (急性脑出血患者卒中相关性肺炎发病的危险因素分析)[J]. Chinese Journal of Practical Nervous Diseases, 2016, 19(14): 97-98.

[7] INGEMAN A, ANDERSEN G, HUNDBORG H H, et al. In-hospital medical complications, length of stay, and mortality among stroke unit patients[J]. Stroke, 2011, 42(11): 3214-3218.

[8] KATZAN I L, DAWSON N V, THOMAS C L, et al. The cost of pneumonia after acute stroke[J]. Neurology, 2007, 68(22): 1938-1943.

[9] KAMMERSGAARD L P, JØRGENSEN H S, REITH J, et al. Early infection and prognosis after acute stroke: The Copenhagen Stroke Study[J]. J Stroke Cerebrovasc Dis, 2001, 10(5): 217-221.

[10] ASLANYAN S, WEIR C J, DIENER H C, et al. Pneumonia and urinary tract infection after acute ischaemic stroke: a tertiary analysis of the GAIN international trial[J]. Eur J Neurol, 2004, 11(1): 49-53.

[11] KWON H M, JEONG S W, LEE S H, et al. The pneumonia score: a simple grading scale for prediction of pneumonia after acute stroke[J]. Am J Infect Control, 2006, 34(2): 64-68.

[12] JI R, SHEN H, PAN Y, et al. Novel risk score to predict pneumonia after acute ischemic stroke[J]. Stroke, 2013, 44(5): 1303-1309.

[13] JI R, SHEN H, PAN Y, et al. Risk score to predict hospital-acquired pneumonia after spontaneous intracerebral hemorrhage[J]. Stroke, 2014, 45(9): 2620-2628.

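[14] CAO Wenzhe, YING Jun, ZHANG Yahui, et al. Research on a prostate cancer diagnosis model based on machine learning algorithms (基于机器学习算法的前列腺癌诊断模型研究)[J]. China Medical Devices, 2016, 31(4): 30-35.

[15] MIAO Fengshun, LI Yan, GAO Cen, et al. Diabetes prediction method based on the CatBoost algorithm (基于CatBoost算法的糖尿病预测方法)[J]. Computer Systems & Applications, 2019, 28(9): 215-218.

[16] Stroke--1989. Recommendations on stroke prevention, diagnosis, and therapy. Report of the WHO Task Force on Stroke and other Cerebrovascular Disorders[J]. Stroke, 1989, 20(10): 1407-1431.

[17] GARNER J S, JARVIS W R, EMORI T G, et al. CDC definitions for nosocomial infections, 1988[J]. Am J Infect Control, 1988, 16(3): 128-140.

[18] CHEN T, GUESTRIN C. XGBoost: a scalable tree boosting system[C/OL]//ACM SIGKDD international conference on knowledge discovery and data mining. ACM, 2016: 785-794[2020-01-10]. https://doi.org/10.1145/2939672.2939785.

[19] CAO Wenzhe, YING Jun, ZHANG Yahui, et al. Research on a prostate cancer diagnosis model based on machine learning algorithms (基于机器学习算法的前列腺癌诊断模型研究)[J]. China Medical Devices, 2016, 31(4): 30-35.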

[20] HEO J, YOON J G, PARK H, et al. Machine learning-based model for prediction of outcomes in acute stroke[J]. Stroke, 2019, 50(5): 1263-1265.

[21] NTAIOS G, FAOUZI M, FERRARI J, et al. An integer-based score to predict functional outcome in acute ischemic stroke: the ASTRAL score[J]. Neurology, 2012, 78(24): 1916-1922.

[22] KUMAR S, MARCHINA S, MASSARO J, et al. ACDD4 score: a simple tool for assessing risk of pneumonia after stroke[J]. J Neurol Sci, 2017, 372: 399-402.

[23] SMITH C J, BRAY B D, HOFFMAN A, et al. Can a novel clinical risk score improve pneumonia prediction in acute stroke care? A UK multicenter cohort study[J/OL]. J Am Heart Assoc, 2015, 4(1): e001307[2020-01-10]. https://doi.org/10.1161/JAHA.114.001307.

[24] HARMS H, GRITTNER U, DRÖGE H, et al. Predicting post-stroke pneumonia: the PANTHERIS score[J]. Acta Neurol Scand, 2013, 128(3): 178-184.

[25] HOFFMANN S, MALZAHN U, HARMS H, et al. Development of a clinical score (A2DS2) to predict pneumonia in acute ischemic stroke[J]. Stroke, 2012, 43(10): 2617-2623.

[26] ZHANG H, LI X. Correlation between inflammatory factors and post-stroke pneumonia in diabetic patients[J]. Exp Ther Med, 2013, 6(1): 105-108.

[27] YANG N Z, LI X, YUN X H, et al. Risk factors analysis of nosocomial pneumonia in elderly patients with acute cerebral infraction[J/OL]. Medicine, 2019, 98(13): e15045[2020-01-10]. https://doi.org/10.1097/MD.0000000000015045.