Objective To establish machine learning-based models to predict intracerebral hemorrhage-associated
pneumonia.
Methods Inpatients with cerebral hemorrhage within 7 days of onset in the China
National Stroke Registry Ⅱ (CNSR Ⅱ) study, conducted in 219 hospitals between May 2012 and January 2013,
were included in the analysis. The subjects were randomly divided into a training set (80%)
and a test set (20%). Multivariable logistic regression analysis was applied to screen the candidate
predictors in the training set. Four diagnostic prediction models were then constructed using
four machine learning methods (logistic regression, CatBoost, XGBoost and LightGBM), and the
predictive value of the four models was compared.
Results A total of 2303 patients (mean age 62.1±12.7 years, 62.1% male) were enrolled and
randomly divided into a training set (n=1841) and a test set (n=462). The incidence of intracerebral
hemorrhage-associated pneumonia in the two sets was 15.6% and 15.8%, respectively (χ²=0.007,
P=0.934). In multivariable logistic regression analysis, the candidate predictors were age (OR
1.03, 95%CI 1.02-1.04), NIHSS score (OR 1.02, 95%CI 1.00-1.04), white blood cell count (OR
1.11, 95%CI 1.07-1.16) and dysphagia (OR 6.85, 95%CI 5.01-9.39). The sensitivities of the logistic
regression, CatBoost, XGBoost and LightGBM models were 75.34%, 50.68%, 80.82% and 80.82%,
respectively; the specificities were 68.64%, 86.12%, 52.96% and 57.33%, respectively. The areas
under the ROC curve were 0.776, 0.692, 0.736 and 0.767, respectively. The logistic regression and
LightGBM models had significantly higher areas under the ROC curve than the CatBoost and XGBoost
models (DeLong test, P<0.05).
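As an illustration of how sensitivity and specificity figures of this kind follow from test-set classification counts, a minimal Python sketch is given here. The per-model counts are not reported in this abstract; they are back-calculated under the assumption that the test set contained 73 pneumonia cases and 389 non-cases (15.8% of 462), and all names in the code are hypothetical.

```python
# Sensitivity and specificity from confusion-matrix counts.
# ASSUMPTION: the counts below are back-calculated from the reported
# percentages, assuming 73 pneumonia cases and 389 non-cases in the
# test set (n=462). They are not reported in the source.

CASES = 73       # assumed number of patients with pneumonia in the test set
NON_CASES = 389  # assumed number of patients without pneumonia


def sensitivity(tp, fn):
    """Proportion of true cases that the model flags as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of non-cases that the model flags as negative."""
    return tn / (tn + fp)


# model -> (true positives, true negatives); hypothetical back-calculated counts
counts = {
    "Logistic regression": (55, 267),
    "CatBoost": (37, 335),
    "XGBoost": (59, 206),
    "LightGBM": (59, 223),
}


def summarize(model_counts):
    """Return {model: (sensitivity %, specificity %)} rounded to 2 decimals."""
    rows = {}
    for model, (tp, tn) in model_counts.items():
        sens = sensitivity(tp, CASES - tp)      # false negatives = cases - TP
        spec = specificity(tn, NON_CASES - tn)  # false positives = non-cases - TN
        rows[model] = (round(100 * sens, 2), round(100 * spec, 2))
    return rows


results = summarize(counts)
for model, (sens, spec) in results.items():
    print(f"{model}: sensitivity {sens:.2f}%, specificity {spec:.2f}%")
```

Under these assumed counts the sketch reproduces the percentages reported for the test set, which makes the trade-off visible: CatBoost misses about half of the cases but rarely raises a false alarm, whereas XGBoost and LightGBM detect most cases at the cost of many false positives.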
Conclusions The machine learning-based predictive models for intracerebral hemorrhage-associated
pneumonia have high diagnostic value and can be applied to diagnostic decision-making
for intracerebral hemorrhage-associated pneumonia. Age, NIHSS score, white blood cell
count and dysphagia were the candidate predictors used to construct the predictive models. The clinical
value of the results is yet to be validated in an external cohort with a larger sample size.