Chinese Journal of Stroke, 2018, Vol. 13, Issue (10): 1019-1024. DOI: 10.3969/j.issn.1673-5765.2018.10.004


Analysis of Related Factors of Leukoaraiosis Based on Data Mining

  

  • Received: 2017-08-01; Online: 2018-10-20; Published: 2018-10-20


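娜迪热·艾孜热提艾力, 封红亮, 张帅美, 王美瑶, 刘煜敏

  1. Department of Neurology, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
  • Corresponding author: 刘煜敏 lym9381@126.com
  • Funding:

    National Natural Science Foundation of China (81371273)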

Abstract:

Objective To analyze factors related to leukoaraiosis using data-mining techniques.

Methods The clinical data of 1197 inpatients at Zhongnan Hospital who met the inclusion criteria between April 2015 and February 2017 were included in the study. Four data-mining prediction models and a feature-selection method based on the chi-square test were used to analyze factors related to leukoaraiosis.

Results Of the 4 data-mining models, the logistic regression model had the best predictive performance (9 features; area under the receiver operating characteristic curve [AUC]=0.825±0.012). The feature-selection method was used to select the top 9 features as factors related to leukoaraiosis. A decision tree model (4 features; AUC=0.788±0.017) was used to produce a visualization of the decision tree.

Conclusions According to the data-mining methods used in this study, the factors positively associated with leukoaraiosis were age, history of hypertension, intracranial arterial stenosis, anemia, type 2 diabetes mellitus, creatinine, and red blood cell distribution width; the factors negatively associated with it were red blood cell count and hemoglobin concentration.
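The abstract does not say how the chi-square-based feature selection was configured. As a minimal sketch, assuming every candidate factor has already been coded as 0/1 (continuous measures such as age or creatinine would first have to be discretized, which is itself an assumption), each factor can be scored by the Pearson chi-square statistic of its 2x2 contingency table against the 0/1 leukoaraiosis label, and the k highest-scoring factors kept. The function names and toy data below are hypothetical; this is not the authors' code or data.

```python
# Hypothetical sketch, not the authors' implementation: chi-square-based
# ranking of binary (0/1) candidate factors against a binary outcome.


def chi_square_2x2(feature, outcome):
    """Pearson chi-square statistic (no continuity correction) for two 0/1 sequences."""
    n = len(feature)
    # Observed counts: table[f][y] = number of cases with feature value f and outcome y.
    table = [[0, 0], [0, 0]]
    for f, y in zip(feature, outcome):
        table[f][y] += 1
    row_totals = [table[0][0] + table[0][1], table[1][0] + table[1][1]]
    col_totals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty margins, e.g. a constant feature
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def select_top_k(features, outcome, k):
    """Score every feature, rank by descending chi-square, and keep the top k names."""
    scores = {name: chi_square_2x2(values, outcome) for name, values in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = leukoaraiosis present / factor present.
    outcome = [1, 1, 1, 1, 0, 0, 0, 0]
    features = {
        "hypertension": [1, 1, 1, 0, 0, 0, 0, 1],
        "unrelated_flag": [1, 0, 1, 0, 1, 0, 1, 0],
    }
    top, scores = select_top_k(features, outcome, k=1)
    print(top, scores)
```

On the toy data, `hypertension` scores 2.0 and `unrelated_flag` scores 0.0, so only `hypertension` is kept. Note that the chi-square statistic measures the strength of an association but not its direction, so the positive and negative directions reported in the Conclusions must come from another source, such as the sign of a logistic regression coefficient (an assumption about how they were derived). Off-the-shelf scorers such as scikit-learn's `chi2` compute the statistic from per-class feature totals and may not give the same values as this full 2x2 test.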

Key words: Leukoaraiosis; Relevant factors; Data mining; Feature selection

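Chinese abstract (English translation):

Objective To analyze factors related to leukoaraiosis using data-mining techniques.

Methods The clinical data of 1197 inpatients who met the inclusion criteria in the Department of Neurology, Zhongnan Hospital, between April 2015 and February 2017 were collected retrospectively. Four data-mining prediction models and a feature-selection method based on the chi-square test were used to analyze factors related to leukoaraiosis.

Results Of the 4 data-mining models, the logistic regression model had the best predictive performance [9 features; area under the receiver operating characteristic curve (AUC)=0.825±0.012]. The feature-selection method selected 9 factors related to leukoaraiosis. A decision tree model (4 features; AUC=0.788±0.017) was used to produce a visualization of the decision tree.

Conclusions Among the factors selected by the data-mining methods, those positively correlated with leukoaraiosis were age, history of hypertension, intracranial arterial stenosis, anemia, type 2 diabetes mellitus, creatinine, and red blood cell distribution width; those negatively correlated were red blood cell count and hemoglobin concentration.

Key words: leukoaraiosis; related factors; data mining; feature selection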
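The AUC values in the Results are reported as mean ± a spread, but the abstract does not say how that spread was obtained. A minimal sketch, assuming it is the mean and sample standard deviation of per-fold AUCs from cross-validation (an assumption), computes AUC from its rank interpretation: the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The function names and toy values are hypothetical.

```python
from statistics import mean, stdev


def auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5.

    This pairwise form equals the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_aucs):
    """Hypothetical 'mean ± SD' summary over cross-validation folds (sample SD)."""
    return mean(fold_aucs), stdev(fold_aucs)


if __name__ == "__main__":
    # Toy predicted probabilities and true labels (hypothetical, not study data).
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.2]
    print(auc(scores, labels))  # 3 of 4 positive/negative pairs ranked correctly
```

The pairwise loop costs O(n_pos × n_neg) comparisons, which is still small for a cohort of about 1,200 patients; a rank-based formula gives the same value in O(n log n).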