Chinese Journal of Schistosomiasis Control ›› 2023, Vol. 35 ›› Issue (3): 225-235, 243.

• Original Article •

Risk predictive models of healthcare-seeking delay among imported malaria patients in Jiangsu Province based on the machine learning

ZHANG Yuying1, CAO Yuanyuan2, YANG Kai3, WANG Weiming2, YANG Mengmeng2, CHAI Liying1, GU Jiyue1, LI Mengyue4, LU Yan5, ZHOU Huayun2, ZHU Guoding2, CAO Jun2, LU Guangyu1,6*

  1 School of Public Health, Yangzhou University, Yangzhou, Jiangsu 225007, China; 2 National Health Commission Key Laboratory for Parasitic Disease Prevention and Control, Jiangsu Provincial Key Laboratory on Parasite and Vector Control Technology, Jiangsu Institute of Parasitic Diseases, China; 3 School of Artificial Intelligence, Yangzhou University, China; 4 School of Nursing, Yangzhou University, China; 5 Health and Quarantine Office, Nanjing Customs, China; 6 Jiangsu Key Laboratory of Zoonoses, Yangzhou University, Yangzhou, Jiangsu 225007, China
  • Online: 2023-06-25 Published: 2023-07-05
  • About the author: ZHANG Yuying (章钰莹), female, master's student. Research interests: epidemiology of infectious diseases and health management
  • Funding:
    National Natural Science Foundation of China (71904165); Open Project of the National Health Commission Key Laboratory for Parasitic Disease Prevention and Control and the Jiangsu Provincial Key Laboratory on Parasite and Vector Control Technology (wk023-007); Jiangsu Postdoctoral Research Fund (2020Z003); Jiangsu Key Laboratory of Zoonoses funded project (R2208); International Cooperation Joint Laboratory of Jiangsu Universities for Prevention and Control Technology of Important Animal Diseases and Zoonoses (01)

Abstract: Objective To build risk predictive models of healthcare-seeking delay among imported malaria patients in Jiangsu Province using machine learning algorithms, so as to provide a basis for early identification of imported malaria cases in the province. Methods Case investigation records, first symptoms and time of initial diagnosis of imported malaria patients reported in Jiangsu Province in 2019 were retrieved from the Infectious Disease Report Information Management System and the Parasitic Disease Prevention and Control Information Management System of the Chinese Center for Disease Control and Prevention. Thirteen factors served as independent variables: occupation, species of malaria parasite, main clinical manifestations, presence of complications, severity of disease, age, duration of residence abroad, number of malaria infections abroad, incubation period, level of the institution at initial diagnosis, country of origin, travel companions and way of going abroad. Healthcare-seeking delay (≤ 24 h vs. > 24 h) was the dependent variable. Risk predictive models were built with a back propagation (BP) neural network, logistic regression, random forest and a Bayesian algorithm. The logistic regression model was visualized as a nomogram, and the nomogram was evaluated with calibration curves. The predictive performance of the four models was compared using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. SHAP was used to quantify and attribute the importance of each feature and to examine the positive or negative effect of each feature's value on the prediction.
Results A total of 244 imported malaria patients were enrolled, of whom 100 (40.98%) had an interval of more than 24 hours between onset of first symptoms and initial diagnosis. Logistic regression analysis identified a history of malaria infection [odds ratio (OR) = 3.075, 95% confidence interval (CI): (1.597, 5.923)], a long incubation period [OR = 1.010, 95% CI: (1.001, 1.018)] and seeking healthcare at provincial or municipal medical institutions [OR = 12.550, 95% CI: (1.158, 135.963)] as risk factors for healthcare-seeking delay among imported malaria cases. In the BP neural network model, the factors with the greatest impact on healthcare-seeking delay were duration of residence abroad, incubation period and age. In the random forest model, the top five factors were, in order, main clinical manifestations, way of going abroad, incubation period, duration of residence abroad and age. In the Bayesian model, the top five factors were, in order, level of the institution at initial diagnosis, age, country of origin, history of malaria infection and travel companions. Comparison of AUCs showed higher overall performance for the BP neural network model and the logistic regression model (Z = 2.700 to 4.641, all P values < 0.01), with no statistically significant difference in AUC between these two models (Z = 1.209, P > 0.05). The sensitivity (71.00%) and Youden index (43.92%) of the logistic regression model were higher than those of the BP neural network model (63.00% and 36.61%, respectively), whereas the specificity of the BP neural network model (73.61%) was higher than that of the logistic regression model (72.92%).
Conclusions Imported malaria cases in Jiangsu Province with a long duration of residence abroad, a history of malaria infection, a long incubation period, older age or an initial visit to a provincial or municipal medical institution have a higher probability of healthcare-seeking delay. The risk predictive models built with logistic regression and the BP neural network show relatively good performance in predicting healthcare-seeking delay among imported malaria patients in Jiangsu Province, and may provide a reference for health management of imported malaria patients.
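
The figures reported in the abstract can be cross-checked with a little arithmetic. The sketch below recomputes the Youden index (sensitivity + specificity - 1) for the two better-performing models, the share of delayed cases, and the odds ratios implied by the reported confidence bounds. The helper names are illustrative only. The odds-ratio check assumes Wald-type intervals that are symmetric on the log scale, which the abstract does not state, so agreement is expected only up to rounding.

```python
import math


def youden_index(sensitivity_pct, specificity_pct):
    """Youden index in percent: sensitivity + specificity - 100."""
    return round(sensitivity_pct + specificity_pct - 100.0, 2)


def or_from_ci(lower, upper):
    """Odds ratio implied by a CI that is symmetric on the log scale.

    Assumption (not stated in the abstract): a Wald interval, so the
    point estimate is the geometric mean of the two bounds.
    """
    return math.sqrt(lower * upper)


# Values as reported in the abstract.
logistic_j = youden_index(71.00, 72.92)
bp_j = youden_index(63.00, 73.61)
delayed_share_pct = round(100 / 244 * 100, 2)

# (reported OR, lower bound, upper bound) for each risk factor.
reported = {
    "history of malaria infection": (3.075, 1.597, 5.923),
    "incubation period (per unit)": (1.010, 1.001, 1.018),
    "provincial or municipal institution": (12.550, 1.158, 135.963),
}

print(f"Youden index, logistic regression: {logistic_j}%")
print(f"Youden index, BP neural network:   {bp_j}%")
print(f"Share of delayed cases:            {delayed_share_pct}%")
for factor, (odds_ratio, lower, upper) in reported.items():
    implied = or_from_ci(lower, upper)
    print(f"{factor}: reported OR {odds_ratio}, implied by CI {implied:.3f}")
```

The very wide interval for institution level (1.158 to 135.963) shows that this estimate is imprecise even though its lower bound exceeds 1.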

Key words: Imported malaria, Healthcare-seeking delay, Machine learning, BP neural network model, Logistic regression model, Risk predictive model, Jiangsu Province
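
The abstract compares the four models by AUC. As a minimal sketch of what that quantity measures, the code below computes AUC as the probability that a randomly chosen delayed case receives a higher predicted risk than a randomly chosen non-delayed case, with ties counted as one half. The labels and scores are hypothetical toy values, not study data, and `auc_from_scores` is an illustrative helper rather than the authors' implementation. The significance test behind the reported Z values is not named in the abstract and is not reproduced here.

```python
def auc_from_scores(labels, scores):
    """Rank-based AUC: P(score of a positive > score of a negative).

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = delay > 24 h, 0 = delay <= 24 h.
labels = [1, 1, 1, 0, 0, 0, 0]
model_a_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
model_b_scores = [0.8, 0.5, 0.5, 0.5, 0.6, 0.2, 0.1]

auc_a = auc_from_scores(labels, model_a_scores)
auc_b = auc_from_scores(labels, model_b_scores)
print(f"AUC, hypothetical model A: {auc_a:.3f}")
print(f"AUC, hypothetical model B: {auc_b:.3f}")
```

An AUC of 0.5 corresponds to ranking cases no better than chance, and 1.0 to ranking every delayed case above every non-delayed case.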

CLC number: