Chinese Journal of Schistosomiasis Control, 2022, Vol. 34, Issue 3: 241-


Prediction of trends for fine-scale spread of Oncomelania hupensis in Shanghai Municipality based on supervised machine learning models

GONG Yan-feng1, LUO Zhuo-wei1, FENG Jia-xin1, XUE Jing-bo1, GUO Zhao-yu1, JIN Yan-jun2, YU Qing2, XIA Shang1,3, LÜ Shan1,3, XU Jing1, LI Shi-zhu1,3*

1 National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory of Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Shanghai 200025, China; 2 Shanghai Municipal Center for Disease Control and Prevention, China; 3 School of Global Health, Chinese Center for Tropical Diseases Research, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
Online: 2022-07-06; Published: 2022-07-06

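Author profile: GONG Yan-feng, male, master's degree candidate; research interests: infectious disease risk assessment, surveillance and early warning.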
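Funding: Shanghai Three-Year Action Plan for Public Health (GWV-10.1-XK13); National Science and Technology Major Project on Infectious Diseases (2016ZX10004222-004); Scientific Research Project of the Shanghai Municipal Health Commission (2019Y0359)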

Abstract: Objective To predict fine-scale trends in the spread of Oncomelania hupensis in Shanghai Municipality using supervised machine learning models, so as to inform precision O. hupensis snail control. Methods Based on the 2016 O. hupensis snail survey data from Shanghai Municipality and on climatic, geographical, vegetation and socioeconomic data relating to snail distribution, seven supervised machine learning models (decision tree, random forest, generalized boosted model, support vector machine, naive Bayes, k-nearest neighbor and C5.0) were created to predict the risk of snail spread in Shanghai. The predictive performance of the seven models was evaluated using the area under the receiver operating characteristic curve (AUC), F1-score and accuracy (ACC), and the optimal models were selected to identify the environmental variables affecting snail spread and to predict the areas at risk of snail spread in Shanghai Municipality. Results Seven supervised machine learning models were successfully created to predict the risk of snail spread in Shanghai Municipality, of which the random forest (AUC = 0.901, F1-score = 0.840, ACC = 0.797) and the generalized boosted model (AUC = 0.889, F1-score = 0.869, ACC = 0.834) showed higher predictive performance than the other models. The random forest model showed that the climatic variables contributing most to snail spread in Shanghai were aridity (11.87%), ≥ 0 ℃ annual accumulated temperature (10.19%), moisture index (10.18%) and average annual precipitation (9.86%), and the main vegetation variables were the first-quarter vegetation index (8.30%) and the second-quarter vegetation index (7.69%).
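The Methods rank the seven models by AUC, F1-score and accuracy. A minimal, dependency-free Python sketch of those three metrics for a binary spread/no-spread outcome is shown below. It assumes that label 1 marks a location where snails spread, that each model outputs a score such as a predicted probability, and that a 0.5 cut-off turns scores into labels; the cut-off and the toy data are hypothetical and are not taken from the study. AUC is computed through its rank interpretation (the probability that a randomly chosen positive location scores higher than a randomly chosen negative one, ties counted as one half), which equals the area under the ROC curve.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of locations whose predicted label matches the observed label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (1 = spread)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 or undefined, F1 taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via its rank interpretation:
    P(score of a random positive > score of a random negative), ties counted as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative location")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = snail spread observed, 0 = no spread.
    y_true = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1]          # predicted probability of spread
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 cut-off
    print(f"accuracy={accuracy(y_true, y_pred):.3f} "
          f"F1={f1_score(y_true, y_pred):.3f} "
          f"AUC={roc_auc(y_true, scores):.3f}")
    # prints: accuracy=0.667 F1=0.667 AUC=0.889
```

AUC uses the continuous scores, whereas F1-score and accuracy depend on the chosen cut-off. That is one possible reason a model can rank higher on AUC but lower on F1-score and accuracy, as the random forest does relative to the generalized boosted model in the Results.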
Snail spread was more likely where aridity was < 0.87, the ≥ 0 ℃ annual accumulated temperature was 5 550 to 5 675 ℃, the moisture index was > 39% and the average annual precipitation was > 1 180 mm, and where the first-quarter vegetation index was > 0.4 and the second-quarter vegetation index was > 0.6. Combined with water conservancy zoning and township administrative maps, the areas at risk of snail spread were mainly predicted in 10 townships/subdistricts across three water conservancy zones: Punan West, Punan East and Tainan. Conclusions Supervised machine learning models are effective for predicting the risk of fine-scale O. hupensis snail spread and for identifying the environmental determinants of snail spread. The areas at risk of O. hupensis snail spread are mainly located in southwestern Songjiang District, northwestern Jinshan District and southeastern Qingpu District of Shanghai Municipality.
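The ranges above are reported one variable at a time, so they describe conditions associated with snail spread rather than a decision rule; the risk areas themselves came from the fitted models. Purely to make the reported ranges concrete, the Python sketch below checks a single location's covariates against the six ranges and lists the ones it falls inside. The class and field names, the inclusive bounds on the accumulated-temperature range and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LocationCovariates:
    """Hypothetical record of one location's covariates, in the units used in the abstract."""
    aridity: float             # aridity index (dimensionless)
    accum_temp_c: float        # >= 0 degC annual accumulated temperature, degC
    moisture_index_pct: float  # moisture index, %
    annual_precip_mm: float    # average annual precipitation, mm
    vi_q1: float               # first-quarter vegetation index
    vi_q2: float               # second-quarter vegetation index


# One predicate per range reported as favoring snail spread
# (inclusive temperature bounds are an assumption).
FAVORABLE_RANGES: Dict[str, Callable[[LocationCovariates], bool]] = {
    "aridity < 0.87": lambda c: c.aridity < 0.87,
    "accumulated temperature 5550-5675 degC": lambda c: 5550 <= c.accum_temp_c <= 5675,
    "moisture index > 39%": lambda c: c.moisture_index_pct > 39,
    "annual precipitation > 1180 mm": lambda c: c.annual_precip_mm > 1180,
    "first-quarter vegetation index > 0.4": lambda c: c.vi_q1 > 0.4,
    "second-quarter vegetation index > 0.6": lambda c: c.vi_q2 > 0.6,
}


def ranges_met(loc: LocationCovariates) -> List[str]:
    """List the reported favorable ranges a location falls inside, in FAVORABLE_RANGES order."""
    return [label for label, check in FAVORABLE_RANGES.items() if check(loc)]


if __name__ == "__main__":
    # Hypothetical covariate values, not data from the study.
    loc = LocationCovariates(aridity=0.85, accum_temp_c=5600, moisture_index_pct=42,
                             annual_precip_mm=1150, vi_q1=0.45, vi_q2=0.55)
    for label in ranges_met(loc):
        print(label)
```

Counting matched ranges ignores interactions between variables that a random forest can capture, so a sketch like this is only a reading aid for the reported ranges, not a substitute for the models' predicted risk.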

Key words: Oncomelania hupensis, Machine learning model, Spread, Prediction, Shanghai Municipality


