Chinese Journal of Schistosomiasis Control ›› 2026, Vol. 38 ›› Issue (1): 14-19, 53.


Factors affecting and identification of key environmental determinants of the Oncomelania hupensis snail density in the Yangtze River Delta based on machine learning models

LI Yinlong1, LI Qin1, GUO Suying1, LI Shizhen1, ZHANG Lijuan1, CAO Chunli1, XU Jing1, 2*   

  1. 1 National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory on Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Ministry of Science and Technology, Shanghai 200025, China; 2 School of Global Health, Shanghai Jiao Tong University School of Medicine and Chinese Center for Tropical Diseases Research, Shanghai 200025, China
  • Online: 2026-02-25 Published: 2026-04-10

基于机器学习的长江三角洲地区钉螺密度影响因素分析及关键环境因子识别

李银龙1,李琴1,郭苏影1,李仕祯1,张利娟1,曹淳力1,许静1, 2*   

  1. 1 中国疾病预防控制中心寄生虫病预防控制所(国家热带病研究中心)、国家卫生健康委员会寄生虫病原与媒介生物学重点实验室、WHO热带病合作中心、科技部国家级热带病国际联合研究中心(上海 200025);2上海交通大学医学院⁃国家热带病研究中心全球健康学院(上海 200025)
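  • Corresponding author: XU Jing, xujing@nipd.chinacdc.cn
  • About the author: LI Yinlong, male, master's degree, associate researcher. Research interest: schistosomiasis control
  • Supported by:
    Special Clinical Research Project in the Health Sector of Shanghai Municipal Health Commission (20214Y0212)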

Abstract: Objective To analyze factors affecting the Oncomelania hupensis snail density in the Yangtze River Delta region and identify key environmental factors using machine learning methods, so as to provide a reference for precision snail control. Methods Administrative village-level O. hupensis snail survey data in the Yangtze River Delta (Shanghai Municipality, Jiangsu Province, Zhejiang Province and Anhui Province) from 2011 to 2021 were retrieved from the Information Management System for Parasitic Disease Control of the Chinese Center for Disease Control and Prevention. Environmental factor data for the study region were retrieved from the Google Earth Engine platform, including elevation, slope, terrain, normalized difference vegetation index (NDVI), vegetation type, soil type, total petroleum hydrocarbon (TPH), ammonium nitrogen, inorganic nitrogen, dissolved oxygen, pH of water, chemical oxygen demand (COD) and inorganic phosphorus, and climatic factor data were retrieved from the Copernicus Climate Data Store, including annual precipitation, aridity index and annual mean temperature (AMT). The snail survey data were randomly divided into a training set (70%) and a test set (30%), and five machine learning models, including random forest (RF), eXtreme gradient boosting (XGBoost), support vector machine (SVM), gradient boosting machine (GBM) and neural network (NN), were built and compared for modeling the O. hupensis snail density using the software R 4.3.0. The XGBoost model was used to construct a predictive model for the O. hupensis snail density and to quantify the impact of each environmental factor on O. hupensis snail distribution. SHapley Additive exPlanations (SHAPs) values were calculated to estimate the average contribution of each variable to the model prediction, and the core environmental factors affecting the O. hupensis snail population density were screened.
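
The two generic steps named in the Methods, a random 70%/30% split and the attribution of a prediction to individual variables by Shapley values, can be illustrated with a minimal Python sketch. It is not the authors' R 4.3.0 workflow: the function names, the toy model and all numbers are hypothetical, and features outside a coalition are simply replaced by a fixed baseline value, which is a simplification of what SHAP software does.

```python
import random
from itertools import combinations
from math import factorial


def split_70_30(records, seed=0):
    """Randomly split records into a 70% training set and a 30% test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x.

    Features not in a coalition take their baseline value (an assumption
    made here for brevity, not a description of any SHAP library).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical stand-in for a fitted model with three inputs."""
    a, b, c = z
    return 0.5 * a + 2.0 * b + 0.1 * c + a * b


train, test = split_70_30(range(100))
x = [1.2, 0.8, 0.3]
baseline = [1.0, 0.5, 0.2]
phi = shapley_values(toy_model, x, baseline)
```

The attributions in `phi` add up to the difference between the prediction at `x` and the prediction at `baseline`, which is the property that makes per-variable contributions comparable and rankable.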
Results Among the five machine learning models, the XGBoost model exhibited the best overall performance, with a coefficient of determination (R²) of 0.855, mean squared error (MSE) of 0.188, root mean squared error (RMSE) of 0.434 and mean absolute error (MAE) of 0.155. Analysis of factors affecting the O. hupensis snail density with the XGBoost model showed that among the 16 environmental factors, the four factors with the highest SHAPs values were annual precipitation, elevation, aridity index and NDVI, with a cumulative SHAPs contribution of 75%, which was higher than that of the other environmental factors. When NDVI was higher than 0.6, the O. hupensis snail density increased with NDVI and peaked at an NDVI of 0.8 (1.60 snails/0.1 m²). The O. hupensis snail density increased with elevation at elevations of 14 to 40 m. The density rose slowly at an annual precipitation of 900 to 1 300 mm, and then increased rapidly to the peak (1.52 snails/0.1 m²) at an annual precipitation of 1 300 to 1 500 mm. In addition, the O. hupensis snail density increased rapidly to the maximum (1.60 snails/0.1 m²) at an aridity index of 0.8 to 1.1, and decreased gradually when the aridity index exceeded 1.1. Conclusions The XGBoost model performs well in predicting the O. hupensis snail density and identifying key environmental factors in the Yangtze River Delta region. Annual precipitation, elevation, aridity index and NDVI are key environmental factors affecting the distribution and density of O. hupensis snails in the Yangtze River Delta region.
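
For readers less familiar with the four evaluation metrics reported in the Results, the following minimal Python sketch shows how they are conventionally defined and how a cumulative contribution for the top-ranked variables can be derived from mean absolute attribution values. The function names and all sample numbers are hypothetical and are not study data; the only figures taken from the abstract are the reported MSE (0.188) and RMSE (0.434), used to check that RMSE is the square root of MSE.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return R2, MSE, RMSE and MAE for paired observations and predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)   # total sum of squares
    mse = ss_res / n
    return {
        "r2": 1 - ss_res / ss_tot,
        "mse": mse,
        "rmse": sqrt(mse),
        "mae": sum(abs(r) for r in residuals) / n,
    }


def cumulative_share(mean_abs_attribution, top_k):
    """Names of the top_k variables and their share of total attribution."""
    ranked = sorted(mean_abs_attribution.items(),
                    key=lambda item: item[1], reverse=True)
    total = sum(v for _, v in ranked)
    top = ranked[:top_k]
    return [name for name, _ in top], sum(v for _, v in top) / total


# Made-up observations and predictions, for illustration only
metrics = regression_metrics([0.0, 1.0, 2.0, 3.0], [0.5, 1.0, 1.5, 3.0])

# Made-up mean absolute attribution values for four unnamed variables
names, share = cumulative_share(
    {"feature_a": 4.0, "feature_b": 3.0, "feature_c": 2.0, "feature_d": 1.0},
    top_k=2,
)

# Consistency check on the reported values: RMSE should equal sqrt(MSE)
reported_gap = abs(sqrt(0.188) - 0.434)
```

Under these definitions the reported MSE and RMSE agree to three decimal places, since `reported_gap` is below 0.001.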

Key words: Oncomelania snail, Density, Influencing factor, Machine learning models, Yangtze River Delta, XGBoost model, Environmental factor

摘要: 目的 采用机器学习方法分析长江三角洲地区钉螺密度的影响因素,并识别关键环境因子,为钉螺精准控制提供参考。方法 在中国疾病预防控制中心寄生虫病防治信息管理系统中,获取2011—2021年长江三角洲(上海市、江苏省、浙江省和安徽省)以行政村为单位的钉螺调查数据。于谷歌地球引擎网站获取研究区域海拔、坡度、地形、归一化植被指数(normalized difference vegetation index,NDVI)、植被类型、土壤类型,总石油烃(total petroleum hydrocarbon,TPH)、铵态氮、无机氮、溶解氧含量,水体pH值、化学需氧量(chemical oxygen demand,COD)、无机磷含量等环境因子数据;于哥白尼气候数据存储库获取研究区域年降水量、干旱指数和年均温度(annual mean temperature,AMT)等气候因子数据。将2011—2021年长江三角洲地区钉螺调查数据随机分为训练集(占70%)与测试集(占30%),基于R 4.3.0软件,选取随机森林(random forest,RF)、极端梯度提升(eXtreme gradient boosting,XGBoost)、支持向量机(support vector machine,SVM)、梯度提升机(gradient boosting machine,GBM)和神经网络(neural network,NN)模型进行钉螺密度模型构建与对比分析。采用XGBoost模型构建钉螺密度预测模型,量化各环境因子对钉螺分布的影响程度。计算沙普利加性解释(Shapley additive explanations,SHAPs)值,估计各变量对模型预测结果的平均贡献度,筛选影响钉螺种群密度的核心环境因子。结果 5种机器学习模型中,XGBoost模型决定系数、均方误差、均方根误差和平均绝对误差分别为0.855、0.188、0.434和0.155,综合评价结果最优。基于XGBoost模型分析钉螺密度影响因素,16种环境因子中,SHAPs值排序居前4位的为年降水量、海拔、干旱指数和NDVI,累计SHAPs值贡献度为75%,高于其他环境因子。当NDVI > 0.6时,钉螺密度随NDVI值升高而增加,并于NDVI为0.8时达峰值(1.60只/0.1 m2)。当海拔处于14 ~ 40 m时,钉螺密度随海拔升高而增加。当年降水量为900 ~ 1 300 mm时,钉螺密度缓慢上升;年降水量为1 300 ~ 1 500 mm时,密度迅速增高至峰值(1.52只/0.1 m2)。当干旱指数在0.8 ~ 1.1时,钉螺密度迅速增高至峰值(1.60只/0.1 m2);当干旱指数> 1.1时,钉螺密度逐渐降低。结论       XGBoost模型在长江三角洲地区钉螺密度预测与关键环境因子识别中应用效果较优。年降水量、海拔、干旱指数和NDVI是影响该地区钉螺分布与密度的关键环境因子。

关键词: 钉螺, 密度, 影响因素, 环境因子, 机器学习模型, XGBoost模型, 长江三角洲地区
