Chinese Journal of Schistosomiasis Control, 2024, Vol. 36, Issue 6: 572-576.

• Original Article •

Prediction of areas of Oncomelania hupensis snail spread in Anhui Province based on five machine learning models

GAO Fenghua*

  1. Anhui Provincial Center for Disease Control and Prevention, Hefei, Anhui 230601, China
  • Online: 2024-12-25 Published: 2024-12-31
  • Corresponding author: GAO Fenghua, ahxfbb@126.com
  • About the author: GAO Fenghua, male, bachelor's degree, chief physician. Research interest: schistosomiasis prevention and control

Abstract: Objective To model the area of Oncomelania hupensis snail spread in Anhui Province from 1977 to 2023 using machine learning models, and to compare how well different models predict the area of snail spread, so as to provide a reference for exploring trends in the area of O. hupensis snail spread. Methods Data on O. hupensis snail spread in Anhui Province from 1977 to 2023 were collected to build a database. Five machine learning models were built in Matlab R2019b: support vector regression (SVR), nonlinear autoregressive (NAR) neural network, back propagation (BP) neural network, gated recurrent unit (GRU) neural network and long short-term memory (LSTM) neural network models. Model fit was evaluated with the mean absolute error (MAE), root mean squared error (RMSE) and coefficient of determination (R²). After training, the models were used to predict the area of O. hupensis snail spread in Anhui Province from 2024 to 2030. Results The cumulative area of O. hupensis snail spread in Anhui Province from 1977 to 2023 was 40 241.32 hm²; the area varied greatly from year to year, with a local peak every 4 to 6 years. The fitted curves of the SVR, NAR neural network, BP neural network, GRU neural network and LSTM neural network models came, in that order, progressively closer to the curve of observed values. For 2024 to 2030, the SVR and NAR neural network models predicted approximately "M"-shaped curves, the BP and GRU neural network models predicted approximately "W"-shaped curves, and the LSTM neural network model predicted a unimodal conical curve. The LSTM neural network model had the best fit among the five models, with an RMSE of 1 277 480, an MAE of 797 422 and an R² of 0.978 9. Conclusions Among the five machine learning models, the LSTM neural network model performed relatively well in predicting trends in the area of O. hupensis snail spread in Anhui Province and may serve as one of the tools for studying such trends.
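The three fit metrics named in the Methods can be illustrated with a minimal Python sketch. It uses the standard textbook definitions; the abstract does not give the exact formulas or the units used in the study, so the definition of R² below (one minus the ratio of residual to total sum of squares) is an assumption, and the input values are invented purely for illustration.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |observed - fitted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Invented illustrative values, NOT data from the study.
observed = [3.0, 5.0, 2.0, 7.0]
fitted = [2.0, 5.0, 4.0, 6.0]

print("MAE :", mae(observed, fitted))
print("RMSE:", rmse(observed, fitted))
print("R2  :", r_squared(observed, fitted))
```

MAE and RMSE are expressed in the units of the modelled quantity, so they are only comparable across models fitted to the same series; R² is unitless.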

Key words: Oncomelania hupensis, Machine learning, Predictive efficiency, Support vector regression model, Nonlinear autoregressive neural network, Back propagation neural network, Gated recurrent unit neural network, Long short-term memory neural network, Anhui Province
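The abstract says the trained models were used to predict 2024 to 2030 but does not state the input window length or how multi-year forecasts were produced. One common way to frame an annual series for models such as NAR or LSTM networks is a sliding window of lagged values, with multi-step forecasts generated recursively. The Python sketch below shows that framing only; the function names, the lag of 2, the toy series and the `window_mean` stand-in predictor are all hypothetical and are not taken from the study.

```python
from typing import Callable, List, Sequence, Tuple

def make_windows(series: Sequence[float], lag: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (lagged input window, next value) training pairs."""
    inputs = [list(series[i:i + lag]) for i in range(len(series) - lag)]
    targets = [series[i + lag] for i in range(len(series) - lag)]
    return inputs, targets

def recursive_forecast(
    history: Sequence[float],
    lag: int,
    steps: int,
    predict_next: Callable[[List[float]], float],
) -> List[float]:
    """Forecast `steps` values ahead, feeding each prediction back into the window."""
    window = list(history[-lag:])
    forecasts = []
    for _ in range(steps):
        next_value = predict_next(window)
        forecasts.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward by one step
    return forecasts

def window_mean(window: List[float]) -> float:
    """Hypothetical stand-in for a trained model: predicts the mean of the window."""
    return sum(window) / len(window)

# Invented toy series, NOT data from the study.
series = [4.0, 8.0, 6.0, 2.0, 6.0]
X, y = make_windows(series, lag=2)
future = recursive_forecast(series, lag=2, steps=3, predict_next=window_mean)
print(X, y, future)
```

In a recursive scheme each forecast depends on earlier forecasts, so errors can accumulate over the horizon; this is a general property of the technique, not a finding reported in the abstract.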
