Image Generation Method of Rice Disease Based on ViT-WGAN-GP

Journal of Jilin University (Information Science Edition), 2025, Vol. 43, Issue (4): 747-754.

LU Yang¹, XU Siyuan¹, TAO Xianpeng², LIU Qiwang¹, GUAN Chuang³

1. School of Information and Electrical Engineering, Heilongjiang Bayi Agricultural Reclamation University, Daqing 163319, China; 2. Intelligent Vehicle Control Department, Shanghai Shangtai Automotive Information System Limited, Shanghai 200020, China; 3. Heilongjiang Key Laboratory of Networking and Intelligent Control, Northeast Petroleum University, Daqing 163318, China

Received: 2024-06-06 Online: 2025-08-15 Published: 2025-08-14
About the author: LU Yang (1976- ), male, born in Shuangcheng, Heilongjiang; professor and doctoral supervisor at Heilongjiang Bayi Agricultural Reclamation University; main research interests are pattern recognition and machine learning. (Tel) 86-13845989360, (E-mail) luyanga@sina.com.
Funding:
National Natural Science Foundation of China (62476081); Joint Guidance Project of the Natural Science Foundation of Heilongjiang Province (LH2024F048)

Abstract

Abstract: Rice disease image datasets contain few samples, which limits the accuracy that deep neural network models can reach. To address this, an improved generative adversarial network model, ViT-WGAN-GP (Vision Transformer and Wasserstein Generative Adversarial Networks with Gradient Penalty), is proposed for augmenting image datasets. First, a Vision Transformer structure is introduced into the generator to strengthen the learning of global features. Second, the discriminator adopts the WGAN-GP structure, in which the Wasserstein distance measure and a gradient penalty term keep training stable and improve the quality of the generated images. Finally, the augmented sample set is used to train deep neural network models. Experimental results show that, for rice disease images, the ViT-WGAN-GP model generates markedly better images than GAN and WGAN-GP. When VGG16, ResNet34, and GoogLeNet are trained on the augmented rice disease sample set, the average recognition accuracy reaches 94.3%, 96.2%, and 97.5%, improved by 9.7%, 2.8%, and 4.8%, respectively. The proposed ViT-WGAN-GP model can therefore generate fairly realistic rice disease images and can substantially improve the recognition accuracy of deep neural network models when only a small sample set is available.

Key words: image generation, Vision Transformer, Wasserstein generative adversarial networks with gradient penalty (WGAN-GP), generative adversarial networks, rice diseases
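
The WGAN-GP objective named in the abstract can be illustrated with a small sketch. The abstract does not give the paper's network or hyperparameters, so everything here is an assumption for illustration: the toy critic `tanh(w . x + b)` stands in for the real discriminator, its gradient is written out analytically (a real implementation would typically obtain it through automatic differentiation), and the penalty weight `LAMBDA_GP = 10.0` is the value commonly used in the WGAN-GP literature, not one confirmed by this article.

```python
import math
import random

# Assumed penalty weight; common in the WGAN-GP literature, not taken from this paper.
LAMBDA_GP = 10.0


def critic(x, w, b):
    """Hypothetical toy critic f(x) = tanh(w . x + b), a stand-in for the discriminator."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)


def critic_grad(x, w, b):
    """Analytic gradient of the toy critic with respect to its input x."""
    scale = 1.0 - critic(x, w, b) ** 2
    return [scale * wi for wi in w]


def gradient_penalty(real, fake, w, b, rng):
    """Mean of (||grad f(x_hat)||_2 - 1)^2 over random interpolates x_hat."""
    total = 0.0
    for x_real, x_fake in zip(real, fake):
        eps = rng.random()  # interpolation coefficient in [0, 1)
        x_hat = [eps * r + (1.0 - eps) * f for r, f in zip(x_real, x_fake)]
        norm = math.sqrt(sum(g * g for g in critic_grad(x_hat, w, b)))
        total += (norm - 1.0) ** 2
    return total / len(real)


def critic_loss(real, fake, w, b, rng):
    """WGAN-GP critic loss: E[f(fake)] - E[f(real)] + lambda * gradient penalty."""
    mean_fake = sum(critic(x, w, b) for x in fake) / len(fake)
    mean_real = sum(critic(x, w, b) for x in real) / len(real)
    return mean_fake - mean_real + LAMBDA_GP * gradient_penalty(real, fake, w, b, rng)


# Synthetic 4-dimensional "real" and "generated" batches of equal size.
rng = random.Random(0)
real = [[rng.gauss(1.0, 0.1) for _ in range(4)] for _ in range(8)]
fake = [[rng.gauss(-1.0, 0.1) for _ in range(4)] for _ in range(8)]
w, b = [0.5, 0.5, 0.5, 0.5], 0.0
print(critic_loss(real, fake, w, b, rng))
```

The penalty pushes the norm of the critic's input gradient toward 1 on points between real and generated samples, which is the mechanism the abstract credits for stable training. In this sketch the generator side (the Vision Transformer) is not modeled at all; `fake` is simply random data.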

CLC number:

TP391.41

LU Yang, XU Siyuan, TAO Xianpeng, LIU Qiwang, GUAN Chuang. Image Generation Method of Rice Disease Based on ViT-WGAN-GP[J]. Journal of Jilin University (Information Science Edition), 2025, 43(4): 747-754.