Journal of Jilin University (Information Science Edition) ›› 2025, Vol. 43 ›› Issue (5): 1128-1137.


Video Anomaly Detection Framework Based on Bidirectional Spatio-Temporal Feature Fusion GAN

ZHAO Yugang1a, YANG Yujia1b, XIANG Ting2, JIN Honglin1a

  1. a. School of Computer Science and Engineering; b. School of Intelligent Manufacturing and Electrical Engineering, Guangzhou Institute of Science and Technology, Guangzhou 510540, China; 2. Department of Traditional Chinese Medicine, First Affiliated Hospital of Sun Yat-sen University, Guangzhou 510080, China
  • Received: 2023-10-07; Online: 2025-09-28; Published: 2025-11-20
  • Corresponding author: YANG Yujia (1983—), female, from Baicheng, Jilin Province, senior experimentalist at Guangzhou Institute of Science and Technology, mainly engaged in machine learning research. Tel: +86-13424139689; E-mail: 290108658@qq.com
  • About the author: ZHAO Yugang (1979—), male, from Nanyang, Henan Province, lecturer at Guangzhou Institute of Science and Technology, mainly engaged in machine learning research. Tel: +86-15989150567; E-mail: 565084380@qq.com
  • Funding:
    Supported by the Guangdong Education Science Planning Projects (2021GXJK275; 2023GXJK607) and the Guangdong Special Project Fund for Key Fields (2021ZDZX1070)

Video Anomaly Detection Framework Based on Bidirectional Spatio-Temporal Feature Fusion GAN 

ZHAO Yugang1a, YANG Yujia1b, XIANG Ting2, JIN Honglin1a   

  1a. School of Computer Science and Engineering; 1b. School of Intelligent Manufacturing and Electrical Engineering, Guangzhou Institute of Science and Technology, Guangzhou 510540, China; 2. Department of Traditional Chinese Medicine, First Affiliated Hospital of Sun Yat-sen University, Guangzhou 510080, China
  • Received: 2023-10-07; Online: 2025-09-28; Published: 2025-11-20

Abstract: To address video anomaly detection in complex scenes, a video anomaly detection framework based on an improved GAN (Generative Adversarial Network) is proposed. Two discriminators are used to adversarially train the generator, and a regression loss function enhances the consistency of bidirectional prediction. The generator is a spatio-temporal feature fusion network built from FusionNet and LSTM (Long Short-Term Memory); it takes forward and backward video sequences as input and outputs predicted video frames and sequences, respectively. Both discriminators adopt the PatchGAN architecture: the frame discriminator distinguishes synthetic frames, and the sequence discriminator determines whether a frame sequence contains at least one synthetic frame, preserving temporal consistency and improving the robustness and accuracy of the prediction network. Finally, the anomaly score is computed by normalizing the mean PSNR (Peak Signal-to-Noise Ratio). Experimental results show that the proposed framework captures the bidirectional spatio-temporal features of video sequences well and outperforms other state-of-the-art methods on the challenging public video anomaly detection datasets UCF-Crime (University of Central Florida Crime) and ShanghaiTech.
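The abstract does not give the exact form of the regression loss that enforces bidirectional consistency; the following is a minimal numpy sketch, assuming an L2 form in which each direction's prediction is pulled toward the ground-truth frame and an additional term pulls the two predictions toward each other (the function name and the weight `lam` are illustrative, not taken from the paper):

```python
import numpy as np

def generator_regression_loss(fwd_pred, bwd_pred, target, lam=1.0):
    """Hypothetical regression loss for a bidirectional generator.

    fwd_pred / bwd_pred: the frame predicted from the forward and
    backward sequences; target: the ground-truth frame.
    """
    l_fwd = np.mean((fwd_pred - target) ** 2)   # forward prediction error
    l_bwd = np.mean((bwd_pred - target) ** 2)   # backward prediction error
    l_cons = np.mean((fwd_pred - bwd_pred) ** 2)  # bidirectional consistency
    return float(l_fwd + l_bwd + lam * l_cons)
```

Under this form, the consistency term is zero exactly when the two directions agree, which is one simple way a regression loss can "enhance the consistency of bidirectional prediction".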

Keywords: video anomaly detection, generative adversarial network, FusionNet model, long short-term memory network, spatio-temporal feature fusion

Abstract: To improve the accuracy of video anomaly detection in complex scenes, a video anomaly detection framework based on an improved GAN (Generative Adversarial Network) is proposed. Two discriminators are used for adversarial training of the generator, and bidirectional prediction consistency is enhanced through a regression loss function. FusionNet and LSTM (Long Short-Term Memory) are combined to form a generator based on spatio-temporal feature fusion. Forward and backward video sequences are taken as the inputs of the generator, which outputs predicted video frames and predicted video sequences, respectively. The PatchGAN architecture is adopted for both discriminators: the frame discriminator distinguishes synthetic frames, and the sequence discriminator determines whether a frame sequence contains at least one synthetic frame, which maintains the temporal consistency of the predicted frames and improves the robustness and accuracy of the prediction network. Finally, the anomaly score is calculated from the normalized mean PSNR (Peak Signal-to-Noise Ratio). Experimental results show that the proposed framework effectively captures the bidirectional spatio-temporal features in video sequences and outperforms other state-of-the-art methods on the challenging public video anomaly detection datasets UCF-Crime (University of Central Florida Crime) and ShanghaiTech.

Key words: video anomaly detection, generative adversarial network (GAN), FusionNet model, long short-term memory (LSTM), spatio-temporal feature fusion

CLC number:

  • TP391