Journal of Jilin University (Engineering and Technology Edition) ›› 2017, Vol. 47 ›› Issue (4): 1244-1252. doi: 10.13229/j.cnki.jdxbgxb201704033

Human action recognition method based on multi-level image sequences and convolutional neural networks

MA Miao, LI Yi-bin

  1. College of Control Science and Engineering, Shandong University, Jinan 250061, China
  • Received: 2016-03-24  Online: 2017-07-20  Published: 2017-07-20
  • Corresponding author: LI Yi-bin (1960-), male, professor, doctoral supervisor. Research interests: intelligent robots, special robots, and intelligent vehicles. E-mail: liyb@sdu.edu.cn
  • About the author: MA Miao (1989-), female, PhD candidate. Research interests: machine vision, intelligent robots, pattern recognition, and intelligent systems. E-mail: mamiaosdu@hotmail.com
  • Supported by: National High-Tech Research and Development Program of China ("863" Program) (2015AA042201); National Natural Science Foundation of China (61233014)

Abstract: A human action recognition method based on multi-level image sequences and convolutional neural networks is proposed. First, a four-level image sequence structure is constructed that captures richer information about human actions; each level is processed by a convolutional neural network, so that appearance, motion, foreground, and background information are exploited more fully. In addition, a decomposition method for video sequences is proposed that yields a more detailed description of human activity: the sequence at each level is decomposed into sub-sequences representing the action from coarse to fine, which produces more representative action features. Finally, the proposed method is validated on two challenging human action datasets; the experimental results show that it effectively improves action recognition accuracy.
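The abstract describes two mechanisms without giving code: a four-level image sequence structure whose levels are processed by CNNs, and a coarse-to-fine decomposition of each level into sub-sequences. As a minimal sketch of how such a pipeline could be wired together, assuming a halving temporal decomposition and average pooling of per-frame features (neither of which the abstract specifies), the hypothetical Python below builds one descriptor from four toy streams; decompose_sequence, action_descriptor, and toy_cnn are illustrative names, not the authors' implementation.

```python
import numpy as np


def decompose_sequence(frames, depth=3):
    """Split a frame sequence coarse-to-fine (assumed scheme).

    Level 0 is the whole sequence; each further level halves every
    sub-sequence of the previous one, so level k holds 2**k pieces.
    """
    levels = [[frames]]
    for _ in range(1, depth):
        finer = []
        for seq in levels[-1]:
            mid = len(seq) // 2
            finer.extend([seq[:mid], seq[mid:]])
        levels.append(finer)
    return levels


def action_descriptor(streams, cnn, depth=3):
    """Average per-frame CNN features over every sub-sequence of every
    stream and concatenate them into a single action descriptor."""
    parts = []
    for frames in streams.values():           # one entry per image level
        for level in decompose_sequence(frames, depth):
            for sub in level:
                feats = np.stack([cnn(f) for f in sub])
                parts.append(feats.mean(axis=0))   # average pooling
    return np.concatenate(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four toy streams standing in for the paper's four image levels.
    streams = {name: [rng.random((224, 224, 3)) for _ in range(16)]
               for name in ("appearance", "motion", "foreground", "background")}
    toy_cnn = lambda img: img.mean(axis=(0, 1))    # 3-D stand-in feature
    print(action_descriptor(streams, toy_cnn).shape)   # -> (84,)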

Key words: artificial intelligence, action recognition, video understanding, convolutional neural network

CLC number: TP183