Journal of Jilin University (Engineering and Technology Edition) ›› 2017, Vol. 47 ›› Issue (4): 1244-1252. doi: 10.13229/j.cnki.jdxbgxb201704033

• Original Article •

Multi-level image sequences and convolutional neural networks based human action recognition method

MA Miao, LI Yi-bin   

  1. College of Control Science and Engineering, Shandong University, Ji'nan 250061, China
  • Received: 2016-03-24 Online: 2017-07-20 Published: 2017-07-20

Abstract: A human action recognition method based on multi-level image sequences and convolutional neural networks (CNNs) is proposed. First, a four-level image sequence structure is constructed to capture richer information about human actions. The four levels of image sequences are then processed by CNNs, which lets the method make fuller use of appearance, motion, foreground, and background information. In addition, a video sequence decomposition method is proposed to capture more detailed information about human activity: each level's sequence is decomposed into sub-sequences so that actions are represented from coarse to fine, yielding more representative action features. The effectiveness of the proposed method is verified on two challenging human action datasets. The experimental results show that the proposed method effectively improves action recognition accuracy.
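
The coarse-to-fine decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the sub-sequences are formed, so the scheme below is an assumption: level k cuts the frame sequence into 2^k contiguous, non-overlapping parts. The function name `split_coarse_to_fine` and the parameter `num_levels` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def split_coarse_to_fine(frames: Sequence, num_levels: int = 3) -> List[List[list]]:
    """Cut a frame sequence into progressively finer sub-sequences.

    Assumption (not stated in the abstract): level k holds 2**k contiguous,
    non-overlapping sub-sequences that together cover the whole sequence.
    """
    if num_levels < 1:
        raise ValueError("num_levels must be at least 1")

    n = len(frames)
    levels = []
    for level in range(num_levels):
        parts = 2 ** level
        # Integer boundaries so the parts cover every frame exactly once.
        bounds = [(i * n) // parts for i in range(parts + 1)]
        levels.append(
            [list(frames[bounds[i]:bounds[i + 1]]) for i in range(parts)]
        )
    return levels


if __name__ == "__main__":
    # Frame indices stand in for real images or per-frame CNN features.
    for level, subsequences in enumerate(split_coarse_to_fine(range(8))):
        print(level, subsequences)
```

In a pipeline of this kind, each sub-sequence would typically be summarized on its own (for example by pooling per-frame CNN features) and the summaries concatenated; that aggregation step is likewise an assumption rather than a detail given here.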

Key words: artificial intelligence, action recognition, video understanding, convolutional neural network

CLC Number: TP183