Journal of Jilin University (Engineering and Technology Edition) ›› 2015, Vol. 45 ›› Issue (4): 1260-1265. doi: 10.13229/j.cnki.jdxbgxb201504034


Plagiarism detection in student programs based on frequent closed sequence mining

WANG Ke-chao (1, 2), WANG Tian-tian (2), SU Xiao-hong (2), MA Pei-jun (2)

  1. School of Software, Harbin University, Harbin 150086, China;
  2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  • Received: 2014-01-03 Online: 2015-07-01 Published: 2015-07-01

Abstract: Plagiarism in student programs is common and undermines the credibility of assessment, yet detecting it manually places a heavy burden on teachers. To address this problem, a plagiarism detection model is proposed. First, student programs are converted into token sequences through lexical analysis. Next, the token sequences are hashed into numeric sequences. The frequent closed sequences are then mined with the BIDE algorithm. On this basis, similar code fragments are detected, and plagiarized programs are identified from the computed similarity. Experimental results show that the proposed method is more precise than the commonly used tool MOSS. It not only gives accurate statistical information about similar programs but also explicitly displays the plagiarized code fragments.

Key words: computer software, plagiarism detection, frequent closed sequence mining, similarity, similar code

CLC Number: TP311
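
The pipeline outlined in the abstract (lexical analysis, numeric encoding, frequent closed sequence mining, similarity) can be illustrated with a small Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the token classes and `KEYWORDS` set, a dictionary codebook standing in for the unspecified hash, a brute-force enumeration of contiguous token runs in place of the BIDE algorithm, and a coverage-based similarity score, since the abstract does not give the actual formula. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed, simplified lexer for C-like code (not the paper's lexer).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
KEYWORDS = {"int", "for", "while", "if", "else", "return"}


def tokenize(source):
    """Map identifiers to ID and numbers to NUM so renaming does not hide copying."""
    tokens = []
    for lexeme in TOKEN_RE.findall(source):
        if lexeme in KEYWORDS:
            tokens.append(lexeme)
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append("ID")
        elif lexeme[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(lexeme)
    return tokens


def encode(tokens, codebook):
    """Turn tokens into integers; the shared codebook is an assumed stand-in for hashing."""
    return tuple(codebook.setdefault(tok, len(codebook)) for tok in tokens)


def frequent_closed_runs(sequences, min_support=2, min_len=3):
    """Brute-force substitute for BIDE, restricted to contiguous runs.

    Support of a run is the set of programs containing it. A run is closed
    when no one-token extension occurs in exactly the same programs.
    """
    support = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for i in range(len(seq)):
            for j in range(i + min_len, len(seq) + 1):
                support[seq[i:j]].add(idx)
    frequent = {run: ids for run, ids in support.items() if len(ids) >= min_support}
    not_closed = set()
    for run, ids in frequent.items():
        # The prefix and suffix of a run occur wherever the run occurs, so an
        # equal support size means an equal support set.
        for sub in (run[:-1], run[1:]):
            if sub in frequent and len(frequent[sub]) == len(ids):
                not_closed.add(sub)
    return {run: ids for run, ids in frequent.items() if run not in not_closed}


def coverage(seq, runs):
    """Fraction of tokens in seq that lie inside at least one of the given runs."""
    covered = [False] * len(seq)
    for run in runs:
        n = len(run)
        for i in range(len(seq) - n + 1):
            if seq[i:i + n] == run:
                covered[i:i + n] = [True] * n
    return sum(covered) / len(seq) if seq else 0.0


def pairwise_similarity(sequences, closed, i, j):
    """Assumed score: the smaller coverage of programs i and j by their shared closed runs."""
    shared = [run for run, ids in closed.items() if i in ids and j in ids]
    return min(coverage(sequences[i], shared), coverage(sequences[j], shared))


programs = [
    "int sum(int n){ int s = 0; for(int i=0;i<n;i++) s += i; return s; }",
    "int total(int k){ int t = 0; for(int j=0;j<k;j++) t += j; return t; }",
    "int main(){ return 0; }",
]
codebook = {}
sequences = [encode(tokenize(p), codebook) for p in programs]
closed = frequent_closed_runs(sequences)
print(pairwise_similarity(sequences, closed, 0, 1))
print(pairwise_similarity(sequences, closed, 0, 2))
```

In this toy input the first two programs differ only in identifier names, so they normalize to the same token sequence, while the third shares only a short prefix with them. The closed runs returned by `frequent_closed_runs` also serve as the similar code fragments that could be shown to a teacher, after decoding the integers back to tokens through the codebook.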