Journal of Jilin University (Engineering and Technology Edition), 2016, Vol. 46, Issue (6): 2034-2041. doi: 10.13229/j.cnki.jdxbgxb201606037


New approach of cross-project defect prediction based on multi-source data

LI Yong (1, 2), HUANG Zhi-qiu (1), WANG Yong (1), FANG Bing-wu (1)

1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China;
2. Key Laboratory of Network Information Security and Public Opinion Analysis, Xinjiang Normal University, Urumqi 830054, China
Received: 2015-11-30; Online: 2016-11-20; Published: 2016-11-20

Abstract: Software defect prediction is important for optimizing quality assurance activities. Within-project defect prediction (WPDP) can produce high-quality results, but it requires historical data from the project itself, which is often unavailable in practice. Cross-project defect prediction (CPDP) can overcome this drawback of WPDP. However, existing research suggests that CPDP is particularly challenging and often performs poorly, and very few studies have investigated practical guidelines for selecting suitable CPDP training data from multi-source project data. This paper proposes a novel multi-source-data-driven approach to CPDP. First, a hierarchical filtering strategy based on the characteristics of both projects and modules is developed to select training data. Then, the prediction model is built with the Naive Bayes algorithm. Experimental results on 14 open-source projects show that the proposed approach significantly improves CPDP performance and is competitive with WPDP.
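The abstract names the two stages of the method (a hierarchical, project-then-module filter that chooses training data, followed by a Naive Bayes classifier) but does not give the concrete similarity measures or thresholds. The Python sketch below shows one way such a pipeline could be wired together; it is not the authors' implementation. Assumptions: a project is characterised by the per-metric median of its modules, similarity is plain Euclidean distance, the module-level step is a k-nearest-neighbour relevancy filter, and the classifier is Gaussian Naive Bayes. The function names, the parameters `n_projects`, `k_modules` and `var_smoothing`, and the toy metric values are all hypothetical.

```python
import math
from statistics import median


def project_profile(features):
    """Project-level characteristic (assumed): the median of each metric."""
    return [median(column) for column in zip(*features)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_training_data(sources, target_x, n_projects=2, k_modules=3):
    """Hypothetical two-level filter over {project name: [(metrics, label), ...]}."""
    target_profile = project_profile(target_x)

    # Level 1 (projects): keep the n_projects sources whose profile is closest to the target's.
    def project_distance(name):
        return euclidean(project_profile([x for x, _ in sources[name]]), target_profile)

    kept = sorted(sources, key=project_distance)[:n_projects]
    pool = [module for name in kept for module in sources[name]]

    # Level 2 (modules): for each target module, keep its k_modules nearest source modules.
    chosen = {}  # pool index -> module, so a module picked twice is kept once
    for t in target_x:
        nearest = sorted(range(len(pool)), key=lambda i: euclidean(pool[i][0], t))
        for i in nearest[:k_modules]:
            chosen[i] = pool[i]
    return list(chosen.values())


class GaussianNaiveBayes:
    """One Gaussian per (class, metric); metrics are assumed independent given the class."""

    def fit(self, data, var_smoothing=1e-6):
        self.stats = {}
        for label in {y for _, y in data}:
            rows = [x for x, y in data if y == label]
            columns = list(zip(*rows))
            means = [sum(c) / len(rows) for c in columns]
            variances = [sum((v - m) ** 2 for v in c) / len(rows) + var_smoothing
                         for c, m in zip(columns, means)]
            self.stats[label] = (math.log(len(rows) / len(data)), means, variances)
        return self

    def predict(self, x):
        # Pick the class with the largest log prior + sum of per-metric Gaussian log-likelihoods.
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances))
        return max(self.stats, key=log_posterior)


# Toy, made-up metric vectors (two metrics per module); label 1 = defective.
sources = {
    "projA": [([1.0, 2.0], 0), ([1.2, 1.8], 0), ([8.0, 9.0], 1), ([7.5, 8.5], 1)],
    "projB": [([1.1, 2.1], 0), ([0.9, 1.9], 0), ([9.0, 8.0], 1)],
    "projC": [([100.0, 200.0], 1), ([120.0, 180.0], 0)],
}
target_x = [[1.0, 2.1], [8.2, 8.8]]  # unlabeled modules of the target project

train = select_training_data(sources, target_x)
model = GaussianNaiveBayes().fit(train)
print(len(train), model.predict([1.05, 2.0]), model.predict([8.1, 8.6]))  # prints: 6 0 1
```

Filtering at the project level first shrinks the pool searched by the module-level step and discards sources whose overall metric distribution is far from the target's (`projC` in the toy data). Real pipelines commonly normalise or log-transform the metrics before computing distances, which this sketch omits.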

Key words: computer software, cross-project defect prediction, multi-source project data, hierarchical data selection, Naive Bayes algorithm

CLC Number: TP311.5