Journal of Jilin University (Engineering and Technology Edition) ›› 2016, Vol. 46 ›› Issue (6): 2034-2041. doi: 10.13229/j.cnki.jdxbgxb201606037


Cross-project software defect prediction based on multi-source data

李勇1, 2, 黄志球1, 王勇1, 房丙午1   

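  • Corresponding author: HUANG Zhi-qiu (b. 1965), male, professor, doctoral supervisor. Research interest: software engineering. E-mail: zqhuang@nuaa.edu.cn
  • About the author: LI Yong (b. 1983), male, doctoral candidate. Research interest: empirical software engineering. E-mail: liyong@live.com
  • Funding:
    National Natural Science Foundation of China (61562087, 61272083); Jiangsu Province Graduate Research and Innovation Program for Regular Universities (CXLX13_160); Fundamental Research Funds for the Central Universities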

New approach of cross-project defect prediction based on multi-source data

LI Yong1, 2, HUANG Zhi-qiu1, WANG Yong1, FANG Bing-wu1   

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China;
  2. Key Laboratory of Network Information Security and Public Opinion Analysis, Xinjiang Normal University, Urumqi 830054, China
  • Received: 2015-11-30 Online: 2016-11-20 Published: 2016-11-20

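Abstract (Chinese version): Cross-project (CP) software defect prediction addresses two problems of traditional within-project (WP) prediction: the need for accumulated historical data and the high cost of labeling defects. To overcome the low prediction performance and poor operability of existing CP methods, a cross-project software defect prediction method based on multi-source data is proposed. First, multiple source projects whose characteristics are similar to those of the target project are obtained as candidates. Then, the software modules of the candidate projects guide the selection of training data. Finally, the prediction model is built with the Naive Bayes algorithm. Experiments on real software defect data show that the method outperforms the traditional WP method and can replace it in software engineering practice.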


Abstract: Software defect prediction helps optimize quality assurance activities. Within-project defect prediction (WPDP) can produce high-quality results but requires historical data from the project itself, which is often unavailable in practice. Cross-project defect prediction (CPDP) overcomes this drawback by training on data from other projects. However, existing research suggests that CPDP is particularly challenging and often performs poorly, and very few studies offer practical guidelines on how to select suitable training data for CPDP from multi-source project data. A novel multi-source, data-driven approach to CPDP is proposed. First, a hierarchical filter strategy based on the characteristics of both projects and modules selects the training data. Then, the Naive Bayes algorithm is used to build the prediction model. Experimental results on 14 open-source projects show that the proposed approach significantly improves CPDP performance and is competitive with WPDP.

Key words: computer software, cross-project defect prediction, multi-source project data, hierarchical data selection, Naive Bayes algorithm
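
The pipeline outlined in the abstract (project-level filtering, then module-level filtering, then a Naive Bayes model) can be illustrated with a minimal sketch. The abstract does not specify the similarity measures or the filter details, so everything concrete here is an assumption made for illustration: a project is summarized by the mean of its module metric vectors, similarity is Euclidean distance, the module-level step keeps the nearest candidate modules of each target module, and the classifier is a Gaussian Naive Bayes. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

def mean_vector(rows):
    """Column-wise mean of a list of equal-length metric vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_projects(target_rows, sources, top_k):
    """Level 1 (assumed): keep the top_k source projects whose mean
    metric vector is closest to the target project's mean vector."""
    profile = mean_vector(target_rows)
    ranked = sorted(
        sources,
        key=lambda name: euclidean(
            profile, mean_vector([x for x, _ in sources[name]])),
    )
    return ranked[:top_k]

def select_modules(target_rows, candidates, n_neighbors):
    """Level 2 (assumed): for each target module keep its n_neighbors
    nearest candidate modules; the de-duplicated union is the training set."""
    chosen = set()
    for t in target_rows:
        order = sorted(range(len(candidates)),
                       key=lambda i: euclidean(t, candidates[i][0]))
        chosen.update(order[:n_neighbors])
    return [candidates[i] for i in sorted(chosen)]

def train_gaussian_nb(training):
    """Fit class priors and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for x, y in training:
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        means = mean_vector(rows)
        variances = [
            # small constant avoids division by zero for constant features
            sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
            for col, m in zip(zip(*rows), means)
        ]
        model[y] = (len(rows) / len(training), means, variances)
    return model

def predict(model, x):
    """Return the class label with the highest log-posterior."""
    def log_posterior(y):
        prior, means, variances = model[y]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Toy data (hypothetical): each module is ([size, complexity], label),
# where label 1 means defective and 0 means clean.
sources = {
    "A": [([10, 1], 0), ([12, 2], 0), ([50, 9], 1), ([55, 10], 1)],
    "B": [([11, 1], 0), ([14, 2], 0), ([48, 8], 1), ([62, 11], 1)],
    "C": [([900, 80], 1), ([1000, 95], 1), ([950, 85], 0), ([870, 70], 1)],
}
target_rows = [[13, 2], [52, 9]]  # unlabeled modules of the target project

names = select_projects(target_rows, sources, top_k=2)
candidates = [module for name in names for module in sources[name]]
training = select_modules(target_rows, candidates, n_neighbors=2)
model = train_gaussian_nb(training)
predictions = [predict(model, row) for row in target_rows]
```

Filtering in two stages means the dissimilar project "C" is discarded before any module-level distances are computed, so the classifier is trained only on modules that resemble the target project's own. In practice the metrics would normally be normalized before computing distances; that step is left out of the sketch to keep it short.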

CLC number: TP311.5