Journal of Jilin University (Engineering and Technology Edition) ›› 2012, Vol. 42 ›› Issue (02): 463-468.


New performance evaluation method for classifier

LI Jun1,2, LI Xiong-fei1, DONG Yuan-fang1,3, ZHAO Hai-ying4

    1. Key Laboratory of Symbolic Computation and Knowledge Engineering for Ministry of Education, Jilin University, Changchun 130012, China;
    2. Department of Mathematics, Changchun University of Science and Technology, Changchun 130022, China;
    3. School of Economics and Management, Changchun University of Science and Technology, Changchun 130022, China;
    4. School of Computer Science and Technology, Xinjiang Normal University, Urumqi 830000, China
  • Received: 2011-02-11 Online: 2012-03-01 Published: 2012-03-01
  • Corresponding author: LI Xiong-fei (born 1963), male, professor, doctoral supervisor. Research interest: data mining. E-mail: lxf@jlu.edu.cn
  • About the author: LI Jun (born 1974), male, associate professor. Research interests: data mining, machine learning. E-mail: lijun.yq@163.com
  • Supported by:

    National Basic Research Program of China ("973" Program) (2010CB334709); National Natural Science Foundation of China (60863010); Science and Technology Development Program of Jilin Province (20090704).

Abstract: To address the problem of evaluating classifiers on class-imbalanced or class-skewed data, a performance evaluation measure for imbalanced-data classifiers, the weighted Area Under the Curve (wAUC), is proposed. To distinguish the different contributions that the accuracies on different classes make to overall performance, the weighted area under the Receiver Operating Characteristic (ROC) curve is computed with weights that differ by region according to the value of the True Positive rate (TPrate), so that the measure focuses more on positive-class accuracy. The properties that the weight function should have are discussed, and the properties of wAUC are analyzed. Theoretical analysis and experimental results show that wAUC is superior to OP and AUC.

Key words: computer software and theory, imbalanced data, classification, performance evaluation, AUC
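The abstract does not state the weight function or the exact definition of wAUC, so the following is only a minimal sketch of one plausible reading: the region under the ROC curve is cut into horizontal strips, and each strip at height TPrate = y is weighted by w(y). The function names `roc_points` and `weighted_auc`, the strip count, and the example weight w(y) = y are all assumptions made for illustration and are not taken from the paper. With a constant weight the sketch reduces to the ordinary AUC.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (FPrate, TPrate)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """Build ROC points by sweeping the decision threshold from high to low."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Tied scores are consumed together, giving one diagonal segment.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def weighted_auc(points: List[Point],
                 weight: Callable[[float], float],
                 strips: int = 3000) -> float:
    """Approximate  integral of w(y) * (1 - FPrate(y)) dy / integral of w(y) dy.

    Hypothetical formulation (an assumption, not the paper's definition):
    the strip of the under-curve region at height y has width 1 - FPrate(y)
    and is weighted by w(y). Uses the midpoint rule over `strips` strips.
    """
    def fpr_at(y: float) -> float:
        # Find the rising segment that contains height y and interpolate.
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if y1 > y0 and y0 <= y <= y1:
                return x0 + (x1 - x0) * (y - y0) / (y1 - y0)
        return 1.0

    num = den = 0.0
    for k in range(strips):
        y = (k + 0.5) / strips
        w = weight(y)
        num += w * (1.0 - fpr_at(y))
        den += w
    return num / den


if __name__ == "__main__":
    pts = roc_points([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
    print(weighted_auc(pts, lambda y: 1.0))  # constant weight: plain AUC, 8/9
    print(weighted_auc(pts, lambda y: y))    # assumed increasing weight, 22/27
```

In this toy example the only ranking error occurs in the high-TPrate region, so an increasing weight penalizes it more heavily and the weighted value (about 0.815) falls below the plain AUC (about 0.889). Whether the paper's weight function behaves this way cannot be determined from the abstract alone.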

CLC number:

  • TG181