Journal of Jilin University(Engineering and Technology Edition) ›› 2018, Vol. 48 ›› Issue (6): 1851-1858.doi: 10.13229/j.cnki.jdxbgxb20170455

Previous Articles     Next Articles

Ensemble of unrestricted K-dependence Bayesian classifiers based on Markov blanket

WANG Li-min(),LIU Yang,SUN Ming-hui(),LI Mei-hui   

  1. College of Computer Science and Technology, Jilin University, Changchun 130012,China
  • Received:2017-05-05 Online:2018-11-20 Published:2018-12-11

Abstract:

In order to promote the conditional dependence expression ability of K-dependence Bayesian Classifier, in this paper we relax the conditional independence between predictive features according to the basic idea of Markov blanket for feature extraction, and carry out structure learning based on greedy search strategy. We propose a macro model from the training set, and a micro model from each test sample. The decision is made by the ensemble of both models. We select data sets from UCI machine learning repository for cross validation. The experimental results demonstrate the rationality and effectiveness of the proposed algorithm in terms of 0-1 loss, bias and variance.

Key words: computer application, Bayesian network, Markov blanket, conditional independence, macro model, micro model

CLC Number: 

  • TP301

Fig.1

Example of KDB(K=2) classifier"

Fig.2

Example of Markov blanket structure for C"

Fig.3

Algorithm flow chart for UKDB"

Table 1

UCI datasets"

序号 数据集名称 样本 属性 类标签
个数 个数 个数
1 Abalone 4177 8 3
2 Adult 48842 14 2
3 Anneal 898 38 6
4 Covtype 581012 54 7
5 German 1000 20 2
6 Hypo 3772 29 4
7 Hypothyroid 3163 25 2
8 Letter-recog 20000 16 26
9 Localization 164860 5 11
10 Optdigits 5620 64 10
11 Page-blocks 5473 10 5
12 Pendigits 10992 16 10
13 Phoneme 5438 7 50
14 Pima-ind-diabetes 768 8 2
15 Segment 2310 19 7
16 Sign 12546 8 3
17 Splice-c4.5 3177 60 3
18 Wall-following 5456 24 4
19 Waveform 100000 21 3
20 Waveform-5000 5000 40 3

Table 2

Result of average 0-1 loss (20 datasets)"

数据集 NB TAN KDB UKDBG UKDBL UKDB
Abalone 0.47 0.46 0.47 0.46 0.45 0.45
Adult 0.16 0.14 0.14 0.14 0.13 0.13
Anneal 0.04 0.01 0.01 0.01 0.01 0.01
Covtype 0.32 0.25 0.14 0.14 0.27 0.15
German 0.25 0.27 0.29 0.30 0.27 0.26
Hypo 0.01 0.01 0.01 0.01 0.01 0.01
Hypothyroid 0.02 0.01 0.01 0.01 0.01 0.01
Letter-recog 0.25 0.13 0.10 0.09 0.13 0.08
Localization 0.50 0.36 0.30 0.30 0.33 0.29
Optdigits 0.08 0.04 0.04 0.04 0.05 0.03
Page-blocks 0.06 0.04 0.04 0.04 0.04 0.03
Pendigits 0.12 0.03 0.03 0.03 0.03 0.02
Phoneme 0.26 0.27 0.20 0.20 0.24 0.19
Pima-ind-diabetes 0.25 0.24 0.25 0.25 0.23 0.24
Segment 0.08 0.04 0.05 0.04 0.04 0.04
Sign 0.36 0.28 0.25 0.24 0.30 0.25
Splice-c4.5 0.04 0.05 0.09 0.09 0.07 0.05
Wall-following 0.11 0.06 0.04 0.05 0.04 0.04
Waveform 0.02 0.02 0.03 0.02 0.02 0.02
Waveform-5000 0.20 0.18 0.20 0.20 0.18 0.17

Table 3

Result of bias (15 datasets)"

数据集 NB TAN KDB UKDBG UKDBL UKDB
Adult 0.14 0.11 0.10 0.11 0.10 0.10
Covtype 0.23 0.23 0.22 0.23 0.23 0.22
Hypo 0.01 0.01 0.01 0.01 0.01 0.01
Hypothyroid 0.01 0.01 0.01 0.01 0.01 0.01
Letter-recog 0.23 0.18 0.17 0.17 0.18 0.16
Optdigits 0.07 0.03 0.03 0.03 0.04 0.03
Page-blocks 0.04 0.03 0.03 0.03 0.04 0.03
Pendigits 0.11 0.04 0.04 0.03 0.04 0.03
Phoneme 0.23 0.25 0.17 0.16 0.20 0.15
Segment 0.09 0.05 0.04 0.04 0.04 0.04
Sign 0.31 0.25 0.24 0.22 0.24 0.23
Splice-c4.5 0.03 0.04 0.04 0.10 0.05 0.06
Wall-following 0.10 0.06 0.05 0.04 0.03 0.03
Waveform 0.03 0.01 0.02 0.02 0.02 0.02
Waveform-5000 0.16 0.12 0.12 0.11 0.13 0.11

Table 4

Result of variance (15 datasets)"

数据集 NB TAN KDB UKDBG UKDBL UKDB
Adult 0.03 0.07 0.06 0.08 0.05 0.05
Covtype 0.12 0.17 0.17 0.16 0.19 0.16
Hypo 0.01 0.01 0.01 0.01 0.01 0.01
Hypothyroid 0.00 0.00 0.00 0.00 0.00 0.00
Letter-recog 0.14 0.17 0.16 0.17 0.18 0.16
Optdigits 0.03 0.03 0.03 0.03 0.04 0.03
Page-blocks 0.01 0.01 0.02 0.02 0.01 0.01
Pendigits 0.03 0.04 0.04 0.05 0.04 0.04
Phoneme 0.18 0.25 0.17 0.13 0.19 0.14
Segment 0.02 0.02 0.02 0.03 0.03 0.02
Sign 0.08 0.08 0.11 0.09 0.10 0.09
Splice-c4.5 0.01 0.03 0.03 0.08 0.07 0.06
Wall-following 0.03 0.04 0.05 0.05 0.03 0.03
Waveform 0.00 0.01 0.01 0.01 0.01 0.01
Waveform-5000 0.04 0.08 0.08 0.10 0.07 0.07

Table 5

W/D/L comparison result of average 0-1 loss (20 datasets)"

W/D/L NB TAN KDB UKDBG UKDBL
TAN 14/4/2 - - - -
KDB 14/3/3 10/4/6 - - -
UKDBG 14/3/3 11/4/5 6/12/2 - -
UKDBL 16/2/2 6/7/7 6/7/7 7/5/8 -
UKDB 16/3/1 15/4/1 15/4/1 15/5/0 14/6/0

Table 6

W/D/L comparison result of bias (15 datasets)"

W/D/L NB TAN KDB UKDBG UKDBL
TAN 11/2/2 - - - -
KDB 13/1/1 5/8/2 - - -
UKDBG 13/1/1 10/3/2 9/3/3 - -
UKDBL 12/2/1 6/4/5 4/5/6 4/3/8 -
UKDB 14/0/1 11/1/3 9/4/2 7/5/3 8/6/1

Table 7

W/D/L comparison result of variance (15 datasets)"

W/D/L NB TAN KDB UKDBG UKDBL
TAN 0/1/14 - - - -
KDB 1/1/13 4/8/3 - - -
UKDBG 1/1/13 3/1/11 2/5/8 - -
UKDBL 0/2/13 6/2/7 6/1/8 8/3/4 -
UKDB 1/3/11 9/3/3 10/3/2 11/4/0 10/3/2

Fig.4

Friedman test rank for different evaluation function"

Table 8

Orders of time complexity for different BNCs"

模型 训练时间 测试时间
NB O(tn) O(cn)
TAN O(tn2+c(nv)2+n2logn) O(cn)
KDB O(tn2+c(nv)2+tnk) O(cnk)
UKDBG O(tn2+c(nv)2+tn2k) O(cnk)
UKDBL O(tn+cnv+tn2k) O(cnk)
UKDB O(tn2+c(nv)2+tn2k+tn+cnv+tn2k) O(cnk)

Fig.5

Training time comparisons for KDB and UKDB in terms of different values of K"

[1] Das M, Ghosh S, Gupta P , et al. Forward: a model for forecasting reservoir water dynamics using spatial Bayesian network (SpaBN)[J]. IEEE Transactions on Knowledge and Data Engineering, 2017,29(4):842-855.
doi: 10.1109/TKDE.2016.2647240
[2] Kang Z, Yang J, Zhong R . A Bayesian-network-based classification method integrating airborne LiDAR data with optical images[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2017,10(4):1651-1661.
doi: 10.1109/JSTARS.2016.2628775
[3] Fan X B, Li X . Network tomography via sparse Bayesian learning[J]. IEEE Communications Letters, 2017,21(4):781-784.
doi: 10.1109/LCOMM.2017.2649494
[4] Chen S L, Martinez A M, Webb G I , et al. Sample-based attribute selective AnDE for large data[J]. IEEE Transactions on Knowledge and Data Engineering, 2017,29(1):172-185.
doi: 10.1109/TKDE.2016.2608881
[5] Domingos P, Pazzani M . On the optimality of the simple bayesian classifier under zero-one loss[J]. Machine Learning, 1997,29(2):103-130.
doi: 10.1023/A:1007413511361
[6] Hand D J, Yu K . Idiot's Bayes—not so stupid after all?[J]. International Statistical Review, 2001,69(3):385-398.
[7] Friedman N, Dan G, Goldszmidt M . Bayesian network classifiers[J]. Machine Learning, 1997,29(2):131-163.
doi: 10.1023/A:1007465528199
[8] Jiang L X, Cai Z H, Wang D H , et al. Improving tree augmented naive Bayes for class probability estimation[J]. Knowledge-Based Systems, 2012,26:239-245.
doi: 10.1016/j.knosys.2011.08.010
[9] Sahami M . Learning limited dependence Bayesian classifiers[EB/OL].[2017-04-25]..
[10] Wang L, Zhao H, Sun M , et al. General and local: averaged k-dependence bayesian classifiers[J]. Entropy, 2015,17(6):4134-4154.
doi: 10.3390/e17064134
[11] Jiang Liang-xiao, Li Chao-qun, Wang Sha-sha , et al. Deep feature weighting for naive Bayes and its application to text classification[J]. Engineering Applications of Artificial Intelligence, 2016,52:26-39.
doi: 10.1016/j.engappai.2016.02.002
[12] Meehan A, de Campos C P . Averaged extended tree augmented naive classifier[J]. Entropy, 2015,17(7):5085-5100.
doi: 10.3390/e17075085
[13] Martinez A M, Webb G I, Chen S , et al. Scalable learning of Bayesian network classifiers[J]. Journal of Machine Learning Research, 2016,17:1-35.
[14] Wang S C, Xu G L, Du R J . Restricted Bayesian classification networks[J]. Science China Information Sciences, 2013,56(7):1-15.
[15] Vergara J R, Estévez P A . A review of feature selection methods based on mutual information[J]. Neural Computing and Applications, 2014,24(1):175-186.
doi: 10.1007/s00521-013-1368-0
[16] Koller D, Sahami M. Toward optimal attribute selection [C]//In Proceedings of the 13th International Conference on Machine Learning,Bari, Italy, 1996: 284-292.
[17] Aliferis C F, Statnikov A, Tsamardinos I , et al. Local causal and markov blanket induction for causal discovery and feature selection for classification part I: algorithms and empirical evaluation[J]. Journal of Machine Learning Research, 2010,11:171-234.
doi: 10.1007/s11430-007-0106-9
[18] Sechidis K, Brown G . Markov blanket discovery in positive-unlabelled and semi-supervised data[DB/OL].[2017-04-26]..
[19] Wang L M, Yuan S M . Induction of hybrid decision tree based on post-discretization strategy[J]. Progress in Natural Science, 2004,14(6):541-545.
doi: 10.1080/10020070412331343911
[20] MacKay D J . Information Theory, Inference, and Learning Algorithms[M]. Cambridge: Cambridge University Press, 2003.
[21] Breiman L . Bagging predictors[J]. Machine Learning, 1996,24(2):123-140.
[22] Friedman J, Hastie T, Tibshirani R . The Elements of Statistical Learning[M]. Berlin: Springer, 2009.
[23] Fayyad U M, Irani K B . Multi-interval discretization of continuous-valued attributes for classification learning[DB/OL].[2017-04-28]..
[24] Hu B, Rakthanmanon T, Hao Y , et al. Towards Discovering the Intrinsic Cardinality and Dimensionality of Time Series Using MDL[M]. Berlin: Springer Berlin Heidelberg, 2013.
[25] Demsar J . Statistical comparisons of classifiers over multiple data sets[J]. Journal of Machine Learning Research, 2006,7(1):1-30.
doi: 10.1007/s10846-005-9016-2
[26] Ishitaki T, Oda T, Barolli L. A neural network based user identification for tor networks: data analysis using friedman test [C]//30th International Conference on Advanced Information Networking and Applications Workshops,Crans-Montana, Switzerland, 2016: 16022267.
[1] Nan WANG,Jin⁃bao LI,Yong LIU,Yu⁃jie ZHANG,Ying⁃li ZHONG. TPR⁃TF: time⁃aware point of interest recommendation model based on tensor factorization [J]. Journal of Jilin University(Engineering and Technology Edition), 2019, 49(3): 920-933.
[2] LIU Fu,ZONG Yu-xuan,KANG Bing,ZHANG Yi-meng,LIN Cai-xia,ZHAO Hong-wei. Dorsal hand vein recognition system based on optimized texture features [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(6): 1844-1850.
[3] JIN Shun-fu,WANG Bao-shuai,HAO Shan-shan,JIA Xiao-guang,HUO Zhan-qiang. Synchronous sleeping based energy saving strategy of reservation virtual machines in cloud data centers and its performance research [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(6): 1859-1866.
[4] ZHAO Dong,SUN Ming-yu,ZHU Jin-long,YU Fan-hua,LIU Guang-jie,CHEN Hui-ling. Improved moth-flame optimization method based on combination of particle swarm optimization and simplex method [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(6): 1867-1872.
[5] LIU En-ze,WU Wen-fu. Agricultural surface multiple feature decision fusion disease judgment algorithm based on machine vision [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(6): 1873-1878.
[6] OUYANG Dan-tong, FAN Qi. Clause-level context-aware open information extraction [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(5): 1563-1570.
[7] LIU Fu, LAN Xu-teng, HOU Tao, KANG Bing, LIU Yun, LIN Cai-xia. Metagenomic clustering method based on k-mer frequency optimization [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(5): 1593-1599.
[8] CHE Xiang-jiu, WANG Li, GUO Xiao-xin. Improved boundary detection based on multi-scale cues fusion [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(5): 1621-1628.
[9] GUI Chun, HUANG Wang-xing. Network clustering method based on improved label propagation algorithm [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(5): 1600-1605.
[10] LIU Yuan-ning, LIU Shuai, ZHU Xiao-dong, CHEN Yi-hao, ZHENG Shao-ge, SHEN Chun-zhuang. LOG operator and adaptive optimization Gabor filtering for iris recognition [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(5): 1606-1613.
[11] SUN Xiao-ying, HU Ze-zheng, YANG Jin-peng. Assessment method of electromagnetic pulse sensitivity of vehicle engine system based on hierarchical Bayesian networks [J]. 吉林大学学报(工学版), 2018, 48(4): 1254-1264.
[12] CAO Jie, SU Zhe, LI Xiao-xu. Image annotation method based on Corr-LDA model [J]. 吉林大学学报(工学版), 2018, 48(4): 1237-1243.
[13] ZHAO Hong-wei, LIU Yu-qi, DONG Li-yan, WANG Yu, LIU Pei. Dynamic route optimization algorithm based on hybrid in ITS [J]. 吉林大学学报(工学版), 2018, 48(4): 1214-1223.
[14] HUANG Hui, FENG Xi-an, WEI Yan, XU Chi, CHEN Hui-ling. An intelligent system based on enhanced kernel extreme learning machine for choosing the second major [J]. 吉林大学学报(工学版), 2018, 48(4): 1224-1230.
[15] FU Wen-bo, ZHANG Jie, CHEN Yong-le. Network topology discovery algorithm against routing spoofing attack in Internet of things [J]. 吉林大学学报(工学版), 2018, 48(4): 1231-1236.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] LI Shoutao, LI Yuanchun. Autonomous Mobile Robot Control Algorithm Based on Hierarchical Fuzzy Behaviors in Unknown Environments[J]. 吉林大学学报(工学版), 2005, 35(04): 391 -397 .
[2] Liu Qing-min,Wang Long-shan,Chen Xiang-wei,Li Guo-fa. Ball nut detection by machine vision[J]. 吉林大学学报(工学版), 2006, 36(04): 534 -538 .
[3] Li Yue-ying,Liu Yong-bing,Chen Hua . Surface hardening and tribological properties of a cam materials[J]. 吉林大学学报(工学版), 2007, 37(05): 1064 -1068 .
[4] Nie Jian-jun,Du Fa-rong,Gao Feng . Finite time thermodynamics of real combined power cycle operating
between internal combustion engine and Stirling engine with heat leak
[J]. 吉林大学学报(工学版), 2007, 37(03): 518 -0523 .
[5] Liu Yao-dong, Li Guang-yu, Lian Jian-she. Effect of Al-doping on electrical and optical properties of ZnO films[J]. 吉林大学学报(工学版), 2007, 37(03): 495 -0498 .
[6] Du Li-jie,Zhao Xiao-hui,Luo Si-wei . Improved synchronization iterative algorithm based on cyclic prefix in OFDM system[J]. 吉林大学学报(工学版), 2007, 37(01): 177 -181 .
[7] Shen Chuan-liang,Liu Guo-jun,Dong Jing-shi,Yang Zhi-gang,Cheng Guang-ming . Piezoelectric multi-vibrator and single-chamber pump for precise drug delivery[J]. 吉林大学学报(工学版), 2007, 37(01): 89 -94 .
[8] Peng Ya-ping1,Guo Ying-nan, Huang Wei-jun,Tan Man-zhi,Dong Lei,Wang Zhi-wei . Cyclebycycle variation of ethanol homogeneous
charge compression ignition combustion
[J]. 吉林大学学报(工学版), 2007, 37(02): 301 -0306 .
[9] Yang Wen, Shi Yong-jiu, Wang Yuan-qing, Shi Gang . Threedimensional finite element analysis on welding residual
stresses of construction steel
[J]. 吉林大学学报(工学版), 2007, 37(02): 347 -0352 .
[10] Duan Fu-qing,Zhou Ming-quan,Zhang Jia-cai . Non-linear scale space filtering based on mean shift[J]. 吉林大学学报(工学版), 2007, 37(03): 634 -0639 .