Journal of Jilin University (Engineering and Technology Edition), 2021, Vol. 51, Issue 4: 1447-1453. doi: 10.13229/j.cnki.jdxbgxb20190963

Improvement of fuzzy c-harmonic mean algorithm on unbalanced data

Fu LIU1,2, Yi-xin LIANG2, Tao HOU2, Yang SONG2, Bing KANG2, Yun LIU2

  1. State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China
  2. College of Communication Engineering, Jilin University, Changchun 130022, China
  • Received: 2019-10-18 Online: 2021-07-01 Published: 2021-07-14
  • Contact: Yun LIU E-mail: liufu@jlu.edu.cn; liuyun313@jlu.edu.cn

Abstract:

This paper proposes a new fuzzy c-harmonic means (FCHM) clustering algorithm with a cluster-volume constraint to address the poor clustering performance of the traditional algorithm on imbalanced data sets. First, a quantity that measures the volume of each cluster is defined from the membership matrix and combined with the objective function of the traditional algorithm to construct a new objective function. Second, new update formulas for the membership matrix and the cluster centers are obtained by minimizing this new objective function. The proposed algorithm was tested on UCI data sets, simulated imbalanced data sets, and real imbalanced machine-vibration detection data sets. Experimental results show that, compared with several peer algorithms, the proposed algorithm achieves good clustering performance on imbalanced data sets while maintaining the global optimization performance of the traditional algorithm.
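The abstract does not reproduce the objective function or the volume measure, so the following Python sketch is illustrative only. It assumes the classical K-harmonic means objective of Zhang et al. as the "traditional" term and uses a hypothetical volume measure (the mean membership per cluster). The function names `harmonic_objective` and `cluster_volumes` and the exponent `p` are assumptions, not the authors' definitions.

```python
import math


def harmonic_objective(points, centers, p=2.0, eps=1e-12):
    """Classical K-harmonic means objective (assumed form, not the paper's):
    sum over samples of c / sum_j (1 / d_ij^p), with Euclidean d_ij."""
    c = len(centers)
    total = 0.0
    for x in points:
        inv_sum = 0.0
        for v in centers:
            d = max(math.dist(x, v), eps)  # guard against a zero distance
            inv_sum += 1.0 / d ** p
        total += c / inv_sum
    return total


def cluster_volumes(membership):
    """Hypothetical volume measure: mean membership per cluster.
    membership[i][j] is the degree to which sample i belongs to cluster j."""
    n = len(membership)
    c = len(membership[0])
    return [sum(row[j] for row in membership) / n for j in range(c)]


# One sample midway between two centers, both at distance 1.
print(harmonic_objective([(1.0, 0.0)], [(0.0, 0.0), (2.0, 0.0)]))

# Crisp memberships for a 9-to-1 imbalanced partition.
print(cluster_volumes([[1.0, 0.0]] * 9 + [[0.0, 1.0]]))
```

On an imbalanced partition, a measure of this kind is small for the minority cluster, which is the sort of information a volume constraint can feed back into the objective.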

Key words: artificial intelligence, clustering, fuzzy c-harmonic mean algorithm, global optimality, unbalanced data

CLC Number: TP391

Fig.1 Spatial distribution of unbalanced data sets (a)-(f)

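Table 1 Specific information of UCI data sets

| Data set | Number of samples | Number of features | Number of classes | Sample composition |
|---|---|---|---|---|
| Iris | 150 | 4 | 2 or 3 | 50,50 / 50,50,50 |
| Ionosphere | 351 | 34 | 2 | 126,225 |
| Pima Indians Diabetes | 768 | 8 | 2 | 538,230 |
| Breast Cancer1 | 683 | 10 | 2 | 458,225 |
| Glass | 214 | 9 | 2 | 163,51 |
| Breast Cancer2 | 569 | 30 | 2 | 212,357 |
| Vibration data set | 215 | 48 312 | 2 | 159,56 |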

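Table 2 Results of five runs of the proposed algorithm on 4 UCI datasets

| Data set | c | Jopt | J (proposed) | E (proposed) | N (proposed) | t (proposed) |
|---|---|---|---|---|---|---|
| Iris | 2 | 152.35 | 152.37 | 0.01 | 16 | 0.03 |
| | | | 152.37 | 0.01 | 14 | 0.01 |
| | | | 152.37 | 0.01 | 18 | 0.01 |
| | | | 152.37 | 0.01 | 14 | 0.01 |
| | | | 152.37 | 0.01 | 15 | 0.01 |
| | 3 | 78.85 | 79.12 | 0.33 | 39 | 0.05 |
| | | | 79.12 | 0.33 | 30 | 0.02 |
| | | | 79.12 | 0.33 | 25 | 0.02 |
| | | | 79.12 | 0.33 | 37 | 0.03 |
| | | | 79.12 | 0.33 | 30 | 0.02 |
| Ionosphere | 2 | 2419.4 | 2420.01 | 0.03 | 33 | 0.09 |
| | | | 2420.01 | 0.03 | 44 | 0.12 |
| | | | 2420.01 | 0.03 | 36 | 0.11 |
| | | | 2420.01 | 0.03 | 30 | 0.06 |
| | | | 2420.01 | 0.03 | 53 | 0.14 |
| Pima Indians Diabetes | 2 | 5.14e6 | 5.18e6 | 0.68 | 43 | 0.19 |
| | | | 5.18e6 | 0.68 | 34 | 0.12 |
| | | | 5.18e6 | 0.68 | 43 | 0.15 |
| | | | 5.18e6 | 0.68 | 36 | 0.14 |
| | | | 5.18e6 | 0.68 | 35 | 0.13 |
| Breast Cancer1 | 2 | 1.93e5 | 1.93e5 | 0.00 | 13 | 0.05 |
| | | | 1.93e5 | 0.00 | 15 | 0.06 |
| | | | 1.93e5 | 0.00 | 18 | 0.06 |
| | | | 1.93e5 | 0.00 | 19 | 0.06 |
| | | | 1.93e5 | 0.00 | 16 | 0.05 |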
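The page does not define the E column. The values appear consistent with a percent relative deviation of the obtained objective value J from the best known value Jopt, as is common in the global k-means literature. The sketch below is written under that assumption; the function name `relative_error_percent` is hypothetical.

```python
def relative_error_percent(j, j_opt):
    """Percent relative deviation of j from j_opt (assumed meaning of E)."""
    return 100.0 * (j - j_opt) / j_opt


# Rounded J and Jopt values taken from Table 2.
print(round(relative_error_percent(152.37, 152.35), 2))   # Iris, c = 2
print(round(relative_error_percent(2420.01, 2419.4), 2))  # Ionosphere, c = 2
```

Because the table reports J and Jopt in rounded form, recomputed values match the E column only approximately. For Iris with c = 3, for example, the rounded entries give about 0.34 rather than the reported 0.33.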

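Table 3 Global optimal performance comparison of GKM, MGKM and proposed algorithm

| Data set | c | Jopt | GKM E | GKM N | GKM t | MGKM E | MGKM N | MGKM t | Proposed E | Proposed N | Proposed t |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Iris | 2 | 152.35 | 0.00 | 1.26e4 | 0.00 | 0.00 | 3.55e4 | 0.00 | 0.01 | 13 | 0.01 |
| | 3 | 78.85 | 0.01 | 1.78e4 | 0.00 | 0.00 | 6.34e4 | 0.00 | 0.33 | 22 | 0.02 |
| Ionosphere | 2 | 2.42e3 | 0.00 | 6.63e4 | 0.03 | 0.00 | 1.90e5 | 0.05 | 0.03 | 16 | 0.03 |
| Pima Indians Diabetes | 2 | 5.14e6 | 0.00 | 3.18e5 | 0.06 | 0.00 | 9.09e5 | 0.09 | 0.68 | 35 | 0.12 |
| Breast Cancer1 | 2 | 1.93e5 | 0.00 | 2.42e5 | 0.05 | 0.00 | 7.09e5 | 0.06 | 0.00 | 12 | 0.04 |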

Fig.2 Clustering results of the FCHM algorithm on unbalanced data sets

Fig.3 Clustering results of the siibFCM algorithm on unbalanced data sets

Fig.4 Clustering results of the proposed algorithm on unbalanced data sets

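Table 4 F-value and G-mean values of FCHM, siibFCM, and proposed algorithm

| Data set | FCHM F-value | FCHM G-mean | siibFCM F-value | siibFCM G-mean | Proposed F-value | Proposed G-mean |
|---|---|---|---|---|---|---|
| a | 0.8621 | 0.9839 | 0.9615 | 0.9960 | 0.9950 | 0.9950 |
| b | 0.9639 | 0.9925 | 0.6700 | 0.8961 | 0.9975 | 0.9975 |
| c | 0.9709 | 0.9910 | 0.8772 | 0.9571 | 0.9967 | 0.9967 |
| d | 0.9913 | 0.9965 | 0.9467 | 0.9772 | 0.9937 | 0.9967 |
| e | 0.9930 | 0.9965 | 0.9718 | 0.9854 | 0.9940 | 0.9970 |
| f | 0.9893 | 0.9932 | 0.9748 | 0.9844 | 0.9933 | 0.9943 |
| Glass | 0.7961 | 0.8658 | 0.7805 | 0.8659 | 0.8667 | 0.8745 |
| Breast Cancer2 | 0.7112 | 0.7429 | 0.8779 | 0.9030 | 0.9055 | 0.9161 |
| Ionosphere | 0.6383 | 0.7099 | 0.6286 | 0.7025 | 0.7749 | 0.8278 |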
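The page does not state how the F-value and G-mean in Table 4 are computed. The sketch below uses the definitions commonly applied to two-class imbalanced data, with the minority class treated as positive: the F-value as the weighted harmonic mean of precision and recall, and the G-mean as the geometric mean of recall and specificity. The function name, the `beta` parameter, and the example counts are assumptions for illustration and are not taken from the paper.

```python
import math


def f_value_and_g_mean(tp, fn, fp, tn, beta=1.0):
    """Return (F-value, G-mean) from confusion-matrix counts.
    The minority class is assumed to be the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0     # true negative rate
    denom = beta ** 2 * precision + recall
    f_value = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    g_mean = math.sqrt(recall * specificity)
    return f_value, g_mean


# Made-up counts: 50 minority samples and 450 majority samples.
f_value, g_mean = f_value_and_g_mean(tp=45, fn=5, fp=10, tn=440)
print(f_value, g_mean)
```

Both measures drop sharply when the minority class is poorly recovered, which plain accuracy can hide on imbalanced data.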