Journal of Jilin University (Science Edition), 2023, Vol. 61, Issue (3): 631-640.


Feature Selection and Text Clustering Algorithm Based on Binary Mayfly Optimization

GAO Xincheng1, ZHOU Zhongyu2, WANG Lili2, SHAO Guoming2, ZHANG Qiang2   

  1. Modern Education Technique Center, Northeast Petroleum University, Daqing 163318, Heilongjiang Province, China;
    2. School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang Province, China
  Received: 2022-03-07   Online: 2023-05-26   Published: 2023-05-26

Abstract: To address the low clustering accuracy caused by redundant text features, we proposed a feature selection and text clustering algorithm based on binary mayfly optimization. First, we improved the position update, mating, and mutation strategies of the traditional mayfly algorithm. Second, we combined the improved algorithm with a feature selection model to select text features, using inverse document frequency as the objective function. Finally, on the basis of the new feature subset, the K-means++ algorithm was used to cluster the texts and obtain the optimal text clustering results. Experiments on multiple datasets show that the proposed algorithm can effectively reduce the feature dimension and improve the efficiency of text clustering.
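The pipeline summarized in the abstract (binary feature selection scored by inverse document frequency, followed by K-means++ clustering on the reduced features) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. The abstract does not give the improved mayfly position-update, mating, or mutation rules, so `binary_swarm_select` is a hypothetical, generic binary swarm search that only borrows the broad idea: velocities pulled toward the best solution, a sigmoid transfer function to turn them into bits, and random bit-flip mutation. The objective `fitness` (mean IDF of the selected terms) is also an assumption, as are all function names and parameter values.

```python
import math
import random


def idf_scores(docs):
    """IDF of each term: log(N / df), where df counts documents containing the term."""
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    return {term: math.log(n / count) for term, count in df.items()}


def fitness(mask, vocab, idf):
    """Hypothetical objective: mean IDF of the selected terms (0.0 if none selected)."""
    chosen = [idf[t] for bit, t in zip(mask, vocab) if bit]
    return sum(chosen) / len(chosen) if chosen else 0.0


def sigmoid_binarize(velocity, rng):
    """Sigmoid transfer function: set bit i with probability 1 / (1 + exp(-v_i))."""
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0 for v in velocity]


def binary_swarm_select(vocab, idf, pop=10, iters=30, seed=0):
    """Generic binary swarm search (NOT the authors' improved mayfly algorithm).

    Velocities drift toward the global best mask, are binarized by a sigmoid,
    and each bit is occasionally flipped as a simple mutation.
    """
    rng = random.Random(seed)
    d = len(vocab)
    masks = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop)]
    vels = [[0.0] * d for _ in range(pop)]
    best = max(masks, key=lambda m: fitness(m, vocab, idf))[:]
    for _ in range(iters):
        for i in range(pop):
            # Attraction toward the best mask; |v| stays bounded (at most 4).
            vels[i] = [0.5 * v + 2.0 * rng.random() * (b - x)
                       for v, b, x in zip(vels[i], best, masks[i])]
            masks[i] = sigmoid_binarize(vels[i], rng)
            # Mutation: flip each bit with a small probability.
            masks[i] = [1 - x if rng.random() < 0.01 else x for x in masks[i]]
            if fitness(masks[i], vocab, idf) > fitness(best, vocab, idf):
                best = masks[i][:]
    return best


def tfidf_vectors(docs, vocab, idf, mask):
    """TF-IDF vectors restricted to the terms whose mask bit is 1."""
    kept = [t for bit, t in zip(mask, vocab) if bit]
    return [[doc.count(t) / max(len(doc), 1) * idf[t] for t in kept] for doc in docs]


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp(points, k, iters=20, seed=0):
    """K-means with K-means++ seeding (next center drawn with probability proportional to D(x)^2)."""
    rng = random.Random(seed)
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            break  # fewer distinct points than k
        centers.append(list(rng.choices(points, weights=d2)[0]))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for each point.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: move each non-empty center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

To use it, tokenize each document into a list of terms, call `idf_scores`, run `binary_swarm_select` over `vocab = sorted(idf)`, build `tfidf_vectors` with the returned mask, and pass the vectors to `kmeans_pp`. Because mean IDF alone rewards keeping only the rarest terms, a practical objective would likely need an additional term (for example, a penalty on very small subsets); the exact formulation used in the paper is not stated in the abstract.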

Key words: binary mayfly algorithm, text clustering, convergence rate, feature selection

CLC Number: TP393