Journal of Jilin University (Information Science Edition) ›› 2021, Vol. 39 ›› Issue (5): 583-588.

Spam Message Recognition Based on Self-Clustering and Self-Learning Algorithm

LI Gen 1, WANG Kefeng 1, BEN Weiguo 1, SONG Wei 1, LIU Hongru 2, XU Yijin 2

  1. Jilin Province Branch, China United Network Communications Group Company Limited, Changchun 130021, China; 2. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received: 2021-03-30 Online: 2021-10-01 Published: 2021-10-01

Abstract: Spam message senders continually modify spam content to evade filtering systems, which causes recognition accuracy to decrease. To address this problem, a new recognition method based on a self-clustering and self-learning algorithm is presented. First, a spam relation chain is built using the minimum edit distance, and the MeanShift algorithm is applied to the chain to realize the self-clustering function. Second, the core of each cluster is computed, and the weight of each sample is computed from its distance to the cluster core. The classifier is then trained on the samples with their weights. When a new spam message is recognized by the classifier, it is assigned to a cluster; the core and sample weights of that cluster are recomputed, and the classifier is updated. Repeating this process realizes the self-learning function. Experimental results demonstrate that the new method improves recognition accuracy by 2.51% to 5.14% and maintains high accuracy over a long period.
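
The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the cluster core is assumed to be the medoid (the message with the smallest total edit distance to the others), and the weight formula 1 / (1 + distance) is an assumed stand-in for whatever distance-based weighting the paper uses. The MeanShift clustering step and the classifier are omitted.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Minimum edit (Levenshtein) distance between two strings."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cluster_core(messages: Sequence[str]) -> str:
    """Assumed definition of the core: the medoid of the cluster."""
    return min(messages,
               key=lambda m: sum(edit_distance(m, other) for other in messages))


def sample_weights(messages: Sequence[str], core: str) -> List[float]:
    """Assumed weighting: samples closer to the core get larger weights."""
    return [1.0 / (1.0 + edit_distance(m, core)) for m in messages]


# A toy cluster of spam variants that differ by a few character edits.
cluster = ["win a free prize", "win a fr3e prize", "w1n a fr3e prize"]
core = cluster_core(cluster)
weights = sample_weights(cluster, core)
```

In a self-learning loop of the kind described, a newly recognized spam message would be appended to its cluster, after which `cluster_core` and `sample_weights` would be recomputed before the classifier is retrained on the reweighted samples.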

Key words: edit distance, clustering, self-learning, spam message

CLC Number: TP393