Journal of Jilin University (Information Science Edition) ›› 2025, Vol. 43 ›› Issue (4): 844-850.


Digital Information Resource Filtering and Deduplication Method Based on GRNN Algorithm

 ZHANG Lingyun   

  1. Library, Shaanxi Xueqian Normal University, Xi’an 710100, China
  • Received: 2023-08-22 Online: 2025-08-15 Published: 2025-08-15

Abstract: Resource filtering and deduplication are essential to the efficient operation of digital libraries, but the process is susceptible to interference from redundant data, resource types, and differences among customer groups. Therefore, a digital information resource filtering and deduplication method based on the GRNN algorithm is proposed. First, the GRNN (General Regression Neural Network) algorithm is used to detect outliers in digital information resources, and the outliers are filtered through PSO-LSSVM (Particle Swarm Optimization-Least Squares Support Vector Machine) so that they do not interfere with the deduplication process. Then, a locality-sensitive hashing algorithm is used to convert the resource data into binary hash codes, and filtering and deduplication of the digital information resources are completed by measuring the Hamming distance between hash codes. The experimental results show that the method runs in a short time and achieves high deduplication precision and a high deduplication rate.
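The hashing and comparison step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which hash family, code length, or distance threshold is used, so the random-hyperplane (sign) hashing, the 32-bit code length, the threshold of 2, and all function names below are assumptions made for illustration only, not the method of the paper. The GRNN outlier detection and PSO-LSSVM filtering stages are not shown.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplanes (an assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vector, hyperplanes):
    """Build a binary hash code: one bit per hyperplane, 1 if the projection is >= 0."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of bit positions in which two integer hash codes differ."""
    return bin(a ^ b).count("1")


def deduplicate(vectors, n_bits=32, max_distance=2, seed=0):
    """Return indices of the records kept after near-duplicate removal.

    A record is dropped when its hash code lies within max_distance bits
    of the code of a record that was already kept (threshold is assumed).
    """
    if not vectors:
        return []
    planes = make_hyperplanes(n_bits, len(vectors[0]), seed)
    kept = []  # (index, code) pairs of the records retained so far
    for i, vec in enumerate(vectors):
        code = hash_code(vec, planes)
        if all(hamming(code, c) > max_distance for _, c in kept):
            kept.append((i, code))
    return [i for i, _ in kept]


if __name__ == "__main__":
    # Hypothetical feature vectors: record 1 repeats record 0,
    # record 3 is a rescaled copy, record 2 points the opposite way.
    records = [
        [1.0, 2.0, 3.0],
        [1.0, 2.0, 3.0],
        [-1.0, -2.0, -3.0],
        [2.0, 4.0, 6.0],
    ]
    print(deduplicate(records))
```

In this sketch each kept record is compared against every earlier kept record, which is quadratic in the worst case; a practical system would normally bucket the hash codes first, a design choice the abstract does not describe.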

Key words: generalized regression neural network, filtering of abnormal data, binary hash code, hash function, Hamming distance

CLC Number: TP391