Journal of Jilin University (Information Science Edition), 2025, Vol. 43, Issue 4: 844-850.
Digital Information Resource Filtering and Deduplication Method Based on GRNN Algorithm
ZHANG Lingyun
Abstract: Resource filtering and deduplication are essential to the efficient operation of digital libraries, but the process is susceptible to interference from redundant data, differing resource types, and differences among customer groups. Therefore, a digital information resource filtering and deduplication method based on the GRNN algorithm is proposed. First, the GRNN (General Regression Neural Network) algorithm is used to detect outliers in digital information resources, and these outliers are filtered out with PSO-LSSVM (Particle Swarm Optimization-Least Squares Support Vector Machine) so that outlier data do not interfere with deduplication. Then, a locality-sensitive hashing algorithm converts the resource data into binary hash codes, and filtering and deduplication are completed by measuring the Hamming-distance similarity between hash codes. Experimental results show that the method has a short running time, high precision, and a high deduplication rate.
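The abstract does not say exactly how GRNN flags outliers. A minimal sketch, assuming each resource is a numeric feature vector `x` paired with a scalar attribute `y`: GRNN predicts `y` as a Gaussian-kernel weighted average of the other records' targets, and the record with the largest leave-one-out residual is dropped repeatedly while that residual exceeds a threshold. The names `grnn_predict` and `detect_outliers`, the bandwidth `sigma`, and the `threshold` value are hypothetical; the PSO-LSSVM filtering stage is not reproduced here.

```python
import math
from typing import List, Sequence


def grnn_predict(x: Sequence[float], train_x: List[Sequence[float]],
                 train_y: List[float], sigma: float) -> float:
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * sigma ** 2))
        for xi in train_x
    ]
    total = sum(weights)
    if total == 0.0:  # every kernel underflowed; fall back to the plain mean
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total


def detect_outliers(xs: List[Sequence[float]], ys: List[float],
                    sigma: float = 1.0, threshold: float = 10.0) -> List[int]:
    """Hypothetical helper: repeatedly drop the record with the largest
    leave-one-out GRNN residual while it exceeds `threshold`.
    Returns the dropped indices in removal order."""
    active = list(range(len(xs)))
    outliers: List[int] = []
    while len(active) > 2:
        worst, worst_res = -1, 0.0
        for i in active:
            others = [j for j in active if j != i]
            pred = grnn_predict(xs[i], [xs[j] for j in others],
                                [ys[j] for j in others], sigma)
            res = abs(ys[i] - pred)
            if res > worst_res:
                worst, worst_res = i, res
        if worst < 0 or worst_res <= threshold:
            break  # no remaining record is anomalous enough
        outliers.append(worst)
        active.remove(worst)
    return outliers


if __name__ == "__main__":
    xs = [[float(i)] for i in range(10)]
    ys = [2.0 * i for i in range(10)]
    ys[5] = 100.0  # injected anomaly
    print(detect_outliers(xs, ys))  # -> [5]
```

Removing only the single worst record per pass is a deliberate choice: a large anomaly also inflates its neighbours' leave-one-out residuals, so thresholding every residual at once could wrongly flag those neighbours as well.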
Key words: generalized regression neural network, filtering of abnormal data, binary hash code, hash function, Hamming distance
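The abstract does not name the locality-sensitive hash family used to produce the binary codes. A minimal sketch, assuming sign-random-projection (SimHash-style) hashing over numeric feature vectors: each random Gaussian hyperplane contributes one bit, and a record is treated as a duplicate when its code lies within `max_dist` bits (Hamming distance) of an already kept code. `make_hyperplanes`, `hash_code`, `hamming`, `deduplicate`, `n_bits`, and `max_dist` are illustrative choices, not the paper's.

```python
import random
from typing import List, Sequence


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Random Gaussian hyperplanes for sign-random-projection LSH (assumed family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec: Sequence[float], planes: List[List[float]]) -> int:
    """Pack one bit per hyperplane: 1 if vec is on its non-negative side."""
    code = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary hash codes."""
    return bin(a ^ b).count("1")


def deduplicate(vectors: Sequence[Sequence[float]], n_bits: int = 32,
                max_dist: int = 3, seed: int = 0) -> List[int]:
    """Keep the first record of each near-duplicate group; return kept indices."""
    if not vectors:
        return []
    planes = make_hyperplanes(len(vectors[0]), n_bits, seed)
    kept: List[int] = []
    kept_codes: List[int] = []
    for i, vec in enumerate(vectors):
        code = hash_code(vec, planes)
        # A record survives only if it is far (in Hamming distance) from all kept codes.
        if all(hamming(code, k) > max_dist for k in kept_codes):
            kept.append(i)
            kept_codes.append(code)
    return kept


if __name__ == "__main__":
    docs = [[1.0, 0.0], [3.0, 0.0], [-1.0, 0.0]]
    print(deduplicate(docs))  # -> [0, 2]; [3, 0] points the same way as [1, 0]
```

The linear scan over kept codes is O(n²); a larger collection would normally bucket the codes (for example by splitting them into bands) so that only records sharing a bucket are compared.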
ZHANG Lingyun. Digital Information Resource Filtering and Deduplication Method Based on GRNN Algorithm[J]. Journal of Jilin University (Information Science Edition), 2025, 43(4): 844-850.
URL: http://xuebao.jlu.edu.cn/xxb/EN/Y2025/V43/I4/844