J4 ›› 2010, Vol. 48 ›› Issue (05): 811-816.


A Semantic Similarity Computing Approach Based on WordNet and Corpus Statistics

ZHANG Dongna, ZHOU Chunguang, LIU Yanbin, GUO Dongwei   

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received:2009-12-15 Online:2010-09-26 Published:2010-09-21
  • Contact: GUO Dongwei E-mail:guodw@jlu.edu.cn

Abstract:

We first propose a new method for calculating information content, a parameter used in computing semantic similarity. The new algorithm is based on the semantic information of concepts in the WordNet knowledge base and on their probability in a corpus, expressed as self-information. Then, considering that existing algorithms are all domain-related and that their calculation processes are complicated, we propose a universal method for calculating semantic similarity based on corpus statistics and WordNet, which can be used in information extraction, information retrieval, document clustering and ontology learning. In experiments on the benchmark data set of R&B concept pairs, the proposed method yields a substantial improvement.
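As a rough illustration of the self-information notion mentioned above, the following sketch computes the information content of a concept as the negative logarithm of its corpus probability. It is not the authors' algorithm, whose details are not given in this abstract; the function name, the base-2 logarithm, and the toy counts are all assumptions made for illustration only.

```python
import math


def self_information(count, total):
    """Return -log2(count / total): the self-information, in bits,
    of a concept observed `count` times among `total` observations."""
    if count <= 0 or total <= 0 or count > total:
        raise ValueError("need 0 < count <= total")
    return -math.log2(count / total)


# Toy, made-up frequencies (NOT taken from the Brown corpus).
# Assumption: a concept's count includes the counts of its descendants,
# so the hypothetical root concept "entity" covers every observation.
toy_counts = {"entity": 1000, "vehicle": 50, "car": 10}
total = toy_counts["entity"]

# Information content of each concept under these toy counts.
ic = {concept: self_information(n, total) for concept, n in toy_counts.items()}
```

Under these assumptions, more specific (rarer) concepts receive higher information content, while the root concept, which covers every observation, receives none.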

Key words: semantic similarity of concepts, Brown corpus, information content method

CLC Number: 

  • TP391.1