J4

• Computer Science •

Realization of Focused Crawler Based on Page Segmentation

LI Xiaoya, HE Fengling, ZUO Wanli   

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received: 2006-11-17 Online: 2007-11-26 Published: 2007-11-26
  • Contact: ZUO Wanli

Abstract: General-purpose search engines currently return an excessive number of results, many of which are only weakly related to the query topic. To address this, this paper presents a technique that divides a web page into blocks and uses it to implement a focused crawler. Using this method, a prototype focused crawler, Crawler1, has been built. Experimental results indicate that Crawler1 achieves better performance: more than 55% of the pages it crawls are topic-relevant.

Key words: topicspecific search, focused crawling, relevance analysis, page segmentation

CLC Number: 

  • TP311