
• Computer Science

Importance of Text about Table Elements Used in Focused Crawling

HUANG Fengyun, WANG Hui, ZUO Wanli   

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received: 2006-07-06  Online: 2007-05-26  Published: 2007-05-26
  • Contact: WANG Hui

Abstract: In this paper, experiments are conducted to analyze and verify the importance of the table elements in a Web page. Compared with the page as a whole, table elements carry most (more than eighty percent) of the information that is relevant to a user's information need. This conclusion can be used when parsing Web pages for focused crawling: once all elements other than tables and headers are removed, the efficiency of a focused crawler improves markedly.
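The abstract does not give the crawler's parsing or scoring code. The sketch below is a minimal illustration of the idea under stated assumptions: "headers" is taken to mean the `<title>` and `<h1>` to `<h6>` elements, and page relevance is taken to be the cosine similarity between TF-IDF vectors of a topic description and the retained text, as the key words suggest. The function names (`extract_kept_text`, `tfidf_vectors`, `relevance`) are hypothetical, and the smoothed idf formula is a common variant chosen here, not necessarily the one the authors used.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: "headers" = <title> and <h1>..<h6>; the paper may define it differently.
KEPT_TAGS = {"table", "title", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED_TAGS = {"script", "style"}


class TableHeaderExtractor(HTMLParser):
    """Collects only the text that appears inside table or header elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many kept elements are currently open
        self.skip = 0    # how many script/style elements are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in KEPT_TAGS:
            self.depth += 1
        elif tag in SKIPPED_TAGS:
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in KEPT_TAGS and self.depth > 0:
            self.depth -= 1
        elif tag in SKIPPED_TAGS and self.skip > 0:
            self.skip -= 1

    def handle_data(self, data):
        # Text outside tables/headers (ads, menus, footers) is discarded.
        if self.depth > 0 and self.skip == 0:
            self.chunks.append(data)


def extract_kept_text(html):
    parser = TableHeaderExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """One sparse TF-IDF dict per token list; smoothed idf = log((1+N)/(1+df)) + 1."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def relevance(topic, pages_html):
    """Score each page by similarity between the topic and its table/header text."""
    docs = [tokenize(topic)] + [tokenize(extract_kept_text(h)) for h in pages_html]
    vecs = tfidf_vectors(docs)
    return [cosine(vecs[0], v) for v in vecs[1:]]
```

A crawler built this way would call `relevance` on newly fetched pages and enqueue the out-links of high-scoring pages first. Because only table and header text is tokenized, a page whose topical words appear solely in, say, a sidebar paragraph scores zero, which is the trade-off the abstract's eighty-percent finding is meant to justify.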

Key words: focused crawling, URL, TFIDF, similarity

CLC Number: TP31