
• Computer Science

Solving the Cyclic Link Problem in Web Page Crawling with the Directed Graph Method

HE Feng-ling (赫枫龄), ZUO Wan-li (左万利)

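  1. College of Computer Science and Technology, Jilin University, Changchun 130012
  • Received: 2003-12-17 Revised: 1900-01-01 Publication date: 2004-07-26 Release date: 2004-07-26
  • Corresponding author: HE Feng-ling (赫枫龄)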

Solve the cycle links problem in Internet crawling by directed graph

HE Feng-ling, ZUO Wan-Li   

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received:2003-12-17 Revised:1900-01-01 Online:2004-07-26 Published:2004-07-26
  • Contact: HE Feng-ling

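Abstract (translated from the Chinese): The problem of directed loops formed by web pages is posed, a formal definition of the directed graph formed by web pages is described, and an algorithm that uses the directed graph method to discover directed loops formed by web pages is given. The algorithm keeps a web crawler from falling into directed-loop traps formed by pages it has already crawled.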

Key words (translated from the Chinese): crawler, web search engine, hyperlink, directed graph

Abstract: This paper addresses the problem of cyclic links in Internet crawling by means of a directed graph. First, the problem is stated. Then, a formal definition of cyclic links in Internet crawling is given. Finally, an algorithm that solves the problem with a directed graph is presented. The key problem for a crawler is how to find directed loops effectively among the web pages it has crawled. The algorithm described in this paper enables the crawler to avoid falling into the traps created by cyclic links.

Key words: crawler, internet search engine, hyperlink, directed graph

CLC number: 

  • TP393.09
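
The abstract names the technique, but this page does not reproduce the algorithm itself. The following is a minimal sketch of one common way to detect directed loops while crawling, and it should not be read as the authors' method. It assumes pages are nodes and hyperlinks are directed edges, and it treats a new link src -> dst as closing a loop when src is already reachable from dst. The names `PageGraph`, `add_link`, `crawl`, and the in-memory `links` table (standing in for real page fetching) are hypothetical.

```python
from collections import defaultdict, deque


class PageGraph:
    """Directed graph: nodes are page URLs, edges are hyperlinks seen so far."""

    def __init__(self):
        self.edges = defaultdict(set)

    def reaches(self, start, target):
        """Return True if target is reachable from start along recorded links."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def add_link(self, src, dst):
        """Record src -> dst; return True if this link closes a directed loop."""
        closes_loop = self.reaches(dst, src)
        self.edges[src].add(dst)
        return closes_loop


def crawl(seed, links):
    """Breadth-first crawl from seed.

    `links` is a hypothetical url -> list-of-outlinks table used instead of
    fetching pages. Returns the visit order and the links that closed a loop.
    """
    graph, order, loops = PageGraph(), [], []
    frontier, visited = deque([seed]), set()
    while frontier:
        page = frontier.popleft()
        if page in visited:
            continue
        visited.add(page)
        order.append(page)
        for target in links.get(page, []):
            if graph.add_link(page, target):
                loops.append((page, target))  # loop found: do not follow
            elif target not in visited:
                frontier.append(target)
    return order, loops
```

For example, with `links = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}`, calling `crawl("a", links)` visits each page once and reports the link from `"c"` back to `"a"` as the one that closes a loop. The reachability check costs a graph traversal per link, which is acceptable for a sketch but would need a cheaper scheme at real crawl scale.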