J4 ›› 2012, Vol. 50 ›› Issue (06): 1199-1203.

• Computer Science •

An Improved Web Structure Similarity Based on Matching Algorithm of Tree Paths

LIAO Haowei, YANG Yan, JIA Zhen, YIN Hongfeng   

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China
  • Received:2012-05-21 Online:2012-11-26 Published:2012-11-26
  • Contact: YANG Yan E-mail:yyang@swjtu.edu.cn

Abstract:

An improved algorithm for computing the structural similarity of Web pages based on tree path matching is proposed. The algorithm defines the sequence similarity and the position similarity of tree paths, extracts the set of tree paths of each Web page, and calculates the structural similarity from the best tree path matching between two pages. Experimental results show that the structural similarity computed by the improved algorithm is more realistic, reasonable, and effective than that of the traditional tree path matching method.

Key words: Web structure similarity, sequence similarity, position similarity
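
The abstract names the ingredients of the method but not its formulas, so the following Python sketch illustrates only the general idea and is not the authors' definition. It assumes that a tree path is the root-to-leaf tag sequence of the parsed HTML, that sequence similarity can be approximated with `difflib.SequenceMatcher`, that position similarity compares the relative order of paths in each page, and that the two are combined with a hypothetical weight `alpha`. All function names are illustrative.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# HTML elements that never have a closing tag; each is treated as a leaf.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Collect root-to-leaf tag paths in document order."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags
        self.has_child = []  # parallel to stack: has this element a child?
        self.paths = []      # finished root-to-leaf paths

    def handle_starttag(self, tag, attrs):
        if self.has_child:
            self.has_child[-1] = True
        if tag in VOID_TAGS:
            self.paths.append(tuple(self.stack + [tag]))
            return
        self.stack.append(tag)
        self.has_child.append(False)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # ignore stray closing tags
        # Pop down to the matching tag, tolerating unclosed elements.
        while self.stack:
            top = self.stack[-1]
            if not self.has_child[-1]:
                self.paths.append(tuple(self.stack))
            self.stack.pop()
            self.has_child.pop()
            if top == tag:
                break


def tree_paths(html):
    """Return the list of root-to-leaf tag paths of an HTML string."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def sequence_similarity(p, q):
    """Assumed measure: difflib ratio over the two tag sequences."""
    return SequenceMatcher(None, p, q).ratio()


def position_similarity(i, n, j, m):
    """Assumed measure: closeness of relative positions i/(n-1), j/(m-1)."""
    ri = i / (n - 1) if n > 1 else 0.0
    rj = j / (m - 1) if m > 1 else 0.0
    return 1.0 - abs(ri - rj)


def directed_similarity(a, b, alpha):
    """Average, over paths of a, of the best-matching path score in b."""
    if not a or not b:
        return 0.0
    total = 0.0
    for i, p in enumerate(a):
        total += max(
            alpha * sequence_similarity(p, q)
            + (1 - alpha) * position_similarity(i, len(a), j, len(b))
            for j, q in enumerate(b)
        )
    return total / len(a)


def structure_similarity(html_a, html_b, alpha=0.5):
    """Symmetric structural similarity in [0, 1] (alpha is hypothetical)."""
    a, b = tree_paths(html_a), tree_paths(html_b)
    return 0.5 * (directed_similarity(a, b, alpha)
                  + directed_similarity(b, a, alpha))
```

Calling `structure_similarity(page_a, page_b)` on two HTML strings returns a score between 0 and 1, with identical pages scoring 1. The greedy per-path maximum used here is only one way to realize "best tree path matching"; a one-to-one assignment between paths would be a stricter alternative.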

CLC Number:

  • TP391