J4 ›› 2011, Vol. 49 ›› Issue (03): 487-492.

• Computer Science

Classification of Deep Web Based on Model Matching

GUO Dongwei (郭东伟), LI Sanyi (李三义), ZHANG Zhongming (张仲明), LIU Miao (刘淼)

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received: 2010-01-24 Online: 2011-05-26 Published: 2011-06-15
  • Contact: ZHANG Zhongming E-mail: zhangzm@jlu.edu.cn

Abstract:

This paper proposes a model-matching-based method for extracting features from the query interfaces of online specialized databases on the Deep Web. The method automatically extracts the feature vector of a query interface by analyzing the depth at which feature words occur in the web page structure. Weights in the feature-word vector space are defined by taking both frequency and concentration into account, and the number of feature words is added to the traditional vector model as a new component. The resulting vector is used to construct a model of the database query interface, which is then classified by model matching. Experimental results validate the effectiveness of the method.

Key words: Deep Web, data integration, model matching
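
The abstract does not give the weighting or matching formulas, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes that a feature word's weight is its relative frequency multiplied by a per-word concentration score, that the extra component is the number of distinct feature words scaled by the vocabulary size, and that model matching is a cosine comparison against one model vector per class. The function names, the concentration values, and the sample categories are all hypothetical.

```python
import math
from collections import Counter


def interface_vector(words, concentration):
    """Build a weighted vector for one query interface.

    words: feature words extracted from the interface (repeats allowed).
    concentration: hypothetical per-word score in [0, 1]; its keys also
        define the vocabulary (assumed weighting, not from the paper).
    """
    vocab = sorted(concentration)
    counts = Counter(w for w in words if w in concentration)
    total = sum(counts.values()) or 1
    # Assumed weight: relative frequency times concentration.
    vec = [counts[w] / total * concentration[w] for w in vocab]
    # Extra component: number of distinct feature words, scaled to [0, 1].
    vec.append(len(counts) / len(vocab))
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(vec, models):
    """Return the label of the class model most similar to vec."""
    return max(models, key=lambda label: cosine(vec, models[label]))


# Hypothetical concentration scores and class models for illustration.
concentration = {
    "author": 0.9, "isbn": 1.0, "price": 0.4,
    "departure": 1.0, "airline": 0.9,
}
models = {
    "books": interface_vector(["author", "isbn", "price"], concentration),
    "flights": interface_vector(["departure", "airline", "price"], concentration),
}
query = interface_vector(["author", "author", "isbn"], concentration)
label = classify(query, models)  # "books" for this sample data
```

In this sketch the count component is scaled by the vocabulary size so that it does not dominate the weighted word components; a different scaling would be equally plausible given what the abstract states.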

CLC Number:

  • TP391.1