J4

• 计算机科学 • 上一篇    下一篇

一种增量倒排索引结构的设计与实现

王 冬, 左万利, 赫枫龄, 彭 涛, 张长利   

  1. 吉林大学 计算机科学与技术学院, 长春130012; 吉林大学 符号计算与知识工程教育部重点实验室, 长春130012
  • 收稿日期:2007-01-05 修回日期:1900-01-01 出版日期:2007-11-26 发布日期:2007-11-26
  • 通讯作者: 王 冬

esign and Implementation of an IncrementalInverted Index Framework

WANG Dong, ZUO Wanli, HE Fengling, PENG Tao, ZHANG Changli   

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory ofSymbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
  • Received:2007-01-05 Revised:1900-01-01 Online:2007-11-26 Published:2007-11-26
  • Contact: WANG Dong

摘要: 针对主题爬行器获取网页更新速度快的特点, 提出一种用于网络搜索引擎的增量索引结构. 在建立倒排索引时, 每个词项的记录表以链接块的形式存放于倒排索引文件中, 每次新分配的块大小递增. 该索引结构解决了倒排索引连续存储所带来的难以更新问题. 实验结果表明, 与支持实时更新的传统链表式存储方式相比, 这种索引结构能提供更高效的检索, 采用以空间换时间的方法有效地提高了索引的更新效率.

关键词: 主题式搜索引擎, 增量倒排索引, 实时更新

Abstract: In the present paper is proposesd an incremental index structure used in web search engine in orderto deal with the high update frequency of the web pages crawled by domainspecific crawler. In the inverted index structure, the posting list of each term is partitioned into linked blocks, whose sizes form an arithmeticalseries. The incremental index structure resolves the problem of document update, which is expensive in inverted index of continuous storage, and experimental results show that it provides much higher retrieval efficiency than naive linked list structure, which also supports realtime update. The spacefortime approach effectively raises the update rate of index.

Key words: domainspecific search engine, incremental inverted index, realtime update

中图分类号: 

  • TP393.09