J4 ›› 2011, Vol. 29 ›› Issue (5): 473-476.

• 论文 • 上一篇    下一篇

基于改进哈夫曼编码的全文索引结构压缩算法

阚君满   

  1. 吉林工商学院 计算机科学系|长春 130062
  • 出版日期:2011-09-24 发布日期:2011-11-29
  • 作者简介:阚君满(1973—)|男|长春人| 吉林工商学院副研究员|主要从事计算智能研究|(Tel)86-13159612001(E-mail)kanjm@163.com
  • 基金资助:

    吉林省教育厅科技发展规划基金资助项目(2008158)

Compressed Format Full-Text Index Based on Improved Huffman Code and Its Implement

 KAN Jun-Man   

  1. Department of Computer Science,Jilin Technology and Business College,Changchun 130062,China
  • Online:2011-09-24 Published:2011-11-29

摘要:

为解决全文索引的索引结构压缩问题,提出了文本的基于正规哈夫曼编码小波树形式,并将该结构与后缀数组结合,实现了基于正规哈夫曼编码的小波树和高效构造算法。实验结果表明,在不降低运行效率的前提下,存储空间得到有效的压缩,从而证明了改进方法的有效性。

关键词: 全文索引, 压缩, 正规哈夫曼编码

Abstract:

To solve the problem of index structures compression of full-text indexes,we introduce the canonical Huffman code to encode the BWT(Burrows-Wheeler Transform) of a text. In the end, we present an efficient construction algorithm for this index, which is on-line and linear.Experimental results show that, without reducing the efficiency, the effective storage space compression is gained, which improves the effectiveness of the method.

Key words: full-text indexes, compressed, canonical Huffman code

中图分类号: 

  • TP391