J4 ›› 2010, Vol. 48 ›› Issue (1): 79-84.

• Computer Science

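An Unsupervised Text Feature Computing Model

WANG Xiaofang1, WANG Ruifang2, ZHANG Shugong1

  1. Institute of Mathematics, Jilin University, Changchun 130012; 2. College of Information Engineering, Dalian University, Dalian 116622, Liaoning
  • Received: 2009-02-27 Online: 2010-01-26 Published: 2010-01-27
  • Corresponding author: ZHANG Shugong E-mail: sgzh@mail.jlu.edu.cn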

An Effective Unsupervised Feature Computing Model

WANG Xiaofang1, WANG Ruifang2, ZHANG Shugong1   

  1. Institute of Mathematics, Jilin University, Changchun 130012, China; 2. College of Information Engineering, Dalian University, Dalian 116622, Liaoning Province, China
  • Received: 2009-02-27 Online: 2010-01-26 Published: 2010-01-27
  • Contact: ZHANG Shugong E-mail: sgzh@mail.jlu.edu.cn

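Abstract (translated from Chinese):

This paper proposes an unsupervised feature extraction method based on a semantic explicit-quantum entangling model and a latent-quantum co-occurrence model. The method addresses the lack of support for incremental and distributed computation in current text clustering, and lays a foundation for large-scale text clustering, single-text summarization, and multi-text summarization in the Internet environment. Experimental results show that the model needs no domain knowledge base and still maintains good clustering performance after about 96% of the redundant information is removed.

Key words: unsupervised, feature extraction, entangling model, window function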

Abstract:

This paper presents a new unsupervised feature extraction method based on the obvious quantum entangling model and the latent quantum co-occurrence model. It addresses the problem that current text clustering methods do not support incremental clustering or distributed computing, and it lays a foundation for text clustering in the Internet environment and for single-text and multi-text summarization. Without the support of domain knowledge, the model maintains a good clustering effect after removing about 96% of the redundant features. Theoretical analysis and numerical experiments show that the model is effective.

Key words: unsupervised, feature extraction, entangling model, window function

CLC Number:

  • TP391