J4 ›› 2012, Vol. 50 ›› Issue (02): 320-322.

• Computer Science

A Transcription Factor Text Mining Algorithm Based on a Hidden Markov Model

吴晓洲1, 万里明2, 韩霄松1, 梁艳春1, 吴春国1,3   

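  1. College of Computer Science and Technology, Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, Jilin University, Changchun 130012; 2. Research Institute on General Development and Evaluation of Equipment, EAAF of PLA, Beijing 100076; 3. School of Business, University of Shanghai for Science and Technology, Shanghai 200093
  • Received:2011-12-29 Online:2012-03-26 Published:2012-03-21
  • Corresponding author: WU Chunguo E-mail:wucg@jlu.edu.cn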

An HMM Based Transcription Factor Name Mining Algorithm

WU Xiaozhou1, WAN Liming2, HAN Xiaosong1, LIANG Yanchun1, WU Chunguo1,3   

  1. College of Computer Science and Technology, Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, Jilin University, Changchun 130012, China; 2. Research Institute on General Development and Evaluation of Equipment, EAAF of PLA, Beijing 100076, China; 3. School of Business, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received:2011-12-29 Online:2012-03-26 Published:2012-03-21
  • Contact: WU Chunguo E-mail:wucg@jlu.edu.cn

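Abstract:

A hidden Markov model based text mining algorithm for transcription factors (HMM-TFM) is proposed. The method does not require a dictionary of transcription factor names. It uses a predicate (verb) filtering strategy to decide whether a sentence describes a transcription factor, predicts the part of speech of each word with a hidden Markov model, and identifies transcription factor names from the parts of speech of the surrounding words. Experimental results show that, on English-language literature, HMM-TFM extracts transcription factor names with recall and precision of up to 74.2% and 77.9%, respectively.

Key words: hidden Markov model; transcription factor; text mining; promoter; bioinformatics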
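
For readers unfamiliar with the two evaluation figures quoted in the abstract, the following is a minimal sketch of how recall and precision are conventionally computed for a name extraction task. The function name `precision_recall` and the example name sets are hypothetical and chosen only for illustration; they are not taken from the paper and do not reproduce its results.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for extracted names against a reference set.

    precision = correct extractions / all extractions
    recall    = correct extractions / all reference names
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: three extracted names, four reference names.
extracted = {"GATA1", "SP1", "promoter"}
reference = {"GATA1", "SP1", "NF-kB", "p53"}
p, r = precision_recall(extracted, reference)
print(f"precision={p:.3f} recall={r:.3f}")
```

In this toy case two of the three extracted names are correct and two of the four reference names are found, so precision and recall differ, which is why the abstract reports them separately.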

Abstract:

A text mining algorithm named HMMTFM (hidden Markov model based transcription factor name mining) is presented. The proposed algorithm does not need a dictionary of transcription factor names. A small verb set is defined to filter sentences, and transcription factor names are then mined according to the parts of speech assigned by a hidden Markov model. Experimental results show that the recall and precision of HMMTFM reach 74.2% and 77.9%, respectively.

Key words: hidden Markov model, transcription factor, text mining, promoter, bioinformatics
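
The three steps named in the abstract (verb-based sentence filtering, part-of-speech tagging with a hidden Markov model, and name selection from the tags of neighbouring words) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the verb set `FILTER_VERBS`, the three-tag toy model (`START`, `TRANS`, `EMIT`), the smoothing constant `UNSEEN`, and the context rule "a noun immediately before a filter verb" are hypothetical and are not the parameters or rules used in the paper. The tagger is a standard Viterbi decoder.

```python
import math

# Hypothetical verb set; the abstract does not list the actual verbs.
FILTER_VERBS = {"binds", "activates", "represses", "regulates"}

# Toy HMM with made-up probabilities (assumption, for illustration only).
STATES = ("NOUN", "VERB", "DET")
START = {"NOUN": 0.5, "VERB": 0.1, "DET": 0.4}
TRANS = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.6, "DET": 0.1},
    "VERB": {"NOUN": 0.4, "VERB": 0.1, "DET": 0.5},
    "DET": {"NOUN": 0.9, "VERB": 0.05, "DET": 0.05},
}
EMIT = {
    "NOUN": {"gata1": 0.3, "promoter": 0.3, "gene": 0.3},
    "VERB": {"binds": 0.5, "activates": 0.4},
    "DET": {"the": 0.9},
}
UNSEEN = 1e-6  # small emission probability for words missing from EMIT


def mentions_filter_verb(tokens):
    """Step 1: keep only sentences containing a verb from the filter set."""
    return any(t.lower() in FILTER_VERBS for t in tokens)


def viterbi(tokens):
    """Step 2: most likely tag sequence under the toy HMM (log space)."""
    words = [t.lower() for t in tokens]
    score = {s: math.log(START[s]) + math.log(EMIT[s].get(words[0], UNSEEN))
             for s in STATES}
    backpointers = []
    for w in words[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = (prev[best] + math.log(TRANS[best][s])
                        + math.log(EMIT[s].get(w, UNSEEN)))
            ptr[s] = best
        backpointers.append(ptr)
    path = [max(STATES, key=lambda s: score[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def candidate_names(tokens):
    """Step 3: hypothetical rule, a noun directly followed by a filter verb."""
    if not mentions_filter_verb(tokens):
        return []
    tags = viterbi(tokens)
    return [tokens[i] for i in range(len(tokens) - 1)
            if tags[i] == "NOUN" and tags[i + 1] == "VERB"
            and tokens[i + 1].lower() in FILTER_VERBS]


print(candidate_names(["GATA1", "binds", "the", "promoter"]))
```

Because the selection relies on tags and a verb list instead of a name lexicon, a sketch of this shape needs no dictionary of transcription factor names, which matches the property the abstract emphasises. A real system would estimate the HMM parameters from a tagged corpus.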

CLC Number:

  • TP18