Information Extraction Based on Features and HMM

J4 ›› 2009, Vol. 27 ›› Issue (04): 396-.

Information Extraction Based on Character Extraction and HMM

JI Xiang,LIU Hua-xiao,WU Fen-fen,LIU Lei

College of Computer Science and Technology,Jilin University,Changchun 130012,China

Online:2009-07-20 Published:2009-08-27

Abstract

An improved hidden Markov model (HMM) is proposed for text information extraction. The method uses the semantic and structural features of the text to determine the states that have distinguishing features, and then extracts the remaining states, which have no such features, with the improved HMM. This addresses the problem of low recall and precision in information extraction. We tested the method on 100 computer science paper headers from the data provided by the search-engine research group at Carnegie Mellon University (CMU), USA. The results show that both recall and precision improve over existing methods based on words and the traditional HMM, reaching a recall of 91.99% and a precision of 94.79%.
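The abstract does not give the details of the improved HMM, so the following is only a minimal sketch of the general idea it describes: states that a feature rule can determine are pinned in advance, and ordinary Viterbi decoding fills in the rest. The function name `constrained_viterbi`, the `fixed` argument, the state labels, the observation symbols, and all probabilities are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def constrained_viterbi(obs, states, start_p, trans_p, emit_p, fixed=None):
    """Return the most likely state path for obs.

    fixed maps a position to a state already determined by a feature rule
    (hypothetical input); all other positions are decoded by the HMM.
    """
    fixed = fixed or {}

    def allowed(t):
        # A pinned position admits only its pinned state.
        return [fixed[t]] if t in fixed else states

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s] = (log-probability of the best path ending in s at t, previous state)
    best = [{
        s: (log(start_p[s]) + log(emit_p[s].get(obs[0], 0.0)), None)
        for s in allowed(0)
    }]
    for t in range(1, len(obs)):
        column = {}
        for s in allowed(t):
            score, prev = max(
                (best[t - 1][p][0] + log(trans_p[p].get(s, 0.0)), p)
                for p in best[t - 1]
            )
            column[s] = (score + log(emit_p[s].get(obs[t], 0.0)), prev)
        best.append(column)

    # Backtrack from the best final state.
    last = max(best[-1], key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        last = best[t][last][1]
        path.append(last)
    return path[::-1]


# Toy parameters (illustrative only, not from the paper).
states = ["title", "author"]
start_p = {"title": 0.9, "author": 0.1}
trans_p = {
    "title": {"title": 0.4, "author": 0.6},
    "author": {"title": 0.1, "author": 0.9},
}
emit_p = {
    "title": {"word": 0.5, "name": 0.5},
    "author": {"word": 0.5, "name": 0.5},
}
obs = ["word", "word", "name"]

free_path = constrained_viterbi(obs, states, start_p, trans_p, emit_p)
pinned_path = constrained_viterbi(obs, states, start_p, trans_p, emit_p,
                                  fixed={1: "title"})
print(free_path)    # ['title', 'author', 'author']
print(pinned_path)  # ['title', 'title', 'author']
```

In this toy run, pinning position 1 to `title` changes the decoded path, which illustrates how a state fixed by a feature can override what the transition probabilities alone would prefer.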

Key words: text block, character extraction, machine learning, hidden Markov models (HMM)

JI Xiang,LIU Hua-xiao,WU Fen-fen,LIU Lei. Information Extraction Based on Character Extraction and HMM[J].J4, 2009, 27(04): 396-.
