J4 ›› 2009, Vol. 27 ›› Issue (04): 396-.
JI Xiang, LIU Hua-xiao, WU Fen-fen, LIU Lei
Abstract:
An improved HMM (hidden Markov model) is proposed for text information extraction. The method uses the semantic and structural characteristics of the text to determine the states that have distinctive characteristics; the remaining states, which have no such characteristics, are then extracted with the improved HMM. This addresses the problem of low recall and precision in information extraction. We tested the method on 100 headers of computer science papers, taken from the data provided by the search-engine research group at CMU (Carnegie Mellon University), USA. The results show that both recall and precision improve over existing methods based on words and on the traditional HMM. Recall and precision are 91.99% and 94.79%, respectively.
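The two-stage idea in the abstract can be illustrated with a minimal sketch: Viterbi decoding in which some positions are pinned to a state decided beforehand (for example by a structural or semantic rule), while the HMM labels the rest. This is not the authors' implementation. The state names, the observation symbols, all probabilities, the `fixed` parameter, and the probability floor are assumptions made for illustration only.

```python
import math

def constrained_viterbi(obs, states, start_p, trans_p, emit_p, fixed=None):
    """Return the most likely state path for obs.

    fixed maps a position to a state already decided by a rule
    (hypothetical stand-in for the feature-based step); the HMM
    only chooses states for the remaining positions.
    """
    fixed = fixed or {}
    floor = 1e-12  # assumed smoothing for zero or unseen probabilities

    def log(p):
        return math.log(max(p, floor))

    def allowed(t):
        # A pinned position admits exactly one state.
        return [fixed[t]] if t in fixed else states

    # best[t][s] = (log-probability of best path ending in s at t, previous state)
    best = [{
        s: (log(start_p.get(s, 0.0)) + log(emit_p[s].get(obs[0], 0.0)), None)
        for s in allowed(0)
    }]
    for t in range(1, len(obs)):
        column = {}
        for s in allowed(t):
            score, prev = max(
                (best[t - 1][p][0] + log(trans_p[p].get(s, 0.0)), p)
                for p in best[t - 1]
            )
            column[s] = (score + log(emit_p[s].get(obs[t], 0.0)), prev)
        best.append(column)

    # Backtrack from the best final state.
    last = max(best[-1], key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        last = best[t][last][1]
        path.append(last)
    path.reverse()
    return path

# Toy model for paper headers (all values are illustrative assumptions).
STATES = ["title", "author", "affiliation"]
START_P = {"title": 0.8, "author": 0.1, "affiliation": 0.1}
TRANS_P = {
    "title": {"title": 0.3, "author": 0.6, "affiliation": 0.1},
    "author": {"title": 0.05, "author": 0.35, "affiliation": 0.6},
    "affiliation": {"title": 0.05, "author": 0.25, "affiliation": 0.7},
}
EMIT_P = {
    "title": {"words": 0.7, "names": 0.2, "org": 0.1},
    "author": {"words": 0.1, "names": 0.8, "org": 0.1},
    "affiliation": {"words": 0.2, "names": 0.1, "org": 0.7},
}

if __name__ == "__main__":
    blocks = ["words", "names", "org"]
    print(constrained_viterbi(blocks, STATES, START_P, TRANS_P, EMIT_P))
    # Pin the second block to a state chosen by a (hypothetical) rule.
    print(constrained_viterbi(blocks, STATES, START_P, TRANS_P, EMIT_P,
                              fixed={1: "affiliation"}))
```

Pinning a position shrinks the search to paths consistent with the rule-based decision, which is one plausible reading of how fixing the states "with characteristic" could help the HMM label the remaining blocks.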
Key words: text block, character extraction, machine learning, hidden Markov models (HMM)
JI Xiang, LIU Hua-xiao, WU Fen-fen, LIU Lei. Information Extraction Based on Character Extraction and HMM[J]. J4, 2009, 27(04): 396-.
URL: http://xuebao.jlu.edu.cn/xxb/EN/
http://xuebao.jlu.edu.cn/xxb/EN/Y2009/V27/I04/396