Journal of Jilin University (Engineering and Technology Edition), 2014, Vol. 44, Issue 6: 1843-1848. doi: 10.13229/j.cnki.jdxbgxb201406047


Named entity recognition in Chinese medical records based on cascaded conditional random field

YAN Yang1, 2, WEN Dun-wei3, WANG Yun-ji1, WANG Ke1   

  1. College of Communication Engineering, Jilin University, Changchun 130012, China;
  2. College of Computer Science and Engineering, Changchun Normal University of Technology, Changchun 130032, China;
  3. School of Computing and Information Systems, Athabasca University, Athabasca, Alberta T9S3A3, Canada
  • Received:2013-08-12 Online:2014-11-01 Published:2014-11-01

Abstract: A new method for named entity recognition in Chinese medical records based on cascaded Conditional Random Fields (CRFs) is proposed. The first layer of the cascaded CRFs identifies the basic named entities of body parts and diseases. Its output is then fed to the second layer, which recognizes the nested named entities of complex diseases and clinical symptoms. A new combination feature, composed of part-of-speech features and named entity features, is defined. This feature, together with the character features, word boundary features and context features of a sentence, forms the feature set of the second layer. In experiments based on CRF++, the proposed method yields a 3% higher F-score than a cascaded CRF without the combination feature and a 7% higher F-score than a single-layer CRF, a significant increase in overall performance.
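To make the second-layer feature set concrete, the following is a minimal sketch of how a combination feature could be assembled from a part-of-speech tag and a first-layer entity label, and written out as whitespace-separated rows with the answer label in the last column, the layout CRF++ reads. The tag names (BODY, SYM), the B/I/E/S word-boundary scheme, the column order, the "_" separator and all function names are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    char: str      # one Chinese character
    boundary: str  # word-boundary tag (hypothetical B/I/E/S scheme)
    pos: str       # part-of-speech tag of the word containing the character
    layer1: str    # entity label predicted by the first-layer CRF
    gold: str      # gold label for the second layer (nested entity)


def combination_feature(pos: str, layer1: str) -> str:
    """Fuse the POS tag and the first-layer entity label into one symbol."""
    return f"{pos}_{layer1}"


def to_crfpp_rows(sentence: List[Token]) -> List[str]:
    """Render one sentence as tab-separated rows, answer label last.

    Context features are not stored in the rows; in CRF++ they would be
    expressed in a separate feature template that refers to neighbouring rows.
    """
    return [
        "\t".join([
            t.char,
            t.boundary,
            t.pos,
            t.layer1,
            combination_feature(t.pos, t.layer1),
            t.gold,
        ])
        for t in sentence
    ]


# Hypothetical example: a body part recognised by layer 1 sits inside a
# symptom mention that layer 2 is meant to label.
sentence = [
    Token(char="胃", boundary="S", pos="n", layer1="B-BODY", gold="B-SYM"),
    Token(char="痛", boundary="S", pos="a", layer1="O", gold="I-SYM"),
]
rows = to_crfpp_rows(sentence)
print("\n".join(rows))
```

In a real pipeline the layer1 column would come from decoding with the trained first-layer model rather than being typed in by hand.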

Key words: information processing, conditional random field, cascaded conditional random field, Chinese medical records, named entity recognition

CLC Number: TP391