Journal of Jilin University (Engineering and Technology Edition), 2012, Vol. 42, Issue (01): 234-239.


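Automatic annotation for medical texts based on hidden topic and semantic tree

LI Bo1, WEN Dun-wei2, WANG Ke1, LIU Jing-xin3

  1. School of Communication Engineering, Jilin University, Changchun 130012, China;
    2. School of Computing and Information Systems, Athabasca University, Athabasca T9S3A3, Canada;
    3. China-Japan Union Hospital, Jilin University, Changchun 130033, China
  • Received: 2011-06-27 Online: 2012-01-01 Published: 2012-01-01
  • About the first author: LI Bo (1985-), male, PhD candidate. Research interests: image processing. E-mail: barbarianasia@163.com
  • Funding: National Natural Science Foundation of China (60372062).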



Abstract:

Medical texts lack a quantifiable data structure, so text processing methods based on the keyword model are not applicable to them. To address this problem, and building on a study of the latent semantic associations between words and of the tree structure of keywords, a semantic analysis model based on a latent semantic tree was constructed for mining medical text data. The study of hidden topics was then linked with that of latent semantics, and a text processing method combining latent Dirichlet allocation (LDA) with the latent semantic tree model was designed. It generates reasonably readable automatic annotations for different types of medical texts. The annotations it produces have low subjectivity, and both their accuracy and their readability are higher than those obtained with the keyword model. The method can help doctors annotate and classify medical texts, thereby reducing their workload. Results from the implemented program show that the method can currently be applied to forming diagnostic opinions from medical imaging findings, summarizing patient medical records, and suggesting symptomatic prescriptions from symptom descriptions. The semantic matching degree of the annotations reaches 67.7%, and the average readability of the generated texts is 60.02%.

Key words: information processing, medical texts, automatic annotation, latent Dirichlet allocation, latent semantic analysis, semantic tree
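The abstract names latent Dirichlet allocation (LDA) as the hidden-topic component but gives no implementation details. For orientation only, the following is a minimal, generic sketch of LDA fitted by collapsed Gibbs sampling, a standard inference method for this model. It is not the authors' implementation and does not model the latent semantic tree. The function names `lda_gibbs` and `top_words`, the symmetric priors `alpha` and `beta` with their default values, and the toy documents are all illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit a basic LDA model by collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists. Returns (vocab, doc_topic, topic_word), where
    doc_topic[d][k] estimates P(topic k | document d) and topic_word[k][w]
    estimates P(word vocab[w] | topic k). alpha and beta are assumed
    symmetric Dirichlet priors.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # tokens in document d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # occurrences of word w assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the token from the counts before resampling its topic.
                n_dk[d][k] -= 1
                n_kw[k][wi] -= 1
                n_k[k] -= 1
                # Full conditional: p(z = t | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wi] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wi] += 1
                n_k[k] += 1

    # Smoothed point estimates of the document-topic and topic-word distributions.
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n most probable words of each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=row.__getitem__, reverse=True)[:n]]
        for row in topic_word
    ]


if __name__ == "__main__":
    # Hypothetical toy corpus: imaging findings vs. prescriptions.
    docs = [
        ["lung", "nodule", "ct", "shadow", "nodule"],
        ["ct", "shadow", "lung", "opacity"],
        ["fever", "cough", "antibiotic", "dose"],
        ["antibiotic", "dose", "fever", "tablet"],
    ]
    vocab, doc_topic, topic_word = lda_gibbs(docs, num_topics=2, iterations=100, seed=1)
    for k, words in enumerate(top_words(vocab, topic_word)):
        print("topic", k, words)
```

Each sweep resamples every token's topic from its full conditional distribution, so after enough sweeps the count tables approximate samples from the posterior. The per-topic top words and per-document topic proportions are the kind of hidden-topic signal the abstract refers to; how the paper combines them with the latent semantic tree to produce readable annotations is not described in the abstract and is not reproduced here.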

CLC number: TN919.8


