Journal of Jilin University (Engineering and Technology Edition), 2023, Vol. 53, Issue (12): 3529-3535. doi: 10.13229/j.cnki.jdxbgxb.20221137
• Computer Science and Technology •
Yue-kun MA1,2,3,4, Yi-feng HAO1
Abstract:
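First, suitable features are selected by filtering out punctuation and are used to build feature vectors; the text is segmented so that units of two or more words that together express a specific meaning are kept intact, each word is tagged with its part of speech, and the corresponding word classes are identified. Second, a latent Dirichlet allocation (LDA) model is used to associate topics with documents, determine the topic of each document, and reduce the sparsity of the data features. Third, the proposed bidirectional long short-term memory network with conditional random field (BR-BiLSTM-CRF) model uses a bidirectional LSTM to detect the boundaries of named entities in the text and combines these boundaries with the entity types output by a linear-chain CRF layer; adding lexical and word-class features allows entity boundaries to be detected over the entire text sequence. Finally, the network parameters are updated with a cross-entropy loss and gradient descent until the error no longer exceeds a specified value, completing the recognition of named entities in the text. Experimental results show that the proposed method is fast, accurate, and performs well overall: it identifies the parts of speech in text more reliably and improves both the accuracy and the efficiency of named entity recognition.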
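The abstract does not state how the LDA model is fitted. As a minimal sketch, assuming collapsed Gibbs sampling (one standard inference method for LDA, not necessarily the authors' choice), the following Python function estimates per-document topic proportions (`theta`) and per-topic word distributions (`phi`) from documents that have already been mapped to integer word IDs. The function name `lda_gibbs` and the hyperparameters `alpha`, `beta`, and `iters` are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of documents, each a list of integer word IDs in [0, vocab_size).
    Returns (theta, phi): per-document topic proportions and
    per-topic word distributions.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Assign every token a random initial topic and record it in the counts.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed point estimates from the final counts.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    return theta, phi
```

For example, `lda_gibbs([[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]], num_topics=2, vocab_size=6)` returns one topic-proportion row per document, and every row of `theta` and `phi` sums to 1. A dense vector such as `theta[d]` has only `num_topics` entries, which is how a topic model can stand in for a sparse, vocabulary-sized bag-of-words vector.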
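In a BiLSTM-CRF tagger, the CRF layer scores a whole tag sequence as the sum of per-position emission scores (produced by the BiLSTM) and tag-to-tag transition scores, and the best sequence is decoded with the Viterbi algorithm. The sketch below illustrates only that decoding step, not the BR-BiLSTM-CRF model itself. The BIO tag set, the hand-written emission scores standing in for BiLSTM outputs, and the transition values are hypothetical, and start/end transition scores are omitted for brevity.

```python
def viterbi_decode(emissions, transitions):
    """Return (best tag path, its score) for a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. a BiLSTM output).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(transitions)
    # score[j]: best score of any partial path ending in tag j.
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag i for reaching tag j at position t.
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best_last = max(range(num_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


# Hypothetical example: three tokens, tags O / B-PER / I-PER.
TAGS = ["O", "B-PER", "I-PER"]
emissions = [
    [0.1, 2.0, 0.5],  # token 1 looks like the start of a person name
    [0.3, 0.2, 1.5],  # token 2 looks like a continuation
    [1.8, 0.1, 0.4],  # token 3 looks like a non-entity
]
transitions = [
    [0.5, 0.2, -1e4],  # from O: O -> I-PER is effectively forbidden
    [0.0, -0.5, 1.0],  # from B-PER
    [0.2, 0.0, 0.5],   # from I-PER
]
path, best = viterbi_decode(emissions, transitions)
print([TAGS[i] for i in path])  # prints ['B-PER', 'I-PER', 'O']
```

The large negative transition score for `O -> I-PER` is how a CRF layer can rule out tag sequences that a per-token classifier might otherwise produce, which is the sense in which the CRF enforces consistent entity boundaries across the whole sequence.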
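The final step, updating parameters with cross-entropy and gradient descent until the error no longer exceeds a specified value, can be sketched on a much smaller model. The code below assumes a plain softmax classifier over fixed feature vectors as a stand-in for the full network; the names `train_softmax` and `predict`, the learning rate `lr`, the threshold `tol`, and the `max_epochs` safeguard are all hypothetical.

```python
import math


def softmax(z):
    # Subtract the max logit for numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(len(W))]


def train_softmax(X, y, num_classes, lr=0.5, tol=0.05, max_epochs=10000):
    """Full-batch gradient descent on mean cross-entropy.

    Stops as soon as the mean loss is <= tol, or after max_epochs updates.
    Returns (W, b, final_loss, number_of_updates).
    """
    n, dim = len(X), len(X[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    loss = float("inf")
    for epoch in range(max_epochs):
        gW = [[0.0] * dim for _ in range(num_classes)]
        gb = [0.0] * num_classes
        loss = 0.0
        for x, label in zip(X, y):
            p = softmax(logits(W, b, x))
            loss -= math.log(p[label])
            # d(cross-entropy)/d(logit_c) = p_c - 1[c == label]
            for c in range(num_classes):
                g = p[c] - (1.0 if c == label else 0.0)
                gb[c] += g
                for j in range(dim):
                    gW[c][j] += g * x[j]
        loss /= n
        if loss <= tol:  # error no longer exceeds the specified value
            return W, b, loss, epoch
        # Gradient-descent update using the mean gradient.
        for c in range(num_classes):
            b[c] -= lr * gb[c] / n
            for j in range(dim):
                W[c][j] -= lr * gW[c][j] / n
    return W, b, loss, max_epochs


def predict(W, b, x):
    scores = logits(W, b, x)
    return max(range(len(scores)), key=lambda c: scores[c])
```

On a toy, linearly separable set such as `X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]` with labels `[0, 0, 1, 1]`, training stops once the mean loss reaches `tol`. The `max_epochs` cap is a design safeguard: on data the model cannot fit, a pure "until the error is small enough" rule would otherwise never terminate.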