Fast recognition method of short text named entities considering feature sparsity

doi:10.13229/j.cnki.jdxbgxb.20221137

Abstract

The proposed method first preprocesses the text: it selects appropriate features by filtering punctuation marks, constructs vectors, groups two or more words into units with specific semantics through word segmentation, and labels parts of speech to identify the relevant word classes. It then uses the latent Dirichlet allocation (LDA) model to establish correlations between topics and documents, clarify document topics, and reduce data feature sparsity. The bidirectional long short-term memory-conditional random field (BR-BiLSTM-CRF) model detects named entity boundaries with a bidirectional LSTM and combines them with the entity types output by the chain conditional random field layer. After vocabulary and part-of-speech features are added, the entity boundaries across the whole text sequence are detected. The network parameters are corrected using cross entropy and gradient descent until the error does not exceed a specified value, which completes text named entity recognition. Experiments show that the proposed method recognizes entities quickly, with high accuracy and strong overall performance. The method helps computers recognize language, clarifies the parts of speech in text, and improves the accuracy and efficiency of named entity recognition.
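The stopping rule described in the abstract (correct the parameters with cross entropy and gradient descent until the error does not exceed a specified value) can be illustrated with a minimal sketch. It uses a toy one-feature logistic model rather than the paper's BR-BiLSTM-CRF network. The data, learning rate `lr`, threshold `tol`, and all function names are assumptions chosen for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(w: float, b: float, xs, ys) -> float:
    """Mean binary cross-entropy of a one-feature logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


def train(xs, ys, lr=0.5, tol=0.1, max_iters=1000):
    """Gradient descent that stops once the loss is at or below `tol`."""
    w, b = 0.0, 0.0
    for step in range(max_iters):
        loss = cross_entropy(w, b, xs, ys)
        if loss <= tol:  # error within the specified value: stop training
            return w, b, loss, step
        # Gradients of the mean cross-entropy with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / len(xs)
        grad_b = sum(errors) / len(xs)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, cross_entropy(w, b, xs, ys), max_iters


# Toy, linearly separable data (hypothetical, not from the paper).
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b, loss, steps = train(xs, ys)
print(f"stopped after {steps} steps with loss {loss:.3f}")
```

In a real network the same loop structure applies, with the loss computed over the tagged sequences and the gradients obtained by backpropagation.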

Key words: natural language processing, feature sparsity, short text naming, fast recognition of short text entities, text preprocessing, characteristic weight

CLC Number: TP391.1

Yue-kun MA, Yi-feng HAO. Fast recognition method of short text named entities considering feature sparsity[J]. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(12): 3529-3535.

Figures/Tables 8

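Name	Symbols	Name	Symbols
Unit symbols	￥＄℃￠￡	Exclamation mark	！！
Percent and per-mille signs	%、‰	Left bracket	（（｛［【《<
Dash	——	Right bracket	））｝］】》>
Ellipsis	…...	Left quotation mark	“‘
Colon	：	Right quotation mark	”’
Enumeration comma	、	Full stop	。
Semicolon	；；	Question mark	？？
Comma	，，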
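As a rough illustration of the punctuation-filtering step, the marks in the table above can be collected into a set and removed character by character. This is a minimal sketch, not the authors' implementation. The set contents, the half-width variants added to it, and the function name `strip_punctuation` are assumptions. The half-width period is deliberately left out so that decimal numbers survive.

```python
# Punctuation marks taken from the table, plus assumed half-width variants.
PUNCTUATION = set(
    "￥＄℃￠￡%‰…：、；，！？。（）｛｝［］【】《》<>“”‘’"
    "()!;,?:"
) | {"\u2014"}  # \u2014 is the dash character used in the table


def strip_punctuation(text: str) -> str:
    """Return `text` with every listed punctuation mark removed."""
    return "".join(ch for ch in text if ch not in PUNCTUATION)


print(strip_punctuation("吉林大学（长春）发布了新方法！"))
# prints: 吉林大学长春发布了新方法
```

A set lookup keeps the filter linear in the text length, which matters when many short texts are processed.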

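Function-word part of speech	Abbreviation	Content-word part of speech	Abbreviation
Preposition	p	Verb	v
Adverb	d	Noun	n
Particle	u	Adjective	a
Interjection	e	Measure word	q
Modal particle	o	Numeral	m
Conjunction	c	Pronoun	r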
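One possible use of these abbreviations is to separate content words from function words after part-of-speech labeling. The sketch below assumes the text has already been segmented and tagged by some external tool. The sample sentence, its tags, and the helper name `content_words` are hypothetical.

```python
# Tag abbreviations from the table above.
FUNCTION_TAGS = {
    "p": "preposition", "d": "adverb", "u": "particle",
    "e": "interjection", "o": "modal particle", "c": "conjunction",
}
CONTENT_TAGS = {
    "v": "verb", "n": "noun", "a": "adjective",
    "q": "measure word", "m": "numeral", "r": "pronoun",
}


def content_words(tagged):
    """Keep only the words whose tag marks a content word."""
    return [word for word, tag in tagged if tag in CONTENT_TAGS]


# Hypothetical tagged sentence: (word, tag) pairs.
tagged = [("我", "r"), ("在", "p"), ("长春", "n"), ("工作", "v")]
print(content_words(tagged))
# prints: ['我', '长春', '工作']
```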

Dataset	Number of samples
	Person name	Place name	Organization name	Other
MSRA dataset	1521	875	479	1017
OntoNotes dataset	1042	1084	328	1108
Iris dataset	804	481	1193	1054
