Journal of Jilin University (Engineering and Technology Edition) ›› 2024, Vol. 54 ›› Issue (1): 232-239. doi: 10.13229/j.cnki.jdxbgxb.20220239

• Computer Science and Technology •


Long text semantic matching model based on BERT and dense composite network

Yue-lin CHEN1, Zhu-cheng GAO1, Xiao-dong CAI2

1. School of Mechanical and Electrical Engineering, Guilin University of Electronic Technology, Guilin 541000, China
  2. School of Information and Communication, Guilin University of Electronic Technology, Guilin 541000, China
  • Received:2022-03-12 Online:2024-01-30 Published:2024-03-28
• Contact: Xiao-dong CAI E-mail: 370883566@qq.com; caixiaodong@guet.edu.com
  • About the first author: Yue-lin CHEN (1963-), male, professor. Research interests: natural language processing. E-mail: 370883566@qq.com
  • Supported by: Guangxi Innovation-Driven Development Special Project (Guike AA20302001)


Abstract:

In long text semantic matching, the sequential dependencies between word vectors are difficult to capture and the topic information may not be unique, which often leads to poor matching performance. This paper proposes a long text semantic matching method based on BERT and a dense composite network; through the dense connection of BERT embeddings and the composite network, the accuracy of long text semantic matching is significantly improved. First, the sentence pair is fed into the BERT pre-trained model, and accurate word vector representations are obtained through iterative feedback, yielding high-quality semantic information for the sentence pair. Second, a dense composite network is designed: a bidirectional long short-term memory network (Bi-LSTM) first captures the global semantics of the sentence pair, TextCNN then extracts and integrates local semantics to obtain the key features of each sentence and the correspondences between the two sentences, and the hidden outputs of BERT and Bi-LSTM are fused with the pooled output of TextCNN. Finally, the association states among the networks are aggregated during training, which effectively prevents network degradation and strengthens the model's discriminative ability. Experimental results show that on the community question answering (CQA) long text datasets, the proposed method achieves an average improvement of 45%.

Key words: deep learning, long text semantic matching, BERT, dense composite network, Bi-LSTM, TextCNN

CLC number: TP391.1

Fig.1 Overall structure of the model
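The pipeline described in the abstract and in Fig.1 can be made concrete with a short sketch. The following is a minimal reading of BERT-DCN, not the authors' released implementation: the hidden sizes, kernel sizes, and the fusion by concatenating the BERT [CLS] state, the final Bi-LSTM state, and the max-pooled TextCNN features are all assumptions.

import torch
import torch.nn as nn
from transformers import AutoModel

class BertDCN(nn.Module):
    # Minimal sketch of the BERT + dense composite network described above;
    # hyper-parameters and the exact fusion are assumptions, not the paper's code.
    def __init__(self, bert_name="bert-base-uncased", lstm_hidden=256,
                 cnn_channels=128, kernel_sizes=(2, 3, 4), num_classes=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        h = self.bert.config.hidden_size                    # 768 for BERT-base
        # Bi-LSTM captures the global semantics of the sentence pair
        self.bilstm = nn.LSTM(h, lstm_hidden, batch_first=True, bidirectional=True)
        # TextCNN extracts local n-gram features from the Bi-LSTM states
        self.convs = nn.ModuleList(
            nn.Conv1d(2 * lstm_hidden, cnn_channels, k) for k in kernel_sizes)
        fused = h + 2 * lstm_hidden + cnn_channels * len(kernel_sizes)
        self.classifier = nn.Linear(fused, num_classes)     # match / no match

    def forward(self, input_ids, attention_mask, token_type_ids=None):
        seq = self.bert(input_ids=input_ids, attention_mask=attention_mask,
                        token_type_ids=token_type_ids).last_hidden_state
        lstm_seq, _ = self.bilstm(seq)                      # (B, T, 2*lstm_hidden)
        cnn_in = lstm_seq.transpose(1, 2)                   # Conv1d expects (B, C, T)
        pooled = [torch.relu(c(cnn_in)).max(dim=2).values for c in self.convs]
        # Composite fusion: BERT [CLS] state + final Bi-LSTM state
        # + max-pooled TextCNN features, concatenated for classification
        feats = torch.cat([seq[:, 0], lstm_seq[:, -1]] + pooled, dim=1)
        return self.classifier(feats)

A sentence pair would be encoded jointly, e.g. AutoTokenizer.from_pretrained(bert_name)(s1, s2, return_tensors="pt", truncation=True), and the resulting tensors passed to forward; an English BERT-base checkpoint suits the CQA experiments, and a Chinese one the Chinese-SNLI and LCQMC experiments.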

Table 1 Samples from the CQA dataset

Sentence 1: playing basketball I am basketball player; who lives in umm ghuwlina (doha). I want to play basketball. if people play basketball can contact me to arrange games
Sentence 2: New to Doha Hi folks, I'm moving to Doha on the 7th January, just looking for a bit of advice on what sort of things I could do for entertainment while I'm there. I'm interested in trying out new sports and golf, also wondered where the best places to go to watch the English football, also anywhere that would be likely to show Scotland internationals and the 6 nations rugby? Cheers
Label: 0

Sentence 1: playing basketball I am basketball player; who lives in umm ghuwlina (doha). i want to play basketball. if people play basketball can contact me to arrange games
Sentence 2: Basketball Ok so we're talked about this before but didn't seem to find a place to play. Good news guys! I found a court to play on :-) It's in Education City, anyone who doesn't know where that is no worries I'll give u directions or we can meet up elsewhere and u can follow me. Anyways, I just need to know who's in so I can book it cause it can get really busy soon so I need to do this in advance...so who's in? P.S. don't ask what I'm doing up at this hour! I don't even know!
Label: 1

Table 2 CQA dataset parameters

Dataset         SemEval-A  SemEval-B  SemEval-C
Training set    20 340     3 169      31 690
Validation set  3 720      700        7 000
Test set        2 930      880        8 800

Fig.2 Training results on task A

Fig.3 Training results on task B

Fig.4 Training results on task C

Table 3 Comparison of experimental results on the CQA dataset

Model      F1 (SemEval-A)  F1 (SemEval-B)  F1 (SemEval-C)
Ref. [1]   -               0.506           -
Ref. [2]   0.777           -               -
Ref. [4]   -               -               0.197
Ref. [7]   -               0.433           -
Ref. [12]  0.768           0.524           0.273
BERT-DCN   0.750           0.587           0.612
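For reference, the F1 values reported in Tables 3, 4 and 7 can be computed from model predictions with scikit-learn; the labels below are purely illustrative:

from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0]   # hypothetical gold labels for one subtask
y_pred = [1, 0, 0, 1, 0]   # hypothetical model predictions
print(f"F1 = {f1_score(y_true, y_pred):.3f}")   # binary F1; prints 0.800 here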

Table 4 Ablation results

Model      F1
BERT-DCN   0.627
-Bi-LSTM   0.620
-TextCNN   0.593
-DCN       0.610
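The "-X" rows above denote removing one component from the full model and re-measuring F1. A hypothetical way to organize such runs (flag names are illustrative, not taken from the paper):

from dataclasses import dataclass

@dataclass
class AblationConfig:
    use_bilstm: bool = True    # "-Bi-LSTM": skip the Bi-LSTM, feed BERT states to TextCNN
    use_textcnn: bool = True   # "-TextCNN": classify on the BERT and Bi-LSTM states only
    dense_fusion: bool = True  # "-DCN": drop the dense connections, keep final features only

VARIANTS = {
    "BERT-DCN": AblationConfig(),
    "-Bi-LSTM": AblationConfig(use_bilstm=False),
    "-TextCNN": AblationConfig(use_textcnn=False),
    "-DCN":     AblationConfig(dense_fusion=False),
}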

Table 5 A flawed sample from the CQA dataset

Sentence 1: Mixed marriages with the world turning into a small village; mixed marriages have become more and more common. If you are in a mixed marriage relationship and living in Doha; Could you provide details on how you and your significant other met? where? how long have you been married for? and what are the goods and bads in a mixed marriage? Thank you
Sentence 2: Mix marriages do you think that 2 from 2 different cultures shall have a good marriage i mean will it last; (believes; customs; even language) not the same! i'm not talking about 2 from 2 different countries in Europe; or 2 in Asia; or 2 in Africa; ... i'm talking about the one's where completely no similarity?
Label: 1

Table 6 Data after cleaning

Dataset (modified/total samples)  SemEval-A   SemEval-B  SemEval-C
Training set                      382/20 340  339/3 169  630/31 690
Validation set                    129/3 720   32/700     180/7 000
Test set                          91/2 930    24/880     55/8 800

Table 7 Comparison of results after sample modification

Data           F1 (SemEval-A)  F1 (SemEval-B)  F1 (SemEval-C)
Original data  0.750           0.587           0.612
Modified data  0.768           0.607           0.622

Table 8 Chinese-SNLI and LCQMC dataset parameters

Dataset         Chinese-SNLI  LCQMC
Training set    545 859       238 766
Validation set  9 314         8 802
Test set        9 176         12 500

Table 9 Experimental results on the Chinese-SNLI dataset

Model                   Acc/%
Embed+add-attention*    75.1
BiLSTM+self-attention*  81.0
DiSAN*                  81.5
BERT*                   87.0
BERT-DCN                87.2

Table 10 Experimental results on the LCQMC dataset

Model     Acc/%
CNN*      72.8
CBOW*     73.7
BiMPM*    83.4
BERT*     87.4
BERT-DCN  87.7
1 Filice Simone, Da San Martino Giovanni, Moschitti Alessandro, et al. SemEval-2017 task 3: learning pairwise patterns in community question answering[C]∥Proceedings of the 11th International Workshop on Semantic Evaluation, Vancouver, Canada, 2017: 326-333.
2 Wu Guo-shun, Sheng Yi-xuan, Lan Man, et al. Using traditional and deep learning methods to address community question answering task[C]∥Proceedings of the 11th International Workshop on Semantic Evaluation, Vancouver, Canada, 2017: 365-369.
3 Feng Wen-zheng, Wu Yu, Wu Wei, et al. Ranking system with neural matching features for community question answering[C]∥Proceedings of the 11th International Workshop on Semantic Evaluation, Vancouver, Canada, 2017: 280-286.
4 Koreeda Yuta, Hashito Takuya, Niwa Yoshiki, et al. Combination of neural similarity features and comment plausibility features[C]∥Proceedings of the 11th International Workshop on Semantic Evaluation, Vancouver, Canada, 2017: 353-359.
5 Wang Zhi-guo, Hamza Wael, Florian Radu. Bilateral multi-perspective matching for natural language sentences[C]∥Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 2017: 4144-4150.
6 Tan Chuan-qi, Wei Fu-ru, Wang Wen-hui, et al. Multiway attention networks for modeling sentence pairs[C]∥Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 2018: 4411-4417.
7 Deriu Jan Milan, Cieliebak Mark. Attention-based convolutional neural network for community question answering[C]∥Proceedings of the 11th International Workshop on Semantic Evaluation, Vancouver, Canada, 2017: 334-338.
8 Devlin Jacob, Chang Ming-wei, Lee Kenton, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]∥Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, USA, 2019: 4171-4186.
9 Chen Yuan, Qiu Xin-ying. Multi-task semantic matching with self-supervised learning[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2022, 58(1): 83-90.
10 Reimers Nils, Gurevych Iryna. Sentence-BERT: sentence embeddings using Siamese BERT-networks[C]∥Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 2019: 3982-3992.
11 Li Bo-han, Zhou Hao, He Jun-xian, et al. On the sentence embeddings from pre-trained language models[C]∥Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Online, 2020: 9119-9130.
12 Peinelt N, Nguyen D, Liakata M. tBERT: topic models and BERT joining forces for semantic similarity detection[C]∥Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020: 7047-7055.
13 Chen Han-jie, Zheng Guang-tao, Ji Yang-feng. Generating hierarchical explanations on text classification via feature interaction detection[C]∥Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020: 5578-5593.
14 Tsang M, Cheng D, Liu H, et al. Feature interaction interpretability: a case for explaining ad-recommendation systems via neural interaction detection[J/OL]. [2020-12-11].
15 Gao J, He D, Tan X, et al. Representation degeneration problem in training natural language generation models[J/OL]. [2019-12-10].
16 Ethayarajh K. How contextual are contextualized word representations?[C]∥Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 2019: 55-65.
17 Yan Yuan-meng, Li Ru-mei, Wang Si-rui, et al. ConSERT: a contrastive framework for self-supervised sentence representation transfer[C]∥Association for Computational Linguistics and International Joint Conference on Natural Language Processing, Online, 2021: 5065-5075.
18 Schick T, Schütze H. Generating datasets with pretrained language models[J/OL]. [2021-10-21].
19 Chen Han-jie, Song Feng, Ganhotra Jatin, et al. Explaining neural network predictions on sentence pairs via learning word-group masks[C]∥Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics, Mexico City, Mexico, 2021: 3917-3930.
20 Balazs Jorge A, Matsuo Y. Gating mechanisms for combining character and word-level word representations: an empirical study[C]∥Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, USA, 2019: 110-124.
21 Choi H, Kim J, Joe S, et al. Evaluation of BERT and ALBERT sentence embedding performance on downstream NLP tasks[C]∥The 25th International Conference on Pattern Recognition, Online, 2021: 5482-5487.