Journal of Jilin University (Engineering and Technology Edition) ›› 2025, Vol. 55 ›› Issue (11): 3705-3714. DOI: 10.13229/j.cnki.jdxbgxb.20240237

• Computer Science and Technology •

Semantic similarity model based on augmented positives and interlayer negatives

Xiao-dong CAI, Ye-yang HUANG, Li-fang DONG

  1. School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
  • Received: 2024-03-09 Online: 2025-11-01 Published: 2026-02-03
  • About the first author: CAI Xiao-dong (1971-), male, professor, Ph.D. Research interests: artificial intelligence and natural language processing. E-mail: caixiaodong@guet.edu.cn
  • Supported by:
    Guangxi Innovation-Driven Development Special Project (AA20302001)



Abstract:

In contrastive-learning-based semantic similarity models, insufficient information exchange between different positive example sentences and the scarcity of hard negatives under traditional negative sampling strategies make it difficult for the model to capture subtle feature differences between sentences, and hence to measure textual similarity accurately. This paper proposes a semantic similarity method based on augmented positives and interlayer negatives. It fuses information from different positives through a dynamic neighborhood mechanism and introduces a hard negative generation method, significantly improving the relevance of semantic similarity judgments. First, sentence embeddings whose semantic features are similar to a positive example are retrieved from the dynamic neighborhood, concatenated with the positive, and aggregated through self-attention to obtain an augmented positive, thereby fusing information from different positives. Second, a hard negative generation method is proposed that takes sentence representations from the model's intermediate layers as hard negatives for the original positives and penalizes them with a cross-entropy loss, improving the negative sampling strategy. Experimental results on the semantic similarity datasets STS2012-STS2016, STS-B, and SICK-R show that the proposed method is effective: its Spearman correlation coefficients exceed those of state-of-the-art models by an average of 1.09 and 0.34 percentage points on BERT-base and BERT-large, respectively.

Key words: deep learning, contrastive learning, semantic similarity, BERT
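
To make the augmented-positive step above concrete, here is a minimal PyTorch sketch, not the authors' implementation: retrieve the embeddings most similar to a positive from a neighborhood buffer, concatenate them with the positive, and aggregate with self-attention. All names (build_augmented_positive, neighborhood_buffer, n_neighbors) and the attention configuration are illustrative assumptions; N in Table 4 below plausibly corresponds to the neighborhood size.

import torch
import torch.nn.functional as F

def build_augmented_positive(positive, neighborhood_buffer, attention, n_neighbors=16):
    """positive: (d,) embedding; neighborhood_buffer: (B, d) candidate embeddings."""
    # Retrieve the n_neighbors most cosine-similar embeddings from the buffer.
    sims = F.cosine_similarity(positive.unsqueeze(0), neighborhood_buffer, dim=-1)
    idx = sims.topk(min(n_neighbors, neighborhood_buffer.size(0))).indices
    neighbors = neighborhood_buffer[idx]                        # (n_neighbors, d)
    # Concatenate the positive with its retrieved neighbors.
    seq = torch.cat([positive.unsqueeze(0), neighbors], dim=0)  # (n_neighbors+1, d)
    # Aggregate with self-attention and keep the updated positive slot.
    out, _ = attention(seq.unsqueeze(0), seq.unsqueeze(0), seq.unsqueeze(0))
    return out[0, 0]                                            # augmented positive, (d,)

dim = 768
attn = torch.nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)
buffer = torch.randn(1024, dim)   # stand-in for the dynamic neighborhood
aug_pos = build_augmented_positive(torch.randn(dim), buffer, attn, n_neighbors=16)
print(aug_pos.shape)              # torch.Size([768])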

CLC number:

  • TP391.1

Fig.1

Overall framework of the APINCSE model

Table 1

Statistics of the datasets

Dataset        | STS2012 | STS2013 | STS2014 | STS2015 | STS2016 | STS-B | SICK-R
Sentence pairs | 3 108   | 2 250   | 3 750   | 7 500   | 9 183   | 8 628 | 9 927

Table 2

Examples from the datasets

Sentence A | Sentence B | Score
A girl in white is dancing. | A girl is wearing white clothes and is dancing. | 4.9
A woman is riding a horse. | A man is opening a small package that contains headphones. | 1.0
Three boys in karate costumes aren't fighting. | Three boys in karate costumes are fighting. | 3.3

Table 3

Comparison of experimental results across models (%)

Architecture | Model | STS2012 | STS2013 | STS2014 | STS2015 | STS2016 | STS-B | SICK-R | Avg.
BERT-base | SG-OPT | 66.84 | 80.13 | 71.23 | 81.56 | 77.17 | 77.23 | 68.16 | 74.62
BERT-base | SimCSE | 68.40 | 82.41 | 74.38 | 80.91 | 78.56 | 76.85 | 72.23 | 76.25
BERT-base | LAPCSE | 70.27 | 80.22 | 75.65 | 80.71 | 79.74 | 79.51 | 72.18 | 76.90
BERT-base | ClusterNS | 69.93 | 83.57 | 76.00 | 82.44 | 80.01 | 78.85 | 72.03 | 77.55
BERT-base | APINCSE | 71.43±0.14 | 84.22±0.11 | 76.47±0.21 | 83.90±0.13 | 79.42±0.31 | 80.87±0.18 | 74.19±0.19 | 78.64±0.09
BERT-large | SG-OPT | 67.02 | 79.42 | 70.38 | 81.72 | 76.35 | 76.16 | 70.20 | 74.46
BERT-large | SimCSE | 70.88 | 84.16 | 76.43 | 84.50 | 79.76 | 79.26 | 73.88 | 78.41
BERT-large | LAPCSE | 69.26 | 83.34 | 74.64 | 83.56 | 78.45 | 78.36 | 74.45 | 77.44
BERT-large | ClusterNS | 71.64 | 85.97 | 77.74 | 83.48 | 79.68 | 80.80 | 75.02 | 79.19
BERT-large | APINCSE | 73.83±0.12 | 85.28±0.23 | 77.46±0.18 | 85.65±0.32 | 79.85±0.30 | 81.12±0.21 | 73.55±0.17 | 79.53±0.28
RoBERTa-base | SG-OPT | 62.57 | 78.96 | 69.24 | 79.99 | 77.17 | 77.60 | 68.42 | 73.42
RoBERTa-base | SimCSE | 70.16 | 81.77 | 73.24 | 81.36 | 80.65 | 80.22 | 68.56 | 76.57
RoBERTa-base | LAPCSE | 68.96 | 78.83 | 75.37 | 81.05 | 81.53 | 80.99 | 69.03 | 76.54
RoBERTa-base | ClusterNS | 71.17 | 83.53 | 75.29 | 82.47 | 82.25 | 81.95 | 69.22 | 77.98
RoBERTa-base | APINCSE | 72.16±0.32 | 82.76±0.13 | 75.56±0.15 | 83.45±0.20 | 81.28±0.26 | 81.41±0.23 | 70.05±0.41 | 77.95±0.17
RoBERTa-large | SG-OPT | 64.29 | 76.36 | 68.48 | 80.10 | 76.60 | 78.14 | 67.97 | 73.13
RoBERTa-large | SimCSE | 72.86 | 83.99 | 75.62 | 84.77 | 81.80 | 81.98 | 71.26 | 78.90
RoBERTa-large | LAPCSE | 71.52 | 79.86 | 76.86 | 83.50 | 82.38 | 84.56 | 71.46 | 78.59
RoBERTa-large | ClusterNS | - | - | - | - | - | - | - | -
RoBERTa-large | APINCSE | 71.95±0.29 | 83.85±0.18 | 75.85±0.05 | 84.98±0.14 | 82.21±0.28 | 82.58±0.11 | 71.43±0.10 | 78.98±0.20
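
For context, the scores in Table 3 follow the standard STS protocol: the Spearman rank correlation between predicted cosine similarities and human ratings, reported in percent. A minimal sketch (NumPy/SciPy; sts_spearman is an illustrative name, not the authors' evaluation script):

import numpy as np
from scipy.stats import spearmanr

def sts_spearman(emb_a, emb_b, gold_scores):
    """emb_a, emb_b: (n, d) sentence embeddings; gold_scores: (n,) human ratings."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    pred = (a * b).sum(axis=1)                     # cosine similarity per pair
    # Spearman rank correlation against the gold scores, in percent.
    return 100.0 * spearmanr(pred, gold_scores).correlation

rng = np.random.default_rng(0)
print(sts_spearman(rng.normal(size=(100, 768)),
                   rng.normal(size=(100, 768)),
                   rng.uniform(0, 5, size=100)))   # near 0 for random embeddings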

Table 4

Effect of the value of N on model performance (Spearman correlation coefficient, %)

N  | STS2012 | STS2013 | STS2014 | STS2015 | STS2016 | STS-B | SICK-R | Avg.
4  | 69.41 | 83.10 | 75.61 | 83.26 | 79.61 | 78.97 | 70.92 | 77.27
8  | 71.41 | 83.83 | 76.16 | 83.31 | 78.56 | 80.37 | 72.52 | 78.02
12 | 67.15 | 80.25 | 72.91 | 80.41 | 76.70 | 77.01 | 70.70 | 75.02
16 | 71.43 | 84.22 | 76.47 | 83.90 | 79.42 | 80.87 | 74.19 | 78.64
20 | 70.55 | 82.06 | 74.97 | 83.36 | 78.64 | 79.13 | 73.00 | 77.39
32 | 70.11 | 82.99 | 76.04 | 82.53 | 79.61 | 80.10 | 71.98 | 77.61

Fig.2

Effectiveness analysis of key components of the APINCSE model

Fig.3

Similarity of sentence representation vectors between different layers
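
The layer-wise comparison behind Fig. 3 can be reproduced in outline with a few lines of Python (assuming the HuggingFace transformers library, which the page does not name): mean-pool every hidden layer into a sentence vector and measure its cosine similarity to the last layer.

import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

batch = tok(["A girl in white is dancing."], return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).hidden_states        # tuple: embedding layer + 12 layers

mask = batch["attention_mask"].unsqueeze(-1)     # ignore padding when pooling
sent = [(h * mask).sum(1) / mask.sum(1) for h in hidden]
for m, vec in enumerate(sent):
    sim = torch.cosine_similarity(vec, sent[-1]).item()
    print(f"layer {m:2d} vs last layer: cosine = {sim:.3f}")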

Table 5

Effect of the value of M on model performance (Spearman correlation coefficient, %)

M  | STS2012 | STS2013 | STS2014 | STS2015 | STS2016 | STS-B | SICK-R | Avg.
1  | 67.91 | 81.60 | 72.97 | 80.62 | 75.72 | 75.47 | 71.16 | 75.06
2  | 70.72 | 82.03 | 75.37 | 82.26 | 79.06 | 79.33 | 68.62 | 76.77
3  | 72.06 | 83.21 | 76.07 | 82.00 | 78.96 | 79.76 | 71.33 | 77.63
4  | 70.49 | 82.52 | 75.23 | 81.88 | 79.05 | 79.65 | 67.58 | 76.63
5  | 70.39 | 82.15 | 74.12 | 82.40 | 78.00 | 78.23 | 71.12 | 76.63
6  | 69.77 | 81.03 | 74.96 | 81.98 | 79.26 | 78.27 | 71.25 | 76.65
7  | 70.43 | 83.03 | 75.27 | 82.08 | 78.71 | 78.57 | 71.04 | 77.02
8  | 70.87 | 82.27 | 75.20 | 82.31 | 77.71 | 78.07 | 71.23 | 76.81
9  | 70.94 | 81.96 | 75.36 | 81.98 | 79.32 | 79.69 | 69.11 | 76.91
10 | 69.80 | 80.69 | 74.71 | 83.53 | 76.95 | 78.23 | 71.82 | 76.53
11 | 71.43 | 84.22 | 76.47 | 83.90 | 79.42 | 80.87 | 74.19 | 78.64
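
Given Fig. 3 and the ablation above, M appears to index the intermediate layer whose sentence representations act as hard negatives. A hedged sketch of how such interlayer negatives can enter a SimCSE-style contrastive objective with the cross-entropy penalty mentioned in the abstract follows; the function name, temperature, and normalization are illustrative assumptions, not the paper's exact loss.

import torch
import torch.nn.functional as F

def infonce_with_interlayer_negatives(h_anchor, h_positive, h_layer_m, tau=0.05):
    """h_*: (B, d) sentence representations from one batch."""
    a = F.normalize(h_anchor, dim=-1)
    p = F.normalize(h_positive, dim=-1)
    n = F.normalize(h_layer_m, dim=-1)              # intermediate-layer negatives
    sim_pos = a @ p.T / tau                         # (B, B); diagonal = true positives
    sim_neg = (a * n).sum(-1, keepdim=True) / tau   # (B, 1); interlayer hard negative
    logits = torch.cat([sim_pos, sim_neg], dim=1)   # append hard-negative column
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)          # cross-entropy penalty

B, d = 32, 768
loss = infonce_with_interlayer_negatives(
    torch.randn(B, d, requires_grad=True), torch.randn(B, d), torch.randn(B, d))
print(loss.item())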

Fig.4

Effect of the value of L on model performance

Table 6

Statistics of the MRPC and SUBJ datasets

Dataset          | MRPC  | SUBJ
Positive samples | 4 076 | 5 000
Negative samples | 1 725 | 5 000

Table 7

Experimental results on the MRPC and SUBJ datasets (%)

Model               | MRPC  | SUBJ
BERT                | 71.13 | 94.21
SimCSE              | 74.43 | 94.45
APINCSE (BERT-base) | 74.80 | 94.81
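
Tables 6 and 7 evaluate the learned embeddings on binary classification transfer tasks. A common protocol for such tables is SentEval-style logistic regression on frozen sentence embeddings; this is an assumption here, since the page does not spell out the setup, and a pair task like MRPC would additionally need pair features (e.g., concatenated and elementwise-product embeddings).

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def transfer_accuracy(embeddings, labels):
    """embeddings: (n, d) frozen sentence vectors; labels: (n,) binary labels."""
    clf = LogisticRegression(max_iter=1000)
    # 10-fold cross-validated accuracy, reported in percent as in Table 7.
    return 100.0 * cross_val_score(clf, embeddings, labels, cv=10).mean()

rng = np.random.default_rng(0)
print(transfer_accuracy(rng.normal(size=(200, 768)),
                        rng.integers(0, 2, size=200)))  # near chance for random input
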
References

[1] Cer D, Diab M, Agirre E, et al. SemEval-2017 task 1: semantic textual similarity multilingual and crosslingual focused evaluation[C]∥Proceedings of the 11th International Workshop on Semantic Evaluation. Stroudsburg, PA: ACL, 2017: 1-14.
[2] Radford A, Narasimhan K. Improving language understanding by generative pre-training[EB/OL]. (2018-06-11)[2023-12-11].
[3] Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]∥Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2019: 4171-4186.
[4] Liu Y H, Ott M, Goyal N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB/OL]. (2019-07-26)[2023-12-11].
[5] Yang Z L, Dai Z H, Yang Y M, et al. XLNet: generalized autoregressive pretraining for language understanding[C]∥Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: NeurIPS, 2019: 5753-5763.
[6] Li B H, Zhou H, et al. On the sentence embeddings from pre-trained language models[C]∥Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 9119-9130.
[7] Reimers N, Gurevych I. Sentence-BERT: sentence embeddings using siamese BERT-networks[C]∥Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2019: 3982-3992.
[8] Gao T Y, Yao X C, Chen D Q. SimCSE: simple contrastive learning of sentence embeddings[C]∥Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2021: 6894-6910.
[9] Wu X, Gao C C, Zang L J, et al. ESimCSE: enhanced sample building method for contrastive learning of unsupervised sentence embedding[C]∥Proceedings of the 29th International Conference on Computational Linguistics. New York: ACM Press, 2022: 3898-3907.
[10] Zhang Y H, Zhu H J, Wang Y L, et al. A contrastive framework for learning sentence representations from pairwise and triple-wise perspective in angular space[C]∥Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2022: 4892-4903.
[11] Chuang Y S, Dangovski R, Luo H Y, et al. DiffCSE: difference-based contrastive learning for sentence embeddings[C]∥Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: ACL, 2022: 4207-4218.
[12] Liu J D, Liu J H, Wang Q F, et al. RankCSE: unsupervised sentence representations learning via learning to rank[C]∥Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2023: 13785-13802.
[13] Wang H, Li Y G, Huang Z, et al. SNCSE: contrastive learning for unsupervised sentence embedding with soft negative samples[C]∥International Conference on Intelligent Computing. New York, USA: ICIC, 2023: 419-431.
[14] He H L, Zhang J L, Lan Z Z, et al. Instance smoothed contrastive learning for unsupervised sentence embedding[C]∥Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence. Washington, DC: AAAI Press, 2023: 12863-12871.
[15] Robinson J, Chuang C Y, Sra S, et al. Contrastive learning with hard negative samples[C]∥9th International Conference on Learning Representations. Virtual Event, 2021.
[16] Wu X, Gao C C, Su Y P, et al. Smoothed contrastive learning for unsupervised sentence embedding[C]∥Proceedings of the 29th International Conference on Computational Linguistics. New York, USA: ICCL, 2022: 4902-4906.
[17] Kim T, Yoo K M, Lee S G. Self-guided contrastive learning for BERT sentence representations[C]∥Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg, PA: ACL, 2021: 2528-2540.
[18] Oh D, Kim Y J, Lee H D, et al. Don't judge a language model by its last layer: contrastive learning with layer-wise attention pooling[C]∥Proceedings of the 29th International Conference on Computational Linguistics. New York, USA: ICCL, 2022: 4585-4592.
[19] Deng J H, Wan F Q, Yang T, et al. Clustering-aware negative sampling for unsupervised sentence representation[C]∥Findings of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2023: 8713-8729.