Journal of Jilin University (Engineering and Technology Edition) ›› 2025, Vol. 55 ›› Issue (11): 3705-3714. DOI: 10.13229/j.cnki.jdxbgxb.20240237


Semantic similarity model based on augmented positives and interlayer negatives

Xiao-dong CAI, Ye-yang HUANG, Li-fang DONG

  1. School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
  • Received: 2024-03-09  Online: 2025-11-01  Published: 2026-02-03

Abstract:

In contrastive learning based semantic similarity models, insufficient information exchange between different positive examples and the scarcity of hard negatives under conventional negative sampling strategies make it difficult to capture subtle feature differences between sentences, and therefore to measure textual similarity accurately. This paper proposes a semantic similarity method based on augmented positives and interlayer negatives. By designing a dynamic neighborhood mechanism that fuses information across different positive examples and proposing a hard negative generation method, the correlation of semantic similarity judgments is significantly improved. First, sentence embeddings whose semantic features are similar to the positive example are retrieved from the dynamic neighborhood, concatenated with the positive example, and aggregated by self-attention to obtain the augmented positive, thereby fusing information from different positive examples. Second, a hard negative generation method is proposed in which the sentence representation from a middle layer of the model serves as the original positive of the hard negative, and a cross-entropy loss is introduced as a penalty to improve the negative sampling strategy. Experimental results on the semantic similarity datasets STS2012-STS2016, STS-B, and SICK-R show that the proposed method is effective: on BERT-base and BERT-large, its Spearman correlation coefficients exceed those of advanced models by an average of 1.09 and 0.34 percentage points, respectively.
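To make the augmented-positive step described above concrete, the sketch below shows one way the retrieve-concatenate-aggregate pipeline could look in PyTorch: for each positive example, similar embeddings are pulled from a cached embedding pool (the dynamic neighborhood), concatenated with the positive, and fused by a plain scaled dot-product self-attention. This is a minimal illustration under assumed names (augment_positives, dynamic_queue, n_neighbors), not the paper's actual implementation, and the unparameterized attention stands in for whatever self-attention module the authors use.

```python
# Minimal sketch of augmented-positive construction (assumed API, not the paper's code).
import torch
import torch.nn.functional as F

def augment_positives(positives: torch.Tensor,
                      dynamic_queue: torch.Tensor,
                      n_neighbors: int = 16) -> torch.Tensor:
    """positives: (B, d) positive-example embeddings.
    dynamic_queue: (Q, d) cached sentence embeddings acting as the dynamic neighborhood."""
    # 1. Retrieve the n_neighbors most similar queue embeddings for each positive (cosine similarity).
    sims = F.normalize(positives, dim=-1) @ F.normalize(dynamic_queue, dim=-1).T   # (B, Q)
    idx = sims.topk(n_neighbors, dim=-1).indices                                   # (B, N)
    neighbors = dynamic_queue[idx]                                                 # (B, N, d)

    # 2. Concatenate each positive with its retrieved neighbors along a sequence dimension.
    seq = torch.cat([positives.unsqueeze(1), neighbors], dim=1)                    # (B, N+1, d)

    # 3. Aggregate with scaled dot-product self-attention; the augmented positive is
    #    the attention output at the original positive's position.
    attn = (seq @ seq.transpose(1, 2)) / seq.size(-1) ** 0.5                       # (B, N+1, N+1)
    fused = attn.softmax(dim=-1) @ seq                                             # (B, N+1, d)
    return fused[:, 0]                                                             # (B, d)

# Example usage with random embeddings (batch of 32, 768-dim, queue of 256 cached sentences):
# pos = torch.randn(32, 768); queue = torch.randn(256, 768)
# aug = augment_positives(pos, queue)   # -> tensor of shape (32, 768)
```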

Key words: deep learning, contrastive learning, semantic similarity, BERT

CLC Number: TP391.1

Fig.1

Overall framework of the APINCSE model

Table 1

Statistics of the relevant datasets

| Dataset | STS2012 | STS2013 | STS2014 | STS2015 | STS2016 | STS-B | SICK-R |
|---|---|---|---|---|---|---|---|
| Number of sentence pairs | 3 108 | 2 250 | 3 750 | 7 500 | 9 183 | 8 628 | 9 927 |

Table 2

Dataset examples

| Sentence A | Sentence B | Score |
|---|---|---|
| A girl in white is dancing. | A girl is wearing white clothes and is dancing. | 4.9 |
| A woman is riding a horse. | A man is opening a small package that contains headphones. | 1.0 |
| Three boys in karate costumes aren't fighting. | Three boys in karate costumes are fighting. | 3.3 |

Table 3

Comparison of experimental results of various models

| Architecture | Model | STS2012 | STS2013 | STS2014 | STS2015 | STS2016 | STS-B | SICK-R | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| BERT-base | SG-OPT | 66.84 | 80.13 | 71.23 | 81.56 | 77.17 | 77.23 | 68.16 | 74.62 |
| BERT-base | SimCSE | 68.40 | 82.41 | 74.38 | 80.91 | 78.56 | 76.85 | 72.23 | 76.25 |
| BERT-base | LAPCSE | 70.27 | 80.22 | 75.65 | 80.71 | 79.74 | 79.51 | 72.18 | 76.90 |
| BERT-base | ClusterNS | 69.93 | 83.57 | 76.00 | 82.44 | 80.01 | 78.85 | 72.03 | 77.55 |
| BERT-base | APINCSE | 71.43±0.14 | 84.22±0.11 | 76.47±0.21 | 83.90±0.13 | 79.42±0.31 | 80.87±0.18 | 74.19±0.19 | 78.64±0.09 |
| BERT-large | SG-OPT | 67.02 | 79.42 | 70.38 | 81.72 | 76.35 | 76.16 | 70.2 | 74.46 |
| BERT-large | SimCSE | 70.88 | 84.16 | 76.43 | 84.50 | 79.76 | 79.26 | 73.88 | 78.41 |
| BERT-large | LAPCSE | 69.26 | 83.34 | 74.64 | 83.56 | 78.45 | 78.36 | 74.45 | 77.44 |
| BERT-large | ClusterNS | 71.64 | 85.97 | 77.74 | 83.48 | 79.68 | 80.80 | 75.02 | 79.19 |
| BERT-large | APINCSE | 73.83±0.12 | 85.28±0.23 | 77.46±0.18 | 85.65±0.32 | 79.85±0.30 | 81.12±0.21 | 73.55±0.17 | 79.53±0.28 |
| RoBERTa-base | SG-OPT | 62.57 | 78.96 | 69.24 | 79.99 | 77.17 | 77.60 | 68.42 | 73.42 |
| RoBERTa-base | SimCSE | 70.16 | 81.77 | 73.24 | 81.36 | 80.65 | 80.22 | 68.56 | 76.57 |
| RoBERTa-base | LAPCSE | 68.96 | 78.83 | 75.37 | 81.05 | 81.53 | 80.99 | 69.03 | 76.54 |
| RoBERTa-base | ClusterNS | 71.17 | 83.53 | 75.29 | 82.47 | 82.25 | 81.95 | 69.22 | 77.98 |
| RoBERTa-base | APINCSE | 72.16±0.32 | 82.76±0.13 | 75.56±0.15 | 83.45±0.20 | 81.28±0.26 | 81.41±0.23 | 70.05±0.41 | 77.95±0.17 |
| RoBERTa-large | SG-OPT | 64.29 | 76.36 | 68.48 | 80.10 | 76.60 | 78.14 | 67.97 | 73.13 |
| RoBERTa-large | SimCSE | 72.86 | 83.99 | 75.62 | 84.77 | 81.80 | 81.98 | 71.26 | 78.90 |
| RoBERTa-large | LAPCSE | 71.52 | 79.86 | 76.86 | 83.50 | 82.38 | 84.56 | 71.46 | 78.59 |
| RoBERTa-large | ClusterNS | - | - | - | - | - | - | - | - |
| RoBERTa-large | APINCSE | 71.95±0.29 | 83.85±0.18 | 75.85±0.05 | 84.98±0.14 | 82.21±0.28 | 82.58±0.11 | 71.43±0.10 | 78.98±0.20 |

Table 4

Effect of N value on model performance (Spearman correlation coefficient)

| N | STS2012 | STS2013 | STS2014 | STS2015 | STS2016 | STS-B | SICK-R | Avg. |
|---|---|---|---|---|---|---|---|---|
| 4 | 69.41 | 83.1 | 75.61 | 83.26 | 79.61 | 78.97 | 70.92 | 77.27 |
| 8 | 71.41 | 83.83 | 76.16 | 83.31 | 78.56 | 80.37 | 72.52 | 78.02 |
| 12 | 67.15 | 80.25 | 72.91 | 80.41 | 76.7 | 77.01 | 70.7 | 75.02 |
| 16 | 71.43 | 84.22 | 76.47 | 83.9 | 79.42 | 80.87 | 74.19 | 78.64 |
| 20 | 70.55 | 82.06 | 74.97 | 83.36 | 78.64 | 79.13 | 73 | 77.39 |
| 32 | 70.11 | 82.99 | 76.04 | 82.53 | 79.61 | 80.10 | 71.98 | 77.61 |

Fig.2

Effectiveness analysis of key components of the APINCSE model

Fig.3

Similarity of sentence representation between different layers

Table 5

Effect of M value on model performance (Spearman correlation coefficient)

| M | STS2012 | STS2013 | STS2014 | STS2015 | STS2016 | STS-B | SICK-R | Avg. |
|---|---|---|---|---|---|---|---|---|
| 1 | 67.91 | 81.6 | 72.97 | 80.62 | 75.72 | 75.47 | 71.16 | 75.06 |
| 2 | 70.72 | 82.03 | 75.37 | 82.26 | 79.06 | 79.33 | 68.62 | 76.77 |
| 3 | 72.06 | 83.21 | 76.07 | 82.00 | 78.96 | 79.76 | 71.33 | 77.63 |
| 4 | 70.49 | 82.52 | 75.23 | 81.88 | 79.05 | 79.65 | 67.58 | 76.63 |
| 5 | 70.39 | 82.15 | 74.12 | 82.4 | 78.00 | 78.23 | 71.12 | 76.63 |
| 6 | 69.77 | 81.03 | 74.96 | 81.98 | 79.26 | 78.27 | 71.25 | 76.65 |
| 7 | 70.43 | 83.03 | 75.27 | 82.08 | 78.71 | 78.57 | 71.04 | 77.02 |
| 8 | 70.87 | 82.27 | 75.20 | 82.31 | 77.71 | 78.07 | 71.23 | 76.81 |
| 9 | 70.94 | 81.96 | 75.36 | 81.98 | 79.32 | 79.69 | 69.11 | 76.91 |
| 10 | 69.8 | 80.69 | 74.71 | 83.53 | 76.95 | 78.23 | 71.82 | 76.53 |
| 11 | 71.43 | 84.22 | 76.47 | 83.90 | 79.42 | 80.87 | 74.19 | 78.64 |

Fig.4

Effect of L value on model performance

Table 6

Statistics of the MRPC and SUBJ datasets

| Dataset | MRPC | SUBJ |
|---|---|---|
| Positive samples | 4 076 | 5 000 |
| Negative samples | 1 725 | 5 000 |

Table 7

Experimental results on MRPC and SUBJ datasets

| Model | MRPC | SUBJ |
|---|---|---|
| BERT | 71.13 | 94.21 |
| SimCSE | 74.43 | 94.45 |
| APINCSE (BERT-base) | 74.8 | 94.81 |
[1] Cer D, Diab M, Agirre E, et al. SemEval-2017 task 1: semantic textual similarity multilingual and crosslingual focused evaluation[C]∥Proceedings of the 11th International Workshop on Semantic Evaluation. Stroudsburg, PA: ACL, 2017: 1-14.
[2] Radford A, Narasimhan K. Improving language understanding by generative pre-training[EB/OL]. (2018-06-11)[2023-12-11].
[3] Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C] ∥Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2019: 4171-4186.
[4] Liu Y H, Ott M, Goyal N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB/OL]. (2019-07-26)[2023-12-11].
[5] Yang Z L, Dai Z H, Yang Y M, et al. XLNet: generalized autoregressive pretraining for language understanding[C]∥Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: NeurIPS, 2019: 5753-5763.
[6] Li B H, Zhou H, et al. On the sentence embeddings from pre-trained language models[C]∥Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 9119-9130.
[7] Reimers N, Gurevych I. Sentence-BERT: sentence embeddings using siamese BERT-networks[C]∥Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2019: 3982-3992.
[8] Gao T Y, Yao X C, Chen D Q. SimCSE: simple contrastive learning of sentence embeddings[C]∥Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2021: 6894-6910.
[9] Wu X, Gao C C, Zang L J, et al. ESimCSE: enhanced sample building method for contrastive learning of unsupervised sentence embedding[C]∥Proceedings of the 29th International Conference on Computational Linguistics. New York: ACM Press,2022:3898-3907.
[10] Zhang Y H, Zhu H J, Wang Y L, et al. A contrastive framework for learning sentence representations from pairwise and triple-wise perspective in angular space[C]∥Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2022: 4892-4903.
[11] Chuang Y S, Dangovski R, Luo H Y, et al. DiffCSE: difference-based contrastive learning for sentence embeddings[C]∥Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: ACL, 2023: 4207-4218.
[12] Liu J D, Liu J H, Wang Q F, et al. RankCSE: unsupervised sentence representations learning via learning to rank[C]∥Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2023: 13785-13802.
[13] Wang H, Li Y G, Huang Z, et al. SNCSE: contrastive learning for unsupervised sentence embedding with soft negative samples[C]∥International Conference on Intelligent Computing. New York, USA: ICIC, 2023: 419-431.
[14] He H L, Zhang J L, Lan Z Z, et al. Instance smoothed contrastive learning for unsupervised sentence embedding[C]∥Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence. Washington, DC: AAAI Press, 2023: 12863-12871.
[15] Robinson J, Chuang C Y, Sra S, et al. Contrastive learning with hard negative samples[C]∥9th International Conference on Learning Representations. Virtual Event, 2021.
[16] Wu X, Gao C C, Su Y P, et al. Smoothed contrastive learning for unsupervised sentence embedding[C]∥Proceedings of the 29th International Conference on Computational Linguistics. New York, USA: ICCL, 2022: 4902-4906.
[17] Kim T, Yoo K M, Lee S G. Self-guided contrastive learning for BERT sentence representations[C]∥Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg, PA: ACL, 2021: 2528-2540.
[18] Oh D, Kim Y J, Lee H D, et al. Don't judge a language model by its last layer: contrastive learning with layer-wise attention pooling[C]∥Proceedings of the 29th International Conference on Computational Linguistics. New York, USA: ICCL, 2022: 4585-4592.
[19] Deng J H, Wan F Q, Yang T, et al. Clustering-aware negative sampling for unsupervised sentence representation[C] ∥Findings of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2023: 8713-8729.