Journal of Jilin University (Science Edition), 2023, Vol. 61, Issue 5: 1103-1111.


• Corresponding author: WANG Canyu, E-mail: 736559039@qq.com

Multi-hop Question Generation Based on Contrastive Learning Ideas

WANG Hongbin1,2,3, YANG Hezhenmin1,2,3, WANG Canyu4   

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; 
    2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China; 
    3. Yunnan Key Laboratory of Computer Technology Application, Kunming University of Science and Technology, Kunming 650500, China;
    4. Faculty of Big Data, Yunnan Agricultural University, Kunming 650201, China
  • Received: 2022-10-24; Online: 2023-09-26; Published: 2023-09-26



Abstract: To address the time- and labor-intensive problem of obtaining large-scale multi-hop question answering training datasets, we propose a multi-hop question generation model based on the idea of contrastive learning. The model is divided into a generation phase and a contrastive learning scoring phase. In the generation phase, candidate multi-hop questions are generated by executing an inference graph. In the scoring phase, candidate questions are scored and ranked by a reference-free candidate question scoring model based on contrastive learning, and the best candidate question is selected. The model narrows, to some extent, the gap between unsupervised methods and manually annotated methods, effectively alleviating the shortage of multi-hop question answering datasets. Experimental results on the HotpotQA dataset show that the proposed model can effectively expand the training data and greatly reduce the cost of manual annotation.

Key words: multi-hop question generation, machine reading comprehension, contrastive learning

CLC number: TP391