Journal of Jilin University (Science Edition) ›› 2025, Vol. 63 ›› Issue (6): 1629-1636.

Big Data Knowledge Learning System Based on GraphRAG

WANG Xiaoyan, HUANG Lan, WANG Yan

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China; Jilin Provincial Key Laboratory of Big Data Intelligent Computing, Changchun 130012, China
  • Received: 2025-01-03 Online: 2025-11-26 Published: 2025-11-26
  • Corresponding author: WANG Xiaoyan, E-mail: wangxy@jlu.edu.cn

Abstract: To address the information overload caused by the rapid growth of big data teaching resources and the insufficient accuracy of traditional retrieval-augmented generation (RAG) in multi-source information fusion, we propose a big data knowledge learning method based on GraphRAG. First, we design a Chinese prompt template that drives GraphRAG to automatically extract course entities and relationships, building an initial knowledge graph that is persisted to the Neo4j graph database. Second, through entity alignment and relationship completion, manually curated knowledge points are merged with the automatically constructed graph to form a unified, evolvable knowledge graph base. Finally, the community summaries pre-generated by GraphRAG are used for global semantic search, while the Neo4j graph database supports precise local retrieval of individual knowledge points. Experimental results show that the proposed method significantly outperforms traditional RAG in question-answering accuracy, response relevance, and fluency of multi-source information integration.

Key words: large language model, retrieval-augmented generation, graph retrieval-augmented generation, knowledge graph

CLC number:

  • TP391