Journal of Jilin University (Engineering and Technology Edition), 2021, Vol. 51, Issue (5): 1792-1797. doi: 10.13229/j.cnki.jdxbgxb20200484

• Computer Science and Technology •

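A new method for link validity assessment based on linked data

Man YUAN, Yun-long JIANG, Chao HU

  1. School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang, China
  • Received: 2020-06-29  Online: 2021-09-01  Published: 2021-09-16
  • About the author: Man YUAN (1965-), male, professor and doctoral supervisor. Research interests: data science and knowledge engineering, data standardization and data quality. E-mail: yuanman@nepu.edu.cn
  • Funding:
    Heilongjiang Provincial Philosophy and Social Science Research Planning Project (19EDE334)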

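Abstract:

To assess the link validity of linked data efficiently and accurately, this paper reviews methods and techniques for link validity assessment in China and abroad. It finds that research results on link validity are scarce and that some of the literature mentions the topic only briefly. The paper therefore proposes the ? algorithm for assessing the validity of Uniform Resource Identifiers (URIs) and proves its effectiveness and efficiency through theoretical analysis. Finally, experiments on open data published by DBpedia show that the method improves link validity assessment by 0.1% and is four times as efficient as the conventional method, confirming the accuracy and efficiency of the proposed algorithm.

Key words: computer software, linked data, data quality assessment, link validity, data quality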

CLC number: TP391

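Table 1  Symbols used in the text and their meanings

| Symbol | Meaning |
|---|---|
| Count_null | Number of invalid URIs |
| Count_P | Total number of URIs evaluated |
| N | Number of URIs in the dataset |
| α(H_i) | Validity of the URI's network protocol; H_i is the protocol part of a URI |
| β(A_i) | Validity of the URI's domain name and port; A_i is the domain name and port of a URI |
| γ(P_i) | Validity of the URI's resource path; P_i is the resource path of a URI |
| ?(U) | URI validity of the linked dataset |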
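The page does not give the formula that combines the symbols in Table 1. A minimal sketch of one plausible reading, assuming that α, β and γ are 0/1 indicators, that a URI is invalid as soon as any of the three checks fails, and that Count_P of the dataset's N URIs are evaluated. The names v(U_i) and V(U) are hypothetical; V(U) stands in for the dataset-level symbol that did not survive extraction.

```latex
\begin{aligned}
v(U_i) &= \alpha(H_i)\,\beta(A_i)\,\gamma(P_i) \in \{0, 1\}, \qquad i = 1, \dots, \mathrm{Count\_P}, \quad \mathrm{Count\_P} \le N \\
\mathrm{Count\_null} &= \sum_{i=1}^{\mathrm{Count\_P}} \bigl(1 - v(U_i)\bigr) \\
V(U) &= 1 - \frac{\mathrm{Count\_null}}{\mathrm{Count\_P}}
\end{aligned}
```

Under this reading, if 2 of 100 evaluated URIs fail at least one check, V(U) = 1 − 2/100 = 0.98.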

Figure 1  Assessment workflow

Figure 2  Assessment results

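Table 2  Running time of the methods

| Run | Conventional method, assessment time (s) | ? method, assessment time (s) |
|---|---|---|
| Average | 35 514.6 | 8 894.1 |
| 1 | 35 788.8 | 9 005.0 |
| 2 | 35 945.5 | 8 938.1 |
| 11 | 35 537.2 | 8 807.3 |
| 12 | 35 573.7 | 8 702.6 |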
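As a quick check of the abstract's efficiency claim, the average running times in Table 2 give a ratio of about 4:

```latex
\frac{35\,514.6\ \text{s}}{8\,894.1\ \text{s}} \approx 3.99
```

That is roughly 9.9 h against 2.5 h per assessment. The individual runs listed (1, 2, 11 and 12) give ratios between about 3.97 and 4.09.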

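Table 3  Comparison of assessment thoroughness

| Assessment method | Protocol verification | Server connectivity | Link reachability | Re-check | Multi-task support |
|---|---|---|---|---|---|
| Conventional method | | | | | |
| ? method | | | | | |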
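Table 3 compares the methods on protocol verification, server connectivity, link reachability, re-checking and multi-task support. Assuming the proposed method covers all five (the table's marks did not survive extraction), a staged checker might look like the hypothetical Python sketch below. It is not the authors' implementation: the class name `UriValidityChecker`, the HTTP(S)-only scheme rule, the TCP-connect connectivity probe, the HTTP HEAD reachability probe, per-host caching and the single re-check pass are all illustrative assumptions. The probes are injectable so the logic can be exercised without network access.

```python
# Hypothetical sketch of staged URI validity checking; not the authors' code.
import http.client
import socket
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple
from urllib.parse import urlsplit

# Assumption: only HTTP(S) URIs are treated as dereferenceable links.
ALLOWED_SCHEMES = {"http", "https"}
DEFAULT_PORTS = {"http": 80, "https": 443}


def tcp_server_up(host: str, port: int, timeout: float = 5.0) -> bool:
    """Assumed server-connectivity probe: can a TCP connection be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_reachable(uri: str, timeout: float = 10.0) -> bool:
    """Assumed link-reachability probe: HTTP HEAD must end in a status below 400."""
    request = urllib.request.Request(uri, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # HTTPError/URLError are OSError subclasses, so 4xx/5xx land here too.
        return False


class UriValidityChecker:
    """Hypothetical checker: protocol -> host/port -> path, with caching and re-checks."""

    def __init__(
        self,
        server_up: Callable[[str, int], bool] = tcp_server_up,
        reachable: Callable[[str], bool] = http_reachable,
        workers: int = 8,
        rechecks: int = 1,
    ) -> None:
        self.server_up = server_up
        self.reachable = reachable
        self.workers = workers
        self.rechecks = rechecks
        # Cache of (host, port) -> connectivity result. Concurrent threads may
        # occasionally probe the same host twice; that only costs time.
        self._hosts: Dict[Tuple[str, int], bool] = {}

    def check(self, uri: str) -> bool:
        try:
            parts = urlsplit(uri)
            scheme = parts.scheme.lower()
            # alpha(H_i): the protocol part must be a supported scheme.
            if scheme not in ALLOWED_SCHEMES:
                return False
            # beta(A_i), syntax: a host must be present and the port must parse.
            host, port = parts.hostname, parts.port or DEFAULT_PORTS[scheme]
        except ValueError:  # malformed authority or port outside 0-65535
            return False
        if not host:
            return False
        # beta(A_i), connectivity: probe each (host, port) only once per pass.
        key = (host, port)
        if key not in self._hosts:
            self._hosts[key] = self.server_up(host, port)
        if not self._hosts[key]:
            return False
        # gamma(P_i): the full URI, including its resource path, must resolve.
        return self.reachable(uri)

    def _run(self, uris: List[str]) -> List[bool]:
        # Multi-task support: check URIs concurrently in a thread pool.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(self.check, uris))

    def assess(self, uris: List[str]) -> Tuple[int, int, float]:
        """Return (Count_null, Count_P, validity) with validity = 1 - Count_null / Count_P."""
        if not uris:
            raise ValueError("no URIs to assess")
        ok = self._run(uris)
        for _ in range(self.rechecks):
            failed = [i for i, good in enumerate(ok) if not good]
            if not failed:
                break
            self._hosts.clear()  # re-probe hosts so transient outages are forgiven
            for i, good in zip(failed, self._run([uris[i] for i in failed])):
                ok[i] = good
        count_null = ok.count(False)
        return count_null, len(uris), 1 - count_null / len(uris)
```

With the default probes, `UriValidityChecker(workers=16).assess(uris)` opens real network connections. Caching connectivity per host means a dataset such as DBpedia, where most URIs share one host, needs only one connectivity probe per pass. HEAD keeps the reachability test cheap, but some servers answer HEAD with 405, so a fuller version might fall back to GET.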