Journal of Jilin University (Engineering and Technology Edition) ›› 2023, Vol. 53 ›› Issue (4): 1174-1180. doi: 10.13229/j.cnki.jdxbgxb.20220081

• Computer Science and Technology •

Privacy-sensitive data filtering algorithm based on fuzzy approximation

Chao-jian FANG, Xin-rong HU

  1. School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430073, China
  • Received: 2022-01-19 Online: 2023-04-01 Published: 2023-04-20
  • Contact: Xin-rong HU E-mail: fangchaojian54545@yeah.net; hxr@wtu.edu.cn
  • About the author: Chao-jian FANG (1983-), male, experimentalist. Research interests: network security, digital media technology. E-mail: fangchaojian54545@yeah.net
  • Supported by:
    National Natural Science Foundation of China (61807013); Outstanding Young and Middle-aged Science and Technology Innovation Team Program of Hubei Provincial Colleges and Universities (T201807); Key Project of the Scientific Research Program of the Hubei Provincial Department of Education (D20191708)

Abstract:

Existing algorithms for filtering privacy-sensitive data rely on a single method of obtaining the approximation degree (similarity). This limits the accuracy of the approximation and leads to high mean absolute error (MAE) and root mean square error (RMSE) values. To address this problem, a privacy-sensitive data filtering algorithm based on fuzzy approximation is proposed. First, the improved locality-sensitive hashing algorithm E2LSH reduces the dimensionality of the data, yielding low-dimensional data better suited to the subsequent approximation calculation. Next, the Paillier homomorphic encryption algorithm is used to extract the privacy-sensitive data while preserving data security. Finally, a trapezoidal fuzzy scoring model is constructed, and the fuzzy approximation is calculated with a hybrid-model similarity algorithm that combines modified (adjusted) cosine similarity and Pearson correlation similarity, completing the filtering of privacy-sensitive data. The experimental results show that the lowest MAE of the proposed method is below 0.82, indicating that the method effectively reduces the MAE and RMSE values and improves the data filtering effect.

Key words: fuzzy approximation, privacy-sensitive data, data filtering, Paillier homomorphic encryption algorithm, hybrid model similarity algorithm, cosine similarity, Pearson correlation similarity, trapezoidal fuzzy scoring model
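The abstract names a hybrid of modified (adjusted) cosine similarity and Pearson correlation similarity but does not state how the two are combined. The sketch below is one possible reading, not the authors' formulation: it assumes an item-item setting with a user-to-ratings dictionary and a simple convex combination controlled by a hypothetical weight `alpha`. The function names and the data layout are likewise assumptions made for illustration.

```python
from math import sqrt


def _normalized_dot(xs, ys):
    """Cosine of two already-centered vectors; 0.0 if either has zero norm."""
    num = sum(x * y for x, y in zip(xs, ys))
    den = sqrt(sum(x * x for x in xs)) * sqrt(sum(y * y for y in ys))
    return num / den if den else 0.0


def adjusted_cosine(ratings, i, j):
    """Adjusted cosine: center each rating on the rating user's own mean."""
    xs, ys = [], []
    for user_ratings in ratings.values():
        if i in user_ratings and j in user_ratings:
            mean_u = sum(user_ratings.values()) / len(user_ratings)
            xs.append(user_ratings[i] - mean_u)
            ys.append(user_ratings[j] - mean_u)
    return _normalized_dot(xs, ys)


def pearson(ratings, i, j):
    """Pearson correlation: center on each item's mean over co-rating users."""
    pairs = [(r[i], r[j]) for r in ratings.values() if i in r and j in r]
    if not pairs:
        return 0.0
    mean_i = sum(p[0] for p in pairs) / len(pairs)
    mean_j = sum(p[1] for p in pairs) / len(pairs)
    return _normalized_dot([p[0] - mean_i for p in pairs],
                           [p[1] - mean_j for p in pairs])


def hybrid_similarity(ratings, i, j, alpha=0.5):
    """Assumed convex combination; alpha is a hypothetical mixing weight."""
    return (alpha * adjusted_cosine(ratings, i, j)
            + (1.0 - alpha) * pearson(ratings, i, j))
```

With `alpha=1.0` the function reduces to adjusted cosine and with `alpha=0.0` to Pearson, so the weight can be tuned against MAE or RMSE on held-out ratings.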

CLC Number: TP391.3

Fig. 1 Trapezoidal fuzzy scoring model
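The figure itself is not reproduced on this page, so the parameters of the paper's trapezoidal fuzzy scoring model are unknown. As general background, a trapezoidal membership function is usually defined by four breakpoints; the sketch below shows that standard definition, with the name `trapezoid_membership` and the breakpoints `a`, `b`, `c`, `d` chosen here for illustration only.

```python
def trapezoid_membership(x, a, b, c, d):
    """Standard trapezoidal membership with breakpoints a <= b <= c <= d.

    Membership is 0 outside [a, d], 1 on the plateau [b, c],
    and linear on the rising edge [a, b] and falling edge [c, d].
    """
    if not a <= b <= c <= d:
        raise ValueError("breakpoints must satisfy a <= b <= c <= d")
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge, b > a is guaranteed here
    return (d - x) / (d - c)      # falling edge, d > c is guaranteed here
```

For example, with breakpoints (1, 2, 4, 5) a score of 3 lies on the plateau and receives full membership, while a score of 1.5 sits halfway up the rising edge.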

Fig. 2 Experimental results for MAE

Fig. 3 Experimental results for RMSE
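MAE and RMSE are the two error measures reported in Figs. 2 and 3. For readers unfamiliar with them, the sketch below gives their standard definitions over paired actual and predicted values. The function names and the sample numbers in the usage note are illustrative and are not taken from the paper's experiments.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

For the made-up values `actual = [4, 3, 5]` and `predicted = [3, 3, 3]`, the absolute errors are 1, 0 and 2, so MAE is 1.0 and RMSE is the square root of 5/3. RMSE is never smaller than MAE on the same data, because squaring weights large errors more heavily.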

Fig. 4 Comparative experimental results for MAE

Fig. 5 Comparative experimental results for RMSE
