Journal of Jilin University (Engineering and Technology Edition), 2022, Vol. 52, Issue (4): 885-890. doi: 10.13229/j.cnki.jdxbgxb20200938

Multiple source data selective integration algorithm based on frequent pattern tree

Shi-min FANG

  1. School of Politics, National Defence University, Shanghai 200433, China
  • Received: 2020-12-07 Online: 2022-04-01 Published: 2022-04-20

Abstract:

To address the high computational cost and low classification accuracy of ensemble learning on multi-source data sets, a selective ensemble algorithm for multi-source data based on the frequent pattern tree is proposed. Errors in the multi-source data are identified with the Laida (3σ) criterion by comparing each source's data against the truth values, and all frequent patterns of the multi-source data are extracted and converted into a compressed form. A frequent pattern tree is then built, and the idea of dynamic selection is incorporated: the current data are used to test the extent to which an instance belongs to a subset of misclassified data, with the diversity and accuracy of the base classifiers balanced by a weighted harmonic average. The frequent itemsets are normalized, and the selective ensemble of multi-source data is completed by combining the base classifiers that have high accuracy and large diversity relative to the other base classifiers. Simulation results show that the proposed ensemble learning algorithm achieves better generalization performance and efficiency, together with high classification accuracy.
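
The weighted harmonic average mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's formulation: the function names, the weight `alpha`, the top-k selection rule and the sample scores are all assumptions made for illustration.

```python
def weighted_harmonic_mean(accuracy, diversity, alpha=0.5):
    """Weighted harmonic mean of two scores in (0, 1].

    alpha is a hypothetical weight on accuracy; 1 - alpha goes to diversity.
    """
    if accuracy <= 0 or diversity <= 0:
        return 0.0  # a zero component drives the harmonic mean to zero
    return 1.0 / (alpha / accuracy + (1.0 - alpha) / diversity)


def select_classifiers(scores, k, alpha=0.5):
    """Return the names of the k classifiers with the highest combined score.

    scores maps a classifier name to an (accuracy, diversity) pair.
    """
    ranked = sorted(
        scores,
        key=lambda name: weighted_harmonic_mean(*scores[name], alpha=alpha),
        reverse=True,
    )
    return ranked[:k]


# Illustrative (made-up) scores for three base classifiers.
candidates = {
    "h1": (0.90, 0.10),  # accurate but nearly redundant with the others
    "h2": (0.85, 0.40),
    "h3": (0.80, 0.45),
}
chosen = select_classifiers(candidates, k=2)
```

Because the harmonic mean is dominated by the smaller of its two arguments, a classifier that is accurate but barely differs from the others (`h1` here) ranks below classifiers that are reasonably strong on both criteria.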

Key words: computer application technology, frequent pattern tree, multi-source data, selective ensemble, weighting, classifier

CLC Number: 

  • TP391

Fig.1 Multi-source data acquisition process based on the Laida criterion
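
The Laida (3σ) criterion treats a value as a gross error when it lies more than three standard deviations from the mean. The sketch below applies that rule to the differences between one source's readings and the corresponding truth values. The function name and the sample numbers are illustrative assumptions, and the paper's acquisition process may differ in detail.

```python
import statistics


def laida_outliers(readings, truths):
    """Return indices whose residual lies more than 3 sigma from the mean residual."""
    residuals = [r - t for r, t in zip(readings, truths)]
    mean = statistics.fmean(residuals)
    sigma = statistics.stdev(residuals)  # sample standard deviation
    if sigma == 0:
        return []  # all residuals identical, nothing to flag
    return [i for i, e in enumerate(residuals) if abs(e - mean) > 3 * sigma]


# Made-up data: 20 readings around a truth value of 10.0, one gross error.
truths = [10.0] * 20
readings = [10.1 if i % 2 == 0 else 9.9 for i in range(20)]
readings[7] = 15.0
flagged = laida_outliers(readings, truths)
```

With the sample standard deviation, no value among ten or fewer points can lie more than 3σ from the mean, so the rule is only meaningful on reasonably large samples.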

Fig.2 Frequent pattern tree structure diagram
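
For readers unfamiliar with the structure, the following is a minimal sketch of standard FP-tree construction: count item supports, drop infrequent items, sort each transaction by descending support, and insert it along a shared prefix path. The class name, function name and sample transactions are assumptions. The header table and node links used for mining are omitted, and the sketch does not claim to reproduce the tree variant used in the paper.

```python
from collections import Counter


class FPNode:
    """One node of the tree: an item, its count, and child nodes keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree and return (root, frequent_item_counts)."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = FPNode(None)
    for t in transactions:
        # Keep frequent items, ordered by descending support (ties by name).
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1  # shared prefixes only increment counts
            node = child
    return root, frequent


# Made-up transactions for illustration.
transactions = [["a", "b"], ["b", "c", "d"], ["a", "b", "d"], ["a", "b", "c"]]
root, frequent = build_fp_tree(transactions, min_support=2)
```

In this example every transaction contains `b`, so all four share a single branch under the root. This sharing is what makes the tree a compressed form of the data.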

Table 1 Relationship between two classifiers

                  Classifier h_i
Classifier h_j    Correct (1)    Wrong (0)
Correct (1)       N11            N10
Wrong (0)         N01            N00
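
The four counts in Table 1 are the usual starting point for pairwise diversity measures. As an example, the sketch below computes the disagreement measure (N10 + N01) / (N11 + N10 + N01 + N00). Whether the paper uses this exact measure is not stated here, so treat the choice, the function names and the sample vectors as assumptions. The first index is taken to refer to h_j and the second to h_i, following the table layout. The disagreement measure is symmetric, so that convention does not affect its value.

```python
def contingency(correct_j, correct_i):
    """Count (N11, N10, N01, N00) for two classifiers on the same samples.

    correct_j[k] is True when classifier h_j labels sample k correctly;
    correct_i[k] is the same for h_i.
    """
    n11 = n10 = n01 = n00 = 0
    for cj, ci in zip(correct_j, correct_i):
        if cj and ci:
            n11 += 1
        elif cj:
            n10 += 1
        elif ci:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def disagreement(n11, n10, n01, n00):
    """Fraction of samples on which exactly one classifier is correct."""
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("no samples to compare")
    return (n10 + n01) / total


# Made-up correctness vectors for six samples.
h_j = [True, True, False, True, False, False]
h_i = [True, False, True, True, False, True]
counts = contingency(h_j, h_i)
score = disagreement(*counts)
```

A value near 0 means the two classifiers succeed and fail on the same samples, while a larger value means their errors fall on different samples.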

Table 2 Real multi-source data sets basic information

Data set          Training set    Test set    Inputs    Outputs
Boston Housing    380             120         15        1
Ozone             275             53          8         1
Ocean             352             62          12        1

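Table 3 Comparison of integration results of different algorithms on data sets

                                Integration accuracy/%
Data set          Algorithm     Training set    Test set
Boston Housing    This paper    0.945           0.893
Boston Housing    Ref. [3]      0.871           0.865
Boston Housing    Ref. [4]      0.899           0.884
Ozone             This paper    0.912           0.906
Ozone             Ref. [3]      0.845           0.834
Ozone             Ref. [4]      0.869           0.857
Ocean             This paper    0.935           0.927
Ocean             Ref. [3]      0.886           0.874
Ocean             Ref. [4]      0.896           0.885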

Fig.3 Comparison of integration rates of different algorithms

1 Han Meng, Ding Jian. Survey of frequent pattern mining over data streams[J]. Journal of Computer Applications, 2019, 39(3): 719-727.
2 Wei Huai-ming. Data stream mining using fuzzy association rules and dynamic tree reconstruction[J]. Control Engineering of China, 2018, 25(12): 2263-2268.
3 Chen Tao. A selective ensemble method based on teaching-learning-based optimization for classifying gene expression profiles[J]. Science Technology and Engineering, 2018, 18(21): 232-238.
4 Li Yao, Wang Zhi-hai, Sun Yan-ge, et al. An adaptive ensemble classification method based on deep attribute weighting for data stream[J]. Journal of Shandong University (Engineering Science), 2018, 48(6): 44-55, 66.
5 Hou Li-sha. Design of clustering algorithm for redundancy feature removal in big data sets[J]. Modern Electronics Technique, 2018, 41(14): 48-50, 54.
6 Yang Yang, Ding Jia-man, Li Hai-bin, et al. A Spark-based frequent patterns mining algorithm for uncertain datasets[J]. Information and Control, 2019, 48(3): 257-264.
7 Wu Lei, Cheng Liang-lun, Wang Tao. Efficient frequent pattern mining algorithm based on interval interaction and transaction mapping[J]. Application Research of Computers, 2019, 36(4): 1031-1035, 1050.
8 Zheng Yu-yan, Tian Ying, Shi Chuan. Method of entity set expansion based on frequent pattern under meta path[J]. Journal of Software, 2018, 29(10): 2915-2930.
9 Tao Xiao-ling, Kang Rui-nan, Liu Li-yan. A parallel multi-classifier fusion approach based on selective ensemble[J]. Computer Engineering and Science, 2018, 40(5): 787-792.
10 Ren Yong-gong, Gao Peng, Zhang Zhi-peng. Frequent patterns mining for uncertain data using correlation metric[J]. Journal of Chinese Computer Systems, 2019, 40(3): 623-627.
11 Wan Fang, Hu Dong-hui. An optimization for overload behavior mining based on weakly correlated frequent patterns[J]. Journal of Beijing Jiaotong University, 2018, 42(2): 31-37.