吉林大学学报(工学版) ›› 2024, Vol. 54 ›› Issue (1): 209-220.doi: 10.13229/j.cnki.jdxbgxb.20220202
• 计算机科学与技术 • 上一篇
Zhong-liang FU(),Xiao-qing CHEN,Wei REN,Yu YAO
摘要:
针对传统K最近邻(KNN)算法没有学习过程,进行分类预测时需要遍历全部学习样本、时效性差且对k值敏感的缺点,本文提出了两种带学习过程的随机KNN算法(RKNN),包括对样本Bootstrap抽样的SRKNN算法和对样本特征Bootstrap抽样的ARKNN算法,均属于Bagging集成学习,学习多个简单KNN后投票输出结果。算法对样本的特征进行组合得到组合特征,简单KNN基于组合特征得到。重点研究了如何选取特征的最优组合系数,得到了取得最好分类精度时的特征最优组合系数选取规则和公式。RKNN算法在构造简单KNN时引入学习,分类时不再遍历全部学习样本而只需要用二分查找法即可,其分类时间复杂度比传统KNN算法分类时间复杂度低一个数量级。RKNN算法的分类精度比传统KNN算法的分类精度有大幅提升,解决了使用KNN算法难以选取k值的问题。理论分析和实验结果均验证了本文RKNN算法的有效性。
中图分类号:
1 | Cover T M, Hart P E. Nearst neighbor pattern classification[J]. IEEE Transactions on Information Theory, 1967, 13(1): 21-27. |
2 | Hart P E. The condensed nearest neighbor rule[J]. IEEE Trans actions on Information Theory, 1968, 14(3):515-516. |
3 | 李荣陆, 胡运发. 基于密度的KNN 文本分类器学习样本裁剪方法[J]. 计算机研究与发展, 2004, 41(4): 539-545. |
Li Rong-lu, Hu Yun-Fa. A density-based method for reducing the amount of training data in kNN text classification[J]. Journal of Computer Research and Development, 2004, 41(4): 539-545. | |
4 | 张孝飞, 黄河燕. 一种采用聚类技术改进的KNN 文本分类方法[J]. 模式识别与人工智能, 2009, 22(6): 936-940. |
Zhang Xiao-fei, Huang He-yan. An improved KNN text categorization algorithm by adopting cluster technology[J]. Pattern Recognition and Artificial Intelligence, 2009, 22(6): 936-940. | |
5 | Hwang W J, Wen K W. Fast KNN classification algorithm based on partial distance search[J]. Electron Letter, 1998, 34(21): 2062-2063. |
6 | Pan J S, Qiao Y L, Sun S H. A fast k-nearest neighbors classification algorithm[J]. IEICE Transactions Fundamentals, 2004, 87(4): 961-961. |
7 | Samet H. k-Nearst neighbor finding using MaxNearstDist [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(2): 243-252. |
8 | Guo G D, Wang H, Bell D, et al. Using KNN model for automatic text categorization[J]. Soft Computing-A Fusion of Foundations,Methodologies and Application, 2006, 10(5): 423-430. |
9 | 朱付保, 谢利杰, 汤萌萌, 等. 基于模糊C-Means 的改进型KNN分类方法[J]. 华中师范大学学报: 自然科学版, 2017, 51(6): 754-759. |
Zhu Fu-bao, Xie Li-jie, Tang meng-meng, et al. Improved KNN classification algorithm based fuzzy C-means[J]. Journal of Central China Normal University (Natural Science Edition), 2017, 51(6): 754-759. | |
10 | Samet H. The Design an Analysis of Spatial Data Structure[M]. Reading: Addison-Wesley, 1990. |
11 | 尚文倩, 黄厚宽, 刘玉玲, 等.文本分类中基于基尼指数的特征选择算法研究 [J]. 计算机研究与发展, 2006, 43(10): 1688-1694. |
Shang Wen-qian, Huang Hou-kuan, Liu Yu-ling, et al. Research on the algorithm of feature selection based on Gini index for text categorization[J]. Journal of Computer Research and Development, 2006,43(10): 1688-1694. | |
12 | 林用民, 朱卫东. 模糊KNN在文本分类中的应用研究[J]. 计算机应用与软件, 2008, 25(9): 185-187. |
Lin Yong-min, Zhu Wei-dong. Study on the application of fuzzy KNN to text categorization[J]. Computer Applications and Software, 2008, 25(9): 185-187. | |
13 | Thanh N P, Kappas M. Comparison of random forest, k-nearest neighbor, and support vector machine classifiers for land cover classification using Sentinel-2 imagery[J]. Sensors, 2018, 18(1): 18. |
14 | 黄光华, 殷锋, 冯九林. 一种交叉验证和距离加权方法改进的KNN算法研究[J]. 西南民族大学学报: 自然科学版, 2020, 46(2): 172-177. |
Huang Guang-hua, Yin Feng, Feng Jiu-lin. An improved KNN algorithm based on cross validation and distance weighting[J]. Journal of Southwest Minzu University (Natural Science Edition), 2020, 46(2): 182-177. | |
15 | Gou J P, Ma H X, Ou W H, et al. A generalized mean distance-based k-nearest neighbor classifier[J]. Expert Systems with Applications, 2019, 115: 356-372. |
16 | Gou J P, Qiu W M, Zhang Y, et al. A local mean representation-based K-nearest neighbor classifier[J]. ACM Transactions on Intelligent Systems, 2019, 29 (10): 1-25. |
17 | Bicego M, Loog M. Weighted K-nearest neighbor revisited[C]∥The 23th International Conference on Pattern Recognition (ICPR),Cancun, Mexico, 2016, 1642-1647. |
18 | Gou J P, Qiu W M, Zhang Y, et al. Locality constrained representation-based K-nearest neighbor classification[J]. Knowledge-Based Systems, 2019, 167(3): 38-52. |
19 | Ma H, Gou J, Ou W, et al. A new nearest neighbor classifier based on multi-harmonic mean distances[C]∥International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), Shenzhen, China, 2017: 115-125. |
20 | Zhang S C, Li X L, Zong M, et al. Learning k for kNN classification[J]. ACM Transactions on Intelligent Systems and Technology, 2017, 8(3): 1-19. |
21 | Zhong X F, Guo S Z, Gao L, et al. An improved k-NN classification with dynamic k [C]∥Proceedings of the 9th International Conference on Machine Learning and Computing, Singapore, 2017: 211-216. |
22 | Li B, Chen Y, Chen Y. The nearest neighbor algorithm of local probability centers[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 38(1): 141-154. |
23 | Breiman L. Bagging predictors[J]. Machine Learning, 1996, 24(2): 123-140. |
24 | Freund Y, Schapire R E. A decision-theoretic generalization of on-line learning and an application to boosting[J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139. |
25 | 付忠良, 张丹普, 王莉莉. 多标签AdaBoost 算法的改进算法[J]. 四川大学学报: 工程科学版, 2015, 47(5): 103-109. |
Fu Zhong-liang, Zhang Dan-pu, Wang Li-li. Improvement on AdaBoost for multi-label classification[J]. Journal of Sichuan University (Engineering Science Edition), 2015, 47(5): 103-109. | |
26 | Breiman L. Random forest[J]. Machine Learning, 2001, 29(1): 1-10. |
27 | 付忠良. 分类器线性组合及分类器线性组合的有效性和最佳组合问题的研究[J]. 计算机研究与发展, 2009, 46(7): 1206-1216. |
Fu Zhong-liang. Effective property and best combination of classifier linear combination[J]. Journal of Computer Research and Development, 2009, 46(7): 1206-1216. |
[1] | 王玫,宋志远. 基于TrAdaBoost算法为内核的行人航迹推算技术[J]. 吉林大学学报(工学版), 2023, 53(8): 2364-2370. |
[2] | 耿庆田,刘植,李清亮,于繁华,李晓宁. 基于一种深度学习模型的土壤湿度预测[J]. 吉林大学学报(工学版), 2023, 53(8): 2430-2436. |
[3] | 潘恒彦,张文会,梁婷婷,彭志鹏,高维,王永岗. 基于MIMIC与机器学习的出租车驾驶员交通事故诱因分析[J]. 吉林大学学报(工学版), 2023, 53(2): 457-467. |
[4] | 袁伟,袁小慧,高岩,李坤宸,赵登峰,刘朝辉. 基于自然驾驶数据的电动公交踏板误操作辨识方法[J]. 吉林大学学报(工学版), 2023, 53(12): 3342-3350. |
[5] | 周丰丰,颜振炜. 基于混合特征的特征选择神经肽预测模型[J]. 吉林大学学报(工学版), 2023, 53(11): 3238-3245. |
[6] | 耿庆田,赵杨,李清亮,于繁华,李晓宁. 基于注意力机制的LSTM和ARIMA集成方法在土壤温度中应用[J]. 吉林大学学报(工学版), 2023, 53(10): 2973-2981. |
[7] | 车翔玖,于英杰,刘全乐. 增强Bagging集成学习及多目标检测算法[J]. 吉林大学学报(工学版), 2022, 52(12): 2916-2923. |
[8] | 段亮,宋春元,刘超,魏苇,吕成吉. 基于机器学习的高速列车轴承温度状态识别[J]. 吉林大学学报(工学版), 2022, 52(1): 53-62. |
[9] | 李光松,李文清,李青. 基于随机性特征的加密和压缩流量分类[J]. 吉林大学学报(工学版), 2021, 51(4): 1375-1386. |
[10] | 朱小龙,谢忠. 基于机器学习的地理空间数据抽取算法[J]. 吉林大学学报(工学版), 2021, 51(3): 1011-1016. |
[11] | 李阳,李硕,井丽巍. 基于贝叶斯模型与机器学习算法的金融风险网络评估模型[J]. 吉林大学学报(工学版), 2020, 50(5): 1862-1869. |
[12] | 方伟,黄羿,马新强. 基于机器学习的虚拟网络感知数据缺陷自动检测[J]. 吉林大学学报(工学版), 2020, 50(5): 1844-1849. |
[13] | 刘洲洲,尹文晓,张倩昀,彭寒. 基于离散优化算法和机器学习的传感云入侵检测[J]. 吉林大学学报(工学版), 2020, 50(2): 692-702. |
[14] | 赵东, 臧雪柏, 赵宏伟. 基于果蝇优化的随机森林预测方法[J]. 吉林大学学报(工学版), 2017, 47(2): 609-614. |
[15] | 金立生, 王岩, 刘景华, 王亚丽, 郑义. 基于Adaboost算法的日间前方车辆检测[J]. 吉林大学学报(工学版), 2014, 44(6): 1604-1608. |
|