Journal of Jilin University (Engineering and Technology Edition), 2025, Vol. 55, Issue (6): 2122-2130. doi: 10.13229/j.cnki.jdxbgxb.20230991

• Computer Science and Technology •

Unbalanced image classification algorithm based on fine-grained analysis

Ping-ping LIU1,2, Wen-li SHANG3, Xiao-yu XIE1, Xiao-kang YANG3

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
  3. College of Software, Jilin University, Changchun 130012, China
  • Received: 2023-09-15 Online: 2025-06-01 Published: 2025-07-23
  • About the author: Ping-ping LIU (born 1979), female, professor, Ph.D. Research interests: machine learning and image processing. E-mail: liupp@jlu.edu.cn
  • Funding:
    Natural Science Foundation of Jilin Province (20200201283JC); Jilin Province Key Core Industrial Technology Research Project (20230201085GX); General Program of the National Natural Science Foundation of China (62071199)

Abstract:

Fine-grained images are complex and diverse. Traditional image classification methods pay insufficient attention to fine-grained attributes and perform poorly on imbalanced datasets. To address these problems, a threshold-based fine-grained image classification algorithm built on deep metric learning is proposed. Metric learning is introduced to strengthen the focus on fine-grained image attributes, and a pairwise loss is combined with a proxy loss to improve classification accuracy and speed up model convergence. To handle data imbalance, a classifier based on threshold analysis is designed. It performs multi-level classification of fine-grained images and thereby improves the low classification accuracy of minority classes in imbalanced datasets. Experimental results show that the proposed algorithm significantly outperforms other methods in classification accuracy.

Key words: computer application, deep metric learning, fine-grained classification, unbalanced data, threshold classifier
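
The abstract combines a pairwise loss with a proxy loss but this page does not give the formulas. As a minimal sketch, assuming the pairwise term is the multi-similarity loss of ref. [12] (without its pair-mining step) and the proxy term is the Proxy-NCA loss of ref. [11], the two could be summed as follows. The function names, the hyperparameters `alpha`, `beta`, `lam`, and the mixing weight `w` are illustrative assumptions, not values taken from the paper.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def multi_similarity_loss(embs, labels, alpha=2.0, beta=50.0, lam=0.5):
    """Multi-similarity loss on cosine similarities, without pair mining."""
    embs = [normalize(e) for e in embs]
    total = 0.0
    for i, (ei, yi) in enumerate(zip(embs, labels)):
        pos = [dot(ei, ej) for j, (ej, yj) in enumerate(zip(embs, labels))
               if j != i and yj == yi]
        neg = [dot(ei, ej) for ej, yj in zip(embs, labels) if yj != yi]
        if pos:  # pull same-class pairs above the margin lam
            total += math.log1p(sum(math.exp(-alpha * (s - lam)) for s in pos)) / alpha
        if neg:  # push different-class pairs below the margin lam
            total += math.log1p(sum(math.exp(beta * (s - lam)) for s in neg)) / beta
    return total / len(embs)

def proxy_nca_loss(embs, labels, proxies):
    """Proxy-NCA loss: one proxy per class, squared Euclidean distance.

    Requires at least two proxies; labels index into `proxies`.
    """
    embs = [normalize(e) for e in embs]
    proxies = [normalize(p) for p in proxies]
    total = 0.0
    for e, y in zip(embs, labels):
        d = [sum((a - b) ** 2 for a, b in zip(e, p)) for p in proxies]
        neg = sum(math.exp(-dk) for k, dk in enumerate(d) if k != y)
        # -log(exp(-d_y) / neg) = d_y + log(neg)
        total += d[y] + math.log(neg)
    return total / len(embs)

def combined_loss(embs, labels, proxies, w=1.0):
    """Hypothetical combination: pairwise term plus weighted proxy term."""
    return multi_similarity_loss(embs, labels) + w * proxy_nca_loss(embs, labels, proxies)

if __name__ == "__main__":
    embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    proxies = [[1.0, 0.0], [0.0, 1.0]]
    print(combined_loss(embs, [0, 0, 1, 1], proxies))  # labels agree with geometry
    print(combined_loss(embs, [0, 1, 0, 1], proxies))  # labels contradict geometry
```

In a real training loop the proxies would be learnable parameters and the sum would be differentiated by an autograd framework; the plain-Python version only illustrates how the two terms behave on a toy batch.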

CLC Number: TP391

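Fig. 1 Overall training structure of the threshold classification framework

Fig. 2 Schematic diagram of the threshold classification method

Fig. 3 Different types of images in the dataset

Fig. 4 Number of images of each type in the dataset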

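Table 1 Classification performance of different embedding sizes under each metric loss function

| Embedding size | Multi-similarity, ODIR5K | Multi-similarity, COVID Radiography | Proxy-NCA, ODIR5K | Proxy-NCA, COVID Radiography |
|---|---|---|---|---|
| 16 | 94.63 | 93.82 | 90.87 | 92.85 |
| 32 | 94.24 | 93.37 | 91.03 | 93.82 |
| 64 | 93.77 | 94.24 | 91.10 | 93.96 |
| 128 | 94.94 | 95.62 | 91.13 | 94.35 |
| 256 | 93.31 | 94.56 | 90.28 | 94.21 |
| 512 | 93.85 | 94.27 | 91.59 | 93.62 |
| 1024 | 94.01 | 93.66 | 90.86 | 93.24 |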

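Table 2 Performance and convergence speed of different metric losses on the ODIR5K dataset

| Loss function | Accuracy/% | Iterations to convergence |
|---|---|---|
| Triplet+CE | 91.05 | 2.0 k |
| Contrastive+CE | 93.31 | 1.9 k |
| Multi-similarity+CE | 93.52 | 2.5 k |
| Proxy-NCA+CE | 92.81 | 1.3 k |
| Proxy-Anchor+CE | 91.13 | 1.0 k |
| Proposed method | 95.63 | 1.5 k |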

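Table 3 Performance of different metric loss functions on the ODIR5K dataset (%)

| Loss function | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| Cross Entropy | 92.22 | 92.50 | 93.20 | 92.00 |
| Triplet+CE | 91.05 | 90.50 | 95.50 | 87.00 |
| Contrastive+CE | 93.31 | 93.70 | 95.20 | 92.30 |
| Multi-Similarity+CE | 93.52 | 94.73 | 97.32 | 92.82 |
| Proxy-NCA+CE | 92.81 | 93.60 | 96.80 | 91.40 |
| Proxy-Anchor+CE | 91.13 | 91.60 | 93.40 | 90.10 |
| Proposed method | 95.63 | 95.30 | 97.60 | 93.30 |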
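
The tables report accuracy, F1, precision and recall, but this page does not say how the per-class values are averaged. A minimal sketch, assuming macro averaging (an assumption, not confirmed by the source), shows how the four numbers can be computed from predictions, and why a macro F1 need not equal the harmonic mean of the reported precision and recall. `macro_metrics` is a hypothetical helper written for this illustration.

```python
def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, F1 (as fractions)."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,  # mean of per-class F1, not F1 of the means
    }

if __name__ == "__main__":
    # Toy labels for three classes; not data from the paper.
    y_true = [0, 0, 0, 1, 1, 2]
    y_pred = [0, 0, 1, 1, 1, 2]
    for name, value in macro_metrics(y_true, y_pred).items():
        print(f"{name}: {100 * value:.2f}")
```

On this toy input the macro F1 is lower than the harmonic mean of macro precision and macro recall, which is one reason an F1 column cannot always be re-derived from the precision and recall columns beside it.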

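Table 4 Performance of different metric loss functions on the COVID Radiography dataset (%)

| Loss function | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| Cross Entropy | 94.19 | 93.47 | 96.01 | 94.10 |
| Triplet+CE | 92.94 | 91.98 | 96.93 | 90.12 |
| Contrastive+CE | 95.21 | 94.97 | 96.24 | 95.64 |
| Multi-Similarity+CE | 95.62 | 96.52 | 98.17 | 95.79 |
| Proxy-NCA+CE | 94.35 | 95.62 | 97.78 | 95.92 |
| Proxy-Anchor+CE | 92.66 | 93.61 | 95.21 | 93.62 |
| Proposed method | 96.57 | 93.40 | 95.60 | 91.30 |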

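Table 5 Performance of different classifiers on the ODIR5K dataset (%)

| Category | Accuracy, proposed | Accuracy, MLP | Accuracy, NN |
|---|---|---|---|
| Age-related macular degeneration | 99.9 | 95.9 | 94.8 |
| Cataract | 100 | 99.1 | 96.5 |
| Diabetes | 96.8 | 88.2 | 91.5 |
| Glaucoma | 99.5 | 93.0 | 88.7 |
| Hypertension | 99.8 | 83.7 | 92.7 |
| Myopia | 99.9 | 97.0 | 99.0 |
| Normal | 95.2 | 93.6 | 93.8 |
| Other diseases | 98.2 | 89.3 | 92.5 |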

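Table 6 Performance of different classifiers on the COVID Radiography dataset (%)

| Category | Accuracy, proposed | Accuracy, MLP | Accuracy, NN |
|---|---|---|---|
| Normal | 95.58 | 93.3 | 97.3 |
| Lung opacity | 96.86 | 96.5 | 97.8 |
| COVID-19 | 97.50 | 90.7 | 94.6 |
| Viral pneumonia | 96.60 | 96.8 | 89.8 |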

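Table 7 Classifier performance under different thresholds on the ODIR5K dataset

| Threshold | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| 0.46 | 87.50 | 89.29 | 85.00 | 92.31 |
| 0.47 | 89.36 | 90.41 | 87.94 | 93.65 |
| 0.48 | 95.63 | 95.30 | 97.60 | 93.30 |
| 0.49 | 92.60 | 94.32 | 91.87 | 96.21 |
| 0.50 | 90.96 | 93.62 | 91.18 | 96.43 |
| 0.51 | 80.97 | 85.64 | 80.15 | 92.67 |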

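Table 8 Classifier performance under different thresholds on the COVID Radiography dataset

| Threshold | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| 0.46 | 89.51 | 90.61 | 87.20 | 94.30 |
| 0.47 | 91.38 | 92.61 | 89.87 | 95.54 |
| 0.48 | 96.57 | 93.40 | 95.60 | 91.30 |
| 0.49 | 94.56 | 95.56 | 92.85 | 98.44 |
| 0.50 | 92.91 | 95.10 | 92.21 | 98.19 |
| 0.51 | 82.82 | 87.91 | 82.13 | 94.58 |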
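
Tables 7 and 8 sweep the classifier threshold from 0.46 to 0.51. As a small illustration of how such a sweep can be read, the snippet below copies the rows of both tables and picks the threshold that maximizes a chosen metric. The helper `best_threshold` and the tuple layout are assumptions made for this example; the paper's own selection procedure is not described on this page.

```python
# Rows copied from Tables 7 and 8: (threshold, accuracy, F1, precision, recall)
ODIR5K = [
    (0.46, 87.50, 89.29, 85.00, 92.31),
    (0.47, 89.36, 90.41, 87.94, 93.65),
    (0.48, 95.63, 95.30, 97.60, 93.30),
    (0.49, 92.60, 94.32, 91.87, 96.21),
    (0.50, 90.96, 93.62, 91.18, 96.43),
    (0.51, 80.97, 85.64, 80.15, 92.67),
]
COVID_RADIOGRAPHY = [
    (0.46, 89.51, 90.61, 87.20, 94.30),
    (0.47, 91.38, 92.61, 89.87, 95.54),
    (0.48, 96.57, 93.40, 95.60, 91.30),
    (0.49, 94.56, 95.56, 92.85, 98.44),
    (0.50, 92.91, 95.10, 92.21, 98.19),
    (0.51, 82.82, 87.91, 82.13, 94.58),
]
COLUMNS = {"accuracy": 1, "f1": 2, "precision": 3, "recall": 4}

def best_threshold(rows, metric="accuracy"):
    """Return the threshold whose row has the highest value of `metric`."""
    col = COLUMNS[metric]
    return max(rows, key=lambda row: row[col])[0]

if __name__ == "__main__":
    for name, rows in [("ODIR5K", ODIR5K), ("COVID Radiography", COVID_RADIOGRAPHY)]:
        for metric in COLUMNS:
            print(name, metric, best_threshold(rows, metric))
```

Selecting by accuracy gives 0.48 on both datasets, which matches the rows reported for the proposed method in the other tables. Selecting by F1 would instead favor 0.49 on COVID Radiography, so the choice of criterion matters.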

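Table 9 Metric results for each category in the fundus dataset (%)

| Category | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| Age-related macular degeneration | 99.9 | 98.9 | 100.0 | 97.9 |
| Cataract | 100.0 | 100 | 100.0 | 100.0 |
| Diabetes | 96.8 | 92.3 | 96.9 | 88.2 |
| Glaucoma | 99.5 | 91.8 | 100.0 | 84.8 |
| Hypertension | 99.8 | 92.7 | 95.0 | 90.5 |
| Myopia | 99.9 | 99.0 | 100.0 | 98.0 |
| Normal | 95.2 | 95.2 | 91.6 | 99.0 |
| Other diseases | 98.2 | 92.6 | 97.3 | 88.3 |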

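Table 10 Metric results for each category in the pneumonia dataset (%)

| Category | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| Normal | 95.58 | 97.83 | 99.12 | 96.58 |
| Lung opacity | 96.86 | 96.36 | 95.60 | 97.15 |
| COVID-19 | 97.50 | 98.94 | 98.33 | 99.63 |
| Viral pneumonia | 96.60 | 93.33 | 92.72 | 93.96 |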

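Table 11 Performance of different methods on the ODIR5K dataset (%)

| Method | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| EfficientNetB3 [15] | 92.00 | - | 71.00 | 66.00 |
| MCGS-Net [16] | - | 89.66 | 65.88 | 61.60 |
| BFPC-Net [17] | 94.23 | 94.16 | 97.09 | 93.23 |
| DSRACNN [18] | 87.90 | 88.16 | 88.50 | - |
| Proposed method | 95.63 | 95.30 | 97.60 | 93.30 |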

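Table 12 Performance of different methods on the COVID Radiography dataset

| Method | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| COVID-CAPS [19] | 95.70 | - | - | - |
| DRNN [20] | 92.1 | 91.1 | 93.01 | - |
| Proposed method | 96.57 | 93.40 | 95.60 | 91.30 |
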
References:

[1] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770-778.
[2] Huang Z Z, Zhang J P, Shan H M. When age-invariant face recognition meets face age synthesis: a multi-task learning framework[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 7282-7291.
[3] Ji R, Wen L, Zhang L, et al. Attention convolutional binary neural tree for fine-grained visual categorization[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 10468-10477.
[4] Wei X S, Xie C W, Wu J, et al. Mask-CNN: localizing parts and selecting descriptors for fine-grained bird species categorization[J]. Pattern Recognition, 2018, 76: 704-714.
[5] Zheng H, Fu J, Zha Z J, et al. Learning deep bilinear transformation for fine-grained image representation[J]. Advances in Neural Information Processing Systems, 2019, 32: No.03621.
[6] Chang D, Ding Y, Xie J, et al. The devil is in the channels: Mutual-channel loss for fine-grained image classification[J]. IEEE Transactions on Image Processing, 2020, 29: 4683-4695.
[7] Bera A, Wharton Z, Liu Y, et al. SR-GNN: spatial relation-aware graph neural network for fine-grained image categorization[J]. IEEE Transactions on Image Processing, 2022, 31: 6017-6031.
[8] Sundgaard J V, Harte J, Bray P, et al. Deep metric learning for otitis media classification[J]. Medical Image Analysis, 2021, 71: No.102034.
[9] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: transformers for image recognition at scale[J/OL].[2023-08-11].
[10] Guo H, Wang S. Long-tailed multi-label visual recognition by collaborative training on uniform and re-balanced samplings[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 15089-15098.
[11] Movshovitz-Attias Y, Toshev A, Leung T K, et al. No fuss distance metric learning using proxies[C]∥Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017: 360-368.
[12] Wang X, Han X, Huang W, et al. Multi-similarity loss with general pair weighting for deep metric learning[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 5022-5030.
[13] International Competition on Ocular Disease Intelligent Recognition[EB/OL]. [2021-11-18].
[14] Rahman T, Khandakar A, Qiblawey Y, et al. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images[J]. Computers in Biology and Medicine,2021,132:No.104319.
[15] Wang J, Yang L, Huo Z, et al. Multi-label classification of fundus images with efficientnet[J]. IEEE Access, 2020, 8: 212499-212508.
[16] Lin J, Cai Q, Lin M. Multi-label classification of fundus images with graph convolutional network and self-supervised learning[J]. IEEE Signal Processing Letters, 2021, 28: 454-458.
[17] Li Z, Xu M, Yang X, et al. Multi-label fundus image classification using attention mechanisms and feature fusion[J]. Micromachines, 2022, 13(6): No.947.
[18] Yang X, Yi S. Multi-classification of fundus diseases based on DSRA-CNN[J]. Biomedical Signal Processing and Control, 2022, 77: No.103763.
[19] Afshar P, Heidarian S, Naderkhani F, et al. Covid-caps: a capsule network-based framework for identification of COVID-19 cases from X-ray images[J]. Pattern Recognition Letters, 2020, 138: 638-643.
[20] Panahi A, Askari M R, Akrami M, et al. Deep residual neural network for COVID-19 detection from chest X-ray images[J]. SN Computer Science, 2022, 3(2): No.169.