Journal of Jilin University (Engineering and Technology Edition) ›› 2023, Vol. 53 ›› Issue (4): 1146-1154. doi: 10.13229/j.cnki.jdxbgxb.20210829

• Computer Science and Technology •

Segmentation-based detector for traditional Chinese newspaper

Yu JIANG1,2, Jia-zheng PAN1, He-huai CHEN1, Ling-zhi FU1, Hong QI1,2

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
  • Received: 2021-08-27 Online: 2023-04-01 Published: 2023-04-20
  • Contact: Hong QI E-mail: jiangyu2011@jlu.edu.cn; qihong@jlu.edu.cn
  • About the author: Yu JIANG (1979-), male, associate professor, Ph.D. Research interest: intelligent environmental perception. E-mail: jiangyu2011@jlu.edu.cn
  • Funding:
    National Natural Science Foundation of China (62072211)

Abstract:

Most text detection research targets natural-scene datasets, and few studies address specific scenes such as text embedded in traditional Chinese images, where existing models detect poorly. To address this problem, this paper proposes a segmentation-based text detection model for traditional Chinese newspaper images. The model uses Resnet50 and FPN as its feature extraction network and applies a segmentation instance scaling and extension algorithm to generate the binary map from which text boxes are predicted. Surrounding filling and loop detection plus region coverage are proposed to further improve detection. A traditional Chinese newspaper dataset was also built for the study. On this dataset, the model's three metrics (Recall, Precision, and F-score) are all around 0.9, each 5% to 7% higher than the results of DBNet, currently one of the better-performing text detectors, which indicates that the model has an advantage for text detection on traditional Chinese newspaper images.

Key words: computer application, specific scene, traditional Chinese newspaper, text embedded in images, text detection
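
The abstract names loop detection plus region coverage but this page does not give the procedure. The following is a minimal sketch of one plausible reading, not the authors' implementation: run a detector, paint over the boxes it found, and run it again on the covered image until no new boxes appear. The names `loop_detect`, `cover_boxes`, `toy_detect`, the `(x0, y0, x1, y1)` box convention, the white fill value, and the `max_rounds` limit are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive (assumed convention)
Image = List[List[int]]          # grayscale image as rows of pixel values


def cover_boxes(image: Image, boxes: List[Box], fill: int = 255) -> Image:
    """Return a copy of the image with every box painted over in `fill`."""
    covered = [row[:] for row in image]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(y0, 0), min(y1, len(covered))):
            row = covered[y]
            for x in range(max(x0, 0), min(x1, len(row))):
                row[x] = fill
    return covered


def loop_detect(image: Image,
                detect: Callable[[Image], List[Box]],
                max_rounds: int = 5) -> List[Box]:
    """Detect, cover what was found, and detect again until nothing new appears."""
    found: List[Box] = []
    current = image
    for _ in range(max_rounds):
        new_boxes = [b for b in detect(current) if b not in found]
        if not new_boxes:
            break
        found.extend(new_boxes)
        current = cover_boxes(current, new_boxes)
    return found


def toy_detect(image: Image) -> List[Box]:
    """Hypothetical stand-in detector: reports only the leftmost run of dark columns."""
    height, width = len(image), len(image[0])
    dark = [x for x in range(width)
            if any(image[y][x] < 128 for y in range(height))]
    if not dark:
        return []
    end = dark[0]
    for x in dark[1:]:
        if x != end + 1:
            break
        end = x
    return [(dark[0], 0, end + 1, height)]


# A 4x10 white image with two dark "text" blocks in rows 1 and 2.
demo_image: Image = [[255] * 10 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2, 6, 7, 8):
        demo_image[y][x] = 0

print(loop_detect(demo_image, toy_detect))  # → [(1, 0, 3, 4), (6, 0, 9, 4)]
```

The toy detector deliberately misses the second block on the first pass, so the example shows how covering already-detected regions can let a later pass pick up text that was missed earlier. A real detector would replace `toy_detect`.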

CLC Number:

  • TP391

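Fig. 1 Front-end network architecture of the model

Fig. 2 Plots of the two binarization formulas

Fig. 3 Average values of the metrics under different thresholds

Fig. 4 Examples of detection results

Fig. 5 Segmentation instances at different segmentation sizes

Fig. 6 Process of generating text boxes

Fig. 7 Image after covering part of the text regions

Fig. 8 Process of loop detection plus region coverage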

Table 1

Model                        Recall  Precision  F-score
Without surrounding filling  0.864   0.855      0.859
With surrounding filling     0.899   0.912      0.906

Table 2

Model                Recall  Precision  F-score
Without this method  0.870   0.902      0.886
With this method     0.899   0.912      0.906

Table 3 Evaluation results on the traditional Chinese newspaper text detection dataset

Model              Recall  Precision  F-score
MSER+SWT           0.558   0.719      0.628
CRAFT              0.581   0.343      0.431
FCENet             0.654   0.656      0.655
DBNet+MobileNetv3  0.900   0.730      0.805
DBNet+Resnet18     0.873   0.737      0.799
DBNet+Resnet50     0.840   0.836      0.838
This paper         0.899   0.912      0.906
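
The page does not state how the F-score is computed. A minimal sketch, assuming it is the standard harmonic mean of Recall and Precision, recomputes the F-score column of Table 3 from the other two columns. The function names and the `TABLE_3` dictionary are illustrative; the numbers are copied from the table.

```python
def f_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision; 0.0 when both are zero."""
    total = recall + precision
    return 0.0 if total == 0 else 2 * recall * precision / total


# (Recall, Precision, reported F-score) per model, copied from Table 3.
TABLE_3 = {
    "MSER+SWT": (0.558, 0.719, 0.628),
    "CRAFT": (0.581, 0.343, 0.431),
    "FCENet": (0.654, 0.656, 0.655),
    "DBNet+MobileNetv3": (0.900, 0.730, 0.805),
    "DBNet+Resnet18": (0.873, 0.737, 0.799),
    "DBNet+Resnet50": (0.840, 0.836, 0.838),
    "This paper": (0.899, 0.912, 0.906),
}


def max_deviation(rows: dict) -> float:
    """Largest gap between the recomputed and the reported F-score."""
    return max(abs(f_score(r, p) - f) for r, p, f in rows.values())


for model, (recall, precision, reported) in TABLE_3.items():
    print(f"{model:<18} recomputed={f_score(recall, precision):.3f} reported={reported:.3f}")
```

Under this assumption every recomputed value agrees with the reported one to within 0.002. The small gaps would be consistent with the reported figures being rounded or averaged per image before reporting.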
