Journal of Jilin University(Engineering and Technology Edition) ›› 2021, Vol. 51 ›› Issue (3): 1011-1016. doi: 10.13229/j.cnki.jdxbgxb20200113

   

Geospatial data extraction algorithm based on machine learning

Xiao-long ZHU1,2, Zhong XIE1

  1. School of Geography and Information Engineering, China University of Geosciences (Wuhan), Wuhan 430074, China
  2. College of Geoscience, Yangtze University, Wuhan 430100, China
  • Received:2020-02-27 Online:2021-05-01 Published:2021-05-07
  • Contact: Zhong XIE E-mail:zxlong0224@tom.com

Abstract:

To improve the accuracy and recall of integrated geospatial data extraction, a geospatial data extraction algorithm based on machine learning is proposed. GeoNames, OpenStreetMap, and similar services are used as the sources of geographic information. A web crawler and a search engine are used together to search for and download relevant web pages, whose content is then filtered. From the filtered pages, place names, addresses, and other data are parsed, extracted, and visualized. The extracted geographic data entities are then analyzed: the mapping between geographic data and entities is used to resolve ambiguity among heterogeneous geographic data and so integrate the geospatial data, and the geographic data are converted into numerical features by computing similarity over multiple features such as entity name and category. By combining these features with the K-nearest neighbor (KNN) classification method from machine learning, the proposed algorithm links geographic data automatically and classifies and extracts geospatial data. The experimental results show that the proposed algorithm achieves high precision and recall and extracts data effectively, which lays a foundation for the integrated extraction of geographic data.
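The linking step described above (multi-feature similarity followed by KNN classification) can be illustrated with a minimal sketch. The abstract does not give the exact feature set, similarity measures, training data, or value of K, so everything below is an assumption made for illustration: `difflib.SequenceMatcher` stands in for the name similarity, an exact-match test stands in for the category similarity, and the labelled feature vectors in `train` are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import dist


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity of two place names, in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def category_similarity(a: str, b: str) -> float:
    """1.0 when the two category labels agree, else 0.0 (assumed measure)."""
    return 1.0 if a.lower() == b.lower() else 0.0


def pair_features(e1: dict, e2: dict) -> tuple:
    """Turn a pair of entity records into a numeric feature vector."""
    return (
        name_similarity(e1["name"], e2["name"]),
        category_similarity(e1["category"], e2["category"]),
    )


def knn_predict(train: list, query: tuple, k: int = 3) -> str:
    """Majority vote among the k training vectors closest to `query`."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled pairs: (name similarity, category similarity) -> label.
train = [
    ((0.95, 1.0), "same"),
    ((0.90, 1.0), "same"),
    ((0.85, 0.0), "same"),
    ((0.30, 0.0), "different"),
    ((0.20, 1.0), "different"),
    ((0.40, 0.0), "different"),
]

# Hypothetical records, e.g. one from GeoNames and one from OpenStreetMap.
record_a = {"name": "Wuhan", "category": "city"}
record_b = {"name": "Wuhan City", "category": "city"}
record_c = {"name": "Yangtze River", "category": "river"}

print(knn_predict(train, pair_features(record_a, record_b)))
print(knn_predict(train, pair_features(record_a, record_c)))
```

In this sketch each candidate pair of records is reduced to a small similarity vector, and the classifier decides whether the two records refer to the same real-world entity. Additional features, such as the distance between coordinates, would be added to `pair_features` in the same way.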

Key words: computer application, machine learning, geospatial data, extraction algorithm

CLC Number: 

  • P208

Fig.1 Overview of geospatial data integration and extraction

Fig.2 Flow chart of extracting geographic data from web pages

Fig.3 Schematic diagram of the definition relationships of the common geographic data ontology

Fig.4 Schematic diagram of the operation procedure of the geographic data link method

Fig.5 Effect of geospatial data extraction based on machine learning
