Journal of Jilin University (Earth Science Edition) ›› 2019, Vol. 49 ›› Issue (2): 611-620. doi: 10.13278/j.cnki.jjuese.20180016

• Geo-exploration and Information Technology •

Application of Support Vector Machine Based on Decision Tree Feature Extraction in Lithology Classification

Han Qidi1, Zhang Xiaotong2, Shen Wei1

  1. School of Earth Sciences and Resources, China University of Geosciences (Beijing), Beijing 100083, China;
  2. Data Center, China Land Surveying and Planning Institute, Beijing 100035, China
  • Received: 2018-01-23 Online: 2019-03-26 Published: 2019-03-28
  • About the author: Han Qidi (born 1989), male, PhD candidate, mainly engaged in research on nonlinearity and machine learning, E-mail: qidihan@163.com
  • Supported by:
    National Natural Science Foundation of China (41172302, 40672196)

Abstract: The support vector machine (SVM) is a black-box model, so features cannot be selected directly while the model is being trained, whereas a decision tree has an inherent ability to select features as it is built recursively. For lithology classification, we combine the two models: a decision tree is built first, features are then extracted according to the height of the tree nodes while taking feature importance into account, and the features with higher classification ability are fed into the SVM for lithology classification. The results show that feature extraction by the decision tree reduces the number of input features of the SVM, which effectively controls model complexity and makes the model more stable and more accurate; the test-set accuracy improves by more than 10%.

Key words: support vector machine, decision tree, feature extraction, lithology classification
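
The feature-extraction step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the exact selection criterion, so the sketch assumes that a feature is kept when it first splits a node at or above a chosen depth in a small Gini-based, CART-style tree. The function names, the `max_node_depth` parameter, and the toy data (three made-up feature columns, three classes) are all hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (feature, threshold, gain) of the best binary split, or None."""
    best, parent = None, gini(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between neighbouring values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            gain = parent - weighted
            if gain > 1e-12 and (best is None or gain > best[2]):
                best = (f, t, gain)
    return best

def first_split_depths(X, y, depth=0, max_depth=4, found=None):
    """Grow the tree and record the shallowest depth at which each feature splits."""
    found = {} if found is None else found
    if depth >= max_depth or len(set(y)) < 2:
        return found
    split = best_split(X, y)
    if split is None:
        return found
    f, t, _ = split
    found[f] = min(found.get(f, depth), depth)
    for keep in (lambda v: v <= t, lambda v: v > t):
        rows = [(row, lab) for row, lab in zip(X, y) if keep(row[f])]
        first_split_depths([r for r, _ in rows], [l for _, l in rows],
                           depth + 1, max_depth, found)
    return found

def select_features(X, y, max_node_depth=1):
    """Assumed criterion: keep features that split a node no deeper than max_node_depth."""
    depths = first_split_depths(X, y)
    return sorted(f for f, d in depths.items() if d <= max_node_depth)

# Hypothetical toy data: column 0 separates class A, column 1 separates B from C,
# column 2 carries little class information.
X_toy = [
    [1.0, 5.0, 0.3], [1.2, 1.0, 0.9], [0.9, 3.0, 0.1],
    [3.0, 1.0, 0.8], [3.2, 1.5, 0.2], [2.9, 1.2, 0.5],
    [3.1, 4.0, 0.4], [3.3, 4.5, 0.7], [2.8, 5.0, 0.6],
]
y_toy = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

selected = select_features(X_toy, y_toy, max_node_depth=1)
X_reduced = [[row[f] for f in selected] for row in X_toy]  # input for the SVM stage
```

In a full pipeline, the reduced columns in `X_reduced` would then be passed to an SVM classifier; an off-the-shelf implementation such as scikit-learn's `SVC` could be used for that stage, although the abstract does not say which software the authors used.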

CLC number:

  • P58