
Journal of Jilin University (Earth Science Edition) ›› 2025, Vol. 55 ›› Issue (4): 1372-1386. doi: 10.13278/j.cnki.jjuese.20240116


Lithology Identification Using Extra Trees Based on SMOTE for Data Balancing

Cao Zhimin (1, 2), Zhang Li (1, 2), Zheng Bing (3), Han Jian (1, 2)

  1. Sanya Offshore Oil and Gas Research Institute, Northeast Petroleum University, Sanya 572000, Hainan, China

  2. School of Physics and Electronic Engineering, Northeast Petroleum University, Daqing 163318, Heilongjiang, China

  3. Hainan Engineering Research Center for Virtual Reality Technology and Systems, Hainan Vocational University of Science and Technology, Haikou 571126, China

  • Received: 2024-05-24 Online: 2025-07-26 Published: 2025-08-05
  • Supported by: the Hainan Province Science and Technology Special Fund (ZDYF2022GXJS220, ZDYF2022GXJS222)

Abstract: In oil and gas exploration and geoengineering, accurate lithology identification is essential for assessing and utilizing resources. The inherent complexity of geologic data and the imbalanced distribution of lithology samples pose significant challenges to traditional identification methods. In this paper, we propose a lithology identification method that combines SMOTE (synthetic minority over-sampling technique) with extra trees. First, SMOTE is used to increase the representation of minority-class samples, improving the balance of the training data. Second, the lithology classification model is built on extra trees, exploiting their high efficiency and strong generalization ability. Experiments show that extra trees achieves a recognition accuracy of 85.54%, which is 5.58%, 2.55%, 2.35%, and 2.08% higher than that of the gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and random forest methods, respectively. SMOTE sampling mitigates the prediction bias caused by sample imbalance, raising the recognition accuracy for specific lithology categories in every model and thereby improving overall model performance. After SMOTE balancing, the extra trees model again performs best, achieving an identification accuracy of 86.62%, an improvement of 4.71%, 2.56%, 1.55%, and 2.02% over GBDT, XGBoost, LightGBM, and random forest, respectively. These results confirm the effectiveness of combining SMOTE with extra trees for lithology identification.
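The balancing step described in the abstract can be illustrated with a minimal, standard-library sketch of SMOTE-style oversampling. It is not the authors' implementation: the function names `smote_oversample` and `balance`, the two-feature toy data, and the class labels are all assumptions made for illustration. Each synthetic sample is a random point on the segment between a minority-class sample and one of its k nearest neighbours from the same class. The classifier step is not shown; in practice the balanced data would be passed to an extra trees classifier.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between samples of ONE class.

    minority: list of feature vectors (lists of floats) that share a class label.
    """
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other samples of the same class, sorted by Euclidean distance to base.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        new_rows = smote_oversample(rows, target - len(rows), k=k, seed=seed)
        X_out.extend(new_rows)
        y_out.extend([label] * len(new_rows))
    return X_out, y_out


# Toy data, made up for illustration: two hypothetical log features per sample,
# with hypothetical class labels (6 majority samples, 2 minority samples).
X = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2], [1.3, 2.0], [1.0, 1.8],
     [5.0, 6.0], [5.5, 6.5]]
y = ["sandstone"] * 6 + ["mudstone"] * 2

X_bal, y_bal = balance(X, y, k=1)
```

Only the training split should be balanced this way, so that the test accuracy still reflects the original class distribution. Library implementations such as `SMOTE` in imbalanced-learn and `ExtraTreesClassifier` in scikit-learn cover both steps; whether the paper used them is not stated in the abstract.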

Key words: lithology identification, machine learning, random forest, extra trees, data balancing

CLC Number: P631.8