Journal of Jilin University (Information Science Edition)


Data Mining Parallel Algorithm Based on G4ICCS

LIU Wei(a,b), LU Lai-jun(c), WANG Hong-xiao(b), CAO Yan-bo(b)

  1. a. Mineral Prediction Institute, Jilin University, Changchun 130026, China; b. Center for Computer Fundamental Education, Jilin University, Changchun 130012, China; c. College of Earth Sciences, Jilin University, Changchun 130026, China
  • Received: 2013-02-28 Online: 2013-05-27 Published: 2013-06-07
  • About the authors: LIU Wei (born 1977), male, from Zhaozhou, Heilongjiang; associate professor and doctoral candidate at Jilin University; research interests: spatial data integration and system integration, cloud computing and simulation for the geoscience G4I system; (Tel) 86-13504423383, (E-mail) liuwei@jlu.edu.cn. Corresponding author: LU Lai-jun (born 1956), male, from Dehui, Jilin; professor and doctoral supervisor at Jilin University; research interests: digital geoscience, geographic information systems, and geoscience spatial information technology; (Tel) 86-18604402821, (E-mail) lulj1956@163.com
  • Supported by:

    Forecast Fund Project of the Jilin Province "Twelfth Five-Year" Mineral Resources Plan (3R212H104422)

Abstract:

The traditional decision tree algorithm SPRINT (Scalable Parallelizable Induction of Decision Trees) cannot handle data mining over massive geoscience datasets. To address this problem, this paper designs and implements PSPRINT, a parallel decision tree classification algorithm based on G4ICCS (Geology Geography Geochemistry Geophysics Information Cloud Computing System). The algorithm uses a hash table to store the data records on either side of the split point of a continuous attribute, which provides the basis for splitting nodes in parallel, and it solves the massive geoscience data mining problem under the MapReduce architecture. Experimental results show that, in a simulated cloud computing environment, the parallel decision tree algorithm can handle the classification of massive geoscience data with good stability and high processing speed.

Key words: geology geography geochemistry geophysics information cloud computing system (G4ICCS), data mining, decision tree algorithm, parallel

CLC Number:

  • TP312
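The abstract's hash-table step can be illustrated with a small single-process sketch. It is not the paper's PSPRINT implementation, which the abstract does not give. It follows the general SPRINT idea: scan a sorted list for a continuous attribute, pick the split point with the lowest weighted Gini index, then record in a hash table which side of the split each record falls on, so that the other attribute lists can be partitioned by record id. All function names, the sample records, and the attribute names (`grade`, `depth`) are hypothetical.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class-count mapping over `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(attr_list):
    """Scan a sorted attribute list of (value, label, rid) tuples.

    Returns (threshold, weighted_gini) for the best binary split
    `value <= threshold`, or None if all values are equal.
    """
    n = len(attr_list)
    below = Counter()
    above = Counter(label for _, label, _ in attr_list)
    best = None
    for i in range(n - 1):
        value, label, _ = attr_list[i]
        below[label] += 1
        above[label] -= 1
        next_value = attr_list[i + 1][0]
        if value == next_value:
            continue  # no split point between equal values
        n_below = i + 1
        n_above = n - n_below
        score = (n_below * gini(below, n_below)
                 + n_above * gini(above, n_above)) / n
        if best is None or score < best[1]:
            best = ((value + next_value) / 2, score)
    return best


def build_side_table(attr_list, threshold):
    """Hash table mapping record id -> 'L' or 'R' side of the split point."""
    return {rid: ("L" if value <= threshold else "R")
            for value, _, rid in attr_list}


def partition(attr_list, side_table):
    """Split any attribute list by probing the hash table with record ids."""
    left = [e for e in attr_list if side_table[e[2]] == "L"]
    right = [e for e in attr_list if side_table[e[2]] == "R"]
    return left, right


# Hypothetical sample: (rid, grade, depth, label)
records = [
    (0, 0.2, 35.0, "barren"),
    (1, 0.4, 12.0, "barren"),
    (2, 0.5, 50.0, "barren"),
    (3, 1.1, 20.0, "ore"),
    (4, 1.3, 41.0, "ore"),
    (5, 1.8, 18.0, "ore"),
]
grade_list = sorted((g, label, rid) for rid, g, d, label in records)
depth_list = sorted((d, label, rid) for rid, g, d, label in records)

threshold, score = best_split(grade_list)
side_table = build_side_table(grade_list, threshold)
left, right = partition(depth_list, side_table)
```

Here the split is chosen on `grade`, and the `depth` list is partitioned purely by looking up record ids in `side_table`, without re-reading the `grade` values. In a distributed setting, a table of this kind is what would let each worker split its own attribute lists consistently; how PSPRINT distributes that work under MapReduce is not specified in the abstract.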