To identify selenium-rich soil quickly, efficiently, and accurately from data
that contain no selenium measurements, it is necessary to build the
best-performing model for predicting selenium-rich soil.
A total of 502 records were selected from 1 277 records of 1∶50 000 surface soil
geochemical data. With w(Zn), w(K2O), w(P), w(Mo), w(Mn), w(Cr), pH, and
D (Devonian) as independent variables and Se-rich or not as the dependent
variable, SPSS Modeler 18 was used to build a binary logistic regression model,
a multilayer perceptron neural network model, a random forest model, and
support vector machine models (linear, polynomial, radial basis function, and
sigmoid kernels) for predicting Se-rich soil, and the measured data of 35
soil samples were used for verification. The results show that the overall
prediction and verification accuracies of the seven models (binary logistic
regression; multilayer perceptron neural network; random forest; and support
vector machine with linear, polynomial, radial basis function, and sigmoid
kernels) were, respectively, 88.8% and 94.3%, 91.0% and 97.1%, 96.6% and
97.1%, 87.9% and 97.1%, 86.1% and 94.3%, 86.9% and 94.3%, and 80.3% and 91.4%.
The corresponding AUC values were 0.948, 0.950, 0.993, 0.937, 0.945, 0.928,
and 0.873. The
accuracy and stability of the random forest model were the best. The study also
identified clean selenium-rich soil and green selenium-rich mountain rice,
indicating that the method is feasible for predicting selenium-rich soil and
could be extended to geological prospecting and environmental monitoring.
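
The two evaluation measures reported above, overall accuracy and AUC, can be
illustrated with a minimal sketch. It is not the SPSS Modeler 18 workflow used
in the study: the function names, the toy labels and scores, and the 0.5
decision threshold are all assumptions made for illustration. The last lines
only check that the reported verification accuracies are consistent with 32,
33, or 34 correct classifications out of 35 samples; the actual counts are not
stated in the text.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the observed class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive sample receives a
    higher score than a random negative sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = Se-rich, 0 = not Se-rich.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]  # made-up model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

acc = overall_accuracy(y_true, y_pred)
area = auc(y_true, scores)
print(f"overall accuracy = {acc:.3f}, AUC = {area:.4f}")

# Percentages that 32, 33, and 34 correct out of 35 would produce.
verification_pct = [round(100 * k / 35, 1) for k in (32, 33, 34)]
print(verification_pct)
```

The pairwise AUC definition is quadratic in the number of samples, which is
acceptable for a few hundred records; a rank-based formula would be the usual
choice for larger data sets.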