Parameter Optimization of the Text Feature Vector of a Naive Bayesian Classifier
FANG Qiulian, WANG Peijin, SUI Yang, ZHENG Hanying, LV Chunyue, WANG Yantong
Journal of Jilin University Science Edition, 2019, 57(06): 1479-1485.
The Naive Bayesian algorithm was used to build an automatic Chinese text classifier, and the selection of the relevant parameters was studied to realize efficient classification of Chinese text. Firstly, in the model training stage, an N-gram model was used to extract feature vectors from the training data sets. Secondly, the Naive Bayesian algorithm was used to build a text classifier. Finally, in the model testing stage, in order to improve classification accuracy, the term frequency-inverse document frequency (TF-IDF) algorithm was used to extract feature vectors from the test samples. The results show that when extracting feature vectors from the training sets, the 2-gram and 4-gram models give the best feature extraction; when selecting the length of the feature vector, a length of 25 000 yields the greatest increment in classification accuracy while ensuring a high accuracy; and when determining the part of speech of the feature items, accuracy is highest when both verbs and nouns are selected, and lowest when only verbs are selected.
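The pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy English corpus, the labels, and all function names are invented for demonstration; the paper works on Chinese text with its own data sets. The sketch extracts character 2-grams at training time, trains a multinomial Naive Bayes model with Laplace smoothing, and applies TF-IDF weighting to the test document's n-grams before scoring, mirroring the three stages in the abstract.

```python
import math
from collections import Counter, defaultdict

def ngrams(text, n=2):
    """Character n-grams; the paper reports 2-grams and 4-grams work best."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# Toy training corpus (illustrative only; not the paper's data sets).
train = [("the football team won the match", "sports"),
         ("stock prices and market rates", "finance"),
         ("football season and match results", "sports"),
         ("bank interest rates and stock funds", "finance")]

# --- Training stage: per-class n-gram counts for Naive Bayes ---
class_ngram_counts = defaultdict(Counter)
class_doc_counts = Counter()
doc_freq = Counter()  # number of training docs containing each n-gram
for text, label in train:
    grams = ngrams(text)
    class_ngram_counts[label].update(grams)
    class_doc_counts[label] += 1
    doc_freq.update(set(grams))

vocab = set(doc_freq)
n_docs = len(train)

def tfidf(grams):
    """TF-IDF weights for a test document's n-grams (the abstract applies
    TF-IDF weighting in the testing stage)."""
    tf = Counter(grams)
    return {g: tf[g] * math.log((n_docs + 1) / (doc_freq[g] + 1))
            for g in tf if g in vocab}

def classify(text):
    """Score each class by log prior plus TF-IDF-weighted log likelihoods
    (Laplace-smoothed) and return the best-scoring class."""
    weights = tfidf(ngrams(text))
    best, best_score = None, -math.inf
    for label, counts in class_ngram_counts.items():
        total = sum(counts.values())
        score = math.log(class_doc_counts[label] / n_docs)
        for g, w in weights.items():
            score += w * math.log((counts[g] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

print(classify("football match tonight"))
```

In a real system the feature vector would also be truncated to a fixed length (the paper finds 25 000 features a good trade-off) and filtered by part of speech (verbs and nouns together performing best); both steps are omitted here for brevity.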