Journal of Jilin University (Science Edition) ›› 2020, Vol. 58 ›› Issue (6): 1452-1460.

基于CRF和多元规则的层次化句法分析

杨陈菊, 孙俊, 皮乾东, 邵玉斌, 龙华   

  1. 昆明理工大学 信息工程与自动化学院, 昆明 650504
  • 出版日期:2020-11-18 发布日期:2020-11-26
  • Corresponding author: SUN Jun em.junsun@gmail.com

Hierarchical Parsing Based on CRF and Multiple Rules

YANG Chenju, SUN Jun, PI Qiandong, SHAO Yubin, LONG Hua   

  1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China
  • Online: 2020-11-18 Published: 2020-11-26

摘要: 针对句法分析中细粒度和粗粒度组块识别模型的冲突问题, 为解决句法分析中词语搭配规则多、减少搭配优先级变动的影响, 提出一种结合条件随机场(CRF)和多元规则的层次化句法分析模型. 先利用CRF算法识别细粒度语句的组块标记序列, 然后结合统计和多元规则识别粗粒度组块, 在识别出的组块中层层引入不同优先级的二元、三元规则. 该模型实现了同时进行细粒度和粗粒度组块的识别, 可更好地服务于句法分析. 在Chinese TreeBank8.0(CTB8.0)语料上采用5-折交叉验证, 结果表明, 相比于仅使用二元、 三元规则及使用CRF+二元规则的句法分析, 该模型的正确率分别约提高12%,3%,5%, 验证了该模型有效性和稳定性.

关键词: 层次句法分析, 条件随机场, 多元规则, 组块识别

Abstract: To address the conflict between fine-grained and coarse-grained chunk recognition models in parsing, and to cope with the large number of word collocation rules while reducing the impact of changes in collocation priority, we propose a hierarchical parsing model that combines conditional random fields (CRF) with multiple rules. The CRF algorithm is first used to identify the fine-grained chunk tag sequence of a sentence. Coarse-grained chunks are then identified by combining statistics with multiple rules, with binary and ternary rules of different priorities introduced layer by layer into the identified chunks. The model recognizes fine-grained and coarse-grained chunks at the same time and can therefore better serve parsing. In 5-fold cross-validation on the Chinese TreeBank 8.0 (CTB8.0) corpus, the model improves accuracy by about 12%, 3%, and 5% over parsing with binary rules only, with ternary rules only, and with CRF plus binary rules, respectively, which verifies the effectiveness and stability of the model.

Key words: hierarchical parsing, conditional random field, multiple rules, chunk recognition
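
The layered use of prioritized binary and ternary rules described in the abstract can be illustrated with a minimal sketch. It assumes the fine-grained chunk sequence has already been produced by a CRF tagger (not implemented here). The rule table, the priority values, the names `RULES`, `reduce_once`, `parse`, and `to_brackets`, and the strategy of restarting from the highest priority after each merge are all hypothetical choices made for illustration. They are not the rule set or the exact procedure of the paper.

```python
from typing import List, Tuple, Union

# A node is (label, payload): payload is a word for a leaf chunk,
# or a list of child nodes for a chunk built by a rule.
Node = Tuple[str, Union[str, list]]

# Hypothetical rule table: (priority, pattern, result label).
# A lower priority number is tried first. These entries are invented
# for illustration and are not the rules used in the paper.
RULES = [
    (1, ("JJ", "NP"), "NP"),        # binary rule
    (2, ("NP", "CC", "NP"), "NP"),  # ternary rule
    (2, ("VV", "NP"), "VP"),        # binary rule
    (3, ("NP", "VP"), "IP"),        # binary rule
]


def reduce_once(nodes: List[Node], priority: int) -> bool:
    """Apply the first matching rule of one priority, scanning left to right.

    Within a priority level, longer (ternary) patterns are tried before
    shorter (binary) ones. Returns True if a merge was performed.
    """
    rules = sorted((r for r in RULES if r[0] == priority), key=lambda r: -len(r[1]))
    for i in range(len(nodes)):
        for _, pattern, label in rules:
            span = nodes[i:i + len(pattern)]
            if tuple(n[0] for n in span) == pattern:
                nodes[i:i + len(pattern)] = [(label, span)]
                return True
    return False


def parse(chunks: List[Node]) -> List[Node]:
    """Merge chunks layer by layer until no rule applies.

    After every merge the search restarts at the highest priority.
    Each merge shortens the node list, so the loop always terminates.
    """
    nodes = list(chunks)
    priorities = sorted({r[0] for r in RULES})
    changed = True
    while changed:
        changed = False
        for p in priorities:
            if reduce_once(nodes, p):
                changed = True
                break
    return nodes


def to_brackets(node: Node) -> str:
    """Render a node as a bracketed string."""
    label, payload = node
    if isinstance(payload, str):
        return f"({label} {payload})"
    return f"({label} " + " ".join(to_brackets(c) for c in payload) + ")"


# Fine-grained chunks as a CRF tagger might output them (made-up example).
chunks = [("NP", "he"), ("VV", "likes"), ("JJ", "red"), ("NP", "apples")]
tree = parse(chunks)
print(to_brackets(tree[0]))
# (IP (NP he) (VP (VV likes) (NP (JJ red) (NP apples))))
```

In this sketch the priority controls which collocations are merged first: the adjective attaches to its noun before the verb takes its object, and the subject joins the verb phrase last. Changing one priority value changes the resulting bracketing, which is the kind of sensitivity the abstract refers to.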

CLC Number: 

  • TP391