Journal of Jilin University (Information Science Edition) ›› 2022, Vol. 40 ›› Issue (6): 1039-1044.

Transformer-Based Document Secrecy-Level Classification System for Power Grid Enterprises

董 添, 李 广, 杨振宇, 张 博, 于 波, 王 巍   

  1. Party Committee Office, State Grid Jilin Electric Power Co., Ltd., Changchun 130021
  • Received: 2021-11-08 Publication date: 2022-12-09 Release date: 2022-12-10
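  • About the author: DONG Tian (born 1986), male, from Changchun, senior engineer at State Grid Jilin Electric Power Co., Ltd.; research interests include pattern recognition and artificial intelligence. (Tel) 86-13154392086, (E-mail) 2414602110@qq.com.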
  • Funding:
    Science and Technology Fund Project of State Grid Jilin Company (522342210001)

Annotation System of File Secrecy for Power Grid Enterprises Based on Transformer

DONG Tian, LI Guang, YANG Zhenyu, ZHANG Bo, YU Bo, WANG Wei   

  1. General Committee Office, State Grid Jilin Electric Power Supply Company, Changchun 130021, China
  • Received:2021-11-08 Online:2022-12-09 Published:2022-12-10

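Abstract: Manual labeling of document secrecy levels by confidentiality staff depends on the professional competence of those staff and is prone to inaccurate labeling. To address this problem, an enterprise document secrecy-level classification system based on the Transformer model is established. The system automatically extracts feature representations of a text's secrecy-level information and provides intelligent decision support for assigning secrecy levels to enterprise secret documents. The proposed model was validated experimentally on a dataset built from internal core commercial-secret documents, ordinary commercial-secret documents, and non-secret documents of State Grid Jilin Electric Power Co., Ltd., reaching an accuracy of 97.37% and a recall of 98.67%. These results indicate that the model achieves strong recognition performance, and the system can therefore effectively prevent the leakage of secret documents.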

Keywords: secrecy-level classification, deep learning, self-attention network, word embedding, enterprise secrets

Abstract: At present, State Grid Jilin Electric Power Co., Ltd. relies on confidentiality personnel to manually mark the confidentiality level of documents. The accuracy of this marking depends on the professional competence of the personnel involved, which can easily lead to inaccurate labeling. We therefore establish an enterprise document security classification system based on the Transformer model, which automatically extracts feature representations of text security information and intelligently assists secrecy-level decisions for enterprise secret documents. The proposed model is trained and tested on a dataset constructed from internal core commercial-secret files, ordinary commercial-secret files, and non-secret files of State Grid Jilin Electric Power Co., Ltd. The accuracy is 97.37% and the recall is 98.67%. The results show that the model achieves high recognition performance and can effectively prevent the disclosure of secret files.

Key words: security classification, deep learning, self-attention network, word embedding, enterprise secrets
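
The abstract reports accuracy and recall for a three-class problem (core commercial-secret, ordinary commercial-secret, non-secret) but does not say how recall is aggregated across the classes. As a minimal sketch, assuming macro-averaged recall, the two metrics could be computed as follows. The label names, the helper functions, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical label names for the three document classes named in the abstract.
LABELS = ("core", "ordinary", "non_secret")


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted level matches the true level."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred, labels=LABELS):
    """Per class: correctly predicted documents of that class / all documents of that class."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Classes with no true examples are skipped to avoid dividing by zero.
    return {c: hits[c] / support[c] for c in labels if support[c] > 0}


def macro_recall(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class recall (an assumed aggregation, not the paper's)."""
    recalls = per_class_recall(y_true, y_pred, labels)
    return sum(recalls.values()) / len(recalls)


# Toy data, invented purely for illustration.
y_true = ["core", "core", "ordinary", "ordinary", "non_secret", "non_secret"]
y_pred = ["core", "ordinary", "ordinary", "ordinary", "non_secret", "non_secret"]

print(accuracy(y_true, y_pred))
print(per_class_recall(y_true, y_pred))
print(macro_recall(y_true, y_pred))
```

In a secrecy-labeling setting, per-class recall on the secret classes is arguably the more telling figure, since a missed secret document is the costly error. That is why the sketch exposes the per-class values rather than only the average.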

Chinese Library Classification (CLC) number:

  • TP305