Monitoring and Management System for High-Performance Clusters Based on Enterprise WeChat

Journal of Jilin University (Information Science Edition), 2023, Vol. 41, Issue 2: 381-386.

冯伟 (FENG Wei), 姜远飞 (JIANG Yuanfei)

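  1. (Institute of Atomic and Molecular Physics, Jilin University, Changchun 130012)
  • Received: 2022-06-16 Online: 2023-04-13 Published: 2023-04-17
  • Corresponding author: JIANG Yuanfei (born 1980), male, from Nong'an, Jilin; senior engineer at Jilin University; mainly engaged in research on experimental atomic and molecular physics. Tel: 86-13578681787; E-mail: jiangyuanfei@jlu.edu.cn
  • About the author: FENG Wei (born 1976), female, from Baishan, Jilin; senior engineer at Jilin University; mainly engaged in high-performance cluster management and molecular dynamics simulation research. Tel: 86-18843162547; E-mail: fengw@jlu.edu.cn
  • Funding:
    Supported by the Jilin University 2019 Experimental Technology Fund (11974136)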

 Monitoring Management System of High-Performance Computing Cluster Based on Enterprise-WeChat

 FENG Wei, JIANG Yuanfei   

  1. (Institute of Atomic and Molecular Physics, Jilin University, Changchun 130012, China)
  • Received:2022-06-16 Online:2023-04-13 Published:2023-04-17

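Abstract (translated from the Chinese): In high-performance cluster monitoring and management, anomaly monitoring is limited by time and place, so cluster administrators cannot detect cluster anomalies promptly, which affects the normal operation of the cluster system. To solve these problems, the open functions and message delivery mechanism of Enterprise WeChat were combined with Linux (GNU/Linux) cluster monitoring and management methods to develop a cluster monitoring and management system that is suited to small and medium-sized clusters, simple to use, and highly extensible, and that presents warning information on mobile phones. The system requirements, system framework and functional design, technical framework and data flow, and the specific process of system deployment and development are described. The system has been fully developed and is used in the daily cluster management of the Institute of Atomic and Molecular Physics, Jilin University. Without logging in to cluster nodes, cluster administrators and users can monitor the software and hardware performance of the cluster and the completion status of jobs through a mobile app, which makes timely follow-up work easier. In particular, during the pandemic, when staff worked from home and cluster access was inconvenient, this function supported the efficient conduct of research at the Institute of Atomic and Molecular Physics, Jilin University.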

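Keywords (translated from the Chinese): Enterprise WeChat, high-performance computing cluster, performance monitoring, job management, message delivery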

Abstract: In high-performance cluster monitoring and management, anomaly monitoring is restricted by time and place, so cluster administrators may fail to detect abnormal conditions in time, which affects the normal operation of the cluster. To address this problem, the open functions and message transmission mechanism of Enterprise WeChat are combined with the cluster monitoring and management methods of the Linux (GNU/Linux) operating system to develop a simple, easy-to-use cluster monitoring and management system. The system is suitable for small and medium-sized clusters and is easy to extend. We describe the system requirements, the system framework and function design, the technical framework and data flow, and the specific process of system deployment and development. The system has been developed and is applied in the daily cluster management of the Institute of Atomic and Molecular Physics of Jilin University, where it has achieved good results. Cluster administrators and users can monitor cluster performance and job completion status through an app on a mobile phone without logging in to the cluster, which facilitates timely follow-up work. In particular, during the COVID-19 period, when access to the cluster was inconvenient, this function supported efficient scientific research at the institute.

Key words: enterprise-WeChat, high-performance computing cluster (HPCC), performance monitoring, job management, message passing

CLC number:

  • TP393
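
The alerting path outlined in the abstract (collect node metrics on Linux, then push a warning to a phone through Enterprise WeChat) might look roughly like the sketch below. It is not the authors' implementation. The thresholds, the function names, and the choice of a caller-supplied webhook URL are all assumptions made for illustration, and the text-message payload shape reflects our understanding of the Enterprise WeChat message format rather than anything stated in the paper.

```python
import json
import os
import shutil
import urllib.request

# Hypothetical thresholds; the paper does not state the values it uses.
LOAD_PER_CORE_LIMIT = 1.5   # 1-minute load average divided by core count
DISK_USED_LIMIT = 0.90      # fraction of the filesystem in use


def check_node(load1, cores, disk_used_fraction):
    """Return a list of human-readable warnings for one node."""
    warnings = []
    if cores > 0 and load1 / cores > LOAD_PER_CORE_LIMIT:
        warnings.append(f"load {load1:.2f} on {cores} cores")
    if disk_used_fraction > DISK_USED_LIMIT:
        warnings.append(f"disk {disk_used_fraction:.0%} full")
    return warnings


def build_text_message(hostname, warnings):
    """Build the JSON body of a plain-text message (assumed payload shape)."""
    content = f"[{hostname}] " + "; ".join(warnings)
    return {"msgtype": "text", "text": {"content": content}}


def local_metrics(path="/"):
    """Read the 1-minute load, core count, and disk usage of this machine."""
    load1, _, _ = os.getloadavg()
    usage = shutil.disk_usage(path)
    return load1, os.cpu_count() or 1, usage.used / usage.total


def send(webhook_url, message):
    """POST the message as JSON; webhook_url is supplied by the caller."""
    data = json.dumps(message).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

A periodic job (for example, one started by cron) could call `local_metrics()`, pass the result to `check_node()`, and call `send()` only when the warning list is non-empty, so that phones receive messages only for actual anomalies. Keeping the threshold check and the payload construction separate from the network call makes both easy to test offline.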