J4 ›› 2009, Vol. 47 ›› Issue (6): 1237-1240.

• Computer Science •

Detection of Protein Domain Boundaries via Distance-based Maximal Entropy

ZOU Shuxue, LIU Guixia, SHI Xiaohu, ZHOU Chunguang

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received: 2009-02-23 Online: 2009-11-26 Published: 2010-01-07
  • Contact: ZHOU Chunguang E-mail: cgzhou@jlu.edu.cn

Abstract:

Protein domain boundary detection is formulated, for the first time, as an imbalanced data learning problem. A novel undersampling method is proposed: negative-class samples are selected in the feature space of a support vector machine according to the distance-based maximal entropy they have with respect to the positive-class samples. Using proteins selected from a protein domain database as experimental data, the support vector machine learning system reaches an average prediction accuracy of about 80%, with high sensitivity and specificity.

Key words: protein domain boundaries, support vector machine, imbalanced data learning, distance-based maximal entropy
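
The abstract does not give the sampling procedure in detail, so the following is only a minimal sketch of what entropy-guided undersampling in a kernel feature space could look like, not the authors' implementation. It relies on the standard identity d(x, z)^2 = K(x, x) - 2K(x, z) + K(z, z) for distances in the feature space induced by a kernel K. Everything else is an assumption made for illustration: the RBF kernel and its `gamma` value, the definition of a negative sample's entropy as the Shannon entropy of its normalized distances to the positive samples, the rule of keeping the highest-entropy negatives, and all function names.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2). The kernel choice is an assumption."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq)


def feature_space_distance(x, z, gamma=0.5):
    """Distance in the kernel-induced feature space: sqrt(K(x,x) - 2K(x,z) + K(z,z))."""
    d2 = rbf_kernel(x, x, gamma) - 2.0 * rbf_kernel(x, z, gamma) + rbf_kernel(z, z, gamma)
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors


def distance_entropy(x, positives, gamma=0.5):
    """Shannon entropy of x's normalized distances to the positives (assumed definition)."""
    dists = [feature_space_distance(x, p, gamma) for p in positives]
    total = sum(dists)
    if total == 0.0:
        return 0.0
    probs = [d / total for d in dists]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def undersample_negatives(negatives, positives, n_keep, gamma=0.5):
    """Return sorted indices of the n_keep negatives with the highest distance entropy."""
    scored = [(distance_entropy(x, positives, gamma), i) for i, x in enumerate(negatives)]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest entropy first, index breaks ties
    return sorted(i for _, i in scored[:n_keep])


# Toy data (hypothetical): three positive and four negative points in 2-D.
positives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
negatives = [(0.5, 0.5), (3.0, 3.0), (0.1, 0.0), (1.0, 1.0)]
kept = undersample_negatives(negatives, positives, n_keep=2)
print(kept)
```

Under this assumed scoring, a negative that is roughly equidistant from all positives receives a high entropy. That includes points far from every positive, because RBF feature-space distances saturate near sqrt(2), so a practical system would likely need an additional criterion. The balanced subset returned here would then be passed, together with the positives, to a support vector machine for training.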
