Journal of Jilin University (Engineering and Technology Edition) ›› 2013, Vol. 43 ›› Issue (01): 192-197.

Tumor gene selection based on double regularized support vector machine

QIN Chuan-dong1,2, LIU San-yang3

  1. School of Computer Science and Technology, Xidian University, Xi'an 710071, China;
  2. School of Information and Computation Science, Beifang University of Nationalities, Yinchuan 750021, China;
  3. College of Mathematical Science, Xidian University, Xi'an 710071, China
  • Received: 2011-07-26 Online: 2013-01-01 Published: 2013-01-01
  • Corresponding author: LIU San-yang (1959-), male, professor, doctoral supervisor. Research interests: optimization theory and applications. E-mail: liusanyang@263.net
  • About the author: QIN Chuan-dong (1976-), male, doctoral candidate. Research interests: pattern recognition, optimization theory and applications. E-mail: qinchuandong123@163.com
  • Supported by:

    National Natural Science Foundation of China (60974082); Fundamental Research Funds for the Central Universities (K50511700008).

Abstract: The standard L2-norm Support Vector Machine (SVM) and the L1-norm SVM each show strengths and weaknesses in the classification analysis of tumor genes. To address them, the Bhattacharyya distance is first used to eliminate genes that are unimportant for classification, leaving a small number of highly relevant, important feature genes. A Doubly Regularized Support Vector Machine (DRSVM) is then applied to DNA microarray classification. A quadratic polynomial loss function converts the constrained optimization problem into an unconstrained, differentiable one, which can be solved with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. Experimental results on two tumor gene data sets show that the method is feasible and effective for classifying tumor feature genes.

Key words: computer application, gene expression profiles, Bhattacharyya distance, doubly regularized support vector machine, quadratic polynomial loss function, BFGS algorithm
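
The two stages named in the abstract (Bhattacharyya-distance gene filtering, then a smoothed DRSVM solved by BFGS) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names, the per-gene Gaussian assumption behind the Bhattacharyya score, the particular piecewise-quadratic smoothing of the hinge loss, the square-root smoothing of the L1 term (added here only so that BFGS sees a differentiable objective), all parameter values, and the synthetic data are assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def bhattacharyya_scores(X, y):
    """Per-gene Bhattacharyya distance between classes +1 and -1.

    Assumes each gene is univariate Gaussian within a class.
    """
    a, b = X[y == 1], X[y == -1]
    m1, m2 = a.mean(axis=0), b.mean(axis=0)
    v1 = a.var(axis=0) + 1e-12  # small constant avoids division by zero
    v2 = b.var(axis=0) + 1e-12
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    var_term = 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2)))
    return mean_term + var_term


def smooth_plus(t, k):
    """Piecewise-quadratic, once-differentiable approximation of max(t, 0)."""
    mid = 0.25 * k * (t + 1.0 / k) ** 2
    return np.where(t >= 1.0 / k, t, np.where(t <= -1.0 / k, 0.0, mid))


def smooth_plus_grad(t, k):
    """Derivative of smooth_plus: 0 on the left, 1 on the right, linear between."""
    return np.clip(0.5 * k * t + 0.5, 0.0, 1.0)


def fit_drsvm(X, y, lam1=0.05, lam2=0.05, k=10.0, eps=1e-6):
    """Minimise smoothed hinge loss + lam1*|w|_1 (smoothed) + (lam2/2)*|w|_2^2."""
    n, d = X.shape

    def objective(theta):
        w, b = theta[:d], theta[d]
        t = 1.0 - y * (X @ w + b)        # hinge argument per sample
        r = np.sqrt(w ** 2 + eps)        # smooth stand-in for |w_j| (assumption)
        value = smooth_plus(t, k).mean() + lam1 * r.sum() + 0.5 * lam2 * (w @ w)
        gy = smooth_plus_grad(t, k) * y
        grad_w = -(X.T @ gy) / n + lam1 * w / r + lam2 * w
        grad_b = -gy.mean()
        return value, np.append(grad_w, grad_b)

    res = minimize(objective, np.zeros(d + 1), jac=True, method="BFGS")
    return res.x[:d], res.x[d]


# Synthetic stand-in for a microarray: 60 samples, 200 "genes",
# of which only the first 5 differ between the two classes.
rng = np.random.default_rng(0)
n, d = 60, 200
y = np.where(np.arange(n) < n // 2, 1.0, -1.0)
X = rng.normal(size=(n, d))
X[:, :5] += 1.5 * y[:, None]

keep = np.argsort(bhattacharyya_scores(X, y))[::-1][:20]  # top 20 genes
w, b = fit_drsvm(X[:, keep], y)
accuracy = np.mean(np.sign(X[:, keep] @ w + b) == y)      # training accuracy
```

Because the L1 term is only smoothed here, the fitted weights come out small rather than exactly zero; a thresholding step or a dedicated L1 solver would be needed to obtain a strictly sparse gene set.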

CLC number:

  • TP391