Journal of Jilin University (Information Science Edition) ›› 2014, Vol. 32 ›› Issue (5): 550-555.

• Article •

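Multi-Stages Informative Gene Set Selection Algorithm in Microarray Expression Profiles

WEN Liangjun, ZHENG Hong

  1. College of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China
  • Received: 2014-01-14 Online: 2014-09-26 Published: 2014-12-26
  • About the authors: WEN Liangjun (born 1984), male, from Guyuan, Ningxia, master's student at Changchun University of Technology, research interests: search engines and intelligent systems, (Tel) 86-13610742712 (E-mail) wenliangjun_2008@126.com; corresponding author: ZHENG Hong (born 1974), female, from Changchun, associate professor at Changchun University of Technology, Ph.D., master's supervisor, research interests: intelligent computing and search engines, (Tel) 86-13039301323 (E-mail) zhenghong@mail.ccut.edu.cn.
  • Supported by:

    Natural Science Foundation of the Science and Technology Department of Jilin Province (20130101060JC); "Twelfth Five-Year Plan" Science and Technology Research Foundation of the Education Department of Jilin Province (2014132; 2014125)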

Abstract:

Microarray data have few samples and very high dimensionality per sample, and therefore carry a large amount of interfering and redundant information. To address this problem, this paper selects and optimizes the informative gene set in stages. First, the difference of each single gene across conditions is considered, and genes that differ strongly only under specific conditions are selected to form a candidate feature set. Then, genes with low correlation are removed from the candidate set. Finally, a genetic algorithm evaluates the overall classification performance of arbitrary subsets of the resulting feature set and selects the better subsets. Experimental results show that the algorithm is feasible and effective for selecting informative genes step by step, and that the selected gene set outperforms the raw data in both classification fitness (a measure of classification capability) and classification accuracy.

Key words: microarray expression profiles, informative gene, correlation, genetic algorithm
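The three stages named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the signal-to-noise style differential score, the reading of "correlation" as correlation with the class label, the thresholds, the nearest-centroid classifier used as the fitness measure, and the particular genetic operators. It assumes a two-class problem with labels 0 and 1.

```python
import numpy as np

def differential_scores(X, y):
    """Stage 1 score (assumed): |difference of class means| / sum of class stds."""
    a, b = X[y == 0], X[y == 1]
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + 1e-12)

def label_correlation(X, y):
    """Stage 2 score (assumed): absolute Pearson correlation of each gene with the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(Xc.T @ yc) / denom

def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed fitness measure)."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        Xt, yt = X[keep], y[keep]
        c0, c1 = Xt[yt == 0].mean(axis=0), Xt[yt == 1].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        hits += int(pred == y[i])
    return hits / len(y)

def ga_select(X, y, rng, pop_size=20, generations=15, p_mut=0.05):
    """Stage 3: a small genetic algorithm over boolean gene masks."""
    n = X.shape[1]

    def fitness(mask):
        return loo_accuracy(X[:, mask], y) if mask.any() else 0.0

    pop = rng.random((pop_size, n)) < 0.5
    best_mask, best_fit = None, -1.0
    for _ in range(generations):
        fits = np.array([fitness(m) for m in pop])
        if fits.max() > best_fit:
            best_fit, best_mask = float(fits.max()), pop[fits.argmax()].copy()
        # Binary tournament selection: the fitter of two random individuals wins.
        i, j = rng.integers(0, pop_size, (2, pop_size))
        parents = np.where((fits[i] >= fits[j])[:, None], pop[i], pop[j])
        # Uniform crossover between each parent and its neighbour.
        mates = np.roll(parents, 1, axis=0)
        children = np.where(rng.random((pop_size, n)) < 0.5, parents, mates)
        # Bit-flip mutation.
        children ^= rng.random((pop_size, n)) < p_mut
        pop = children
    return best_mask, best_fit

def select_genes(X, y, n_candidates=50, min_corr=0.3, seed=0):
    """Run the three stages; returns selected gene indices and their fitness."""
    rng = np.random.default_rng(seed)
    # Stage 1: keep the genes with the largest between-condition difference.
    cand = np.argsort(differential_scores(X, y))[::-1][:n_candidates]
    # Stage 2: drop candidates with low correlation (threshold is hypothetical).
    cand = cand[label_correlation(X[:, cand], y) >= min_corr]
    # Stage 3: search subsets of the remaining genes with the genetic algorithm.
    mask, fit = ga_select(X[:, cand], y, rng)
    return cand[mask], fit

if __name__ == "__main__":
    # Synthetic data: 40 samples, 200 genes, the first 5 shifted in class 1.
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 20)
    X = rng.normal(size=(40, 200))
    X[y == 1, :5] += 2.0
    genes, fit = select_genes(X, y)
    print(sorted(genes.tolist()), fit)
```

Computing the fitness on the same samples used for the filtering stages tends to give optimistic values; a held-out test set would be needed to compare classification accuracy against the raw data in the way the abstract reports.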

CLC Number:

  • TP399