Journal of Jilin University (Engineering and Technology Edition)

• Article •

Using simple statistical model to identify differentially expressed genes in microarray experiments

WU Jia-nan1,2, ZHOU Chun-guang1,3, XIA Xue-fei4, LIU Gui-xia1,3, SHEN Wei1, ZHOU You1,3

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China;
  2. College of Computer Science and Technology, Changchun University, Changchun 130022, China;
  3. Symbolic Computation and Knowledge Engineering Laboratory of the Ministry of Education, Jilin University, Changchun 130012, China;
  4. Jilin Communications Polytechnic College, Changchun 130012, China
  • Received: 2011-09-20 Online: 2013-07-01 Published: 2013-07-01
  • Corresponding author: LIU Gui-xia (1963-), female, professor. Research interests: computational intelligence and bioinformatics. E-mail: liugx@jlu.edu.cn
  • About the author: WU Jia-nan (1980-), male, Ph.D. candidate, lecturer. Research interests: computational intelligence and bioinformatics. E-mail: jiananwu@126.com
  • Supported by:

    National Natural Science Foundation of China (60873146, 60803052, 60973092, 60903097); Jilin Province Science and Technology Development Plan, Youth Research Projects (201201139, 20090116, 20101589); "Twelfth Five-Year" Science and Technology Research Project of the Education Department of Jilin Province (256).

Abstract:

One of the main objectives of microarray data analysis is to identify differentially expressed genes (DEGs) under different experimental conditions. A common approach is to calculate a statistic for each gene and then rank the genes by that statistic; a large value is taken as evidence of differential expression. Different methods generally produce different gene rankings, and the performance of each method depends on its evaluation metric, the dataset, and the data preprocessing method. A disadvantage shared by existing methods is that some top-ranked genes, which are falsely detected as DEGs, tend to exhibit lower expression levels. Here, we present a novel technique named Matrix Rank Product (MRP) for identifying differentially expressed genes, derived from a simple statistical rank model. The algorithm works directly on the raw microarray data, which eliminates the interference introduced by different data preprocessing methods. In addition, it sorts the matrix formed by the microarray data as a whole, which yields a highly accurate ranking of differential expression.

Key words: computer application, bioinformatics, identification of differentially expressed genes, microarray data, rank

CLC number:

  • TP399
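
The general approach summarized in the abstract (compute one statistic per gene, then rank the genes by it) can be illustrated with a minimal sketch. This is not the MRP algorithm, whose details are not given on this page; the statistic used here (absolute difference of mean log2 intensities between two conditions), the function name `rank_genes_by_log_fold_change`, and the toy data are all assumptions chosen only for illustration.

```python
from math import log2
from statistics import mean


def rank_genes_by_log_fold_change(expression, group_a, group_b):
    """Rank genes by a simple per-gene statistic (illustrative only, not MRP).

    expression: dict mapping gene name -> list of positive intensities,
                one value per sample.
    group_a, group_b: lists of sample indices for the two conditions.
    Returns a list of (gene, score) pairs, largest score first.
    """
    scores = {}
    for gene, values in expression.items():
        # Mean log2 intensity within each condition.
        mean_a = mean(log2(values[i]) for i in group_a)
        mean_b = mean(log2(values[i]) for i in group_b)
        # Assumed statistic: absolute log2 fold change between conditions.
        scores[gene] = abs(mean_a - mean_b)
    # A larger statistic is treated as stronger evidence of differential expression.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: samples 0-2 are condition A, samples 3-5 are condition B.
toy = {
    "geneA": [100.0, 110.0, 105.0, 400.0, 420.0, 410.0],
    "geneB": [50.0, 52.0, 49.0, 51.0, 50.0, 53.0],
    "geneC": [800.0, 790.0, 810.0, 400.0, 395.0, 405.0],
}
ranking = rank_genes_by_log_fold_change(toy, [0, 1, 2], [3, 4, 5])
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

A different statistic (for example a t-statistic or a rank-based score) would generally produce a different ordering, which is the sensitivity to method and preprocessing that the abstract points out.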

