Journal of Jilin University (Engineering and Technology Edition) ›› 2015, Vol. 45 ›› Issue (2): 624-629. doi: 10.13229/j.cnki.jdxbgxb201502043

• Article

Attributes reduction of gene signal based on rough set

LIU Fu, ZHANG Xiao, HOU Tao, LIU Yun

  1. College of Communications Engineering, Jilin University, Changchun 130022, China
  • Received: 2013-10-25 Online: 2015-04-01 Published: 2015-04-01
  • Corresponding author: ZHANG Xiao (b. 1989), female, master's student. Research interests: pattern recognition and bioinformatics. E-mail: zxhappy123@126.com
  • About the author: LIU Fu (b. 1968), male, professor, doctoral supervisor. Research interests: computer vision and pattern recognition. E-mail: liufu@jlu.edu.cn
  • Funding:
    National Natural Science Foundation of China (51105170); Jilin Province Science and Technology Development Plan Project (10100505).

Abstract: Using k-mer frequencies to recognize DNA fragments is time-consuming and inefficient. To address this, the attribute reduction theory of rough sets is applied to reduce and optimize the numerical k-mer features extracted from DNA fragments. Signal reduction experiments were carried out on the whole genomes of 30 microbial strains. The results show that the method removes 72.27% of the original high-dimensional gene signal while improving accuracy by 0.62% and shortening the running time by 73.3%.

Key words: computer application, attributes reduction, rough set, k-mer frequency
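
The abstract does not spell out the reduction procedure, so the following is only a minimal sketch of the two ideas it names: computing a k-mer frequency vector for a DNA fragment, and selecting a subset of discrete attributes by rough-set dependency degree. The greedy QuickReduct-style search, the function names (`kmer_frequencies`, `dependency`, `quick_reduct`) and the toy decision table are all assumptions made for illustration and are not taken from the paper. Rough-set reduction works on discrete attribute values, so real k-mer frequencies would first need to be discretized; that step is omitted here.

```python
from collections import Counter, defaultdict
from itertools import product


def kmer_frequencies(seq, k):
    """Return normalised frequencies of all 4**k k-mers over A, C, G, T.

    Windows containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def dependency(rows, labels, attrs):
    """Rough-set dependency degree: share of objects in the positive region.

    An object is in the positive region when every object that has the
    same values on `attrs` also has the same decision label.
    """
    seen_labels = defaultdict(set)
    sizes = Counter()
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        seen_labels[key].add(label)
        sizes[key] += 1
    positive = sum(sizes[key] for key, seen in seen_labels.items() if len(seen) == 1)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedy (QuickReduct-style) search, assumed here for illustration.

    Repeatedly adds the attribute that raises the dependency degree the
    most, until it matches the dependency of the full attribute set.
    """
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        candidates = [a for a in all_attrs if a not in reduct]
        best = max(candidates, key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct


if __name__ == "__main__":
    # Hypothetical, already discretised decision table: 4 objects, 3 attributes.
    rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
    labels = ["A", "A", "B", "B"]
    print(kmer_frequencies("ACGT", 1))   # one entry per nucleotide
    print(quick_reduct(rows, labels))    # indices of the retained attributes
```

In the toy table only attribute 0 separates the two labels, so the greedy search keeps that single attribute and discards the other two. A greedy search of this kind is not guaranteed to find a minimal reduct; it is shown only because it is short and easy to follow.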

CLC number: TP399