Journal of Jilin University (Engineering and Technology Edition), 2015, Vol. 45, Issue 2: 624-629. doi: 10.13229/j.cnki.jdxbgxb201502043
LIU Fu (刘富), ZHANG Xiao (张潇), HOU Tao (侯涛), LIU Yun (刘云)
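Abstract: Identifying DNA fragments by their k-mer frequencies is time-consuming and inefficient. To address this, attribute reduction from rough set theory is used to reduce and optimize the k-mer numerical features extracted from DNA fragments. Signal-reduction experiments were carried out on the whole genomes of 30 microbial strains. The results show that the method removes 72.27% of the original high-dimensional genomic signal, raises accuracy by 0.62%, and shortens running time by 73.3%.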
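To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It computes a normalized k-mer frequency vector for a DNA fragment and then applies a greedy, QuickReduct-style attribute reduction based on the rough-set dependency degree. The function names (`kmer_frequencies`, `discretize`, `dependency`, `quick_reduct`), the binary discretization rule, and the choice of QuickReduct as the reduction heuristic are all assumptions made for illustration; the abstract does not specify which reduction algorithm or discretization was used.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Normalized frequency of each of the 4**k k-mers over A, C, G, T."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other symbols (e.g. N) are ignored in the total.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def discretize(freqs):
    """Assumed rule: 1 if a k-mer is at least as frequent as under a uniform model."""
    uniform = 1.0 / len(freqs)
    return tuple(int(f >= uniform) for f in freqs)


def dependency(rows, labels, attrs):
    """Rough-set dependency degree: share of objects in label-pure equivalence classes."""
    classes = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(label)
    positive = sum(len(ys) for ys in classes.values() if len(set(ys)) == 1)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedily add attributes until the dependency of the full set is reached."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        candidates = [a for a in all_attrs if a not in reduct]
        best = max(candidates,
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct


# Toy decision table: only attribute 0 determines the class label.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
labels = [0, 0, 1, 1]
selected = quick_reduct(rows, labels)
```

In a pipeline of this kind, each fragment's discretized k-mer vector would be one row of the decision table and the strain would be the label. A classifier would then be trained only on the attributes in `selected`, which is where a reduction in feature count and running time would come from.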