Journal of Jilin University (Engineering and Technology Edition), 2018, Vol. 48, Issue (5): 1593-1599. doi: 10.13229/j.cnki.jdxbgxb20170668


Metagenomic clustering method based on k-mer frequency optimization

LIU Fu1,2, LAN Xu-teng1,2, HOU Tao2, KANG Bing2, LIU Yun2, LIN Cai-xia3   

  1. State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China;
  2. College of Communication Engineering, Jilin University, Changchun 130022, China;
  3. College of Information Science & Technology, Hainan University, Haikou 570228, China
  • Received: 2017-06-27  Online: 2018-09-20  Published: 2018-12-11

Abstract: DNA sequence classification is an important step in metagenomic studies, and k-mer frequency is a commonly used feature for it. The dimension of the k-mer frequency vector grows exponentially with k (4^k over the four-letter DNA alphabet), which easily leads to the "curse of dimensionality". To address this problem, this paper proposes a metagenomic DNA sequence clustering method based on k-mer frequency optimization. First, the k-mer frequency is extracted for each DNA sequence. Second, the k-mer frequency is optimized with the Non-negative Matrix Factorization (NMF) algorithm. Finally, the fuzzy C-means algorithm is used to cluster the DNA sequences. Experimental results on metagenomic datasets containing different species show that the proposed method effectively overcomes this shortcoming of traditional classification methods, and that its classification performance is better than that of several similar algorithms.
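The first step in the abstract, k-mer frequency extraction, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `kmer_frequencies`, the overlapping sliding window, the skipping of windows that contain non-ACGT characters, and the normalization to relative frequencies are all assumptions made for illustration. The sketch also shows why the feature dimension is 4^k.

```python
from itertools import product


def kmer_frequencies(sequence, k):
    """Return the relative frequency of every possible k-mer in `sequence`.

    Hypothetical helper, not taken from the paper. The output has 4**k
    entries, ordered lexicographically over the alphabet A, C, G, T.
    """
    alphabet = "ACGT"
    # Enumerate all 4**k possible k-mers in a fixed order.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)

    seq = sequence.upper()
    total = 0
    # Slide a window of length k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if window in counts:
            counts[window] += 1
            total += 1

    if total == 0:
        return [0.0] * len(kmers)
    # Normalize counts so that the vector sums to 1.
    return [counts[kmer] / total for kmer in kmers]


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        vector = kmer_frequencies("ACGTACGTTTGACA", k)
        print(k, len(vector))  # the vector length is 4**k
```

Stacking one such vector per sequence gives a non-negative matrix, which is the kind of input NMF requires; the reduced representation would then be passed to a clustering step such as fuzzy C-means.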

Key words: computer application, pattern recognition and intelligent systems, k-mer, non-negative matrix factorization(NMF), fuzzy C-means method, metagenome

CLC Number: TP391