吉林大学学报(工学版) ›› 2013, Vol. 43 ›› Issue (06): 1626-1630.doi: 10.7964/jdxbgxb201306031

• paper • Previous Articles     Next Articles

Dimension reduction for data streams based on Johnson-Lindenstrauss transform

YANG Jing, ZHAO Jia-shi, ZHANG Jian-pei   

  1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
  • Received:2012-07-20 Online:2013-11-01 Published:2013-11-01

Abstract:

A dimension reduction method is proposed for data streams based on Johnson-Lindenstrauss transform to meet the requirement of real time performance, which can also be used in asynchronous cases. This method uses a Johnson-Lindenstrauss transform to project the stream data from high dimensional space to low dimensional space, and the pairwise distance is preserved after the projection. The transform matrix is constructed by random hash functions, and the random hash functions are generated by the seeds, therefore the dimension reduction process is completed at the same time of the updates. This method reduces the computational complexity, and an algorithmic sublinear-time dimension reduction of data stream is gained. The experimental results show that this method can improve the efficiency and get a good accuracy.

Key words: artificial intelligence, Johnson-Lindenstrauss transform, asynchronization data stream

CLC Number: 

  • TP18

[1] Aggarwal C C. Data Streams: Models and Algorithms[M]. New York: Springer-Verlag, 2007.

[2] 姚劲勃, 余宜诚, 于卓尔, 等. 基于PCA降维协同过滤算法的改进[J]. 吉林大学学报: 信息科学版, 2011, 29(5): 494-497. Yao Jin-bo,Yu Yi-cheng,Yu Zhuo-er,et al.Improvement on collaborative filtering algorithm based on PCA default-values[J].Journal of Jilin University(Information Science Edition),2011,29(5):494-497.

[3] Kuzmin D, Warmuth M K. Online kernel PCA with entropic matrix updates [C]//International Confer-ence on Machine Learning, New York, NY, USA,2007: 465-472.

[4] Hisada M, Ozawa S, Zhang K, et al. A novel incremental linear discriminant analysis for multitask pattern recognition problems[J]. Lecture Notes in Computer Science, 2009, 5506: 1163-1171.

[5] Weng J, Zhang Y, Hwang W S. Candid covariance-free incremental principal component analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(8): 1034-1040.

[6] Kim T K, Stenger B, Kittler J, et al. Incremental linear discriminant analysis using sufficient spanning sets and its applications[J]. International Journal of Computer Vision, 2011, 91(2): 216-232.

[7] Indyk P. Stable distributions, pseudorandom generators, embeddings, and data stream computation[J]. Journal of the ACM, 2006, 53(3): 307-323.

[8] Menon A K, Phamg V A, Chawla S, et al. An incre-mental data-stream sketch using sparse random projections[DB/OL].[2012-01-05]. http:[C]//cseweb.ucsd.edu/~akmenon/SparseRP.pdf.

[9] Johnson W B, Lindenstrauss J. Extensions of Lipschitz mappings into a Hilbert space[J]. Contemporary Mathematics, 1984, 26:189-206.

[10] Achlioptas D.Database-friendly random projections:Johnson-Linden strauss with binary coins[J].Journal of Computer and System Sciences,2003,66(4):671-687.

[11] Daniel M K,Nelson J.Sparser Johnson-Lindenstrauss transforms[C]//In Proceddings of the 23th Annual ACM-SIAM Symposium on Discrete Algorithms,NewYork,USA,2012:1195-1206.

[1] DONG Sa, LIU Da-you, OUYANG Ruo-chuan, ZHU Yun-gang, LI Li-na. Logistic regression classification in networked data with heterophily based on second-order Markov assumption [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(5): 1571-1577.
[2] GU Hai-jun, TIAN Ya-qian, CUI Ying. Intelligent interactive agent for home service [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(5): 1578-1585.
[3] WANG Xu, OUYANG Ji-hong, CHEN Gui-fen. Measurement of graph similarity based on vertical dimension sequence dynamic time warping method [J]. 吉林大学学报(工学版), 2018, 48(4): 1199-1205.
[4] ZHANG Hao, ZHAN Meng-ping, GUO Liu-xiang, LI Zhi, LIU Yuan-ning, ZHANG Chun-he, CHANG Hao-wu, WANG Zhi-qiang. Human exogenous plant miRNA cross-kingdom regulatory modeling based on high-throughout data [J]. 吉林大学学报(工学版), 2018, 48(4): 1206-1213.
[5] HUANG Lan, JI Lin-ying, YAO Gang, ZHAI Rui-feng, BAI Tian. Construction of disease-symptom semantic net for misdiagnosis prompt [J]. 吉林大学学报(工学版), 2018, 48(3): 859-865.
[6] LI Xiong-fei, FENG Ting-ting, LUO Shi, ZHANG Xiao-li. Automatic music composition algorithm based on recurrent neural network [J]. 吉林大学学报(工学版), 2018, 48(3): 866-873.
[7] LIU Jie, ZHANG Ping, GAO Wan-fu. Feature selection method based on conditional relevance [J]. 吉林大学学报(工学版), 2018, 48(3): 874-881.
[8] WANG Xu, OUYANG Ji-hong, CHEN Gui-fen. Heuristic algorithm of all common subsequences of multiple sequences for measuring multiple graphs similarity [J]. 吉林大学学报(工学版), 2018, 48(2): 526-532.
[9] YANG Xin, XIA Si-jun, LIU Dong-xue, FEI Shu-min, HU Yin-ji. Target tracking based on improved accelerated gradient under tracking-learning-detection framework [J]. 吉林大学学报(工学版), 2018, 48(2): 533-538.
[10] LIU Xue-juan, YUAN Jia-bin, XU Juan, DUAN Bo-jia. Quantum k-means algorithm [J]. 吉林大学学报(工学版), 2018, 48(2): 539-544.
[11] QU Hui-yan, ZHAO Wei, QIN Ai-hong. A fast collision detection algorithm based on optimization operator [J]. 吉林大学学报(工学版), 2017, 47(5): 1598-1603.
[12] LI Jia-fei, SUN Xiao-yu. Clustering method for uncertain data based on spectral decomposition [J]. 吉林大学学报(工学版), 2017, 47(5): 1604-1611.
[13] SHAO Ke-yong, CHEN Feng, WANG Ting-ting, WANG Ji-chi, ZHOU Li-peng. Full state based adaptive control of fractional order chaotic system without equilibrium point [J]. 吉林大学学报(工学版), 2017, 47(4): 1225-1230.
[14] WANG Sheng-sheng, WANG Chuang-feng, GU Fang-ming. Spatio-temporal reasoning for OPRA direction relation network [J]. 吉林大学学报(工学版), 2017, 47(4): 1238-1243.
[15] MA Miao, LI Yi-bin. Multi-level image sequences and convolutional neural networks based human action recognition method [J]. 吉林大学学报(工学版), 2017, 47(4): 1244-1252.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!