吉林大学学报(工学版) ›› 2017, Vol. 47 ›› Issue (1): 218-226.doi: 10.13229/j.cnki.jdxbgxb201701032

• Orginal Article • Previous Articles     Next Articles

Multiple binary codes based on entropy selection

ZHAO Hong-wei1, 2, WANG Zhen1, YANG Wen-di3, LIU Ping-ping1, 2   

  1. 1.College of Computer Science and Technology, Jilin University, Changchun 130012, China;
    2.Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University,Changchun 130012, China;
    3.School of Computer Science and Software Engineering, East China Normal University,Shanghai 200062,China
  • Received:2015-11-16 Online:2017-01-20 Published:2017-01-20

Abstract: Searching for Approximate Nearest Neighbors (ANN) of high dimensional floating point data has to compute their expensive Euclidean distances, and the memory occupancy rate is high. In order to fix such problem, an algorithm is proposed to effectively and efficiently map high dimensional floating point data into low dimensional binary codes, while preserving the normalized distance similarity. As a result, Hamming distances can be used to instead the Euclidean distances during ANN search process. To guarantee the ANN search performance of obtained binary codes, the distribution adaptive binary labels of the training data are firstly acquired based on the look-up mechanism. Then, the classifaction planes are obtained on the basis of SVM algorithm, and the one with the highest entropy value is chosen as the final hashing function. In order to further improve the retrieval performance, the retrieval system based on multiple binary codes is proposed. During the training process, different kinds of original encoding centers are chosen to obtain multiple hashing functions and multiple binary codes. During the search stage, the points with the minimal average Hamming distances are returned as query results. Experiments show that the proposed algorithm can efficiently and effectively map the floating point data into superior binary codes, and has excellent ANN search performance when compared with other state-of-art methods.

Key words: computer application, approximate nearest neighbor search, binary codes, Hashing algorithm, entropy

CLC Number: 

  • TP391.41
[1] Lowe D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110.
[2] Oliva A, Torralba A. Modeling the shape of the scene: a holistic representation of the spatial envelope[J]. International Journal of Computer Vision, 2001, 42(3): 145-175.
[3] Wang Jun, Kumar S, Chang S F. Semi-supervised hashing for scalable image retrieval[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR),San Francisco,Ca,USA,2010:3424-3431.
[4] Weiss Y, Fergus R, Torralba A. Multidimensional Spectral Hashing[M]. Berlin Heidelberg, Springer, 2012: 340-353.
[5] Weiss Y, Torralba A, Fergus R. Spectral hashing[C]∥Proceedings of the Advances in Neural Information Processing Systems, Vancouver,British Columbia,Canada,2008:1753-1760.
[6] Liu Wei, Wang Jun, Ji Rong-rong, et al. Supervised Hashing with kernels[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Rhode Island, 2012: 2074-2081.
[7] Indyk P, Motwani R. Approximate nearest neighbors: towards removing the curse of dimensionality[C]∥Proceedings of the 30th Annual ACM Symposium on Theory of Computing, Dallas, Texas, 1998: 604-613.
[8] Gong Y C, Lazebnik S, Gordo A, et al. Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2916-2929.
[9] MacQueen J. Some methods for classification and analysis of multivariate observations[C]∥Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, California, 1967: 281-297.
[10] He Kai-ming, Wen Fang, Sun Jian. K -means hashing: an affinity-preserving quantization method for learning binary compact codes[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, Oregon, 2013: 2938-2945.
[11] Jegou H, Douze M, Schmid C, et al. Aggregating local descriptors into a compact image representation[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, 2010: 3304-3311.
[12] Jegou H, Douze M, Schmid C. Product quantization for nearest neighbor search[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(1): 117-128.
[13] Ge Tie-zheng, He Kai-ming, Ke Qi-fa, et al. Optimized product quantization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(4): 744-755.
[14] Gong Yun-chao, Lazebnik S. Iterative quantization: a procrustean approach to learning binary codes[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, 2011: 817-824.
[15] Wang Jian-feng, Wang Jing-dong, Yu Neng-hai, et al. Order preserving hashing for approximate nearest neighbor search[C]∥Proceedings of the 21st ACM International Conference on Multimedia, Barcelona, Spain, 2013: 133-142.
[16] Salakhutdinov R, Hinton G. Semantic hashing[J]. International Journal of Approximate Reasoning, 2009, 50(7): 969-978.
[17] Norouzi M, Blei D M. Minimal loss hashing for compact binary codes[C]∥Proceedings of the 28th International Conference on Machine Learning (ICML), Bellevue, Washington, 2011: 353-360.
[18] Norouzi M, Blei D M, Salakhutdinov R. Hamming distance metric learning[C]∥Proceedings of the Advances in Neural Information Processing Systems, South Lake Tahoe, Nevada, 2012: 1061-1069.
[19] Wang Jun, Liu Wei, Sun A X, et al. Learning hash codes with listwise supervision[C]∥Proceedings of the Computer Vision (ICCV), Sydney, Australia, 2013: 3032-3039.
[20] Zhang Dell, Wang Jun, Cal Deng, et al. Self-taught hashing for fast similarity search[C]∥Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, Geneva, Switzerland, 2010: 18-25.
[21] Zhu Xiao-feng, Huang Zi, Cheng Hong, et al. Sparse hashing for fast multimedia search[J]. ACM Transactions on Information Systems (TOIS), 2013, 31(2):9.1-9.24.
[22] Lin Guo-sheng, Shen Chun-hua, Suter D, et al. A general two-step approach to learning-based hashing[C]∥Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 2013: 2552-2559.
[23] Russell B C, Torralba A, Murphy K P, et al. LabelMe: a database and web-based tool for image annotation[J]. International Journal of Computer Vision, 2008, 77(1-3): 157-173.
[24] Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images[R]. Toronto,2009.
[25] Vedaldi A, Fulkerson B. VLFeat: an open and portable library of computer vision algorithms[C]∥Proceedings of the International Conference on Multimedia, Firenze, Italy, 2010: 1469-1472.
[26] Philbin J, Chum O, Isard M, et al. Object retrieval with large vocabularies and fast spatial matching[C]∥Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, Minnesota, 2007: 1-8.
[27] Liu Wei, Wang Jun, Kumar S, et al. Hashing with graphs[C]∥Proceedings of the International Conference on Machine Learning (ICML-11), Bellevue, Washington, 2011: 1-8.
[1] LIU Fu,ZONG Yu-xuan,KANG Bing,ZHANG Yi-meng,LIN Cai-xia,ZHAO Hong-wei. Dorsal hand vein recognition system based on optimized texture features [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(6): 1844-1850.
[2] WANG Li-min,LIU Yang,SUN Ming-hui,LI Mei-hui. Ensemble of unrestricted K-dependence Bayesian classifiers based on Markov blanket [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(6): 1851-1858.
[3] JIN Shun-fu,WANG Bao-shuai,HAO Shan-shan,JIA Xiao-guang,HUO Zhan-qiang. Synchronous sleeping based energy saving strategy of reservation virtual machines in cloud data centers and its performance research [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(6): 1859-1866.
[4] ZHAO Dong,SUN Ming-yu,ZHU Jin-long,YU Fan-hua,LIU Guang-jie,CHEN Hui-ling. Improved moth-flame optimization method based on combination of particle swarm optimization and simplex method [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(6): 1867-1872.
[5] LIU En-ze,WU Wen-fu. Agricultural surface multiple feature decision fusion disease judgment algorithm based on machine vision [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(6): 1873-1878.
[6] OUYANG Dan-tong, FAN Qi. Clause-level context-aware open information extraction [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(5): 1563-1570.
[7] LIU Fu, LAN Xu-teng, HOU Tao, KANG Bing, LIU Yun, LIN Cai-xia. Metagenomic clustering method based on k-mer frequency optimization [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(5): 1593-1599.
[8] GUI Chun, HUANG Wang-xing. Network clustering method based on improved label propagation algorithm [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(5): 1600-1605.
[9] LIU Yuan-ning, LIU Shuai, ZHU Xiao-dong, CHEN Yi-hao, ZHENG Shao-ge, SHEN Chun-zhuang. LOG operator and adaptive optimization Gabor filtering for iris recognition [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(5): 1606-1613.
[10] CHE Xiang-jiu, WANG Li, GUO Xiao-xin. Improved boundary detection based on multi-scale cues fusion [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(5): 1621-1628.
[11] NIAN Teng-fei, LI Ping, LIN Mei. Micro-morphology and gray entropy analysis of asphalt characteristics functional groups and rheological parameters under freeze-thaw cycles [J]. 吉林大学学报(工学版), 2018, 48(4): 1045-1054.
[12] ZHAO Hong-wei, LIU Yu-qi, DONG Li-yan, WANG Yu, LIU Pei. Dynamic route optimization algorithm based on hybrid in ITS [J]. 吉林大学学报(工学版), 2018, 48(4): 1214-1223.
[13] HUANG Hui, FENG Xi-an, WEI Yan, XU Chi, CHEN Hui-ling. An intelligent system based on enhanced kernel extreme learning machine for choosing the second major [J]. 吉林大学学报(工学版), 2018, 48(4): 1224-1230.
[14] FU Wen-bo, ZHANG Jie, CHEN Yong-le. Network topology discovery algorithm against routing spoofing attack in Internet of things [J]. 吉林大学学报(工学版), 2018, 48(4): 1231-1236.
[15] CAO Jie, SU Zhe, LI Xiao-xu. Image annotation method based on Corr-LDA model [J]. 吉林大学学报(工学版), 2018, 48(4): 1237-1243.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!