Journal of Jilin University (Engineering and Technology Edition) ›› 2017, Vol. 47 ›› Issue (1): 294-300. doi: 10.13229/j.cnki.jdxbgxb201701043

Recognition to specific two words Chinese vocabulary based on projection matrix of spectrogram

LIANG Shi-li1, WEI Ying1, PAN Di1, ZHANG Ling2, XU Ting-fa3, WANG Shuang-wei1

  1. College of Physics, Northeast Normal University, Changchun 130024, China;
  2. College of Science, Changchun University of Science and Technology, Changchun 130022, China;
  3. Key Laboratory of Photoelectronic Imaging Technology and System of Ministry of Education, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2015-05-30 Online: 2017-01-20 Published: 2017-01-20
  • Corresponding author: WANG Shuang-wei (1957-), male, professor. Research interest: acoustic signal processing. E-mail: wsw77@nenu.edu.cn
  • About the author: LIANG Shi-li (1968-), male, associate professor, Ph.D. Research interest: signal processing. E-mail: lsl@nenu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61471111).

Abstract: Image processing techniques are applied to speech recognition. In the image feature extraction stage, equal-width band row projection (line projection) and binary-width band row projection are first applied to the narrowband spectrogram, giving the first and second feature sets, respectively. A further image Fourier transform is then applied to the spectrogram, and equal-width row projection of the result gives the third feature set. The three feature sets are combined into the feature vector for speaker-dependent recognition of two-character Chinese words, and a support vector machine (SVM) is used as the classifier to recognize each word as a whole. In simulation experiments with 1000 speech samples, the method reaches a recognition rate of up to 92.8% for speaker-dependent two-character Chinese words, which offers a new approach to Chinese word recognition.

Key words: communication technology, speech recognition, spectrogram, line projection, support vector machine
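The first two feature sets named in the abstract (equal-width and binary-width band projections) can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that the abstract does not confirm: the spectrogram is taken to be a 2-D list whose rows are frequency bins and whose columns are time frames, a "band" is taken to be a group of adjacent rows, each band contributes one feature (the sum of its row projections), and "binary width" is read as band widths that double (1, 2, 4, ...). The function names are hypothetical and this is not the authors' implementation. The Fourier-transform feature set and the SVM classifier are left out.

```python
def row_projection(image):
    """Sum each row of a 2-D image given as a list of equal-length rows."""
    return [sum(row) for row in image]


def equal_width_bands(n_rows, n_bands):
    """Split rows 0..n_rows-1 into n_bands contiguous bands of near-equal width."""
    edges = [(i * n_rows) // n_bands for i in range(n_bands + 1)]
    return [(edges[i], edges[i + 1]) for i in range(n_bands)]


def dyadic_width_bands(n_rows):
    """Split rows into contiguous bands whose widths double: 1, 2, 4, ...

    The last band is truncated at n_rows (an assumption of this sketch).
    """
    bands, start, width = [], 0, 1
    while start < n_rows:
        stop = min(start + width, n_rows)
        bands.append((start, stop))
        start, width = stop, width * 2
    return bands


def band_features(image, bands):
    """One feature per band: the sum of the row projections inside the band."""
    proj = row_projection(image)
    return [sum(proj[a:b]) for a, b in bands]


# Toy 8x4 "spectrogram": every pixel in row r has the value r + 1.
spec = [[r + 1] * 4 for r in range(8)]

f1 = band_features(spec, equal_width_bands(len(spec), 4))  # first feature set
f2 = band_features(spec, dyadic_width_bands(len(spec)))    # second feature set
feature_vector = f1 + f2  # concatenated, as input to a classifier

print(f1)  # [12, 28, 44, 60]
print(f2)  # [4, 20, 88, 32]
```

In this reading, the doubling band widths give finer resolution to the low-index rows than to the high-index rows, while the equal-width bands treat all rows uniformly. Whether this matches the banding used in the paper would need to be checked against the full text.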

CLC Number:

  • TN912