Journal of Jilin University (Information Science Edition) ›› 2024, Vol. 42 ›› Issue (3): 509-515.


Research on Tibetan Driven Visual Speech Synthesis Algorithm Based on Audio Matching

HAN Xi, LIANG Kai, YUE Yu   

  1. Ganzi Prefecture Science and Technology Information Research Institute, Kangding 626000, China
  • Received: 2023-04-24  Online: 2024-06-18  Published: 2024-06-17

Abstract: To address the low accuracy of lip contour detection and the poor quality of visual speech synthesis, a Tibetan-driven visual speech synthesis algorithm based on audio matching is proposed. First, the algorithm extracts the short-term energy and short-term zero-crossing rate from the Tibetan-driven visual speech signal, establishes the short-term autocorrelation function of the speech signal, and extracts feature information from the signal to obtain the pitch track of the Tibetan speech. Second, a temporal-spatial analysis model of the lips is established to analyze how the lip contour changes during pronunciation, and lip contour features are extracted by principal component analysis. Finally, the correlation between the audio features and the lip contour features is learned with an input-output hidden Markov model, and Tibetan-driven visual speech is synthesized on the basis of audio matching. Experimental results show that the proposed method achieves high lip contour detection accuracy and good visual speech synthesis quality.
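The audio front end described above (short-term energy, short-term zero-crossing rate, and pitch estimation from the short-term autocorrelation function) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the frame size, hop size, and pitch search range are assumed values, and the input here is a synthetic 200 Hz sine rather than a Tibetan speech recording.

```python
import math

def split_frames(signal, size, hop):
    """Slice a signal into overlapping short-time analysis frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (len(frame) - 1)

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Pitch estimate: lag of the short-term autocorrelation peak
    inside a plausible pitch range (fmin..fmax are assumptions)."""
    def r(lag):
        return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
    lo, hi = int(fs / fmax), min(int(fs / fmin), len(frame) - 1)
    best_lag = max(range(lo, hi + 1), key=r)
    return fs / best_lag

# Synthetic test signal: 100 ms of a 200 Hz tone at 8 kHz sampling rate.
fs, f0 = 8000, 200.0
sig = [math.sin(2 * math.pi * f0 * n / fs) for n in range(int(0.1 * fs))]

# 40 ms frames with 20 ms hop; per-frame features form the pitch track.
pitch_track = []
for frame in split_frames(sig, 320, 160):
    e = short_term_energy(frame)
    z = zero_crossing_rate(frame)
    pitch_track.append(autocorr_pitch(frame, fs))
```

In a full system these per-frame features would be the audio-side inputs to the input-output hidden Markov model that maps them onto the PCA-compressed lip contour features.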

Key words: audio matching, short-time autocorrelation function, spatiotemporal analysis model, principal component analysis, visual speech synthesis

CLC Number: TP391.42