
A Study on Lip Detection based on Eye Localization for Visual Speech Recognition in Mobile Environment

  • Min-Gyu Song (Dept. of Electronics Engineering, Graduate School, Chonnam National University) ;
  • Jin-Young Kim (Dept. of Electronics Engineering, Graduate School, Chonnam National University) ;
  • Sung-Taek Hwang (Multimedia Lab., Samsung Electronics Telecommunication R&D Center)
  • Received : 2008.11.24
  • Accepted : 2009.08.06
  • Published : 2009.08.25

Abstract

Automatic speech recognition (ASR) is an attractive technology in today's trend toward a more convenient life. Although many approaches to ASR have been proposed, performance in noisy environments remains poor. The state of the art in speech recognition therefore exploits not only audio information but also visual information. In this paper, we present a novel lip detection method for visual speech recognition in a mobile environment. To apply visual information to speech recognition, exact lip regions must be extracted. Because the eyes are easier to detect than the lips, we first detect the positions of the left and right eyes and then roughly locate the lip region from them. We next apply K-means clustering to divide that region into groups, and the two lip corners and the lip center are found by choosing the biggest of the clustered groups. Finally, we show the effectiveness of the proposed method through experiments on the Samsung AVSR database.

Speech recognition is an attractive technology for HMI in today's trend toward a more convenient life. Although speech recognition has been studied extensively, its performance in noisy environments is still weak. To address this, visual speech recognition, which uses visual as well as auditory information, is now being actively studied. In this paper, we propose a lip detection method for visual speech recognition in a mobile environment. Visual speech recognition requires accurate lip detection. Because the eyes are easier to find in an input image than the lips, we first detect the eye positions and use them to obtain a rough lip image. We then segment this image with the K-means clustering algorithm and select the largest of the resulting regions to obtain the two lip corners and the lip center. Finally, experiments confirm the performance of the proposed technique.
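The pipeline described in the abstract (eye localization → rough lip region → K-means segmentation → largest cluster → lip corners and center) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the ROI geometry (lip box placed one scaled inter-eye distance below the eye midpoint), the number of clusters, the deterministic K-means initialization, and all function names (`estimate_lip_roi`, `kmeans`, `lip_corners_and_center`) are hypothetical choices made for this sketch.

```python
import numpy as np

def estimate_lip_roi(left_eye, right_eye, scale=1.2):
    """Roughly locate the lip region from the two eye centers.

    The geometry (box centered `scale` * inter-eye distance below the
    eye midpoint, sized relative to the inter-eye distance) is an
    illustrative assumption, not the paper's exact parameters.
    Returns (x, y, w, h) of the lip bounding box.
    """
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)
    mid = (left_eye + right_eye) / 2.0
    d = np.linalg.norm(right_eye - left_eye)      # inter-eye distance
    cx, cy = mid[0], mid[1] + scale * d           # lip box center
    w, h = d, 0.5 * d                             # lip box size
    return int(cx - w / 2), int(cy - h / 2), int(w), int(h)

def kmeans(pixels, k=2, iters=20):
    """Plain K-means on per-pixel feature vectors (e.g. RGB colors)."""
    # Deterministic init from evenly spaced sample pixels, for
    # reproducibility (random or k-means++ init is more common).
    centers = pixels[np.linspace(0, len(pixels) - 1, k).astype(int)].copy()
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(iters):
        # Assign each pixel to its nearest cluster center.
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each (non-empty) center to the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels

def lip_corners_and_center(roi, k=2):
    """Segment the lip ROI with K-means, keep the biggest cluster
    (the paper's selection rule), and return the left corner, right
    corner, and centroid as (x, y) in ROI coordinates."""
    h, w = roi.shape[:2]
    labels = kmeans(roi.reshape(-1, roi.shape[-1]).astype(float), k=k)
    biggest = np.bincount(labels).argmax()
    ys, xs = np.unravel_index(np.flatnonzero(labels == biggest), (h, w))
    left = (int(xs.min()), int(ys[xs.argmin()]))
    right = (int(xs.max()), int(ys[xs.argmax()]))
    center = (int(xs.mean()), int(ys.mean()))
    return left, right, center
```

Selecting the biggest cluster follows the rule stated in the abstract; in practice the result also depends heavily on the color space fed to K-means (raw RGB here, whereas lip-segmentation work often uses hue-like transforms), so that choice is a simplification.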

