Voice Activity Detection Using Global Speech Absence Probability Based on Teager Energy in Noisy Environments

  • Park, Yun-Sik (Department of Electronic Engineering, Inha University)
  • Lee, Sang-Min (Department of Electronic Engineering, Inha University)
  • Received : 2011.07.05
  • Accepted : 2011.09.14
  • Published : 2012.01.25

Abstract

In this paper, we propose a novel voice activity detection (VAD) algorithm that effectively distinguishes speech from nonspeech in various noisy environments. The global speech absence probability (GSAP), derived from the likelihood ratio (LR) under a statistical model, is widely used as a feature parameter for VAD. However, a feature parameter based on the conventional GSAP is not sufficient to distinguish speech from noise at low signal-to-noise ratios (SNRs), where the GSAP is difficult to estimate accurately. The proposed VAD algorithm uses a GSAP based on Teager energy (TE) as the feature parameter to improve the detection of speech segments in noisy environments. The proposed algorithm is evaluated with objective tests under various background noise conditions and yields better results than the conventional methods.
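
The two quantities named in the abstract can be illustrated with a minimal sketch. It assumes the standard discrete Teager energy operator and the Gaussian statistical model commonly used for LR-based VAD. The abstract does not state how the two are combined, so the sketch shows the building blocks only and is not the authors' implementation. The function names `teager_energy` and `gsap` and the parameter `prior_ratio` are hypothetical.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbours and are left at 0.
    """
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def gsap(gamma, xi, prior_ratio=1.0):
    """Global speech absence probability for one frame.

    gamma       : a posteriori SNR per frequency bin
    xi          : a priori SNR per frequency bin
    prior_ratio : P(speech) / P(no speech); the default of 1.0 is an assumption

    Assumes the complex Gaussian model, where the per-bin likelihood ratio is
        LR_k = exp(gamma_k * xi_k / (1 + xi_k)) / (1 + xi_k)
    and the frame-level probability is
        GSAP = 1 / (1 + prior_ratio * prod_k LR_k).
    """
    gamma = np.asarray(gamma, dtype=float)
    xi = np.asarray(xi, dtype=float)
    # Work in the log domain so the product over bins cannot overflow.
    log_lr = gamma * xi / (1.0 + xi) - np.log1p(xi)
    z = np.log(prior_ratio) + np.sum(log_lr)
    # 1 / (1 + exp(z)) written with tanh for numerical stability.
    return 0.5 * (1.0 - np.tanh(0.5 * z))
```

For a pure sinusoid A cos(Ωn), the operator returns the constant A² sin²(Ω), so it tracks both amplitude and frequency. A frame would be declared speech when `gsap(...)` falls below a chosen threshold; the threshold value is a tuning choice not given here.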
