Robust Feature Extraction for Voice Activity Detection in Nonstationary Noisy Environments

Hong, Jungpyo; Park, Sangjun; Jeong, Sangbae; Hahn, Minsoo

doi:10.13064/KSSS.2013.5.1.011

Phonetics and Speech Sciences (말소리와 음성과학)

Volume 5 Issue 1 / Pages 11-16 / 2013 / 2005-8063 (pISSN) / 2586-5854 (eISSN)

Korean Society of Speech Sciences (한국음성학회)


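Hong, Jungpyo (Korea Advanced Institute of Science and Technology (KAIST), Department of Electrical and Electronic Engineering);
Park, Sangjun (Korea Advanced Institute of Science and Technology (KAIST), Department of Electrical and Electronic Engineering);
Jeong, Sangbae (Gyeongsang National University, Department of Electronic Engineering (Engineering Research Institute));
Hahn, Minsoo (Korea Advanced Institute of Science and Technology (KAIST), Department of Electrical and Electronic Engineering)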

Received : 2012.11.06
Accepted : 2012.03.13
Published : 2013.03.31


Abstract

This paper proposes a robust feature extraction method for accurate voice activity detection (VAD). VAD is a principal module in speech signal processing applications such as speech coding, speech enhancement, and speech recognition. In noisy environments, nonstationary noise drastically degrades VAD accuracy because the features fluctuate during noise-only intervals, which increases the false alarm rate. To improve VAD performance, this paper proposes harmonic-weighted energy. This feature focuses on voiced speech intervals and uses weighted harmonic-to-noise ratios to determine how much harmonicity is applied to the frame energy. For performance evaluation, receiver operating characteristic curves and equal error rates are measured.
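The abstract does not give the exact formula for harmonic-weighted energy, so the following is only a minimal sketch of the general idea: scale each frame's energy by a harmonicity measure so that voiced frames stand out from noise frames of similar energy. The harmonicity estimate used here (the peak of the normalized autocorrelation within a pitch-lag range) is an assumption standing in for the paper's weighted harmonic-to-noise ratio, and the names `frame_signal`, `harmonicity`, and `harmonic_weighted_energy`, as well as the frame length, hop size, and pitch range, are hypothetical choices made for illustration.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def harmonicity(frame, fs, f0_min=60.0, f0_max=400.0):
    """Assumed harmonicity proxy in [0, 1]: peak of the normalized
    autocorrelation inside the lag range of plausible pitch periods.
    This is NOT the paper's harmonic-to-noise ratio, only a stand-in."""
    frame = frame - frame.mean()
    energy = np.dot(frame, frame)
    if energy <= 1e-12:
        return 0.0  # silent frame: no measurable harmonicity
    # Keep non-negative lags only; ac[0] equals the frame energy.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), len(frame) - 1)
    peak = ac[lag_min:lag_max + 1].max() / energy
    return float(np.clip(peak, 0.0, 1.0))


def harmonic_weighted_energy(x, fs, frame_len=400, hop=160):
    """Frame energy multiplied by the assumed harmonicity weight."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)
    weight = np.array([harmonicity(f, fs) for f in frames])
    return weight * energy


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    # Synthetic "voiced" signal: 150 Hz fundamental plus one harmonic.
    voiced = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
    # White noise scaled to roughly the same power as the voiced signal.
    noise = np.random.default_rng(0).standard_normal(fs) * voiced.std()
    print("voiced:", harmonic_weighted_energy(voiced, fs).mean())
    print("noise: ", harmonic_weighted_energy(noise, fs).mean())
```

In this sketch a plain energy feature would score the two synthetic signals about equally, while the weighted feature suppresses the noise frames. A VAD decision would then compare the feature against a threshold, and sweeping that threshold would trace out the receiver operating characteristic curve mentioned in the abstract.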


