Statistical Model-Based Noise Reduction Approach for Car Interior Applications to Speech Recognition

  • Received : 2010.02.19
  • Accepted : 2010.07.26
  • Published : 2010.10.31

Abstract

This paper presents a statistical model-based noise suppression approach for speech recognition in a car environment. To alleviate the spectral whitening and signal distortion problems of the traditional decision-directed Wiener filter, we combine a decision-directed method with an original spectrum reconstruction method and develop a new two-stage noise reduction filter estimation scheme. When the tradeoff between performance and computational efficiency on resource-constrained automotive devices is considered, the ETSI standard advanced distributed speech recognition front-end (ETSI-AFE) can be an effective solution, and ETSI-AFE is also based on the decision-directed Wiener filter. We therefore conduct a series of speech recognition and computational complexity tests comparing the proposed approach with ETSI-AFE. The experimental results show that the proposed approach is superior to the conventional method in terms of speech recognition accuracy, while the computational cost and frame latency are significantly reduced.
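
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a conventional decision-directed Wiener gain computation. It is not the proposed two-stage scheme and not the ETSI-AFE implementation. The function name, the smoothing factor `alpha`, the floor `xi_min`, the zero initialization of the previous clean-speech estimate, and the assumption of a fixed, known noise power per bin are all illustrative choices, not taken from the paper.

```python
def decision_directed_wiener(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Sketch of a textbook decision-directed Wiener gain (illustrative only).

    noisy_power: list of frames; each frame is a list of |Y(t,k)|^2 per bin.
    noise_power: list of noise power estimates per bin (assumed known, fixed).
    Returns a list of frames of Wiener gains in [0, 1).
    """
    num_bins = len(noise_power)
    # Assumption: no clean-speech estimate is available before the first frame.
    prev_clean_power = [0.0] * num_bins
    gains = []
    for frame in noisy_power:
        frame_gain = []
        for k in range(num_bins):
            # A posteriori SNR: noisy power relative to noise power.
            gamma = frame[k] / noise_power[k]
            # Instantaneous (maximum-likelihood) SNR estimate, floored at zero.
            ml_snr = max(gamma - 1.0, 0.0)
            # Decision-directed a priori SNR: weighted mix of the previous
            # frame's clean-speech estimate and the instantaneous estimate.
            xi = alpha * prev_clean_power[k] / noise_power[k] + (1.0 - alpha) * ml_snr
            xi = max(xi, xi_min)
            # Wiener gain from the a priori SNR.
            g = xi / (1.0 + xi)
            frame_gain.append(g)
            # Clean-speech power estimate fed back to the next frame.
            prev_clean_power[k] = (g ** 2) * frame[k]
        gains.append(frame_gain)
    return gains


if __name__ == "__main__":
    # One frequency bin: a noise-only frame followed by three speech frames.
    frames = [[1.0], [100.0], [100.0], [100.0]]
    for t, g in enumerate(decision_directed_wiener(frames, [1.0])):
        print(t, round(g[0], 4))
```

Because the a priori SNR leans heavily on the previous frame when `alpha` is close to 1, the gain reacts to a speech onset with a delay of roughly one frame. This is one commonly noted property of the decision-directed estimator.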

Cited by

  1. Automatic Clustering of Speech Data Using a Modified MAP Adaptation Technique vol.6, pp.1, 2010, https://doi.org/10.13064/ksss.2014.6.1.077
  2. Intra-and Inter-frame Features for Automatic Speech Recognition vol.36, pp.3, 2010, https://doi.org/10.4218/etrij.14.0213.0181
  3. Direction-of-Arrival Based SNR Estimation for Dual-Microphone Speech Enhancement vol.22, pp.12, 2014, https://doi.org/10.1109/taslp.2014.2360646
  4. Decision Tree State Tying Modeling Using Parameter Estimation of the Bayesian Method vol.13, pp.1, 2010, https://doi.org/10.14400/jdc.2015.13.1.243
  5. Hard component detection of transient noise and its removal using empirical mode decomposition and wavelet‐based predictive filter vol.12, pp.7, 2010, https://doi.org/10.1049/iet-spr.2017.0167
  6. Rank-weighted reconstruction feature for a robust deep neural network-based acoustic model vol.41, pp.2, 2019, https://doi.org/10.4218/etrij.2018-0189
  7. Convolutional Recurrent Neural Network-Based Event Detection in Tunnels Using Multiple Microphones vol.19, pp.12, 2010, https://doi.org/10.3390/s19122695
  8. Auditory Device Voice Activity Detection Based on Statistical Likelihood-Ratio Order Statistics vol.10, pp.15, 2010, https://doi.org/10.3390/app10155026
  9. Wearable Hearing Device Spectral Enhancement Driven by Non-Negative Sparse Coding-Based Residual Noise Reduction vol.20, pp.20, 2020, https://doi.org/10.3390/s20205751