Improvement of Environmental Sounds Recognition by Post Processing

  • 박준규 (School of Electronics and Computer Engineering, Chonnam National University)
  • 백성준 (School of Electronics and Computer Engineering, Chonnam National University)
  • Received : 2010.03.02
  • Accepted : 2010.07.06
  • Published : 2010.07.28

Abstract

In this study, we prepared real environmental sound data sets, recorded as people moved through everyday surroundings, comprising 9 different environment types. The recordings are pre-processed with pre-emphasis and a Hamming window, and features are extracted with MFCC (Mel-Frequency Cepstral Coefficients) for the classification experiments. A GMM (Gaussian Mixture Model) classifier without post-processing tends to yield abruptly changing classification results, since it decides each frame without considering the results of neighboring frames. We therefore propose post-processing methods that suppress such abrupt changes by taking the posterior probabilities or the class ranks of neighboring frames into account. According to the experimental results, the method using the posterior probabilities of neighboring frames improves the average recognition rate by more than 10% compared with the classifier without post-processing.
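The post-processing idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' exact algorithm: it assumes each frame's per-class GMM log-likelihoods are already computed, and smooths the frame-level decision by averaging the log-likelihoods over a symmetric window of neighboring frames before taking the argmax, so a single anomalous frame cannot flip the predicted environment.

```python
import numpy as np

def smooth_decisions(frame_log_probs, window=2):
    """Hypothetical sketch of probability-based post-processing.

    frame_log_probs: array of shape (T, C) holding per-frame, per-class
    GMM log-likelihoods. For each frame, the per-class scores are averaged
    over +/- `window` neighboring frames (clipped at the ends) before the
    argmax, suppressing abrupt single-frame label changes.
    """
    T, _ = frame_log_probs.shape
    labels = np.empty(T, dtype=int)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        labels[t] = int(np.argmax(frame_log_probs[lo:hi].mean(axis=0)))
    return labels

# Toy example: frame 2 favors class 1, but its neighbors favor class 0,
# so the smoothed decision keeps class 0 throughout.
scores = np.log(np.array([[0.9, 0.1],
                          [0.9, 0.1],
                          [0.2, 0.8],
                          [0.9, 0.1],
                          [0.9, 0.1]]))
print(smooth_decisions(scores))
```

A rank-based variant, also mentioned in the abstract, would replace the averaged log-likelihoods with per-frame class ranks aggregated over the same window; the windowed-argmax structure stays identical.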

