Target Speech Segregation Using Non-parametric Correlation Feature Extraction in CASA System


  • 최태웅 (Department of Computer Engineering, Kwangwoon University)
  • 김순협 (Department of Computer Engineering, Kwangwoon University)
  • Received : 2012.09.20
  • Accepted : 2012.11.19
  • Published : 2013.01.31

Abstract

Feature extraction in a CASA (computational auditory scene analysis) system exploits temporal continuity and cross-channel similarity, building a correlogram of auditory elements. When cross-channel similarity is computed with the cross-correlation coefficient, expressing the correlation quantitatively is computationally expensive. This paper therefore proposes a feature extraction method based on a non-parametric correlation coefficient to reduce that computational cost, and evaluates it by segregating target speech with a CASA system. Measured by SNR (signal-to-noise ratio), the proposed method improves target speech segregation slightly, by 0.14 dB on average over the conventional method.

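The abstract does not give the exact formulation, so the following is only a minimal sketch of the two quantities it mentions: a non-parametric (rank-based) correlation between two channel responses, and an SNR figure for a segregated signal. It assumes the non-parametric coefficient is Spearman's rank correlation, which the abstract does not state explicitly. The function names (`average_ranks`, `pearson`, `spearman`, `snr_db`) and the toy channel data are hypothetical and are not taken from the paper.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i in sorted order.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def snr_db(target, estimate):
    """SNR in dB, treating (target - estimate) as the residual noise."""
    signal_energy = sum(t * t for t in target)
    noise_energy = sum((t - e) ** 2 for t, e in zip(target, estimate))
    return 10.0 * math.log10(signal_energy / noise_energy)


if __name__ == "__main__":
    # Toy stand-ins for the responses of two adjacent auditory channels.
    chan_a = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
    chan_b = [a ** 3 for a in chan_a]  # monotonic distortion of chan_a
    print("Pearson :", pearson(chan_a, chan_b))
    print("Spearman:", spearman(chan_a, chan_b))
    print("SNR (dB):", snr_db([1.0] * 4, [0.9, 1.1, 0.9, 1.1]))
```

Because the rank-based coefficient depends only on the ordering of the samples, a monotonic distortion of one channel leaves it unchanged, whereas the Pearson value drops. Whether this matches the paper's similarity measure in detail cannot be confirmed from the abstract alone.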

