A Nonuniform Sampling Technique and Its Application to Speech Coding

  • Iem, Byeong-Gwan (Department of Electronic Engineering, Gangneung-Wonju National University)
  • Received: 2013.10.18
  • Accepted: 2013.12.11
  • Published: 2014.02.25

Abstract

For signals such as speech, which are approximately piecewise linear over very short time intervals, a nonuniform sampling method based on inflection point detection (IPD) is proposed to reduce the data rate. The method exploits the geometric characteristics of the signal more fully than the existing nonuniform sampling method based on local maxima/minima detection (MMD). As a result, the signal reconstructed by interpolating the IPD-based samples resembles the original speech more closely. Computer simulations show that the proposed IPD-based method improves the signal-to-noise ratio (SNR) by about 9 to 23 dB over the existing MMD method. To demonstrate the usefulness of the IPD technique, it is applied to speech coding and compared with continuously variable slope delta modulation (CVSD). Each inflection-point sample is binary coded and sent together with a one-bit flag set to 1; non-inflection samples are not sent, and only their flag bit, set to 0, is transmitted. Depending on the speech signal, the method improves the SNR by about 0.3 to 9 dB and the mean opinion score (MOS), a subjective quality measure, by about 0.5 to 1.3 over CVSD.
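The abstract does not spell out the detection rules, so the following Python sketch is only an illustration under stated assumptions. It treats MMD samples as the points where the first difference changes sign (local maxima and minima) and, as a hypothetical stand-in for IPD, keeps those extrema plus the points where the second difference changes sign (changes of concavity). Both sets also keep the first and last samples; the signal is rebuilt by linear interpolation between the kept samples, and the SNR is assumed to be the usual 10 log10(Σx² / Σ(x − y)²). The function names (`mmd_indices`, `ipd_indices`, `reconstruct`, `snr_db`) are illustrative and not taken from the paper.

```python
import numpy as np

def sign_change_indices(d: np.ndarray) -> np.ndarray:
    """Positions i where d[i] is the first nonzero value whose sign differs
    from the previous nonzero value (runs of zeros are skipped)."""
    nz = np.nonzero(d)[0]
    s = np.sign(d[nz])
    flips = np.nonzero(s[:-1] != s[1:])[0]
    return nz[flips + 1]

def mmd_indices(x: np.ndarray) -> np.ndarray:
    """Local maxima/minima: sign changes of the first difference, plus endpoints.
    np.diff(x)[i] = x[i+1] - x[i], so a returned i is the extremum sample."""
    ext = sign_change_indices(np.diff(x))
    return np.union1d(ext, [0, len(x) - 1])

def ipd_indices(x: np.ndarray) -> np.ndarray:
    """Hypothetical IPD stand-in (assumption, not the paper's rule): extrema plus
    sign changes of the second difference (concavity changes), plus endpoints.
    np.diff(x, n=2)[k] is centred on sample k+1, so a returned i is a sample
    adjacent to the concavity change."""
    ext = sign_change_indices(np.diff(x))
    infl = sign_change_indices(np.diff(x, n=2))
    return np.union1d(np.union1d(ext, infl), [0, len(x) - 1])

def reconstruct(x: np.ndarray, idx: np.ndarray) -> np.ndarray:
    """Linear interpolation between the kept samples x[idx]."""
    return np.interp(np.arange(len(x)), idx, x[idx])

def snr_db(x: np.ndarray, y: np.ndarray) -> float:
    """SNR of reconstruction y against original x, in dB (inf if exact)."""
    noise = np.sum((x - y) ** 2)
    return np.inf if noise == 0 else 10.0 * np.log10(np.sum(x ** 2) / noise)

# Usage on a synthetic two-tone signal (a placeholder for real speech).
fs = 8000
t = np.arange(400) / fs
x = np.sin(2 * np.pi * 210 * t) + 0.6 * np.sin(2 * np.pi * 770 * t + 0.4)
for name, idx in (("MMD", mmd_indices(x)), ("IPD", ipd_indices(x))):
    print(f"{name}: {len(idx)} of {len(x)} samples kept, "
          f"SNR {snr_db(x, reconstruct(x, idx)):.1f} dB")
```

Because the stand-in IPD set always contains the MMD set, it keeps at least as many samples; how much it gains on real speech depends on the actual detection rule, which this sketch does not reproduce.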

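The flag-bit transmission format described in the abstract can be sketched as follows. This is a hypothetical illustration: the abstract does not state the word length or quantizer, so the sketch assumes samples normalized to [-1, 1] and a uniform `bits`-bit quantizer (8 bits by default), with each kept sample sent MSB first after a flag bit of 1 and every other sample costing a single 0 bit. The decoder rebuilds the kept samples from the flags and fills the gaps by linear interpolation. The names `quantize`, `dequantize`, `encode_flagged` and `decode_flagged` are invented for this example.

```python
import numpy as np

def quantize(v: float, bits: int) -> int:
    """Assumed uniform quantizer on [-1, 1]: returns a code in [0, 2**bits - 1]."""
    levels = 1 << bits
    v = min(max(v, -1.0), 1.0)
    return min(int((v + 1.0) / 2.0 * levels), levels - 1)

def dequantize(code: int, bits: int) -> float:
    """Mid-point of the quantizer cell selected by `code`."""
    levels = 1 << bits
    return (code + 0.5) / levels * 2.0 - 1.0

def encode_flagged(x, kept, bits: int = 8) -> list:
    """One flag bit per sample: 1 followed by a `bits`-bit code for kept samples, 0 otherwise."""
    kept_set = {int(i) for i in kept}
    stream = []
    for n, v in enumerate(x):
        if n in kept_set:
            code = quantize(float(v), bits)
            stream.append(1)
            stream.extend((code >> b) & 1 for b in reversed(range(bits)))  # MSB first
        else:
            stream.append(0)
    return stream

def decode_flagged(stream, num_samples: int, bits: int = 8) -> np.ndarray:
    """Rebuild the kept samples from the flags, then linearly interpolate.
    Assumes at least one sample (e.g. the first and last) was kept."""
    idx, vals, pos = [], [], 0
    for n in range(num_samples):
        flag = stream[pos]
        pos += 1
        if flag:
            code = 0
            for bit in stream[pos:pos + bits]:
                code = (code << 1) | bit
            pos += bits
            idx.append(n)
            vals.append(dequantize(code, bits))
    return np.interp(np.arange(num_samples), idx, vals)

x = np.array([0.0, 0.25, 0.5, 0.25, 0.0])
stream = encode_flagged(x, kept=[0, 2, 4], bits=8)
y = decode_flagged(stream, len(x), bits=8)
print(len(stream), "bits for", len(x), "samples")
```

Under these assumptions, if a fraction p of the samples taken at rate f_s is kept and each is coded with B bits, the rate is f_s(1 + Bp) bit/s; for example, f_s = 8 kHz, B = 8 and p = 0.25 give 8000 × 3 = 24 kbit/s. These figures only illustrate the format; the paper's actual parameters are not given in the abstract.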

Cited by

  1. A Low Bit Rate Speech Coder Based on the Inflection Point Detection, vol. 15, no. 4, 2015, https://doi.org/10.5391/IJFIS.2015.15.4.300
  2. A Fixed Rate Speech Coder Based on the Filter Bank Method and the Inflection Point Detection, vol. 16, no. 4, 2016, https://doi.org/10.5391/IJFIS.2016.16.4.276