Speech Recognition Performance Improvement using a convergence of GMM Phoneme Unit Parameter and Vocabulary Clustering

  • Oh, SangYeob (Division of Computer Engineering, Gachon University)
  • Received : 2020.06.19
  • Accepted : 2020.08.20
  • Published : 2020.08.28

Abstract

Deep neural networks (DNNs) produce fewer errors than conventional speech recognition systems, but they are difficult to train in parallel, computationally expensive, and require large amounts of training data. To address these problems efficiently, this paper generates phoneme units by estimating per-phoneme GMM parameters from the model parameters of a GMM. To apply these units effectively, it also proposes a way to improve performance by clustering specific vocabulary. For the experiments, a vocabulary model was built from three types of word speech databases, and features were extracted after noise processing with a Wiener filter. The proposed method achieved a recognition rate of 97.9%. Further study is needed to address the overfitting problem.
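As a rough illustration of the per-phoneme GMM idea described in the abstract, the following Python sketch fits one small diagonal-covariance GMM per phoneme with EM and labels a segment by the model with the highest average log-likelihood. It is not the paper's implementation: the function names (`fit_gmm`, `classify`, `synth`), the two phoneme labels, the component count, and the synthetic 2-D features are all assumptions made for illustration, and the vocabulary clustering and noise-filtering steps are not shown.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def component_logs(x, gmm):
    """Per-component log(weight * density) for one frame."""
    weights, means, variances = gmm
    return [
        math.log(w) + log_gauss(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]


def fit_gmm(frames, n_components=2, n_iter=25, var_floor=1e-3, seed=0):
    """Estimate (weights, means, variances) for one phoneme with EM."""
    rng = random.Random(seed)
    dim = len(frames[0])
    # Initialise means from randomly chosen frames (an assumed, simple choice).
    means = [list(f) for f in rng.sample(frames, n_components)]
    variances = [[1.0] * dim for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in frames:
            logs = component_logs(x, (weights, means, variances))
            norm = logsumexp(logs)
            resp.append([math.exp(value - norm) for value in logs])
        # M-step: re-estimate weights, means, and floored variances.
        counts = [sum(r[k] for r in resp) + 1e-10 for k in range(n_components)]
        total = sum(counts)
        for k in range(n_components):
            weights[k] = counts[k] / total
            means[k] = [
                sum(r[k] * x[d] for r, x in zip(resp, frames)) / counts[k]
                for d in range(dim)
            ]
            variances[k] = [
                max(
                    var_floor,
                    sum(r[k] * (x[d] - means[k][d]) ** 2
                        for r, x in zip(resp, frames)) / counts[k],
                )
                for d in range(dim)
            ]
    return weights, means, variances


def avg_log_likelihood(frames, gmm):
    """Mean per-frame log-likelihood of a segment under one phoneme GMM."""
    return sum(logsumexp(component_logs(x, gmm)) for x in frames) / len(frames)


def classify(frames, phoneme_gmms):
    """Return the phoneme label whose GMM scores the segment highest."""
    return max(phoneme_gmms, key=lambda p: avg_log_likelihood(frames, phoneme_gmms[p]))


def synth(rng, centers, n_per_center, spread=0.5):
    """Synthetic 2-D 'feature frames' standing in for real acoustic features."""
    return [
        [rng.gauss(cx, spread), rng.gauss(cy, spread)]
        for cx, cy in centers
        for _ in range(n_per_center)
    ]


rng = random.Random(1)
# Hypothetical training frames for two phoneme labels.
train = {
    "a": synth(rng, [(0.0, 0.0), (1.5, 1.0)], 30),
    "i": synth(rng, [(5.0, 5.0), (6.5, 4.0)], 30),
}
phoneme_gmms = {label: fit_gmm(frames) for label, frames in train.items()}

# Score an unseen segment drawn near one of the "i" centers.
segment = synth(rng, [(5.0, 5.0)], 10)
print(classify(segment, phoneme_gmms))
```

In a real system the frames would be acoustic feature vectors extracted from speech rather than synthetic points, and the number of components per phoneme would be tuned on held-out data.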
