A Study on Speech Recognition using Recurrent Neural Networks


  • Published : 1999.04.01

Abstract

In this paper, we investigate a reliable predictive recurrent neural network model for speech recognition. The predictive neural networks are modeled in syllable units; for a given input syllable, the model that yields the minimum prediction error is taken as the recognition result. To absorb the dynamic features of the speech pattern into the network, the predictive neural network was given a recurrent structure, and we compared the recognition performance of the recurrent architectures proposed by Elman and Jordan. ETRI's SAMDORI corpus was used as the speech database. To find a reliable neural network model, recognition rates were compared under two conditions: (1) varying the prediction order and the number of hidden units, and (2) accumulating previous values in the context layer through a self-loop coefficient. The results show that the optimum prediction order, number of hidden units, and self-loop coefficient vary with the structure of the neural network used. In general, however, Jordan's recurrent network shows a higher recognition rate than Elman's. The effect of the self-loop coefficient on the recognition rate varied irregularly with the network structure and the coefficient value.

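The recognition rule described in the abstract — score each per-syllable predictive model on an input utterance and pick the model with the minimum accumulated prediction error — can be sketched as below. This is a minimal illustrative sketch, not the paper's exact model: the class name, layer sizes, and the placement of the self-loop coefficient `alpha` on the Jordan-style context units are assumptions for demonstration.

```python
import numpy as np

class JordanPredictor:
    """Sketch of a Jordan-style recurrent predictive network.

    Predicts the next frame x[t+1] from the current frame x[t] plus a
    context vector built from past outputs, decayed by a self-loop
    coefficient (hypothetical parameterization for illustration).
    """

    def __init__(self, n_in, n_hidden, alpha=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.alpha = alpha  # self-loop coefficient on the context units
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_in))  # Jordan: context = past outputs
        self.W_out = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.b_o = np.zeros(n_in)

    def prediction_error(self, frames):
        """Accumulate squared one-step prediction error over frames (T, n_in)."""
        ctx = np.zeros(frames.shape[1])
        err = 0.0
        for t in range(len(frames) - 1):
            h = np.tanh(self.W_in @ frames[t] + self.W_ctx @ ctx + self.b_h)
            y = self.W_out @ h + self.b_o
            err += np.sum((frames[t + 1] - y) ** 2)
            # Jordan feedback: previous output accumulated in the context
            # layer, weighted by the self-loop coefficient
            ctx = self.alpha * ctx + y
        return err

def recognize(models, frames):
    """Return the label of the syllable model with minimum prediction error."""
    return min(models, key=lambda label: models[label].prediction_error(frames))
```

An Elman variant would feed back the hidden activations `h` instead of the output `y` into the context; the recognition rule (minimum prediction error over the syllable models) stays the same.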


References

  1. SPIE Intelligent Robots and Computer Vision XI: Biological, Neural Net, and 3-D Methods v.1826 Adaptive time-delay neural network for temporal correlation and prediction D. T. Lin;J. E. Dayhoff;P. A. Ligomenides
  2. Proceedings of the National Academy of Sciences USA v.81 Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons J. J. Hopfield
  3. Technical Report ICS-8604, Institute for Cognitive Science Serial Order: A parallel distributed processing approach M. I. Jordan
  4. Technical Report CRL-8801, Center for Research in Language Finding structure in time J. L. Elman
  5. Neural Computation v.1 A learning algorithm for continually running fully recurrent neural networks R. J. Williams;D. Zipser
  6. Proc. ICASSP'90 Speaker-Independent Word Recognition Using A Neural Prediction Model Ken-ichi Iso;Takao Watanabe
  7. Proc. ICASSP'91 Large vocabulary speech recognition using neural prediction model K. Iso;T. Watanabe
  8. Journal of the Institute of Electronics Engineers of Korea v.11;32 Speech Recognition Using a Recurrent Neural Prediction Model 류재관;라경민;임재열;성광모;안수길
  9. ICASP v.1;2 The Recognition of Korean Syllables using Recurrent Prediction Neural Networks Joo-Sung Kim;Kwang-Suk Lee;Kang-In Hur
  10. ICASP v.1;2 The Recognition of Korean Syllables Using Neural Predictive HMM Soo Hoon Kim;Sang-Boum;Kang-In Hur
  11. Journal of the Information and Telecommunication Research Institute, Dong-A University v.5 no.1 Comparison of Speech Recognition Performance of RPNN According to Pattern Composition 한학용;김주성;허강인