LSTM Language Model Based Korean Sentence Generation

  • Kim, Yang-hoon (Automation and Systems Research Institute (ASRI), Department of Electrical and Computer Engineering, Seoul National University) ;
  • Hwang, Yong-keun (Department of Electrical and Computer Engineering, Seoul National University) ;
  • Kang, Tae-gwan (Department of Electrical and Computer Engineering, Seoul National University) ;
  • Jung, Kyo-min (Automation and Systems Research Institute (ASRI), Department of Electrical and Computer Engineering, Seoul National University)
  • Received : 2015.12.10
  • Accepted : 2016.05.23
  • Published : 2016.05.31

Abstract

The recurrent neural network (RNN) is a deep learning model well suited to sequential or variable-length data. The long short-term memory (LSTM) architecture mitigates the vanishing gradient problem of RNNs, allowing the model to maintain long-term dependencies among the constituents of an input sequence. In this paper, we propose an LSTM-based language model that predicts the words following a given incomplete sentence in order to generate a complete sentence. To evaluate the proposed method, we trained the model on multiple Korean corpora and then generated the missing parts of incomplete Korean sentences. The results show that the proposed language model generates fluent Korean sentences. We also show that the model whose minimal unit is the word phrase (eojeol) generates better sentences than models with other unit settings.
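
As a rough illustration of the approach described in the abstract — not the authors' implementation (their experiments used Theano, refs. 22-23) — the following PyTorch sketch shows a word-level LSTM language model, the unit setting the paper found best, together with a greedy decoding loop that extends an incomplete sentence one predicted word at a time. All hyperparameters, the class and function names, and the tokenization into integer word indices are illustrative assumptions.

    # Minimal sketch, assuming PyTorch; hyperparameters and names are illustrative.
    import torch
    import torch.nn as nn

    class LSTMLanguageModel(nn.Module):
        """Word-level LSTM language model: embeds word indices, runs an
        LSTM over them, and projects hidden states to next-word logits."""
        def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.proj = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens, state=None):
            # tokens: (batch, seq_len) tensor of word indices
            out, state = self.lstm(self.embed(tokens), state)
            return self.proj(out), state  # logits over the vocabulary

    def complete_sentence(model, prefix_ids, eos_id, max_new_words=20):
        """Greedily append the most probable next word until an
        end-of-sentence token (or a length cap) is reached."""
        model.eval()
        generated = list(prefix_ids)
        inp, state = torch.tensor([prefix_ids]), None
        with torch.no_grad():
            for _ in range(max_new_words):
                logits, state = model(inp, state)
                next_id = int(logits[0, -1].argmax())
                if next_id == eos_id:
                    break
                generated.append(next_id)
                inp = torch.tensor([[next_id]])  # feed the new word back in
        return generated

Training such a model amounts to minimizing the cross-entropy between the predicted next-word distribution and the actual next word at every position of the training corpus; at generation time, the greedy argmax above can be replaced by sampling from the softmax distribution for more varied output.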

Keywords

References

  1. S. H. Gil and G. H. Kim, "Vision-based vehicle detection and tracking using online learning," J. KICS, vol. 39A, no. 1, pp. 1-11, 2014. https://doi.org/10.7840/kics.2014.39A.1.1
  2. J. H. Moon, et al., "Case study of big data-based agri-food recommendation system according to types of customers," J. KICS, vol. 40, no. 5, pp. 903-913, 2015. https://doi.org/10.7840/kics.2015.40.5.903
  3. O. Russakovsky, et al., "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211-252, Dec. 2015. https://doi.org/10.1007/s11263-015-0816-y
  4. S. Kumar, et al., "Localization estimation using artificial intelligence technique in wireless sensor networks," J. KICS, vol. 39C, no. 9, pp. 820-827, 2014. https://doi.org/10.7840/kics.2014.39C.9.820
  5. Y. Bengio, et al., "Learning long-term dependencies with gradient descent is difficult," IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 157-166, 1994. https://doi.org/10.1109/72.279181
  6. S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997. https://doi.org/10.1162/neco.1997.9.8.1735
  7. H. Sak, et al., "Fast and accurate recurrent neural network acoustic models for speech recognition," in Proc. INTERSPEECH, Dresden, Germany, Sept. 2015.
  8. A. Graves, et al., "A novel connectionist system for unconstrained handwriting recognition," IEEE Trans. PAMI, vol. 31, no. 5, pp. 855-868, May 2009. https://doi.org/10.1109/TPAMI.2008.137
  9. A. Grushin, et al., "Robust human action recognition via long short-term memory," in Proc. IJCNN, pp. 1-8, Dallas, United States, Aug. 2013.
  10. S. Shin, et al., "Image classification with recurrent neural networks using input replication," J. KICS, vol. 2015, no. 6, pp. 868-869, 2015.
  11. P. Koehn, et al., "Moses: Open source toolkit for statistical machine translation," in Proc. ACL, pp. 177-180, Prague, Czech Republic, Jun. 2007.
  12. T. Mikolov, et al., "Extensions of recurrent neural network language model," in Proc. ICASSP, pp. 5528-5531, Prague, Czech Republic, May 2011.
  13. T. Mikolov, "Statistical language models based on neural networks," Ph.D. Dissertation, Brno University of Technology, 2012.
  14. I. Sutskever, et al., "Generating text with recurrent neural networks," in Proc. ICML, Bellevue, United States, Jun. 2011.
  15. T. Mikolov, et al., "Subword language modeling with neural networks," preprint, 2012. http://www.fit.vutbr.cz/imikolov/rnnlm/char.pdf
  16. Wikipedia, Recurrent neural network, Retrieved Dec. 3, 2015, https://en.wikipedia.org/wiki/Recurrent_neural_network.
  17. M. C. Mozer, A focused backpropagation algorithm for temporal pattern recognition, L. Erlbaum Associates Inc., pp. 137-169, 1995.
  18. S. Kirkpatrick, et al., "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671-680, 1983.
  19. Korean Bible Society, Bible, Retrieved Dec. 7, 2015, http://www.bskorea.or.kr.
  20. jungyeul, korean-parallel-corpora (2014), Retrieved Oct. 22, 2015, https://github.com/jungyeul/korean-parallel-corpora/tree/master/korean-english-v1.
  21. Twitter, twitter-korean-text, Retrieved Nov. 10, 2015, https://github.com/twitter/twitter-korean-text.
  22. F. Bastien, et al., "Theano: new features and speed improvements," in Proc. NIPS Deep Learning Workshop, Lake Tahoe, United States, Dec. 2012.
  23. J. Bergstra, et al., "Theano: A CPU and GPU math expression compiler," in Proc. SciPy, 2010.
  24. S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python, O'Reilly Media Inc., Jun. 2009.

Cited by

  1. Density-Based Estimation of Point-of-Interest Boundaries Using Geo-Tagged Tweets, vol. 42, no. 2, 2017, https://doi.org/10.7840/kics.2017.42.2.453
  2. Sentence Generation Using a Korean Phoneme-Level LSTM Language Model, vol. 23, no. 2, 2017, https://doi.org/10.13088/jiis.2017.23.2.071
  3. CNN-LSTM Coupled Model for Prediction of Waterworks Operation Data, vol. 14, no. 6, 2018, https://doi.org/10.3745/jips.02.0104
  4. Design and Implementation of an Interactive Search Service Based on Deep Learning and Morpheme Analysis in the NTIS System, vol. 10, no. 12, 2020, https://doi.org/10.22156/cs4smb.2020.10.12.009
  5. Anomaly Detection in Reservoir Water Level Data Using the LSTM Model Based on Deep Learning, vol. 21, no. 1, 2021, https://doi.org/10.9798/kosham.2021.21.1.71