Sequence-to-sequence based Morphological Analysis and Part-Of-Speech Tagging for Korean Language with Convolutional Features

Sequence-to-sequence based Korean Morphological Analysis and POS Tagging

  • 이건일 (Department of Computer Science and Engineering, POSTECH) ;
  • 이의현 (Department of Computer Science and Engineering, POSTECH) ;
  • 이종혁 (Department of Computer Science and Engineering, POSTECH)
  • Received : 2016.09.09
  • Accepted : 2016.10.25
  • Published : 2017.01.15

Abstract

Traditional Korean morphological analysis and POS tagging methods usually consist of two steps: (1) generate hypotheses for all possible combinations of morphemes for the given input, and (2) search for the optimal POS tagging result over those hypotheses. These methods require additional language resources such as morpheme connectivity dictionaries, pre-analyzed dictionaries, and base-form restoration dictionaries, and errors in the first step can propagate to the second. In this paper, we depart from the two-step approach and address Korean morphological analysis and POS tagging in an end-to-end fashion, without additional language resources, using a sequence-to-sequence model, a kind of deep learning model. In addition, reflecting the fact that morphological analysis and POS tagging form a special sequence transduction task in which word order does not change, we incorporate convolutional features, which are widely used in speech recognition. Experimental results on the Sejong corpus show that our approach achieves a 97.15% morpheme-level F1-score, 95.33% word-level (eojeol) accuracy, and 60.62% sentence-level accuracy without convolutional features, and a 96.91% morpheme-level F1-score, 95.40% word-level accuracy, and 60.62% sentence-level accuracy with convolutional features.
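The approach described above is an attention-based encoder-decoder whose input representation is augmented with convolutional features over the (order-preserving) input sequence. The sketch below is a minimal illustration of that idea only, not the authors' implementation (which used Theano/dl4mt, see reference 9); the class name Seq2SeqTagger, the layer sizes, the attention form, and the vocabulary sizes are all illustrative assumptions.

```python
# Illustrative sketch of a seq2seq tagger with convolutional input features.
# Not the paper's Theano/dl4mt model; all sizes and names are assumptions.
import torch
import torch.nn as nn

class Seq2SeqTagger(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=128, hid=256, conv_width=3):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        # Convolution over neighboring input embeddings: because the output
        # morpheme/POS sequence preserves the input order, local windows of
        # the source are informative features for the encoder.
        self.conv = nn.Conv1d(emb, emb, kernel_size=conv_width,
                              padding=conv_width // 2)
        self.encoder = nn.GRU(emb * 2, hid, batch_first=True, bidirectional=True)
        self.decoder = nn.GRUCell(emb + 2 * hid, hid)
        self.attn = nn.Linear(hid + 2 * hid, 1)   # simple additive-style scorer
        self.init_h = nn.Linear(2 * hid, hid)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src, tgt):
        # src: (B, S) input character ids; tgt: (B, T) teacher-forced
        # morpheme/POS symbol ids (e.g. starting with a <sos> symbol).
        e = self.src_emb(src)                               # (B, S, E)
        c = self.conv(e.transpose(1, 2)).transpose(1, 2)    # (B, S, E) conv features
        enc, _ = self.encoder(torch.cat([e, c], dim=-1))    # (B, S, 2H)
        h = torch.tanh(self.init_h(enc.mean(dim=1)))        # (B, H) initial state
        logits = []
        for t in range(tgt.size(1)):
            # Attention over encoder states, then one decoder step.
            score = self.attn(torch.cat(
                [h.unsqueeze(1).expand(-1, enc.size(1), -1), enc], dim=-1))
            ctx = (torch.softmax(score, dim=1) * enc).sum(dim=1)          # (B, 2H)
            h = self.decoder(torch.cat([self.tgt_emb(tgt[:, t]), ctx], dim=-1), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                   # (B, T, tgt_vocab)

# Toy usage with random ids: predict each next output symbol (teacher forcing).
model = Seq2SeqTagger(src_vocab=2000, tgt_vocab=500)
src = torch.randint(0, 2000, (4, 20))
tgt = torch.randint(0, 500, (4, 26))
tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]
loss = nn.CrossEntropyLoss()(model(src, tgt_in).flatten(0, 1), tgt_out.flatten())
loss.backward()
```

Dropping the `conv` branch (feeding only `e` to the encoder) would correspond to the "without convolutional features" setting compared in the abstract.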

Keywords

Acknowledgement

Grant: Development of core technology for knowledge-augmented real-time simultaneous interpretation

Supported by: Institute for Information & communications Technology Promotion (IITP)

References

  1. I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," Advances in Neural Information Processing Systems, pp. 3104-3112, 2014.
  2. Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," Empirical Methods in Natural Language Processing, pp. 1724-1734, 2014.
  3. Kyunghyun Cho, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio, "On the properties of neural machine translation: Encoder-decoder approaches," Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp. 103-111, 2014.
  4. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio, "Neural machine translation by jointly learning to align and translate," International Conference on Learning Representations, 2015.
  5. J. Chung, K. Cho, and Y. Bengio, "A Character-level Decoder without Explicit Segmentation for Neural Machine Translation," arXiv preprint arXiv:1603.06147, 2016.
  6. O. Vinyals, Ł. Kaiser, T. Koo, S. Petrov, I. Sutskever, and G. Hinton, "Grammar as a foreign language," Advances in Neural Information Processing Systems, pp. 2755-2763, 2015.
  7. F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. Goodfellow, A. Bergeron, N. Bouchard, D. Warde-Farley, and Y. Bengio, "Theano: new features and speed improvements," NIPS 2012 Deep Learning Workshop, 2012.
  8. J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley and Y. Bengio, "Theano: A CPU and GPU Math Expression Compiler," SCIPY, 2010.
  9. https://github.com/nyu-dl/dl4mt-cdec
  10. Seung-Hoon Na, Sangkeun Jung, "Deep Learning for Korean POS Tagging," Proc. of the 41st KIISE Conference, pp. 426-428, 2014. (in Korean)
  11. Seung-Hoon Na, Young-Kil Kim, "Phrase-Based Statistical Model for Korean Morpheme Segmentation and POS Tagging," Proc. of the 41st KIISE Conference, pp. 571-573, 2014. (in Korean)
  12. Changki Lee, "Joint Models for Korean Word Spacing and POS Tagging using Structural SVM," Journal of KISS : Software and Applications, Vol. 40, No. 12, pp. 826-832, 2013. (in Korean)
  13. J. K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, "Attention-based models for speech recognition," Advances in Neural Information Processing Systems, 2015.
  14. D. Bahdanau, J. Chorowski, D. Serdyuk, and Y. Bengio, "End-to-end attention-based large vocabulary speech recognition," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4945-4949, 2016.