
Syllable-based Probabilistic Models for Korean Morphological Analysis

  • Received : 2014.02.14
  • Accepted : 2014.06.27
  • Published : 2014.09.15

Abstract

This paper proposes three probabilistic models for syllable-based Korean morphological analysis and evaluates their performance. The model probabilities are estimated from a POS-tagged corpus. In 10-fold cross-validation on the 10-million-eojeol Sejong POS-tagged corpus, the models achieve a 98.3% answer inclusion rate. Our models assign a POS tag to each syllable before performing spelling recovery and morpheme generation, whereas previous probabilistic models perform spelling recovery first. Tagging first narrows the search space and makes analysis considerably more efficient: experiments show that our analyzer processes 147K eojeols per second, roughly 174 times faster than previous probabilistic models for Korean morphology.
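
The tag-first ordering can be illustrated with a minimal sketch. It rests on assumptions that do not come from the paper: syllable tags use a hypothetical B-/I- boundary scheme over Sejong POS tags (a syllable that fuses two morphemes, such as 갔 = 가/VV + 았/EP, gets a combined tag like `VV+EP`); each syllable's tag is picked greedily from a toy probability table, `SYLLABLE_TAG_PROBS`, instead of by any of the paper's three models; and spelling recovery uses a hypothetical lookup table, `RECOVERY`.

```python
from typing import Dict, List, Tuple

# Hypothetical toy table of P(tag | syllable). A real model would use context
# and be estimated from a POS-tagged corpus; this only shows the pipeline order.
SYLLABLE_TAG_PROBS: Dict[str, Dict[str, float]] = {
    "학": {"B-NNG": 0.9, "B-XSN": 0.1},
    "교": {"I-NNG": 0.8, "B-NNG": 0.2},
    "에": {"B-JKB": 0.95, "I-NNG": 0.05},
    "갔": {"B-VV+EP": 1.0},
    "다": {"B-EF": 0.7, "I-NNG": 0.3},
}

# Hypothetical spelling-recovery table, consulted only AFTER tagging:
# a syllable whose tag fuses two morphemes is split back into base forms.
RECOVERY: Dict[Tuple[str, str], List[Tuple[str, str]]] = {
    ("갔", "VV+EP"): [("가", "VV"), ("았", "EP")],
}


def tag_syllables(eojeol: str) -> List[Tuple[str, str]]:
    """Step 1: give each syllable its most probable tag (greedy, no context)."""
    return [(s, max(SYLLABLE_TAG_PROBS[s], key=SYLLABLE_TAG_PROBS[s].get))
            for s in eojeol]


def generate_morphemes(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Steps 2-3: merge B-/I- spans into morphemes, then recover base forms."""
    spans: List[Tuple[str, str]] = []
    for syl, tag in tagged:
        boundary, pos = tag.split("-", 1)
        if boundary == "I" and spans and spans[-1][1] == pos:
            spans[-1] = (spans[-1][0] + syl, pos)  # continue current morpheme
        else:
            spans.append((syl, pos))  # start a new morpheme
    morphemes: List[Tuple[str, str]] = []
    for surface, pos in spans:
        # Unlisted (surface, pos) pairs need no recovery and pass through.
        morphemes.extend(RECOVERY.get((surface, pos), [(surface, pos)]))
    return morphemes


def analyze(eojeol: str) -> str:
    """Analyze one eojeol into 'morpheme/TAG + ...' form."""
    return " + ".join(f"{m}/{p}" for m, p in generate_morphemes(tag_syllables(eojeol)))
```

For example, `analyze("학교에")` returns `학교/NNG + 에/JKB` and `analyze("갔다")` returns `가/VV + 았/EP + 다/EF`. In this sketch, recovery is looked up only for the (syllable, tag) pairs that survive tagging rather than for every possible base form of every syllable, which is one plausible reading of why tagging first shrinks the search space.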

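This paper proposes three probabilistic models applicable to syllable-based Korean morphological analysis and evaluates the performance of each model on a POS-tagged corpus. For the evaluation, the 10-million-eojeol Sejong corpus was divided into 10 sets, and 10-fold cross-validation yielded an answer inclusion rate of 98.4%. Because the proposed models first attach a POS tag to each syllable and only then perform spelling recovery and morpheme generation, their search space is much smaller than that of previous probabilistic models, which perform spelling recovery first. The analysis process is therefore much simpler and more efficient, and analysis speed improves roughly 174-fold, from a few hundred eojeols per second to 147,000 eojeols per second.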
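
The evaluation protocol can be sketched as follows. The sketch assumes that the answer inclusion rate is the fraction of test eojeols whose gold analysis appears among the analyzer's candidate analyses, that the 10 sets are contiguous slices of the corpus, and that the reported figure is the mean over folds; the paper may differ on any of these points. `train_lookup` is a hypothetical placeholder for model training that simply memorizes the analyses seen in the training folds.

```python
from collections import defaultdict
from typing import Callable, Iterator, List, Sequence, Set, Tuple

Example = Tuple[str, str]             # (eojeol, gold analysis)
Analyzer = Callable[[str], Set[str]]  # eojeol -> set of candidate analyses


def k_fold_splits(data: Sequence[Example],
                  k: int = 10) -> Iterator[Tuple[List[Example], List[Example]]]:
    """Yield (train, test) pairs; each example lands in exactly one test fold."""
    n = len(data)
    bounds = [n * i // k for i in range(k + 1)]
    for i in range(k):
        test = list(data[bounds[i]:bounds[i + 1]])
        train = list(data[:bounds[i]]) + list(data[bounds[i + 1]:])
        yield train, test


def answer_inclusion_rate(test: Sequence[Example], analyze: Analyzer) -> float:
    """Share of test eojeols whose gold analysis is among the candidates."""
    hits = sum(1 for eojeol, gold in test if gold in analyze(eojeol))
    return hits / len(test)


def cross_validate(data: Sequence[Example],
                   train_fn: Callable[[List[Example]], Analyzer],
                   k: int = 10) -> float:
    """Mean answer inclusion rate over k folds."""
    if len(data) < k:
        raise ValueError("need at least k examples so no test fold is empty")
    rates = [answer_inclusion_rate(test, train_fn(train))
             for train, test in k_fold_splits(data, k)]
    return sum(rates) / len(rates)


def train_lookup(train: List[Example]) -> Analyzer:
    """Hypothetical stand-in for training: memorize analyses per eojeol."""
    table = defaultdict(set)
    for eojeol, gold in train:
        table[eojeol].add(gold)
    return lambda eojeol: table.get(eojeol, set())
```

With this toy `train_lookup`, any eojeol absent from the training folds scores zero; replacing it with a real trained analyzer leaves the protocol itself unchanged.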

Acknowledgement

Supported by: Sungshin Women's University