Unsupervised Semantic Role Labeling for Korean Adverbial Case

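Korean title: 비지도 학습을 기반으로 한 한국어 부사격의 의미역 결정 ("Semantic Role Determination for Korean Adverbial Cases Based on Unsupervised Learning")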

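  • 김병수 (Department of Information Processing, Pohang University of Science and Technology)
  • 이용훈 (Department of Computer Science and Engineering, Pohang University of Science and Technology)
  • 이종혁 (Department of Computer Science and Engineering, Pohang University of Science and Technology)
  • Published: 2007.02.15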

Abstract

Training a statistical model for semantic role labeling requires a large manually tagged corpus. However, no such corpus exists for Korean, and constructing one from scratch is a long and tedious job. This paper proposes a modified self-training algorithm, an unsupervised method, that trains a semantic role labeling model from untagged raw corpora. For initial training, a small tagged corpus is constructed automatically from the case frames in the Sejong Electronic Dictionary. Using this corpus, a probabilistic model is trained incrementally; it achieves 83.00% accuracy on four selected adverbial cases.

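To perform semantic role labeling statistically on a corpus, the semantic roles in that corpus must first be tagged. For Korean, however, a large corpus tagged with semantic roles is hard to obtain, and building one directly takes a great deal of time and effort. This paper proposes a method that applies self-training, an unsupervised learning algorithm, to determine semantic roles from a corpus that has not been tagged with them. To this end, a training corpus was built automatically from the case-frame information in the Sejong predicate electronic dictionary, and a probabilistic model was trained on it incrementally. The method achieved an average accuracy of 83.00% on four adverbial case particles.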
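The abstract gives no implementation details, so the following is only a minimal sketch of generic self-training under stated assumptions. The case-frame table `CASE_FRAMES`, its (predicate, particle) keys, the noun-class labels, the role names, the add-one-smoothed naive Bayes model, and the 0.8 confidence threshold are all hypothetical stand-ins, not the authors' features, model, or the Sejong dictionary format. It illustrates the two steps the abstract names: seeding a labeled set automatically from case frames, then repeatedly labeling untagged instances the model is confident about and retraining on the enlarged set.

```python
import math
from collections import Counter, defaultdict

# Hypothetical case frames: (predicate, particle) -> {noun class: semantic role}.
# Illustrative only; this is NOT the Sejong dictionary's actual format.
CASE_FRAMES = {
    ("가다", "에"): {"PLACE": "GOAL"},          # go to PLACE
    ("살다", "에"): {"PLACE": "LOCATION"},      # live in PLACE
    ("자르다", "로"): {"TOOL": "INSTRUMENT"},   # cut with TOOL
    ("떠나다", "로"): {"PLACE": "DIRECTION"},   # leave for PLACE
}


def seed_from_case_frames(raw):
    """Split raw (predicate, particle, noun_class) instances into a seed set
    labeled by an exact case-frame match and a remaining unlabeled pool."""
    labeled, unlabeled = [], []
    for inst in raw:
        pred, particle, noun_class = inst
        role = CASE_FRAMES.get((pred, particle), {}).get(noun_class)
        if role is None:
            unlabeled.append(inst)
        else:
            labeled.append((inst, role))
    return labeled, unlabeled


class NaiveBayesRoleModel:
    """Add-one-smoothed naive Bayes over the three instance slots
    (a hypothetical stand-in for the paper's unspecified probabilistic model)."""

    def fit(self, labeled):
        self.role_counts = Counter(role for _, role in labeled)
        self.feat_counts = defaultdict(Counter)  # role -> Counter[(slot, value)]
        self.vocab = defaultdict(set)            # slot -> values seen in training
        for inst, role in labeled:
            for slot, value in enumerate(inst):
                self.feat_counts[role][(slot, value)] += 1
                self.vocab[slot].add(value)
        return self

    def predict_proba(self, inst):
        total = sum(self.role_counts.values())
        log_scores = {}
        for role, n_role in self.role_counts.items():
            score = math.log(n_role / total)
            for slot, value in enumerate(inst):
                count = self.feat_counts[role][(slot, value)]
                # The extra +1 in the denominator reserves mass for unseen values.
                score += math.log((count + 1) / (n_role + len(self.vocab[slot]) + 1))
            log_scores[role] = score
        top = max(log_scores.values())  # subtract the max for numerical stability
        weights = {r: math.exp(s - top) for r, s in log_scores.items()}
        z = sum(weights.values())
        return {r: w / z for r, w in weights.items()}


def self_train(raw, threshold=0.8, max_iter=10):
    """Seed from case frames, then repeatedly add confident predictions and retrain."""
    labeled, unlabeled = seed_from_case_frames(raw)
    model = NaiveBayesRoleModel().fit(labeled)
    for _ in range(max_iter):
        confident, rest = [], []
        for inst in unlabeled:
            probs = model.predict_proba(inst)
            role = max(probs, key=probs.get)
            if probs[role] >= threshold:
                confident.append((inst, role))
            else:
                rest.append(inst)
        if not confident:
            break  # nothing new is confident enough; stop early
        labeled += confident
        unlabeled = rest
        model = NaiveBayesRoleModel().fit(labeled)
    return model, labeled, unlabeled


if __name__ == "__main__":
    raw = [
        ("가다", "에", "PLACE"), ("가다", "에", "PLACE"), ("살다", "에", "PLACE"),
        ("자르다", "로", "TOOL"), ("자르다", "로", "TOOL"),
        ("썰다", "로", "TOOL"),   # no case frame: "slice with TOOL"
        ("보다", "에", "TIME"),   # no case frame and an unseen noun class
    ]
    _, labeled, unlabeled = self_train(raw)
    print(labeled[-1])  # the instance added by self-training
    print(unlabeled)    # instances left unlabeled
```

Adding only high-confidence predictions is the usual guard against self-training reinforcing its own mistakes. In the toy data, the instance with an unseen noun class stays unlabeled because no role reaches the threshold, while the instance sharing the particle and noun class of the INSTRUMENT seeds is absorbed.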
