Korean Coreference Resolution using the Multi-pass Sieve

Multi-pass Sieve를 이용한 한국어 상호참조해결

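  • 박천음 (Department of Computer Science, Kangwon National University)
  • 최경호 (Department of Computer Science, Kangwon National University)
  • 이창기 (Department of Computer Science, Kangwon National University)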
  • Received : 2014.05.26
  • Accepted : 2014.09.03
  • Published : 2014.11.15

Abstract

Coreference resolution finds all expressions in a document that refer to the same entity. It is important for information extraction, document classification, document summarization, and question answering. In this paper, we adapt Stanford's multi-pass sieve system, one of the best rule-based coreference resolution models, to Korean. We treat all noun phrases as mentions. Unlike Stanford's system, we use the dependency parse tree for mention extraction and build a Korean acronym list dynamically. In addition, we propose a method that assigns weights by applying the transitive properties of centers from centering theory when resolving Korean pronouns. Experiments show that our system achieves F1 scores of 59.0% on MUC, 59.5% on $B^3$, 63.5% on CEAF-e, and 60.7% on the CoNLL mean.
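The multi-pass sieve idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Mention` structure, the two example sieves (`exact_match`, `head_match`), and the function `run_sieves` are hypothetical names chosen for illustration, and the rules are far simpler than those of the Stanford system or the Korean adaptation. The sketch only shows the control flow, in which rules are applied one pass at a time, ordered from higher to lower precision, and each pass can merge clusters built by earlier passes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Mention:
    index: int  # position in document order (hypothetical field)
    text: str   # surface string of the noun phrase
    head: str   # head word of the noun phrase


# A sieve decides whether a mention may be linked to a candidate antecedent.
Sieve = Callable[[Mention, Mention], bool]


def exact_match(mention: Mention, antecedent: Mention) -> bool:
    """Illustrative high-precision rule: identical surface strings."""
    return mention.text == antecedent.text


def head_match(mention: Mention, antecedent: Mention) -> bool:
    """Illustrative lower-precision rule: identical head words."""
    return mention.head == antecedent.head


def run_sieves(mentions: List[Mention], sieves: List[Sieve]) -> List[List[str]]:
    # Every mention starts in its own singleton cluster.
    cluster: Dict[int, int] = {m.index: m.index for m in mentions}
    for sieve in sieves:  # assumed to be ordered from high to low precision
        for i, mention in enumerate(mentions):
            # Scan preceding mentions, nearest first.
            for antecedent in reversed(mentions[:i]):
                if cluster[mention.index] == cluster[antecedent.index]:
                    continue  # already coreferent from an earlier link
                if sieve(mention, antecedent):
                    old, new = cluster[mention.index], cluster[antecedent.index]
                    for key in cluster:  # merge the two clusters
                        if cluster[key] == old:
                            cluster[key] = new
                    break
    groups: Dict[int, List[str]] = {}
    for m in mentions:
        groups.setdefault(cluster[m.index], []).append(m.text)
    return list(groups.values())


mentions = [
    Mention(0, "Barack Obama", "Obama"),
    Mention(1, "the president", "president"),
    Mention(2, "Obama", "Obama"),
    Mention(3, "the president", "president"),
]
print(run_sieves(mentions, [exact_match, head_match]))
```

In this toy input, the exact-match pass links the two occurrences of "the president", and the later head-match pass links "Obama" to "Barack Obama". Running precise rules first means that less reliable rules operate on clusters that are already partly formed.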

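Coreference resolution is the problem of determining whether a preceding noun phrase and the currently encountered noun phrase in a document refer to the same entity, and it is applied to information extraction, document classification and summarization, and question answering. This paper applies Stanford's multi-pass sieve system, the best-performing rule-based method for coreference resolution, to Korean. All noun phrases are treated as mentions, and, unlike Stanford's multi-pass sieve system, the dependency parse tree is used for mention extraction and a Korean acronym list is built dynamically. We also propose a method that assigns weights by applying the transitional properties of centers from centering theory when resolving Korean pronouns. In the experiments, the system achieved F1 scores of 59.0% on MUC, 59.5% on $B^3$, 63.5% on CEAF-e, and 60.7% on the CoNLL mean.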
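The CoNLL figure reported above is consistent with the usual definition of the CoNLL score as the unweighted mean of the MUC, $B^3$, and CEAF-e F1 values. The short Python check below reproduces that arithmetic; the function name `conll_mean` is a hypothetical helper, not part of any official scorer.

```python
def conll_mean(muc_f1: float, b_cubed_f1: float, ceaf_e_f1: float) -> float:
    """Unweighted average of the three coreference F1 scores."""
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3.0


# F1 values reported in the abstract (percent).
score = conll_mean(59.0, 59.5, 63.5)
print(round(score, 1))  # 60.7
```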

Acknowledgement

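Grant: Development of Intelligence-Evolving WiseQA Platform Technology for Human Knowledge Augmentation Services

Supported by: Korea Evaluation Institute of Industrial Technology (KEIT)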
