Named Entity Recognition and Dictionary Construction for Korean Title: Books, Movies, Music and TV Programs

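  • Yong-Min Park (박용민), Department of Digital Information Convergence, Chungbuk National University
  • Jae-Sung Lee (이재성), Department of Software, Chungbuk National University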
  • Received : 2014.03.17
  • Accepted : 2014.05.27
  • Published : 2014.07.31

Abstract

Named entity recognition (NER) is used to improve the performance of information retrieval, question answering, machine translation, and similar systems. NER usually targets PLOs (persons, locations, and organizations). These are typically proper nouns or unregistered words, and traditional recognizers exploit these characteristics to find named entity candidates. Titles of books, movies, music, and TV programs have different characteristics from PLO entities: a title may be a single word, several phrases, or a whole sentence, and may contain special characters, which makes candidates difficult to identify. In this paper we propose a method that quickly extracts title named entities from news articles and automatically builds a named entity dictionary for titles. To identify candidates, word phrases enclosed in special symbols are first extracted from each sentence and then verified by an SVM that uses surrounding feature words and their distances. To classify the extracted title candidates by type, an SVM is used with the mutual information of context words as feature weights.
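The two feature-building steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the bracket pairs in `BRACKETS`, the `word@offset` feature naming, the whitespace tokenization, and the use of pointwise mutual information are all assumptions, since the abstract only mentions "special symbols", "feature words and their distances", and "mutual information". Real Korean text would also need morphological analysis rather than a plain `split()`.

```python
import math
import re
from collections import Counter

# Assumed bracket pairs; the abstract only says "special symbols".
BRACKETS = [("<", ">"), ("《", "》"), ("〈", "〉"), ("『", "』"), ("「", "」")]

# One alternative per bracket pair, each capturing the enclosed text.
PATTERN = re.compile("|".join(
    re.escape(o) + "([^" + re.escape(c) + "]+)" + re.escape(c)
    for o, c in BRACKETS
))


def extract_candidates(sentence, window=2):
    """Return (candidate, features) pairs for bracket-enclosed phrases.

    Features are hypothetical 'word@offset' keys describing which words
    appear how far to the left (-) or right (+) of the candidate.
    """
    results = []
    for m in PATTERN.finditer(sentence):
        title = next(g for g in m.groups() if g is not None).strip()
        left = sentence[:m.start()].split()[-window:]
        right = sentence[m.end():].split()[:window]
        feats = {}
        for i, w in enumerate(reversed(left), start=1):
            feats[f"{w}@-{i}"] = 1
        for i, w in enumerate(right, start=1):
            feats[f"{w}@+{i}"] = 1
        results.append((title, feats))
    return results


def pmi(pair_counts, word, label):
    """Pointwise mutual information log2(p(w, c) / (p(w) * p(c))).

    pair_counts maps (context_word, title_class) to a co-occurrence count.
    Returns 0.0 for unseen pairs (an assumed convention for this sketch).
    """
    total = sum(pair_counts.values())
    joint = pair_counts.get((word, label), 0)
    if joint == 0:
        return 0.0
    w_count = sum(n for (w, _), n in pair_counts.items() if w == word)
    c_count = sum(n for (_, c), n in pair_counts.items() if c == label)
    return math.log2(joint * total / (w_count * c_count))


if __name__ == "__main__":
    print(extract_candidates("The director of <Oldboy> released a new movie"))
    counts = Counter({("director", "movie"): 3, ("author", "book"): 3,
                      ("new", "movie"): 1, ("new", "book"): 1})
    print(pmi(counts, "director", "movie"), pmi(counts, "new", "movie"))
```

In a full pipeline, the feature dictionaries would be vectorized and passed to an SVM to accept or reject each candidate, and the mutual information scores would weight context words when a second SVM assigns a title type (book, movie, music, or TV program).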
