Lightweight Named Entity Extraction for Korean Short Message Service Text

  • Seon, Choong-Nyoung (Department of Computer Science and Engineering, Sogang University)
  • Yoo, Jin-Hwan (Department of Computer Science and Engineering, Sogang University)
  • Kim, Hark-Soo (Department of Computer and Communications Engineering, Kangwon National University)
  • Kim, Ji-Hwan (Department of Computer Science and Engineering, Sogang University)
  • Seo, Jung-Yun (Department of Computer Science and Interdisciplinary Program of Integrated Biotechnology, Sogang University)
  • Received: 2010.09.13
  • Accepted: 2011.02.08
  • Published: 2011.03.31

Abstract

In this paper, we propose a hybrid method that combines a machine learning (ML) algorithm with a rule-based algorithm to implement a lightweight named entity (NE) extraction system for Korean SMS text. NE extraction from Korean SMS text is challenging because of the limited resources of a mobile phone, corrupted input text, the need to extend the system with personal information stored on the phone, and the sparsity of training data. The proposed hybrid method retains the advantages of both statistical ML and rule-based algorithms, and it provides a fully automated procedure for combining ML approaches with their correction rules by means of a threshold-based soft decision function. We apply the method to Korean SMS texts to extract person names and location names, which are key information in a personal appointment management system. The proposed system achieved an F-measure of 80.53% in this domain, outperforming conventional ML approaches.
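The abstract does not spell out the soft decision function, so the following is only an illustrative sketch of one way a threshold could arbitrate between an ML tagger and correction rules. It is not the authors' implementation. The names `Token`, `apply_soft_decision`, and `contact_list_rule`, the label set (`PER`, `LOC`, `O`), and the threshold value are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Token:
    text: str
    label: str         # label proposed by the ML tagger (hypothetical tag set)
    confidence: float  # tagger's probability for that label, in [0, 1]


# A correction rule looks at position i and returns a new label, or None.
Rule = Callable[[List[Token], int], Optional[str]]


def apply_soft_decision(tokens: List[Token], rules: List[Rule],
                        threshold: float = 0.8) -> List[str]:
    """Keep confident ML labels; let rules override low-confidence ones."""
    labels = []
    for i, tok in enumerate(tokens):
        label = tok.label
        if tok.confidence < threshold:
            # Only uncertain ML decisions are exposed to the rules.
            for rule in rules:
                proposed = rule(tokens, i)
                if proposed is not None:
                    label = proposed
                    break
        labels.append(label)
    return labels


def contact_list_rule(contacts: Set[str]) -> Rule:
    """Hypothetical rule: a token found in the phone's contacts is a person."""
    def rule(tokens: List[Token], i: int) -> Optional[str]:
        return "PER" if tokens[i].text in contacts else None
    return rule


tokens = [Token("Minsu", "O", 0.55), Token("at", "O", 0.99),
          Token("Gangnam", "LOC", 0.91)]
rules = [contact_list_rule({"Minsu"})]
labels = apply_soft_decision(tokens, rules, threshold=0.8)
```

In this sketch the contact-list rule stands in for the personal information stored on the phone: it only fires when the tagger is unsure, so a confident ML label is never overwritten.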
