Pattern and Instance Generation for Self-knowledge Learning in Korean

  • Received : 2014.01.18
  • Accepted : 2015.02.12
  • Published : 2015.02.25

Abstract

Various studies have proposed automatic instance generation from free text on the web. Existing work, which focuses on English, adopts pattern representations generated by simple rules and regular expressions. These simple patterns achieve high performance in English, but they are not suitable for Korean because of the linguistic differences between the two languages. This paper therefore proposes a novel method for generating patterns and instances that targets Korean. The proposed method generates high-quality patterns by taking advantage of the dependency relations in a target sentence. In addition, it overcomes the restrictions imposed by the high degree of word-order freedom in Korean by utilizing postpositions, which lets it identify the subject and the object more reliably. In the experimental results, the proposed method shows higher precision than the baseline, which implies that the proposed approach is suitable for a self-knowledge learning system.

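Various studies have proposed ways to generate instances automatically from unstructured documents on the web. Existing studies on English used patterns based on simple rules and regular expressions. In English, simple regular-expression-based patterns alone achieved sufficiently high accuracy, but because Korean has linguistic characteristics that differ from those of English, suitable patterns cannot be generated in the existing regular-expression form. This paper therefore proposes a pattern and instance generation method suited to Korean. The proposed method generates a highly accurate pattern set by considering the dependency relations of the target sentence. It also uses postposition information to determine the subject and the object of an instance, which resolves the constraints arising from the free word order of Korean. According to the experimental results, the proposed pattern generation method shows higher precision than patterns generated by considering word order alone, confirming that it is suitable for automatic instance generation for Korean.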
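The role of postpositions in identifying the subject and the object can be illustrated with a minimal sketch. This is not the authors' implementation: the particle sets, the `(stem, particle)` token format, and both function names are assumptions made for illustration, and the dependency-relation step described in the abstract is omitted entirely. The sketch assumes a morphological analyzer has already split each word into a stem and its attached particle.

```python
from typing import List, Optional, Tuple

# Assumed particle sets for illustration only; a real system would rely on
# the tag set of a morphological analyzer rather than surface strings.
SUBJECT_PARTICLES = {"이", "가"}
OBJECT_PARTICLES = {"을", "를"}

# Hypothetical token format: (stem, attached particle or None).
Token = Tuple[str, Optional[str]]
Triple = Tuple[str, str, str]  # (subject, predicate, object)


def extract_by_particle(tokens: List[Token]) -> Optional[Triple]:
    """Pick the subject and object by their particles, ignoring position."""
    subject = obj = predicate = None
    for stem, particle in tokens:
        if particle in SUBJECT_PARTICLES and subject is None:
            subject = stem
        elif particle in OBJECT_PARTICLES and obj is None:
            obj = stem
        elif particle is None:
            # The last token without a particle is treated as the predicate.
            predicate = stem
    if subject and obj and predicate:
        return (subject, predicate, obj)
    return None


def extract_by_position(tokens: List[Token]) -> Optional[Triple]:
    """Word-order-only comparison: first token is the subject, second the object."""
    if len(tokens) < 3:
        return None
    return (tokens[0][0], tokens[-1][0], tokens[1][0])


# "세종이 한글을 만들었다" (Sejong created Hangul) in two word orders.
sov = [("세종", "이"), ("한글", "을"), ("만들었다", None)]
osv = [("한글", "을"), ("세종", "이"), ("만들었다", None)]

print(extract_by_particle(sov))
print(extract_by_particle(osv))
print(extract_by_position(osv))
```

Under these assumptions, the particle-based function returns the same triple for both word orders, while the position-based comparison swaps the subject and the object when the object comes first.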
