Competition Relation Extraction based on Combining Machine Learning and Filtering

Korean title (translated): Competition Relation Recognition Combining Machine Learning and Filtering Methods

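  • 이충희 (Knowledge Mining Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • 서영훈 (Department of Computer Engineering, Chungbuk National University) ;
  • 김현기 (Knowledge Mining Research Laboratory, Electronics and Telecommunications Research Institute)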
  • Received : 2014.10.16
  • Accepted : 2015.01.20
  • Published : 2015.03.15

Abstract

This study presents a hybrid algorithm for competition relation extraction. Previous work on relation extraction has relied on various lexical and deep-parsing features and has mostly used machine learning alone. We present a new algorithm that integrates machine learning with several filtering methods. We also introduce some simple but useful features for competition relation extraction and propose an optimal feature set. The goal is to increase the precision of competition relation extraction by combining supervised learning with filtering. The filters classify whether a sentence expresses a competition relation, apply a distance restriction to filter feature pairs, and classify whether a candidate entity pair is spam. For evaluation, we used a test set of 2,565 sentences and compared the proposed method with a rule-based method and a general relation extraction method. The rule-based method achieved a positive precision of 0.812 and an accuracy of 0.568, while the general relation extraction method achieved 0.612 and 0.563, respectively. The proposed system obtained a positive precision of 0.922 and an accuracy of 0.713. These results indicate that the proposed method is effective for competition relation extraction.
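The combination of filtering and supervised classification described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cue-word list, the token window `MAX_FEATURE_DISTANCE`, the order in which the filters run, and every function name are hypothetical, and the two filter classifiers are replaced by trivial rule-based stand-ins. The reading of "distance restriction" as a token window around the entities is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical placeholders; none of these values come from the paper.
CUE_WORDS = {"rival", "compete", "versus", "overtake"}
MAX_FEATURE_DISTANCE = 5  # assumed token window for keeping a feature


@dataclass
class Candidate:
    tokens: List[str]  # tokenized sentence
    e1: int            # token index of the first entity
    e2: int            # token index of the second entity


def sentence_has_competition(c: Candidate) -> bool:
    """Stand-in for the sentence-level competition classifier."""
    return any(t.lower() in CUE_WORDS for t in c.tokens)


def is_spam_pair(c: Candidate) -> bool:
    """Stand-in for the spam classifier: reject two identical mentions."""
    return c.tokens[c.e1].lower() == c.tokens[c.e2].lower()


def extract_features(c: Candidate) -> List[str]:
    """Build distance, position, lexical, and cue-word features, keeping
    only tokens within MAX_FEATURE_DISTANCE of either entity."""
    order = "e1<e2" if c.e1 < c.e2 else "e2<e1"
    feats = [f"dist={abs(c.e1 - c.e2)}", f"order={order}"]
    for i, tok in enumerate(c.tokens):
        if i in (c.e1, c.e2):
            continue  # skip the entity mentions themselves
        if min(abs(i - c.e1), abs(i - c.e2)) > MAX_FEATURE_DISTANCE:
            continue  # distance restriction: drop far-away tokens
        feats.append(f"w={tok.lower()}")
        if tok.lower() in CUE_WORDS:
            feats.append(f"cue={tok.lower()}")
    return feats


def classify(c: Candidate, model: Callable[[List[str]], bool]) -> bool:
    """Apply both filters first, then the supervised model."""
    if not sentence_has_competition(c):
        return False
    if is_spam_pair(c):
        return False
    return model(extract_features(c))


# Toy model standing in for a trained classifier.
toy_model = lambda feats: any(f.startswith("cue=") for f in feats)
candidate = Candidate("Samsung will compete with Apple".split(), 0, 4)
print(classify(candidate, toy_model))
```

Running the filters before the model means a candidate is labeled positive only when every stage agrees, which is one plausible way such a design could trade recall for higher positive precision.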

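This paper studies a method for recognizing competition relations by combining machine learning with filtering. Most previous studies recognize relation types using machine learning alone, with features suited to general relation types, among which syntactic parsing information is especially important. We confirm that simple features alone (lexical items, distance, position, and cue words), without syntactic parsing or other linguistic analysis results, are effective for recognizing competition relations. We also propose a methodology that combines machine learning with three methods for improving the positive precision of competition relation recognition: sentence-level classification of whether a competition relation is present, spam classification, and distance-constraint-based feature filtering. To validate the methodology, we built an evaluation set of 2,565 sentences from the news domain and, for comparison, applied a rule-based competition relation recognizer and a general relation extractor based on the relation extraction methodology of previous work. The rule-based engine achieved a positive precision of 81.2% and an overall accuracy of 56.8%, and the general relation extractor achieved 61.2% and 56.3%. In contrast, the proposed method achieved a positive precision of 92.2% and an overall accuracy of 71.3%, confirming that it is effective for recognizing competition relations.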
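The two reported measures can diverge considerably, as the results above show. The sketch below assumes that positive precision means precision on the positive (competition) class and that accuracy is the share of all decisions that are correct; the abstract does not state the exact definitions. The counts are invented for illustration and are not the paper's data.

```python
def positive_precision(tp: int, fp: int) -> float:
    """Fraction of predicted-positive pairs that are truly positive."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all decisions (positive and negative) that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Invented toy counts: a cautious system makes few false positives
# but misses several true competition pairs.
tp, fp, tn, fn = 9, 1, 5, 5
print(positive_precision(tp, fp))  # 0.9
print(accuracy(tp, tn, fp, fn))    # 0.7
```

In this toy case the missed positive pairs (false negatives) lower accuracy without affecting positive precision, which is how a filter-heavy system can score high on the first measure and noticeably lower on the second.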

Acknowledgement

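Grant : (Sub-project 1) Development of Intelligence-Evolving WiseQA Platform Technology for Human Knowledge Augmentation Services

Supported by : Institute for Information & communications Technology Promotion (IITP)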
