ABox Realization Reasoning in Distributed In-Memory System

분산 메모리 환경에서의 ABox 실체화 추론

  • Received : 2014.12.05
  • Accepted : 2015.05.13
  • Published : 2015.07.15

Abstract

As the amount of knowledge information grows significantly, much progress has been made in studies on how to reason effectively over large-scale ontologies at the level of RDFS or OWL. These reasoning methods are divided into TBox classification and ABox realization. TBox classification mainly deals with integrity and dependencies in the schema, whereas ABox realization handles a variety of issues concerning instances, which makes it very important in practical applications. In this paper, we propose a realization method that analyzes the constraints of each specified class so that the reasoning system automatically infers the classes to which instances belong. Unlike conventional methods, which rely on a distributed file system built on an object-oriented language, we propose a large-scale ontology reasoning method using Spark, a functional programming-based in-memory system. To verify the effectiveness of the proposed method, we used instances created from the W3C Wine ontology (120 to 600 million triples). In our largest experiment, the proposed system processed 600 million triples and generated 951 million triples in 51 minutes (696 K triples/sec).

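Recently, as the amount of knowledge information has grown enormously, research on effectively reasoning over large-scale ontologies has been actively pursued. These reasoning methods are divided into TBox classification and ABox realization. Whereas TBox reasoning mainly deals with the integrity and dependencies of the schema, ABox reasoning deals with a variety of instance-centered problems and is therefore very important in practical applications. Accordingly, this paper proposes a method that analyzes the constraints of classes and thereby infers the classes to which instances belong. Unlike existing methods, which used distributed file systems based on object-oriented languages, we describe a large-scale ontology realization method using Spark, an in-memory system based on functional programming. To verify the efficiency of the proposed technique, we generated instances (120 million to 600 million triples) from the W3C Wine ontology and ran experiments. In the experiment on 600 million triples, the total reasoning time was 51 minutes (696 K triples/sec).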
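The idea of realization described in the abstract, inferring the classes an instance belongs to from class constraints, can be illustrated with a minimal single-machine sketch. This is not the authors' Spark implementation: the class definitions, the `vin:` and `ex:` prefixes, the sample triples, and the `realize` function are all assumptions made for illustration, and only simple "has a given property value" constraints are handled. The grouping step plays the role that a group-by-key operation would play in a distributed setting.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical TBox, loosely modeled on the W3C Wine ontology:
# each defined class maps to (base class, required (property, value) pairs).
CLASS_DEFINITIONS = {
    "vin:RedWine": ("vin:Wine", {("vin:hasColor", "vin:Red")}),
    "vin:WhiteWine": ("vin:Wine", {("vin:hasColor", "vin:White")}),
    "vin:DryWine": ("vin:Wine", {("vin:hasSugar", "vin:Dry")}),
}


def realize(abox):
    """Return the rdf:type triples implied by CLASS_DEFINITIONS for the given ABox."""
    # Group (predicate, object) pairs by subject, similar in spirit to a
    # group-by-key step over a distributed collection of triples.
    facts_by_subject = defaultdict(set)
    for subject, predicate, obj in abox:
        facts_by_subject[subject].add((predicate, obj))

    inferred = set()
    for subject, facts in facts_by_subject.items():
        for cls, (base, required) in CLASS_DEFINITIONS.items():
            # An instance satisfies a definition when it has the base type
            # and every required (property, value) pair.
            satisfied = (RDF_TYPE, base) in facts and required <= facts
            if satisfied and (RDF_TYPE, cls) not in facts:
                inferred.add((subject, RDF_TYPE, cls))
    return inferred


# Illustrative instance data (assumed, not taken from the paper's experiments).
SAMPLE_ABOX = [
    ("ex:wine1", RDF_TYPE, "vin:Wine"),
    ("ex:wine1", "vin:hasColor", "vin:Red"),
    ("ex:wine1", "vin:hasSugar", "vin:Dry"),
    ("ex:wine2", RDF_TYPE, "vin:Wine"),
    ("ex:wine2", "vin:hasColor", "vin:White"),
    ("ex:grape1", "vin:hasColor", "vin:Red"),  # not typed as Wine, so nothing is inferred
]

inferred = realize(SAMPLE_ABOX)
for triple in sorted(inferred):
    print(triple)
```

In this sketch the output triples are new facts derived from the input, which corresponds to the abstract's distinction between the triples processed and the triples generated. A full realization procedure would also need to cover other OWL constraint types and to repeat the inference until no new triples appear.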

Acknowledgement

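Grant : WiseKB: Development of a self-learning knowledge base and reasoning technology based on big data understanding

Supported by : Institute for Information & Communications Technology Promotion (IITP)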

