Distributed Table Join for Scalable RDFS Reasoning on Cloud Computing Environment

  • Received : 2014.04.11
  • Accepted : 2014.07.29
  • Published : 2014.09.15

Abstract

To provide effective services, a knowledge service system must be able to infer new knowledge from explicitly stated knowledge. Most knowledge service systems represent their knowledge as ontologies. As the volume of real-world knowledge keeps growing, techniques for reasoning efficiently over large-scale ontologies are in demand. This paper proposes distributed table join methods for reasoning over large-scale ontologies at the RDFS level in a cloud computing environment, and evaluates their performance. The proposed RDFS reasoning focuses on two approaches in a distributed file system environment: one applies MapReduce on top of RDFS meta tables, and the other uses only the memory of the cloud computers, without MapReduce. For each approach, the paper describes the structure of the reasoning system, the design of the meta tables according to the RDFS inference rules, and the algorithm of the inference strategy. To verify the efficiency of the proposed methods, experiments were run on LUBM1000 through LUBM6000, datasets from LUBM, a standard benchmark for evaluating ontology reasoning and query speed. For the largest dataset, LUBM6000 (about 860 million triples), the meta-table-based RDFS reasoning technique took 13.75 minutes for the full inference (about 1,042 thousand triples per second, given the 860 million input triples), whereas the approach using the memory of the cloud computers took 7.24 minutes (about 1,979 thousand triples per second), roughly twice as fast.
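The idea of expressing RDFS inference as a table join can be sketched on a single machine. The example below is a minimal illustration, not the paper's implementation: it assumes the schema (the role a meta table would play here) fits in memory, covers only two standard RDFS rules (rdfs11, transitivity of rdfs:subClassOf, and rdfs9, type inheritance), and leaves out distribution and MapReduce entirely. The function names and the `ex:` class and instance names are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def subclass_closure(schema_triples):
    """Rule rdfs11: map each class to all of its (transitive) superclasses."""
    direct = defaultdict(set)
    for s, p, o in schema_triples:
        if p == SUBCLASS:
            direct[s].add(o)
    closure = {}
    for cls in list(direct):
        seen, stack = set(), list(direct[cls])
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                # .get() avoids inserting new keys into the defaultdict
                stack.extend(direct.get(parent, ()))
        closure[cls] = seen
    return closure


def infer_types(instance_triples, closure):
    """Rule rdfs9 as a hash join:
    (x rdf:type C) JOIN (C rdfs:subClassOf D) -> (x rdf:type D)."""
    known = set(instance_triples)
    inferred = set()
    for s, p, o in instance_triples:
        if p == RDF_TYPE:
            # Probe the in-memory schema table with the class as join key
            for parent in closure.get(o, ()):
                triple = (s, RDF_TYPE, parent)
                if triple not in known:
                    inferred.add(triple)
    return inferred


# Hypothetical schema and instance data, for illustration only
schema = [
    ("ex:GraduateStudent", SUBCLASS, "ex:Student"),
    ("ex:Student", SUBCLASS, "ex:Person"),
]
instances = [("ex:alice", RDF_TYPE, "ex:GraduateStudent")]

new_triples = infer_types(instances, subclass_closure(schema))
print(sorted(new_triples))
```

Precomputing the schema closure once and probing it per instance triple keeps the join a single pass over the large instance table, which is the usual reason to hold the small schema side in memory.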

Acknowledgement

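Grant : WiseKB: Development of a Self-Learning Knowledge Base and Reasoning Technology Based on Big Data Understanding

Supported by : Korea Evaluation Institute of Industrial Technology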
