Bitmap Indexes and Query Processing Strategies for Relational XML Twig Queries

Bitmap Indexes and Query Processing Techniques for Relational XML Twig Pattern Queries (translated from the Korean title)

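  • Kyong-Ha Lee (Department of Computer Science, University of Arizona);
  • Bongki Moon (Department of Computer Science, University of Arizona);
  • Kyu-Chul Lee (Department of Computer Engineering, Chungnam National University)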
  • Received : 2009.07.06
  • Accepted : 2010.04.08
  • Published : 2010.06.15

Abstract

As the volume of XML data grows, it is considered prudent to store XML data in an industrial-strength database system rather than rely on a domain-specific application or a file system. For shredded XML data stored in relational tables, however, existing twig query processing algorithms may not be straightforward to apply, because most of them require XML data to be accessed as streams of elements grouped by tag and sorted in a particular order. To support XML query processing within the common framework of relational database systems, we first propose several bitmap indexes and accompanying strategies for holistic twig joins on XML data stored in relational tables. Because bitmap indexes are well supported in most commercial and open-source database systems, the proposed bitmap indexes and twig query processing strategies can be incorporated into a relational query processing framework more easily. The proposed query processing strategies are efficient in both time and space, because the compressed bitmap indexes stay compressed during data access. In addition, we propose a hybrid index that computes twig query solutions using only bit-vectors, without accessing the labeled XML elements stored in the relational tables.
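To make the idea of a tag-based bitmap index over shredded XML concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical node table with one row per element and a region label `(start, end, level)`; the abstract does not say which labeling scheme the paper uses. The names `ROWS`, `build_tag_bitmaps`, `stream`, and `is_ancestor` are illustrative only, and the nested-loop pairing at the end merely stands in for a real holistic twig join.

```python
from collections import defaultdict

# Hypothetical shredded XML: one row per element, stored in document order.
# Layout (tag, start, end, level) is an assumed region encoding.
ROWS = [
    ("book",   1, 12, 1),
    ("title",  2,  3, 2),
    ("author", 4,  7, 2),
    ("name",   5,  6, 3),
    ("author", 8, 11, 2),
    ("name",   9, 10, 3),
]

def build_tag_bitmaps(rows):
    """Return {tag: int}, where bit i is set if row i carries that tag."""
    bitmaps = defaultdict(int)
    for i, (tag, _start, _end, _level) in enumerate(rows):
        bitmaps[tag] |= 1 << i
    return dict(bitmaps)

def stream(rows, bitmap):
    """Yield the rows whose bit is set, in row (document) order."""
    i = 0
    while bitmap:
        if bitmap & 1:
            yield rows[i]
        bitmap >>= 1
        i += 1

def is_ancestor(a, d):
    """Region containment test: element a strictly encloses element d."""
    return a[1] < d[1] and d[2] < a[2]

BITMAPS = build_tag_bitmaps(ROWS)

# The bitmaps turn the table into per-tag element streams.
authors = list(stream(ROWS, BITMAPS["author"]))
names = list(stream(ROWS, BITMAPS["name"]))

# Naive stand-in for a twig join on the pattern author//name.
pairs = [(a[1], n[1]) for a in authors for n in names if is_ancestor(a, n)]
print(pairs)
```

The point of the sketch is only that a bitmap per tag recovers the "stream of elements grouped by tag and sorted in order" that twig join algorithms expect, even though the rows live in a single relational table.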

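With the growth in the volume of XML data, techniques for storing and managing XML data with a DBMS have been devised. However, current twig pattern query processing algorithms take as input inverted lists in which the XML data is partitioned by tag or by some arbitrary unit and the entries of each list are sorted in a particular order. This mismatch in storage schemes makes it difficult to apply these algorithms to queries over XML data that is shredded into relational tables. In this paper, we propose bitmap indexes, and query processing techniques that use them, to support holistic twig joins on XML data stored in relational tables. Because bitmap indexes are supported by many database systems, the proposed indexes and twig query processing techniques are easier to port to a relational query processing framework. The proposed indexing technique reduces index size through compression yet requires no decompression during query processing, which makes it efficient in both time and space. In addition, this paper presents a hybrid index that identifies the relationships among XML nodes using bitmap indexes alone, so that twig pattern queries can be processed without accessing the XML data stored in the records.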
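The claim that compressed bitmaps need no decompression during query processing can be illustrated with a small sketch. It uses plain run-length encoding, chosen here only for simplicity; the abstract does not name the compression scheme the paper uses, and the functions `rle_encode` and `rle_and` are hypothetical helpers, not part of any library.

```python
def rle_encode(bits):
    """Encode a list of 0/1 values as [(bit, run_length), ...]."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs

def rle_and(x, y):
    """AND two run-length-encoded bitmaps of equal length, run by run.

    The inputs are never expanded to individual bits: each step consumes
    the overlap of the current run of x and the current run of y.
    """
    out = []
    i = j = 0
    rem_x = x[0][1] if x else 0
    rem_y = y[0][1] if y else 0
    while i < len(x) and j < len(y):
        step = min(rem_x, rem_y)
        bit = x[i][0] & y[j][0]
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + step)  # extend the previous run
        else:
            out.append((bit, step))
        rem_x -= step
        rem_y -= step
        if rem_x == 0:
            i += 1
            rem_x = x[i][1] if i < len(x) else 0
        if rem_y == 0:
            j += 1
            rem_y = y[j][1] if j < len(y) else 0
    return out

# Example: rows matching one predicate AND rows matching another.
a = [1, 1, 0, 0, 0, 1, 1, 1]
b = [0, 1, 1, 0, 0, 0, 1, 1]
result = rle_and(rle_encode(a), rle_encode(b))
print(result)
```

The work done by `rle_and` is proportional to the number of runs rather than the number of bits, which is the intuition behind operating directly on compressed bitmaps.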

Acknowledgement

Supported by: Institute for Information Technology Advancement (IITA)
