Genealogy-based Indexing Technique for XML Documents

  • 이월영 (Department of Computer Science, Ewha Womans University)
  • 용환승 (Department of Computer Science, Ewha Womans University)
  • Published : 2004.02.01

Abstract

These days, much of the data on the Internet is represented in XML because of its many advantages. As the volume of XML data grows, query processing techniques are needed that can quickly and efficiently support diverse queries for retrieving useful information from XML documents. So far, however, research on query processing for XML data has been limited to methods for handling regular path expressions. We have therefore developed a new genealogy-based indexing technique that handles not only regular path expressions but also other query types, such as simple path expressions and path expressions that reference other elements. We also applied the technique to an object-relational model and evaluated its performance on many documents and various query types. The results show improved performance compared with other storage techniques.
