An Efficient Inverted Index Technique based on RDBMS for XML Documents

XML 문서에 대한 RDBMS에 기반을 둔 효율적인 역색인 기법

  • 서치영 (ITCamp㈜ 연구소) ;
  • 이상원 (성균관대학교 정보통신공학부) ;
  • 김형주 (서울대학교 컴퓨터공학부)
  • Published : 2003.02.01

Abstract

The inverted index widely used in the existing information retrieval field should be extended for XML documents to support containment queries by XML information retrieval systems. In this paper, we consider that there are two methods in storing the inverted index and processing containment queries for XML documents as the previous work suggested: using a RDBMS or using an inverted lift engine. It has two drawbacks to extend the inverted index in the previous work. One is that using a RDBMS is moth worse in the performance than using an inverted list engine. The other is that when containment queries are processed in a RDBMS, there is an increase in the number of a join operation as the path length of a query increases and a join operation always happens between large fables. In this paper. we extend the inverted index in a different way to solve these problems and show the effectiveness of using a RDBMS.

XML 정보검색 시스템이 XML 문서에 대한 포함질의를 지원하기 위해서는 기존의 정의검색 분야에서 널리 쓰이는 역색인 기법을 XML 문서에 대해서도 적용이 가능하도록 확장해야 한다. 본 논문에서는 확장된 역색인 정보를 저장하고 XML 문서에 대한 포함질의를 처리하는 방법을 이전 연구에서와 같이 두 가지 관점에서 제시한다. 하나는 관계형 데이타베이스 관리 시스템(RDBMS)을 이용해서 역색인 정보를 저장하고 질의를 처리하는 방법이고 다른 하나는 RDBMS 대신 역 리스트 엔진(Inverted List Engine)을 이용하는 방법이다. 이전 연구에서 역색인을 확장한 방식은 두 가지 문제점이 존재한다. 하나는 RDBMS를 이용하는 방법이 역 리스트 엔진을 이용하는 방법에 비해 성능 상으로 많이 안 좋다는 점이고, 다른 하나는 RDBMS 상에서 포함질의를 처리 시, 질의의 경로길이에 비례해서 조인연산이 증가하고 조인연산도 크기가 큰 테이블간의 조인이 된다는 점이다. 본 논문에서는 이러한 문제점들을 해결하고자 이전연구와는 다르게 역색인을 확장하여 RDBMS를 이용하는 방법의 효율성을 밝힌다.

Keywords

References

  1. T. Bray, J. Paoli, and C. Sperberg-McQueen, 'Extensible markup language(XML) 1.0', Technical report, W3C Recommendation, 1998
  2. C. Zhang, J. Naughton, D. DeWitt, Q. Luo, and G. Lohman, 'On supporting containment queries in relational database management systems', ACM SIGMOD, 2001 https://doi.org/10.1145/376284.375722
  3. Daniela Florescu, Donald Kossman, and Ioana Manolescu, 'Integrating keyword search into XML query processing', WWW9/Computer Networks, 33(1-6):119-135, 2000 https://doi.org/10.1016/S1389-1286(00)00069-4
  4. Takeyuki Shimura, Masatishi Yoshikawa, and Shunsuke Uemura, 'Storage and retrieval of XML documents using object-relational database', Dexa, pp. 206-217
  5. Daniela Florescu and Donald Kossman, 'Storing and querying XML data using anrdbms', IEEE Data Engineering Bulletin, 22(3):27-34, 1999
  6. Jayavel Schanmugasundaram, He Gang, Kristin Tuffe, Chun Zhang, David DeWitt, and Jeffrey Naughto, 'Relational database for quering XML documents: limitation and opportunities', VLDB, 1999
  7. Jayavel Schanmugasundaram, E. Schekita, R.Barr, Michael J. Carey, Bruce G.Lindsay, Hamid Pirahesh, and Berthold Reinwald, 'Efficiently publishing relational data as XML documents', VLDB, 2000
  8. Alin Deutsch, Mary F. Fernandez, and Dan Suciu, 'Storing semistructured data with STORED', ACM SIGMOD, 1999 https://doi.org/10.1145/304182.304220
  9. International Organization for Standardization, 'Information processing-text and office systems-standard generalized markup language(sgml)', iso/iec 8879, 1986
  10. Ricardo Baeza-Yates and Gonzalo Navarro, 'Integrating contents and structure in text retrieval', SIGMOD Record, 25(1):67-69, March 1996 https://doi.org/10.1145/381854.381890
  11. G. E. Blake, M. P. Consens, P. Kilpelainen, P. A. Larson, T. Snider, and F. W. Tompa, 'Text/relational database management system: Harmonizing sql and sgml', In Proceedings of the International Conference on Applications of Database, pages 267-280, June 1994
  12. Ron Sacks-Davis, Timothy Arnold-Moore, and Justin Zobel, 'Database systems for structured documents', In Proceedings of the International Symposium on Advanced Database Technologies and Their Integration, pages 272-283, October 1994
  13. T. Arnold-Moore, M. Fuller, B. Lowe, J. Thom, and R. Wilkinson, 'The elf data model and sgql query language for structured document database', In Proceedings of the Australasian Database Conference, Adelaide, Australia, pages 17-26, 1995
  14. C. L. Clarke, G. V. Cormack, and F. J. Burkowski, 'An algebra structured text search and a framework for its implementation', The Computer Journal, 38(1):43-56, 1995 https://doi.org/10.1093/comjnl/38.1.43
  15. C. L. Clarke, G. V. Cormack, and F. J. Burkowski, 'Schema-independent retrieval from heterogeneous structured text', In Fourth Annual Symposium on Document Analysis and Information Retrieval, pages 279-289, 1995
  16. Tuong Dao, Ron Sacks-Davis, and James A. Thom, 'Indexing structured text for queries on containment relationships', In Proceedings of the 7th Australasian Database Conference, 1996
  17. D. Chamberlin, D. Florescu, J. Robie, T. Simeon, and M. Stefanescu, 'XQuery 1.0: An XML Query Language', W3C Working Draft, June 2001
  18. J. Clark and S. DeRose, 'XML Path Language (XPath) Version 1.0', W3C Recommendation, November 1999
  19. J. Robie, J. Lapp, and D. Schach, 'XML Query Language (XQL)', http://www.w3.org/TandS/QL/QL98/pp/xql.html
  20. Alin Deutsch, Mary F. Fernandez, Daniela Florescu, Alon Y. Levy, and Dan Suciu, 'XML-QL: A Query Language for XML', World Wide Web Consortium, August 1998
  21. Ricardo Baeza-Yates and Berthier Ribeiro-Neto, 'Moderm Information Retrieval', Addison- Wesley Longman Inc, 1999
  22. W. B. Frakes and R. Baeza-Yates, 'Information Retrieval: Data Structures & Algorithms', Prentice Hall, Englewood Cliffs, NJ, USA, 1992
  23. Oracle, Oracle Documentation Library, Release 8.1.7, http://otn.oracle.co.kr/docs/Oracle817/index.htm
  24. Sleepycat Software, The berkeley database, http://www.sleepycat.com