xPlaneb: 3-Dimensional Bitmap Index for XML Document Retrieval

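  • Jae-Min Lee (Department of Computer Engineering, The Catholic University of Korea)
  • Byung-Yeon Hwang (Department of Computer Engineering, The Catholic University of Korea)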
  • Published : 2004.06.01

Abstract

XML has become a new standard for data representation and exchange thanks to its many advantages, and it is a core element of much recent research and many emerging technologies. However, its self-describing nature, one of XML's advantages, has led to the spread of XML documents with differing structures, which has created a need for research on effective XML document retrieval. This paper analyzes the problems of BitCube, a bitmap indexing technique that has demonstrated high performance through fast retrieval. To resolve these problems, we design and implement xPlaneb (XML Plane Web), a new 3-dimensional bitmap index built from linked lists. The proposed technique retrieves information effectively by reconstructing BitCube's 3-dimensional array index from efficient nodes and by replacing BitCube's operations with new ones. Performance evaluation shows that, as the number of documents in a cluster increases, the proposed technique outperforms BitCube in both memory consumption and operation speed.
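The abstract does not describe xPlaneb's node layout or its operations, so the following is only an illustrative sketch of the general idea of replacing a dense 3-dimensional bit array with linked nodes. It assumes the three axes are document, element path, and word, as in the BitCube work; the names DenseCube, LinkedIndex, Node, docs_with, and cells are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One set bit of a document: a (path, word) pair plus a link to the next node."""
    path: int
    word: int
    next: Optional["Node"] = None


class DenseCube:
    """Dense 3-D array: one cell is allocated for every (doc, path, word) triple."""

    def __init__(self, n_docs, n_paths, n_words):
        self.bits = [[[0] * n_words for _ in range(n_paths)] for _ in range(n_docs)]

    def set(self, d, p, w):
        self.bits[d][p][w] = 1

    def docs_with(self, p, w):
        # Scan the (p, w) cell of every document plane.
        return [d for d, plane in enumerate(self.bits) if plane[p][w]]

    def cells(self):
        # Number of allocated cells, whether set or not.
        return sum(len(row) for plane in self.bits for row in plane)


class LinkedIndex:
    """Sparse alternative (assumed layout): each document keeps a linked list of its set bits."""

    def __init__(self, n_docs):
        self.heads = [None] * n_docs

    def set(self, d, p, w):
        node = self.heads[d]
        while node is not None:
            if node.path == p and node.word == w:
                return  # already present, avoid duplicate nodes
            node = node.next
        # Prepend a new node to this document's list.
        self.heads[d] = Node(p, w, self.heads[d])

    def docs_with(self, p, w):
        result = []
        for d, node in enumerate(self.heads):
            while node is not None:
                if node.path == p and node.word == w:
                    result.append(d)
                    break
                node = node.next
        return result

    def cells(self):
        # Number of allocated nodes, which equals the number of set bits.
        count = 0
        for node in self.heads:
            while node is not None:
                count += 1
                node = node.next
        return count


# Three documents, four paths, five words, and only three set bits.
triples = [(0, 1, 2), (1, 1, 2), (2, 0, 3)]
dense = DenseCube(3, 4, 5)
linked = LinkedIndex(3)
for d, p, w in triples:
    dense.set(d, p, w)
    linked.set(d, p, w)

print(dense.docs_with(1, 2), dense.cells())
print(linked.docs_with(1, 2), linked.cells())
```

Both structures answer the same lookup, but the dense array allocates a cell for every possible triple while the linked version allocates one node per set bit. This toy comparison only illustrates why a sparse layout can use less memory when most bits are zero; it does not reproduce the paper's measurements or its actual operations.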
