FiST: XML Document Filtering by Sequencing Twig Patterns

가지형 패턴의 시퀀스화를 이용한 XML 문서 필터링

  • Published : 2006.08.01

Abstract

In recent years, publish-subscribe (pub-sub) systems based on XML document filtering have received much attention. In a typical pub-sub system, subscribing users specify their interest in profiles expressed in the XPath language, and each new content is matched against the user profiles so that the content is delivered only to the interested subscribers. As the number of subscribed users and their profiles can grow very large, the scalability of the system is critical to the success of pub-sub services. In this paper, we propose a novel scalable filtering system called FiST(Filtering by Sequencing Twigs) that transforms twig patterns expressed in XPath and XML documents into sequences using Prufer's method. As a consequence, instead of matching linear paths of twig patterns individually and merging the matches during post-processing, FiST performs holistic matching of twig patterns with incoming documents. FiST organizes the sequences into a dynamic hash based index for efficient filtering. We demonstrate that our holistic matching approach yields lower filtering cost and good scalability under various situations.

최근 XML 문서 필터링에 기반한 출판 -구독 (publish-subscribe) 시스템이 많은 관심을 받고 있다. 전형적인 출판 구독 시스템에서, 구독자들은 XPath 언어로 명세된 프로파일로 자신들의 관심을 표현하고, 새로운 내용들은 사용자 프로파일에 대하여 매칭 여부를 판단하여 관심을 가지고 있는 사용자들에게만 배달된다. 구독자의 수와 그들의 프로파일이 증가할수록, 시스템의 확장성이 출판 구독 시스템의 중요한 성공 요소가 된다. 이 논문에서는 XPath 로 명세된 가지형 패턴과 입력 XML 문서들을 Prufer의 방법을 사용하여 시퀀스로 변환하는 FiST라 불라는 새로운 필터링 시스템을 제안한다. FiST 시스템은 가지형 패턴을 구성하는 선형 경로들에 대하여 각각 매칭을 수행하고 후처리 과정에서 그 결과들을 병합하는 방법을 이용하는 대신에 가지형 패턴 전체를 사용하여 입력 문서에 대하여 매칭을 수행한다. 또한 효율적인 필터링을 위하여 시퀀스들을 해시 기반의 동적 인덱스로 구성한다. 실험 결과를 통해 전체 매칭 접근 방법이 다양한 환경에서 낮은 필터링 비용과 좋은 확장성을 가짐을 알 수 있다.

Keywords

References

  1. Anders Berglund, Scott Boag, Don Chamberlin, Mary F. Fernandez, Michael Kay, Jonathan Robie and Jrme Simon, XML Path Language (XPath) 2.0 W3C Working Draft 16. Technical Report WD-xpath-20-20020816, World Wide Web Consortium, August 2002
  2. Karim Muller, 'Semi-Automatic Construction of a Question Treebank,' In Proceedings of the 4th International Conference on Language Resources and Evaluation, Lisbon, Portugal, 2004
  3. H. Prufer, 'Neuer Beweis eines Satzes tiber Permutationen,' Archiv fur Mathematik und Physik, 27: 142-144, 1998
  4. Praveen R. Rao and Bongki Moon, 'PRIX: Indexing and Querying XML Using Prufer Sequences,' In Proceedings of the 20th IEEE International Conference on Data Engineering, pp. 288-299, Boston, MA, March 2004 https://doi.org/10.1109/ICDE.2004.1320005
  5. Mehmet Altinel and Michael J. Franklin, 'Efficient Filtering of XML Documents for Selective Dissemination of Information,' In Proceeding of the 26th VLDB Conference, pp. 53-64, Cairo, Egypt, September 2000
  6. Yanlei Diao, Mehmet Altinel, Michael J. Franklin, Hao Zhang and Peter Fischer, 'Path sharing and predicate evaluation for high-performance XML filtering,' ACM Trans. Database Systems, Vol. 28, No.4, pp. 467-516, 2003 https://doi.org/10.1145/958942.958947
  7. Chee Yong Chan, Pascal Felber, Minos N. Garofalakis and Raieev Rastogi, 'Efficient Filtering of XML Documents with XPath Expressions,' In Proceedings of the 18th IEEE International Conference on Data Engineering, pp. 235-244, San Jose, CA, February 2002
  8. Ashish Kumar Gupta and Dan Suciu, 'Stream processing of XPath queries with predicates,' In Proceeding of the 2003 ACM-SIGMOD conference, pp. 419-430, San Diego, CA, June 2003 https://doi.org/10.1145/872757.872809
  9. T. Green, A. Gupta, G. Miklau, M. Onizuka, and D. Suciu, 'Processing XML Streams with Deterministic Automata and Stream Indexes,' ACM Trans. on Database Systems, Vol. 29 No.4, pp. 752-788, December 2004 https://doi.org/10.1145/1042046.1042051
  10. Nicolas Bruno, Luis Gravano, Nick Koudas and Divesh Srivastava, 'Navigation- vs. Index-Based XML Multi-Query Processing,' In Proceedings of the 19th IEEE International Conference on Data Engineering, pp. 139-150, Bangalore, India, March 2003
  11. Feng Tian, Berthold Reinwald, Hamid Pirahesh, Tobias Mayr and jussi Myllymaki, 'Implementing a Scalable XML Publish/Subscribe System Using a Relational Database System,' In Proceeding of the 2004 ACM-SIGMOD Conference, pp. 479-490, Paris, France, June 2004 https://doi.org/10.1145/1007568.1007623
  12. Quanzhong Li and Bongki Moon, 'Indexing and Querying XML Data for Regular Path Expressions,' In Proceeding of the 27th VLDB Conference, pp. 361-370, Rome, Italy, September 2001
  13. N. Bruno, N. Koudas, D. Srivastava, 'Holistic Twig Joins: Optimal XML Pattern Matching,' In Proceeding of the 2002 ACM-SIGMOD conference, pp. 310-321, Madison, WI, June 2002 https://doi.org/10.1145/564691.564727
  14. David Megginson, Simple API for XML, http://sax.sourceforge.net/
  15. Apache Xerces C++ Parser. http://xml.apache.org/xerces-c/
  16. Angel Luis Diaz and Douglas Lovell, XML Generator. http://www.alphaworks.ibm.com/tech/xml-generator