Dynamic Data Cubes Over Data Streams

  • 서대홍 (Framework Solution Team, Network Business Division, Telcoware)
  • 양우석 (Department of Computer Science, Yonsei University)
  • 이원석 (Department of Computer Science, Yonsei University)
  • Published: 2008.08.15

Abstract

The data cube, a multi-dimensional data model, has been successfully applied to many multi-dimensional data analysis tasks, and research on applying it to data stream analysis is ongoing. A data stream is generated in real time and is continuous, massive, and volatile. Because of these properties the distribution of the data changes rapidly, so the basic rule for handling a data stream is to examine each item once and then discard it. For the same reason, users are more interested in attribute values with high support than in all of the attribute values observed over the stream. This paper proposes a dynamic data cube for applying the data cube to the data stream environment. A dynamic data cube specifies the user's area of interest by the support ratio of attribute values and manages attribute values dynamically by grouping them. This reduces memory usage and processing time. It also presents the user's area of interest effectively by increasing the granularity of attributes that have higher support. Experiments verify that the dynamic data cube works efficiently with limited memory.

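The data cube, the multi-dimensional data model of OLAP, has been successfully applied to many multi-dimensional data analyses, and much research is under way to apply it to data stream analysis as well. A data stream is generated continuously and in massive volume in real time, its distribution changes rapidly, and, because memory and processing capacity are limited, each item is in principle examined only once. Storing the entire data stream in memory is therefore impossible. In addition, users are more interested in attribute values whose support exceeds a certain level than in every attribute value. This paper proposes a dynamic data cube for applying the data cube effectively in such a data stream environment. The dynamic data cube designates the user's area of interest according to the support of attribute values and manages attribute values by grouping them dynamically, which saves memory and processing time. Because it dynamically raises the level of analysis detail for attributes with high support, it presents the user's area of interest effectively. Finally, experiments verify that the dynamic data cube operates efficiently with limited memory.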
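
The idea of grouping attribute values and refining only the groups with high support can be illustrated with a small sketch. The abstract does not give the paper's actual algorithm, so everything below is an assumption made for illustration: the class name `DynamicDimension`, the parameters `split_support` and `min_total`, the restriction to a single integer-valued dimension, and the rule of halving a range and dividing its count evenly. Merging ranges whose support later drops is omitted.

```python
from bisect import bisect_right


class DynamicDimension:
    """One dimension whose integer domain [lo, hi) is kept as a few ranges.

    Illustrative only: ranges with high support are split so that
    frequently seen values get finer granularity, while rarely seen
    values stay grouped in wide ranges and cost little memory.
    """

    def __init__(self, lo, hi, split_support=0.2, min_total=10):
        self.starts = [lo]          # sorted start of each range
        self.counts = [0.0]         # approximate count per range
        self.hi = hi                # exclusive end of the domain
        self.total = 0              # number of items seen so far
        self.split_support = split_support
        self.min_total = min_total  # do not split on too little evidence

    def _end(self, i):
        # A range ends where the next one starts, or at the domain end.
        return self.starts[i + 1] if i + 1 < len(self.starts) else self.hi

    def add(self, value):
        """Process one stream item; each item is looked at exactly once."""
        if not (self.starts[0] <= value < self.hi):
            raise ValueError("value outside the dimension's domain")
        i = bisect_right(self.starts, value) - 1
        self.counts[i] += 1
        self.total += 1
        self._maybe_split(i)

    def _maybe_split(self, i):
        start, end = self.starts[i], self._end(i)
        if end - start < 2 or self.total < self.min_total:
            return
        if self.counts[i] / self.total >= self.split_support:
            mid = (start + end) // 2
            # Assumption: the count is spread evenly over the range.
            half = self.counts[i] / 2
            self.counts[i] = half
            self.starts.insert(i + 1, mid)
            self.counts.insert(i + 1, half)

    def ranges(self):
        """Return (start, end, approximate_count) for every range."""
        return [(s, self._end(i), self.counts[i])
                for i, s in enumerate(self.starts)]


if __name__ == "__main__":
    dim = DynamicDimension(0, 16)
    for v in [3] * 50 + list(range(16)):  # value 3 dominates the stream
        dim.add(v)
    for start, end, count in dim.ranges():
        print(f"[{start}, {end}): {count}")
```

With a stream dominated by one value, the range around that value is narrowed down to a single value while the sparsely populated part of the domain stays in wide ranges. A full cube would keep one such structure per dimension and aggregate measures per combination of ranges.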
