Dynamic Data Cubes Over Data Streams

Seo, Dae-Hong; Yang, Woo-Sock; Lee, Won-Suk

Journal of KIISE:Databases (한국정보과학회논문지:데이타베이스)

Volume 35, Issue 4, pp. 319-332, 2008 (pISSN 1229-7739)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

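Seo, Dae-Hong (Framework Solution Team, Network Business Division, Telcoware)
Yang, Woo-Sock (Department of Computer Science, Yonsei University)
Lee, Won-Suk (Department of Computer Science, Yonsei University)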

Published: 2008.08.15

Abstract

The data cube, the multidimensional data model of OLAP, has been successfully applied to many kinds of multidimensional data analysis, and much research is under way to apply it to data stream analysis as well. Data streams are generated continuously, in real time, and in immense volume, and the distribution of their data changes rapidly. Because memory and processing capacity are limited, the basic rule for handling a data stream is to examine each element only once and then discard it; storing an entire data stream in memory is impossible. Moreover, users tend to be more interested in attribute values whose support exceeds a given threshold than in all attribute values. This paper proposes the dynamic data cube, which applies the data cube effectively to the data stream environment. The dynamic data cube specifies the user's area of interest by the support ratio of attribute values and manages attribute values by grouping them dynamically, which reduces memory usage and processing time. Because it dynamically increases the analysis granularity for attribute values with high support, it also presents the user's area of interest effectively. Finally, experiments verify that the dynamic data cube operates efficiently with limited memory.
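The abstract's core idea, giving high-support attribute values their own cube cells while folding low-support values into a shared group, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the class `DynamicCubeSketch`, the single catch-all group label `"*"`, the fixed `min_support` threshold, and the exact per-value counters are hypothetical simplifications chosen only for illustration. A genuinely memory-bounded version would also have to limit the per-value counters, which this sketch does not attempt.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable, Sequence

OTHER = "*"  # hypothetical label for the shared group of low-support values


class DynamicCubeSketch:
    """Illustrative count cube over a stream (not the paper's algorithm).

    A value whose running support in its dimension meets `min_support`
    gets its own cube coordinate; any other value is folded into OTHER.
    """

    def __init__(self, num_dims: int, min_support: float) -> None:
        self.num_dims = num_dims
        self.min_support = min_support  # hypothetical fixed threshold
        self.n = 0  # number of tuples seen so far
        # Exact per-value counters: a simplification; a memory-bounded
        # version would have to prune or approximate these as well.
        self.value_counts = [Counter() for _ in range(num_dims)]
        self.cells = defaultdict(int)  # grouped coordinates -> tuple count

    def _coord(self, dim: int, value: Hashable) -> Hashable:
        # Running support of `value` in dimension `dim` (n >= 1 here).
        support = self.value_counts[dim][value] / self.n
        return value if support >= self.min_support else OTHER

    def insert(self, tup: Sequence[Hashable]) -> None:
        # Each tuple is examined once and then discarded; only counts remain.
        if len(tup) != self.num_dims:
            raise ValueError("tuple arity does not match num_dims")
        self.n += 1
        for d, v in enumerate(tup):
            self.value_counts[d][v] += 1
        key = tuple(self._coord(d, v) for d, v in enumerate(tup))
        self.cells[key] += 1

    def rollup(self, dim: int) -> dict:
        # Aggregate all cells down to a single dimension.
        out = defaultdict(int)
        for key, count in self.cells.items():
            out[key[dim]] += count
        return dict(out)


if __name__ == "__main__":
    cube = DynamicCubeSketch(num_dims=2, min_support=0.5)
    for t in [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]:
        cube.insert(t)
    print(dict(cube.cells))  # {('a', 'x'): 2, ('a', 'y'): 1, ('*', 'x'): 1}
    print(cube.rollup(0))    # {'a': 3, '*': 1}
```

In this toy run, `"b"` never reaches the threshold and is folded into `"*"`, while `"a"`, `"x"` and `"y"` keep individual coordinates. Because raw tuples are discarded after insertion, a value that later crosses the threshold only receives its own cell from that point on; tuples already counted under `"*"` stay there.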
