Efficient Processing of an Aggregate Query Stream in MapReduce

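  • Hyunjean Choi (Division of Computer Science, Sookmyung Women's University)
  • Ki Yong Lee (Division of Computer Science, Sookmyung Women's University)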
  • Received : 2013.12.10
  • Accepted : 2014.01.14
  • Published : 2014.02.28

Abstract

MapReduce is a widely used programming model for analyzing and processing big data, and aggregate queries are among the most common query types in such analysis. In this paper, we propose an efficient method for processing an aggregate query stream, in which many concurrent users continuously issue different aggregate queries on the same data. Instead of processing each aggregate query separately, the proposed method groups multiple aggregate queries into a batch and processes the batch with a single, optimized MapReduce job. As a result, the number of queries processed per unit time increases significantly. Through various experiments, we show that the proposed method processes queries significantly faster than a naive method that handles each query separately.
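The batching idea in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation: the abstract does not describe how the single job is optimized, so the code below only shows the general principle of answering several aggregate queries with one scan of shared data. The query format, the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_batch`), and the sample records are all hypothetical, and plain Python stands in for a real MapReduce cluster.

```python
from collections import defaultdict

# Hypothetical query format: (query_id, group_by_column, aggregate_name, value_column).
# Supported aggregates are an assumption made for this sketch.
AGGREGATES = {"sum": sum, "count": len, "min": min, "max": max}


def map_phase(records, queries):
    """Scan each record once and emit one key-value pair per query in the batch."""
    for record in records:
        for query_id, group_col, _agg, value_col in queries:
            # Tagging the key with the query id keeps the queries separate
            # even though they share the same scan.
            yield (query_id, record[group_col]), record[value_col]


def shuffle(pairs):
    """Group emitted values by key, as the MapReduce shuffle step would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, queries):
    """Apply each query's aggregate function to the values of its groups."""
    agg_of = {query_id: agg for query_id, _group, agg, _value in queries}
    results = defaultdict(dict)
    for (query_id, group_key), values in grouped.items():
        results[query_id][group_key] = AGGREGATES[agg_of[query_id]](values)
    return dict(results)


def run_batch(records, queries):
    """Answer every query in the batch with a single pass over the records."""
    return reduce_phase(shuffle(map_phase(records, queries)), queries)


# Illustrative data and queries (made up for this example).
records = [
    {"region": "east", "product": "a", "amount": 10},
    {"region": "east", "product": "b", "amount": 5},
    {"region": "west", "product": "a", "amount": 7},
]
queries = [
    ("q1", "region", "sum", "amount"),
    ("q2", "product", "max", "amount"),
    ("q3", "region", "count", "amount"),
]
batch_results = run_batch(records, queries)
```

A naive approach would call a separate job per query and read `records` three times here; the batched version reads them once, which is the source of the throughput gain the abstract refers to. A real job would also have to weigh the larger intermediate output that a shared scan produces.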
