Performance evaluation of approximate frequent pattern mining based on probabilistic technique

  • Pyun, Gwangbum (Dept. of Computer Science and Research Institute for Computer and Information Communication, Chungbuk National University)
  • Yun, Unil (Dept. of Computer Science and Research Institute for Computer and Information Communication, Chungbuk National University)
  • Received : 2012.10.31
  • Accepted : 2013.01.14
  • Published : 2013.02.28

Abstract

Approximate frequent pattern mining finds approximate patterns, rather than exact frequent patterns, within tolerable variations in order to improve efficiency. As databases grow, much faster mining techniques are needed to handle them. Moreover, inherent noise and data diversity make it harder to obtain exact mining results. In such cases, mining approximate frequent patterns can be more efficient in terms of runtime, memory usage, and scalability. In this paper, we study the characteristics of an approximate mining algorithm based on a probabilistic technique and evaluate the performance of this efficient approximate frequent pattern mining algorithm. Finally, we analyze the test results to identify possible improvements.
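
To make the trade-off between exactness and efficiency concrete, the following is a minimal sketch of one generic way to approximate itemset supports: estimate them from a uniform random sample of transactions instead of scanning the whole database. It is not the algorithm evaluated in this paper, whose details the abstract does not give. The function name `approximate_frequent_itemsets` and the parameters `sample_size`, `max_len`, and `seed` are assumptions introduced only for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def approximate_frequent_itemsets(transactions, min_support, sample_size,
                                  max_len=2, seed=0):
    """Estimate frequent itemsets from a uniform random sample.

    Illustrative sketch only (hypothetical helper, not the paper's method).
    Returns a dict mapping each itemset (a sorted tuple of items) to its
    estimated relative support in the sample.
    """
    rng = random.Random(seed)
    # Sample without replacement; fall back to the full database if it is small.
    sample = rng.sample(transactions, min(sample_size, len(transactions)))

    counts = Counter()
    for transaction in sample:
        items = sorted(set(transaction))
        # Count every itemset of length 1..max_len contained in the transaction.
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))

    n = len(sample)
    # Keep itemsets whose estimated support reaches the threshold.
    return {itemset: c / n for itemset, c in counts.items()
            if c / n >= min_support}


if __name__ == "__main__":
    db = [["a", "b"], ["a", "b"], ["a", "c"], ["b"]]
    # With sample_size >= len(db) the estimate equals the exact support.
    print(approximate_frequent_itemsets(db, min_support=0.5, sample_size=4))
```

A smaller `sample_size` reduces runtime and memory at the cost of estimation error, so itemsets whose true support lies close to `min_support` may be wrongly included or missed. That is the kind of tolerable variation the abstract refers to.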

Cited by

  1. Analysis and Performance Evaluation of Pattern Condensing Techniques used in Representative Pattern Mining, vol.16, no.2, 2015, https://doi.org/10.7472/jksii.2015.16.2.77
  2. Performance Analysis of Frequent Pattern Mining with Multiple Minimum Supports, vol.14, no.6, 2013, https://doi.org/10.7472/jksii.2013.14.6.01
  3. Performance Analysis of Sliding Window-Based Stream High Utility Pattern Mining Techniques, vol.17, no.6, 2016, https://doi.org/10.7472/jksii.2016.17.6.53