DOI QR코드

DOI QR Code

Problems in Fuzzy c-means and Its Possible Solutions

Fuzzy c-means의 문제점 및 해결 방안

  • 허경용 (동의대학교 영상미디어센터) ;
  • 서진석 (동의대학교 게임공학과) ;
  • 이임건 (동의대학교 영상정보공학과)
  • Received : 2010.10.07
  • Accepted : 2010.11.08
  • Published : 2011.01.31

Abstract

Clustering is one of the well-known unsupervised learning methods, in which a data set is grouped into some number of homogeneous clusters. There are numerous clustering algorithms available and they have been used in various applications. Fuzzy c-means (FCM), the most well-known partitional clustering algorithm, was established in 1970's and still in use. However, there are some unsolved problems in FCM and variants of FCM are still under development. In this paper, the problems in FCM are first explained and the available solutions are investigated, which is aimed to give researchers some possible ways of future research. Most of the FCM variants try to solve the problems using domain knowledge specific to a given problem. However, in this paper, we try to give general solutions without using any domain knowledge. Although there are more things left than discovered, this paper may be a good starting point for researchers newly entered into a clustering area.

클러스터링은 주어진 데이터 집합을 균일한 특성을 가지는 몇 개의 그룹으로 묶는 대표적인 비교사 학습 방법 중 하나로 지금까지 다양한 형태의 알고리듬이 개발되어 다양한 응용 분야에서 사용되어 왔다. 이 중 fuzzy c-means (FCM)는 분할 기반의 클러스터링 기법에 속하는 알고리듬으로 1970년대에 정립된 이후 지금까지 사용되고 있는 대표적인 클러스터링 알고리듬 중의 하나이다. 하지만 FCM에는 여러 가지 문제점이 있으며 이를 해결하기 위해 지금까지도 다양한 FCM의 변형이 제안되고 있다. 이 논문에서는 먼저 FCM의 문제점을 살펴보고 이를 해결하기 위해 제안된 방법들을 통해 연구 방향을 제시하고자 한다. FCM의 문제점을 해결하고자 하는 대부분의 FCM 변형은 주어진 문제 영역의 지식을 활용하고 있다. 하지만 이 논문에서는 문제 영역을 한정하지 않고 모든 문제에 적용할 수 있는 일반적인 방안을 제시하는데 초점을 둔다. 제시하는 방안은 앞으로 더 많은 연구가 필요하지만 클러스터링을 연구하고자 하는 이들에게 최근의 연구 동향과 더불어 출발점을 제시할 수 있을 것으로 기대한다.

Keywords

References

  1. R. Xu and D. Wunsch, Clustering, Wiley-IEEE Press, 2008.
  2. J. B. MacQueen, "Some methods for classification and analysis of multivariate observations," Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, pp. 281-297, 1967.
  3. L. A. Zadeh, "Fuzzy sets," Information and Control vol. 8, no. 3, pp. 338-353, 1965. https://doi.org/10.1016/S0019-9958(65)90241-X
  4. E. H. Ruspini, "A new approach to clustering," Information and Control, vol. 16, pp. 22-32, 1969.
  5. J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well separated clusters," Journal of Cybernetics, pp. 32-57, 1974
  6. J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Springer, 1981.
  7. H. Frigui and R. Krishnapuram, "Clustering by competitive agglomeration," Pattern Recognition, vol. 30, no. 7, pp. 1109-1119, 1997. https://doi.org/10.1016/S0031-3203(96)00140-9
  8. Gyeongyong Heo, Young Woon Woo, "Extensions of X-means with Efficient Learning the Number of Clusters ," Journal of the KIMICS, Vol. 12, No. 4, pp. 772-780, 2008
  9. G. Heo and P. Gader, "Learning the Number of Gaussian Components Using Hypothesis Test," Proceedings of the 2009 International Joint Conference on Neural Networks, pp. 1206-1212, 2009.
  10. A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1-38, 1977.
  11. Z. Zhang, C. Chen, J. Sun, and K. L. Chan, "EM algorithms for Gaussian mixtures with split-and-merge operation," Pattern Recognition, vol. 36, no. 9, pp. 1973-1983, 2003. https://doi.org/10.1016/S0031-3203(03)00059-1
  12. Y. Li and L. Li, "A Novel Split and Merge EM Algorithm for Gaussian Mixture Model," Proceedings of the 5th International Conference on Natural Computation, pp. 479-483, 2009.
  13. R. N. Dave, "Characterization and detection of noise in clustering," Pattern Recognition Letters, vol. 12, no. 11, pp. 657-664, 1991. https://doi.org/10.1016/0167-8655(91)90002-4
  14. Y. Namkoong, G. Heo, and Y. W. Woo, "An Extension of Possibilistic Fuzzy C-Means with Regularization," Proceedings of the 2010 IEEE International Conference on Fuzzy Systems, pp. 696-701, 2010.
  15. A. Tikhonov, "On solving incorrectly posed problems and method of regularization," Dokl. Acad. Nauk USSR, vol. 151, pp. 501-504, 1963.
  16. G. Heo, P. Gader, and H. Frigui, "RKF-PCA: Robust Kernel Fuzzy PCA," Neural Networks, vol. 22, no. 5-6, pp. 642-650, 2009. https://doi.org/10.1016/j.neunet.2009.06.013
  17. C. F. Lin and S. D. Wang, "Fuzzy support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 464-471, 2002. https://doi.org/10.1109/72.991432
  18. P. J. Huber and E. M. Ronchetti, Robust Statistics, 2nd edition, Wiley, 2009.
  19. R. Krishnapuram and J. M. Keller, "A Possibilistic Approach to Clustering," IEEE Transactions on Fuzzy Systems vol. 1, no. 2, pp. 98-110, 1993. https://doi.org/10.1109/91.227387
  20. N. R. Pal, K. Pal, J. M. Keller and J. C. Bezdek, "A Possibilistic Fuzzy c-Means Clustering Algorithm," IEEE Transactions on Fuzzy Systems vol. 13, no. 4, pp. 517-530, 2005. https://doi.org/10.1109/TFUZZ.2004.840099
  21. Gyeongyong Heo, Sewoon Choe, Young Woon Woo, " Improvement of the PFCM(Possibilistic Fuzzy C-Means) Clustering Method," Journal of the KIMICS, Vol. 13, No. 1, pp. 177-185, 2009.
  22. B. Feil and J. Abonyi, "Geodesic Distance Based Fuzzy Clustering," Advances in Soft Computing, vol. 39/2007, pp. 50-59, 2007. https://doi.org/10.1007/978-3-540-70706-6_5
  23. F. Fouss, A. Pirotte, J. M. Renders, and M. Saerens, "Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 355-369, 2007. https://doi.org/10.1109/TKDE.2007.46
  24. B. Scholkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, 2001.
  25. M. Girolami, "Mercer kernel-based clustering in feature space," IEEE Transactions on Neural Networks, vol. 13, no. 3, pp. 780-784, 2002. https://doi.org/10.1109/TNN.2002.1000150
  26. M. Filippone, F. Camastra, F. Masulli, and S. Rovetta, "A survey of kernel and spectral methods for clustering," Pattern Recognition, vol. 41, no. 1, pp. 176- 190, 2008. https://doi.org/10.1016/j.patcog.2007.05.018
  27. J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 2, no. 8, pp. 888-905, 2000.
  28. U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4, pp. 395-416, 2007. https://doi.org/10.1007/s11222-007-9033-z
  29. Gyeongyong Heo, Kwang-Baek Kim, Young Woon Woo, "Magnifying Block Diagonal Structure for Spectral Clustering ," Journal of Korea Multimedia Society, Vol. 11, No. 9, pp. 1302-1309, 2008
  30. I. S. Dhillon, Y. Guan, and B. Kulis, "A unified view of kernel k-means, spectral clustering and graph cuts," Department of Computer Science, University of Texas, Tech. Rep. TR-04-25, 2005.
  31. M. Garey, D. Johnson, and H. Witsenhausen, "The complexity of the generalized Lloyd-Max problem," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 255-256, 1982. https://doi.org/10.1109/TIT.1982.1056488
  32. J. He, M. Lan, C. L. Tan, S. Y. Sung, and H. B. Low, "Initialization of cluster refinement algorithms: A review and comparative study," Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, pp. 297-302, 2004.
  33. A. Likas, N. Vlassis, and J. J. Verbeek, "The global k-means clustering algorithm," Pattern Recognition, vol. 36, pp. 451-461, 2003. https://doi.org/10.1016/S0031-3203(02)00060-2
  34. G. Heo and P. Gader, "An Extension of Global Fuzzy C-means Using Kernel Methods," Proceedings of the 2010 IEEE International Conference on Fuzzy Systems, pp. 690-695, 2010.
  35. X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841-847, 1991. https://doi.org/10.1109/34.85677
  36. M. Meila, "Comparing clusterings - an information based distance," Journal of Multivariate Analysis, vol. 98, no. 5, pp. 873-895, 2007. https://doi.org/10.1016/j.jmva.2006.11.013
  37. D. Pascual, F. Pla, and J. S. Sanchez, "Cluster validation using information stability measures," Pattern Recognition Letters, vol. 31, pp. 454-461, 2010. https://doi.org/10.1016/j.patrec.2009.07.009
  38. Q. Deng, Y. Luo, and J. Ge, "Dual threshold based unsupervised face image clustering," Proceedings of the 2nd International Conference on Industrial Mechatronics and Automation, pp. 436-439, 2010.
  39. D. Jiang, C. Tang, A. Zhang, "Cluster analysis for gene expression data: a survey," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 11, pp. 1370-1386, 2004. https://doi.org/10.1109/TKDE.2004.68
  40. L. J. P. van der Maaten, E. O. Postma, and H. J. van den Herik, "Dimensionality Reduction: A Comparative Review," Tilburg University, Technical Report, TiCC-TR 2009-005, 2009.