DOI QR코드

DOI QR Code

Data Clustering Method Using a Modified Gaussian Kernel Metric and Kernel PCA

  • Lee, Hansung (SW.Content Research Laboratory, ETRI) ;
  • Yoo, Jang-Hee (SW.Content Research Laboratory, ETRI) ;
  • Park, Daihee (Department of Computer and Information Science, Korea University)
  • Received : 2013.06.06
  • Accepted : 2013.11.25
  • Published : 2014.06.01

Abstract

Most hyper-ellipsoidal clustering (HEC) approaches use the Mahalanobis distance as a distance metric. It has been proven that HEC, under this condition, cannot be realized since the cost function of partitional clustering is a constant. We demonstrate that HEC with a modified Gaussian kernel metric can be interpreted as a problem of finding condensed ellipsoidal clusters (with respect to the volumes and densities of the clusters) and propose a practical HEC algorithm that is able to efficiently handle clusters that are ellipsoidal in shape and that are of different size and density. We then try to refine the HEC algorithm by utilizing ellipsoids defined on the kernel feature space to deal with more complex-shaped clusters. The proposed methods lead to a significant improvement in the clustering results over K-means algorithm, fuzzy C-means algorithm, GMM-EM algorithm, and HEC algorithm based on minimum-volume ellipsoids using Mahalanobis distance.

Keywords

References

  1. A.K. Jain, "Data Clustering: 50 Years Beyond K-means," Pattern Recognition Lett., vol. 31, no. 8, June 2010, pp. 651-666. https://doi.org/10.1016/j.patrec.2009.09.011
  2. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2nd ed., 2007.
  3. J. Mao and A.K. Jain, "A Self-Organizing Network for Hyperellipsoidal Clustering (HEC)," IEEE Trans. Neural Netw., vol. 7, no. 1, Jan. 1996, pp. 16-29. https://doi.org/10.1109/72.478389
  4. W. Song et al., "The Hyperellipsoidal Clustering Using Genetic Algorithm," IEEE Int. Conf. Intell. Process. Syst., Beijing, China, Oct. 28-31, 1997, pp. 592-596.
  5. H. Ichihashi, M. Ohue, and T. Miyoshi, "Fuzzy C-Means Clustering Algorithm with Pseudo Mahalanobis Distances," Proc. Third Asian Fuzzy Syst. Symp., Changwon, Rep. of Korea, June 18-21, 1998, pp. 148-152.
  6. M. Moshtaghi et al., "An Efficient Hyperellipsoidal Clustering Algorithm for Resource-Constrained Environments," Pattern Recogn., vol. 44, no. 9, Sept. 2011, pp. 2197-2209. https://doi.org/10.1016/j.patcog.2011.03.007
  7. W. Song et al., "Comments on a Self-Organizing Network for Hyper-ellipsoidal Clustering (HEC)," IEEE Trans. Neural Netw., vol. 8, no. 6, Nov. 1997, pp. 1561-1563. https://doi.org/10.1109/72.641479
  8. R. Krishnapuram and J. Kim, "A Clustering Algorithm Based on Minimum Volume," IEEE Int. Conf. Fuzzy Syst., vol. 2, New Orleans, LA, USA, Sept. 8-11, 1996, pp. 1387-1392.
  9. H. Lee, J. Park, and D. Park, "Hyper-ellipsoidal Clustering Algorithm Using Linear Matrix Inequality," J. Korea Institute Intell. Syst., vol. 12, no. 4, Aug. 2002, pp. 300-305. https://doi.org/10.5391/JKIIS.2002.12.4.300
  10. M. Kumar and J.B. Orlin, "Scale-Invariant Clustering with Minimum Volume Ellipsoids," Comput. Operations. Res., vol. 35, no. 4, Apr. 2008, pp. 1017-1029. https://doi.org/10.1016/j.cor.2006.07.001
  11. R. Shioda and L. Tuncel, "Clustering via Minimum Volume Ellipsoids," Comput. Optim. Appl., vol. 37, no. 3, July 2007, pp. 247-295. https://doi.org/10.1007/s10589-007-9024-1
  12. S. Boyd and L. Vandenberghe, Convex Optimization, 1st ed., Cambridge, UK: Cambridge University Press, 2004.
  13. M.J. Todd and E.A. Yildirim, "On Khachiyan's Algorithm for the Computation of Minimum-Volume Enclosing Ellipsoids," Discr. Appl. Math., vol. 155, no. 13, Aug. 15, 2007, pp. 1731-1744. https://doi.org/10.1016/j.dam.2007.02.013
  14. N. Moshtagh, Minimum Volume Enclosing Ellipsoids, Tech. report, the School of Engineering and Applied Science, Univ. Pennsylvania, PA, 2005.
  15. P. Kumar and E.A. Yildirim, "Minimum Volume Enclosing Ellipsoids and Core Set," J. Optim. Theory Appl., vol. 126, no. 1, July 2005, pp. 1-21. https://doi.org/10.1007/s10957-005-2653-6
  16. J. Cao et al., "A Max-Flow-Based Similarity Measure for Spectral Clustering," ETRI J., vol. 35, no. 2, Apr. 2013, pp. 311-320. https://doi.org/10.4218/etrij.13.0112.0520
  17. J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge, UK: Cambridge University Press, 2004.
  18. B. Scholkopf, A. Smola, and K.-R. Muller, "Kernel Principal Component Analysis," Proc. ICANN, LNCS, Lausanne, Switzerland, vol. 1327, Oct. 8-10, 1997, pp. 583-588.
  19. D. Tax and P. Juszczak, "Kernel Whitening for One-Class Classification," Proc. Pattern Recogn. Support Vector Mach., Niagara Fall, Canada, vol. 2388, Aug. 10, 2002, pp. 40-52.
  20. CVX Research, Inc. CVX: Matlab Software for Disciplined Convex Programming, version 2.0. Accessed Apr. 2013. http://cvxr.com/cvx
  21. M. Grant and S. Boyd, "Graph Implementations for Nonsmooth Convex Programs," Recent Advances Learning Contr. LNCIS, vol. 371, 2008, pp. 95-110.
  22. D. Cai, X. He, and J. Han, "Document Clustering Using Locality Preserving Indexing," IEEE Trans. KDE, vol. 17, no. 12, Dec. 2005, pp. 1624-1637.
  23. The machine learning dataset of UCI are available. Accessed Jan. 2013. http://archive.ics.uci.edu/ml/

Cited by

  1. Wind Power Pattern Forecasting Based on Projected Clustering and Classification Methods vol.37, pp.2, 2014, https://doi.org/10.4218/etrij.15.2314.0070
  2. Parking Space Recognition for Autonomous Valet Parking Using Height and Salient-Line Probability Maps vol.37, pp.6, 2014, https://doi.org/10.4218/etrij.15.0114.0112
  3. Affine-transformation invariant clustering models vol.7, pp.None, 2014, https://doi.org/10.1186/s40488-020-00111-y