A Comparison of Cluster Analyses and Clustering of Sensory Data on Hanwoo Bulls

Kim, Jae-Hee; Ko, Yoon-Sil

doi:10.5351/KJAS.2009.22.4.745

The Korean Journal of Applied Statistics (응용통계연구)

Volume 22, Issue 4, Pages 745-758, 2009. pISSN 1225-066X, eISSN 2383-5818

The Korean Statistical Society (한국통계학회)

Kim, Jae-Hee (Department of Statistics, Duksung Women's University);
Ko, Yoon-Sil (Department of Statistics, Duksung Women's University)

Published: 2009.08.31

Abstract

Cluster analysis is the automated search for groups of related observations in a data set. Many techniques have been proposed for grouping observations into clusters, and a variety of measures have been suggested for validating the results of a cluster analysis. In this paper, we compare complete linkage, Ward's method, K-means, and model-based clustering on simulated data from multivariate distributions, computing validity measures such as connectivity, the Dunn index, and the silhouette. We also select a clustering algorithm and determine the number of clusters of Korean consumers based on their palatability scores for Hanwoo bull beef prepared with the BBQ cooking method.

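Cluster analysis, widely used as a multivariate statistical technique for drawing out naturally occurring groups, serves as a data-driven exploratory method, and many methods have been proposed according to different clustering principles. A variety of measures have also been developed for assessing the validity of clustering results. In this study, we carry out cluster analysis in a simulation study using complete linkage and Ward's method as hierarchical methods, K-means as a non-hierarchical method, and model-based clustering, which makes use of probability distribution information. We compute connectivity, the Dunn index, and the silhouette as cluster validity measures and compare the validity of each clustering method. We also apply cluster analysis to Hanwoo sensory evaluation data in order to find the optimal clustering.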
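
The three validity measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy data, and the choice of Euclidean distance with `n_neighbors=2` are assumptions made purely for illustration. The definitions follow the usual textbook forms: the Dunn index is the smallest between-cluster distance divided by the largest within-cluster diameter (larger is better), the silhouette averages s(i) = (b - a) / max(a, b) over observations (closer to 1 is better), and connectivity adds a penalty of 1/j whenever an observation's j-th nearest neighbor falls in a different cluster (smaller is better).

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Smallest between-cluster distance over largest within-cluster diameter.

    Assumes at least two clusters and at least one cluster with two points.
    """
    min_between = math.inf
    max_diameter = 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)
        if lp == lq:
            max_diameter = max(max_diameter, d)
        else:
            min_between = min(min_between, d)
    return min_between / max_diameter


def silhouette_width(points, labels):
    """Average of s(i) = (b - a) / max(a, b) over all observations."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes s(i) = 0 by convention
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself is included in the sum)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def connectivity(points, labels, n_neighbors=2):
    """Sum of 1/j for every j-th nearest neighbor placed in another cluster."""
    total = 0.0
    for i, p in enumerate(points):
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for rank, j in enumerate(neighbors[:n_neighbors], start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank
    return total


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]  # labels that match the two groups
bad = [0, 1, 0, 1, 0, 1]   # labels that mix the two groups

for name, labels in (("good", good), ("bad", bad)):
    print(
        name,
        round(dunn_index(points, labels), 3),
        round(silhouette_width(points, labels), 3),
        round(connectivity(points, labels), 3),
    )
```

On this toy data the labeling that matches the two groups scores better on all three measures than the mixed labeling, which is the kind of comparison the abstract describes across clustering methods and numbers of clusters.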
