The Effect of Bias in Data Set for Conceptual Clustering Algorithms

  • Received : 2019.07.04
  • Accepted : 2019.07.13
  • Published : 2019.09.30

Abstract

When a partitioned structure is derived from a data set by a clustering algorithm, it is not unusual to obtain a different set of clusters when the algorithm runs on the same data in a different order. This problem is known as the order bias problem. Many machine learning algorithms try to produce an optimized result from the available training and test data, where the optimization is guided by an evaluation function that itself has a tendency toward a certain goal. Such a tendency is unavoidable if the evaluation function is to be efficient and to yield consistent results, but its preference for a specific goal can sometimes lead to unfavorable outcomes in the final clustering. To overcome this bias problem, a first clustering pass constructs an initial partition, which is expected to indicate the possible range of the number of final clusters. Data-centric sorting is then applied to the data objects in the clusters of this partition to rearrange them in a new order, and the same clustering procedure is reapplied to the rearranged data set to build a new partition. We have developed an algorithm that reduces the bias caused by the order in which data are fed into the algorithm, and experimental results show that it helps minimize the order bias effect. We also show that the evaluation measure currently used by the clustering algorithm is biased toward producing a smaller number of larger clusters.
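The two-pass procedure is only outlined in the abstract; the following is a minimal sketch of that idea, assuming a generic order-sensitive (COBWEB-style) incremental clusterer. The names `incremental_cluster` and `data_centric_sort` are hypothetical placeholders for the paper's actual clustering procedure and data-centric sorting criterion.

```python
# Sketch of the order-bias mitigation loop described in the abstract.
# `incremental_cluster` stands in for any order-sensitive conceptual
# clusterer; `data_centric_sort` is a placeholder for the paper's
# data-centric sorting step. Both names are illustrative only.

from typing import Callable, List, Sequence

DataObject = dict            # e.g. attribute/value pairs of one instance
Partition = List[List[DataObject]]


def reorder_and_recluster(
    data: Sequence[DataObject],
    incremental_cluster: Callable[[Sequence[DataObject]], Partition],
    data_centric_sort: Callable[[Partition], List[DataObject]],
) -> Partition:
    """Cluster once, rearrange the objects, then cluster again."""
    # 1. First pass: build an initial partition; its size hints at the
    #    plausible range for the number of final clusters.
    initial_partition = incremental_cluster(data)

    # 2. Rearrange the objects cluster by cluster so that the presentation
    #    order no longer reflects the arbitrary original input order.
    reordered_data = data_centric_sort(initial_partition)

    # 3. Second pass: rerun the same clustering procedure on the
    #    rearranged data to obtain the final partition.
    return incremental_cluster(reordered_data)
```

The key design point, as described in the abstract, is that the second pass uses the same clustering procedure as the first; only the presentation order of the data changes between passes.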
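The abstract does not name the evaluation measure whose bias is examined. In COBWEB-style conceptual clustering the standard measure is category utility, so, as an assumption about which measure is meant, it would take the form

$$ CU(\{C_1,\dots,C_K\}) = \frac{1}{K}\sum_{k=1}^{K} P(C_k)\sum_i\sum_j\left[\,P(A_i = V_{ij}\mid C_k)^2 - P(A_i = V_{ij})^2\,\right] $$

where $A_i$ ranges over the attributes and $V_{ij}$ over the values of $A_i$. If this is indeed the measure in use, the $P(C_k)$ weighting gives larger clusters proportionally more influence on the score, which is consistent with the bias toward fewer, larger clusters reported above.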

Keywords
