Sample-spacing Approach for the Estimation of Mutual Information

  • Huh, Moon-Yul (Dept. of Statistics, Sungkyunkwan University)
  • Cha, Woon-Ock (Dept. of Multimedia Engineering, Hansung University)
  • Published: 2008.04.30

Abstract

Mutual information (MI) is a measure of how well an explanatory variable predicts a target variable. It is used for variable ranking and for variable subset selection. This study examines the Sample-spacing approach, which estimates mutual information from data consisting of continuous explanatory variables and a categorical target variable without estimating a joint probability density function. The results of a Monte Carlo simulation and experiments with real-world data show that m = 1 is preferable when using the Sample-spacing approach.

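For illustration, below is a minimal sketch in Python of the spacing idea the abstract describes: a Vasicek-type m-spacing estimate of the entropy of a continuous variable, combined per class as I(X; Y) = H(X) - sum_k P(Y = k) H(X | Y = k) for a categorical target. This is not the authors' implementation; the function names, the tie-handling guard, and the exact (bias-uncorrected) form of the estimator are assumptions made for this sketch.

```python
import numpy as np

def spacing_entropy(x, m=1):
    """Vasicek-type m-spacing entropy estimate for a 1-D continuous sample.

    H_hat = (1/(n - m)) * sum_i log(((n + 1)/m) * (x_(i+m) - x_(i))).
    This plain form is a sketch; the paper's estimator may use bias corrections.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    spacings = x[m:] - x[:-m]               # x_(i+m) - x_(i), i = 1, ..., n - m
    spacings = np.maximum(spacings, 1e-12)  # guard against zero spacings from ties
    return np.mean(np.log((n + 1) / m * spacings))

def spacing_mi(x, y, m=1):
    """MI between continuous x and categorical y via
    I(X; Y) = H(X) - sum_k P(Y = k) * H(X | Y = k),
    with each entropy term estimated by the m-spacing estimator above."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    h_x = spacing_entropy(x, m)
    h_x_given_y = sum((np.sum(y == k) / y.size) * spacing_entropy(x[y == k], m)
                      for k in np.unique(y))
    return h_x - h_x_given_y

# Toy check: a feature whose class means differ should score higher MI than
# pure noise (values are illustrative only, not results from the paper).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 200)
informative = rng.normal(loc=y, scale=1.0)   # class-dependent mean shift
noise = rng.normal(size=y.size)
print(spacing_mi(informative, y, m=1), spacing_mi(noise, y, m=1))
```

Each class needs at least m + 1 observations for its spacings to be defined; with m = 1, as the abstract recommends, this is rarely a constraint in practice.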

