Mixed effects least squares support vector machine for survival data analysis

  • Changha Hwang (Department of Information Statistics, Dankook University)
  • Jooyong Shim (Department of Data Science, Inje University)
  • Received : 2012.06.14
  • Accepted : 2012.07.09
  • Published : 2012.07.31

Abstract

The least squares support vector machine (LS-SVM) is a statistical technique widely used for classification and nonlinear regression. In this paper we propose a mixed effects LS-SVM for censored survival data observed from different groups. Random right censoring is accounted for in the nonlinear regression through weights formed from Kaplan-Meier estimates of the censoring distribution. The proposed model includes a random effects term that represents inter-group variation. A generalized cross validation function is proposed for selecting the optimal values of the hyper-parameters, namely the penalty constant and the kernel parameter. Simulation results compare the proposed method with a standard LS-SVM for censored data that has no random effects term, in order to demonstrate the advantage of the proposed method.
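
The abstract does not state the exact form of the censoring weights, so the following is only a minimal sketch under an assumption: it uses the common inverse-probability-of-censoring form, in which an uncensored observation with time t receives weight 1 / G(t-) and a censored observation receives weight 0, where G is the Kaplan-Meier estimate of the censoring survival function. The function names `km_censoring_survival` and `ipcw_weights` are hypothetical and are not taken from the paper.

```python
import numpy as np

def km_censoring_survival(time, event):
    """Kaplan-Meier estimate of G(t) = P(C > t) for the censoring time C.

    Censored observations (event == 0) are treated as the "events" of the
    censoring distribution. Returns the sorted distinct times and the value
    of G just after each of them.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    uniq = np.unique(time)
    surv = np.empty_like(uniq)
    g = 1.0
    for k, t in enumerate(uniq):
        at_risk = np.sum(time >= t)                     # still under observation at t
        censored_here = np.sum((time == t) & (event == 0))
        g *= 1.0 - censored_here / at_risk              # product-limit update
        surv[k] = g
    return uniq, surv

def ipcw_weights(time, event):
    """Assumed weight form: event_i / G(t_i-), i.e. zero for censored cases."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    uniq, surv = km_censoring_survival(time, event)
    # idx counts the distinct times strictly below t_i, so surv[idx - 1] is G(t_i-)
    idx = np.searchsorted(uniq, time, side="left")
    g_minus = np.where(idx == 0, 1.0, surv[np.maximum(idx - 1, 0)])
    weights = np.zeros_like(time)
    observed = event == 1
    weights[observed] = 1.0 / g_minus[observed]
    return weights

# Toy data: event = 1 means the survival time was observed, 0 means censored
toy_weights = ipcw_weights([1, 2, 3, 4], [1, 0, 1, 0])
```

Weights of this kind would multiply the squared-error terms of the LS-SVM objective, so that uncensored observations falling late in follow-up count for more and compensate for the censored ones.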
