Variable selection in L1 penalized censored regression

  • Hwang, Chang-Ha (Department of Statistics, Dankook University) ;
  • Kim, Mal-Suk (Division of Computer Technology, Yeungnam College of Science & Technology) ;
  • Shim, Joo-Yong (Department of Data Science, Inje University)
  • Received : 2011.07.11
  • Accepted : 2011.08.10
  • Published : 2011.10.01

Abstract

The proposed method is based on a penalized censored regression model with an L1 penalty. We use the iteratively reweighted least squares procedure to optimize the L1-penalized log-likelihood function of the censored regression model. This procedure provides efficient computation of the regression parameters, including variable selection, and leads to a generalized cross validation function for model selection. Numerical results are then presented to indicate the performance of the proposed method.
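
The abstract does not spell out the likelihood or the update equations, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a weighted least squares loss in which hypothetical observation weights `w` stand in for whatever censoring adjustment the model uses, approximates each |b_j| by a quadratic around the current iterate so that every step is a ridge-type solve, and computes a common form of the generalized cross validation score from the resulting hat matrix. The names `irls_lasso`, `lam`, `w`, and `eps` are illustrative assumptions.

```python
import numpy as np

def irls_lasso(X, y, lam, w=None, n_iter=100, eps=1e-8, tol=1e-8):
    """Approximately minimize sum_i w_i (y_i - x_i'b)^2 + lam * sum_j |b_j|
    by iteratively reweighted least squares (illustrative sketch only)."""
    n, p = X.shape
    # Assumption: w holds nonnegative weights that account for censoring.
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    XtWX = X.T @ (w[:, None] * X)
    XtWy = X.T @ (w * y)
    # Start from a ridge solution so no coefficient is exactly zero.
    beta = np.linalg.solve(XtWX + lam * np.eye(p), XtWy)
    for _ in range(n_iter):
        # |b_j| is approximated by b_j^2 / (2 |b_j_old|) + const,
        # which turns the L1 penalty into a coefficient-specific ridge term.
        d = lam / (2.0 * np.maximum(np.abs(beta), eps))
        new_beta = np.linalg.solve(XtWX + np.diag(d), XtWy)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    # Hat matrix of the final ridge-type fit, used for the GCV score.
    d = lam / (2.0 * np.maximum(np.abs(beta), eps))
    H = X @ np.linalg.solve(XtWX + np.diag(d), X.T * w)
    resid = y - X @ beta
    gcv = n * np.sum(w * resid ** 2) / (n - np.trace(H)) ** 2
    return beta, gcv
```

In such a sketch, coefficients that the penalty drives toward zero end up numerically negligible rather than exactly zero, so variable selection would be read off by thresholding, and `lam` could be chosen by evaluating `gcv` over a grid of candidate values.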
