Variable selection for multiclassification by LS-SVM

  • Received : 2010.07.09
  • Accepted : 2010.08.28
  • Published : 2010.09.30

Abstract

In multiclassification, it is often the case that some variables are irrelevant while the remaining variables differ in importance. We propose a novel algorithm for selecting the relevant variables for multiclassification. The algorithm is based on the multiclass least squares support vector machine (LS-SVM) and uses the results of a multiclass LS-SVM fitted with the one-vs-all method. Experimental results are presented to illustrate the performance of the proposed method.
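
The abstract does not state the selection criterion, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a linear kernel, fits one LS-SVM per class (one-vs-all) by solving the standard LS-SVM linear system on ±1 targets, and then ranks variables by a hypothetical relevance score: the squared primal weights summed over the class-wise classifiers. The function names, the regularization value `gamma`, the score, and the toy data are all illustrative assumptions.

```python
import numpy as np


def fit_lssvm_one_vs_all(X, y, gamma=10.0):
    """Fit one linear-kernel LS-SVM per class (one-vs-all).

    Each classifier solves the LS-SVM system
        [0   1^T        ] [b    ]   [0]
        [1   K + I/gamma] [alpha] = [t]
    where t holds +1 for the class of interest and -1 otherwise.
    """
    classes = np.unique(y)
    n = X.shape[0]
    K = X @ X.T  # linear kernel (an assumption of this sketch)

    # The system matrix is the same for every class, so build it once.
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma

    # One column of +1/-1 targets per class.
    T = np.where(y[:, None] == classes[None, :], 1.0, -1.0)
    rhs = np.vstack([np.zeros((1, classes.size)), T])

    sol = np.linalg.solve(A, rhs)
    b = sol[0]        # bias terms, shape (n_classes,)
    alpha = sol[1:]   # dual coefficients, shape (n, n_classes)
    W = X.T @ alpha   # primal weights (linear kernel only), shape (d, n_classes)
    return classes, W, b


def predict(X, classes, W, b):
    """Assign each row to the class whose classifier gives the largest output."""
    return classes[np.argmax(X @ W + b, axis=1)]


def rank_variables(W):
    """Hypothetical relevance score: squared weights summed over classes."""
    scores = np.sum(W ** 2, axis=1)
    order = np.argsort(scores)[::-1]  # most relevant variable first
    return order, scores


def make_toy_data(n_per_class=50, n_noise=8, seed=0):
    """Three classes separated in variables 0 and 1; the rest is pure noise."""
    rng = np.random.default_rng(seed)
    means = np.array([[4.0, 0.0], [-2.0, 3.5], [-2.0, -3.5]])
    y = np.repeat(np.arange(3), n_per_class)
    informative = means[y] + rng.normal(size=(y.size, 2))
    noise = rng.normal(size=(y.size, n_noise))
    return np.hstack([informative, noise]), y


if __name__ == "__main__":
    X, y = make_toy_data()
    classes, W, b = fit_lssvm_one_vs_all(X, y)
    order, scores = rank_variables(W)
    print("training accuracy:", np.mean(predict(X, classes, W, b) == y))
    print("variables ranked by score:", order)
```

On the toy data only the first two variables carry class information, so a sensible ranking should place them first. With a nonlinear kernel the primal weights are not available in this form, and a different relevance measure would be needed.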
