Optimization of Support Vector Machines for Financial Forecasting


  • Kim, Kyoung-Jae (Department of Management Information Systems, Dongguk University, Seoul) ;
  • Ahn, Hyun-Chul (School of Management Information Systems, Kookmin University)
  • Received : 2011.11.18
  • Accepted : 2011.12.19
  • Published : 2011.12.31

Abstract

Financial time-series forecasting is one of the most important research issues because it is essential to the risk management of financial institutions. Researchers have therefore tried to forecast financial time series using various data mining techniques such as regression, artificial neural networks, decision trees, and k-nearest neighbor. Recently, support vector machines (SVMs) have been widely applied in this area because they require relatively little training data and carry a low risk of overfitting. However, to use an SVM, a user must determine several design factors heuristically, including the choice of kernel function, its parameters, and the feature subset. Beyond these factors, proper instance selection may also improve the forecasting performance of an SVM by eliminating irrelevant and distorting training instances. Nonetheless, few studies have applied instance selection to SVMs, especially in the domain of stock market prediction. Instance selection chooses a proper subset of the original training data; it can be considered a method of knowledge refinement that maintains the instance base. This study proposes a novel instance selection algorithm for SVMs. The proposed technique uses a genetic algorithm (GA) to optimize the instance selection process and the kernel parameters simultaneously; we call this model ISVM (SVM with Instance selection). Experiments on stock market data are conducted using ISVM. The GA searches for optimal or near-optimal values of the kernel parameters and for relevant training instances, so each chromosome in the GA contains two sets of codes: one for the kernel parameters and one for instance selection.
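The two-part chromosome described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the data are synthetic, and a kernel-weighted nearest-neighbor classifier stands in for the RBF-kernel SVM when evaluating fitness, so that the sketch stays self-contained. Only the encoding (one real-valued kernel-parameter gene plus one selection bit per training instance) and the GA loop follow the description in the abstract.

```python
import math
import random

random.seed(0)

# Synthetic stand-in for the technical-indicator data (hypothetical, 3 features).
def make_data(n, noise=0.3):
    data = []
    for _ in range(n):
        x = [random.uniform(-1, 1) for _ in range(3)]
        label = 1 if sum(x) + random.gauss(0, noise) > 0 else 0
        data.append((x, label))
    return data

train, holdout = make_data(60), make_data(40)

# Fitness proxy: RBF-weighted voting over the selected instances.
# The paper trains an actual SVM here; this keeps the sketch dependency-free.
def accuracy(bits, gamma, test_set):
    chosen = [train[i] for i, b in enumerate(bits) if b]
    if not chosen:
        return 0.0
    correct = 0
    for x, y in test_set:
        score = [0.0, 0.0]
        for xi, yi in chosen:
            d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
            score[yi] += math.exp(-gamma * d2)  # RBF similarity, weighted by gamma
        correct += int((score[1] > score[0]) == (y == 1))
    return correct / len(test_set)

# GA controls from the paper (generations reduced from 50 to keep the demo fast).
POP, CX_RATE, MUT_RATE, GENS = 50, 0.7, 0.1, 15

def random_chromosome():
    # [kernel-parameter gene] + [one selection bit per training instance]
    return [random.uniform(0.1, 5.0)] + [random.randint(0, 1) for _ in train]

def fitness(ch):
    return accuracy(ch[1:], ch[0], train)

def crossover(a, b):
    if random.random() > CX_RATE:
        return a[:]
    cut = random.randrange(1, len(a))  # single-point crossover
    return a[:cut] + b[cut:]

def mutate(ch):
    if random.random() < MUT_RATE:
        ch[0] = random.uniform(0.1, 5.0)   # perturb the kernel parameter
    for i in range(1, len(ch)):
        if random.random() < MUT_RATE:
            ch[i] ^= 1                     # flip a selection bit
    return ch

population = [random_chromosome() for _ in range(POP)]
for _ in range(GENS):
    population.sort(key=fitness, reverse=True)
    elite = population[:10]  # keep the best 10, breed the rest from them
    population = elite + [
        mutate(crossover(random.choice(elite), random.choice(elite)))
        for _ in range(POP - 10)
    ]

best = max(population, key=fitness)
print("instances kept:", sum(best[1:]), "of", len(train))
print("holdout accuracy:", accuracy(best[1:], best[0], holdout))
```

The key design point mirrored here is that instance selection and kernel-parameter search share one chromosome, so the GA evaluates both decisions jointly rather than tuning them in separate stages.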
For the controlling parameters of the GA search, the population size is set to 50 organisms, the crossover rate to 0.7, and the mutation rate to 0.1; as the stopping condition, 50 generations are permitted. The application data consist of technical indicators and the direction of change in the daily Korea stock price index (KOSPI), for a total of 2,218 trading days. The data are separated into training, test, and holdout sets of 1,056, 581, and 581 observations, respectively. This study compares ISVM to several benchmark models: logistic regression (Logit), backpropagation neural networks (ANN), nearest neighbor (1-NN), conventional SVM (SVM), and SVM with GA-optimized kernel parameters (PSVM). The experimental results show that on the holdout data ISVM outperforms 1-NN by 15.32%, ANN by 6.89%, Logit and SVM by 5.34%, and PSVM by 4.82%, while using only 556 of the 1,056 original training instances. In addition, the two-sample test for proportions is used to examine whether ISVM significantly outperforms the benchmark models. The results indicate that ISVM outperforms ANN and 1-NN at the 1% statistical significance level, and performs better than Logit, SVM, and PSVM at the 5% level.
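The significance test mentioned above compares two hit ratios measured on holdout sets of equal size. A minimal sketch of the pooled two-sample z-test for proportions follows; the hit ratios used in the example are hypothetical, not the paper's reported accuracies, and only the holdout size (581) is taken from the abstract.

```python
import math

def two_sample_proportion_z(p1, p2, n1, n2):
    """z-statistic for H0: p1 == p2, using the pooled proportion."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def one_sided_p_value(z):
    # P(Z > z) for a standard normal, via the error function.
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

# Hypothetical hit ratios on two 581-day holdout evaluations.
z = two_sample_proportion_z(0.60, 0.53, 581, 581)
print(round(z, 3))                  # z ≈ 2.407
print(one_sided_p_value(z) < 0.01)  # significant at the 1% level
```

With samples of 581 days, even a difference of a few percentage points in hit ratio can reach significance, which is why the abstract can report 1% and 5% significance levels for performance gaps of 4-15%.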

Support vector machines (SVMs) are a comparatively recent data mining technique, but they have been studied extensively in business disciplines such as finance and CRM. In many cases SVMs have shown prediction accuracy comparable to that of artificial neural networks, yet the structure of the resulting prediction model is easier to understand than that of a neural network, which is often called a black box; SVMs are also less prone to overfitting, so they can be applied even to small data sets. To use a standard SVM, however, the designer must choose several design factors, as with neural networks, which introduces considerable arbitrariness and a substantial risk of converging to a local optimum. Moreover, when a large amount of data is available, analyzing and using it takes time, and when the data are very noisy the expected level of forecasting performance may not be achieved. This study proposes a new SVM model that retains the advantages of the standard SVM while mitigating these two drawbacks. The proposed model integrates an instance selection technique into the standard SVM: it selectively removes instances that are unnecessary for prediction from large data sets, improving both the accuracy and the speed of prediction. The usefulness of the proposed model is verified on financial data, which are known to be noisy and difficult to predict.

Keywords

References

  1. Ahn, H. and K. Kim, "Corporate Bond Rating Using Various Multiclass SVMs", Asia Pacific Journal of Information Systems, Vol.19, No.2(2009), 157-178.
  2. Ahn, H., K. Kim, and I. Han, "An Intelligent Corporate Bond Rating Model for Korean Companies Using Multiclass Support Vector Machines", Korean Management Review, Vol.35, No.5(2006), 1479-1496.
  3. Ahn, H. and K. Kim, "Using genetic algorithms to optimize k-nearest neighbors for data mining", Annals of Operations Research, Vol.163, No.1(2008), 5-18. https://doi.org/10.1007/s10479-008-0325-2
  4. Chang, C.-C. and C.-J. Lin, LIBSVM: a library for support vector machines, Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2001.
  5. Gates, G. W., "The reduced nearest neighbor rule", IEEE Transactions on Information Theory, Vol.18, No.3(1972), 431-433. https://doi.org/10.1109/TIT.1972.1054809
  6. Harnett, D. L. and A. K. Soni, Statistical methods for business and economics, Addison-Wesley, MA, 1991.
  7. Hart, P. E., "The condensed nearest neighbor rule", IEEE Transactions on Information Theory, Vol.14(1968), 515-516. https://doi.org/10.1109/TIT.1968.1054155
  8. Kim, K., "Financial time series forecasting using support vector machines", Neurocomputing, Vol.55(2003), 307-319. https://doi.org/10.1016/S0925-2312(03)00372-2
  9. Kim, K., "Artificial neural networks with evolutionary instance selection for financial forecasting", Expert Systems with Applications, Vol.30, No.3(2006), 519-526. https://doi.org/10.1016/j.eswa.2005.10.007
  10. Kuncheva, L. I., "'Change-glasses' approach in pattern recognition", Pattern Recognition Letters, Vol.14(1993), 619-623. https://doi.org/10.1016/0167-8655(93)90046-G
  11. Liu, H. and H. Motoda, "Feature transformation and subset selection", IEEE Intelligent Systems, Vol.13, No.2(1998), 26-28.
  12. McSherry, D., "Automating case selection in the construction of a case library", Knowledge-Based Systems, Vol.13, No.2/3(2000), 133-140. https://doi.org/10.1016/S0950-7051(00)00054-X
  13. Reeves, C. R. and D. R. Bush, Using genetic algorithms for training data selection in RBF networks, In Liu, H. and H. Motoda, Instance selection and construction for data mining, Kluwer Academic Publishers, Massachusetts, (2001), 339-356.
  14. Reeves, C. R. and S. J. Taylor, Selection of training sets for neural networks by a genetic algorithm, In Eiben, A. E., T. Back, M. Schoenauer and H.-P. Schwefel, Parallel Problem Solving from Nature - PPSN V, Springer-Verlag, Berlin, 1998.
  15. Ritter, G. L., H. B. Woodruff, S. R. Lowry, and T. L. Isenhour, "An algorithm for a selective nearest neighbor decision rule", IEEE Transactions on Information Theory, Vol.21, No.6(1975), 665-669. https://doi.org/10.1109/TIT.1975.1055464
  16. Smyth, B., "Case-base maintenance", Proceedings of the 11th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, (1998), 507-516.
  17. Tay, F. E. H. and L. Cao, "Application of support vector machines in financial time series forecasting", Omega, Vol.29(2001), 309-317. https://doi.org/10.1016/S0305-0483(01)00026-3
  18. Tetko, I. V. and A. E. P. Villa, "Efficient partition of learning data sets for neural network training", Neural Networks, Vol.10, No.8 (1997), 1361-1374. https://doi.org/10.1016/S0893-6080(97)00005-1
  19. Vapnik, V. N., The Nature of Statistical Learning Theory, Springer-Verlag, 1995.
  20. Vapnik, V. N., Statistical Learning Theory, Wiley, New York, 1998.
  21. Wilson, D. L., "Asymptotic properties of nearest neighbor rules using edited data", IEEE Transactions on Systems, Man, and Cybernetics, Vol.2, No.3(1972), 408-421.
  22. Wilson, D. R. and T. R. Martinez, "Reduction techniques for instance-based learning algorithms", Machine Learning, Vol.38(2000), 257-286. https://doi.org/10.1023/A:1007626913721

Cited by

  1. GA-optimized Support Vector Regression for an Improved Emotional State Estimation Model vol.8, pp.6, 2011, https://doi.org/10.3837/tiis.2014.06.014
  2. A Self-Optimizing Feature Selection Algorithm for Enhancing Campaign Effectiveness vol.26, pp.4, 2011, https://doi.org/10.13088/jiis.2020.26.4.173