Performance Evaluation of Machine Learning Optimizers

  • Joo, Gihun (Dept. of Medical Bigdata Convergence, Kangwon National University) ;
  • Park, Chihyun (Dept. of Medical Bigdata Convergence, Kangwon National University) ;
  • Im, Hyeonseung (Dept. of Medical Bigdata Convergence, Kangwon National University)
  • Received : 2020.08.31
  • Accepted : 2020.09.21
  • Published : 2020.09.30

Abstract

Recently, as interest in machine learning (ML) has increased and research using ML has become more active, finding an optimal hyperparameter combination for various ML models has become increasingly important. In this paper, among the various hyperparameters, we focused on the optimizer and measured and compared the performance of major optimizers on several datasets. Specifically, we compared nine optimizers, from the most basic SGD to Momentum, NAG, AdaGrad, RMSProp, AdaDelta, Adam, AdaMax, and Nadam, using the MNIST, CIFAR-10, IRIS, TITANIC, and Boston Housing Price datasets. Experimental results showed that with Adam or Nadam, the loss of the various ML models decreased most rapidly and their F1 scores were also higher. Meanwhile, AdaMax showed considerable instability during training, while AdaDelta converged more slowly and achieved lower performance than the other optimizers.
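
The comparison described in the abstract can be reproduced in outline with TensorFlow/Keras. The sketch below trains the same simple classifier on MNIST once per optimizer and reports how far the validation loss has fallen after a few epochs; the network architecture, learning rates, and epoch count are illustrative assumptions rather than the paper's exact experimental settings, and Momentum and NAG are realized here as Keras SGD with (Nesterov) momentum. For the classification datasets, the F1 scores reported in the paper could additionally be computed from the resulting predictions with, for example, sklearn.metrics.f1_score.

```python
# Minimal sketch (not the authors' code) of comparing the nine optimizers on MNIST.
# Architecture, epochs, and default learning rates are illustrative assumptions.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

# Momentum and NAG are expressed in Keras as SGD with (Nesterov) momentum.
OPTIMIZERS = {
    "SGD":      lambda: tf.keras.optimizers.SGD(),
    "Momentum": lambda: tf.keras.optimizers.SGD(momentum=0.9),
    "NAG":      lambda: tf.keras.optimizers.SGD(momentum=0.9, nesterov=True),
    "AdaGrad":  lambda: tf.keras.optimizers.Adagrad(),
    "RMSProp":  lambda: tf.keras.optimizers.RMSprop(),
    "AdaDelta": lambda: tf.keras.optimizers.Adadelta(),
    "Adam":     lambda: tf.keras.optimizers.Adam(),
    "AdaMax":   lambda: tf.keras.optimizers.Adamax(),
    "Nadam":    lambda: tf.keras.optimizers.Nadam(),
}

def build_model():
    # Simple fully connected classifier; a CNN would be more typical for CIFAR-10.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

for name, make_opt in OPTIMIZERS.items():
    model = build_model()
    model.compile(optimizer=make_opt(),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, epochs=5,
                        validation_data=(x_test, y_test), verbose=0)
    # Track how quickly the validation loss decreases for each optimizer.
    print(f"{name:>8}: final val_loss = {history.history['val_loss'][-1]:.4f}, "
          f"val_acc = {history.history['val_accuracy'][-1]:.4f}")
```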

References

  1. N. Qian, "On the momentum term in gradient descent learning algorithms," Neural Networks, vol.12, no.1, pp.145-151, 1999. DOI: 10.1016/S0893-6080(98)00116-6
  2. Y. E. Nesterov, "A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2)," Dokl AN SSSR, vol.269, no.3, pp.543-547, 1983.
  3. J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," Journal of Machine Learning Research, vol.12, pp.2121-2159, 2011. DOI: 10.5555/1953048.2021068
  4. S. Ruder, "An overview of gradient descent optimization algorithms," arXiv preprint arXiv:1609.04747, 2016.
  5. M. D. Zeiler, "Adadelta: An adaptive learning rate method," arXiv preprint arXiv:1212.5701, 2012.
  6. D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
  7. T. Dozat, "Incorporating Nesterov Momentum into Adam," in the 4th International Conference on Learning Representations (ICLR 2016) Workshop track, 2016.
  8. D. Choi et al., "On empirical comparisons of optimizers for deep learning," arXiv preprint arXiv:1910.05446, 2019.
  9. M. Mahsa and T. Lee, "Comparison of Optimization Algorithms in Deep Learning-Based Neural Networks for Hydrological Forecasting: Case Study of Nam River Daily Runoff," J. Korean Soc. Hazard Mitig., vol.18, no.6, pp.377-384, 2018. DOI: 10.9798/KOSHAM.2018.18.6.377
  10. W. Jung, B.-S. Lee, and J. Seo, "Performance Comparison of the Optimizers in a Faster R-CNN Model for Object Detection of Metaphase Chromosomes," J. Korea Inst. Inf. Commun. Eng., vol.23, no.11, pp.1357-1363, 2019.
  11. K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," in the 3rd International Conference on Learning Representations (ICLR 2015), 2015.
  12. K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), pp.770-778, 2016.

Cited by

  1. Estimation of Rice Lodging Area Using Deep Learning, vol.66, no.2, 2020, https://doi.org/10.7740/kjcs.2021.66.2.105