CNN Deep Learning Acceleration Algorithm for Mobile System

  • 박성우 (Department of Electronics and Electrical Engineering, Dankook University) ;
  • 한경호 (Department of Electronics and Electrical Engineering, Dankook University) ;
  • 장우영 (Department of Electronics and Electrical Engineering, Dankook University)
  • Received : 2018.07.24
  • Accepted : 2018.08.25
  • Published : 2018.10.31

Abstract

A mobile system with limited computing and storage capacity mainly relies on a data center for deep learning training and inference. It is therefore difficult for a mobile system to provide personalized artificial intelligence services, and users may be reluctant to transfer personal information to a data center. This paper therefore proposes a deep learning acceleration algorithm for convolutional neural networks that enables a mobile system to perform training and inference on its own. The proposed algorithm efficiently reduces the size of the convolutional neural network by combining a low-rank approximation method, which concentrates the network's information into a small number of weights, with a pruning method, which removes non-critical weights. Experimental results show that, compared with the conventional pruning algorithm, the proposed algorithm achieves 1.65 times faster inference, requires 1.5 times fewer fine-tuning iterations, and needs half the memory capacity for storing weights.
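The paper itself includes no source code, so the following is a minimal NumPy sketch of the two compression steps the abstract names: a truncated-SVD low-rank approximation that concentrates a layer's information into a few dominant components, followed by magnitude pruning that zeroes out non-critical weights. The function names, the rank of 8, and the 50% sparsity are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def low_rank_approximation(weights, rank):
    """Compress a 2-D weight matrix with a truncated SVD.

    Keeping only the top-`rank` singular values concentrates most of
    the layer's information into a small set of weights.
    """
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    # Reconstruct from the leading `rank` components only.
    return u[:, :rank] @ np.diag(s[:rank]) @ vt[:rank, :]

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights.

    `sparsity` is the fraction of weights to remove, e.g. 0.5 removes
    the least important half.
    """
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask

# Toy example: a 64x64 weight matrix standing in for a flattened layer.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
w_compact = low_rank_approximation(w, rank=8)         # concentrate information
w_pruned = magnitude_prune(w_compact, sparsity=0.5)   # drop non-critical weights
print(f"nonzero weights: {np.count_nonzero(w_pruned)} / {w_pruned.size}")
```

In the pipeline the abstract describes, the pruned network would then be fine-tuned to recover accuracy; that step, and the extension from 2-D matrices to 4-D convolution kernels, is omitted here for brevity.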

Acknowledgement

Supported by: National Research Foundation of Korea (NRF)

