
Implementation of low power BSPE Core for deep learning hardware accelerators


  • Received : 2020.09.18
  • Accepted : 2020.09.28
  • Published : 2020.09.30

Abstract

In this paper, the proposed BSPE replaces the conventional multiplication algorithm, which consumes a large amount of power. Hardware resources are reduced by using a bit-serial multiplier, and data in a variable integer format is used to reduce memory usage. In addition, the resource and power usage of the MOA (Multi-Operand Adder) that accumulates the partial sums is reduced by applying LOA (Lower-part OR Approximation). As a result, compared with the existing MBS (Multiplication by Barrel Shifter) approach, hardware resources are reduced by 44% and power consumption by 42%. We also propose a hardware architecture design for the BSPE Core.
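The sketch below is a minimal behavioural illustration (not the paper's RTL) of the two ideas the abstract combines: bit-serial multiplication, where the weight is consumed one bit at a time and the activation is shifted instead of using a full multiplier, and LOA applied to the adder that accumulates the partial sums. The bit widths and the split point `k` are assumptions chosen only for illustration.

```python
def loa_add(a: int, b: int, width: int = 24, k: int = 4) -> int:
    """Approximate a + b with LOA: the low k bits are bitwise-ORed
    (carry-free); the upper bits use an exact adder whose carry-in is the
    AND of the MSBs of the two lower parts, following Mahdiani et al. [3].
    `width` and `k` are illustrative assumptions, not the paper's values."""
    mask_lo = (1 << k) - 1
    a_lo, b_lo = a & mask_lo, b & mask_lo
    lo = a_lo | b_lo                                   # carry-free lower part
    carry_in = (a_lo >> (k - 1)) & (b_lo >> (k - 1)) & 1
    hi = ((a >> k) + (b >> k) + carry_in) & ((1 << (width - k)) - 1)
    return (hi << k) | lo


def bit_serial_mul(activation: int, weight: int, weight_bits: int = 8) -> int:
    """Bit-serial multiply: scan the weight LSB-first; whenever the current
    bit is 1, add the activation shifted by the bit position. Partial sums
    are accumulated with the approximate LOA adder."""
    acc = 0
    for i in range(weight_bits):
        if (weight >> i) & 1:
            acc = loa_add(acc, activation << i)
    return acc


if __name__ == "__main__":
    # Exact product is 25 * 9 = 225; the LOA-accumulated result approximates it.
    print(bit_serial_mul(25, 9))
```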


Keywords

References

  1. C. W. Cho, G. Y. Lee, "Low power for deep learning hardware accelerators Bit-Serial Multiplier based Processing Element," IKEEE Conference, 2020.
  2. C. W. Cho, G. Y. Lee, "Bit-Serial multiplier based Neural Processing Element with Approximate adder tree," International SoC Design Conference (ISOCC), 2020.
  3. Mahdiani, Hamid Reza, et al. "Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications," IEEE Transactions on Circuits and Systems I: Regular Papers, Vol.57, No.4, pp.850-862, 2009. DOI: 10.1109/TCSI.2009.2027626
  4. Abdelouahab, Kamel, Maxime Pelcat, and Francois Berry. "The challenge of multi-operand adders in CNNs on FPGAs: how not to solve it!," Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation. pp.157-160, 2018. DOI: 10.1145/3229631.3235024
  5. Chen, Tianshi, et al. "DianNao: A small-footprint high-throughput accelerator for ubiquitous machine learning," ACM SIGARCH Computer Architecture News, Vol.42, No.1, pp.269-284, 2014. DOI: 10.1145/2541940.2541967
  6. Chen, Yu-Hsin, et al. "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," IEEE Journal of Solid-State Circuits, Vol.52, No.1, pp.127-138, 2016. DOI: 10.1109/JSSC.2016.2616357
  7. Jouppi, Norman P., et al. "In-datacenter performance analysis of a tensor processing unit," Proceedings of the 44th Annual International Symposium on Computer Architecture, Vol.45, No.2, 2017. DOI: 10.1145/3140659.3080246
  8. Lee, Jinmook, et al. "UNPU: A 50.6 TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision," 2018 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, 2018. DOI: 10.1109/ISSCC.2018.8310262
  9. Abdelouahab, Kamel, Maxime Pelcat, and Francois Berry. "The challenge of multi-operand adders in CNNs on FPGAs: how not to solve it!," Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation. pp.157-160, 2018. DOI: 10.1145/3229631.3235024
  10. Park, Hyunbin, Dohyun Kim, and Shiho Kim. "Digital Neuron: A Hardware Inference Accelerator for Convolutional Deep Neural Networks," arXiv preprint arXiv:1812.07517, 2018.
  11. Sharma, Hardik, et al. "Bit fusion: Bit-level dynamically composable architecture for accelerating deep neural network," 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). IEEE, 2018. DOI: 10.1109/ISCA.2018.00069
  12. Alwani, Manoj, et al. "Fused-layer CNN accelerators," 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2016. DOI: 10.5555/3195638.3195664

Cited by

  1. Low-area DNN Core using a data reuse technique, vol.25, pp.1, 2020, https://doi.org/10.7471/ikeee.2021.25.1.229