
Implementation of low power BSPE Core for deep learning hardware accelerators


  • Received : 2020.09.18
  • Accepted : 2020.09.28
  • Published : 2020.09.30

Abstract

In this paper, the proposed BSPE replaces the conventional multiplication algorithm, which consumes a large amount of power. Hardware resources are reduced by using a bit-serial multiplier, and data in a variable integer format is used to reduce memory usage. In addition, the resource and power usage of the MOA (Multi-Operand Adder) that accumulates the partial sums is reduced by applying LOA (Lower-part OR Approximation). As a result, compared with the existing MBS (Multiplication by Barrel Shifter) approach, hardware resources are reduced by 44% and power consumption by 42%. We also propose a hardware architecture design for the BSPE Core.
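The sketch below is a minimal behavioural illustration (not the paper's RTL) of the two ideas the abstract combines: bit-serial multiplication, where the weight is consumed one bit at a time and the activation is shifted instead of using a full multiplier, and LOA applied to the adder that accumulates the partial sums. The bit widths and the split point `k` are assumptions chosen only for illustration.

```python
def loa_add(a: int, b: int, width: int = 24, k: int = 4) -> int:
    """Approximate a + b with LOA: the low k bits are bitwise-ORed
    (carry-free); the upper bits use an exact adder whose carry-in is the
    AND of the MSBs of the two lower parts, following Mahdiani et al. [3].
    `width` and `k` are illustrative assumptions, not the paper's values."""
    mask_lo = (1 << k) - 1
    a_lo, b_lo = a & mask_lo, b & mask_lo
    lo = a_lo | b_lo                                   # carry-free lower part
    carry_in = (a_lo >> (k - 1)) & (b_lo >> (k - 1)) & 1
    hi = ((a >> k) + (b >> k) + carry_in) & ((1 << (width - k)) - 1)
    return (hi << k) | lo


def bit_serial_mul(activation: int, weight: int, weight_bits: int = 8) -> int:
    """Bit-serial multiply: scan the weight LSB-first; whenever the current
    bit is 1, add the activation shifted by the bit position. Partial sums
    are accumulated with the approximate LOA adder."""
    acc = 0
    for i in range(weight_bits):
        if (weight >> i) & 1:
            acc = loa_add(acc, activation << i)
    return acc


if __name__ == "__main__":
    # Exact product is 25 * 9 = 225; the LOA-accumulated result approximates it.
    print(bit_serial_mul(25, 9))
```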


Keywords

References

  1. C. W. Cho, G. Y. Lee, "Low power for deep learning hardware accelerators Bit-Serial Multiplier based Processing Element," IKEEE Conference, 2020.
  2. C. W. Cho, G. Y. Lee, "Bit-Serial multiplier based Neural Processing Element with Approximate adder tree," International SoC Design Conference (ISOCC), 2020.
  3. Mahdiani, Hamid Reza, et al. "Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications," IEEE Transactions on Circuits and Systems I: Regular Papers, Vol.57, No.4, pp.850-862, 2009. DOI: 10.1109/TCSI.2009.2027626
  4. Abdelouahab, Kamel, Maxime Pelcat, and Francois Berry. "The challenge of multi-operand adders in CNNs on FPGAs: how not to solve it!," Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation. pp.157-160, 2018. DOI: 10.1145/3229631.3235024
  5. Chen, Tianshi, et al. "DianNao: A small-footprint high-throughput accelerator for ubiquitous machine learning," ACM SIGARCH Computer Architecture News, Vol.42, No.1, pp.269-284, 2014. DOI: 10.1145/2541940.2541967
  6. Chen, Yu-Hsin, et al. "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," IEEE Journal of Solid-State Circuits, Vol.52, No.1, pp.127-138, 2016. DOI: 10.1109/JSSC.2016.2616357
  7. Jouppi, Norman P., et al. "In-datacenter performance analysis of a tensor processing unit," Proceedings of the 44th Annual International Symposium on Computer Architecture, Vol.45, No.2, 2017. DOI: 10.1145/3140659.3080246
  8. Lee, Jinmook, et al. "UNPU: A 50.6 TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision," 2018 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, 2018. DOI: 10.1109/ISSCC.2018.8310262
  9. Abdelouahab, Kamel, Maxime Pelcat, and Francois Berry. "The challenge of multi-operand adders in CNNs on FPGAs: how not to solve it!," Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation. pp.157-160, 2018. DOI: 10.1145/3229631.3235024
  10. Park, Hyunbin, Dohyun Kim, and Shiho Kim. "Digital Neuron: A Hardware Inference Accelerator for Convolutional Deep Neural Networks," arXiv preprint arXiv:1812.07517, 2018.
  11. Sharma, Hardik, et al. "Bit fusion: Bit-level dynamically composable architecture for accelerating deep neural network," 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). IEEE, 2018. DOI: 10.1109/ISCA.2018.00069
  12. Alwani, Manoj, et al. "Fused-layer CNN accelerators," 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2016. DOI: 10.5555/3195638.3195664

Cited by

  1. Low-area DNN Core using a data reuse technique, vol.25, pp.1, 2020, https://doi.org/10.7471/ikeee.2021.25.1.229