Abstract
Quantization techniques are widely used in CNN inference to reduce hardware cost at the expense of a small accuracy loss. However, fixed-point quantized CNN weights still incur a multiplication cost after quantization. We therefore introduce a novel CNN quantization technique that can be implemented without any multiplier. We evaluated the technique using the VGG-16 and AlexNet networks on the Tiny ImageNet dataset. For 8-bit CNN weights, it causes accuracy losses of 0.39% and 0.98% compared to floating-point implementations of VGG-16 and AlexNet, respectively. We then introduce a fine-tuning method for our quantization, which reduces the accuracy losses of 8-bit quantized VGG-16 and AlexNet to 0.24% and 0.39%, respectively. Two processing element (PE) architectures, neither of which includes multiplier hardware, are designed to perform the multiply-accumulate (MAC) operations of CNN models quantized by our technique. Two systolic array prototypes employing these PE architectures are designed and compared with a traditional fixed-point MAC implementation. The systolic arrays containing our PE designs reduce power consumption by up to 14.2% and 21.6%.
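The abstract does not spell out the quantization scheme, so the following is only a minimal sketch of one common way to build a multiplier-free MAC. It assumes each weight is rounded to a signed power of two, which lets the product be computed with a bit shift. This may differ from the technique actually proposed in the paper. The names `quantize_pow2`, `shift_mac`, `shift_dot`, and the parameter `frac_bits` are hypothetical and chosen for illustration.

```python
import math


def quantize_pow2(w, min_exp=-7, max_exp=0):
    """Round a real weight to sign * 2**exp (assumed scheme, not the paper's).

    Returns (sign, exp); sign is 0 when the weight is flushed to zero.
    Rounding is done in the log2 domain.
    """
    if abs(w) < 2.0 ** (min_exp - 1):
        return (0, 0)  # too small to represent: treat as zero
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))  # clamp to the representable range
    return (sign, exp)


def shift_mac(acc, activation, sign, exp, frac_bits=7):
    """Accumulate activation * sign * 2**exp using only shift and add/subtract.

    The accumulator is fixed-point with `frac_bits` fractional bits, so the
    shift amount exp + frac_bits is non-negative when exp >= -frac_bits.
    """
    if sign == 0:
        return acc
    term = activation << (exp + frac_bits)
    return acc + term if sign > 0 else acc - term


def shift_dot(activations, weights, frac_bits=7):
    """Dot product of integer activations and real weights without multiplying."""
    acc = 0
    for a, w in zip(activations, weights):
        sign, exp = quantize_pow2(w, min_exp=-frac_bits, max_exp=0)
        acc = shift_mac(acc, a, sign, exp, frac_bits)
    return acc


# Usage: 12*0.5 + 8*(-0.25) + 3*0 = 4.0, i.e. 4.0 * 2**7 in accumulator units
acc = shift_dot([12, 8, 3], [0.5, -0.25, 0.0])
result = acc / 2 ** 7
```

In this sketch the weights are exact powers of two, so the shift-based result matches the floating-point dot product. For general weights the rounding introduces an error, which is the kind of accuracy loss a fine-tuning step would aim to recover.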
Original language | English |
---|---|
Title of host publication | 2021 24th Euromicro Conference on Digital System Design (DSD) |
Subtitle of host publication | Proceedings |
Editors | L. O'Conner |
Place of Publication | Piscataway |
Publisher | IEEE |
Pages | 18-23 |
Number of pages | 6 |
ISBN (Electronic) | 978-1-6654-2703-6 |
ISBN (Print) | 978-1-6654-2704-3 |
Publication status | Published - 2021 |
Event | 2021 24th Euromicro Conference on Digital System Design (DSD) - Virtual at Palermo, Italy; Duration: 1 Sept 2021 → 3 Sept 2021 |
Conference
Conference | 2021 24th Euromicro Conference on Digital System Design (DSD) |
---|---|
Abbreviated title | DSD 2021 |
Country/Territory | Italy |
City | Virtual at Palermo |
Period | 1/09/21 → 3/09/21 |
Keywords
- Quantization
- deep learning
- hardware implementation
- low power
- ASIC