Abstract
In this paper we present a high-performance implementation of Convolutional Neural Network (CNN) inference on the latest generation of Dataflow Engines (DFEs). We describe the architectural choices made during the design phase, taking into account the properties of the DFE chip. We then perform a design space exploration, considering the memory bandwidth and resource utilisation constraints derived from the target DFE and the chosen architecture. Finally, we discuss the high-performance implementation and compare the achieved performance against other implementations, showing that our proposed design reaches 2,450 GOPS when running VGG16 as a test case.
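As a rough sanity check (not taken from the paper), the 2,450 GOPS figure can be put in context by counting the operations in VGG16's convolutional layers, which dominate its compute. The layer shapes below are the standard VGG16 configuration; counting one multiply-accumulate (MAC) as 2 operations is an assumption about how GOPS is measured here.

```python
# Hypothetical sketch: estimate VGG16 conv-layer operations per image,
# assuming the common 2-ops-per-MAC counting convention.

# (out_channels, in_channels, output_height, output_width) per 3x3 conv layer
VGG16_CONV_LAYERS = [
    (64, 3, 224, 224), (64, 64, 224, 224),
    (128, 64, 112, 112), (128, 128, 112, 112),
    (256, 128, 56, 56), (256, 256, 56, 56), (256, 256, 56, 56),
    (512, 256, 28, 28), (512, 512, 28, 28), (512, 512, 28, 28),
    (512, 512, 14, 14), (512, 512, 14, 14), (512, 512, 14, 14),
]

def vgg16_conv_ops(kernel=3):
    # MACs = out_ch * in_ch * out_h * out_w * k * k for each layer
    macs = sum(co * ci * h * w * kernel * kernel
               for co, ci, h, w in VGG16_CONV_LAYERS)
    return 2 * macs  # 1 MAC = 1 multiply + 1 add

total_ops = vgg16_conv_ops()
print(f"VGG16 conv ops per image: {total_ops / 1e9:.1f} GOP")   # ~30.7 GOP
print(f"Time at 2,450 GOPS: {total_ops / 2450e9 * 1e3:.1f} ms")  # ~12.5 ms
```

Under these assumptions, sustaining 2,450 GOPS would correspond to roughly 80 VGG16 conv passes per second; fully connected layers add only about 0.25 GOP more per image.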
Original language | English |
---|---|
Title of host publication | Proceedings - 35th IEEE International Conference on Computer Design, ICCD 2017 |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 435-438 |
Number of pages | 4 |
ISBN (Electronic) | 9781538622544 |
DOIs | |
Publication status | Published - 22 Nov 2017 |
Externally published | Yes |
Event | 35th IEEE International Conference on Computer Design, ICCD 2017 - Boston, United States |
Duration | 5 Nov 2017 → 8 Nov 2017 |
Conference
Conference | 35th IEEE International Conference on Computer Design, ICCD 2017 |
---|---|
Country/Territory | United States |
City | Boston |
Period | 5/11/17 → 8/11/17 |
Keywords
- CNN
- Deep Learning
- DFE
- DSE
- FPGA
- Inference