TY - GEN
T1 - Evolutionary bin packing for memory-efficient dataflow inference acceleration on FPGA
AU - Kroes, Mairin
AU - Petrica, Lucian
AU - Cotofana, Sorin
AU - Blott, Michaela
PY - 2020
Y1 - 2020
N2 - Convolutional Neural Network (CNN) dataflow inference accelerators implemented in Field-Programmable Gate Arrays (FPGAs) have demonstrated increased energy efficiency and lower latency compared to CNN execution on CPUs or GPUs. However, the complex shapes of CNN parameter memories do not typically map well to FPGA On-Chip Memories (OCM), which results in poor OCM utilization and ultimately limits the size and types of CNNs that can be effectively accelerated on FPGAs. In this work, we present a design methodology that improves the mapping efficiency of CNN parameters to FPGA OCM. We frame the mapping as a bin packing problem and determine that traditional bin packing algorithms are not well suited to solving it within FPGA- and CNN-specific constraints. We hybridize genetic algorithms and simulated annealing with traditional bin packing heuristics to create flexible mappers capable of grouping parameter memories such that each group optimally fits the FPGA OCM. We evaluate these algorithms on a variety of FPGA inference accelerators. Our hybrid mappers converge to optimal solutions in a matter of seconds for all CNN use cases, achieve an increase of up to 65% in OCM utilization efficiency for deep CNNs, and are up to 200× faster than current state-of-the-art simulated annealing approaches.
UR - http://www.scopus.com/inward/record.url?scp=85091785319&partnerID=8YFLogxK
U2 - 10.1145/3377930.3389808
DO - 10.1145/3377930.3389808
M3 - Conference contribution
T3 - GECCO 2020 - Proceedings of the 2020 Genetic and Evolutionary Computation Conference
SP - 1125
EP - 1133
BT - GECCO 2020 - Proceedings of the 2020 Genetic and Evolutionary Computation Conference
PB - Association for Computing Machinery (ACM)
T2 - 2020 Genetic and Evolutionary Computation Conference, GECCO 2020
Y2 - 8 July 2020 through 12 July 2020
ER -
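
Note: as a rough illustration of the approach summarized in the abstract above, the sketch below evolves the order in which parameter memories are handed to a first-fit packing heuristic, so a genetic algorithm searches permutations while the heuristic builds the actual OCM groups. This is a hypothetical Python sketch, not the authors' mapper: the bin capacity, memory sizes, and all function names are illustrative assumptions, and the paper's FPGA- and CNN-specific constraints (memory shapes, port widths) are omitted.

import random

BIN_CAPACITY = 36 * 1024  # stand-in for one OCM block's capacity in bits (assumed)
random.seed(0)
MEMORIES = [random.randint(2048, 30000) for _ in range(40)]  # fake parameter memory sizes

def first_fit(order):
    # Pack memories, visited in the given order, into the first bin they fit.
    bins = []
    for idx in order:
        size = MEMORIES[idx]
        for b in bins:
            if b["used"] + size <= BIN_CAPACITY:
                b["used"] += size
                b["items"].append(idx)
                break
        else:
            bins.append({"used": size, "items": [idx]})
    return bins

def fitness(order):
    # Minimize bin count; break ties by preferring fuller bins (less waste).
    bins = first_fit(order)
    waste = sum(BIN_CAPACITY - b["used"] for b in bins)
    return len(bins) + waste / (len(bins) * BIN_CAPACITY)

def crossover(p1, p2):
    # Order crossover (OX): keep a slice of p1, fill the rest in p2's order.
    a, b = sorted(random.sample(range(len(p1)), 2))
    hole = set(p1[a:b])
    rest = [g for g in p2 if g not in hole]
    return rest[:a] + p1[a:b] + rest[a:]

def mutate(order, rate=0.2):
    # Occasionally swap two positions to keep the search from stalling.
    order = order[:]
    if random.random() < rate:
        i, j = random.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order

def evolve(pop_size=30, generations=200):
    pop = [random.sample(range(len(MEMORIES)), len(MEMORIES)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]  # keep the best half
        kids = [mutate(crossover(*random.sample(elite, 2))) for _ in elite]
        pop = elite + kids
    return min(pop, key=fitness)

best = evolve()
print(f"packed {len(MEMORIES)} memories into {len(first_fit(best))} bins")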