Abstract
We propose a generalized method for adapting and optimizing algorithms for efficient execution on modern graphics processing units (GPUs). The method consists of three steps. First, build a control flow graph (CFG) of the algorithm. Next, transform the CFG into a tree of loops and merge non-parallelizable loops into parallelizable ones. Finally, map the resulting loop tree onto the tree of GPU computational units, unrolling the algorithm's loops where necessary to make the two trees match. The method provides a convenient and robust mental framework and strategy for GPU code optimization. We demonstrate the method by adapting a backtracking search algorithm to the GPU platform and by building an optimized implementation of the ResNeXt-50 neural network.
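The last two steps of the method can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the authors' implementation: the `Loop` class, the `merge_serial` and `map_to_gpu` helpers, the three-level `GPU_LEVELS` hierarchy, and the convolution-like loop nest are all hypothetical names and data introduced for illustration. The sketch assumes the loop tree has already been extracted from the CFG, interprets "merging" as folding a non-parallelizable loop into the body of its nearest parallelizable ancestor, and leaves out loop unrolling.

```python
from dataclasses import dataclass, field

# Assumed GPU hierarchy, outermost unit first (illustrative only).
GPU_LEVELS = ["grid", "block", "thread"]


@dataclass
class Loop:
    """One node of the loop tree (hypothetical structure)."""
    name: str
    trip_count: int
    parallel: bool
    children: list["Loop"] = field(default_factory=list)
    # Names of serial loops folded into this loop's body.
    serial_body: list[str] = field(default_factory=list)


def all_names(loop: Loop) -> list[str]:
    """Return the names of a loop and all loops nested inside it."""
    names = [loop.name]
    for child in loop.children:
        names.extend(all_names(child))
    return names


def merge_serial(loop: Loop) -> Loop:
    """Fold each non-parallelizable subtree into its nearest parallel ancestor.

    Assumes the root loop itself is parallelizable.
    """
    kept, folded = [], list(loop.serial_body)
    for child in loop.children:
        if child.parallel:
            kept.append(merge_serial(child))
        else:
            # The serial loop, and everything nested in it, runs
            # sequentially inside one iteration of this parallel loop.
            folded.extend(all_names(child))
    return Loop(loop.name, loop.trip_count, loop.parallel, kept, folded)


def map_to_gpu(loop: Loop, depth: int = 0, mapping=None) -> dict:
    """Assign each remaining loop to a GPU level by its depth in the tree."""
    mapping = {} if mapping is None else mapping
    # Loops deeper than the hierarchy share the innermost level.
    mapping[loop.name] = GPU_LEVELS[min(depth, len(GPU_LEVELS) - 1)]
    for child in loop.children:
        map_to_gpu(child, depth + 1, mapping)
    return mapping


# Hypothetical convolution-like nest: three parallel loops around a reduction.
nest = Loop("batch", 8, True, [
    Loop("out_channel", 64, True, [
        Loop("pixel", 196, True, [
            Loop("in_channel", 32, False),
        ]),
    ]),
])

merged = merge_serial(nest)
mapping = map_to_gpu(merged)
print(mapping)
```

In this sketch the `in_channel` reduction ends up in the `serial_body` of the `pixel` loop, so each GPU thread would run it sequentially, while the three parallel loops are assigned to the grid, block, and thread levels in order of nesting depth.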
Original language | English |
---|---|
Title of host publication | Optical Design and Testing XI |
Editors | Yongtian Wang, Tina E. Kidger, Osamu Matoba, Rengmao Wu |
Publisher | SPIE |
ISBN (Electronic) | 9781510646391 |
Publication status | Published - 2021 |
Event | Optical Design and Testing XI 2021 - Nantong, China; Duration: 10 Oct 2021 → 12 Oct 2021 |
Publication series
Name | Proceedings of SPIE - The International Society for Optical Engineering |
---|---|
Volume | 11895 |
ISSN (Print) | 0277-786X |
ISSN (Electronic) | 1996-756X |
Conference
Conference | Optical Design and Testing XI 2021 |
---|---|
Country/Territory | China |
City | Nantong |
Period | 10/10/21 → 12/10/21 |
Keywords
- control flow graph
- DPLL
- GPU
- loop optimization
- ResNet
- SIMD