A Fast Tridiagonal Solver for Intel MIC Architecture

Xinliang Wang, Wei Xue, Jidong Zhai, Yangtong Xu, Weimin Zheng, Hai Xiang Lin

Research output: Chapter in Book/Conference proceedings/Edited volumeConference contributionScientificpeer-review

10 Citations (Scopus)


The tridiagonal solver is an important kernel in many scientific and engineering applications. Although quite a few parallel algorithms have been exploited recently, challenges still remain when solving tridiagonal systems on many-core architectures. In this paper, quantitative analysis is conducted to guide the selection of algorithms on different architectures, and a fast register-PCR-pThomas algorithm for Intel MIC architecture is presented to achieve the best utilization of in-core vectorization and registers. By choosing the most suitable number of PCR reductions, we further propose an improved algorithm, named register-PCR-half-pThomas algorithm, which minimizes the computational cost and the number of registers for use. The optimizations of manual prefetching and strength reduction are also discussed for the new algorithm. Evaluation on Intel Xeon Phi 7110P shows that our register-PCR-pThomas can outperform the MKL solver by 7.7 times in single precision and 3.4 times in double precision. Moreover, our optimized register-PCR-half-pThomas can further boost the performance by another 43.9% in single precision and 20.4% in double precision. Our best algorithm can outperform the latest official GPU library (cuSPARSE) on Nvidia K40 by 3.7 times and 2.4 times in single and double precision, respectively.
Original languageEnglish
Title of host publication2016 IEEE 30th International Parallel and Distributed Processing Symposium (IPDPS)
EditorsJeff Hollingsworth
Place of PublicationLos Alamitos
Number of pages10
ISBN (Print) 978-1-5090-2140-6
Publication statusPublished - 2016
Event30th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2016 - Chicago, United States
Duration: 23 May 201627 May 2016


Conference30th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2016
Abbreviated titleIPDPM
Country/TerritoryUnited States


  • Tridiagonal solver
  • Intel MIC architecture
  • Vectorization


Dive into the research topics of 'A Fast Tridiagonal Solver for Intel MIC Architecture'. Together they form a unique fingerprint.

Cite this