Fine-Grain Parallel RRB-Solver for 5-/9-Point Stencil Problems Suitable for GPU-Type Processors

Martijn de Jong, Auke van der Ploeg, Auke Ditzel, Kees Vuik

Research output: Contribution to journalArticleScientificpeer-review


Preconditioners based on incomplete factorization are very popular for a fast convergence of the PCG-algorithm. However, these preconditioners are hard to parallelize since most operations are inherently sequential. In this paper we present the RRB-solver, which is a PCG-type solver using an incomplete Cholesky factorization based on the Repeated Red-Black (RRB) method. The RRB-solver scales nearly as well as Multigrid, and in this paper we show that this method can be parallelized very efficiently on modern computing architectures including GPUs. For an efficient parallel implementation a clever storage scheme turns out to be the key. The storage scheme is called r1=r2=b1=b2 and it ensures that memory transfers are coalesced throughout the algorithm, yielding near-optimal performance of the RRB-solver. The r1=r2=b1=b2-storage scheme in combination with a CUDA implementation on the GPU gives speedup factors of more than 30 compared to a sequential implementation on one CPU core for 5-/9-point stencils problems. A comparison with algebraic Multigrid further shows that the RRB-solver can be implemented very efficiently on the GPU.
Original languageEnglish
Pages (from-to)375-393
Number of pages19
JournalElectronic Transactions on Numerical Analysis
Publication statusPublished - 2017


  • repeated red-black
  • conjugate gradient
  • incomplete Cholesky
  • parallelization
  • CUDA
  • 2D Poisson


Dive into the research topics of 'Fine-Grain Parallel RRB-Solver for 5-/9-Point Stencil Problems Suitable for GPU-Type Processors'. Together they form a unique fingerprint.

Cite this