An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs

Ichitaro Yamazaki, Alexander Heinlein, Sivasankaran Rajamanickam

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

The generalized Dryja–Smith–Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse space. When used to accelerate the convergence of Krylov subspace iterative methods, the GDSW preconditioner provides robustness and scalability for the solution of sparse linear systems arising from the discretization of a wide range of partial differential equations. In this paper, we present FROSch (Fast and Robust Schwarz), a domain decomposition solver package which implements GDSW-type preconditioners for both CPU and GPU clusters. To improve the solver performance on GPUs, we use a novel decomposition to run multiple MPI processes on each GPU, reducing both the computational and storage costs of the solver and potentially improving the convergence rate. This allowed us to obtain competitive or faster performance using GPUs compared to using CPUs alone. We demonstrate the performance of FROSch on the Summit supercomputer with NVIDIA V100 GPUs, where we used NVIDIA Multi-Process Service (MPS) to implement our decomposition strategy. The solver has a wide variety of algorithmic and implementation choices, which poses both opportunities and challenges for its GPU implementation. We conduct a thorough experimental study with different solver options, including the exact or inexact solution of the local overlapping subdomain problems on a GPU. We also discuss the effect of using the iterative variant of the incomplete LU factorization and sparse-triangular solve as the approximate local solver, and of using lower precision for computing the whole FROSch preconditioner. Overall, the solve time was reduced by a factor of about 2× using GPUs, while the GPU acceleration of the numerical setup time depends on the solver options and the local matrix sizes.
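To make the two-level structure described in the abstract concrete, the following is a minimal NumPy sketch of an additive two-level overlapping Schwarz preconditioner, M^{-1} = Phi A_0^{-1} Phi^T + sum_i R_i^T A_i^{-1} R_i, applied to a small 1D Poisson matrix. It is an illustration only and is not FROSch code: the function names are hypothetical, the matrices are dense, the local solves are exact, and the coarse basis Phi is a simple piecewise-linear hat-function space standing in for the energy-minimizing GDSW coarse space.

```python
import numpy as np

def laplacian_1d(n):
    """Dense 1D Poisson matrix (Dirichlet); a small SPD test problem (assumption)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def overlapping_subdomains(n, num_sub, overlap):
    """Contiguous index sets, each extended by `overlap` points on both sides."""
    bounds = np.linspace(0, n, num_sub + 1).astype(int)
    return [np.arange(max(lo - overlap, 0), min(hi + overlap, n))
            for lo, hi in zip(bounds[:-1], bounds[1:])]

def hat_coarse_basis(n, num_sub):
    """Piecewise-linear hat functions on a coarse grid.

    This is a stand-in coarse space, NOT the GDSW energy-minimizing space.
    """
    x = np.arange(1, n + 1) / (n + 1)          # fine interior nodes in (0, 1)
    nodes = np.arange(1, num_sub) / num_sub    # coarse interior nodes
    H = 1.0 / num_sub                          # coarse mesh width
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - nodes[None, :]) / H)

def two_level_schwarz(A, subdomains, Phi):
    """Return r -> Phi A0^{-1} Phi^T r + sum_i R_i^T A_i^{-1} R_i r."""
    A0_inv = np.linalg.inv(Phi.T @ A @ Phi)    # coarse (second-level) operator
    local_inv = [np.linalg.inv(A[np.ix_(idx, idx)]) for idx in subdomains]

    def apply(r):
        z = Phi @ (A0_inv @ (Phi.T @ r))       # coarse correction
        for idx, Ai_inv in zip(subdomains, local_inv):
            z[idx] += Ai_inv @ r[idx]          # exact local subdomain solves
        return z

    return apply

def preconditioned_condition_number(A, apply):
    """Condition number of M^{-1} A from its (real, positive) eigenvalues."""
    n = A.shape[0]
    M_inv = np.column_stack([apply(e) for e in np.eye(n)])
    eigs = np.linalg.eigvals(M_inv @ A).real
    return eigs.max() / eigs.min()

if __name__ == "__main__":
    n, num_sub, overlap = 64, 8, 2
    A = laplacian_1d(n)
    apply = two_level_schwarz(A, overlapping_subdomains(n, num_sub, overlap),
                              hat_coarse_basis(n, num_sub))
    print("cond(A)        =", np.linalg.cond(A))
    print("cond(M^-1 A)   =", preconditioned_condition_number(A, apply))
```

In a Krylov solver the `apply` function would be called once per iteration. The inexact variants mentioned in the abstract would correspond to replacing the exact local inverses with an approximate solve such as an incomplete LU factorization.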
Original language: English
Title of host publication: Proceedings of the 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Editors: L. O'Conner
Place of Publication: Piscataway
Publisher: IEEE
Pages: 680-689
Number of pages: 10
ISBN (Electronic): 979-8-3503-3766-2
ISBN (Print): 979-8-3503-3767-9
DOIs
Publication status: Published - 2023
Event: 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) - St. Petersburg, United States
Duration: 15 May 2023 to 19 May 2023

Publication series

Name: Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023

Conference

Conference: 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Country/Territory: United States
City: St. Petersburg
Period: 15/05/23 to 19/05/23

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

  • Linear systems
  • Distributed processing
  • Scalability
  • Software algorithms
  • Graphics processing units
  • Supercomputers
  • Software
