An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs

Ichitaro Yamazaki; Alexander Heinlein; Sivasankaran Rajamanickam

doi:10.1109/IPDPS54959.2023.00073


Numerical Analysis

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

The generalized Dryja–Smith–Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse space. When used to accelerate the convergence of Krylov subspace iterative methods, the GDSW preconditioner provides robustness and scalability for the solution of sparse linear systems arising from the discretization of a wide range of partial differential equations. In this paper, we present FROSch (Fast and Robust Schwarz), a domain decomposition solver package which implements GDSW-type preconditioners for both CPU and GPU clusters. To improve the solver performance on GPUs, we use a novel decomposition to run multiple MPI processes on each GPU, reducing both the solver’s computational and storage costs and potentially improving the convergence rate. This allowed us to obtain competitive or faster performance using GPUs compared to using CPUs alone. We demonstrate the performance of FROSch on the Summit supercomputer with NVIDIA V100 GPUs, where we used NVIDIA Multi-Process Service (MPS) to implement our decomposition strategy. The solver has a wide variety of algorithmic and implementation choices, which poses both opportunities and challenges for its GPU implementation. We conduct a thorough experimental study with different solver options including the exact or inexact solution of the local overlapping subdomain problems on a GPU. We also discuss the effect of using the iterative variant of the incomplete LU factorization and sparse-triangular solve as the approximate local solver, and using lower precision for computing the whole FROSch preconditioner. Overall, the solve time was reduced by a factor of about 2× using GPUs, while the GPU acceleration of the numerical setup time depends on the solver options and the local matrix sizes.
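To make the two-level structure described in the abstract concrete, the following is a minimal sketch, not FROSch's API or the authors' implementation. It assumes a dense 1D Poisson model problem in NumPy, exact local solves, and a GDSW-style coarse basis built from one function per interface node with an energy-minimizing extension into the interiors. All function names (`poisson_1d`, `build_schwarz`, `pcg`, `compare_levels`) and the problem sizes are hypothetical choices made for illustration.

```python
import numpy as np


def poisson_1d(n):
    """Dense 1D Poisson matrix tridiag(-1, 2, -1) with Dirichlet boundaries."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def build_schwarz(A, n_sub, overlap):
    """Return apply(r, two_level) for an additive Schwarz preconditioner."""
    n = A.shape[0]
    size = n // n_sub  # assumption: n is divisible by n_sub
    # Overlapping index sets and exact (dense) local inverses.
    subdomains = [
        np.arange(max(0, i * size - overlap), min(n, (i + 1) * size + overlap))
        for i in range(n_sub)
    ]
    local_inv = [np.linalg.inv(A[np.ix_(idx, idx)]) for idx in subdomains]

    # Interface nodes between the nonoverlapping subdomains.
    gamma = np.array([i * size for i in range(1, n_sub)])
    interior = np.setdiff1d(np.arange(n), gamma)
    # Coarse basis: one function per interface node, extended into the
    # interior by the energy-minimizing extension -A_II^{-1} A_IG.
    phi = np.zeros((n, gamma.size))
    phi[gamma, np.arange(gamma.size)] = 1.0
    phi[interior, :] = -np.linalg.solve(
        A[np.ix_(interior, interior)], A[np.ix_(interior, gamma)]
    )
    coarse_inv = np.linalg.inv(phi.T @ A @ phi)

    def apply(r, two_level=True):
        z = np.zeros_like(r)
        for idx, a_inv in zip(subdomains, local_inv):
            z[idx] += a_inv @ r[idx]  # first level: local subdomain solves
        if two_level:
            z += phi @ (coarse_inv @ (phi.T @ r))  # second level: coarse solve
        return z

    return apply


def pcg(A, b, apply_prec, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradients; returns (solution, iterations)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_prec(r)
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


def compare_levels(n=1024, n_sub=64, overlap=2):
    """Iteration counts of one-level and two-level Schwarz on the model problem."""
    A = poisson_1d(n)
    b = np.ones(n)
    prec = build_schwarz(A, n_sub, overlap)
    _, it_one = pcg(A, b, lambda r: prec(r, two_level=False))
    _, it_two = pcg(A, b, prec)
    return it_one, it_two
```

Calling `compare_levels()` returns the iteration counts of the one-level and two-level variants. On this model problem the coarse level is expected to lower the count, because it supplies the global coupling between subdomains that the local solves alone lack. The sketch says nothing about GPU execution, MPS, inexact local solvers, or reduced precision, which are the subject of the paper's experiments.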

Original language	English
Title of host publication	Proceedings of the 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Editors	L. O'Conner
Place of Publication	Piscataway
Publisher	IEEE
Pages	680-689
Number of pages	10
ISBN (Electronic)	979-8-3503-3766-2
ISBN (Print)	979-8-3503-3767-9
DOIs	https://doi.org/10.1109/IPDPS54959.2023.00073
Publication status	Published - 2023
Event	2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) - St. Petersburg, United States; Duration: 15 May 2023 → 19 May 2023

Publication series

Name	Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023

Conference

Conference	2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Country/Territory	United States
City	St. Petersburg
Period	15/05/23 → 19/05/23

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

Linear systems
Distributed processing
Scalability
Software algorithms
Graphics processing units
Supercomputers
Software

Access to Document

10.1109/IPDPS54959.2023.00073

An_Experimental_Study_of_Two-level_Schwarz_Domain-Decomposition_Preconditioners_on_GPUs (Final published version, 1.18 MB)

Cite this

Yamazaki, I., Heinlein, A., & Rajamanickam, S. (2023). An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs. In L. O'Conner (Ed.), Proceedings of the 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (pp. 680-689). (Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023). IEEE. https://doi.org/10.1109/IPDPS54959.2023.00073

Yamazaki, Ichitaro ; Heinlein, Alexander ; Rajamanickam, Sivasankaran. / An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs. Proceedings of the 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS). editor / L. O'Conner. Piscataway : IEEE, 2023. pp. 680-689 (Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023).

@inproceedings{3273e3dfae1c46afacc9ce1508e50b7d,

title = "An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs",

abstract = "The generalized Dryja–Smith–Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse space. When used to accelerate the convergence of Krylov subspace iterative methods, the GDSW preconditioner provides robustness and scalability for the solution of sparse linear systems arising from the discretization of a wide range of partial differential equations. In this paper, we present FROSch (Fast and Robust Schwarz), a domain decomposition solver package which implements GDSW-type preconditioners for both CPU and GPU clusters. To improve the solver performance on GPUs, we use a novel decomposition to run multiple MPI processes on each GPU, reducing both the solver{\textquoteright}s computational and storage costs and potentially improving the convergence rate. This allowed us to obtain competitive or faster performance using GPUs compared to using CPUs alone. We demonstrate the performance of FROSch on the Summit supercomputer with NVIDIA V100 GPUs, where we used NVIDIA Multi-Process Service (MPS) to implement our decomposition strategy. The solver has a wide variety of algorithmic and implementation choices, which poses both opportunities and challenges for its GPU implementation. We conduct a thorough experimental study with different solver options including the exact or inexact solution of the local overlapping subdomain problems on a GPU. We also discuss the effect of using the iterative variant of the incomplete LU factorization and sparse-triangular solve as the approximate local solver, and using lower precision for computing the whole FROSch preconditioner. Overall, the solve time was reduced by a factor of about 2× using GPUs, while the GPU acceleration of the numerical setup time depends on the solver options and the local matrix sizes.",

keywords = "Linear systems, Distributed processing, Scalability, Software algorithms, Graphics processing units, Supercomputers, Software",

author = "Ichitaro Yamazaki and Alexander Heinlein and Sivasankaran Rajamanickam",

note = "Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.; 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) ; Conference date: 15-05-2023 Through 19-05-2023",

year = "2023",

doi = "10.1109/IPDPS54959.2023.00073",

language = "English",

isbn = "979-8-3503-3767-9",

series = "Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023",

publisher = "IEEE",

pages = "680--689",

editor = "L. O'Conner",

booktitle = "Proceedings of the 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)",

address = "United States",

}

Yamazaki, I, Heinlein, A & Rajamanickam, S 2023, An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs. in L O'Conner (ed.), Proceedings of the 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023, IEEE, Piscataway, pp. 680-689, 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), St. Petersburg, United States, 15/05/23. https://doi.org/10.1109/IPDPS54959.2023.00073

An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs. / Yamazaki, Ichitaro ; Heinlein, Alexander; Rajamanickam, Sivasankaran.
Proceedings of the 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS). ed. / L. O'Conner. Piscataway: IEEE, 2023. p. 680-689 (Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023).


TY - GEN

T1 - An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs

AU - Yamazaki, Ichitaro

AU - Heinlein, Alexander

AU - Rajamanickam, Sivasankaran

N1 - Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

PY - 2023

Y1 - 2023


AB - The generalized Dryja–Smith–Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse space. When used to accelerate the convergence of Krylov subspace iterative methods, the GDSW preconditioner provides robustness and scalability for the solution of sparse linear systems arising from the discretization of a wide range of partial differential equations. In this paper, we present FROSch (Fast and Robust Schwarz), a domain decomposition solver package which implements GDSW-type preconditioners for both CPU and GPU clusters. To improve the solver performance on GPUs, we use a novel decomposition to run multiple MPI processes on each GPU, reducing both the solver’s computational and storage costs and potentially improving the convergence rate. This allowed us to obtain competitive or faster performance using GPUs compared to using CPUs alone. We demonstrate the performance of FROSch on the Summit supercomputer with NVIDIA V100 GPUs, where we used NVIDIA Multi-Process Service (MPS) to implement our decomposition strategy. The solver has a wide variety of algorithmic and implementation choices, which poses both opportunities and challenges for its GPU implementation. We conduct a thorough experimental study with different solver options including the exact or inexact solution of the local overlapping subdomain problems on a GPU. We also discuss the effect of using the iterative variant of the incomplete LU factorization and sparse-triangular solve as the approximate local solver, and using lower precision for computing the whole FROSch preconditioner. Overall, the solve time was reduced by a factor of about 2× using GPUs, while the GPU acceleration of the numerical setup time depends on the solver options and the local matrix sizes.

KW - Linear systems

KW - Distributed processing

KW - Scalability

KW - Software algorithms

KW - Graphics processing units

KW - Supercomputers

KW - Software

UR - http://www.scopus.com/inward/record.url?scp=85166660928&partnerID=8YFLogxK

U2 - 10.1109/IPDPS54959.2023.00073

DO - 10.1109/IPDPS54959.2023.00073

M3 - Conference contribution

SN - 979-8-3503-3767-9

T3 - Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023

SP - 680

EP - 689

BT - Proceedings of the 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

A2 - O'Conner, L.

PB - IEEE

CY - Piscataway

T2 - 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Y2 - 15 May 2023 through 19 May 2023

ER -

Yamazaki I, Heinlein A, Rajamanickam S. An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs. In O'Conner L, editor, Proceedings of the 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Piscataway: IEEE. 2023. p. 680-689. (Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023). doi: 10.1109/IPDPS54959.2023.00073
