Abstract
Block variants of the Jacobi-Davidson method for computing a few eigenpairs of a large sparse matrix are known to improve the robustness of the standard algorithm when computing multiple or clustered eigenvalues. In practice, however, they are typically avoided because the total number of matrix-vector operations increases. In this paper we present the implementation of a block Jacobi-Davidson solver. Through detailed performance engineering and numerical experiments we demonstrate that the increase in operations is typically more than compensated for by performance gains from better cache usage on modern CPUs, resulting in a method that is both more efficient and more robust than its single-vector counterpart. Achieving a block speedup requires both kernel optimizations for sparse-matrix and block-vector operations and algorithmic choices that allow blocked operations in most parts of the computation. We discuss how synchronization can be avoided in the algorithm and show by numerical experiments with our hybrid-parallel implementation that blocking yields a significant speedup for a variety of matrices on up to 5120 CPU cores, provided at least about 20 eigenpairs are sought.
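The central kernel behind the cache-usage argument in the abstract is the multiplication of a sparse matrix with a block of vectors (SpMMV): each matrix entry streamed from memory is reused once per block column, so the memory traffic per matrix-vector product shrinks with the block size. The following is a minimal C sketch of this idea, assuming CRS matrix storage and row-major ("interleaved") block vectors; the struct and function names are hypothetical and do not correspond to the paper's actual kernels.

```c
#include <stddef.h>

/* Hypothetical CRS (compressed row storage) sparse matrix. */
typedef struct {
    size_t nrows;
    const size_t *rowptr;  /* nrows+1 row offsets into col/val */
    const size_t *col;     /* column index per nonzero */
    const double *val;     /* value per nonzero */
} crs_matrix;

/* Sparse matrix times a block of nb vectors (SpMMV): Y = A*X.
 * X and Y are stored row-major ("interleaved"): element (i, j)
 * of the block vector lives at X[i*nb + j]. Each matrix entry
 * loaded from memory is reused nb times, so the memory traffic
 * per matrix-vector product drops compared with nb separate
 * single-vector products -- the cache effect described above. */
void spmmv(const crs_matrix *A, const double *X, double *Y, size_t nb)
{
    for (size_t i = 0; i < A->nrows; ++i) {
        for (size_t j = 0; j < nb; ++j)
            Y[i * nb + j] = 0.0;
        for (size_t p = A->rowptr[i]; p < A->rowptr[i + 1]; ++p) {
            const double a = A->val[p];       /* loaded once ...    */
            const size_t k = A->col[p];
            for (size_t j = 0; j < nb; ++j)   /* ... reused nb times */
                Y[i * nb + j] += a * X[k * nb + j];
        }
    }
}
```

With this assumed layout, the nb block-vector values touched by one nonzero are contiguous in memory, so the inner loop over j can be vectorized; this kind of layout decision is one example of the kernel optimizations for block-vector operations that the abstract refers to.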
Field | Value |
---|---|
Original language | English |
Pages (from-to) | C697-C722 |
Journal | SIAM Journal on Scientific Computing |
Volume | 37 |
Issue number | 6 |
DOIs | |
Publication status | Published - 2015 |
Externally published | Yes |
Keywords
- Block methods
- High performance computing
- Hybrid parallel implementation
- Jacobi-Davidson
- Multicore processors
- Performance engineering
- Sparse eigenvalue problems