Center for Applied Scientific Computing

Lawrence Livermore National Laboratory

L-561

PO Box 808

Livermore, CA 94551

Abstract

Loosely stated, the "Holy Grail" of parallel linear solvers is a black box solver that is both algorithmically scalable, so that the number of operations required is O(n), and implementationally scalable, so that efficiency of the parallel implementation is constant in some scaled-speedup sense. In the sequential world, algebraic multigrid methods are black box solvers (in the sense that they take only the matrix as input, though in the sense that parameters are tuned for certain problem classes they are really "grey box" solvers) that achieve algorithmic scalability for a growing family of problems. Realistically, AMG methods will probably never be a true Holy Grail, but they can certainly prove an invaluable tool in a suite of powerful preconditioners for very difficult problems.

Meanwhile, general-purpose parallel linear algebra packages such as PETSc and Aztec are quite notably missing powerful global preconditioners. Domain-decomposition preconditioners, even with Schwarz overlapping, are the mainstays of these packages but suffer from lack of robustness and poor convergence. General purpose (as opposed to problem specific) global parallel preconditioners are sorely needed in this context.

In this talk we discuss work in progress on our parallel AMG implementation. We discuss the overall software design in terms of tools culled from existing linear algebra packages. We describe the interface to the routine in the context of the DOE Linear Solver Interface standardization effort and show that this routine will be easily usable within existing libraries and from applications. We show which key functionalities are not provided in existing packages and our strategies for providing these. We show that the only new routines required are in the setup phase of the algorithm, and include a new parallelized routine for coarse grid selection based in part on algorithms for finding maximal independent sets in a graph, as well as a parallelized routine for sparse matrix-matrix-matrix multiply that is required to form coarse grid operators. We present results from algorithmic studies comparing the quality of coarse grids created with the parallel algorithm versus the sequential algorithm, and preliminary results on the efficiency of some of the key kernels.