A computer's central processing unit (CPU) can perform a mathematical operation much faster than data can be transferred from main memory to the CPU, and this disparity in speed continues to grow each year. Thus, scientific codes do not attain the speeds that would be possible if CPU speed were the only factor influencing performance. The typical hardware solution is to place several layers of small, fast cache between the CPU and main memory. Cache hardware by itself cannot guarantee good scientific code performance; better algorithms (or restructured forms of standard ones) are necessary to ensure better utilization of the cache hierarchy.
In geometric multigrid, the solve time is typically dominated by the smoothing and residual steps. Thus, a speedup in these steps should result in a similar speedup in the entire multigrid code. We consider Gauss-Seidel smoothing in the context of using geometric multigrid to solve a two- or three-dimensional second-order elliptic partial differential equation on an unstructured grid.
We present a variant of the Gauss-Seidel method that keeps data in cache memory much longer than a non-cache-aware implementation. As a result, this method is faster than non-cache-aware implementations. The cache-aware variant returns a bitwise-identical answer to a standard Gauss-Seidel method on the same grid ordering. Thus, all convergence results that hold for multigrid with standard Gauss-Seidel hold for multigrid with cache-aware Gauss-Seidel.
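To make the bitwise-identical claim concrete, the sketch below shows a single Gauss-Seidel sweep over a sparse matrix in CSR storage. This is a minimal illustration, not the thesis implementation: the computed values depend only on the order in which unknowns are visited, so any cache-aware traversal that preserves this visiting order produces the same result bit for bit.

```python
def gauss_seidel_sweep(indptr, indices, data, b, x, order):
    """One Gauss-Seidel sweep over a CSR matrix, visiting unknowns in
    the given order. The updates use the most recent values of x, so
    the result is determined entirely by 'order': a cache-aware
    traversal that preserves it is bitwise identical."""
    for i in order:
        s = b[i]
        aii = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                aii = data[k]        # diagonal entry
            else:
                s -= data[k] * x[j]  # already-updated neighbors are used
        x[i] = s / aii
    return x
```

Applied repeatedly to a small symmetric positive definite system (e.g. the 1D Laplacian stencil), the sweep converges to the exact solution, matching the convergence theory the abstract refers to.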
The cache-aware Gauss-Seidel method relies on information from the underlying problem discretization as well as load-balancing ideas from parallel computing. The key step of the cache-aware method is an inexpensive one-time grid reordering. Upper bounds on the complexity of this reordering phase are derived for triangular, tetrahedral, quadrilateral, and hexahedral grids.
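The flavor of such a reordering can be sketched with a simple locality-preserving traversal of the grid's node adjacency graph. This is a hypothetical illustration only: the thesis method uses discretization information and load-balancing ideas, whereas here a plain breadth-first ordering is chunked into blocks small enough for each block's data to stay resident in cache.

```python
from collections import deque

def block_reorder(adjacency, block_size):
    """Group grid nodes into cache-sized blocks of nearby nodes.
    'adjacency[v]' lists the neighbors of node v. A breadth-first
    traversal keeps neighboring nodes close together in the new
    ordering, which is then sliced into blocks of 'block_size'."""
    n = len(adjacency)
    visited = [False] * n
    order = []
    for seed in range(n):          # handle disconnected components
        if visited[seed]:
            continue
        visited[seed] = True
        queue = deque([seed])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    # Slice the locality-preserving ordering into cache-sized blocks.
    return [order[i:i + block_size] for i in range(0, n, block_size)]
```

A smoother would then process one block at a time, so the unknowns and matrix rows of a block are loaded into cache once and reused across several updates instead of being streamed from main memory on every sweep.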
A multigrid implementation that uses the grid reordering techniques and the cache-aware Gauss-Seidel method is described. Code profiling statistics show that the cache-aware multigrid method makes better use of large cache memory than standard multigrid methods. Numerical experiments demonstrate that the cache-aware multigrid code is faster than non-cache-aware codes.