Almost Fully Utilized Caches for Iterative Methods

Craig C. Douglas

University of Kentucky
Departments of Computer Science, Mechanical Engineering, and
Center for Computational Sciences
325 McVey Hall - CCS
Lexington, KY 40506-0045, USA

Jonathan Hu

University of Kentucky
Department of Mathematics
715 Patterson Hall
Lexington, KY 40506-0027, USA

Wolfgang Karl
Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR-TUM)
Institut für Informatik
Technische Universität München
D-80290 München, Germany

Markus Kowarschik*
Lehrstuhl für Systemsimulation (IMMD 10)
Institut für Informatik
Universität Erlangen-Nürnberg
Martensstraße 3, D-91058 Erlangen, Germany

Ulrich Rüde
Lehrstuhl für Systemsimulation (IMMD 10)
Institut für Informatik
Universität Erlangen-Nürnberg
Martensstraße 3, D-91058 Erlangen, Germany

Christian Weiß
Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR-TUM)
Institut für Informatik
Technische Universität München
D-80290 München, Germany


Abstract

Conventional implementations of iterative numerical algorithms, especially multigrid methods, merely reach a disappointing small percentage of the theoretically available CPU performance when applied to representative large problems. One of the most important reasons for this phenomenon is that the current DRAM technology is not able to provide the data fast enougth to keep the CPU busy. Although the fundamentals of cache optimizations are quite simple, current compilers are not able to optimize even simple iterative methods.

We analyse the memory and cache behavior of iterative methods with extensive profiling and describe program transformation techniques to improve the cache performance of two and three dimensional multigrid algorithms. Examples are provided for both structured and unstructured grid problems.