Almost Fully Utilized Caches for Iterative Methods
Craig C. Douglas
University of Kentucky
Departments of Computer Science, Mechanical Engineering, and
Center for Computational Sciences
325 McVey Hall - CCS
Lexington, KY 40506-0045, USA
Jonathan Hu
University of Kentucky
Department of Mathematics
715 Patterson Hall
Lexington, KY 40506-0027, USA
Wolfgang Karl
Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR-TUM)
Institut für Informatik
Technische Universität München
D-80290 München, Germany
Markus Kowarschik*
Lehrstuhl für Systemsimulation (IMMD 10)
Institut für Informatik
Universität Erlangen-Nürnberg
Martensstraße 3, D-91058 Erlangen, Germany
Ulrich Rüde
Lehrstuhl für Systemsimulation (IMMD 10)
Institut für Informatik
Universität Erlangen-Nürnberg
Martensstraße 3, D-91058 Erlangen, Germany
Christian Weiß
Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR-TUM)
Institut für Informatik
Technische Universität München
D-80290 München, Germany
Abstract
Conventional implementations of iterative numerical algorithms, especially
multigrid methods, merely reach a disappointing small percentage of the
theoretically available CPU performance when applied to representative large
problems. One of the most important reasons for this phenomenon is that the
current DRAM technology is not able to provide the data fast enougth to keep
the CPU busy. Although the fundamentals of cache optimizations are quite
simple, current compilers are not able to optimize even simple iterative
methods.
We analyse the memory and cache behavior of iterative methods with extensive
profiling and describe program transformation techniques to improve the cache
performance of two and three dimensional multigrid algorithms. Examples are
provided for both structured and unstructured grid problems.