Considerations for Parallel CFD Enhancements on SGI ccNUMA and Cluster Architectures

Mark Kremenetsky

Supercomputer Applications, Silicon Graphics, Inc., M/S 41L-932, Mountain View, California 94043


The maturity of Computational Fluid Dynamics (CFD) methods and the increasing computational power of contemporary computers have enabled industry to incorporate CFD technology into several stages of the design process. As the application of CFD technology grows from component-level analysis to system-level analysis, the complexity and size of the models increase continuously. Successful simulation requires synergy between CAD, grid generation, and solvers.

The requirement for shorter design cycles has put severe limitations on the turnaround time of numerical simulations. The time required for (1) mesh generation for computational domains of complex geometry and (2) obtaining numerical solutions for flows with complex physics has traditionally been the pacing item for CFD applications. Unstructured grid generation techniques and parallel algorithms have been instrumental in making such calculations affordable. The availability of these algorithms in commercial packages has grown in the last few years, and parallel performance has become a very important factor in the selection of such methods for production work.

Although extensive research has been devoted to determining the optimum parallel paradigm, in practice the best parallel performance can be obtained only when algorithms and paradigms take into consideration the architectural design of the target computer system. This paper addresses issues related to the efficient performance of the commercial CFD software FLUENT, which is based on an algebraic multigrid (AMG) linear solver, on a cache-coherent Non-Uniform Memory Access (ccNUMA) architecture. Also presented are results from the implementation of FLUENT on clusters of workstations under both the Linux and SGI IRIX operating systems. Issues related to the performance of the message passing system and to memory-processor affinity are investigated with a view to efficient scalability of FLUENT when applied to a variety of industrial problems.