The maturity of Computational Fluid Dynamics (CFD) methods and the increasing computational power of contemporary computers have enabled industry to incorporate CFD technology into several stages of the design process. As the application of CFD technology grows from component-level analysis to system-level analysis, the complexity and size of models increase continuously. Successful simulation requires synergy between CAD, grid generation, and CFD solvers.
The requirement for shorter design cycles has put severe constraints on the turnaround time of numerical simulations. The time required for (1) mesh generation for computational domains of complex geometry and (2) obtaining numerical solutions for flows with complex physics has traditionally been the pacing item for CFD applications. Unstructured grid generation techniques and parallel algorithms have been instrumental in making such calculations affordable. The availability of these algorithms in commercial packages has grown in the last few years, and parallel performance has become a very important factor in the selection of such methods for production work.
Although extensive research has been devoted to determining the optimum parallel paradigm, in practice the best parallel performance can be obtained only when algorithms and paradigms take into consideration the architecture of the target computer system. This paper addresses issues related to the efficient performance of the commercial CFD software FLUENT on a cache-coherent Non-Uniform Memory Architecture (ccNUMA). Also presented are results from the implementation of FLUENT on a cluster of systems under both the Linux and SGI IRIX operating systems. Issues related to the performance of the message-passing system and to data placement are investigated for efficient scalability of FLUENT when applied to a variety of industrial problems.
Key words: computational fluid dynamics, FLUENT, parallel performance