Parallel Partitioning
26th April 2006, Cambridge Flow Solutions Ltd
1. Summary
This case illustrates the benefits to CFD engineers of being able to use existing parallel hardware to alleviate set-up bottlenecks for very large CFD cases in commercial flow solvers. Our efficient mesh partitioning, in parallel for parallel, can provide set-up times of tens of seconds for 100M+ cell cases, rather than tens of minutes, seamlessly integrated into the standard CFD process. Equally significantly, CFD computation time itself can also be improved by up to around 10%.
2. Introduction
As the ambition of CFD users continues to rise, along with their access to parallel hardware, new bottlenecks arise when configuring very large meshes for CFD analysis. Not least of these is partitioning, where a given model is distributed across parallel computers so that CFD analysis can take place efficiently. Here, we provide a fully parallel algorithm and implementation that delivers very swift partitioning, improves CFD run times, and reduces hardware requirements (e.g. the RAM overhead incurred by the usual serial approach).
Our tool provides:
- Seamless integration into standard Fluent set-up procedures. A standard Fluent file is read, partitioned in parallel, and written again in the same standard format to read straight into Fluent.
- Simple access to the full range of options provided by the Metis partitioning approach. This enables users to exploit this well-known algorithm to their best advantage – choosing for example to weight partitions based on mesh face distribution, to suit a face-based CFD solver in Fluent.
- Parallel operation as standard, across standard CFD hardware, using standard message-passing protocols (LAM/MPI).
- Partitioning for meshes up to 2 billion cells.
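The face-based weighting mentioned above can be illustrated with a small sketch. This is not the tool's algorithm and it does not call Metis: it assumes a hypothetical mesh described only by an ordered list of cell types, weights each cell by its number of faces, and cuts the list into contiguous chunks of near-equal weight. The names `face_weights`, `split_by_weight` and `part_loads` are invented for illustration.

```python
# Illustrative only: a contiguous weighted split, not Metis and not the tool's algorithm.

# Number of faces on each standard cell shape (geometric facts).
FACES_PER_CELL = {"tet": 4, "pyramid": 5, "prism": 5, "hex": 6}


def face_weights(cell_types):
    """Weight each cell by its face count, a proxy for face-based solver work."""
    return [FACES_PER_CELL[t] for t in cell_types]


def split_by_weight(weights, nparts):
    """Assign ordered cells to nparts contiguous chunks of near-equal total weight."""
    total = sum(weights)
    assignment = []
    running = 0  # cumulative weight of the cells already assigned
    for w in weights:
        assignment.append(min(nparts - 1, running * nparts // total))
        running += w
    return assignment


def part_loads(weights, assignment, nparts):
    """Sum the weight that lands on each partition."""
    loads = [0] * nparts
    for w, p in zip(weights, assignment):
        loads[p] += w
    return loads


# Hypothetical hybrid mesh: 600 hexahedra followed by 600 tetrahedra.
mesh = ["hex"] * 600 + ["tet"] * 600
faces = face_weights(mesh)

by_count = split_by_weight([1] * len(mesh), 2)  # equal cells per partition
by_faces = split_by_weight(faces, 2)            # equal face weight per partition

print(part_loads(faces, by_count, 2))
print(part_loads(faces, by_faces, 2))
```

With this toy mesh, splitting by cell count leaves face loads of 3600 and 2400, while splitting by face weight gives 3000 on each partition. A real graph partitioner would also minimise the communication between partitions, which this sketch ignores.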
3. Demonstration
The test-case used here is a 125M cell mesh, typical of the expectations of high-end Fluent users. Partitioning is demonstrated across different numbers of processors and compared against the standard serial implementation in Fluent; the objective is simply to exploit the parallel hardware to aid pre-processing. The times reported are for partitioning alone. Note, though, that our tool does not require the usual grid-building stage, which saves a further 5 minutes or so in the overall clock time to read, partition and write a file.
The specific test framework is as follows:
- Multiple 2GHz AMD-64 processors
- SuSE SLES-9 x86-64 O/S
- Gigabit Ethernet connection
- LAM/MPI implementation
The benefit to the tool of the additional processing power is quite clear. By the point at which the limitations of interconnect speed begin to bite, parallelisation has provided a gain of about 2 orders of magnitude in partitioning speed over the standard approach. The total parallel memory requirement is also about half that of the serial case, and is distributed over the parallel hardware rather than concentrated at a single serial CPU.

Sample test mesh for benchmarking

Partitioning applied to complex external aerodynamics
CFD computation times vary with partition configuration, as the mesh topology (tet/hex/hybrid) determines the specific behaviour within the flow solver. We have found that a simple face-based weighting approach in the partitioning gives a good gain for complex geometry with hybrid mesh, of order 5-10% for the size of meshes considered here. Other classes of mesh and flow physics can exploit different partitioning properties of the algorithm.
This graph shows the performance gain from our tool during actual flow solution in Fluent, for a typical hybrid mesh in a standard low-speed aerodynamic test case.
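One way to see why partition balance can translate into solver time is a toy model, sketched below. It assumes that each iteration takes as long as the most heavily loaded partition, so any load above the mean is time the other processors spend waiting. The model, the function names and the per-partition loads are all hypothetical and are not measurements from the test case above.

```python
# Toy model (an assumption): time per iteration is proportional to the largest partition load.


def imbalance(loads):
    """Ratio of the heaviest partition load to the mean load (1.0 is perfect balance)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


def recoverable_fraction(loads):
    """Fraction of iteration time saved if the same total load were perfectly balanced."""
    return 1.0 - 1.0 / imbalance(loads)


# Hypothetical face loads on four partitions, chosen only for illustration.
loads = [1080, 1000, 960, 960]

print(f"imbalance   = {imbalance(loads):.3f}")
print(f"recoverable = {recoverable_fraction(loads):.1%}")
```

For these made-up loads the imbalance is 1.08 and about 7.4% of each iteration would be recoverable under the model. Actual gains depend on the solver, the interconnect and the mesh.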
4. Conclusion
As expectations of CFD rise to 100M+ cell meshes, it seems a shame not to exploit existing parallel CFD hardware to ease deployment of existing commercial CFD software. Our parallel partitioning implementation illustrates such gains: it will enable high-end users to ease their set-up issues, and we expect it also to improve their calculation times, with both benefits integrated fully into their existing process.