
Parallel performance enhancements of pVBARMS, a package for parallel variable block algebraic multilevel solvers, with applications to turbulent flow simulations

BONFIGLIOLI, Aldo
2014-01-01

Abstract

The starting point for this project was the iterative solution of sparse, block-structured linear systems that arise from the analysis of turbulent flows in Computational Fluid Dynamics applications. In the last two years we have studied preconditioning techniques based on block multilevel incomplete LU factorization preconditioners for this problem class, and we have found them to be quite effective in reducing the number of GMRES iterations. These preconditioners exhibit better parallelism than standard ILU algorithms due to the recursive factorization, and they may be noticeably more robust for comparable memory usage, especially when solving large problems. Additionally, exploiting the available block structure in the matrix can maximize computational efficiency. Sparse matrices arising from the solution of systems of partial differential equations often exhibit a perfect block structure, meaning that the nonzero blocks in the sparsity pattern are fully dense (and typically small), e.g., when several unknown quantities are associated with the same grid point. However, similar block orderings can sometimes also be uncovered in general unstructured matrices by ordering consecutively rows and columns with a similar sparsity pattern, treating some zero entries of the reordered matrix as nonzeros, and storing the nonzero blocks as dense, at a small cost in memory. In general, this reordering results in linear systems with blocks of variable size. Our recently developed parallel package pVBARMS (parallel variable block algebraic recursive multilevel solver) for distributed-memory computers takes advantage of these frequently occurring structures in the design of the multilevel incomplete LU factorization preconditioner, maximizing computational efficiency, increasing throughput during the computation, and improving reliability on realistic applications. The method automatically detects any existing block structure in the matrix, without requiring prior knowledge of the underlying problem from the user, and exploits it to maximize computational efficiency. We describe a novel parallel MPI-based implementation of pVBARMS for distributed-memory computers based on the block Jacobi, additive Schwarz, and Schur complement methods. We address hybrid MPI and OpenMP implementations and gain further parallelism using Many Integrated Core (MIC) technology. Implementation details are always critical aspects to consider in the design of sparse matrix algorithms. Therefore, we revisit our original implementation of the partial (block) factorization step with a careful selection of the parameters, and we compare different algorithms for computing the block ordering based on either graph or matrix analysis. Finally, we report on the numerical and parallel scalability of the pVBARMS package for solving the turbulent Navier-Stokes equations on a suite of two- and three-dimensional test cases, including the calculation of the flow past the DPW3-W1 wing configuration of the third AIAA Drag Prediction Workshop, which is the application that motivated this study. In our formulation the mean flow and turbulent transport equations are solved in coupled form using a Newton-Krylov algorithm. These analyses are carried out on coarse- to medium-sized grids featuring up to 6.5 million nodes at a Reynolds number of 5·10^6.
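As an informal illustration of the block-ordering idea mentioned in the abstract (grouping rows and columns with similar sparsity patterns so that the reordered matrix exposes dense blocks of variable size), the following Python/SciPy sketch compares row patterns with a simple cosine-type similarity measure. It is not the pVBARMS implementation; the function name detect_blocks, the threshold parameter tau, and the quadratic pairwise comparison are illustrative assumptions kept only for readability, whereas production codes rely on much cheaper hashing or graph-compression strategies.

# Illustrative sketch only: group rows of a sparse matrix whose sparsity
# patterns are "similar enough", so that a compatible reordering exposes
# dense blocks of variable size. Not the pVBARMS algorithm.
import numpy as np
from scipy import sparse


def detect_blocks(A, tau=0.8):
    """Return groups of row indices whose 0/1 sparsity patterns have a
    cosine similarity of at least tau (tau is an illustrative threshold)."""
    A = sparse.csr_matrix(A)
    n = A.shape[0]
    # Column-index pattern of each row, taken from the CSR structure.
    patterns = [set(A.indices[A.indptr[i]:A.indptr[i + 1]]) for i in range(n)]

    assigned = [False] * n
    groups = []
    # O(n^2) pairwise comparison, kept only for clarity.
    for i in range(n):
        if assigned[i]:
            continue
        group = [i]
        assigned[i] = True
        for j in range(i + 1, n):
            if assigned[j] or not patterns[i] or not patterns[j]:
                continue
            common = len(patterns[i] & patterns[j])
            cos_ij = common / np.sqrt(len(patterns[i]) * len(patterns[j]))
            if cos_ij >= tau:
                group.append(j)
                assigned[j] = True
        groups.append(group)
    return groups


if __name__ == "__main__":
    # Toy example: two unknowns per grid point give pairs of rows that
    # share (almost) the same pattern and are grouped into one block.
    A = sparse.random(8, 8, density=0.4, format="csr", random_state=0)
    A = sparse.kron(A, np.ones((2, 2))) + sparse.eye(16)
    print(detect_blocks(A, tau=0.7))

In this toy run, consecutive row pairs coming from the same "grid point" are merged into one variable-size block, while unrelated rows stay as singletons, which mirrors the kind of structure the preconditioner is designed to exploit.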
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11563/86493