Efficient Application of Hanging-Node Constraints for Matrix-Free High-Order FEM Computations on CPU and GPU


This contribution presents an efficient algorithm for resolving hanging-node constraints on the fly for high-order finite-element computations on adaptively refined meshes, using matrix-free implementations. We concentrate on unstructured hex-dominated meshes and on multi-component elements with nodal Lagrange shape functions in at least one of their components. The application of general constraints is split up into two distinct operators, one specialized in the hanging-node part and a generic one for the remaining constraints, such as Dirichlet boundary conditions. The former implements in-face interpolations efficiently by a sequence of 1D interpolations with sum factorization according to the refinement configuration of the cell. We discuss ways to efficiently encode and decode such refinement configurations. Furthermore, we present distinct differences in the interpolation step on GPU and CPU, as well as compare different vectorization strategies for the latter. Experimental comparisons with a state-of-the-art algorithm that does not exploit the tensor-product structure show that, on CPUs, the additional costs of cells with hanging-node constraints can be reduced by a factor of 5–10 for a Laplace operator evaluation with high-order elements (k≥3) and affine meshes. For non-affine meshes, the costs for the application of hanging-node constraints can be completely hidden behind the memory transfer. The algorithm has been integrated into the open-source finite-element library deal.II.
QR Code: Link to publication