Code Generation and Performance Engineering for Matrix-Free Finite Element Methods on Hybrid Tetrahedral Grids
arXiv (2024)
Abstract
This paper introduces a code generator designed for node-level optimized,
extreme-scalable, matrix-free finite element operators on hybrid tetrahedral
grids. It optimizes the local evaluation of bilinear forms through various
techniques including tabulation, relocation of loop invariants, and
inter-element vectorization, each implemented as a transformation of an abstract
syntax tree. A key contribution is the development, analysis, and generation of
efficient loop patterns that leverage the local structure of the underlying
tetrahedral grid. These significantly enhance cache locality and arithmetic
intensity, mitigating the bandwidth pressure associated with compute-sparse,
low-order operators. The paper demonstrates the generator's capabilities
through a comprehensive educational cycle of performance analysis, bottleneck
identification, and emission of dedicated optimizations. For three differential
operators (-Δ, -∇·(k(𝐱)∇), and
α(𝐱) curl curl + β(𝐱)), we
determine the set of most effective optimizations. Applied by the generator,
they result in speed-ups of up to 58× compared to reference
implementations. Detailed node-level performance analysis yields matrix-free
operators with a throughput of 1.3 to 2.1 GDoF/s, achieving up to 62% of peak
performance on a 36-core Intel Ice Lake socket. Finally, the solution of the
curl-curl problem with more than a trillion (10^12) degrees of freedom on
21504 processes in less than 50 seconds demonstrates the generated operators'
performance and extreme-scalability as part of a full multigrid solver.
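
To make the idea of a matrix-free operator with a relocated loop invariant concrete, here is a minimal sketch. It is not the paper's generator or its output. It assumes a much simpler setting: the operator -Δ in one dimension, linear elements, and a uniform grid of spacing h. The function names local_stiffness and apply_matrix_free are hypothetical and chosen for illustration only.

```python
def local_stiffness(h):
    """2x2 element matrix of -d^2/dx^2 for linear elements on an interval of length h."""
    return [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]


def apply_matrix_free(u, h):
    """Compute y = A u element by element, without ever storing the global matrix A."""
    n = len(u)
    y = [0.0] * n
    # Loop invariant: on a uniform grid every element has the same local matrix,
    # so it is evaluated once here instead of once per element.
    k = local_stiffness(h)
    for e in range(n - 1):  # element e couples nodes e and e + 1
        ue0, ue1 = u[e], u[e + 1]
        y[e] += k[0][0] * ue0 + k[0][1] * ue1
        y[e + 1] += k[1][0] * ue0 + k[1][1] * ue1
    return y


if __name__ == "__main__":
    h = 0.25
    u = [x * x for x in (0.0, 0.25, 0.5, 0.75, 1.0)]  # samples of u(x) = x^2
    print(apply_matrix_free(u, h))
```

In this toy setting the interior entries of the result are the familiar (-u[i-1] + 2u[i] - u[i+1]) / h stencil. The point of the sketch is only that the local matrix is hoisted out of the element loop. The structured refinement of tetrahedral grids described in the abstract presumably allows analogous reuse, but in a far more involved form.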