Three Dirac operators on two architectures with one piece of code and no hassle
Contribution to a conference proceedings/Contribution to a book | FZJ-2019-00526
2018
SISSA
Trieste
Please use a persistent id in citations: http://hdl.handle.net/2128/21382
Abstract: A simple-minded approach to implementing three discretizations of the Dirac operator (staggered, Wilson, Brillouin) on two architectures (KNL and Core i7) is presented. The idea is to use a high-level compiler along with OpenMP parallelization and SIMD pragmas, but to stay away from cache-line optimization and assembly tuning. The implementation handles N_v right-hand sides, and this extra index is used to fill the SIMD pipeline. On one KNL node, the single-precision performance figures for N_c=3, N_v=12 are 475 Gflop/s, 345 Gflop/s, and 790 Gflop/s for the staggered, Wilson, and Brillouin discretizations, respectively.
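The loop structure described in the abstract can be illustrated with a minimal sketch in C. This is not the author's code and not a Dirac operator: it applies a toy one-dimensional periodic nearest-neighbour stencil, chosen only to show an OpenMP-parallel loop over lattice sites combined with a SIMD pragma on the right-hand-side index. The function name `apply_toy_stencil`, the macro `NV`, and the storage layout `field[x*NV + v]` are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of right-hand sides; 12 matches N_v in the abstract.
   The macro name is hypothetical. */
#define NV 12

/* Toy stand-in for a lattice operator (NOT a Dirac operator):
     out(x) = 2*in(x) - in(x+1) - in(x-1)   with periodic boundaries,
   applied to NV right-hand sides at once.
   Assumed layout: field[x*NV + v], so the right-hand-side index v
   is contiguous in memory. */
void apply_toy_stencil(size_t nsites, const float *in, float *out)
{
    assert(nsites > 0);

    /* Thread-level parallelism over lattice sites. */
    #pragma omp parallel for
    for (long x = 0; x < (long)nsites; ++x) {
        size_t xs = (size_t)x;
        size_t xp = (xs + 1) % nsites;           /* forward neighbour  */
        size_t xm = (xs + nsites - 1) % nsites;  /* backward neighbour */

        /* The contiguous right-hand-side index fills the SIMD lanes. */
        #pragma omp simd
        for (int v = 0; v < NV; ++v) {
            out[xs * NV + v] = 2.0f * in[xs * NV + v]
                             - in[xp * NV + v]
                             - in[xm * NV + v];
        }
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result. Putting the right-hand-side index innermost and contiguous is one plausible way to give the compiler unit-stride SIMD loads; the layout actually used in the paper may differ.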