Three Dirac operators on two architectures with one piece of code and no hassle

Durr, Stephan

Contribution to a conference proceedings/Contribution to a book

FZJ-2019-00526

Durr, S. (Corresponding author), FZJ

2018
SISSA Trieste

36th Annual International Symposium on Lattice Field Theory, Lattice 2018, East Lansing, USA, 22 Jul 2018 - 28 Jul 2018. Trieste: SISSA, Proceedings of Science LATTICE2018, 7 p. (2018)

Please use a persistent id in citations: http://hdl.handle.net/2128/21382

Abstract: A simple-minded approach to implementing three discretizations of the Dirac operator (staggered, Wilson, Brillouin) on two architectures (KNL and Core i7) is presented. The idea is to use a high-level compiler along with OpenMP parallelization and SIMD pragmas, but to stay away from cache-line optimization and/or assembly tuning. The implementation is for N_v right-hand sides, and this extra index is used to fill the SIMD pipeline. On one KNL node, the single-precision performance figures for N_c=3, N_v=12 are 475 Gflop/s, 345 Gflop/s, and 790 Gflop/s for the staggered, Wilson, and Brillouin operators, respectively.
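
Illustration: the abstract's central idea (OpenMP threads over lattice sites, SIMD lanes filled by the right-hand-side index) can be sketched in a few lines. The sketch below is not the author's code, and the record does not say which language the actual implementation uses. It assumes a toy one-dimensional periodic stencil without gauge links or colour and spin indices; the function name apply_stencil, the constant NV, and the memory layout in[site*NV + v] are hypothetical choices made only for this example.

```c
#include <stddef.h>

#define NV 12  /* number of right-hand sides (N_v=12 in the abstract) */

/* Toy 1D periodic nearest-neighbour stencil applied to NV vectors at once:
   out(x) = mass*in(x) + 0.5*(in(x+1) - in(x-1)).
   Assumed layout: in[site*NV + v], so the right-hand-side index v is the
   fastest-running one and consecutive v map onto consecutive SIMD lanes. */
void apply_stencil(size_t nsites, float mass, const float *in, float *out)
{
    /* Threads share the outer loop over lattice sites. */
    #pragma omp parallel for
    for (long x = 0; x < (long)nsites; x++) {
        size_t xc = (size_t)x;
        size_t xp = (xc + 1) % nsites;           /* forward neighbour  */
        size_t xm = (xc + nsites - 1) % nsites;  /* backward neighbour */

        /* The inner loop over right-hand sides is unit-stride and free of
           dependencies, so the compiler can vectorize it directly. */
        #pragma omp simd
        for (int v = 0; v < NV; v++) {
            out[xc * NV + v] = mass * in[xc * NV + v]
                             + 0.5f * (in[xp * NV + v] - in[xm * NV + v]);
        }
    }
}
```

Without an OpenMP flag (for example -fopenmp with GCC) the pragmas are ignored and the routine runs serially with the same result, which fits the "one piece of code" theme: the same source serves different targets, and only the compiler options change.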

Contributing Institute(s):