Contribution to a conference proceedings/Contribution to a book FZJ-2016-04722

Extreme-scaling applications en route to exascale


2016
ACM Press New York

Proceedings of the Exascale Applications and Software Conference 2016
Exascale Applications and Software Conference 2016, EASC'16, Stockholm, Sweden, 26 Apr 2016 - 29 Apr 2016
New York : ACM Press, 10 pp. [10.1145/2938615.2938616]

Please use a persistent id in citations: doi:10.1145/2938615.2938616

Abstract: Feedback from the previous year's very successful workshop motivated the organisation of a three-day workshop from 1 to 3 February 2016, during which the 28-rack JUQUEEN BlueGene/Q system with 458 752 cores was reserved for over 50 hours. Eight international code teams were selected to use this opportunity to investigate and improve their application scalability, assisted by staff from the JSC Simulation Laboratories and Cross-Sectional Teams. Ultimately, seven teams had codes run successfully on the full JUQUEEN system. The strong scalability demonstrated by Code Saturne and Seven-League Hydro, both using 4 OpenMP threads per MPI process with 16 MPI processes on each compute node for a total of 1 835 008 threads, qualified them for High-Q Club membership. Existing members CIAO and iFETI were able to show that they had additional solvers which also scaled acceptably. Furthermore, large-scale in-situ interactive visualisation was demonstrated with a CIAO simulation using 458 752 MPI processes running on 28 racks, coupled via JUSITU to VisIt. The two adaptive mesh refinement utilities, ICI and p4est, showed that they could scale to run with 458 752 and 917 504 MPI ranks respectively, but both encountered problems loading large meshes. Parallel file I/O issues also hindered large-scale executions of PFLOTRAN. Poor performance of a NEST import module, which loaded and connected 1.9 TiB of neuron and synapse data, was tracked down to an internal data-structure mismatch with the HDF5 file objects that prevented use of MPI collective file reading; once rectified, this is expected to enable large-scale neuronal network simulations.

Comparative analysis is provided against the 25 codes in the High-Q Club at the start of 2016, which include five codes that qualified from the previous workshop. Despite more mixed results, we learnt more about application file I/O limitations and inefficiencies, which continue to be the primary inhibitor of large-scale simulations.
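For context on the figures quoted above: JUQUEEN's 28 racks comprise 28 672 compute nodes with 16 cores each (458 752 cores in total), so 16 MPI processes per node with 4 OpenMP threads each amounts to 28 672 × 64 = 1 835 008 threads, while two MPI ranks per core gives 917 504 ranks. The NEST fix concerns collective MPI-IO reads through HDF5; the following C sketch illustrates only the general pattern of collective HDF5 reading and is not taken from the NEST import module (file name, dataset name and the 1-D decomposition are hypothetical):

    #include <stdlib.h>
    #include <mpi.h>
    #include <hdf5.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* File access property list: open the file with the MPI-IO driver. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
        hid_t file = H5Fopen("neurons.h5", H5F_ACC_RDONLY, fapl);

        hid_t dset   = H5Dopen2(file, "/positions", H5P_DEFAULT);
        hid_t fspace = H5Dget_space(dset);
        hsize_t total;
        H5Sget_simple_extent_dims(fspace, &total, NULL);

        /* Each rank selects a disjoint, contiguous slab of the 1-D dataset. */
        hsize_t count = total / size;
        hsize_t start = (hsize_t)rank * count;
        if (rank == size - 1)
            count = total - start;          /* last rank takes the remainder */
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &start, NULL, &count, NULL);
        hid_t mspace = H5Screate_simple(1, &count, NULL);

        /* Transfer property list: request collective reads, the mode that the
           data-structure mismatch described in the abstract prevented. */
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

        double *buf = malloc(count * sizeof(double));
        H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, buf);

        free(buf);
        H5Pclose(dxpl);
        H5Sclose(mspace);
        H5Sclose(fspace);
        H5Dclose(dset);
        H5Fclose(file);
        H5Pclose(fapl);
        MPI_Finalize();
        return 0;
    }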


Contributing Institute(s):
  1. Jülich Supercomputing Centre (JSC)
Research Program(s):
  1. 511 - Computational Science and Mathematical Methods (POF3-511)
  2. ATMLPP - ATML Parallel Performance (ATMLPP)
  3. ATMLAO - ATML Application Optimization and User Service Tools (ATMLAO)

Appears in the scientific report 2016
Database coverage:
OpenAccess

The record appears in these collections:
Document types > Events > Contributions to a conference proceedings
Document types > Books > Contribution to a book
Workflow collections > Public records
Institute Collections > JSC
Publications database
Open Access

 Record created 2016-09-15, last modified 2025-03-17