Conference Presentation (After Call) FZJ-2022-05211

Resilience in (Time-Parallel) Spectral Deferred Corrections



2022

Time-X Annual Meeting, Leuven, Belgium, 25 Apr 2022 - 27 Apr 2022


Abstract: Advances in computational speed are nowadays gained by using more processing units rather than faster ones. Faults in the processing units, caused by numerous sources including radiation and aging, have been neglected in the past. However, the increasing size of HPC machines makes them more susceptible to faults, and it is important to develop a resilience strategy to avoid losing millions of CPU hours. Parallel-in-time methods target the very largest computers and are hence required to come with algorithm-based fault tolerance. We look here at spectral deferred corrections (SDC), a time-marching scheme that is at the heart of parallel-in-time methods such as PFASST. Due to its iterative nature, there is ample opportunity to plug in computationally inexpensive fault tolerance schemes, many of which are also easy to implement. We experimentally examine the capability of various strategies to recover from single bit flips, both for serial SDC and for the time-parallel extension referred to as block Gauß-Seidel SDC.
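
The fault model named in the abstract, a single bit flip, can be illustrated with a minimal sketch. It is not taken from the presentation: the helper name `flip_bit` is hypothetical, and the sketch assumes the affected value is stored as an IEEE 754 double-precision number. It shows that the impact of one flipped bit depends strongly on its position, which is why recovery strategies are worth examining experimentally.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its 64-bit representation flipped.

    Hypothetical helper for illustration only. Assumes IEEE 754 binary64:
    bit 0 is the least significant mantissa bit, bits 52-62 hold the
    exponent, and bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    # Reinterpret the double as an unsigned 64-bit integer.
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    # Toggle the requested bit and reinterpret the result as a double.
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped
```

Under these assumptions, flipping bit 0 of `1.0` changes the value by only 2^-52, flipping bit 63 changes its sign, and flipping bit 62 turns it into infinity.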


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)
  2. DFG project 450829162 - Raum-Zeit-parallele Simulation multimodale Energiesystemen (450829162)
  3. TIME-X - TIME parallelisation: for eXascale computing and beyond (955701)

Appears in the scientific report 2022
Database coverage:
OpenAccess

The record appears in these collections:
Document types > Presentations > Conference presentations
Workflow collections > Public records
Institute collections > JSC
Publications database
Open Access

Record created 2022-11-28, last modified 2023-03-10

