000911981 001__ 911981
000911981 005__ 20230310131330.0
000911981 0247_ $$2Handle$$a2128/32854
000911981 037__ $$aFZJ-2022-05211
000911981 1001_ $$0P:(DE-Juel1)190575$$aBaumann, Thomas$$b0$$eCorresponding author$$ufzj
000911981 1112_ $$aTime-X Annual Meeting$$cLeuven$$d2022-04-25 - 2022-04-27$$wBelgium
000911981 245__ $$aResilience in (Time-Parallel) Spectral Deferred Corrections
000911981 260__ $$c2022
000911981 3367_ $$033$$2EndNote$$aConference Paper
000911981 3367_ $$2DataCite$$aOther
000911981 3367_ $$2BibTeX$$aINPROCEEDINGS
000911981 3367_ $$2DRIVER$$aconferenceObject
000911981 3367_ $$2ORCID$$aLECTURE_SPEECH
000911981 3367_ $$0PUB:(DE-HGF)6$$2PUB:(DE-HGF)$$aConference Presentation$$bconf$$mconf$$s1669702322_11585$$xAfter Call
000911981 520__ $$aAdvancement in computational speed is nowadays gained by using more processing units rather than faster ones. Faults in the processing units caused by numerous sources including radiation and aging have been neglected in the past. However, the increasing size of HPC machines makes them more susceptible and it is important to develop a resilience strategy to avoid losing millions of CPU hours. Parallel-in-time methods target the very largest of computers and are hence required to come with algorithm-based fault tolerance. We look here at spectral deferred corrections (SDC), which is a time marching scheme that is at the heart of parallel-in-time methods such as PFASST. Due to its iterative nature, there is ample opportunity to plug in computationally inexpensive fault tolerance schemes, many of which are also easy to implement. We experimentally examine the capability of various strategies to recover from single bit flips both for serial SDC as well as the time-parallel extension referred to as block Gauß-Seidel SDC.
000911981 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
000911981 536__ $$0G:(GEPRIS)450829162$$aDFG project 450829162 - Raum-Zeit-parallele Simulation multimodale Energiesystemen (450829162)$$c450829162$$x1
000911981 536__ $$0G:(EU-Grant)955701$$aTIME-X - TIME parallelisation: for eXascale computing and beyond (955701)$$c955701$$fH2020-JTI-EuroHPC-2019-1$$x2
000911981 8564_ $$uhttps://juser.fz-juelich.de/record/911981/files/resilientSDC.pdf$$yOpenAccess
000911981 909CO $$ooai:juser.fz-juelich.de:911981$$pec_fundedresources$$pdriver$$pVDB$$popen_access$$popenaire
000911981 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)190575$$aForschungszentrum Jülich$$b0$$kFZJ
000911981 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
000911981 9141_ $$y2022
000911981 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000911981 920__ $$lyes
000911981 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000911981 9801_ $$aFullTexts
000911981 980__ $$aconf
000911981 980__ $$aVDB
000911981 980__ $$aUNRESTRICTED
000911981 980__ $$aI:(DE-Juel1)JSC-20090406