000912024 001__ 912024
000912024 005__ 20230310131333.0
000912024 0247_ $$2Handle$$a2128/32842
000912024 037__ $$aFZJ-2022-05254
000912024 041__ $$aEnglish
000912024 1001_ $$0P:(DE-Juel1)190575$$aBaumann, Thomas$$b0$$eCorresponding author$$ufzj
000912024 1112_ $$aTUHH Institutsseminar$$cHamburg$$wGermany
000912024 245__ $$aResilience in (Time-Parallel) Spectral Deferred Corrections$$f2022-05-09 - 
000912024 260__ $$c2022
000912024 3367_ $$033$$2EndNote$$aConference Paper
000912024 3367_ $$2DataCite$$aOther
000912024 3367_ $$2BibTeX$$aINPROCEEDINGS
000912024 3367_ $$2ORCID$$aLECTURE_SPEECH
000912024 3367_ $$0PUB:(DE-HGF)31$$2PUB:(DE-HGF)$$aTalk (non-conference)$$btalk$$mtalk$$s1669699328_21286$$xInvited
000912024 3367_ $$2DINI$$aOther
000912024 520__ $$aGains in computational speed now come from using more processing units rather than faster ones. Faults in processing units, which arise from numerous sources including radiation and aging, have been neglected in the past. However, the increasing size of HPC machines makes them more susceptible to such faults, and it is important to develop a resilience strategy to avoid losing millions of CPU hours. Parallel-in-time methods target the very largest computers and are hence required to come with algorithm-based fault tolerance. We look here at spectral deferred corrections (SDC), a time-marching scheme at the heart of parallel-in-time methods such as PFASST. Due to its iterative nature, there is ample opportunity to plug in computationally inexpensive fault-tolerance schemes, many of which are also easy to implement. We experimentally examine the capability of various strategies to recover from single bit flips in time-serial SDC; these strategies will later be applied to parallel-in-time methods.
000912024 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
000912024 536__ $$0G:(GEPRIS)450829162$$aDFG project 450829162 - Raum-Zeit-parallele Simulation multimodale Energiesystemen (450829162)$$c450829162$$x1
000912024 536__ $$0G:(EU-Grant)955701$$aTIME-X - TIME parallelisation: for eXascale computing and beyond (955701)$$c955701$$fH2020-JTI-EuroHPC-2019-1$$x2
000912024 8564_ $$uhttps://juser.fz-juelich.de/record/912024/files/resilientSDC_longVersion.pdf$$yOpenAccess
000912024 909CO $$ooai:juser.fz-juelich.de:912024$$pec_fundedresources$$pdriver$$pVDB$$popen_access$$popenaire
000912024 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)190575$$aForschungszentrum Jülich$$b0$$kFZJ
000912024 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
000912024 9141_ $$y2022
000912024 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000912024 920__ $$lyes
000912024 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000912024 9801_ $$aFullTexts
000912024 980__ $$atalk
000912024 980__ $$aVDB
000912024 980__ $$aUNRESTRICTED
000912024 980__ $$aI:(DE-Juel1)JSC-20090406