000911978 001__ 911978
000911978 005__ 20230310131330.0
000911978 0247_ $$2Handle$$a2128/32852
000911978 037__ $$aFZJ-2022-05208
000911978 041__ $$aEnglish
000911978 1001_ $$0P:(DE-Juel1)190575$$aBaumann, Thomas$$b0$$eCorresponding author$$ufzj
000911978 1112_ $$a11th Parallel-in-Time Workshop$$cMarseilles$$d2022-07-11 - 2022-07-15$$wFrance
000911978 245__ $$aResilience in Spectral Deferred Corrections
000911978 260__ $$c2022
000911978 3367_ $$033$$2EndNote$$aConference Paper
000911978 3367_ $$2BibTeX$$aINPROCEEDINGS
000911978 3367_ $$2DRIVER$$aconferenceObject
000911978 3367_ $$2ORCID$$aCONFERENCE_POSTER
000911978 3367_ $$2DataCite$$aOutput Types/Conference Poster
000911978 3367_ $$0PUB:(DE-HGF)24$$2PUB:(DE-HGF)$$aPoster$$bposter$$mposter$$s1669702263_13478$$xAfter Call
000911978 520__ $$aGains in computational speed now come from using more processing units rather than faster ones. Faults in processing units, caused by numerous sources including radiation and aging, have been neglected in the past. However, the increasing size of HPC machines makes them more susceptible to such faults, and it is important to develop a resilience strategy to avoid losing millions of CPU hours. Parallel-in-time methods target the very largest computers and hence need to come with algorithm-based fault tolerance. We look here at spectral deferred corrections (SDC), a time-marching scheme at the heart of parallel-in-time methods such as PFASST. Due to its iterative nature, there is ample opportunity to plug in computationally inexpensive fault tolerance schemes, many of which are also easy to implement. We experimentally examine the capability of various strategies to recover from single bit flips, both for serial SDC and for a small-scale parallel-in-time version with diagonal preconditioners.
000911978 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
000911978 536__ $$0G:(GEPRIS)450829162$$aDFG project 450829162 - Raum-Zeit-parallele Simulation multimodale Energiesystemen (450829162)$$c450829162$$x1
000911978 536__ $$0G:(EU-Grant)955701$$aTIME-X - TIME parallelisation: for eXascale computing and beyond (955701)$$c955701$$fH2020-JTI-EuroHPC-2019-1$$x2
000911978 7001_ $$0P:(DE-HGF)0$$aGötschel, Sebastian$$b1
000911978 7001_ $$0P:(DE-HGF)0$$aLunet, Thibaut$$b2
000911978 7001_ $$0P:(DE-HGF)0$$aRuprecht, Daniel$$b3
000911978 7001_ $$0P:(DE-Juel1)169281$$aSchöbel, Ruth$$b4$$ufzj
000911978 7001_ $$0P:(DE-Juel1)132268$$aSpeck, Robert$$b5$$ufzj
000911978 8564_ $$uhttps://juser.fz-juelich.de/record/911978/files/Poster.pdf$$yOpenAccess
000911978 909CO $$ooai:juser.fz-juelich.de:911978$$pec_fundedresources$$pdriver$$pVDB$$popen_access$$popenaire
000911978 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)190575$$aForschungszentrum Jülich$$b0$$kFZJ
000911978 9101_ $$0I:(DE-HGF)0$$6P:(DE-HGF)0$$aTUHH$$b1
000911978 9101_ $$0I:(DE-HGF)0$$6P:(DE-HGF)0$$aTUHH$$b2
000911978 9101_ $$0I:(DE-HGF)0$$6P:(DE-HGF)0$$aTUHH$$b3
000911978 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)169281$$aForschungszentrum Jülich$$b4$$kFZJ
000911978 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132268$$aForschungszentrum Jülich$$b5$$kFZJ
000911978 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
000911978 9141_ $$y2022
000911978 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000911978 920__ $$lyes
000911978 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000911978 980__ $$aposter
000911978 980__ $$aVDB
000911978 980__ $$aUNRESTRICTED
000911978 980__ $$aI:(DE-Juel1)JSC-20090406
000911978 9801_ $$aFullTexts