001     888522
005     20230310131358.0
024 7 _ |a arXiv:2010.13342
|2 arXiv
024 7 _ |a altmetric:93201252
|2 altmetric
037 _ _ |a FZJ-2020-04986
100 1 _ |a Agullo, Emmanuel
|0 P:(DE-HGF)0
|b 0
245 _ _ |a Resiliency in Numerical Algorithm Design for Extreme Scale Simulations
260 _ _ |c 2020
336 7 _ |a Preprint
|b preprint
|m preprint
|0 PUB:(DE-HGF)25
|s 1607523859_13191
|2 PUB:(DE-HGF)
336 7 _ |a WORKING_PAPER
|2 ORCID
336 7 _ |a Electronic Article
|0 28
|2 EndNote
336 7 _ |a preprint
|2 DRIVER
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a Output Types/Working Paper
|2 DataCite
500 _ _ |a 45 pages, 3 figures, submitted to The International Journal of High Performance Computing Applications
520 _ _ |a This work is based on the seminar titled ``Resiliency in Numerical Algorithm Design for Extreme Scale Simulations'' held March 1-6, 2020 at Schloss Dagstuhl, that was attended by all the authors. Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features as well as specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements? One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in the case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge.
536 _ _ |a 511 - Computational Science and Mathematical Methods (POF3-511)
|0 G:(DE-HGF)POF3-511
|c POF3-511
|x 0
|f POF III
536 _ _ |a DFG project 450829162 - Raum-Zeit-parallele Simulation multimodale Energiesystemen (450829162)
|0 G:(GEPRIS)450829162
|c 450829162
|x 1
588 _ _ |a Dataset connected to arXiv
700 1 _ |a Altenbernd, Mirco
|0 P:(DE-HGF)0
|b 1
700 1 _ |a Anzt, Hartwig
|0 P:(DE-HGF)0
|b 2
700 1 _ |a Bautista-Gomez, Leonardo
|0 P:(DE-HGF)0
|b 3
700 1 _ |a Benacchio, Tommaso
|0 P:(DE-HGF)0
|b 4
700 1 _ |a Bonaventura, Luca
|0 P:(DE-HGF)0
|b 5
700 1 _ |a Bungartz, Hans-Joachim
|0 P:(DE-HGF)0
|b 6
700 1 _ |a Chatterjee, Sanjay
|0 P:(DE-HGF)0
|b 7
700 1 _ |a Ciorba, Florina M.
|0 P:(DE-HGF)0
|b 8
700 1 _ |a DeBardeleben, Nathan
|0 P:(DE-HGF)0
|b 9
700 1 _ |a Drzisga, Daniel
|0 P:(DE-HGF)0
|b 10
700 1 _ |a Eibl, Sebastian
|0 P:(DE-HGF)0
|b 11
700 1 _ |a Engelmann, Christian
|0 P:(DE-HGF)0
|b 12
700 1 _ |a Gansterer, Wilfried N.
|0 P:(DE-HGF)0
|b 13
700 1 _ |a Giraud, Luc
|0 P:(DE-HGF)0
|b 14
700 1 _ |a Goeddeke, Dominik
|0 P:(DE-HGF)0
|b 15
700 1 _ |a Heisig, Marco
|0 P:(DE-HGF)0
|b 16
700 1 _ |a Jezequel, Fabienne
|0 P:(DE-HGF)0
|b 17
700 1 _ |a Kohl, Nils
|0 P:(DE-HGF)0
|b 18
700 1 _ |a Li, Xiaoye Sherry
|0 P:(DE-HGF)0
|b 19
700 1 _ |a Lion, Romain
|0 P:(DE-HGF)0
|b 20
700 1 _ |a Mehl, Miriam
|0 P:(DE-HGF)0
|b 21
700 1 _ |a Mycek, Paul
|0 P:(DE-HGF)0
|b 22
700 1 _ |a Obersteiner, Michael
|0 P:(DE-HGF)0
|b 23
700 1 _ |a Quintana-Orti, Enrique S.
|0 P:(DE-HGF)0
|b 24
700 1 _ |a Rizzi, Francesco
|0 P:(DE-HGF)0
|b 25
700 1 _ |a Ruede, Ulrich
|0 P:(DE-HGF)0
|b 26
700 1 _ |a Schulz, Martin
|0 P:(DE-HGF)0
|b 27
700 1 _ |a Fung, Fred
|0 P:(DE-HGF)0
|b 28
700 1 _ |a Speck, Robert
|0 P:(DE-Juel1)132268
|b 29
|u fzj
700 1 _ |a Stals, Linda
|0 P:(DE-HGF)0
|b 30
|e Corresponding author
700 1 _ |a Teranishi, Keita
|0 P:(DE-HGF)0
|b 31
700 1 _ |a Thibault, Samuel
|0 P:(DE-HGF)0
|b 32
700 1 _ |a Thoennes, Dominik
|0 P:(DE-HGF)0
|b 33
700 1 _ |a Wagner, Andreas
|0 P:(DE-HGF)0
|b 34
700 1 _ |a Wohlmuth, Barbara
|0 P:(DE-HGF)0
|b 35
856 4 _ |u https://arxiv.org/abs/2010.13342
909 C O |o oai:juser.fz-juelich.de:888522
|p VDB
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 29
|6 P:(DE-Juel1)132268
913 1 _ |a DE-HGF
|b Key Technologies
|1 G:(DE-HGF)POF3-510
|0 G:(DE-HGF)POF3-511
|2 G:(DE-HGF)POF3-500
|v Computational Science and Mathematical Methods
|x 0
|4 G:(DE-HGF)POF
|3 G:(DE-HGF)POF3
|l Supercomputing & Big Data
914 1 _ |y 2020
920 _ _ |l yes
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 _ _ |a preprint
980 _ _ |a VDB
980 _ _ |a I:(DE-Juel1)JSC-20090406
980 _ _ |a UNRESTRICTED

