001     852392
005     20221109161716.0
024 7 _ |a 10.1002/cpe.4905
|2 doi
024 7 _ |a 1040-3108
|2 ISSN
024 7 _ |a 1096-9128
|2 ISSN
024 7 _ |a 1532-0626
|2 ISSN
024 7 _ |a 1532-0634
|2 ISSN
024 7 _ |a 2128/20250
|2 Handle
024 7 _ |a WOS:000450236200021
|2 WOS
037 _ _ |a FZJ-2018-05355
041 _ _ |a English
082 _ _ |a 004
100 1 _ |a Davidović, Davor
|0 0000-0003-2649-9236
|b 0
|e Corresponding author
245 _ _ |a Accelerating the computation of FLAPW methods on heterogeneous architectures
260 _ _ |a Chichester
|c 2018
|b Wiley
336 7 _ |a article
|2 DRIVER
336 7 _ |a Output Types/Journal article
|2 DataCite
336 7 _ |a Journal Article
|b journal
|m journal
|0 PUB:(DE-HGF)16
|s 1543505419_22948
|2 PUB:(DE-HGF)
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a JOURNAL_ARTICLE
|2 ORCID
336 7 _ |a Journal Article
|0 0
|2 EndNote
520 _ _ |a Legacy codes in computational science and engineering have been very successful in providing essential functionality to researchers. However, they are not capable of exploiting the massive parallelism provided by emerging heterogeneous architectures. The lack of portable performance and scalability puts them at high risk, i.e., either they evolve or they are destined to be executed on older platforms and small clusters. One example of a legacy code which would heavily benefit from a modern redesign is FLEUR, a software package for electronic structure calculations. In previous work, the computational bottleneck of FLEUR was partially reengineered to have a modular design that relies on standard building blocks, namely, BLAS and LAPACK libraries. In this paper, we demonstrate how the initial redesign enables portability to heterogeneous architectures. More specifically, we study different approaches to port the code to architectures consisting of multi-core CPUs equipped with one or more coprocessors such as Nvidia GPUs and Intel Xeon Phis. Our final code attains over 70% of the architectures' peak performance, and outperforms Nvidia's and Intel's libraries. On JURECA, the large tier-0 cluster where FLEUR is often executed, the code takes advantage of the full power of the computing nodes, attaining a 5× speedup over the sole use of the CPUs.
536 _ _ |a 511 - Computational Science and Mathematical Methods (POF3-511)
|0 G:(DE-HGF)POF3-511
|c POF3-511
|f POF III
|x 0
536 _ _ |a Simulation and Data Laboratory Quantum Materials (SDLQM)
|0 G:(DE-Juel1)SDLQM
|c SDLQM
|f Simulation and Data Laboratory Quantum Materials (SDLQM)
|x 2
588 _ _ |a Dataset connected to CrossRef
700 1 _ |a Fabregat-Traver, Diego
|0 P:(DE-HGF)0
|b 1
700 1 _ |a Höhnerbach, Markus
|0 P:(DE-HGF)0
|b 2
700 1 _ |a Di Napoli, Edoardo
|0 P:(DE-Juel1)144723
|b 3
773 _ _ |a 10.1002/cpe.4905
|g p. e4905 -
|0 PERI:(DE-600)2052606-4
|n 24
|p e4905 -
|t Concurrency and computation
|v 30
|y 2018
|x 1532-0626
856 4 _ |u https://juser.fz-juelich.de/record/852392/files/1712.07206.pdf
|y OpenAccess
909 C O |o oai:juser.fz-juelich.de:852392
|p openaire
|p open_access
|p VDB
|p driver
|p dnbdelivery
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 3
|6 P:(DE-Juel1)144723
913 1 _ |a DE-HGF
|b Key Technologies
|1 G:(DE-HGF)POF3-510
|0 G:(DE-HGF)POF3-511
|2 G:(DE-HGF)POF3-500
|v Computational Science and Mathematical Methods
|x 0
|4 G:(DE-HGF)POF
|3 G:(DE-HGF)POF3
|l Supercomputing & Big Data
914 1 _ |y 2018
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0200
|2 StatID
|b SCOPUS
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)1160
|2 StatID
|b Current Contents - Engineering, Computing and Technology
915 _ _ |a JCR
|0 StatID:(DE-HGF)0100
|2 StatID
|b CONCURR COMP-PRACT E : 2015
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0150
|2 StatID
|b Web of Science Core Collection
915 _ _ |a WoS
|0 StatID:(DE-HGF)0111
|2 StatID
|b Science Citation Index Expanded
915 _ _ |a IF < 5
|0 StatID:(DE-HGF)9900
|2 StatID
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0300
|2 StatID
|b Medline
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0199
|2 StatID
|b Thomson Reuters Master Journal List
920 _ _ |l no
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 _ _ |a journal
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a I:(DE-Juel1)JSC-20090406
980 1 _ |a FullTexts

