000852392 001__ 852392
000852392 005__ 20221109161716.0
000852392 0247_ $$2doi$$a10.1002/cpe.4905
000852392 0247_ $$2ISSN$$a1040-3108
000852392 0247_ $$2ISSN$$a1096-9128
000852392 0247_ $$2ISSN$$a1532-0626
000852392 0247_ $$2ISSN$$a1532-0634
000852392 0247_ $$2Handle$$a2128/20250
000852392 0247_ $$2WOS$$aWOS:000450236200021
000852392 037__ $$aFZJ-2018-05355
000852392 041__ $$aEnglish
000852392 082__ $$a004
000852392 1001_ $$00000-0003-2649-9236$$aDavidović, Davor$$b0$$eCorresponding author
000852392 245__ $$aAccelerating the computation of FLAPW methods on heterogeneous architectures
000852392 260__ $$aChichester$$bWiley$$c2018
000852392 3367_ $$2DRIVER$$aarticle
000852392 3367_ $$2DataCite$$aOutput Types/Journal article
000852392 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1543505419_22948
000852392 3367_ $$2BibTeX$$aARTICLE
000852392 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000852392 3367_ $$00$$2EndNote$$aJournal Article
000852392 520__ $$aLegacy codes in computational science and engineering have been very successful in providing essential functionality to researchers. However, they are not capable of exploiting the massive parallelism provided by emerging heterogeneous architectures. The lack of portable performance and scalability puts them at high risk, i.e., either they evolve or they are destined to be executed on older platforms and small clusters. One example of a legacy code which would heavily benefit from a modern redesign is FLEUR, a software package for electronic structure calculations. In previous work, the computational bottleneck of FLEUR was partially reengineered to have a modular design that relies on standard building blocks, namely, the BLAS and LAPACK libraries. In this paper, we demonstrate how the initial redesign enables portability to heterogeneous architectures. More specifically, we study different approaches to port the code to architectures consisting of multi-core CPUs equipped with one or more coprocessors such as Nvidia GPUs and Intel Xeon Phis. Our final code attains over 70% of the architectures' peak performance, and outperforms Nvidia's and Intel's libraries. On JURECA, the large tier-0 cluster where FLEUR is often executed, the code takes advantage of the full power of the computing nodes, attaining a 5× speedup over the sole use of the CPUs.
000852392 536__ $$0G:(DE-HGF)POF3-511$$a511 - Computational Science and Mathematical Methods (POF3-511)$$cPOF3-511$$fPOF III$$x0
000852392 536__ $$0G:(DE-Juel1)SDLQM$$aSimulation and Data Laboratory Quantum Materials (SDLQM) (SDLQM)$$cSDLQM$$fSimulation and Data Laboratory Quantum Materials (SDLQM)$$x2
000852392 588__ $$aDataset connected to CrossRef
000852392 7001_ $$0P:(DE-HGF)0$$aFabregat-Traver, Diego$$b1
000852392 7001_ $$0P:(DE-HGF)0$$aHöhnerbach, Markus$$b2
000852392 7001_ $$0P:(DE-Juel1)144723$$aDi Napoli, Edoardo$$b3
000852392 773__ $$0PERI:(DE-600)2052606-4$$a10.1002/cpe.4905$$gp. e4905 -$$n24$$pe4905 -$$tConcurrency and computation$$v30$$x1532-0626$$y2018
000852392 8564_ $$uhttps://juser.fz-juelich.de/record/852392/files/1712.07206.pdf$$yOpenAccess
000852392 909CO $$ooai:juser.fz-juelich.de:852392$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
000852392 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)144723$$aForschungszentrum Jülich$$b3$$kFZJ
000852392 9131_ $$0G:(DE-HGF)POF3-511$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vComputational Science and Mathematical Methods$$x0
000852392 9141_ $$y2018
000852392 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS
000852392 915__ $$0StatID:(DE-HGF)1160$$2StatID$$aDBCoverage$$bCurrent Contents - Engineering, Computing and Technology
000852392 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bCONCURR COMP-PRACT E : 2015
000852392 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection
000852392 915__ $$0StatID:(DE-HGF)0111$$2StatID$$aWoS$$bScience Citation Index Expanded
000852392 915__ $$0StatID:(DE-HGF)9900$$2StatID$$aIF < 5
000852392 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000852392 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline
000852392 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bThomson Reuters Master Journal List
000852392 920__ $$lno
000852392 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000852392 980__ $$ajournal
000852392 980__ $$aVDB
000852392 980__ $$aUNRESTRICTED
000852392 980__ $$aI:(DE-Juel1)JSC-20090406
000852392 9801_ $$aFullTexts