000820439 001__ 820439
000820439 005__ 20221109161714.0
000820439 0247_ $$2arXiv$$aarXiv:1611.00606
000820439 0247_ $$2Handle$$a2128/12845
000820439 0247_ $$2DOI$$a10.1007/978-3-319-53862-4_17
000820439 037__ $$aFZJ-2016-05749
000820439 041__ $$aEnglish
000820439 1001_ $$0P:(DE-HGF)0$$aFabregat-Traver, Diego$$b0$$eCorresponding author
000820439 1112_ $$aJARA High-Performance Computing Symposium$$cAachen$$d2016-10-04 - 2016-10-05$$gJHPCS$$wGermany
000820439 245__ $$aHybrid CPU-GPU generation of the Hamiltonian and Overlap matrices in FLAPW methods
000820439 260__ $$bSpringer-Verlag$$c2016
000820439 300__ $$a200-211
000820439 3367_ $$2ORCID$$aCONFERENCE_PAPER
000820439 3367_ $$033$$2EndNote$$aConference Paper
000820439 3367_ $$2BibTeX$$aINPROCEEDINGS
000820439 3367_ $$2DRIVER$$aconferenceObject
000820439 3367_ $$2DataCite$$aOutput Types/Conference Paper
000820439 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1522149178_28199
000820439 3367_ $$0PUB:(DE-HGF)7$$2PUB:(DE-HGF)$$aContribution to a book$$mcontb
000820439 4900_ $$aLecture Notes in Computer Science$$v10164
000820439 520__ $$aIn this paper we focus on the integration of high-performance numerical libraries in ab initio codes and the portability of performance and scalability. The target of our work is FLEUR, a software package for electronic structure calculations developed at the Forschungszentrum Jülich over the course of two decades. The presented work follows up on a previous effort to modernize legacy code by re-engineering and rewriting it in terms of highly optimized libraries. We illustrate how this initial effort to obtain efficient and portable shared-memory code enables fast porting of the code to emerging heterogeneous architectures. More specifically, we port the code to nodes equipped with multiple GPUs. We divide our study into two parts. First, we show considerable speedups attained by minor and relatively straightforward code changes that offload parts of the computation to the GPUs. Then, we identify further possible improvements to achieve even higher performance and scalability. On a system consisting of 16 cores and 2 GPUs, we observe speedups of up to 5x with respect to our optimized shared-memory code, which in turn corresponds to speedups of between 7.5x and 12.5x with respect to the original FLEUR code.
000820439 536__ $$0G:(DE-HGF)POF3-511$$a511 - Computational Science and Mathematical Methods (POF3-511)$$cPOF3-511$$fPOF III$$x0
000820439 536__ $$0G:(DE-Juel1)SDLQM$$aSimulation and Data Laboratory Quantum Materials (SDLQM) (SDLQM)$$cSDLQM$$fSimulation and Data Laboratory Quantum Materials (SDLQM)$$x2
000820439 588__ $$aDataset connected to arXiv
000820439 7001_ $$0P:(DE-HGF)0$$aDavidović, Davor$$b1
000820439 7001_ $$0P:(DE-HGF)0$$aHöhnerbach, Markus$$b2
000820439 7001_ $$0P:(DE-Juel1)144723$$aDi Napoli, Edoardo$$b3$$ufzj
000820439 8564_ $$uhttps://juser.fz-juelich.de/record/820439/files/1611.00606v1.pdf$$yOpenAccess
000820439 8564_ $$uhttps://juser.fz-juelich.de/record/820439/files/1611.00606v1.gif?subformat=icon$$xicon$$yOpenAccess
000820439 8564_ $$uhttps://juser.fz-juelich.de/record/820439/files/1611.00606v1.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess
000820439 8564_ $$uhttps://juser.fz-juelich.de/record/820439/files/1611.00606v1.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
000820439 8564_ $$uhttps://juser.fz-juelich.de/record/820439/files/1611.00606v1.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
000820439 8564_ $$uhttps://juser.fz-juelich.de/record/820439/files/1611.00606v1.pdf?subformat=pdfa$$xpdfa$$yOpenAccess
000820439 909CO $$ooai:juser.fz-juelich.de:820439$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
000820439 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)144723$$aForschungszentrum Jülich$$b3$$kFZJ
000820439 9131_ $$0G:(DE-HGF)POF3-511$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vComputational Science and Mathematical Methods$$x0
000820439 9141_ $$y2016
000820439 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000820439 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000820439 980__ $$acontrib
000820439 980__ $$aVDB
000820439 980__ $$acontb
000820439 980__ $$aI:(DE-Juel1)JSC-20090406
000820439 980__ $$aUNRESTRICTED
000820439 9801_ $$aFullTexts