000824109 001__ 824109
000824109 005__ 20210129224942.0
000824109 037__ $$aFZJ-2016-06733
000824109 041__ $$aEnglish
000824109 1001_ $$0P:(DE-Juel1)145478$$aHerten, Andreas$$b0$$eCorresponding author$$ufzj
000824109 1112_ $$aPerspectives of GPU computing in Science$$cRome$$d2016-09-26 - 2016-09-28$$gGPU2016$$wItaly
000824109 245__ $$aAccelerating Plasma Physics with GPUs
000824109 260__ $$c2016
000824109 3367_ $$033$$2EndNote$$aConference Paper
000824109 3367_ $$2BibTeX$$aINPROCEEDINGS
000824109 3367_ $$2DRIVER$$aconferenceObject
000824109 3367_ $$2ORCID$$aCONFERENCE_POSTER
000824109 3367_ $$2DataCite$$aOutput Types/Conference Poster
000824109 3367_ $$0PUB:(DE-HGF)24$$2PUB:(DE-HGF)$$aPoster$$bposter$$mposter$$s1480074937_10113$$xAfter Call
000824109 502__ $$cSapienza Università di Roma
000824109 520__ $$aJuSPIC is a particle-in-cell (PIC) code developed in the Simulation Lab for Plasma Physics of the Jülich Supercomputing Centre. The open-source code is based on PSC by H. Ruhl, slimmed down and rewritten in modern Fortran. JuSPIC simulates particles under the influence of electromagnetic fields, using the relativistic Vlasov equation and Maxwell's equations (integrated with the finite-difference time-domain (FDTD) scheme). The program uses a regular mesh for the Maxwell fields and the particle charge/current densities; inside the mesh, quasi-particles with continuous coordinates are modeled via distribution functions. JuSPIC is a member of the High-Q Club, attesting that it scales efficiently to the full JUQUEEN supercomputer (currently #13 on the TOP500 list): 1.8 million threads running on 458 thousand cores collaboratively compute plasma simulations. Node-level parallelism is achieved with OpenMP; communication between nodes relies on MPI. JuSPIC is currently being extended to leverage the latest generation of supercomputers, which come equipped with dedicated accelerator technologies (GPUs and other many-core architectures). In this poster we present a GPU-accelerated version of the program that makes use of different programming models, and we show first results of performance studies comparing OpenACC and CUDA. While OpenACC aims to offer portability and flexibility with only a few changes to the code, the performance of the generated program can suffer in practice. To measure this deficit, the compute-intensive parts of the program are additionally implemented in CUDA Fortran. To explore the scalability properties of the application for static particle distributions on a heterogeneous architecture, we make use of semi-empirical performance models.
000824109 536__ $$0G:(DE-HGF)POF3-511$$a511 - Computational Science and Mathematical Methods (POF3-511)$$cPOF3-511$$fPOF III$$x0
000824109 7001_ $$0P:(DE-Juel1)144441$$aPleiter, Dirk$$b1$$ufzj
000824109 7001_ $$0P:(DE-Juel1)143606$$aBrömmel, Dirk$$b2$$ufzj
000824109 909CO $$ooai:juser.fz-juelich.de:824109$$pVDB
000824109 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)145478$$aForschungszentrum Jülich$$b0$$kFZJ
000824109 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)144441$$aForschungszentrum Jülich$$b1$$kFZJ
000824109 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)143606$$aForschungszentrum Jülich$$b2$$kFZJ
000824109 9131_ $$0G:(DE-HGF)POF3-511$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vComputational Science and Mathematical Methods$$x0
000824109 9141_ $$y2016
000824109 915__ $$0StatID:(DE-HGF)0550$$2StatID$$aNo Authors Fulltext
000824109 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000824109 980__ $$aposter
000824109 980__ $$aVDB
000824109 980__ $$aUNRESTRICTED
000824109 980__ $$aI:(DE-Juel1)JSC-20090406
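
The abstract above contrasts directive-based OpenACC offloading with hand-written CUDA Fortran kernels. As a rough illustration of the first approach (a minimal sketch, not JuSPIC source: the array names, the Boris-style half kick, and the qdtm prefactor are all assumptions made for this example), a particle momentum update can be offloaded with a single OpenACC directive:

    program acc_push_sketch
      ! Minimal sketch (not JuSPIC source): offload a particle momentum
      ! update to the GPU with one OpenACC directive.
      implicit none
      integer, parameter :: n = 1000000
      real :: px(n), py(n), pz(n)   ! particle momenta (hypothetical layout)
      real :: ex(n), ey(n), ez(n)   ! E-field interpolated to the particles
      real, parameter :: qdtm = 0.5 ! assumed q*dt/(2m) prefactor
      integer :: i

      px = 0.0; py = 0.0; pz = 0.0
      ex = 1.0; ey = 0.0; ez = 0.0

      ! Half "electric kick" of a Boris-type push; the magnetic rotation
      ! and the position update are omitted for brevity.
      !$acc parallel loop copy(px, py, pz) copyin(ex, ey, ez)
      do i = 1, n
         px(i) = px(i) + qdtm * ex(i)
         py(i) = py(i) + qdtm * ey(i)
         pz(i) = pz(i) + qdtm * ez(i)
      end do

      print *, 'px(1) =', px(1)
    end program acc_push_sketch

Built with an OpenACC-capable compiler (e.g. pgfortran -acc in the poster's time frame), the directive is the only GPU-specific line. A CUDA Fortran version of the same loop would instead move the body into an attributes(global) kernel and manage device arrays explicitly; that extra effort, traded against the performance of the compiler-generated code, is what the poster's OpenACC-versus-CUDA comparison quantifies.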