## JURECA Porting and Tuning Workshop

In the past couple of years, Jülich Supercomputing Centre (JSC) has been running Porting and Tuning Workshops on its highly scalable BlueGene/Q system JUQUEEN. These workshops attracted up to 47 participants from our wide European user base and focussed on the specialized hardware and software of BlueGene/Q.

The expected trend for the near future is a continued increase in the complexity of upcoming supercomputers. A precursor of this trend is the latest system installed in Jülich, the general-purpose cluster JURECA, the "Jülich Research on Exascale Cluster Architectures" system. It is built from standard Intel processors and thus appears to be an off-the-shelf computer. With a high-speed Mellanox EDR InfiniBand network, nodes with varying amounts of memory, and an additional 75 GPU nodes with 2 NVIDIA K80 accelerators each, it is not.

The complexity starts with the Intel Xeon chips, which support simultaneous multithreading with up to 48 threads per node. Each thread can use FMA instructions and wide SIMD vectors, and the latter are essential to reach peak performance. Ideally, this can be combined with the available GPUs, which add another way to parallelize codes but also increase the complexity of coordinating distributed and shared memory parallelization with executing kernels on the accelerators and transferring the necessary data.

The workshop will take place June 6-8. Its goal is to make users aware of the increasing effort needed to reach peak performance and to suggest possible routes for this task. The topics will include best practices for JURECA, possibilities for visualization, and scientific big data analytics. We will also cover efficient I/O and ways to achieve multi-threading and vectorization. To facilitate GPU programming, we will compare OpenCL, OpenACC and CUDA. Special focus will be put on node-level performance and performance analysis.
Since this is a very broad range of topics, we will only have time to introduce the ideas and highlight their applicability.

At the heart of the workshop will be extensive hands-on sessions with the participants' codes, aimed at helping with porting applications to JURECA and understanding performance bottlenecks. These sessions will be supervised by members of staff from JSC's Simulation Laboratories and the cross-sectional teams Application Optimization, Performance Analysis, and Mathematical Methods and Algorithms. At the end of the workshop, the participants should have their codes running on JURECA and have a clear picture of how to improve their performance.

Contact: Dirk Brömmel, d.broemmel@fz-juelich.de

Dirk Brömmel
Jülich Supercomputing Centre (JSC), Germany