000861601 001__ 861601
000861601 005__ 20250314084118.0
000861601 020__ $$a978-3-030-11987-4
000861601 0247_ $$2doi$$a10.1007/978-3-030-11987-4_6
000861601 0247_ $$2Handle$$a2128/21896
000861601 037__ $$aFZJ-2019-02051
000861601 1001_ $$0P:(DE-Juel1)142180$$aSchlütter, Marc$$b0$$eCorresponding author
000861601 1112_ $$a11th International Workshop on Parallel Tools for High Performance Computing$$cDresden$$d2017-09-11 - 2017-09-12$$wGermany
000861601 245__ $$aSCIPHI Score-P and Cube Extensions for Intel Phi
000861601 260__ $$aCham$$bSpringer International Publishing$$c2019
000861601 29510 $$aTools for High Performance Computing 2017
000861601 300__ $$a85-104
000861601 3367_ $$2ORCID$$aCONFERENCE_PAPER
000861601 3367_ $$033$$2EndNote$$aConference Paper
000861601 3367_ $$2BibTeX$$aINPROCEEDINGS
000861601 3367_ $$2DRIVER$$aconferenceObject
000861601 3367_ $$2DataCite$$aOutput Types/Conference Paper
000861601 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1553601793_29279
000861601 3367_ $$0PUB:(DE-HGF)7$$2PUB:(DE-HGF)$$aContribution to a book$$mcontb
000861601 520__ $$aThe Knights Landing processor offers unique features with regard to its memory hierarchy and vectorization capabilities. To improve tool support within these two areas, we present extensions to the Score-P measurement infrastructure and the Cube report explorer. With the Knights Landing edition, Intel introduced a new memory architecture that uses two types of memory: MCDRAM and DDR4 SDRAM. To help users decide where to place data structures, we introduce an MCDRAM candidate metric to the Cube report explorer. In addition, we track all MCDRAM allocations through the hbwmalloc interface, providing memory metrics such as leaked memory or the high-water mark on a per-region basis, as already known for the ubiquitous malloc/free.
A Score-P metric plugin that records memory statistics via numastat on a per-process level enables a timeline analysis using the Vampir toolset. To get the best performance out of the Knights Landing (KNL) processor, the large vector processing units need to be utilized effectively. The ratio between computation and data access and the vector processing unit (VPU) intensity are introduced as metrics to identify vectorization candidates on a per-region basis. The Portable Hardware Locality (hwloc) library (Broquedis et al., hwloc: a generic framework for managing hardware affinities in HPC applications, 2010 [2]) allows us to visualize the distribution of the KNL-specific performance metrics within the Cube report explorer, taking the hardware topology consisting of processor tiles and cores into account.
000861601 536__ $$0G:(DE-HGF)POF3-511$$a511 - Computational Science and Mathematical Methods (POF3-511)$$cPOF3-511$$fPOF III$$x0
000861601 536__ $$0G:(DE-Juel-1)ATMLPP$$aATMLPP - ATML Parallel Performance (ATMLPP)$$cATMLPP$$x1
000861601 588__ $$aDataset connected to CrossRef Book
000861601 7001_ $$0P:(DE-Juel1)132244$$aFeld, Christian$$b1
000861601 7001_ $$0P:(DE-Juel1)132249$$aSaviankou, Pavel$$b2
000861601 7001_ $$0P:(DE-Juel1)132163$$aKnobloch, Michael$$b3
000861601 7001_ $$0P:(DE-Juel1)168253$$aHermanns, Marc-André$$b4
000861601 7001_ $$0P:(DE-Juel1)132199$$aMohr, Bernd$$b5
000861601 773__ $$a10.1007/978-3-030-11987-4_6
000861601 8564_ $$uhttps://link.springer.com/chapter/10.1007%2F978-3-030-11987-4_6
000861601 8564_ $$uhttps://juser.fz-juelich.de/record/861601/files/2019_Book_ToolsForHighPerformanceComputi.pdf$$yOpenAccess
000861601 8564_ $$uhttps://juser.fz-juelich.de/record/861601/files/2019_Book_ToolsForHighPerformanceComputi.pdf?subformat=pdfa$$xpdfa$$yOpenAccess
000861601 909CO $$ooai:juser.fz-juelich.de:861601$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
000861601 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)142180$$aForschungszentrum Jülich$$b0$$kFZJ
000861601 9101_ $$0I:(DE-HGF)0$$6P:(DE-Juel1)142180$$a JSC$$b0
000861601 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132244$$aForschungszentrum Jülich$$b1$$kFZJ
000861601 9101_ $$0I:(DE-HGF)0$$6P:(DE-Juel1)132244$$a JSC$$b1
000861601 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132249$$aForschungszentrum Jülich$$b2$$kFZJ
000861601 9101_ $$0I:(DE-HGF)0$$6P:(DE-Juel1)132249$$a JSC$$b2
000861601 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132163$$aForschungszentrum Jülich$$b3$$kFZJ
000861601 9101_ $$0I:(DE-HGF)0$$6P:(DE-Juel1)132163$$a JSC$$b3
000861601 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)168253$$aForschungszentrum Jülich$$b4$$kFZJ
000861601 9101_ $$0I:(DE-HGF)0$$6P:(DE-Juel1)168253$$a JSC$$b4
000861601 9101_ $$0I:(DE-HGF)0$$6P:(DE-Juel1)168253$$a JARA-HPC$$b4
000861601 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132199$$aForschungszentrum Jülich$$b5$$kFZJ
000861601 9101_ $$0I:(DE-HGF)0$$6P:(DE-Juel1)132199$$a JSC$$b5
000861601 9101_ $$0I:(DE-HGF)0$$6P:(DE-Juel1)132199$$a JARA-HPC$$b5
000861601 9131_ $$0G:(DE-HGF)POF3-511$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vComputational Science and Mathematical Methods$$x0
000861601 9141_ $$y2019
000861601 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000861601 920__ $$lyes
000861601 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000861601 9201_ $$0I:(DE-82)080012_20140620$$kJARA-HPC$$lJARA - HPC$$x1
000861601 980__ $$acontrib
000861601 980__ $$aVDB
000861601 980__ $$aUNRESTRICTED
000861601 980__ $$acontb
000861601 980__ $$aI:(DE-Juel1)JSC-20090406
000861601 980__ $$aI:(DE-82)080012_20140620
000861601 9801_ $$aFullTexts