000910530 001__ 910530
000910530 005__ 20250314084120.0
000910530 0247_ $$2doi$$a10.1109/CLUSTER51413.2022.00066
000910530 0247_ $$2Handle$$a2128/32176
000910530 0247_ $$2WOS$$aWOS:000920273100051
000910530 037__ $$aFZJ-2022-03912
000910530 041__ $$aEnglish
000910530 1001_ $$0P:(DE-HGF)0$$aRojas, Elvis$$b0$$eCorresponding author
000910530 1112_ $$a2022 IEEE International Conference on Cluster Computing$$cHeidelberg$$d2022-09-06 - 2022-09-09$$gCLUSTER$$wGermany
000910530 245__ $$aEarly Experiences of Noise-Sensitivity Performance Analysis of a Distributed Deep Learning Framework
000910530 260__ $$bIEEE$$c2022
000910530 300__ $$a516-522
000910530 3367_ $$2ORCID$$aCONFERENCE_PAPER
000910530 3367_ $$033$$2EndNote$$aConference Paper
000910530 3367_ $$2BibTeX$$aINPROCEEDINGS
000910530 3367_ $$2DRIVER$$aconferenceObject
000910530 3367_ $$2DataCite$$aOutput Types/Conference Paper
000910530 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1666937882_5222
000910530 520__ $$aDeep Learning (DL) applications are used to solve complex problems efficiently. These applications require complex neural network models composed of millions of parameters and huge amounts of data for proper training. Such training is only feasible when so-called distributed deep learning (DDL) frameworks parallelize the necessary computations over many GPUs spread across multiple nodes of an HPC cluster. These frameworks mostly utilize the compute power of the GPUs and use only a small portion of the available compute power of the CPUs in the nodes for I/O and inter-process communication, leaving many CPU cores idle. The more powerful the CPUs in the cluster nodes, the more compute resources are wasted. In this paper, we investigate how much of these unutilized compute resources could be used to execute other applications without lowering the performance of the DDL frameworks. In our experiments, we executed a noise-generation application, which generates a very high memory, network, or I/O load, in parallel with DDL frameworks, and used HPC profiling and tracing techniques to determine whether and how the generated noise affects the performance of the DDL frameworks. Early results indicate that it might be possible to utilize the idle cores for jobs of other users without negatively affecting the performance of the DDL applications.
000910530 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
000910530 536__ $$0G:(GEPRIS)449683531$$aExtraNoise – Performance Analysis of HPC Applications in Noisy Environments (449683531)$$c449683531$$x1
000910530 536__ $$0G:(DE-Juel-1)ATMLPP$$aATMLPP - ATML Parallel Performance (ATMLPP)$$cATMLPP$$x2
000910530 588__ $$aDataset connected to DataCite
000910530 7001_ $$0P:(DE-Juel1)132163$$aKnobloch, Michael$$b1$$ufzj
000910530 7001_ $$0P:(DE-Juel1)188664$$aDaoud, Nour$$b2$$ufzj
000910530 7001_ $$0P:(DE-HGF)0$$aMeneses, Esteban$$b3
000910530 7001_ $$0P:(DE-Juel1)132199$$aMohr, Bernd$$b4$$ufzj
000910530 773__ $$a10.1109/CLUSTER51413.2022.00066
000910530 8564_ $$uhttps://juser.fz-juelich.de/record/910530/files/HPCEuropeLatAm2022.pdf$$yOpenAccess$$zStatID:(DE-HGF)0510
000910530 909CO $$ooai:juser.fz-juelich.de:910530$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
000910530 9101_ $$0I:(DE-HGF)0$$6P:(DE-HGF)0$$aCosta Rica Institute of Technology$$b0
000910530 9101_ $$0I:(DE-HGF)0$$6P:(DE-HGF)0$$aNational University of Costa Rica$$b0
000910530 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132163$$aForschungszentrum Jülich$$b1$$kFZJ
000910530 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188664$$aForschungszentrum Jülich$$b2$$kFZJ
000910530 9101_ $$0I:(DE-HGF)0$$6P:(DE-HGF)0$$aCosta Rica Institute of Technology$$b3
000910530 9101_ $$0I:(DE-HGF)0$$6P:(DE-HGF)0$$aCosta Rica National High Technology Center$$b3
000910530 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132199$$aForschungszentrum Jülich$$b4$$kFZJ
000910530 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
000910530 9141_ $$y2022
000910530 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000910530 920__ $$lno
000910530 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000910530 980__ $$acontrib
000910530 980__ $$aVDB
000910530 980__ $$aUNRESTRICTED
000910530 980__ $$aI:(DE-Juel1)JSC-20090406
000910530 9801_ $$aFullTexts