001008234 001__ 1008234
001008234 005__ 20231027114406.0
001008234 0247_ $$2doi$$a10.1186/s40537-023-00765-w
001008234 0247_ $$2Handle$$a2128/34534
001008234 0247_ $$2WOS$$aWOS:001005042700001
001008234 037__ $$aFZJ-2023-02265
001008234 082__ $$a004
001008234 1001_ $$0P:(DE-Juel1)180916$$aAach, Marcel$$b0$$eCorresponding author$$ufzj
001008234 245__ $$aLarge scale performance analysis of distributed deep learning frameworks for convolutional neural networks
001008234 260__ $$aHeidelberg [u.a.]$$bSpringerOpen$$c2023
001008234 3367_ $$2DRIVER$$aarticle
001008234 3367_ $$2DataCite$$aOutput Types/Journal article
001008234 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1687164172_8962
001008234 3367_ $$2BibTeX$$aARTICLE
001008234 3367_ $$2ORCID$$aJOURNAL_ARTICLE
001008234 3367_ $$00$$2EndNote$$aJournal Article
001008234 520__ $$aContinuously increasing data volumes from multiple sources, such as simulations and experimental measurements, demand efficient algorithms for analysis within a realistic timeframe. Deep learning models have proven to be capable of understanding and analyzing large quantities of data with high accuracy. However, training them on massive datasets remains a challenge and requires distributed learning exploiting High-Performance Computing systems. This study presents a comprehensive analysis and comparison of three well-established distributed deep learning frameworks - Horovod, DeepSpeed, and Distributed Data Parallel by PyTorch - with a focus on their runtime performance and scalability. Additionally, the performance of two data loaders, the native PyTorch data loader and the DALI data loader by NVIDIA, is investigated. To evaluate these frameworks and data loaders, three standard ResNet architectures with 50, 101, and 152 layers are tested using the ImageNet dataset. The impact of different learning rate schedulers on validation accuracy is also assessed. The novel contribution lies in the detailed analysis and comparison of these frameworks and data loaders on the state-of-the-art Jülich Wizard for European Leadership Science (JUWELS) Booster system at the Jülich Supercomputing Centre, using up to 1024 NVIDIA A100 GPUs in parallel. Findings show that the DALI data loader significantly reduces the overall runtime of ResNet50 from more than 12 h on 4 GPUs to less than 200 s on 1024 GPUs. The outcomes of this work highlight the potential impact of distributed deep learning using efficient tools on accelerating scientific discoveries and data-driven applications.
001008234 536__ $$0G:(DE-HGF)POF4-5111$$a5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001008234 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x1
001008234 536__ $$0G:(EU-Grant)951733$$aRAISE - Research on AI- and Simulation-Based Engineering at Exascale (951733)$$c951733$$fH2020-INFRAEDI-2019-1$$x2
001008234 588__ $$aDataset connected to CrossRef, Journals: juser.fz-juelich.de
001008234 7001_ $$0P:(DE-Juel1)188268$$aInanc, Eray$$b1$$ufzj
001008234 7001_ $$0P:(DE-Juel1)188513$$aSarma, Rakesh$$b2$$ufzj
001008234 7001_ $$0P:(DE-Juel1)132239$$aRiedel, Morris$$b3$$ufzj
001008234 7001_ $$0P:(DE-Juel1)165948$$aLintermann, Andreas$$b4$$ufzj
001008234 773__ $$0PERI:(DE-600)2780218-8$$a10.1186/s40537-023-00765-w$$gVol. 10, no. 1, p. 96$$n1$$p96$$tJournal of Big Data$$v10$$x2196-1115$$y2023
001008234 8564_ $$uhttps://juser.fz-juelich.de/record/1008234/files/2230d3ad-8ad6-4eae-b64b-6d9010e4082d.pdf$$yOpenAccess
001008234 8767_ $$8SN-2023-00596-b$$92023-10-06$$a1200197295$$d2023-10-18$$eAPC$$jZahlung erfolgt
001008234 909CO $$ooai:juser.fz-juelich.de:1008234$$pdnbdelivery$$popenCost$$pec_fundedresources$$pVDB$$pdriver$$pOpenAPC$$popen_access$$popenaire
001008234 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)180916$$aForschungszentrum Jülich$$b0$$kFZJ
001008234 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188268$$aForschungszentrum Jülich$$b1$$kFZJ
001008234 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188513$$aForschungszentrum Jülich$$b2$$kFZJ
001008234 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132239$$aForschungszentrum Jülich$$b3$$kFZJ
001008234 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)165948$$aForschungszentrum Jülich$$b4$$kFZJ
001008234 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5111$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001008234 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x1
001008234 9141_ $$y2023
001008234 915pc $$0PC:(DE-HGF)0000$$2APC$$aAPC keys set
001008234 915pc $$0PC:(DE-HGF)0001$$2APC$$aLocal Funding
001008234 915pc $$0PC:(DE-HGF)0002$$2APC$$aDFG OA Publikationskosten
001008234 915pc $$0PC:(DE-HGF)0003$$2APC$$aDOAJ Journal
001008234 915__ $$0StatID:(DE-HGF)0160$$2StatID$$aDBCoverage$$bEssential Science Indicators$$d2022-11-16
001008234 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
001008234 915__ $$0StatID:(DE-HGF)0113$$2StatID$$aWoS$$bScience Citation Index Expanded$$d2022-11-16
001008234 915__ $$0StatID:(DE-HGF)0700$$2StatID$$aFees$$d2022-11-16
001008234 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001008234 915__ $$0StatID:(DE-HGF)0561$$2StatID$$aArticle Processing Charges$$d2022-11-16
001008234 915__ $$0StatID:(DE-HGF)0501$$2StatID$$aDBCoverage$$bDOAJ Seal$$d2023-05-02T09:11:15Z
001008234 915__ $$0StatID:(DE-HGF)0500$$2StatID$$aDBCoverage$$bDOAJ$$d2023-05-02T09:11:15Z
001008234 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bDOAJ : Anonymous peer review$$d2023-05-02T09:11:15Z
001008234 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bJ BIG DATA-GER : 2022$$d2023-10-26
001008234 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2023-10-26
001008234 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2023-10-26
001008234 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2023-10-26
001008234 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2023-10-26
001008234 915__ $$0StatID:(DE-HGF)1160$$2StatID$$aDBCoverage$$bCurrent Contents - Engineering, Computing and Technology$$d2023-10-26
001008234 915__ $$0StatID:(DE-HGF)9905$$2StatID$$aIF >= 5$$bJ BIG DATA-GER : 2022$$d2023-10-26
001008234 920__ $$lyes
001008234 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
001008234 9801_ $$aFullTexts
001008234 980__ $$ajournal
001008234 980__ $$aVDB
001008234 980__ $$aUNRESTRICTED
001008234 980__ $$aI:(DE-Juel1)JSC-20090406
001008234 980__ $$aAPC