001     1008234
005     20231027114406.0
024 7 _ |a 10.1186/s40537-023-00765-w
|2 doi
024 7 _ |a 2128/34534
|2 Handle
024 7 _ |a WOS:001005042700001
|2 WOS
037 _ _ |a FZJ-2023-02265
082 _ _ |a 004
100 1 _ |a Aach, Marcel
|0 P:(DE-Juel1)180916
|b 0
|e Corresponding author
|u fzj
245 _ _ |a Large scale performance analysis of distributed deep learning frameworks for convolutional neural networks
260 _ _ |a Heidelberg [u.a.]
|c 2023
|b SpringerOpen
336 7 _ |a article
|2 DRIVER
336 7 _ |a Output Types/Journal article
|2 DataCite
336 7 _ |a Journal Article
|b journal
|m journal
|0 PUB:(DE-HGF)16
|s 1687164172_8962
|2 PUB:(DE-HGF)
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a JOURNAL_ARTICLE
|2 ORCID
336 7 _ |a Journal Article
|0 0
|2 EndNote
520 _ _ |a Continuously increasing data volumes from multiple sources, such as simulation and experimental measurements, demand efficient algorithms for an analysis within a realistic timeframe. Deep learning models have proven to be capable of understanding and analyzing large quantities of data with high accuracy. However, training them on massive datasets remains a challenge and requires distributed learning exploiting High-Performance Computing systems. This study presents a comprehensive analysis and comparison of three well-established distributed deep learning frameworks - Horovod, DeepSpeed, and Distributed Data Parallel by PyTorch - with a focus on their runtime performance and scalability. Additionally, the performance of two data loaders, the native PyTorch data loader and the DALI data loader by NVIDIA, is investigated. To evaluate these frameworks and data loaders, three standard ResNet architectures with 50, 101, and 152 layers are tested using the ImageNet dataset. The impact of different learning rate schedulers on validation accuracy is also assessed. The novel contribution lies in the detailed analysis and comparison of these frameworks and data loaders on the state-of-the-art Jülich Wizard for European Leadership Science (JUWELS) Booster system at the Jülich Supercomputing Centre, using up to 1024 A100 NVIDIA GPUs in parallel. Findings show that the DALI data loader significantly reduces the overall runtime of ResNet50 from more than 12 h on 4 GPUs to less than 200 s on 1024 GPUs. The outcomes of this work highlight the potential impact of distributed deep learning using efficient tools on accelerating scientific discoveries and data-driven applications.
536 _ _ |a 5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)
|0 G:(DE-HGF)POF4-5111
|c POF4-511
|f POF IV
|x 0
536 _ _ |a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)
|0 G:(DE-HGF)POF4-5112
|c POF4-511
|f POF IV
|x 1
536 _ _ |a RAISE - Research on AI- and Simulation-Based Engineering at Exascale (951733)
|0 G:(EU-Grant)951733
|c 951733
|f H2020-INFRAEDI-2019-1
|x 2
588 _ _ |a Dataset connected to CrossRef, Journals: juser.fz-juelich.de
700 1 _ |a Inanc, Eray
|0 P:(DE-Juel1)188268
|b 1
|u fzj
700 1 _ |a Sarma, Rakesh
|0 P:(DE-Juel1)188513
|b 2
|u fzj
700 1 _ |a Riedel, Morris
|0 P:(DE-Juel1)132239
|b 3
|u fzj
700 1 _ |a Lintermann, Andreas
|0 P:(DE-Juel1)165948
|b 4
|u fzj
773 _ _ |a 10.1186/s40537-023-00765-w
|g Vol. 10, no. 1, p. 96
|0 PERI:(DE-600)2780218-8
|n 1
|p 96
|t Journal of Big Data
|v 10
|y 2023
|x 2196-1115
856 4 _ |u https://juser.fz-juelich.de/record/1008234/files/2230d3ad-8ad6-4eae-b64b-6d9010e4082d.pdf
|y OpenAccess
909 C O |o oai:juser.fz-juelich.de:1008234
|p openaire
|p open_access
|p OpenAPC
|p driver
|p VDB
|p ec_fundedresources
|p openCost
|p dnbdelivery
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 0
|6 P:(DE-Juel1)180916
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 1
|6 P:(DE-Juel1)188268
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 2
|6 P:(DE-Juel1)188513
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 3
|6 P:(DE-Juel1)132239
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 4
|6 P:(DE-Juel1)165948
913 1 _ |a DE-HGF
|b Key Technologies
|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action
|1 G:(DE-HGF)POF4-510
|0 G:(DE-HGF)POF4-511
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Enabling Computational- & Data-Intensive Science and Engineering
|9 G:(DE-HGF)POF4-5111
|x 0
913 1 _ |a DE-HGF
|b Key Technologies
|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action
|1 G:(DE-HGF)POF4-510
|0 G:(DE-HGF)POF4-511
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Enabling Computational- & Data-Intensive Science and Engineering
|9 G:(DE-HGF)POF4-5112
|x 1
914 1 _ |y 2023
915 p c |a APC keys set
|2 APC
|0 PC:(DE-HGF)0000
915 p c |a Local Funding
|2 APC
|0 PC:(DE-HGF)0001
915 p c |a DFG OA Publikationskosten
|2 APC
|0 PC:(DE-HGF)0002
915 p c |a DOAJ Journal
|2 APC
|0 PC:(DE-HGF)0003
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0160
|2 StatID
|b Essential Science Indicators
|d 2022-11-16
915 _ _ |a Creative Commons Attribution CC BY 4.0
|0 LIC:(DE-HGF)CCBY4
|2 HGFVOC
915 _ _ |a WoS
|0 StatID:(DE-HGF)0113
|2 StatID
|b Science Citation Index Expanded
|d 2022-11-16
915 _ _ |a Fees
|0 StatID:(DE-HGF)0700
|2 StatID
|d 2022-11-16
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
915 _ _ |a Article Processing Charges
|0 StatID:(DE-HGF)0561
|2 StatID
|d 2022-11-16
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0501
|2 StatID
|b DOAJ Seal
|d 2023-05-02T09:11:15Z
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0500
|2 StatID
|b DOAJ
|d 2023-05-02T09:11:15Z
915 _ _ |a Peer Review
|0 StatID:(DE-HGF)0030
|2 StatID
|b DOAJ : Anonymous peer review
|d 2023-05-02T09:11:15Z
915 _ _ |a JCR
|0 StatID:(DE-HGF)0100
|2 StatID
|b J BIG DATA-GER : 2022
|d 2023-10-26
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0200
|2 StatID
|b SCOPUS
|d 2023-10-26
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0300
|2 StatID
|b Medline
|d 2023-10-26
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0199
|2 StatID
|b Clarivate Analytics Master Journal List
|d 2023-10-26
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0150
|2 StatID
|b Web of Science Core Collection
|d 2023-10-26
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)1160
|2 StatID
|b Current Contents - Engineering, Computing and Technology
|d 2023-10-26
915 _ _ |a IF >= 5
|0 StatID:(DE-HGF)9905
|2 StatID
|b J BIG DATA-GER : 2022
|d 2023-10-26
920 _ _ |l yes
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 1 _ |a FullTexts
980 _ _ |a journal
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a I:(DE-Juel1)JSC-20090406
980 _ _ |a APC

