Parallel and Scalable Deep Learning to Reconstruct Actuated Turbulent Boundary Layer Flows. Part II: Autoencoder Training on HPC Systems

Inanc, Eray; Sarma, Rakesh; Albers, Marian; Aach, Marcel; Schröder, Wolfgang; Lintermann, Andreas
001007693 001__ 1007693
001007693 005__ 20240226075322.0
001007693 0247_ $$2Handle$$a2128/34556
001007693 037__ $$aFZJ-2023-02167
001007693 041__ $$aEnglish
001007693 1001_ $$0P:(DE-Juel1)188268$$aInanc, Eray$$b0$$eCorresponding author$$ufzj
001007693 1112_ $$a33rd International Conference on Parallel Computational Fluid Dynamics$$cAlba$$d2022-05-25 - 2022-05-27$$gParCFD2022$$wItaly
001007693 245__ $$aParallel and Scalable Deep Learning to Reconstruct Actuated Turbulent Boundary Layer Flows. Part II: Autoencoder Training on HPC Systems
001007693 260__ $$c2022
001007693 300__ $$a4 pages
001007693 3367_ $$2ORCID$$aCONFERENCE_PAPER
001007693 3367_ $$033$$2EndNote$$aConference Paper
001007693 3367_ $$2BibTeX$$aINPROCEEDINGS
001007693 3367_ $$2DRIVER$$aconferenceObject
001007693 3367_ $$2DataCite$$aOutput Types/Conference Paper
001007693 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1687327324_18222
001007693 520__ $$aConvolutional autoencoders are trained on exceptionally large actuated turbulent boundary layer simulation data (8.3 TB) on the high-performance computer JUWELS at the Jülich Supercomputing Centre. The parallelization of the training is based on a distributed data-parallelism approach. This method distributes the training dataset to multiple workers, and the trainable parameters of the convolutional autoencoder network are occasionally exchanged between the workers. This drastically reduces training times: almost linear scaling performance can be achieved by increasing the number of workers (up to 2,048 GPUs). As a consequence of this increase, the total batch size also increases. This directly affects the training accuracy and hence the quality of the trained network. The training error, computed between the reference and the reconstructed turbulent boundary layer fields, becomes larger when the number of workers is increased. This behavior must be accounted for, especially when going to a large number of workers, i.e., a compromise between parallel speed and accuracy needs to be found.
001007693 536__ $$0G:(DE-HGF)POF4-5111$$a5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001007693 536__ $$0G:(EU-Grant)951733$$aRAISE - Research on AI- and Simulation-Based Engineering at Exascale (951733)$$c951733$$fH2020-INFRAEDI-2019-1$$x1
001007693 7001_ $$0P:(DE-HGF)0$$aAlbers, Marian$$b1
001007693 7001_ $$0P:(DE-Juel1)188513$$aSarma, Rakesh$$b2$$ufzj
001007693 7001_ $$0P:(DE-Juel1)180916$$aAach, Marcel$$b3$$ufzj
001007693 7001_ $$0P:(DE-HGF)0$$aSchröder, Wolfgang$$b4
001007693 7001_ $$0P:(DE-Juel1)165948$$aLintermann, Andreas$$b5$$ufzj
001007693 8564_ $$uhttps://juser.fz-juelich.de/record/1007693/files/2022_ParCFD_Abstract_Inanc.pdf$$yOpenAccess
001007693 909CO $$ooai:juser.fz-juelich.de:1007693$$pdnbdelivery$$pec_fundedresources$$pVDB$$pdriver$$popen_access$$popenaire
001007693 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188268$$aForschungszentrum Jülich$$b0$$kFZJ
001007693 9101_ $$0I:(DE-588b)36225-6$$6P:(DE-HGF)0$$aRWTH Aachen$$b1$$kRWTH
001007693 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188513$$aForschungszentrum Jülich$$b2$$kFZJ
001007693 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)180916$$aForschungszentrum Jülich$$b3$$kFZJ
001007693 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)165948$$aForschungszentrum Jülich$$b5$$kFZJ
001007693 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5111$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001007693 9141_ $$y2023
001007693 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001007693 920__ $$lyes
001007693 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
001007693 980__ $$acontrib
001007693 980__ $$aVDB
001007693 980__ $$aUNRESTRICTED
001007693 980__ $$aI:(DE-Juel1)JSC-20090406
001007693 9801_ $$aFullTexts