001022355 001__ 1022355
001022355 005__ 20240226075446.0
001022355 0247_ $$2datacite_doi$$a10.34734/FZJ-2024-01461
001022355 037__ $$aFZJ-2024-01461
001022355 1001_ $$0P:(DE-Juel1)180916$$aAach, Marcel$$b0$$eCorresponding author$$ufzj
001022355 1112_ $$aSecond International Conference on Automated Machine Learning$$cPotsdam$$d2023-11-12 - 2023-11-15$$wGermany
001022355 245__ $$aOptimal Resource Allocation for Early Stopping-based Neural Architecture Search Methods
001022355 260__ $$bPMLR$$c2023
001022355 300__ $$a12/1-17
001022355 3367_ $$2ORCID$$aCONFERENCE_PAPER
001022355 3367_ $$033$$2EndNote$$aConference Paper
001022355 3367_ $$2BibTeX$$aINPROCEEDINGS
001022355 3367_ $$2DRIVER$$aconferenceObject
001022355 3367_ $$2DataCite$$aOutput Types/Conference Paper
001022355 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1706860260_21171
001022355 3367_ $$0PUB:(DE-HGF)7$$2PUB:(DE-HGF)$$aContribution to a book$$mcontb
001022355 4900_ $$aProceedings of Machine Learning Research
001022355 520__ $$aThe field of neural architecture search (NAS) has benefited significantly from the increased availability of parallel compute resources, as optimization algorithms typically require sampling and evaluating hundreds of model configurations. Consequently, to make use of these resources, the most commonly used early stopping-based NAS methods are suitable for running multiple trials in parallel. At the same time, the training time of a single model configuration can be reduced, e.g., by employing data-parallel training on multiple GPUs. This paper investigates the optimal allocation of a fixed number of parallel workers for conducting NAS. In practice, users have to decide whether the computational resources are primarily used to assign more workers to the training of individual trials or to increase the number of trials executed in parallel. The first option accelerates individual trials (exploitation) but reduces the parallelism of the NAS loop, whereas with the second option the runtime of each trial is longer but a larger number of trials is processed simultaneously in the NAS loop (exploration). Our study encompasses both small- and large-scale scenarios, from tuning multiple models in parallel on a single GPU, over data-parallel training on up to 16 GPUs, to measuring the scalability of NAS on up to 64 GPUs. Our empirical results with the HyperBand, Asynchronous Successive Halving, and Bayesian Optimization HyperBand methods offer valuable insights for users seeking to run NAS on both small and large computational budgets. By selecting the appropriate number of parallel evaluations, the NAS process can be accelerated by factors of ${\approx}$2–5 while preserving the test set accuracy, compared to non-optimal resource allocations.
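To illustrate the allocation trade-off described in the abstract, here is a minimal Python sketch (hypothetical; the function name and the printed layout are assumptions for illustration, not taken from the paper) that enumerates the ways a fixed GPU pool can be split between data-parallel training of individual trials and trials run in parallel:

    def allocations(total_gpus: int):
        """Yield (gpus_per_trial, parallel_trials) splits of a fixed GPU pool."""
        for gpus_per_trial in range(1, total_gpus + 1):
            if total_gpus % gpus_per_trial == 0:
                yield gpus_per_trial, total_gpus // gpus_per_trial

    # For a pool of 16 GPUs this prints candidate allocations of the kind
    # the paper studies: many slow trials (exploration) at one end,
    # few fast, data-parallel trials (exploitation) at the other.
    for gpus_per_trial, parallel_trials in allocations(16):
        print(f"{parallel_trials:2d} trials in parallel x {gpus_per_trial:2d} GPU(s) per trial")

Each yielded pair fixes one point on the exploration-exploitation spectrum; which point is optimal for a given early stopping-based NAS method and budget is the empirical question the paper addresses.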
001022355 536__ $$0G:(DE-HGF)POF4-5111$$a5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001022355 536__ $$0G:(EU-Grant)951733$$aRAISE - Research on AI- and Simulation-Based Engineering at Exascale (951733)$$c951733$$fH2020-INFRAEDI-2019-1$$x1
001022355 588__ $$aDataset connected to DataCite
001022355 7001_ $$0P:(DE-Juel1)188268$$aInanc, Eray$$b1$$ufzj
001022355 7001_ $$0P:(DE-Juel1)188513$$aSarma, Rakesh$$b2$$ufzj
001022355 7001_ $$0P:(DE-Juel1)132239$$aRiedel, Morris$$b3$$ufzj
001022355 7001_ $$0P:(DE-Juel1)165948$$aLintermann, Andreas$$b4$$ufzj
001022355 773__ $$v228
001022355 8564_ $$uhttps://juser.fz-juelich.de/record/1022355/files/aach23a.pdf$$yOpenAccess
001022355 8564_ $$uhttps://juser.fz-juelich.de/record/1022355/files/aach23a.gif?subformat=icon$$xicon$$yOpenAccess
001022355 8564_ $$uhttps://juser.fz-juelich.de/record/1022355/files/aach23a.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess
001022355 8564_ $$uhttps://juser.fz-juelich.de/record/1022355/files/aach23a.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
001022355 8564_ $$uhttps://juser.fz-juelich.de/record/1022355/files/aach23a.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
001022355 909CO $$ooai:juser.fz-juelich.de:1022355$$pdnbdelivery$$pec_fundedresources$$pVDB$$pdriver$$popen_access$$popenaire
001022355 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)180916$$aForschungszentrum Jülich$$b0$$kFZJ
001022355 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188268$$aForschungszentrum Jülich$$b1$$kFZJ
001022355 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188513$$aForschungszentrum Jülich$$b2$$kFZJ
001022355 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132239$$aForschungszentrum Jülich$$b3$$kFZJ
001022355 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)165948$$aForschungszentrum Jülich$$b4$$kFZJ
001022355 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5111$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001022355 9141_ $$y2023
001022355 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001022355 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
001022355 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Centre$$x0
001022355 980__ $$acontrib
001022355 980__ $$aVDB
001022355 980__ $$aUNRESTRICTED
001022355 980__ $$acontb
001022355 980__ $$aI:(DE-Juel1)JSC-20090406
001022355 9801_ $$aFullTexts