Optimal Resource Allocation for Early Stopping-based Neural Architecture Search Methods

Aach, Marcel; Sarma, Rakesh; Inanc, Eray; Riedel, Morris; Lintermann, Andreas

Marc 21

001			1022355
005			20240226075446.0
024	7	_	\|a 10.34734/FZJ-2024-01461 \|2 datacite_doi
037	_	_	\|a FZJ-2024-01461
100	1	_	\|a Aach, Marcel \|0 P:(DE-Juel1)180916 \|b 0 \|e Corresponding author \|u fzj
111	2	_	\|a Second International Conference on Automated Machine Learning \|c Potsdam \|d 2023-11-12 - 2023-11-15 \|w Germany
245	_	_	\|a Optimal Resource Allocation for Early Stopping-based Neural Architecture Search Methods
260	_	_	\|c 2023 \|b PMLR
300	_	_	\|a 12/1--17
336	7	_	\|a CONFERENCE_PAPER \|2 ORCID
336	7	_	\|a Conference Paper \|0 33 \|2 EndNote
336	7	_	\|a INPROCEEDINGS \|2 BibTeX
336	7	_	\|a conferenceObject \|2 DRIVER
336	7	_	\|a Output Types/Conference Paper \|2 DataCite
336	7	_	\|a Contribution to a conference proceedings \|b contrib \|m contrib \|0 PUB:(DE-HGF)8 \|s 1706860260_21171 \|2 PUB:(DE-HGF)
336	7	_	\|a Contribution to a book \|0 PUB:(DE-HGF)7 \|2 PUB:(DE-HGF) \|m contb
490	0	_	\|a Proceedings of Machine Learning Research
520	_	_	\|a The field of Neural Architecture Search (NAS) has benefited significantly from the increased availability of parallel compute resources, as optimization algorithms typically require sampling and evaluating hundreds of model configurations. Consequently, to make use of these resources, the most commonly used early stopping-based NAS methods are suitable for running multiple trials in parallel. At the same time, the training time of single model configurations can also be reduced, e.g., by employing data-parallel training using multiple GPUs. This paper investigates the optimal allocation of a fixed number of parallel workers for conducting NAS. In practice, users have to decide whether the computational resources are primarily used to assign more workers to the training of individual trials or to increase the number of trials executed in parallel. The first option accelerates the individual trials (exploitation) but reduces the parallelism of the NAS loop, whereas the second option lengthens the runtime of each trial but processes a larger number of trials simultaneously in the NAS loop (exploration). Our study encompasses both large- and small-scale scenarios, including tuning models in parallel on a single GPU, with data-parallel training on up to 16 GPUs, and measuring the scalability of NAS on up to 64 GPUs. Our empirical results using the HyperBand, Asynchronous Successive Halving, and Bayesian Optimization HyperBand methods offer valuable insights for users seeking to run NAS on both small and large computational budgets. By selecting the appropriate number of parallel evaluations, the NAS process can be accelerated by factors of ${\approx}$2–5 while preserving the test set accuracy compared to non-optimal resource allocations.
536	_	_	\|a 5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511) \|0 G:(DE-HGF)POF4-5111 \|c POF4-511 \|f POF IV \|x 0
536	_	_	\|a RAISE - Research on AI- and Simulation-Based Engineering at Exascale (951733) \|0 G:(EU-Grant)951733 \|c 951733 \|f H2020-INFRAEDI-2019-1 \|x 1
588	_	_	\|a Dataset connected to DataCite
700	1	_	\|a Inanc, Eray \|0 P:(DE-Juel1)188268 \|b 1 \|u fzj
700	1	_	\|a Sarma, Rakesh \|0 P:(DE-Juel1)188513 \|b 2 \|u fzj
700	1	_	\|a Riedel, Morris \|0 P:(DE-Juel1)132239 \|b 3 \|u fzj
700	1	_	\|a Lintermann, Andreas \|0 P:(DE-Juel1)165948 \|b 4 \|u fzj
773	_	_	\|v 228
856	4	_	\|y OpenAccess \|u https://juser.fz-juelich.de/record/1022355/files/aach23a.pdf
856	4	_	\|y OpenAccess \|x icon \|u https://juser.fz-juelich.de/record/1022355/files/aach23a.gif?subformat=icon
856	4	_	\|y OpenAccess \|x icon-1440 \|u https://juser.fz-juelich.de/record/1022355/files/aach23a.jpg?subformat=icon-1440
856	4	_	\|y OpenAccess \|x icon-180 \|u https://juser.fz-juelich.de/record/1022355/files/aach23a.jpg?subformat=icon-180
856	4	_	\|y OpenAccess \|x icon-640 \|u https://juser.fz-juelich.de/record/1022355/files/aach23a.jpg?subformat=icon-640
909	C	O	\|o oai:juser.fz-juelich.de:1022355 \|p openaire \|p open_access \|p driver \|p VDB \|p ec_fundedresources \|p dnbdelivery
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 0 \|6 P:(DE-Juel1)180916
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 1 \|6 P:(DE-Juel1)188268
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 2 \|6 P:(DE-Juel1)188513
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 3 \|6 P:(DE-Juel1)132239
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 4 \|6 P:(DE-Juel1)165948
913	1	_	\|a DE-HGF \|b Key Technologies \|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action \|1 G:(DE-HGF)POF4-510 \|0 G:(DE-HGF)POF4-511 \|3 G:(DE-HGF)POF4 \|2 G:(DE-HGF)POF4-500 \|4 G:(DE-HGF)POF \|v Enabling Computational- & Data-Intensive Science and Engineering \|9 G:(DE-HGF)POF4-5111 \|x 0
914	1	_	\|y 2023
915	_	_	\|a OpenAccess \|0 StatID:(DE-HGF)0510 \|2 StatID
915	_	_	\|a Creative Commons Attribution CC BY 4.0 \|0 LIC:(DE-HGF)CCBY4 \|2 HGFVOC
920	1	_	\|0 I:(DE-Juel1)JSC-20090406 \|k JSC \|l Jülich Supercomputing Center \|x 0
980	_	_	\|a contrib
980	_	_	\|a VDB
980	_	_	\|a UNRESTRICTED
980	_	_	\|a contb
980	_	_	\|a I:(DE-Juel1)JSC-20090406
980	1	_	\|a FullTexts
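
The allocation choice described in the abstract (field 520) can be illustrated with a minimal sketch. It only enumerates the ways a fixed budget of workers can be split evenly between trials running in parallel and workers per trial; it does not model runtime or accuracy, and it is not taken from the paper. The function name `allocations` is hypothetical, and the assumption that every trial receives the same number of workers is made here for illustration only.

```python
def allocations(total_workers: int) -> list[tuple[int, int]]:
    """Return (parallel_trials, workers_per_trial) pairs that use every worker.

    Assumption (not from the record): workers are split evenly, so only
    divisors of total_workers are valid values for workers_per_trial.
    """
    return [
        (total_workers // workers_per_trial, workers_per_trial)
        for workers_per_trial in range(1, total_workers + 1)
        if total_workers % workers_per_trial == 0
    ]


if __name__ == "__main__":
    # 16 workers: from many slow trials (exploration) to one fast trial (exploitation).
    for trials, workers in allocations(16):
        print(f"{trials:2d} parallel trials x {workers:2d} workers per trial")
```

Each pair is one candidate configuration; which of them finishes the search fastest at equal accuracy is the empirical question the abstract describes.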
