% IMPORTANT: The following is UTF-8 encoded. This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.
@phdthesis{Aach:1043684,
  author   = {Aach, Marcel},
  title    = {{Parallel} and {Scalable} {Hyperparameter} {Optimization}
              for {Distributed} {Deep} {Learning} {Methods} on
              {High-Performance} {Computing} {Systems}},
  school   = {University of Iceland},
  type     = {Dissertation},
  reportid = {FZJ-2025-02982},
  isbn     = {978-9935-9807-8-6},
  pages    = {172},
  year     = {2025},
  note     = {Additional Grant: Verbundprojekt: NXTAIM - NXT GEN
              (01.01.2024-31.12.2026); Dissertation, University of
              Iceland, 2025},
  abstract = {The design of Deep Learning (DL) models is a complex task,
              involving decisions on the general architecture of the model
              (e.g., the number of layers of the Neural Network (NN)) and
              on the optimization algorithms (e.g., the learning rate).
              These so-called hyperparameters significantly influence the
              performance (e.g., accuracy or error rates) of the final DL
              model and are, therefore, of great importance. However,
              optimizing these hyperparameters is a computationally
              intensive process due to the necessity of evaluating many
              combinations to identify the best-performing ones. Often,
              the optimization is manually performed. This Ph.D. thesis
              leverages the power of High-Performance Computing (HPC)
              systems to perform automatic and efficient Hyperparameter
              Optimization (HPO) for DL models that are trained on large
              quantities of scientific data. On modern HPO systems,
              equipped with a high number of Graphics Processing Units
              (GPUs), it becomes possible to not only evaluate multiple
              models with different hyperparameter combinations in
              parallel but also to distribute the training of the models
              themselves to multiple GPUs. State-of-the-art HPO methods,
              based on the concepts of early stopping, have demonstrated
              significant reductions in the runtime of the HPO process.
              Their performance at scale, particularly in the context of
              HPC environments and when applied to large scientific
              datasets, has remained unexplored. This thesis thus
              researches parallel and scalable HPO methods that leverage
              new inherent capabilities of HPC systems and innovative
              workflows incorporating novel computing paradigms. The
              developed HPO methods are validated on different scientific
              datasets ranging from the Computational Fluid Dynamics (CFD)
              to Remote Sensing (RS) domain, spanning multiple hundred
              Gigabytes (GBs) to several Terabytes (TBs) in size.},
  cin      = {JSC},
  cid      = {I:(DE-Juel1)JSC-20090406},
  pnm      = {5111 - Domain-Specific Simulation \& Data Life Cycle Labs
              (SDLs) and Research Groups (POF4-511) / RAISE - Research on
              AI- and Simulation-Based Engineering at Exascale (951733) /
              nxtAIM - nxtAIM – NXT GEN AI Methods (19A23014l)},
  pid      = {G:(DE-HGF)POF4-5111 / G:(EU-Grant)951733 /
              G:(BMWK)19A23014l},
  typ      = {PUB:(DE-HGF)3 / PUB:(DE-HGF)11},
  doi      = {10.34734/FZJ-2025-02982},
  url      = {https://juser.fz-juelich.de/record/1043684},
}