001     1049808
005     20260104202249.0
024 7 _ |a 10.48550/ARXIV.2504.10013
|2 doi
037 _ _ |a FZJ-2025-05592
088 _ _ |a 2504.10013
|2 Other
100 1 _ |a Penke, Carolin
|0 P:(DE-Juel1)192254
|b 0
|e Corresponding author
245 _ _ |a Training LLMs on HPC Systems: Best Practices from the OpenGPT-X Project
260 _ _ |c 2025
|b arXiv
336 7 _ |a Preprint
|b preprint
|m preprint
|0 PUB:(DE-HGF)25
|s 1767539413_14297
|2 PUB:(DE-HGF)
336 7 _ |a WORKING_PAPER
|2 ORCID
336 7 _ |a Electronic Article
|0 28
|2 EndNote
336 7 _ |a preprint
|2 DRIVER
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a Output Types/Working Paper
|2 DataCite
520 _ _ |a The training of large language models (LLMs) requires substantial computational resources, complex software stacks, and carefully designed workflows to achieve scalability and efficiency. This report presents best practices and insights gained from the OpenGPT-X project, a German initiative focused on developing open, multilingual LLMs optimized for European languages. We detail the use of high-performance computing (HPC) systems, primarily JUWELS Booster at JSC, for training Teuken-7B, a 7-billion-parameter transformer model. The report covers system architecture, training infrastructure, software choices, profiling and benchmarking tools, as well as engineering and operational challenges.
536 _ _ |a 5122 - Future Computing & Big Data Systems (POF4-512)
|0 G:(DE-HGF)POF4-5122
|c POF4-512
|f POF IV
|x 0
536 _ _ |a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)
|0 G:(DE-HGF)POF4-5112
|c POF4-511
|f POF IV
|x 1
536 _ _ |a ATML-X-DEV - ATML Accelerating Devices (ATML-X-DEV)
|0 G:(DE-Juel-1)ATML-X-DEV
|c ATML-X-DEV
|x 2
536 _ _ |a OpenGPT-X - Building a Gaia-X node for large AI language models and innovative language application services; subproject: Optimization and scaling on large HPC systems (68GX21007F)
|0 G:(DE-Juel-1)68GX21007F
|c 68GX21007F
|x 3
588 _ _ |a Dataset connected to DataCite
650 _ 7 |a Distributed, Parallel, and Cluster Computing (cs.DC)
|2 Other
650 _ 7 |a FOS: Computer and information sciences
|2 Other
650 _ 7 |a C.4; I.2.11; I.2.7; K.6
|2 Other
700 1 _ |a John, Chelsea Maria
|0 P:(DE-Juel1)187395
|b 1
|u fzj
700 1 _ |a Ebert, Jan
|0 P:(DE-Juel1)187002
|b 2
|u fzj
700 1 _ |a Kesselheim, Stefan
|0 P:(DE-Juel1)185654
|b 3
|u fzj
700 1 _ |a Herten, Andreas
|0 P:(DE-Juel1)145478
|b 4
|u fzj
773 _ _ |a 10.48550/ARXIV.2504.10013
856 4 _ |u https://juser.fz-juelich.de/record/1049808/files/2504.10013v1.pdf
|y Restricted
909 C O |o oai:juser.fz-juelich.de:1049808
|p extern4vita
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 0
|6 P:(DE-Juel1)192254
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 1
|6 P:(DE-Juel1)187395
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 2
|6 P:(DE-Juel1)187002
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 3
|6 P:(DE-Juel1)185654
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 4
|6 P:(DE-Juel1)145478
913 1 _ |a DE-HGF
|b Key Technologies
|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action
|1 G:(DE-HGF)POF4-510
|0 G:(DE-HGF)POF4-512
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Supercomputing & Big Data Infrastructures
|9 G:(DE-HGF)POF4-5122
|x 0
913 1 _ |a DE-HGF
|b Key Technologies
|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action
|1 G:(DE-HGF)POF4-510
|0 G:(DE-HGF)POF4-511
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Enabling Computational- & Data-Intensive Science and Engineering
|9 G:(DE-HGF)POF4-5112
|x 1
920 _ _ |l yes
980 1 _ |a EXTERN4VITA
980 _ _ |a preprint
980 _ _ |a EDITORS
980 _ _ |a I:(DE-Juel1)JSC-20090406