001049808 001__ 1049808
001049808 005__ 20260104202249.0
001049808 0247_ $$2doi$$a10.48550/ARXIV.2504.10013
001049808 037__ $$aFZJ-2025-05592
001049808 088__ $$2Other$$a2504.10013
001049808 1001_ $$0P:(DE-Juel1)192254$$aPenke, Carolin$$b0$$eCorresponding author
001049808 245__ $$aTraining LLMs on HPC Systems: Best Practices from the OpenGPT-X Project
001049808 260__ $$barXiv$$c2025
001049808 3367_ $$0PUB:(DE-HGF)25$$2PUB:(DE-HGF)$$aPreprint$$bpreprint$$mpreprint$$s1767539413_14297
001049808 3367_ $$2ORCID$$aWORKING_PAPER
001049808 3367_ $$028$$2EndNote$$aElectronic Article
001049808 3367_ $$2DRIVER$$apreprint
001049808 3367_ $$2BibTeX$$aARTICLE
001049808 3367_ $$2DataCite$$aOutput Types/Working Paper
001049808 520__ $$aThe training of large language models (LLMs) requires substantial computational resources, complex software stacks, and carefully designed workflows to achieve scalability and efficiency. This report presents best practices and insights gained from the OpenGPT-X project, a German initiative focused on developing open, multilingual LLMs optimized for European languages. We detail the use of high-performance computing (HPC) systems, primarily JUWELS Booster at JSC, for training Teuken-7B, a 7-billion-parameter transformer model. The report covers system architecture, training infrastructure, software choices, profiling and benchmarking tools, as well as engineering and operational challenges.
001049808 536__ $$0G:(DE-HGF)POF4-5122$$a5122 - Future Computing & Big Data Systems (POF4-512)$$cPOF4-512$$fPOF IV$$x0
001049808 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x1
001049808 536__ $$0G:(DE-Juel-1)ATML-X-DEV$$aATML-X-DEV - ATML Accelerating Devices (ATML-X-DEV)$$cATML-X-DEV$$x2
001049808 536__ $$0G:(DE-Juel-1)68GX21007F$$aOpenGPT-X - Aufbau eines Gaia-X Knotens für große KI-Sprachmodelle und innovative Sprachapplikations-Services; Teilvorhaben: Optimierung und Skalierung auf großen HPC-Systemen (68GX21007F)$$c68GX21007F$$x3
001049808 588__ $$aDataset connected to DataCite
001049808 650_7 $$2Other$$aDistributed, Parallel, and Cluster Computing (cs.DC)
001049808 650_7 $$2Other$$aFOS: Computer and information sciences
001049808 650_7 $$2Other$$aC.4; I.2.11; I.2.7; K.6
001049808 7001_ $$0P:(DE-Juel1)187395$$aJohn, Chelsea Maria$$b1$$ufzj
001049808 7001_ $$0P:(DE-Juel1)187002$$aEbert, Jan$$b2$$ufzj
001049808 7001_ $$0P:(DE-Juel1)185654$$aKesselheim, Stefan$$b3$$ufzj
001049808 7001_ $$0P:(DE-Juel1)145478$$aHerten, Andreas$$b4$$ufzj
001049808 773__ $$a10.48550/ARXIV.2504.10013
001049808 8564_ $$uhttps://juser.fz-juelich.de/record/1049808/files/2504.10013v1.pdf$$yRestricted
001049808 909CO $$ooai:juser.fz-juelich.de:1049808$$pextern4vita
001049808 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)192254$$aForschungszentrum Jülich$$b0$$kFZJ
001049808 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)187395$$aForschungszentrum Jülich$$b1$$kFZJ
001049808 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)187002$$aForschungszentrum Jülich$$b2$$kFZJ
001049808 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)185654$$aForschungszentrum Jülich$$b3$$kFZJ
001049808 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)145478$$aForschungszentrum Jülich$$b4$$kFZJ
001049808 9131_ $$0G:(DE-HGF)POF4-512$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5122$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vSupercomputing & Big Data Infrastructures$$x0
001049808 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x1
001049808 920__ $$lyes
001049808 9801_ $$aEXTERN4VITA
001049808 980__ $$apreprint
001049808 980__ $$aEDITORS
001049808 980__ $$aI:(DE-Juel1)JSC-20090406