001034059 001__ 1034059
001034059 005__ 20250109211141.0
001034059 037__ $$aFZJ-2024-06880
001034059 1001_ $$0P:(DE-Juel1)192254$$aPenke, Carolin$$b0$$eCorresponding author$$ufzj
001034059 1112_ $$aWomen in Data Science Conference Chemnitz$$cChemnitz$$d2024-06-06 - 2024-06-07$$wGermany
001034059 245__ $$aAn Introduction to Large Language Models
001034059 260__ $$c2024
001034059 3367_ $$033$$2EndNote$$aConference Paper
001034059 3367_ $$2DataCite$$aOther
001034059 3367_ $$2BibTeX$$aINPROCEEDINGS
001034059 3367_ $$2DRIVER$$aconferenceObject
001034059 3367_ $$2ORCID$$aLECTURE_SPEECH
001034059 3367_ $$0PUB:(DE-HGF)6$$2PUB:(DE-HGF)$$aConference Presentation$$bconf$$mconf$$s1736416151_16756$$xPlenary/Keynote
001034059 520__ $$aLarge Language Models (LLMs) have revolutionized the field of artificial intelligence, enabling advanced text generation and understanding. This talk provides a concise overview of LLMs, focusing on their development, architecture, and implementation. We explain key concepts, and give details on the backbone of modern LLMs: the transformer architecture and its innovative attention mechanism. To be able to train these models on supercomputers, advanced parallelization techniques are needed. Recent advancements and promising trends are identified. Through the lens of the OpenGPT-X project, this presentation will highlight the collaborative efforts in developing multilingual, open-source LLMs.
001034059 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001034059 536__ $$0G:(DE-HGF)POF4-5122$$a5122 - Future Computing & Big Data Systems (POF4-512)$$cPOF4-512$$fPOF IV$$x1
001034059 536__ $$0G:(DE-Juel-1)68GX21007F$$aOpenGPT-X - Aufbau eines Gaia-X Knotens für große KI-Sprachmodelle und innovative Sprachapplikations-Services; Teilvorhaben: Optimierung und Skalierung auf großen HPC-Systemen (68GX21007F)$$c68GX21007F$$x2
001034059 536__ $$0G:(DE-Juel-1)JuWinHPC$$aJuWinHPC - Jülich Women in HPC (JuWinHPC)$$cJuWinHPC$$x3
001034059 8564_ $$uhttps://juser.fz-juelich.de/record/1034059/files/IntroToLLMS_WiDS.pdf$$yRestricted
001034059 909CO $$ooai:juser.fz-juelich.de:1034059$$pVDB
001034059 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)192254$$aForschungszentrum Jülich$$b0$$kFZJ
001034059 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001034059 9131_ $$0G:(DE-HGF)POF4-512$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5122$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vSupercomputing & Big Data Infrastructures$$x1
001034059 9141_ $$y2024
001034059 920__ $$lyes
001034059 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
001034059 980__ $$aconf
001034059 980__ $$aVDB
001034059 980__ $$aI:(DE-Juel1)JSC-20090406
001034059 980__ $$aUNRESTRICTED