Communication FZJ-2022-06209

A mathematician's introduction to transformers and large language models



2022


Abstract: The field of Natural Language Processing (NLP) has been undergoing a revolution in recent years. Large language models (LLMs), most notably the series of Generative Pre-trained Transformers (GPTs), have exceeded all expectations in benchmark scenarios and in real-life applications such as text generation, translation, question answering, and summarization. The engine of the NLP revolution is the so-called attention mechanism, which allows models to process longer sentences without 'forgetting' important words. The mechanism is implemented as a series of matrix products and lends itself to intense parallelization. Pre-training transformers requires great computational resources and is one example of the growing AI workload at large High Performance Computing (HPC) facilities. OpenGPT-X is a joint effort of 10 partners from science and industry to train and provide access to an open LLM based in Europe, in order to guarantee digital and economic sovereignty. Within the project, the pre-training of the LLM is performed at the Jülich Supercomputing Centre. This blog post aims to give an introduction to the current state of large language models, the OpenGPT-X project, and the transformer neural network architecture for readers unfamiliar with the subject who have a working knowledge of linear algebra.
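The abstract's description of attention as "a series of matrix products" can be made concrete with a minimal NumPy sketch of scaled dot-product attention, softmax(QKᵀ/√d_k)V. This is an illustrative toy implementation, not code from the referenced blog post; the function and variable names are chosen for this example.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n, m) similarity matrix
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                            # weighted sum of value vectors

# toy self-attention: 4 tokens, embedding dimension 8, Q = K = V = X
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
out = attention(X, X, X)
print(out.shape)  # (4, 8)
```

Every step is a dense matrix multiplication or an element-wise operation, which is why the computation maps so well onto the GPUs of HPC systems.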

Keyword(s): Workshop ; OpenGPTX


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511) (POF4-511)

Appears in the scientific report 2022

The record appears in these collections:
Document types > Other Resources > Communication
Workflow collections > Public records
Institute Collections > JSC
Publications database

 Record created 2022-12-20, last modified 2022-12-21


