% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

% Conference talk (invited), JuSER export FZJ-2023-04874.
% Fixes vs. export: bare `nov` month macro; whole-word brace protection in
% title (acronym only, Title Case otherwise); required `booktitle` added for
% @inproceedings; text-mode \& instead of $\&$ in pnm.
@inproceedings{John:1018546,
      author       = {John, Chelsea Maria and Herten, Andreas},
      title        = {Novel Architecture Exploration -- {OpenGPT-X}:
                      Open Large Language Models},
      booktitle    = {WHPC@SC23: 16th International Women in {HPC} Workshop},
      reportid     = {FZJ-2023-04874},
      year         = {2023},
      abstract     = {The OpenGPT-X project is a German initiative with ten
                      collaborators to build, train, and deploy a multilingual
                      open-source language model. Models trained within the
                      project will be used for pilot cases by industry partners
                      and commercialized through the Gaia-X Federation. Due to the
                      substantial memory and compute resources required for
                      efficiently training large language models, high-performance
                      computing systems such as JUWELS Booster are essential. This
                      paper presents the results of the exploration of novel
                      hardware architecture conducted within the scope of the
                      project.},
      month        = nov,
      date         = {2023-11-12},
      organization = {WHPC@SC23: 16th International Women in
                      HPC Workshop, Denver, Colorado (USA),
                      12 Nov 2023 - 17 Nov 2023},
      subtyp       = {Invited},
      cin          = {JSC},
      cid          = {I:(DE-Juel1)JSC-20090406},
      pnm          = {5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs)
                      and Research Groups (POF4-511) / 5121 - Supercomputing \&
                      Big Data Facilities (POF4-512) / OpenGPT-X - Aufbau eines
                      Gaia-X Knotens für große KI-Sprachmodelle und innovative
                      Sprachapplikations-Services; Teilvorhaben: Optimierung und
                      Skalierung auf großen HPC-Systemen (68GX21007F) /
                      ATML-X-DEV - ATML Accelerating Devices (ATML-X-DEV)},
      pid          = {G:(DE-HGF)POF4-5112 / G:(DE-HGF)POF4-5121 /
                      G:(DE-Juel-1)68GX21007F / G:(DE-Juel-1)ATML-X-DEV},
      typ          = {PUB:(DE-HGF)6},
      doi          = {10.34734/FZJ-2023-04874},
      url          = {https://juser.fz-juelich.de/record/1018546},
}