TY  - CONF
AU  - Penke, Carolin
TI  - Efficient Computation of Low-Rank Representations to Reduce Memory Requirements in LLM Training
M1  - FZJ-2024-06889
PY  - 2024
AB  - The OpenGPT-X project represents one of Europe’s pioneering publicly funded efforts in the domain of large language models (LLMs), covering the entire lifecycle from pre-training foundational models to fine-tuning and practical application development. To maximize the efficiency of training on High Performance Computing (HPC) resources, strategies for reducing computational and memory demands are being explored. A promising avenue exploits the low-rank structure of gradients, as done in the LoRA and GaLore frameworks, the latter of which relies on computing dominant low-rank subspaces during training. The randomized range finder algorithm provides a more efficient alternative to a full singular value decomposition (SVD). We introduce a novel variant of the range finder, based on the blocked Householder QR decomposition, optimized for modern GPU accelerators.
T2  - LoRAINNe’24: workshop on LOw-Rank Approximations and their Interactions with Neural NEtworks
CY  - Nancy, France
Y2  - 26 Nov 2024 - 27 Nov 2024
LB  - PUB:(DE-HGF)6
DO  - 10.34734/FZJ-2024-06889
UR  - https://juser.fz-juelich.de/record/1034068
ER  -