TY - CONF
AU - Penke, Carolin
TI - Efficient Computation of Low-Rank Representations to Reduce Memory Requirements in LLM Training
M1 - FZJ-2024-06889
PY - 2024
AB - The OpenGPT-X project represents one of Europe’s pioneering publicly funded efforts in the domain of large language models (LLMs), covering the entire lifecycle from pre-training foundation models to fine-tuning and practical application development. To maximize the efficiency of training on High Performance Computing (HPC) resources, strategies aimed at reducing computational and memory demands are being explored. A promising avenue exploits the low-rank structure of gradients, as is done in the LoRA and GaLore frameworks, the latter of which relies on the computation of dominant low-rank subspaces during training. The randomized range finder algorithm provides a more efficient alternative to computing a full singular value decomposition (SVD). We introduce a novel variant of the range finder, based on the blocked Householder QR decomposition, optimized for modern GPU accelerators.
T2 - LoRAINNe’24: workshop on LOw-Rank Approximations and their Interactions with Neural NEtworks
CY - Nancy, France
Y2 - 26 Nov 2024 - 27 Nov 2024
LB - PUB:(DE-HGF)6
DO - 10.34734/FZJ-2024-06889
UR - https://juser.fz-juelich.de/record/1034068
ER -