%0 Conference Paper
%A Penke, Carolin
%T Efficient Computation of Low-Rank Representations to Reduce Memory Requirements in LLM Training
%M FZJ-2024-06889
%D 2024
%X The OpenGPT-X project represents one of Europe’s pioneering publicly funded efforts in the domain of large language models (LLMs), covering the entire lifecycle from pre-training foundational models to fine-tuning and practical application development. To maximize the efficiency of training on High Performance Computing (HPC) resources, strategies aimed at reducing computational and memory demands are being explored. A promising avenue exploits the low-rank structure of gradients, as done in the LoRA or GaLore frameworks, the latter of which relies on the computation of dominant low-rank subspaces during training. The randomized range finder algorithm provides a more efficient alternative to computing a full singular value decomposition (SVD). We introduce a novel variant of the range finder, based on the blocked Householder QR decomposition, optimized for modern GPU accelerators.
%B LoRAINNe’24: workshop on LOw-Rank Approximations and their Interactions with Neural NEtworks
%C 26 Nov 2024 - 27 Nov 2024, Nancy (France)
%F PUB:(DE-HGF)6
%9 Conference Presentation
%R 10.34734/FZJ-2024-06889
%U https://juser.fz-juelich.de/record/1034068